## ATAC-seq and ASoC

### Background and Significance
Genome-wide association studies (GWAS) have identified a large number of common disease risk variants over the past ten years. Most of these variants are located in non-coding regions of the human genome, such as introns, enhancers, and promoters, which are likely to contain regulatory elements. The functional interpretation of most of these non-coding variants remains unresolved, and even identifying which of them are functional is still challenging.

By default, chromatin is tightly coiled, which limits its accessibility. As a result, gene expression generally occurs only where chromatin is in an "open" state. Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a genomic technique that has been widely used in recent years to profile open chromatin. It can be used to infer regions of increased accessibility and to map transcription factor binding sites and nucleosome positions (wiki). ATAC-seq relies on next-generation sequencing (NGS): the number of reads covering a sequence mainly reflects how open the chromatin is in that region, and accessibility can be measured at single-nucleotide resolution.

Because open chromatin regions (OCRs) often overlap with regulatory sequences (5,6), mapping OCRs in induced pluripotent stem cell (iPSC)-derived neurons from adults can help detect functional non-coding risk variants for neuropsychiatric disorders such as schizophrenia.

Another promising approach to this challenge is to focus on allele-specific open chromatin (ASoC) variants, which are characterized by allelic imbalance in sequencing reads at heterozygous single nucleotide polymorphism (SNP) sites. ASoC variants can be mapped by comparing the chromatin accessibility of the two alleles of a SNP within the same individual. The main advantage of ASoC mapping over another commonly used approach, expression quantitative trait locus (eQTL) mapping, lies in its direct identification of putative functional disease variants.
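To make the idea of allelic imbalance concrete, the sketch below tests one heterozygous SNP at a time with an exact two-sided binomial test against a null of equal read counts for the two alleles. This is a minimal illustration under stated assumptions, not the pipeline used to call the ASoC SNPs in this proposal: the function names, the 0.5 null proportion, and the read counts are all hypothetical, and a real analysis would also need to address reference mapping bias and multiple testing.

```python
from math import exp, lgamma, log


def binom_log_pmf(k, n, p):
    """Log-probability of k successes in n Bernoulli(p) trials (0 < p < 1)."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def allelic_imbalance_pvalue(ref_reads, alt_reads, p_null=0.5):
    """Exact two-sided binomial p-value for one heterozygous SNP.

    Under the null hypothesis both alleles are equally accessible, so the
    reference-allele read count follows Binomial(n, p_null).
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    observed = binom_log_pmf(ref_reads, n, p_null)
    # Sum the probability of every outcome at least as unlikely as the
    # observed one (small tolerance guards against floating-point ties).
    p_value = sum(
        exp(lp)
        for lp in (binom_log_pmf(k, n, p_null) for k in range(n + 1))
        if lp <= observed + 1e-9
    )
    return min(1.0, p_value)


# Hypothetical (reference, alternative) read counts at three heterozygous SNPs.
example_counts = {"snp_a": (25, 24), "snp_b": (40, 12), "snp_c": (3, 2)}
for snp, (ref, alt) in example_counts.items():
    ref_fraction = ref / (ref + alt)
    p = allelic_imbalance_pvalue(ref, alt)
    print(f"{snp}: ref_fraction={ref_fraction:.2f} p={p:.3g}")
```

Because every SNP is tested within the same individual, the two alleles share the same cellular environment, which is what lets a simple read-count comparison point to the variant itself.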

### Aim
Schizophrenia is a mental disorder that typically begins in late adolescence or early adulthood, although children may also develop it. It can be severe: it is associated with cognitive impairment, difficulty in social activities, and potentially disability. The prevalence of schizophrenia in the United States is estimated at between 0.25% and 0.64% (https://www.nimh.nih.gov/health/statistics/schizophrenia.shtml).

The goal of this proposal is threefold: (1) to assess whether schizophrenia GWAS signals are enriched in the experimentally obtained OCRs and ASoC SNPs, using other neurodevelopmental disorders and common phenotypes for comparison; (2) to precisely identify causal variants among the ASoC variants associated with schizophrenia; and (3) to investigate the mechanisms of the non-coding regulatory elements involved.

### Data
1). A set of 20 iPSC lines used for open chromatin mapping was reprogrammed by collaborators. The cell lines were then differentiated into neural progenitor cells (NPCs) and subsequently into glutamatergic (iN-Glut), GABAergic (iN-GA), and dopaminergic (iN-DN) neurons. ATAC-seq was used to call OCR peaks (FDR < 0.05) for each cell type. The median OCR peak length across all 5 cell types is 335 base pairs (bp), and together the OCR peaks cover about 4% of the human genome.

2). ASoC SNPs are obtained by testing heterozygous SNPs for allelic imbalance in ATAC-seq reads in iN-Glut neurons and NPCs.

3). Schizophrenia GWAS risk variants from the Psychiatric Genomics Consortium (PGC).

### Method
To perform SNP-based enrichment analyses, we will apply TORUS (34), a tool based on a Bayesian hierarchical model. Specifically, we will test whether schizophrenia-associated GWAS signals (SNPs) are enriched in particular functional genomic annotations, including the 2 types of ASoC SNPs, the ATAC-seq peaks called in the 5 cell types, and commonly used functional categories such as promoters, coding regions, and introns. We hypothesize that ASoC variants are even more enriched for functional disease variants than OCRs in general.
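The enrichment parameters in this kind of model can be read as log odds ratios. The sketch below illustrates a logistic prior of the form log-odds = alpha0 + sum_k alpha_k * d_k, where d_k indicates whether a SNP carries annotation k. It is a toy illustration of how annotations shift the prior probability of association, not a call into TORUS itself: the function names and all alpha values are hypothetical, and in practice the alphas are estimated from GWAS summary statistics.

```python
from math import exp


def prior_causal_probability(annotation_flags, alpha0, alphas):
    """Logistic prior: log-odds = alpha0 + sum_k alpha_k * d_k."""
    log_odds = alpha0 + sum(a * d for a, d in zip(alphas, annotation_flags))
    return 1.0 / (1.0 + exp(-log_odds))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Hypothetical parameter values, chosen only for illustration.
ALPHA0 = -9.0        # baseline log-odds for a SNP with no annotation
ALPHAS = [1.5, 3.0]  # log odds ratios for [inside an OCR, is an ASoC SNP]

baseline = prior_causal_probability([0, 0], ALPHA0, ALPHAS)
in_ocr = prior_causal_probability([1, 0], ALPHA0, ALPHAS)
asoc_in_ocr = prior_causal_probability([1, 1], ALPHA0, ALPHAS)

print(f"baseline prior:     {baseline:.2e}")
print(f"OCR SNP prior:      {in_ocr:.2e}")
print(f"ASoC + OCR prior:   {asoc_in_ocr:.2e}")
print(f"OCR odds ratio:     {odds(in_ocr) / odds(baseline):.2f}")
```

Under this parameterization, a positive alpha for an annotation means SNPs in that annotation are more likely to be associated with the trait, and exp(alpha) is the corresponding odds ratio.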

Ripke et al. (15) identified 108 genome-wide significant schizophrenia loci in 2014. We will employ SuSiE, a recently developed Bayesian variable selection and genetic fine-mapping software package, to estimate the probability that each SNP is causal, expressed as a posterior inclusion probability (PIP), across these 108 loci. The analysis will incorporate external linkage disequilibrium (LD) information from the 1000 Genomes Project and SNP-level prior inclusion probabilities defined as functions of each SNP's genomic features.
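The role of functionally informed priors in a PIP can be illustrated with a deliberately simplified calculation. The sketch below assumes exactly one causal variant per locus, computes a Wakefield-style approximate Bayes factor for each SNP from its z-score and standard error, and normalizes prior times Bayes factor across the locus. This is not SuSiE: SuSiE combines several such single-effect components and uses the LD matrix to allow multiple causal variants, which this sketch ignores. The function names, the effect-size prior standard deviation, and the example numbers are all assumptions made for illustration.

```python
from math import exp, log


def log_abf(z, se, prior_sd):
    """Log approximate Bayes factor (association vs. null) for one SNP.

    Uses Wakefield's approximation with effect-size prior N(0, prior_sd^2).
    """
    v = se ** 2        # sampling variance of the effect estimate
    w = prior_sd ** 2  # prior variance of the true effect
    r = w / (v + w)
    return 0.5 * log(1.0 - r) + 0.5 * r * z ** 2


def single_effect_pips(z_scores, std_errors, priors, prior_sd=0.15):
    """PIPs for one locus, assuming exactly one causal SNP.

    PIP_j is proportional to prior_j * BF_j; priors must be positive.
    """
    log_weights = [
        log(pi) + log_abf(z, se, prior_sd)
        for z, se, pi in zip(z_scores, std_errors, priors)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_weights)
    weights = [exp(lw - m) for lw in log_weights]
    total = sum(weights)
    return [wt / total for wt in weights]


# Hypothetical locus: three SNPs in tight LD with similar association
# signals; the third is given a higher annotation-based prior.
z_scores = [5.1, 5.0, 5.0]
std_errors = [0.05, 0.05, 0.05]
priors = [1e-4, 1e-4, 2e-3]

for name, pip in zip(["snp1", "snp2", "snp3"],
                     single_effect_pips(z_scores, std_errors, priors)):
    print(f"{name}: PIP={pip:.3f}")
```

When association signals are nearly indistinguishable because of LD, the annotation-based prior is what separates the candidates, which is the motivation for feeding the enrichment estimates into fine-mapping.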

## CNV

### Background and Significance
Genome-wide association studies (GWAS) 

Copy number variations (CNVs) are a type of structural variation in an organism's chromosomes. They are large genomic insertion or deletion events, often spanning kilobases or even megabases.

### Aim

### Data

### Method