In this notebook, we process the results of the Cicero analysis to obtain active promoter/enhancer DNA peaks.
First, we pick up peaks around the transcription start site (TSS).
Second, we merge the Cicero data with the peaks around the TSS.
Then we remove peaks that have a weak connection to the TSS peak, so that the final product includes the TSS peaks and the peaks that have a strong connection with them. We use this information as the active promoter/enhancer elements.

# 0. Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns


import os, sys, shutil, importlib, glob
from tqdm.notebook import tqdm

from celloracle import motif_analysis as ma

In [2]:
%config InlineBackend.figure_format = 'retina'

plt.rcParams['figure.figsize'] = [6, 4.5]
plt.rcParams["savefig.dpi"] = 300

# 1. Load data made with Cicero

In [3]:
# Load all peaks
peaks = pd.read_csv("cicero_output/all_peaks.csv", index_col=0)
peaks = peaks.x.values
peaks

array(['chr1_3094484_3095479', 'chr1_3113499_3113979',
       'chr1_3119478_3121690', ..., 'chrY_90804622_90805450',
       'chrY_90808626_90809117', 'chrY_90810560_90811167'], dtype=object)
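
Each peak ID above follows a `chr_start_end` pattern. As a quick sanity check before annotating, the IDs can be split into genomic coordinates. The following is a minimal sketch; the helper name `parse_peak_ids` and the example list are assumptions made for illustration and are not part of celloracle.

```python
import pandas as pd

def parse_peak_ids(peak_ids):
    """Split 'chr_start_end' strings into columns (hypothetical helper)."""
    # Split from the right so chromosome names containing "_" stay intact
    parts = pd.Series(list(peak_ids), dtype="object").str.rsplit("_", n=2, expand=True)
    parts.columns = ["chr", "start", "end"]
    parts["start"] = parts["start"].astype(int)
    parts["end"] = parts["end"].astype(int)
    return parts

# A few IDs copied from the output above
example_peaks = ["chr1_3094484_3095479", "chr1_3113499_3113979", "chrY_90810560_90811167"]
peak_df = parse_peak_ids(example_peaks)

# Every peak should have a positive length
assert (peak_df["end"] > peak_df["start"]).all()
```

If the assertion fails or the split raises an error, the peak names probably do not follow the expected format.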

In [4]:
# Load cicero results
cicero_connections = pd.read_csv("cicero_output/cicero_connections.csv", index_col=0)
cicero_connections.head()


Unnamed: 0,Peak1,Peak2,coaccess
2,chr1_3094484_3095479,chr1_3113499_3113979,-0.316289
3,chr1_3094484_3095479,chr1_3119478_3121690,-0.419241
4,chr1_3094484_3095479,chr1_3399730_3400368,-0.050867
5,chr1_3113499_3113979,chr1_3094484_3095479,-0.316289
7,chr1_3113499_3113979,chr1_3119478_3121690,0.370343
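
Note that the table lists each pair in both directions (rows 2 and 5 hold the same pair with the same score) and that scores can be negative. Before filtering, it may help to look at how the scores are distributed. Below is a rough sketch on a toy table built from the rows shown above; the helper `summarize_coaccess` is hypothetical, and the default threshold of 0.8 is taken from the filtering step later in this notebook.

```python
import pandas as pd

# Toy table copied from the head() output above
toy_connections = pd.DataFrame({
    "Peak1": ["chr1_3094484_3095479", "chr1_3094484_3095479", "chr1_3094484_3095479",
              "chr1_3113499_3113979", "chr1_3113499_3113979"],
    "Peak2": ["chr1_3113499_3113979", "chr1_3119478_3121690", "chr1_3399730_3400368",
              "chr1_3094484_3095479", "chr1_3119478_3121690"],
    "coaccess": [-0.316289, -0.419241, -0.050867, -0.316289, 0.370343],
})

def summarize_coaccess(connections, threshold=0.8):
    """Count connections overall, with negative scores, and at or above a threshold."""
    scores = connections["coaccess"].dropna()
    return {
        "n_connections": int(len(scores)),
        "n_negative": int((scores < 0).sum()),
        "n_above_threshold": int((scores >= threshold).sum()),
    }

summary = summarize_coaccess(toy_connections)
print(summary)
```

On the full `cicero_connections` table, the same call gives a first impression of how many connections would survive a given cutoff.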


# 2. Make TSS annotation
## IMPORTANT: Please make sure that you set the correct reference genome.
 If your scATAC-seq data was generated with the mm10 reference genome, set ref_genome="mm10".
 If you used the hg19 human reference genome, set ref_genome="hg19".
 
 The following reference genomes are currently supported.
{"Human": ['hg38', 'hg19'], 
 "Mouse": ['mm10', 'mm9'], 
 "S.cerevisiae": ["sacCer2", "sacCer3"]}
 
 If your reference genome is not in the list, please send a request through the GitHub issue page.
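
A small guard can catch a mistyped genome name before the annotation step. This is only a sketch: the dictionary is copied from the list above, while the names `SUPPORTED_REF_GENOMES` and `check_ref_genome` are hypothetical and do not exist in celloracle.

```python
# Copied from the list of supported genomes above
SUPPORTED_REF_GENOMES = {
    "Human": ["hg38", "hg19"],
    "Mouse": ["mm10", "mm9"],
    "S.cerevisiae": ["sacCer2", "sacCer3"],
}

def check_ref_genome(ref_genome):
    """Return the species for a supported genome name, or raise ValueError."""
    for species, genomes in SUPPORTED_REF_GENOMES.items():
        if ref_genome in genomes:
            return species
    supported = [g for genomes in SUPPORTED_REF_GENOMES.values() for g in genomes]
    raise ValueError(f"Unsupported ref_genome {ref_genome!r}; expected one of {supported}")

# Example: mm10 is listed under Mouse
species = check_ref_genome("mm10")
```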

In [5]:

tss_annotated = ma.get_tss_info(peak_str_list=peaks, ref_genome= ) ##!! Set reference genome here


# Check results
tss_annotated.tail()

que bed peaks: 72402
tss peaks in que: 16987


Unnamed: 0,chr,start,end,gene_short_name,strand
16982,chr1,55130650,55132118,Mob4,+
16983,chr6,94499875,94500767,Slc25a26,+
16984,chr19,45659222,45660823,Fbxw4,-
16985,chr12,100898848,100899597,Gpr68,-
16986,chr4,129491262,129492047,Fam229a,-


# 3. Integrate TSS info and Cicero connections
The output of the integration process has three columns: "peak_id", "gene_short_name", and "coaccess".
"peak_id" is either a TSS peak or a peak that has a connection with a TSS peak.
"gene_short_name" is the name of the gene associated with the TSS.
"coaccess" is the co-accessibility score between a peak and the TSS peak. Note that the TSS peak itself is indicated by a score of 1.

In [8]:
integrated = ma.integrate_tss_peak_with_cicero(tss_peak=tss_annotated, 
                                               cicero_connections=cicero_connections)
print(integrated.shape)
integrated.head()

(263279, 3)


Unnamed: 0,peak_id,gene_short_name,coaccess
0,chr10_100015291_100017830,Kitl,1.0
1,chr10_100018677_100020384,Kitl,0.086299
2,chr10_100050858_100051762,Kitl,0.034558
3,chr10_100052829_100053395,Kitl,0.167188
4,chr10_100128086_100128882,Tmtc3,0.022341
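
To make the structure of this table concrete, here is a conceptual pandas sketch of such an integration. It is not the celloracle implementation: the function `integrate_sketch`, the toy peak names, and the assumption that the TSS table already carries a `peak_id` column are all illustrative. How duplicate peak/gene pairs are resolved (here, by keeping the highest score) is also an assumption.

```python
import pandas as pd

def integrate_sketch(tss_peaks, connections):
    """Attach gene names to TSS peaks and to the peaks connected to them (illustrative only)."""
    # TSS peaks themselves get a fixed score of 1.0
    tss = tss_peaks[["peak_id", "gene_short_name"]].assign(coaccess=1.0)
    # A peak connected to a TSS peak inherits that TSS peak's gene
    linked = connections.merge(
        tss_peaks[["peak_id", "gene_short_name"]], left_on="Peak1", right_on="peak_id"
    )
    linked = linked[["Peak2", "gene_short_name", "coaccess"]].rename(columns={"Peak2": "peak_id"})
    combined = pd.concat([tss, linked], ignore_index=True)
    # Assumption: keep the highest score for each (peak, gene) pair
    return combined.groupby(["peak_id", "gene_short_name"], as_index=False)["coaccess"].max()

# Toy inputs with made-up peak and gene names
toy_tss = pd.DataFrame({"peak_id": ["peakA"], "gene_short_name": ["GeneX"]})
toy_conn = pd.DataFrame({
    "Peak1": ["peakA", "peakB", "peakA"],
    "Peak2": ["peakB", "peakA", "peakC"],
    "coaccess": [0.5, 0.5, 0.1],
})
toy_result = integrate_sketch(toy_tss, toy_conn)
print(toy_result)
```

In the toy result, `peakA` keeps a score of 1.0 because it is the TSS peak, while `peakB` and `peakC` carry their co-accessibility score with `peakA`.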


# 4. Filter peaks
Remove peaks that have a weak coaccess score. Here we keep only peaks with a score of 0.8 or higher.

In [9]:
peak = integrated[integrated.coaccess >= 0.8]
peak = peak[["peak_id", "gene_short_name"]].reset_index(drop=True)

In [10]:
print(peak.shape)
peak.head()

(15680, 2)


Unnamed: 0,peak_id,gene_short_name
0,chr10_100015291_100017830,Kitl
1,chr10_100486534_100488209,Tmtc3
2,chr10_100588641_100589556,4930430F08Rik
3,chr10_100741247_100742505,Gm35722
4,chr10_101681379_101682124,Mgat4c
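
The cutoff determines how many peaks and genes remain, so it can be worth comparing a few values before settling on one. A minimal sketch follows, using a toy table built from the `integrated.head()` output shown earlier; the helper `count_peaks_by_threshold` and the set of thresholds are assumptions chosen for illustration.

```python
import pandas as pd

def count_peaks_by_threshold(integrated, thresholds=(0.05, 0.1, 0.8)):
    """Count the peaks and genes that remain at each coaccess cutoff."""
    rows = []
    for t in thresholds:
        kept = integrated[integrated["coaccess"] >= t]
        rows.append({
            "threshold": t,
            "n_peaks": len(kept),
            "n_genes": kept["gene_short_name"].nunique(),
        })
    return pd.DataFrame(rows)

# Toy table copied from the integrated.head() output
toy_integrated = pd.DataFrame({
    "peak_id": ["chr10_100015291_100017830", "chr10_100018677_100020384",
                "chr10_100050858_100051762", "chr10_100052829_100053395",
                "chr10_100128086_100128882"],
    "gene_short_name": ["Kitl", "Kitl", "Kitl", "Kitl", "Tmtc3"],
    "coaccess": [1.0, 0.086299, 0.034558, 0.167188, 0.022341],
})
threshold_table = count_peaks_by_threshold(toy_integrated)
print(threshold_table)
```

Running the same function on the full `integrated` table shows how strongly the number of retained peaks depends on the chosen cutoff.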


# 5. Save data
Save the promoter/enhancer peaks.

In [11]:
peak.to_parquet("peak_file.parquet")

-> Go to the next notebook