<a href="https://colab.research.google.com/github/oliverartz/ChIP_Rinf_Dawlaty/blob/main/20211030_ChIPseq_Rinf_Meelad.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Rinf ChIP-seq analysis
This notebook is to recapitualte the data analysis conducted in a recent publication from the Dawlaty Lab at Albert Einstein in New York. The publication is entitled "Rinf Regulates Pluripotency Network Genes and Tet
Enzymes in Embryonic Stem Cells" (Ravichandran et al., 2019) and was published Cell Reports. 

https://doi.org/10.1016/j.celrep.2019.07.080

All data and analysis tools are publicly available.

##Mount google drive
Since the storage of the Colab runtimes is rather limited, I need to use storage from my google drive.

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


##Download data from GEO

In [None]:
### install conda ###
!wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
!chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
!bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

!rm Miniconda3-py37_4.8.2-Linux-x86_64.sh

### install SRA toolkit ###
!conda install -y -c bioconda sra-tools

In [None]:
%%bash
### Takes some time to download (10 - 35 min) ###
### download ChIP-seq data ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad
SRR_accessions="SRR9165578 SRR9165579 SRR9165580 SRR9165581 SRR9165582"

for i in $SRR_accessions; do
    fastq-dump $i
done

In [None]:
%%bash
### rename and move samples ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad
mv SRR9165578.fastq IP_WT_1.fastq
mv SRR9165579.fastq IP_WT_2.fastq
mv SRR9165580.fastq IP_KO.fastq
mv SRR9165581.fastq inp_WT.fastq
mv SRR9165582.fastq inp_KO.fastq

mkdir fastq
mv *.fastq fastq

##Quality Control

##Read Mapping

In [None]:
### install bowtie2 ###
!conda config --set channel_priority stric
!conda install -y -c bioconda bowtie2

In [None]:
%%bash
### download pre-build mm10 genome index ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad
mkdir bowtie2_genome_index
cd bowtie2_genome_index
wget https://genome-idx.s3.amazonaws.com/bt/mm10.zip
unzip mm10.zip

rm mm10.zip

In [None]:
%%bash
### takes >5 h to run ###
### read alignment ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/bowtie2_genome_index

bowtie2 -p 8 -x mm10 -U ../fastq/IP_KO.fastq -S IP_KO.sam
bowtie2 -p 8 -x mm10 -U ../fastq/IP_WT_1.fastq -S IP_WT_1.sam
bowtie2 -p 8 -x mm10 -U ../fastq/IP_WT_2.fastq -S IP_WT_2.sam
bowtie2 -p 8 -x mm10 -U ../fastq/WT_inp.fastq -S inp_WT.sam
bowtie2 -p 8 -x mm10 -U ../fastq/KO_inp.fastq -S inp_KO.sam

In [None]:
%%bash
### move aligned reads to new folder ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/
mkdir aligned_reads

cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/bowtie2_genome_index/
mv IP_KO.sam ../aligned_reads/IP_KO.sam
mv IP_WT_1.sam ../aligned_reads/IP_WT_1.sam
mv IP_WT_2.sam ../aligned_reads/IP_WT_2.sam
mv WT_inp.sam ../aligned_reads/inp_WT.sam
mv KO_inp.sam ../aligned_reads/inp_KO.sam

##Filtering bad quality reads and PCR duplicates, convert to bam

In [None]:
### install samtools ###
!conda install -y -c bioconda samtools

In [None]:
%%bash
### takes 30 min to run ###
### filter bad quality reads ###
quality_cutoff=20

cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/aligned_reads

samtools view -bS -q quality_cutoff IP_KO.sam > IP_KO_hiq.bam
samtools view -bS -q quality_cutoff IP_WT_1.sam > IP_WT_1_hiq.bam
samtools view -bS -q quality_cutoff IP_WT_2.sam > IP_WT_2_hiq.bam
samtools view -bS -q quality_cutoff inp_KO.sam > inp_KO_hiq.bam
samtools view -bS -q quality_cutoff inp_WT.sam > inp_WT_hiq.bam

ls -lh *hiq.bam

In [None]:
%%bash
### remove PCR duplicates ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/aligned_reads
samtools rmdup -S IP_KO_hiq.bam IP_KO_filt.bam
samtools rmdup -S IP_WT_1_hiq.bam IP_WT_1_filt.bam
samtools rmdup -S IP_WT_2_hiq.bam IP_WT_2_filt.bam
samtools rmdup -S inp_KO_hiq.bam inp_KO_filt.bam
samtools rmdup -S inp_WT_hiq.bam inp_WT_filt.bam

ls -lh *filt.bam

In [16]:
%%bash
### takes 10 min to run ###
### merge bam for WT ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/aligned_reads
samtools merge IP_WT_filt_merge.bam IP_WT_1_filt.bam IP_WT_2_filt.bam

In [None]:
%%bash
### sort and index ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/aligned_reads
samtools sort IP_WT_filt_merge.bam -o IP_WT_filt_merge_sorted.bam
samtools index IP_WT_filt_merge_sorted.bam

##Generating bedgraph for plotting

In [None]:
%%bash
### install bedtools ###
conda install -y -c bioconda bedtools

In [None]:
%%bash
### make genome file for bedtools ###
conda install -y -c bioconda ucsc-fetchchromsizes
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/genome
fetchChromSizes mm10 > mm10.chrom.sizes

In [None]:
%%bash
### takes 35 min to run ###
### generate bedgraph ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/aligned_reads
genomeCoverageBed -bg -ibam IP_WT_filt_merge_sorted.bam -g ../genome/mm10.chrom.sizes > IP_WT_filt_merge_sorted.bedgraph

##Peak calling

In [None]:
### install MACS2 ###
!conda config --set channel_priority stric
!conda install -y -c bioconda macs2

In [None]:
%%bash
### takes 15 min to run ###
### peak calling ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/aligned_reads
mkdir ../MACS2_peaks

### on filtered BAM files ###
macs2 callpeak -t IP_KO_filt.bam -c inp_KO_filt.bam -f BAM -g mm --outdir ../MACS2_peaks/KO_filtered -n KO
macs2 callpeak -t IP_WT_2_filt.bam -c inp_WT_filt.bam -f BAM -g mm --outdir ../MACS2_peaks/WT_merge_filtered -n WT_merge

##Downstream analysis of peaks and quality control

Please see separate R notebook for more analyses and results.

###Motif analysis

In [None]:
%%bash
### install HOMER and install mm10 ###
conda install -y -c bioconda homer
perl /usr/local/share/homer/.//configureHomer.pl -install mm10

In [None]:
%%bash
### run HOMER ###
cd /content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/MACS2_peaks/
mkdir ../HOMER_motifs

findMotifsGenome.pl WT_merge_filtered/WT_merge_peaks.narrowPeak mm10 ../HOMER_motifs

In [8]:
import IPython
IPython.display.HTML(filename='/content/drive/MyDrive/Colab_data/20211030_ChIP_Meelad/HOMER_motifs/knownResults.html')

0,1,2,3,4,5,6,7,8,9,10,11
Rank,Motif,Name,P-value,log P-pvalue,q-value (Benjamini),# Target Sequences with Motif,% of Targets Sequences with Motif,# Background Sequences with Motif,% of Background Sequences with Motif,Motif File,SVG
1,T  C  A  G  C  G  T  A  A  G  T  C  A  G  C  T  C  G  T  A  A  G  T  C  C  T  G  A  C  G  T  A  A  G  T  C  G  C  A  T  A  G  T  C  A  G  T  C  A  G  T  C  C  T  G  A  A  C  T  G  T  G  C  A  T  C  G  A  C  A  T  G  A  T  C  G  G  A  T  C,Ronin(THAP)/ES-Thap11-ChIP-Seq(GSE51522)/Homer,1e-456,-1.051e+03,0.0000,439.0,12.99%,210.1,0.49%,motif file (matrix),svg
2,T  C  G  A  C  G  T  A  A  G  T  C  A  G  C  T  C  G  T  A  A  G  T  C  T  C  G  A  G  C  T  A  G  A  C  T  C  G  A  T  A  G  T  C  A  G  T  C  A  G  T  C  C  T  G  A  T  C  A  G  T  G  C  A  T  C  G  A  C  A  G  T  A  T  C  G  A  G  T  C,"GFY-Staf(?,Zf)/Promoter/Homer",1e-400,-9.214e+02,0.0000,459.0,13.58%,329.5,0.76%,motif file (matrix),svg
3,T  G  C  A  T  A  G  C  G  A  C  T  T  G  C  A  T  G  A  C  T  G  C  A  C  G  T  A  A  G  C  T  A  G  C  T  A  G  T  C  A  G  T  C  G  T  A  C,GFY(?)/Promoter/Homer,1e-368,-8.493e+02,0.0000,434.0,12.84%,327.2,0.76%,motif file (matrix),svg
4,G  C  T  A  G  C  A  T  G  A  C  T  G  C  A  T  T  C  A  G  G  T  A  C  G  C  T  A  G  C  A  T  C  T  G  A  G  C  T  A  T  A  G  C  G  C  T  A  C  T  G  A  C  G  A  T  C  T  A  G,"OCT4-SOX2-TCF-NANOG(POU,Homeobox,HMG)/mES-Oct4-ChIP-Seq(GSE11431)/Homer",1e-299,-6.896e+02,0.0000,457.0,13.52%,558.4,1.29%,motif file (matrix),svg
5,A  T  G  C  A  G  T  C  G  A  T  C  C  G  T  A  A  C  G  T  A  C  G  T  A  C  T  G  A  C  G  T  A  G  C  T  G  A  T  C,Sox2(HMG)/mES-Sox2-ChIP-Seq(GSE11431)/Homer,1e-124,-2.869e+02,0.0000,703.0,20.80%,3360.2,7.79%,motif file (matrix),svg
6,T  G  C  A  G  C  A  T  A  G  C  T  G  C  A  T  A  G  T  C  A  G  T  C  A  G  T  C  C  T  G  A  A  C  T  G  T  C  G  A  T  C  G  A  C  A  G  T  A  T  C  G  A  G  T  C  G  A  T  C,ZNF143|STAF(Zf)/CUTLL-ZNF143-ChIP-Seq(GSE29600)/Homer,1e-123,-2.838e+02,0.0000,440.0,13.02%,1476.2,3.42%,motif file (matrix),svg
7,C  G  T  A  A  C  G  T  A  G  C  T  C  G  A  T  C  T  A  G  G  T  A  C  C  G  T  A  A  G  C  T  C  G  T  A  G  C  T  A,"Oct4(POU,Homeobox)/mES-Oct4-ChIP-Seq(GSE11431)/Homer",1e-114,-2.627e+02,0.0000,411.0,12.16%,1385.1,3.21%,motif file (matrix),svg
8,A  T  G  C  G  A  T  C  C  G  T  A  A  G  C  T  C  A  G  T  A  T  C  G  G  C  A  T  A  G  C  T  G  A  C  T  A  C  T  G,Sox17(HMG)/Endoderm-Sox17-ChIP-Seq(GSE61475)/Homer,1e-112,-2.584e+02,0.0000,551.0,16.30%,2375.1,5.51%,motif file (matrix),svg
9,A  T  G  C  G  A  T  C  C  G  A  T  A  C  G  T  A  C  G  T  A  C  T  G  C  A  G  T  A  G  C  T,Sox3(HMG)/NPC-Sox3-ChIP-Seq(GSE33059)/Homer,1e-110,-2.549e+02,0.0000,1028.0,30.41%,6520.8,15.12%,motif file (matrix),svg
