# IP-seq bioinformatics analysis of human samples

## Data acquisition (hPSCs)

The data were retrieved from the article https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6974403/ ('Data and materials availability' section). BioProject ID: PRJNA474076 (https://www.ebi.ac.uk/ena/browser/view/PRJNA474076?show=reads). On that page, click an entry in the 'Sample Accession' column to see the runs that belong to the sample. \
We need to download these reads:
1. SAMN09291345 (sample accession) corresponds to **S9.6 DRIP R2** - _SRR7820745_ (run accession, which is also the raw data filename)
2. SAMN09291344 - **S9.6 DRIP R1** - _SRR7820746_
3. SAMN12283116 - **IGG Control R1** - _SRR9693734_

In [None]:
# S9.6 DRIP R1 (replicate 1)
# paired-end reads:
# _1
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR782/006/SRR7820746/SRR7820746_1.fastq.gz \
-P hpscs/
# _2
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR782/006/SRR7820746/SRR7820746_2.fastq.gz \
-P hpscs/

In [None]:
# S9.6 DRIP R2 (replicate 2)
# paired-end reads:
# _1
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR782/005/SRR7820745/SRR7820745_1.fastq.gz \
-P hpscs/
# _2
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR782/005/SRR7820745/SRR7820745_2.fastq.gz \
-P hpscs/

In [None]:
# IGG Control R1 (replicate 1)
# _1
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR969/004/SRR9693734/SRR9693734_1.fastq.gz \
-P hpscs/
# _2
!wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR969/004/SRR9693734/SRR9693734_2.fastq.gz \
-P hpscs/
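
The three download cells follow one URL pattern, so the links can also be generated from the run accessions. The sketch below assumes the ENA FTP layout visible in the URLs above (the first six characters of the accession, then a zero-padded sub-directory built from the characters after the ninth one). `ena_fastq_urls` is a hypothetical helper name, not part of any ENA tool.

```python
from __future__ import annotations


def ena_fastq_urls(run_accession: str, paired: bool = True) -> list[str]:
    """Build ENA FTP URLs for the FASTQ files of one run (assumed layout)."""
    base = "ftp://ftp.sra.ebi.ac.uk/vol1/fastq"
    prefix = run_accession[:6]
    # Assumption: accessions longer than 9 characters get an extra
    # sub-directory, zero-padded to 3 characters (e.g. "6" -> "006").
    extra = run_accession[9:]
    subdir = f"{extra.zfill(3)}/" if extra else ""
    folder = f"{base}/{prefix}/{subdir}{run_accession}"
    mates = ("_1", "_2") if paired else ("",)
    return [f"{folder}/{run_accession}{mate}.fastq.gz" for mate in mates]


# Run accessions listed in the section above.
for run in ("SRR7820746", "SRR7820745", "SRR9693734"):
    for url in ena_fastq_urls(run):
        print(url)
```

Each printed URL can be passed to `wget ... -P hpscs/` exactly as in the cells above.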

## Data acquisition (HAP1 lab samples: WT, KO 2B1, KO YF2)

HAP1 is a cell line derived from the KBM-7 cell line, which was established from a patient with chronic myeloid leukemia (CML), a type of blood cancer. \
The HAP1 set contains three subsets: WT (wild type), HNRNPA2B1 knockout (KO 2B1) and YTHDF2 knockout (KO YF2) samples. \
The samples can be retrieved from: .. TODO1.

## Quality control (QC) of data

In [None]:
# run FastQC on the hPSCs data
# (FastQC does not create the output directory, so make it first)
!mkdir -p hpscs/fastqc_outputs
!fastqc hpscs/*.fastq.gz -o hpscs/fastqc_outputs

In [None]:
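# run FastQC on the HAP1 data
# (FastQC does not create the output directory, so make it first)
!mkdir -p hap1_samples/fastqc_outputs
!fastqc hap1_samples/*.fastq -o hap1_samples/fastqc_outputs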
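
To decide which samples need trimming, the per-module verdicts can be read from the FastQC archives instead of opening every HTML report. This is a minimal sketch, assuming each `*_fastqc.zip` archive contains a tab-separated `summary.txt` (status, module, file name) inside a folder named after the archive. `fastqc_flags` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def fastqc_flags(output_dir, statuses=("WARN", "FAIL")):
    """Map each FastQC report to the modules whose status is in `statuses`."""
    flagged = {}
    for archive in sorted(Path(output_dir).glob("*_fastqc.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Assumption: summary.txt sits in a folder named like the archive.
            member = f"{archive.stem}/summary.txt"
            lines = zf.read(member).decode().splitlines()
        rows = [line.split("\t") for line in lines if line.strip()]
        flagged[archive.stem] = [row[1] for row in rows if row[0] in statuses]
    return flagged
```

For example, `fastqc_flags("hap1_samples/fastqc_outputs")` returns a dictionary with one entry per report, listing the modules marked WARN or FAIL.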

## Data trimming

In [None]:
# HAP1 KO 2B1
# 1 - Control
!trim_galore \
hap1_samples/KO2B1_Control.fastq \
-o hap1_samples/trimmed_files \
--fastqc \
--length 25 \
--clip_R1 3 \
--three_prime_clip_R1 3

In [None]:
# HAP1 KO YF2
# 1 - Control
!trim_galore \
hap1_samples/KOYF2_Control.fastq \
-o hap1_samples/trimmed_files \
--fastqc \
--length 25 \
--clip_R1 3 \
--three_prime_clip_R1 3
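
Both trimming cells differ only in the input file, so the command can be generated once per sample. The sketch below only builds and prints the command lines with the same options as the cells above. `trim_galore_cmd` and its parameter names are hypothetical.

```python
import shlex


def trim_galore_cmd(fastq, out_dir="hap1_samples/trimmed_files",
                    min_length=25, clip_5p=3, clip_3p=3):
    """Return the trim_galore argument list used in the cells above."""
    return [
        "trim_galore", fastq,
        "-o", out_dir,
        "--fastqc",                            # rerun FastQC on the trimmed file
        "--length", str(min_length),           # discard reads shorter than this
        "--clip_R1", str(clip_5p),             # bases removed from the 5' end
        "--three_prime_clip_R1", str(clip_3p), # bases removed from the 3' end
    ]


for name in ("KO2B1_Control", "KOYF2_Control"):
    print(shlex.join(trim_galore_cmd(f"hap1_samples/{name}.fastq")))
```

The argument lists can be passed to `subprocess.run(cmd, check=True)` to execute them from Python rather than from `!` cells.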

## Data mapping to reference genome

In [None]:
# hPSCs
# 1 - S9.6 DRIP R1
# (bowtie2 does not create the output directory, so make it first)
!mkdir -p hpscs/aligned_reads
!bowtie2 -p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-1 hpscs/SRR7820746_1.fastq.gz \
-2 hpscs/SRR7820746_2.fastq.gz \
-S hpscs/aligned_reads/S9.6_DRIP_R1.sam

In [None]:
# hPSCs
# 2 - S9.6 DRIP R2
!bowtie2 -p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-1 hpscs/SRR7820745_1.fastq.gz \
-2 hpscs/SRR7820745_2.fastq.gz \
-S hpscs/aligned_reads/S9.6_DRIP_R2.sam

In [None]:
# hPSCs
# 3 - IGG Control R1
!bowtie2 -p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-1 hpscs/SRR9693734_1.fastq.gz \
-2 hpscs/SRR9693734_2.fastq.gz \
-S hpscs/aligned_reads/Control_R1.sam

In [None]:
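# HAP1 WT
# note: run 'gzip *.fastq' in hap1_samples/ first so that the .fastq.gz inputs exist
# (bowtie2 does not create the output directory, so make it first)
!mkdir -p hap1_samples/aligned_files
# 1 - S96_R1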
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/S96_WT_1.fastq.gz \
-S hap1_samples/aligned_files/S96_WT_1_aln_unsorted.sam

In [None]:
# HAP1 WT
# 2 - S96_R2
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/S96_WT_2.fastq.gz \
-S hap1_samples/aligned_files/S96_WT_2_aln_unsorted.sam

In [None]:
# HAP1 WT
# 3 - WT_Control
!bowtie2 -p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/WT_Control.fastq.gz \
-S hap1_samples/aligned_files/WT_Control_aln_unsorted.sam

In [None]:
# HAP1 KO 2B1
# 1 - S96_KOA2B1_1
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/S96_KOA2B1_1.fastq.gz \
-S hap1_samples/aligned_files/S96_KOA2B1_1_aln_unsorted.sam

In [None]:
# HAP1 KO 2B1
# 2 - S96_KOA2B1_2
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/S96_KOA2B1_2.fastq.gz \
-S hap1_samples/aligned_files/S96_KOA2B1_2_aln_unsorted.sam

In [None]:
# HAP1 KO 2B1
# 3 - Control
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/trimmed_files/KO2B1_Control_trimmed.fq.gz \
-S hap1_samples/aligned_files/KO2B1_Control_aln_unsorted.sam

In [None]:
# HAP1 KO YF2
# 1 - S96_KOYF2_1
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/S96_KOYF2_1.fastq.gz \
-S hap1_samples/aligned_files/S96_KOYF2_1_aln_unsorted.sam

In [None]:
# HAP1 KO YF2
# 2 - S96_KOYF2_2
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/S96_KOYF2_2.fastq.gz \
-S hap1_samples/aligned_files/S96_KOYF2_2_aln_unsorted.sam

In [None]:
# HAP1 KO YF2
# 3 - Control
!bowtie2 \
-p 8 -q --local -x Homo_sapiens_GRCh38_assembly_110 \
-U hap1_samples/trimmed_files/KOYF2_Control_trimmed.fq.gz \
-S hap1_samples/aligned_files/KOYF2_Control_aln_unsorted.sam
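
The mapping cells use two input modes: `-1`/`-2` for the paired-end hPSCs runs and `-U` for the single-end HAP1 samples. The sketch below captures that choice in one function so the mode follows from the number of FASTQ files given. `bowtie2_cmd` is a hypothetical helper, and the index name is taken from the cells above.

```python
import shlex

INDEX = "Homo_sapiens_GRCh38_assembly_110"  # index prefix used in the cells above


def bowtie2_cmd(reads, sam_out, index=INDEX, threads=8):
    """Build a bowtie2 --local command for one (single-end) or two (paired-end) FASTQ files."""
    cmd = ["bowtie2", "-p", str(threads), "-q", "--local", "-x", index]
    if len(reads) == 2:
        cmd += ["-1", reads[0], "-2", reads[1]]  # paired-end mates
    elif len(reads) == 1:
        cmd += ["-U", reads[0]]                  # unpaired reads
    else:
        raise ValueError("expected one or two FASTQ files")
    return cmd + ["-S", sam_out]


paired = bowtie2_cmd(
    ["hpscs/SRR7820746_1.fastq.gz", "hpscs/SRR7820746_2.fastq.gz"],
    "hpscs/aligned_reads/S9.6_DRIP_R1.sam",
)
single = bowtie2_cmd(
    ["hap1_samples/S96_WT_1.fastq.gz"],
    "hap1_samples/aligned_files/S96_WT_1_aln_unsorted.sam",
)
print(shlex.join(paired))
print(shlex.join(single))
```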

## Read filtering

In [None]:
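# hPSCs
# 1 - S9.6 DRIP R1
# (samtools does not create the output directory, so make it first)
!mkdir -p hpscs/aligned_reads/filtered_reads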
!samtools view -h -S -b \
-o hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1.bam \
hpscs/aligned_reads/S9.6_DRIP_R1.sam

In [None]:
# hPSCs
# 2 - S9.6 DRIP R2
!samtools view -h -S -b \
-o hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2.bam \
hpscs/aligned_reads/S9.6_DRIP_R2.sam

In [None]:
# hPSCs
# 3 - IGG Control R1
!samtools view -h -S -b \
-o hpscs/aligned_reads/filtered_reads/Control_R1.bam \
hpscs/aligned_reads/Control_R1.sam

In [None]:
# hPSCs
# 1 - S9.6 DRIP R1
!samtools sort \
-o hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1_sorted.bam \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1.bam

In [None]:
# hPSCs
# 2 - S9.6 DRIP R2
!samtools sort \
-o hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2_sorted.bam \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2.bam

In [None]:
# hPSCs
# 3 - IGG Control R1
!samtools sort \
-o hpscs/aligned_reads/filtered_reads/Control_R1_sorted.bam \
hpscs/aligned_reads/filtered_reads/Control_R1.bam

In [None]:
# hPSCs
# 1 - S9.6 DRIP R1
!samtools view -bq 1 \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1_sorted.bam > \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1_aln.bam

In [None]:
# hPSCs
# 2 - S9.6 DRIP R2
!samtools view -bq 1 \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2_sorted.bam > \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2_aln.bam

In [None]:
# hPSCs
# 3 - IGG Control R1
!samtools view -bq 1 \
hpscs/aligned_reads/filtered_reads/Control_R1_sorted.bam > \
hpscs/aligned_reads/filtered_reads/Control_R1_aln.bam

In [None]:
# hPSCs
# 1 - S9.6 DRIP R1
!samtools index \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1_aln.bam

In [None]:
# hPSCs
# 2 - S9.6 DRIP R2
!samtools index \
hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2_aln.bam

In [None]:
# hPSCs
# 3 - IGG Control R1 (replicate 1)
!samtools index \
hpscs/aligned_reads/filtered_reads/Control_R1_aln.bam

In [None]:
# HAP1 WT
# 1 - S96_R1
# (paths match the bowtie2 output location, hap1_samples/aligned_files)
!mkdir -p hap1_samples/aligned_files/filtered_reads
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/S96_WT_1.bam \
hap1_samples/aligned_files/S96_WT_1_aln_unsorted.sam

In [None]:
# HAP1 WT
# 2 - S96_R2
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/S96_WT_2.bam \
hap1_samples/aligned_files/S96_WT_2_aln_unsorted.sam

In [None]:
# HAP1 WT
# 3 - WT_Control
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/WT_Control.bam \
hap1_samples/aligned_files/WT_Control_aln_unsorted.sam

In [None]:
# HAP1 WT
# 1 - S96_R1
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/S96_WT_1_sorted.bam \
hap1_samples/aligned_files/filtered_reads/S96_WT_1.bam

In [None]:
# HAP1 WT
# 2 - S96_R2
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/S96_WT_2_sorted.bam \
hap1_samples/aligned_files/filtered_reads/S96_WT_2.bam

In [None]:
# HAP1 WT
# 3 - WT_Control
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/WT_Control_sorted.bam \
hap1_samples/aligned_files/filtered_reads/WT_Control.bam

In [None]:
# HAP1 WT
# 1 - S96_R1
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/S96_WT_1_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/S96_WT_1_aln.bam

In [None]:
# HAP1 WT
# 2 - S96_R2
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/S96_WT_2_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/S96_WT_2_aln.bam

In [None]:
# HAP1 WT
# 3 - WT_Control
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/WT_Control_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/WT_Control_aln.bam

In [None]:
# HAP1 WT
# 1 - S96_R1
!samtools index \
hap1_samples/aligned_files/filtered_reads/S96_WT_1_aln.bam

In [None]:
# HAP1 WT
# 2 - S96_R2
!samtools index \
hap1_samples/aligned_files/filtered_reads/S96_WT_2_aln.bam

In [None]:
# HAP1 WT
# 3 - WT_Control
!samtools index \
hap1_samples/aligned_files/filtered_reads/WT_Control_aln.bam

In [None]:
# HAP1 KO 2B1
# 1 - S96_KOA2B1_1
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1.bam \
hap1_samples/aligned_files/S96_KOA2B1_1_aln_unsorted.sam

In [None]:
# HAP1 KO 2B1
# 2 - S96_KOA2B1_2
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2.bam \
hap1_samples/aligned_files/S96_KOA2B1_2_aln_unsorted.sam

In [None]:
# HAP1 KO 2B1
# 3 - Control
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/KO2B1_Control.bam \
hap1_samples/aligned_files/KO2B1_Control_aln_unsorted.sam

In [None]:
# HAP1 KO 2B1
# 1 - S96_KOA2B1_1
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1_sorted.bam \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1.bam

In [None]:
# HAP1 KO 2B1
# 2 - S96_KOA2B1_2
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2_sorted.bam \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2.bam

In [None]:
# HAP1 KO 2B1
# 3 - Control
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/KO2B1_Control_sorted.bam \
hap1_samples/aligned_files/filtered_reads/KO2B1_Control.bam

In [None]:
# HAP1 KO 2B1
# 1 - S96_KOA2B1_1
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1_aln.bam

In [None]:
# HAP1 KO 2B1
# 2 - S96_KOA2B1_2
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2_aln.bam

In [None]:
# HAP1 KO 2B1
# 3 - Control
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/KO2B1_Control_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/KO2B1_Control_aln.bam

In [None]:
# HAP1 KO 2B1
# 1 - S96_KOA2B1_1
!samtools index \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1_aln.bam

In [None]:
# HAP1 KO 2B1
# 2 - S96_KOA2B1_2
!samtools index \
hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2_aln.bam

In [None]:
# HAP1 KO 2B1
# 3 - Control
!samtools index \
hap1_samples/aligned_files/filtered_reads/KO2B1_Control_aln.bam

In [None]:
# HAP1 KO YF2
# 1 - S96_KOYF2_1
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1.bam \
hap1_samples/aligned_files/S96_KOYF2_1_aln_unsorted.sam

In [None]:
# HAP1 KO YF2
# 2 - S96_KOYF2_2
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2.bam \
hap1_samples/aligned_files/S96_KOYF2_2_aln_unsorted.sam

In [None]:
# HAP1 KO YF2
# 3 - Control
!samtools view -h -S -b \
-o hap1_samples/aligned_files/filtered_reads/KOYF2_Control.bam \
hap1_samples/aligned_files/KOYF2_Control_aln_unsorted.sam

In [None]:
# HAP1 KO YF2
# 1 - S96_KOYF2_1
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1_sorted.bam \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1.bam

In [None]:
# HAP1 KO YF2
# 2 - S96_KOYF2_2
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2_sorted.bam \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2.bam

In [None]:
# HAP1 KO YF2
# 3 - Control
!samtools sort \
-o hap1_samples/aligned_files/filtered_reads/KOYF2_Control_sorted.bam \
hap1_samples/aligned_files/filtered_reads/KOYF2_Control.bam

In [None]:
# HAP1 KO YF2
# 1 - S96_KOYF2_1
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1_aln.bam

In [None]:
# HAP1 KO YF2
# 2 - S96_KOYF2_2
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2_aln.bam

In [None]:
# HAP1 KO YF2
# 3 - Control
!samtools view -bq 1 \
hap1_samples/aligned_files/filtered_reads/KOYF2_Control_sorted.bam > \
hap1_samples/aligned_files/filtered_reads/KOYF2_Control_aln.bam

In [None]:
# HAP1 KO YF2
# 1 - S96_KOYF2_1
!samtools index \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1_aln.bam

In [None]:
# HAP1 KO YF2
# 2 - S96_KOYF2_2
!samtools index \
hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2_aln.bam

In [None]:
# HAP1 KO YF2
# 3 - Control
!samtools index \
hap1_samples/aligned_files/filtered_reads/KOYF2_Control_aln.bam
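
Every sample above goes through the same four samtools steps: SAM to BAM, coordinate sort, MAPQ filter, index. The sketch below derives all intermediate file names from a single sample name, which helps keep the paths consistent between steps. `samtools_steps` is a hypothetical helper. It writes the filtered BAM with `-o` instead of the shell redirect used in the cells.

```python
import shlex


def samtools_steps(sam_path, out_dir, sample, min_mapq=1):
    """Return the four samtools commands applied to each sample above."""
    raw_bam = f"{out_dir}/{sample}.bam"
    sorted_bam = f"{out_dir}/{sample}_sorted.bam"
    final_bam = f"{out_dir}/{sample}_aln.bam"
    return [
        # 1. convert SAM to BAM, keeping the header
        ["samtools", "view", "-h", "-S", "-b", "-o", raw_bam, sam_path],
        # 2. sort by coordinate (required for indexing)
        ["samtools", "sort", "-o", sorted_bam, raw_bam],
        # 3. keep only alignments with MAPQ >= min_mapq
        ["samtools", "view", "-b", "-q", str(min_mapq), "-o", final_bam, sorted_bam],
        # 4. index the filtered BAM
        ["samtools", "index", final_bam],
    ]


steps = samtools_steps(
    "hap1_samples/aligned_files/S96_WT_1_aln_unsorted.sam",
    "hap1_samples/aligned_files/filtered_reads",
    "S96_WT_1",
)
for step in steps:
    print(shlex.join(step))
```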

## Peak calling (broad peaks)

In [None]:
# hPSCs
# 1 - S9.6 DRIP R1
!macs2 callpeak -t hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R1_aln.bam \
-c hpscs/aligned_reads/filtered_reads/Control_R1_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S9.6_DRIP_R1__IGG_Control_R1 \
--outdir hpscs/aligned_reads/filtered_reads/peak_calling_macs2

In [None]:
# hPSCs
# 2 - S9.6 DRIP R2
!macs2 callpeak -t hpscs/aligned_reads/filtered_reads/S9.6_DRIP_R2_aln.bam \
-c hpscs/aligned_reads/filtered_reads/Control_R1_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S9.6_DRIP_R2__IGG_Control_R1 \
--outdir hpscs/aligned_reads/filtered_reads/peak_calling_macs2

In [None]:
# HAP1 WT
# 1 - S96_R1
!macs2 callpeak -t hap1_samples/aligned_files/filtered_reads/S96_WT_1_aln.bam \
-c hap1_samples/aligned_files/filtered_reads/WT_Control_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S96_WT_1__WT_Control \
--outdir hap1_samples/aligned_files/filtered_reads/peak_calling_macs2

In [None]:
# HAP1 WT
# 2 - S96_R2
!macs2 callpeak -t hap1_samples/aligned_files/filtered_reads/S96_WT_2_aln.bam \
-c hap1_samples/aligned_files/filtered_reads/WT_Control_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S96_WT_2__WT_Control \
--outdir hap1_samples/aligned_files/filtered_reads/peak_calling_macs2

In [None]:
# HAP1 KO 2B1
# 1 - S96_KOA2B1_1
!macs2 callpeak -t hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_1_aln.bam \
-c hap1_samples/aligned_files/filtered_reads/KO2B1_Control_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S96_KOA2B1_1__KO2B1_Control \
--outdir hap1_samples/aligned_files/filtered_reads/peak_calling_macs2

In [None]:
# HAP1 KO 2B1
# 2 - S96_KOA2B1_2
!macs2 callpeak -t hap1_samples/aligned_files/filtered_reads/S96_KOA2B1_2_aln.bam \
-c hap1_samples/aligned_files/filtered_reads/KO2B1_Control_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S96_KOA2B1_2__KO2B1_Control \
--outdir hap1_samples/aligned_files/filtered_reads/peak_calling_macs2

In [None]:
# HAP1 KO YF2
# 1 - S96_KOYF2_1
!macs2 callpeak -t hap1_samples/aligned_files/filtered_reads/S96_KOYF2_1_aln.bam \
-c hap1_samples/aligned_files/filtered_reads/KOYF2_Control_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S96_KOYF2_1__KOYF2_Control \
--outdir hap1_samples/aligned_files/filtered_reads/peak_calling_macs2

In [None]:
# HAP1 KO YF2
# 2 - S96_KOYF2_2
!macs2 callpeak -t hap1_samples/aligned_files/filtered_reads/S96_KOYF2_2_aln.bam \
-c hap1_samples/aligned_files/filtered_reads/KOYF2_Control_aln.bam \
--broad \
--format BAM -g hs \
--keep-dup all \
--name S96_KOYF2_2__KOYF2_Control \
--outdir hap1_samples/aligned_files/filtered_reads/peak_calling_macs2
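
After peak calling, a quick summary of each result helps compare replicates and conditions. This is a minimal sketch, assuming MACS2 wrote `<name>_peaks.broadPeak` files into the `--outdir` folders above and that these are tab-separated with chromosome, start and end in the first three columns. `summarize_broadpeak` is a hypothetical helper.

```python
import csv


def summarize_broadpeak(path):
    """Count peaks and total covered bases in a broadPeak file."""
    n_peaks = 0
    total_bp = 0
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # skip empty lines and any track/comment header lines
            if not row or row[0].startswith(("#", "track", "browser")):
                continue
            start, end = int(row[1]), int(row[2])
            n_peaks += 1
            total_bp += end - start
    mean_width = total_bp / n_peaks if n_peaks else 0.0
    return {"peaks": n_peaks, "total_bp": total_bp, "mean_width": mean_width}
```

For example, `summarize_broadpeak("hap1_samples/aligned_files/filtered_reads/peak_calling_macs2/S96_WT_1__WT_Control_peaks.broadPeak")` would summarize the first HAP1 WT replicate, provided that file exists.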