Skip to content

Detailed Use: Quantifying Peak Coverages from Multiple BAM Files

pongorlorinc edited this page Mar 15, 2019 · 4 revisions

A detailed description can be found here

Quantification of peaks is a one step process with BAMscale. As input, you will need a bed file (with peaks to quantify) as well as one or more BAM files.

Basic Command (example)

./BAMscale cov --bed <preaks.bed> --bam <SAMPLE1.bam> --bam <SAMPLE2.bam> ... --bam <SAMPLEn.bam>

This command will generate for files containing the quantified peaks:

1. Raw Peak Coverages
2. FPKM Normalized Peaks 
3. TPM Normalized Peaks
4. Library-Size Normalized Peaks

Comparing ATAC-seq changes induced from treatment

For input data, we use ATAC-seq from:

"SLFN11 Blocks Stressed Replication Forks Independently of ATR. Mol Cell, 2018"

Pubmed and GEO page.


A detailed walkthrough of the processing can be seen below!


After the alignment, peak-calling, and peak-merging, we are left with 6 BAM (coverage) files and one BED (peak) file. Three BAM files originate from CEM-CCRF cell line, that are either untreated, or treated with CPT for 2 and 4 hours. The other three BAM files are from CB45 (CEM-CCRF SLFN11 gene knockout) cells that were untreated, or treated with CPT for 2 and 4 hours.

Quantifying peaks to compare the global ATAC-seq signal changes in the BAM files can be done with the following command:

./BAMscale cov --bed merged_peaks.bed --prefix ATAC_peaks.quant \
    --bam CEM_CPT0h.sorted.dedup.bam \
    --bam CEM_CPT2h.sorted.dedup.bam \
    --bam CEM_CPT4h.sorted.dedup.bam \
    --bam CB45_CPT0h.sorted.dedup.bam \
    --bam CB45_CPT2h.sorted.dedup.bam \
    --bam CB45_CPT4h.sorted.dedup.bam 

Four files are generated from this step:

ATAC_peaks.quant.FPKM_normalized_coverages.tsv
ATAC_peaks.quant.TPM_normalized_coverages.tsv
ATAC_peaks.quant.Library_normalized_coverages.tsv
ATAC_peaks.quant.raw_coverages.tsv

The generated matrices with scaled peak scores can be used to plot with preferred tools, or with the R script we provided here (here LINK). The script uses ggplot2.



A) SLFN11+ (CEM) cells untreated [x-axis] vs. treated with CPT (2h, 4h) [y-axis]

As we can see from the dotplots (heatmap), there are some ATAC sites that open from treatment.



B) SLFN11- (CB45) cells untreated [x-axis] vs. treated with CPT (2h, 4h) [y-axis]

In case of the SLFN11 negative cells, ATAC-sites do not change from treatment

As we can see from the dotplots (heatmap), a big portion of peaks does not change, but there are some ATAC sites that open from treatment.

Preparing input data for BAMscale

1) Download Files from SRA in fastq Format

The first step is to download the samples using the sratoolkit.

 fasterq-dump SRR5831755 --split-3
 fasterq-dump SRR5831756 --split-3
 fasterq-dump SRR5831757 --split-3
 fasterq-dump SRR5831758 --split-3
 fasterq-dump SRR5831759 --split-3
 fasterq-dump SRR5831760 --split-3

2) Aligning Reads with BWA "mem"

Raw reads are aligned with the BWA mem aligner using default alignment settings (with the exception of multi-threading) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.

Sample CEM_CPT0h (SRR5831755)

bwa mem -t 8 <hg19.idx> SRR5831755_1.fastq SRR5831755_2.fastq \
    | samtools view -Sbh > CEM_CPT0h.unsorted.bam

Sample CEM_CPT4h (SRR5831756)

bwa mem -t 8 <hg19.idx> SRR5831756_1.fastq SRR5831756_2.fastq \
    | samtools view -Sbh > CEM_CPT2h.unsorted.bam

Sample CEM_CPT4h (SRR5831757)

bwa mem -t 8 <hg19.idx> SRR5831757_1.fastq SRR5831757_2.fastq \
    | samtools view -Sbh > CEM_CPT4h.unsorted.bam

Sample CEM_CPT0h (SRR5831758)

bwa mem -t 8 <hg19.idx> SRR5831758_1.fastq SRR5831758_2.fastq \
    | samtools view -Sbh > CB45_CPT0h.unsorted.bam

Sample CEM_CPT4h (SRR5831759)

bwa mem -t 8 <hg19.idx> SRR5831759_1.fastq SRR5831759_2.fastq \
    | samtools view -Sbh > CB45_CPT2h.unsorted.bam

Sample CEM_CPT4h (SRR5831760)

bwa mem -t 8 <hg19.idx> SRR5831760_1.fastq SRR5831760_2.fastq \
    | samtools view -Sbh > CB45_PT4h.unsorted.bam

3) Sorting Reads and Marking Duplicates

Next, we sort reads, mark duplicates, and index the BAM files.

Sample CEM_CPT0h (SRR5831755)

samtools sort -o CEM_CPT0h.sorted.bam CEM_CPT0h.unsorted.bam

samtools index CEM_CPT0h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CEM_CPT0h.sorted.bam \
      O=CEM_CPT0h.sorted.dedup.bam \
      M=CEM_CPT0h.sorted.dedup.metrics

samtools index CEM_CPT0h.sorted.dedup.bam

Sample CEM_CPT2h (SRR5831756)

samtools sort -o CEM_CPT2h.sorted.bam CEM_CPT0h.unsorted.bam

samtools index CEM_CPT2h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CEM_CPT2h.sorted.bam \
      O=CEM_CPT2h.sorted.dedup.bam \
      M=CEM_CPT2h.sorted.dedup.metrics

samtools index CEM_CPT2h.sorted.dedup.bam

Sample CEM_CPT4h (SRR5831757)

samtools sort -o CEM_CPT4h.sorted.bam CEM_CPT4h.unsorted.bam

samtools index CEM_CPT4h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CEM_CPT4h.sorted.bam \
      O=CEM_CPT4h.sorted.dedup.bam \
      M=CEM_CPT4h.sorted.dedup.metrics

samtools CEM_CPT4h.sorted.dedup.bam

Sample CB45_CPT0h (SRR5831758)

samtools sort -o CB45_CPT0h.sorted.bam CB45_CPT0h.unsorted.bam

samtools index CB45_CPT0h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CB45_CPT0h.sorted.bam \
      O=CB45_CPT0h.sorted.dedup.bam \
      M=CB45_CPT0h.sorted.dedup.metrics

samtools index CB45_CPT0h.sorted.dedup.bam

Sample CB45_CPT2h (SRR5831759)

samtools sort -o CB45_CPT2h.sorted.bam CB45_CPT2h.unsorted.bam

samtools index CB45_CPT2h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CB45_CPT2h.sorted.bam \
      O=CB45_CPT2h.sorted.dedup.bam \
      M=CB45_CPT2h.sorted.dedup.metrics

samtools index CB45_CPT2h.sorted.dedup.bam

Sample CB45_CPT4h (SRR5831760)

samtools sort -o CB45_CPT4h.sorted.bam CB45_CPT4h.unsorted.bam

samtools index CB45_CPT4h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CB45_CPT4h.sorted.bam \
      O=CB45_CPT4h.sorted.dedup.bam \
      M=CB45_CPT4h.sorted.dedup.metrics

samtools index CB45_CPT4h.sorted.dedup.bam

4) Calling Peaks with MACS

macs2 callpeak -t CEM_CPT0h.sorted.dedup.bam --nomodel -g hs -f BAMPE -q 0.01 -n CEM_CPT0h
macs2 callpeak -t CEM_CPT2h.sorted.dedup.bam --nomodel -g hs -f BAMPE -q 0.01 -n CEM_CPT2h
macs2 callpeak -t CEM_CPT4h.sorted.dedup.bam --nomodel -g hs -f BAMPE -q 0.01 -n CEM_CPT4h
macs2 callpeak -t CB45_CPT0h.sorted.dedup.bam --nomodel -g hs -f BAMPE -q 0.01 -n CB45_CPT0h
macs2 callpeak -t CB45_CPT2h.sorted.dedup.bam --nomodel -g hs -f BAMPE -q 0.01 -n CB45_CPT2h
macs2 callpeak -t CB45_CPT4h.sorted.dedup.bam --nomodel -g hs -f BAMPE -q 0.01 -n CB45_CPT4h

5) Merging Called Peaks with bedtools

In this step we combine the called peaks in the previous step, sort the peaks, and merge them into a unique st of peaks.

cat *.narrowPeak | bedtools sort | bedtools merge > merged_peaks.bed

6) Quantifying Peaks with BAMscale

./BAMscale cov --bed merged_peaks.bed --prefix ATAC_peaks.quant --bam CEM_CPT0h.sorted.dedup.bam --bam CEM_CPT2h.sorted.dedup.bam --bam CEM_CPT4h.sorted.dedup.bam --bam CB45_CPT0h.sorted.dedup.bam --bam CB45_CPT2h.sorted.dedup.bam --bam CB45_CPT4h.sorted.dedup.bam 

4) (OPTIONAL) Generating Scaled Coverage Tracks

Finally, we generate the scaled coverage tracks.

./BAMscale scale --bam CEM_CPT0h.sorted.dedup.bam -t 4
./BAMscale scale --bam CEM_CPT2h.sorted.dedup.bam -t 4
./BAMscale scale --bam CEM_CPT4h.sorted.dedup.bam -t 4
./BAMscale scale --bam CB45_CPT0h.sorted.dedup.bam -t 4
./BAMscale scale --bam CB45_CPT2h.sorted.dedup.bam -t 4
./BAMscale scale --bam CB45_CPT4h.sorted.dedup.bam -t 4

The output of the two commands will be two files in BigWig format. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For these two examples, the output names will be:

1) CEM_CPT0H.sorted.dedup.bam.bw
2) CEM_CPT2H.sorted.dedup.bam.bw
3) CEM_CPT4H.sorted.dedup.bam.bw
4) CB45_CPT0H.sorted.dedup.bam.bw
5) CB45_CPT2H.sorted.dedup.bam.bw
6) CB45_CPT4H.sorted.dedup.bam.bw