Skip to content

Detailed Use: Generating Scaled Coverage Tracks

pongorlorinc edited this page Mar 15, 2019 · 4 revisions

The BAMscale "scale" program can generate a scaled coverage track directly from a BAM file. By default, scaling is done based on the total no. of aligned bases divided by the genome size. This means that multiple samples can be scaled separately, one after the other.

Basic Command (example)

./BAMscale scale --bam <INPUT.bam> -t 4

This command will generate one file in BigWig format.

Specifying "-t 4" tells BAMscale, to use 4 processing threads, instead of the default 1. Each chromosome is processed by one thread, so in this example, 4 chromosomes are processed in parallel. Note: increasing no. of threads over 4 might not increase performance substantially (benchmarks).

The output of the two command is a BigWig file. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For a BAM file named "sample.bam" the output will be:

sample.bam.bw

Comparison with Published Dataset

As a comparison, we processed data from:

"SLFN11 Blocks Stressed Replication Forks Independently of ATR. Mol Cell, 2018"

Pubmed and GEO page.



A detailed walkthrough of the processing can be seen below!



In this case, we will compare the ATAC-seq signal in CEM-CCRF cells that are untreated (SRR5831755) or treated with CPT for 4 hours (SRR5831757)

In this example we can see a strong focal increase in the TOP1 gene promoter induced by CPT treatment (lower track), compared to the untreated (upper track) condition.

Note: Since the coverage tracks are scales, it is possible to apply group scaling. In IGV this is done by highlighting the tracks, right-click and selecting group autoscale.




Preparing Input Data for BAMscale

1) Download Files from SRA in fastq Format

The first step is to download the samples using the sratoolkit.

 fasterq-dump SRR5831757 --split-3
 fasterq-dump SRR5831755 --split-3

2) Aligning Reads with BWA "mem"

Raw reads are aligned with the BWA mem aligner using default alignment settings (but multithreaded) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.

Sample CEM_CPT0h (SRR5831755)

bwa mem -t 8 <hg19.idx> SRR5831755_1.fastq SRR5831755_2.fastq \
    | samtools view -Sbh > CEM_CPT0h.unsorted.bam

Sample CEM_CPT4h (SRR5831757)

bwa mem -t 8 <hg19.idx> SRR5831757_1.fastq SRR5831757_2.fastq \
    | samtools view -Sbh > CEM_CPT4h.unsorted.bam

3) Sorting Reads and Marking Duplicates

Next, we sort reads, mark duplicates, and index the BAM files.

Sample CEM_CPT0h (SRR5831755)

samtools sort -o CEM_CPT0h.sorted.bam CEM_CPT0h.unsorted.bam

samtools index CEM_CPT0h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CEM_CPT0h.sorted.bam \
      O=CEM_CPT0h.sorted.dedup.bam \
      M=CEM_CPT0h.sorted.dedup.metrics

samtools index CEM_CPT0h.sorted.dedup.bam

Sample CEM_CPT4h (SRR5831757)

samtools sort -o CEM_CPT4h.sorted.bam CEM_CPT4h.unsorted.bam

samtools index CEM_CPT4h.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=CEM_CPT4h.sorted.bam \
      O=CEM_CPT4h.sorted.dedup.bam \
      M=CEM_CPT4h.sorted.dedup.metrics

samtools CEM_CPT4h.sorted.dedup.bam

4) Generating Scaled Coverage Tracks

Finally, we generate the scaled coverage tracks

./BAMscale scale --bam CEM_CPT0h.sorted.dedup.bam -t 4
./BAMscale scale --bam CEM_CPT4h.sorted.dedup.bam -t 4

The output of the two commands will be two BigWig files. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For these two examples, the output names will be:

1) CEM_CPT0H.sorted.dedup.bam.bw
2) CEM_CPT4H.sorted.dedup.bam.bw