Skip to content

Detailed Use: Scaling and smoothening END seq data

grossjm4 edited this page Mar 13, 2019 · 9 revisions

END-seq can be used to identify double-strand breaks at a base pair resolution. Using BAMscale, we can process multiple BAM files from a set of END-seq expresiments, to output scaled coverage tracks for comparison with a genome browser (such as IGV os UCSC genome browser).

In this tutorial, we will process END-seq data from:


"Dual Roles of Poly(dA:dT) Tracts in Replication Initiation and Fork Collapse"

The article can be found here, this is the GEO profile with the data.

The complete analysis steps can be found below.


We will reproduce the results shown in the first figure of the paper, comparing END-seq signal in non-treated (NT) and 10mM HU treated B-cells (using the first replicates).

Untreated Sample END_seq_aB_28h_NT_rep1

Treated Sample END_seq_aB_28h_HU_28h_10mM_rep1

If we have the two BAM files, we can simply generate the coverage tracks by running:

./BAMscale scale --bam END_seq_aB_28h_NT_rep1.sorted.dedup.bam -t 4
./BAMscale scale --bam END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam -t 4

These commands will output two BigWig files:

1) END_seq_aB_28h_NT_rep1.sorted.dedup.bam.bw
2) END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam.bw

As comparison, we can download the BigWig files from the paper that was deposited for the samples END_seq_aB_28h_NT_rep1 and END_seq_aB_28h_HU_28h_10mM_rep1 (bottom of the GEO page). Additionally, we can compare the results to the OK-seq as well (OK_seq_aB_72h_NT_rep1). We have a short tutorial, where we reproduce the OK-seq results.



The upper two (blue) tacks are from the paper, as well as the third OK-seq track. The lower three tracks are the reproduced results here and at the OK-seq example.




Preparing Input Data for BAMscale

1) Download Files from SRA in fastq Format

The first step is to download the samples using the sratoolkit.

 fasterq-dump SRR7440330 --progress --split-3
 fasterq-dump SRR7440327 --progress --split-3

2) Aligning Reads with BWA "mem"

Raw reads are aligned with the BWA mem aligner using default alignment settings (but multithreaded) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.

Sample END_seq_aB_28h_NT_rep1 (SRR7440330)

bwa mem -t 8 <mm10.idx> SRR7440330.fastq \
    | samtools view -Sbh > END_seq_aB_28h_NT_rep1.unsorted.bam

Sample END_seq_aB_28h_HU_28h_10mM_rep1 (SRR7440327)

bwa mem -t 8 <mm10.idx> SRR7440327.fastq \
    | samtools view -Sbh > END_seq_aB_28h_HU_28h_10mM_rep1.unsorted.bam

3) Sorting Reads and Marking Duplicates

Next, we sort reads, mark duplicates, and index the BAM files.

Sample END_seq_aB_28h_NT_rep1 (SRR7440330)

samtools sort -o END_seq_aB_28h_NT_rep1.sorted.bam END_seq_aB_28h_NT_rep1.unsorted.bam

samtools index END_seq_aB_28h_NT_rep1.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=END_seq_aB_28h_NT_rep1.sorted.bam \
      O=END_seq_aB_28h_NT_rep1.sorted.dedup.bam \
      M=END_seq_aB_28h_NT_rep1.sorted.dedup.metrics

samtools index END_seq_aB_28h_NT_rep1.sorted.dedup.bam

Sample END_seq_aB_28h_HU_28h_10mM_rep1 (SRR7440327)

samtools sort -o END_seq_aB_28h_HU_28h_10mM_rep1.sorted.bam END_seq_aB_28h_HU_28h_10mM_rep1.unsorted.bam

samtools index END_seq_aB_28h_HU_28h_10mM_rep1.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=END_seq_aB_28h_HU_28h_10mM_rep1.sorted.bam \
      O=END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam \
      M=END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.metrics

samtools END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam

4) Generating Scaled Coverage Tracks

Finally, we generate the scaled coverage tracks.

./BAMscale scale --bam END_seq_aB_28h_NT_rep1.sorted.dedup.bam -t 4
./BAMscale scale --bam END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam -t 4

The output of the two commands will be two BigWig files. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For these two examples, the output names will be:

1) END_seq_aB_28h_NT_rep1.sorted.dedup.bam.bw
2) END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam.bw