Skip to content

Detailed Use: Processing END seq Data

pongorlorinc edited this page May 1, 2019 · 2 revisions

END-seq can be used to identify double-strand breaks at a base pair resolution. Using BAMscale, we can process multiple BAM files from a set of END-seq experiments, to output scaled coverage tracks for comparison with a genome browser (such as IGV os UCSC genome browser). Additionally we can create strand-specific coverage tracks by using the "--operation endseq" setting, or "--operation endseqr" which negates the reverse strand scores (this is useful if the two tracks are overlayed in the genome browser)

Input data

In this tutorial, we will process END-seq data from:


"Dual Roles of Poly(dA:dT) Tracts in Replication Initiation and Fork Collapse"

The article can be found here along with the GEO profile and attached data.

The complete analysis steps can be found below.


We will reproduce the results shown in the first figure of the paper, comparing END-seq signal in non-treated (NT) and 10mM HU treated B-cells (using the first replicates).

Untreated Sample END_seq_aB_28h_NT_rep1

Treated Sample END_seq_aB_28h_HU_28h_10mM_rep1

Creating stranded coverage tracks

In this section, we will generate stranded coverage tracks for the END-seq data. There are two main approaches, where the main difference is whether the reverse strand scores should be negative or not.

Creating standard stranded coverage tracks

In this case we can use the "--operation endseq", that will output two scaled coverage tracks for the froward and reverse oriented reads.

./BAMscale scale -t 4 --operation endseq --bam END_seq_aB_28h_NT_rep1.sorted.dedup.bam

Creating stranded coverage tracks where the reverse strand is negated

This is helpful in case you would like to overlay the tracks using a visualization tool (such as IGV).

./BAMscale scale -t 4 --operation endseqr --bam END_seq_aB_28h_NT_rep1.sorted.dedup.bam

Visualization of the stranded data

In the visualization there are three tracks. The top track is the coverage track from reads aligning to the forward strand. The middle track is the coverage track for the reads aligning to the negative strand. These two tracks were generated with the "--operation endseq" option.

The third track is an overlay of the two tracks generated with the "--operation endseqr" option, where the negative strand scores were negated. The two tracks were then overlayed by IGV (by selecting the two tracks -> right click -> overlay tracks).




Creating unstranded coverage tracks

If we have the two BAM files, we can simply generate the coverage tracks by running:

./BAMscale scale --bam END_seq_aB_28h_NT_rep1.sorted.dedup.bam -t 4
./BAMscale scale --bam END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam -t 4

These commands will output two BigWig files:

1) END_seq_aB_28h_NT_rep1.sorted.dedup.bam.bw
2) END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam.bw

As comparison, we can download the BigWig files from the paper that was deposited for the samples END_seq_aB_28h_NT_rep1 and END_seq_aB_28h_HU_28h_10mM_rep1 (bottom of the GEO page). Additionally, we can compare the results to the OK-seq as well (OK_seq_aB_72h_NT_rep1). We have a short tutorial, where we reproduce the OK-seq results.



The upper two (blue) tacks are from the paper, as well as the third OK-seq track. The lower three tracks are the reproduced results here and at the OK-seq example.




Preparing Input Data for BAMscale

1) Download Files from SRA in fastq Format

The first step is to download the samples using the sratoolkit.

 fasterq-dump SRR7440330 --progress --split-3
 fasterq-dump SRR7440327 --progress --split-3

2) Aligning Reads with BWA "mem"

Raw reads are aligned with the BWA mem aligner using default alignment settings (but multithreaded) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.

Sample END_seq_aB_28h_NT_rep1 (SRR7440330)

bwa mem -t 8 <mm10.idx> SRR7440330.fastq \
    | samtools view -Sbh > END_seq_aB_28h_NT_rep1.unsorted.bam

Sample END_seq_aB_28h_HU_28h_10mM_rep1 (SRR7440327)

bwa mem -t 8 <mm10.idx> SRR7440327.fastq \
    | samtools view -Sbh > END_seq_aB_28h_HU_28h_10mM_rep1.unsorted.bam

3) Sorting Reads and Marking Duplicates

Next, we sort reads, mark duplicates, and index the BAM files.

Sample END_seq_aB_28h_NT_rep1 (SRR7440330)

samtools sort -o END_seq_aB_28h_NT_rep1.sorted.bam END_seq_aB_28h_NT_rep1.unsorted.bam

samtools index END_seq_aB_28h_NT_rep1.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=END_seq_aB_28h_NT_rep1.sorted.bam \
      O=END_seq_aB_28h_NT_rep1.sorted.dedup.bam \
      M=END_seq_aB_28h_NT_rep1.sorted.dedup.metrics

samtools index END_seq_aB_28h_NT_rep1.sorted.dedup.bam

Sample END_seq_aB_28h_HU_28h_10mM_rep1 (SRR7440327)

samtools sort -o END_seq_aB_28h_HU_28h_10mM_rep1.sorted.bam END_seq_aB_28h_HU_28h_10mM_rep1.unsorted.bam

samtools index END_seq_aB_28h_HU_28h_10mM_rep1.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=END_seq_aB_28h_HU_28h_10mM_rep1.sorted.bam \
      O=END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam \
      M=END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.metrics

samtools END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam

4) Generating Scaled Coverage Tracks

Finally, we generate the scaled coverage tracks.

./BAMscale scale --bam END_seq_aB_28h_NT_rep1.sorted.dedup.bam -t 4
./BAMscale scale --bam END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam -t 4

The output of the two commands will be two BigWig files. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For these two examples, the output names will be:

1) END_seq_aB_28h_NT_rep1.sorted.dedup.bam.bw
2) END_seq_aB_28h_HU_28h_10mM_rep1.sorted.dedup.bam.bw
Clone this wiki locally