Detailed Use: Replication Timing log2 Coverage Ratio from Two BAM Files

Replication timing (or Timex data) sequencing can be used to identify early and late replicating genome regions. This is accomplished by comparing the sequencing depth of G1-phase synchronized cells [2N genome state] to S-phase (or asynchronous) cells [2N-4N genome state]. Early replicating regions will have more copies, and generate higher sequencing depths in the S-phase cells, resulting in "copy-number change"-like events. With BAMscale, the log2 coverage ration of two BAM files can be done in one step. To get better results, it is suggested to use smoothening as well.

In our analyses, we generally use three main parameters: 1) bin size set to 100bp, 2) set the operation to "log2" and 3) smoothening set to 500 bins. This tells BAMscale, to calculate the average of -500 to +500 bins for each bin (of length 100bp), meaning that one bin is the average of 10k bases (500bins * 100binsize * 2).

In this tutorial, we will process Replication timing data from data from:

"Dual Roles of Poly(dA:dT) Tracts in Replication Initiation and Fork Collapse"

The article can be found here, this is the GEO profile with the data.

The complete analysis steps can be found below

G1-phase cells Repli_seq_rB_0h_NT_rep1

S-phase cells Repli_seq_aB_48h_NT_rep1

If we have the two BAM files, we can simply generate the log2 coverage tracks by running:

Important: The second BAM file will be divided by the first BAM!!!

./BAMscale scale --bam Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam \
    --bam Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam \
    -t 8 \
    --operation log2 \
    --binsize 100 \
    --smoothen 500

The command will output three BigWig files:

1) Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.scaled.bw
2) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam.scaled.bw
3) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam_vs_Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.log2.bw

As a comparison, we will add the END-seq and OK-seq data as well.

The upper track is the replication timing (log2) coverage track. The middle two tracks are from END-seq data, while the (green/red) dotplot is the OK-seq data (meaning where there is a Watson/Crick strand switch in OK-seq,there is END-seq signal as well).

Chromosome 1 (full)

The upper track is the replication timing (log2) coverage track. The while the (green/red) dotplot is the OK-seq data, while the bottom two tacks are from END-seq. As you can see in the figure, this region is early replicating, plus the END-seq and OK-seq coverages have similar signal positions.

Chromosome 1 (region from publication)

Preparing input data for BAMscale

1) Download Files from SRA in fastq Format

The first step is to download the samples using the sratoolkit.

 fasterq-dump SRR7440351 --progress --split-3
 fasterq-dump SRR7440352 --progress --split-3

2) Aligning Reads with BWA "mem"

Raw reads are aligned with the BWA mem aligner using default alignment settings (but multithreaded) to the hg19 human genome. Reads aligned with BWA were directly converted from SAM format to BAM format using samtools.

Sample Repli_seq_rB_0h_NT_rep1 (SRR7440351)

bwa mem -t 8 <mm10.idx> SRR7440351_1.fastq SRR7440351_2.fastq \
    | samtools view -Sbh > Repli_seq_rB_0h_NT_rep1.unsorted.bam

Sample Repli_seq_aB_48h_NT_rep1 (SRR7440352)

bwa mem -t 8 <mm10.idx> SRR7440352_1.fastq SRR7440352_2.fastq \
    | samtools view -Sbh > Repli_seq_aB_48h_NT_rep1.unsorted.bam

3) Sorting Reads and Marking Duplicates

Next, we sort reads, mark duplicates, and index the BAM files.

Sample Repli_seq_rB_0h_NT_rep1 (SRR7440351)

samtools sort -o Repli_seq_rB_0h_NT_rep1.sorted.bam Repli_seq_rB_0h_NT_rep1.unsorted.bam

samtools index Repli_seq_rB_0h_NT_rep1.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=Repli_seq_rB_0h_NT_rep1.sorted.bam \
      O=Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam \
      M=Repli_seq_rB_0h_NT_rep1.sorted.dedup.metrics

samtools index Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam

Sample Repli_seq_aB_48h_NT_rep1 (SRR7440352)

samtools sort -o Repli_seq_aB_48h_NT_rep1.sorted.bam Repli_seq_aB_48h_NT_rep1.unsorted.bam

samtools index Repli_seq_aB_48h_NT_rep1.sorted.bam

java -Xmx8g -jar picardtools.jar MarkDuplicates \
      I=Repli_seq_aB_48h_NT_rep1.sorted.bam \
      O=Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam \
      M=Repli_seq_aB_48h_NT_rep1.sorted.dedup.metrics

samtools Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam

4) Generating Scaled Coverage Tracks

Finally, we generate the log2 track and scaled coverage tracks

./BAMscale scale --bam Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam \
    --bam Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam \
    -t 8 \
    --operation log2 \
    --binsize 100 \
    --smoothen 500

The output of the two commands will be two BigWig files. BAMscale "scale" function appends the ".bw" suffix to the end of the input file. For these two examples, the output names will be:

1) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam.bw
2) Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.bw
3) Repli_seq_aB_48h_NT_rep1.sorted.dedup.bam_vs_Repli_seq_rB_0h_NT_rep1.sorted.dedup.bam.log2.bw

Provide feedback

Saved searches