Skip to content

Finding OK seq strand switched from the RFD track

pongorlorinc edited this page Aug 21, 2019 · 3 revisions

In this section we will use a very simple script to find OK-seq strand switched for visualization in IGV. Please note that this script is was written in R using the rtracklayer package, so please cite the package when using it.

WARNING: this is a very simplified approach for getting approximately good results, for more accurate results use the original method proposed in the paper by Petryk et al. 2018.

These are the arguments for the script:

usage: OKseq_switches.R [-h] [-i INPUT] [-b BINSIZE] [-l LEFTMIN]
                    [-r RIGHTMIN] [-d DIFF] [-s SLIDESIZE]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                    Input OK-seq bigWig file.
  -b BINSIZE, --binsize BINSIZE
                    Bin size [default 50000]
  -l LEFTMIN, --leftmin LEFTMIN
                    Mean value of bin on left side [default -0.15]
  -r RIGHTMIN, --rightmin RIGHTMIN
                    Mean value of bin on right side [default 0.15]
  -d DIFF, --diff DIFF  Minimum difference between left and right side means
                    [default 0.4]
  -s SLIDESIZE, --slidesize SLIDESIZE
                    Minimum difference between left and right side means
                    [default 50000]

The input bigWig is specified with "-i ".

With "--binsize" you can set what is the size of one bin. The positions of bins is summarized in the figure below.

In terms of scoring, the "--leftmin" parameter sets the mean maximum value for the bin on the left (meaning the score has to be below this), the "--rightmin" parameter sets the minimum value of the bin on the right. The "--diff" parameter sets what is the minimum score difference between the left- and right bins. A region is accepted if:

1) left-bin score <= leftmin
2) right-bin score >= rightmin
3) (right-bin score) - (left-bin score) > diff

The "--slidesize" parameter sets the sliding window steps, by default every 50k bases are examined.

Requirements

Input data

An RFD track in bigWig format created by BAMscale. In this example we will find OK-seq strand switches obtained form here

Input script

We added an R script to the main page. This script takes as input one argument, which is the RFD track in bigWig format created by BAMscale.

Library

You will need R, and the rtracklayer package. See page for installation.

If you have Bioconductor already installed, then simply open an R session and run:

BiocManager::install("rtracklayer")

Additionally the argparse package is needed to parse the command line arguments of the script. This can be installed in an R session:

install.packages("argparse")

Usage example 1

In this example we will reanalyze OK-seq data from K562

After alignment and RDF track creation with BAMscale, the OK-seq strand switches can be found by running:

Rscript R/OKseq_switches.R -i ERR2760956_K562_OK-seq_BR1.rfd.bw

This command will create a BED file named ERR2760956_K562_OK-seq_BR1.rfd.OKseq_switches.bed

Here is a visualization of the output bigwig file and replication timing segments:

Usage example 2

In this example we will reanalyze OK-seq data from the example in the OK-seq manual or BAMscale.

The OK-seq strand switches for this example can be found by running can be found by running:

Rscript R/OKseq_switches.R -i SRR7440353.sorted.dedup.rfd.bw

This command will create a BED file named SRR7440353.sorted.dedup.rfd.OKseq_switches.bed

Here is a visualization of the output bigwig file and replication timing segments:

Since the approach of the script is simple, you can see in the example that not all regions are identified correctly (left side). This can be fixed by increasing the bin size and adjusting the bin score thresholds.