Skip to content

Per base coverage statistics pipeline to assess the expected versus factual coverage of the reference sequence region by the newly sequenced reads (arrival-like statistics). Plots the per base coverage, helping visualize the possible sequencing\mapping pitfalls and repetative region collapses.

License

Notifications You must be signed in to change notification settings

savytskanatalia/PBC_plot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

PBC_plot

Beta submission! Includes py3 script for per base coverage plot generation. Will be shortly followed by nextflow pipeline for mapping reads to reference, processing and obtaining the input files for the .py plotting script.

Per base coverage statistics pipeline to assess the expected versus factual coverage of the reference sequence region by the newly sequenced reads (arrival-like statistics). Plots the per base coverage, helping visualize the possible sequencing\mapping pitfalls and repetative region collapses.


  • Arrival Statistics is calculated for each position in the reference genome (can be closely related species, if absent). For this the per base coverage file is generated with bedtools genecoverage, based on the reads mapped to reference (using minimap2, e.g.).
  • To calculate mean coverage for short reads the following formula is used:
c=nreads*lreads/lgen

where nreads - n of mapped to ref reads, lreads - length of short reads(hard-coded in beta to 151 in script), lgene - length of reference sequence, e.g.genome

  • To calculate mean coverage for long reads the following formula is used:
c=sum of lengths of all mapped reads/length of reference sequence

The script was first designed to indirectly detect and visualize potential NUMTs-derived contaminants in mitochondrial reads. While the arrival statistics is mostly used to detect the collapsed repetative regions, in case of mitochondrial genomes the spike in coverage of particular loci in mitogenome may be evident of the nuclear pseudogenic derived reads, that may impact the adequacy of mitochondrial genome assembly.

About

Per base coverage statistics pipeline to assess the expected versus factual coverage of the reference sequence region by the newly sequenced reads (arrival-like statistics). Plots the per base coverage, helping visualize the possible sequencing\mapping pitfalls and repetative region collapses.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages