A tool for calculation of pileup mappability for any genome of interest.

maxgmarin/pupmapper

License: MIT

Pupmapper: A Pileup Mappability Calculator

Motivation

The pileup mappability metric can be used to quickly identify regions of a genome where variant calling with short-read WGS data may be more difficult. pupmapper was created to let users quickly convert k-mer mappability scores to pileup mappability scores.

I recommend running pupmapper on k-mer mappability scores generated by the GenMap software (genmap).

NOTE: I just added a subcommand run_all that will run Genmap on the input genome and then calculate pileup mappability scores from the genmap results.

How is pileup mappability calculated from k-mer mappability?

[Figure: PmapFig. Schematic of how pileup mappability is calculated from k-mer mappability.]

The pileup mappability of a position is calculated as the mean k-mer mappability of all k-mers overlapping that position.
**Pileup mappability is useful because it gives a sense of the uniqueness of all possible reads (of a defined length) that could align to a given position.**
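
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not pupmapper's actual implementation. It assumes that `kmer_map[i]` holds the mappability of the k-mer that starts at 0-based position `i`, so a sequence of length n has n - k + 1 k-mers; the function name `pileup_mappability` is hypothetical.

```python
def pileup_mappability(kmer_map, k):
    """Return the per-position pileup mappability for one sequence.

    Assumption: kmer_map[i] is the mappability of the k-mer STARTING at
    0-based position i, so len(kmer_map) == sequence_length - k + 1.
    """
    if k < 1 or not kmer_map:
        raise ValueError("k must be >= 1 and kmer_map must be non-empty")

    n_kmers = len(kmer_map)
    seq_len = n_kmers + k - 1

    # Prefix sums let us get the sum of any window of k-mer scores in O(1).
    prefix = [0.0]
    for score in kmer_map:
        prefix.append(prefix[-1] + score)

    pileup = []
    for pos in range(seq_len):
        # k-mers overlapping `pos` start between pos - k + 1 and pos,
        # clipped to the range of k-mers that actually exist.
        first = max(0, pos - k + 1)
        last = min(pos, n_kmers - 1)
        window_sum = prefix[last + 1] - prefix[first]
        pileup.append(window_sum / (last - first + 1))
    return pileup


# Toy example: a 5 bp sequence with k = 2 has four k-mers.
example = pileup_mappability([1.0, 1.0, 0.5, 0.5], k=2)
print(example)  # [1.0, 1.0, 0.75, 0.5, 0.5]
```

Positions near the ends of a sequence are overlapped by fewer than k k-mers, so the sketch averages over only the k-mers that exist there. How pupmapper itself treats sequence ends is not stated here.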

Useful reading for k-mer mappability and pileup mappability:

Derrien T, et al. (2012). Fast Computation and Applications of Genome Mappability, PLOS ONE 7(1): e30377, https://doi.org/10.1371/journal.pone.0030377

Pockrandt C, et al. (2020). GenMap: ultra-fast computation of genome mappability, Bioinformatics, Volume 36, Issue 12, June 2020, Pages 3687–3692, https://doi.org/10.1093/bioinformatics/btaa222

Lee H, Schatz MC. (2012). Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score, Bioinformatics, Volume 28, Issue 16, August 2012, Pages 2097–2105, https://doi.org/10.1093/bioinformatics/bts330

Installation

Install locally

pupmapper can be installed by cloning this repository and installing with pip.

git clone git@github.com:maxgmarin/pupmapper.git

cd pupmapper

pip install . 

pip

🚧 Check back soon 🚧

Basic usage

1) run_all - Run all genmap pre-processing steps (indexing, k-mer mappability) and then calculate pileup mappability

pupmapper run_all -i Input.Genome.fasta -o output_directory/ -k 50 -e 1

The above command will first use genmap to calculate k-mer mappability scores for the input genome and then calculate pileup mappability scores.

2) run_pileup - Calculate pileup mappability from already generated k-mer mappability values

pupmapper run_pileup -i kmap.K50E0.bedgraph -o pupmap.K50E0.bedgraph -k 50

The above command will calculate pileup mappability scores from input k-mer mappability values (k = 50 bp, E = 0 mismatches) that were generated using genmap.
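
For readers who want to inspect the bedgraph input themselves, the sketch below expands bedgraph intervals into per-position values. It assumes the standard four-column bedgraph layout (sequence name, 0-based start, end-exclusive end, score), treats positions not covered by any interval as 0.0 (an assumption, not documented genmap behavior), and assumes sequence names contain no whitespace. The function name `read_bedgraph` is hypothetical and is not part of pupmapper.

```python
def read_bedgraph(lines):
    """Expand bedgraph intervals into per-position scores.

    Returns a dict mapping each sequence name to a list of floats,
    one per position. Uncovered positions are filled with 0.0
    (an assumption made for this sketch).
    """
    values = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional bedgraph header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, score = line.split()[:4]
        start, end, score = int(start), int(end), float(score)

        per_pos = values.setdefault(chrom, [])
        if len(per_pos) < end:
            per_pos.extend([0.0] * (end - len(per_pos)))
        # Bedgraph intervals are 0-based and end-exclusive.
        for pos in range(start, end):
            per_pos[pos] = score
    return values


# Toy example with made-up scores (not taken from a real genmap run).
toy = ["seq1\t0\t3\t1.0", "seq1\t3\t5\t0.5"]
print(read_bedgraph(toy))  # {'seq1': [1.0, 1.0, 1.0, 0.5, 0.5]}
```

With a real file, the same function can be called as `read_bedgraph(open("kmap.K50E0.bedgraph"))`.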

Analyzing included test sequence

If you wish to run pupmapper on a small test sequence (15 bp), you can run the following commands:

cd tests/data/Genmap_Ex1/Ex1_gm_output

pupmapper run_pileup -i Ex1_Kmap_K4E0.bedgraph -o Ex1_Pmap_K4E0.bedgraph -k 4

The input file (Ex1_Kmap_K4E0.bedgraph) was generated by running genmap on tests/data/Genmap_Ex1/Ex1.genome.fasta with a k-mer size of 4 bp and a maximum of 0 mismatches (K=4, E=0).

Full usage

$ pupmapper run_pileup --help

usage: pupmapper run_pileup [-h] -i INPUT -o OUTPUT -k KMER_LEN

Command for calculating genome wide pileup mappability based on k-mer mappability values

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input k-mer mappability values in bedgraph format (.bedgraph). Ideally, generated with genmap software
  -o OUTPUT, --output OUTPUT
                        Output table of pileup mappability (.bedgraph)
  -k KMER_LEN, --kmer_len KMER_LEN
                        k-mer length (bp) used to generate the input k-mer mappability values
....

FAQ

1) How do I go from my genome of interest to identifying regions with low pileup mappability?

  • 1.1) Use genmap (with your desired parameters) to calculate k-mer mappability for your genome of interest. (Output to .bedgraph)
  • 1.2) Use pupmapper to calculate pileup mappability from k-mer mappability (output to .bedgraph)
  • 1.3) Use awk and bedtools to identify regions of the genome which have pileup mappability < 1 (or below your desired threshold).
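
Step 1.3 can also be approximated without awk and bedtools. The sketch below is a pure-Python stand-in, not the awk/bedtools workflow itself: it keeps bedgraph intervals whose score is below a threshold and merges intervals that touch or overlap. It assumes the bedgraph is sorted by sequence name and start position; the function name `low_mappability_regions` is hypothetical.

```python
def low_mappability_regions(lines, threshold=1.0):
    """Return merged (chrom, start, end) regions with score < threshold.

    Assumes four-column bedgraph lines sorted by sequence and start.
    Coordinates stay 0-based and end-exclusive, as in bedgraph/BED.
    """
    regions = []
    for line in lines:
        fields = line.split()
        # Skip blank, comment, header, or malformed lines.
        if len(fields) < 4 or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        score = float(fields[3])
        if score >= threshold:
            continue

        # Merge with the previous region if it is adjacent or overlapping.
        if regions and regions[-1][0] == chrom and regions[-1][2] >= start:
            prev_chrom, prev_start, prev_end = regions[-1]
            regions[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            regions.append((chrom, start, end))
    return regions


# Toy example with made-up scores.
toy = [
    "chr1\t0\t10\t1.0",
    "chr1\t10\t20\t0.5",
    "chr1\t20\t25\t0.9",
    "chr1\t25\t40\t1.0",
    "chr1\t40\t50\t0.2",
]
print(low_mappability_regions(toy))  # [('chr1', 10, 25), ('chr1', 40, 50)]
```

The returned tuples can be written out as a three-column BED file if downstream tools expect one.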

2) How do I generate the k-mer mappability values that pupmapper needs?

To calculate pileup mappability with pupmapper you must first generate k-mer mappability values with genmap. Refer to the getting started section of genmap's README for more details.

You can use genmap in two steps:

1) Index your target sequence

$ ./genmap index -F /path/to/fasta.fasta -I /path/to/index/folder

2) Calculate k-mer mappability with desired parameters (in this example k = 30 bp and E = up to 2 mismatches)

$ ./genmap map -K 30 -E 2 -I /path/to/index/folder -O /path/to/output/folder -t -w -bg
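
If you prefer to script these two steps, the sketch below builds the same two command lines in Python and runs them with `subprocess`. It uses only the flags shown above; the helper names `genmap_commands` and `run_genmap` are hypothetical, and the default assumes a `genmap` executable on your PATH (pass `genmap="./genmap"` to match the commands above).

```python
import subprocess


def genmap_commands(fasta, index_dir, output_dir, k, e, genmap="genmap"):
    """Build the genmap index and map commands as argument lists."""
    index_cmd = [genmap, "index", "-F", fasta, "-I", index_dir]
    map_cmd = [
        genmap, "map",
        "-K", str(k),      # k-mer length
        "-E", str(e),      # maximum number of mismatches
        "-I", index_dir,
        "-O", output_dir,
        "-t", "-w", "-bg",  # output format flags, copied from the example above
    ]
    return index_cmd, map_cmd


def run_genmap(fasta, index_dir, output_dir, k, e, genmap="genmap"):
    """Run indexing, then mapping; raises CalledProcessError on failure."""
    for cmd in genmap_commands(fasta, index_dir, output_dir, k, e, genmap):
        subprocess.run(cmd, check=True)


# Inspect the commands without running them.
for cmd in genmap_commands("/path/to/fasta.fasta", "/path/to/index/folder",
                           "/path/to/output/folder", k=30, e=2):
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.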
