A tool for estimating sequences of CNV alleles from multiple individuals
CNValloc

Requirements

  • Python 2.6 or 2.7
  • Modules: numpy, scipy, argtools

Description

A tool for estimating sequences of CNV alleles from multiple individuals. The allele ratio of each sample is also inferred.

A simple example

$ python cnvalloc estimate_alleles -v -K 4 examples/hist.txt

The estimation result is written to standard output in JSON format.
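The layout of the JSON document is not described here, so a first step is often just to look at its top-level structure. The following is a minimal sketch for that purpose; the function names `load_result` and `summarize` are hypothetical helpers and not part of cnvalloc, and nothing is assumed about which keys the output contains.

```python
import json


def load_result(stream):
    """Parse a JSON document from an open text stream (e.g. sys.stdin)."""
    return json.load(stream)


def summarize(result):
    """Describe the top level of a parsed JSON value.

    Returns the sorted keys for an object, the length for an array,
    and the value itself for anything else.
    """
    if isinstance(result, dict):
        return sorted(result.keys())
    if isinstance(result, list):
        return len(result)
    return result
```

For example, a small script that prints `summarize(load_result(sys.stdin))` could be placed at the end of a pipe that starts with the `estimate_alleles` command above.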

Check performance for the number of haplotypes K = 1..10

$ parallel -k 'python cnvalloc estimate_alleles -K {} examples/hist.txt | python cnvalloc evaluate_alleles -r /dev/stdin -a examples/haps.txt' ::: {1..10}

  • examples/haps.txt : True haplotypes for evaluation

    • column 1: Sample id
    • column 2: Allele id
    • column n (n>2): The base of the allele at the (n-2)th variant site
  • examples/hist.txt : Observed bases of sequence data at variant sites

    • column 1: Sample id
    • column 2: One of the 'ATCG' bases
    • column n (n>2): The number of times that base was observed at the (n-2)th variant site
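The two column layouts above can be read with a few lines of Python. This is a sketch only: it assumes the columns are separated by whitespace (the delimiter is not stated here), and `parse_hist` and `parse_haps` are hypothetical helper names, not functions provided by cnvalloc.

```python
from collections import defaultdict


def parse_hist(lines):
    """Parse hist.txt-style lines into {sample_id: {base: [count per site]}}."""
    hist = defaultdict(dict)
    for line in lines:
        fields = line.split()  # assumption: whitespace-delimited columns
        if not fields:
            continue  # skip blank lines
        sample_id, base = fields[0], fields[1]
        hist[sample_id][base] = [int(x) for x in fields[2:]]
    return dict(hist)


def parse_haps(lines):
    """Parse haps.txt-style lines into {sample_id: {allele_id: [base per site]}}."""
    haps = defaultdict(dict)
    for line in lines:
        fields = line.split()  # assumption: whitespace-delimited columns
        if not fields:
            continue  # skip blank lines
        sample_id, allele_id = fields[0], fields[1]
        haps[sample_id][allele_id] = fields[2:]
    return dict(haps)
```

Both functions accept any iterable of lines, so an open file object can be passed directly, e.g. `parse_hist(open("examples/hist.txt"))`.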

Workflow for BAM files

  1. Make pileup histograms from BAM files:

    $ python cnvalloc bam2hist {BAM file n} -r chr1:10000000-100010000 > pileups.n.txt
  2. Import the pileup files to a database such as sqlite3

  3. Select the variant sites to use according to some criterion (e.g. minor_count >= 15 for any of the samples)

  4. Create an input file for cnvalloc estimate_alleles by querying the database
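Steps 2 to 4 are not covered by the tool itself, so the following is only a rough sketch of how they might look with Python's built-in `sqlite3` module. Several things here are assumptions: the pileup data are taken to be already parsed into `(sample, site, base, n_obs)` tuples (the output layout of `bam2hist` is not described here), the table name `pileup` and all function names are hypothetical, the minor count is taken to mean the second-largest base count of a sample at a site, and the output columns are tab-separated.

```python
import sqlite3

BASES = "ATCG"


def load_pileups(conn, records):
    """Step 2: store (sample, site, base, n_obs) tuples in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pileup "
        "(sample TEXT, site INTEGER, base TEXT, n_obs INTEGER)"
    )
    conn.executemany("INSERT INTO pileup VALUES (?, ?, ?, ?)", records)
    conn.commit()


def select_sites(conn, min_minor_count=15):
    """Step 3: keep sites where at least one sample reaches the minor-count threshold.

    Assumption: the minor count is the second-largest base count
    of one sample at one site.
    """
    per_sample_site = {}
    for sample, site, n_obs in conn.execute("SELECT sample, site, n_obs FROM pileup"):
        per_sample_site.setdefault((sample, site), []).append(n_obs)
    selected = set()
    for (sample, site), counts in per_sample_site.items():
        counts = sorted(counts, reverse=True)
        if len(counts) > 1 and counts[1] >= min_minor_count:
            selected.add(site)
    return sorted(selected)


def build_hist_lines(conn, sites):
    """Step 4: one line per (sample, base), followed by counts at the selected sites."""
    samples = [
        row[0]
        for row in conn.execute("SELECT DISTINCT sample FROM pileup ORDER BY sample")
    ]
    lines = []
    for sample in samples:
        for base in BASES:
            counts = []
            for site in sites:
                row = conn.execute(
                    "SELECT COALESCE(SUM(n_obs), 0) FROM pileup "
                    "WHERE sample = ? AND site = ? AND base = ?",
                    (sample, site, base),
                ).fetchone()
                counts.append(str(row[0]))  # 0 when the base was never observed
            lines.append("\t".join([sample, base] + counts))
    return lines
```

With a connection such as `sqlite3.connect("pileups.db")`, the lines returned by `build_hist_lines(conn, select_sites(conn))` could be written to a file and passed to `estimate_alleles`, provided the real pileup format can be mapped onto the assumed tuples.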

Future work

  • Consider variant types other than mutations
  • Write tools for steps 2-4 of the above workflow

References

T. Mimori et al., "Estimating copy numbers of alleles from population-scale high-throughput sequencing data", BMC Bioinformatics, 2015.

http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-16-S1-S4