Skip to content
forked from ANGSD/ngsPSMC

Fixing memory leaks and improving maths

License

Notifications You must be signed in to change notification settings

igmirnyy/ngsPSMC

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ngsPSMC

Build Status

This program implements the PSMC for genotype likelihoods. The program is under development. Consider current version a beta version.

Installation

Dependencies: htslib.

git clone https://github.com/SAMtools/htslib
git clone https://github.com/ANGSD/ngsPSMC
cd ngsPSMC;make HTSSRC=../htslib/;cd ..

Generate Input Files from BAM/CRAM

Before running ngsPSMC you will need to generate the input files. This is done using ANGSD with the command below:

./angsd/angsd -i input.bam -dopsmc 1  -out psmcinput -gl 1 -minq 20 -minmapq 30

Here -gl 1 indicates the SAMtools genotype likelihood model.

Generate Input Files from simulated data

For validating the program we simulated data like below:

msHOT 2 1000 -t 8196 -r 1355 3000000 -l -eN 0.01 0.05 -eN 0.0375 0.5 -eN 1.25 1.0 -T >msraw
./angsd/misc/msHOT2glf -in msraw -out sim.d8 -win 100 -psmcfa 1 -psmc2 1 -do_seq_glf 0 -err 0.005 -depth 8

Run examples

./ngsPSMC input.psmc.idx -p "1*4+25*2+1*4+1*6" -dospline 0 -nthreads 8 -nIter 20 -init 1  -theta 0.000233095 -rho 0.005357

Where input are the output from either sim.d8.psmc.idx or psmcinput.psmc.idx

You can also specify the output from the original PSMC as input:

./ngsPSMC input.psmc.idx -infile output.from.psmc -dospline 0 -nthreads 8 -nIter 20

There are two options: to initialise all the parameters similar to the original PSMC, or to read them from another PSMC file with -infile option (e.g. if you already have another high-coverage sample).

Notice that theta (mutation rate) and maxT (last non-inifinite time point in the time discritisation) are not optimised in our version.

If you do not use -infile option, it is highly recommended to specify theta if you run ngsPSMC with an organism other than humans.

One can calculate it by the simple formula theta=2N_0mu*binsize (N_0 is a reference effective population size, e.g. 10000, mu is the mutation rate per basepair per generation, binsize is the window length.

If -infile is used, there is a possibility to choose which round of optimisation is used for initialisation. To choose the last round use -rd -1.

You can specify the intitial population size with -init popsize

Details and potential issues (BUGS)

  • Optimization of rho is currently disabled.
  • Be carefull with the binsize (default 100) when comparing results between different winsizes. (binsize part of theta)

All (relevant) options

-dolinear [0,1] linear is much faster if there are many states

-tole tolerance for breaking

-winSize window size

-infile filename for PSMC output

-RD which round of optimization to use if using -infile

-nThreads number of threads, program will not be able to utilize more threads than number of chromosomes

-p PSMC style pattern

-dospline [0,1] use splines instead of 'pattern'

-ms debug pars

-nSites debug pars

-seed seed for random number generator

-init initial popsize

-rho (initial) rho

-max_t maximum time, coalescence times will be set similar to PSMC

-nChr debug pars

-nSites debug pars

-r limit analysis to specified chromosome

About

Fixing memory leaks and improving maths

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C++ 90.1%
  • R 4.9%
  • Python 2.4%
  • C 1.5%
  • Other 1.1%