pierpal/DoRIS

DoRIS

Optimizes the parameters of a demographic model based on identical-by-descent (IBD) segment sharing. See references for details.

NOTE: code is gradware. I hope to post a new version soon. Get in touch if you'd like to use a preliminary version.

Usage

You can perform demographic inference by either minimizing the squared error of Equation 22 in 10.1016/j.ajhg.2012.08.030, or by maximizing the likelihood of observing a certain number of segments in the samples (Poisson with mean from Equation 14 in 10.1016/j.ajhg.2012.08.030).
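As a rough illustration of the two objectives, the sketch below scores one candidate model against observed per-bin data. It is not taken from the DoRIS source: the function names are hypothetical, and the predicted values are assumed to have been computed elsewhere from the equations cited above, which are not reproduced here.

```python
import math

def squared_error(observed, predicted):
    """Sum of squared differences between observed and predicted sharing per bin."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted))

def poisson_log_likelihood(counts, means):
    """Log-likelihood of observed segment counts under independent Poisson bins.

    counts: observed number of segments in each length bin
    means:  expected number of segments in each bin (must be > 0),
            assumed to come from the model being evaluated
    """
    total = 0.0
    for k, mu in zip(counts, means):
        # log of the Poisson pmf: k*log(mu) - mu - log(k!)
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total

# Hypothetical numbers, for illustration only.
observed_sharing = [0.010, 0.004, 0.002]
predicted_sharing = [0.011, 0.004, 0.001]
print(squared_error(observed_sharing, predicted_sharing))

observed_counts = [120, 45, 18]
expected_counts = [115.0, 50.0, 20.0]
print(poisson_log_likelihood(observed_counts, expected_counts))
```

A grid search would evaluate one of these scores at every parameter combination and keep the combination with the smallest error or the largest likelihood.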

One of the following is required

--AverageSharing (or -A) -> file with observed sharing values (for 1 population models)
--AverageSharingPop1 (or -AP1) --AverageSharingAcross (or -AAc) --AverageSharingPop2 (or -AP2) -> files with observed sharing values (for 2 population models)
--SegmentCounts (or -S) -> file with observed counts of segments in each length range (for 1 population models)
--SegmentCountsPop1 (or -SP1) --SegmentCountsAcross (or -AAc) --SegmentCountsPop2 (or -SP2) -> files with observed counts of segments in each length range (for 2 population models)

A demographic model and the space of parameter values to be explored in a grid are required (note: additional models can be implemented in the code). The models are described in the references.

--DemographicModel (or -D) -> Expansion/DoubleExpansion/FounderExpansion/ExpansionFounderExpansion/SplitExpConstAsymMig/SplitExpConstSymMig
--Grid (or -G) -> file with grid specification

Additional parameters (optional or required by one of the above parameters)

--ChromosomeLength (or -C) -> length of the analyzed region (this option is required when using segment counts)
--Pairs (or -P) -> pairs of chromosomes analyzed (e.g. (2*n choose 2) for n diploid samples; this option is required when using segment counts)
--BinsPerCm (or -B) -> number of bins used per cM in calculations (default is 100)
--Verbose (or -V) -> will print details
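The value for --Pairs can be computed directly from the sample size. A minimal sketch, with a hypothetical helper name, that counts pairs of haploid chromosomes for n diploid samples:

```python
import math

def chromosome_pairs(n_diploid):
    """Number of distinct pairs among the 2*n haploid chromosomes of n diploid samples."""
    return math.comb(2 * n_diploid, 2)

print(chromosome_pairs(500))  # 499500
```

For 500 diploid samples (1000 haploid chromosomes) this gives 499500.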

Examples

one population, expansion, minimizing root mean square error using fraction of genome shared per length bin

java -jar Doris.jar --DemographicModel Expansion --Grid EXAMPLES/grid.10K.40.130K.txt --AverageSharing EXAMPLES/Exp.10K.40.130K.sharingDist.txt 

two populations, split, expansion and migration, minimizing root mean square error using fraction of genome shared per length bin - decrease grid granularity to speed up

java -jar Doris.jar --DemographicModel SplitExpConstAsymMig --Grid EXAMPLES/grid.TwoPop.55K.130K.0.0167.0.0334.txt --AverageSharingPop1 EXAMPLES/TwoPop.55K.130K.0.0167.0.0334.sharingDist.pop1.txt --AverageSharingPop2 EXAMPLES/TwoPop.55K.130K.0.0167.0.0334.sharingDist.pop2.txt --AverageSharingAcross EXAMPLES/TwoPop.55K.130K.0.0167.0.0334.sharingDist.across.txt

one population, contraction, maximum likelihood using segment counts per length bin

java -jar Doris.jar --DemographicModel Expansion --Grid EXAMPLES/grid.10K.40.4K.txt --ChromosomeLength 276.289 --Pairs 499500 --SegmentCounts EXAMPLES/Exp.10K.40.4K.counts.txt

File formats

Individuals are always assumed to be haploid. An inferred population size of 20,000 would correspond to a population of 10,000 diploid individuals. The same holds for the number of samples (also see the getHistograms.sh script on this).

The counts files have format

FromLength	ToLength	NumberOfSegments

One bin per line.

The average sharing files have format

FromLength	ToLength	AverageFractionOfGenomeShared

One bin per line.

To compute these histograms you can run the included script "getHistograms.sh" after modifying its parameters as indicated. If needed, change the "COL" field in the script to match the column where the length of each IBD segment is reported (e.g. this should be 11 for GERMLINE output http://www.cs.columbia.edu/~gusev/germline/). The default length intervals of 1 cM may be changed.
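For readers who prefer not to use the shell script, the binning can be sketched in Python. This is not a port of getHistograms.sh. It assumes segment lengths in cM have already been extracted into a list, that empty bins are simply omitted, and that the average fraction of genome shared in a bin is the total segment length in that bin divided by (pairs x region length). All function names are hypothetical.

```python
import math
from collections import defaultdict

def bin_segments(lengths_cm, bin_width=1.0):
    """Group segment lengths into bins [i*bin_width, (i+1)*bin_width)."""
    counts = defaultdict(int)
    total_length = defaultdict(float)
    for length in lengths_cm:
        index = math.floor(length / bin_width)
        counts[index] += 1
        total_length[index] += length
    return counts, total_length

def counts_lines(counts, bin_width=1.0):
    """Tab-separated lines: FromLength, ToLength, NumberOfSegments."""
    return [f"{i * bin_width:g}\t{(i + 1) * bin_width:g}\t{counts[i]}"
            for i in sorted(counts)]

def sharing_lines(total_length, pairs, region_cm, bin_width=1.0):
    """Tab-separated lines: FromLength, ToLength, AverageFractionOfGenomeShared.

    ASSUMPTION: fraction = total length in bin / (pairs * region length).
    """
    return [f"{i * bin_width:g}\t{(i + 1) * bin_width:g}\t"
            f"{total_length[i] / (pairs * region_cm):g}"
            for i in sorted(total_length)]

# Hypothetical segment lengths in cM.
counts, totals = bin_segments([1.2, 1.8, 3.5])
print("\n".join(counts_lines(counts)))
print("\n".join(sharing_lines(totals, pairs=10, region_cm=100.0)))
```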

The grid files have format

ModelParameter  fromValue   interval    toValue

One parameter per line. The demographic models are depicted in the references. The models have the following parameters (also see the error messages, if any):

Expansion   Current, Generation, Ancestral.
DoubleExpansion   Current, Generation1, Ancestral1, Generation2, Ancestral2.
FounderExpansion    Current, Generation, Ancestral1, Ancestral2.
ExpansionFounderExpansion   Current, Generation, Ancestral1, Ancestral2, Generation2, Ancestral3.
SplitExpConstAsymMig    pop1current, pop1ancestral, pop2current, pop2ancestral, generation, ancestraltot, m12, m21.
SplitExpConstSymMig pop1current, pop1ancestral, pop2current, pop2ancestral, generation, ancestraltot, m.
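To get a feel for how large a grid is before running it, the sketch below expands a grid specification for the Expansion model into explicit parameter combinations. The parameter names come from the list above, but the values are made up, and the parsing rules (whitespace-separated fields, no header line, toValue included) are assumptions rather than the behavior of the DoRIS parser.

```python
import itertools

def parse_grid(text):
    """Read 'ModelParameter fromValue interval toValue' lines into value lists."""
    grid = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name = fields[0]
        start, step, stop = (float(x) for x in fields[1:])
        if step == 0:
            grid[name] = [start]
            continue
        # ASSUMPTION: toValue is inclusive.
        n_steps = int(round((stop - start) / step))
        grid[name] = [start + i * step for i in range(n_steps + 1)]
    return grid

def grid_points(grid):
    """Yield every combination of parameter values as a dict."""
    names = list(grid)
    for combo in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, combo))

# Hypothetical grid for the Expansion model.
example = """Current 8000 1000 12000
Generation 30 10 50
Ancestral 100000 10000 150000
"""

points = list(grid_points(parse_grid(example)))
print(len(points), "combinations; first:", points[0])
```

The number of combinations is the product of the number of values per parameter, so widening an interval or dropping a parameter's range to a single value shrinks the search quickly.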

Dependencies

Uses the Apache Commons Math library (http://commons.apache.org/proper/commons-math/) to compute some probabilities.

Contact

ppalama AT hsph DOT harvard DOTAGAIN edu

References

This tool was developed for:

  • P. F. Palamara, T. Lencz, A. Darvasi, I. Pe'er. "Length distributions of identity by descent reveal fine-scale demographic history". The American Journal of Human Genetics, 2012.
  • P. F. Palamara, I. Pe'er. "Inference of historical migration rates via haplotype sharing". Bioinformatics, 2013.

About

Demographic reconstruction via IBD sharing
