Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Code to do data processing and create figures for "The geography of recent genetic ancestry across Europe", http://arxiv.org/abs/1207.3815
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
IBS_vs_dist.R
LICENSE
PCA_vs_IBD.R
README
anonymize.R
beagle-false-positives.R switched GPL to CC0 to be able to upload to dryad
beagle-true-positives.R
blocks-eda.R
clean-sample-info.R
dbgap_In_Europe.out
dbgap_samples.R
endpoints-and-overlap.R
example-Europeans.out
example-POPRES_chr22.beagle.gz
example-europeans.imiss
example-kin.genome.gz
find_close_rels_IBD.R
fit-error-model.R
genetic.maps.R
hapmap-phased-to-beagle.py
ibd-blocklens.csv
ibd-blocks-fns.R
ibd-pca.R
ibd-pop-info.csv
insert-ibd-segments.py
inversion-writeup-plots.R
laplace-inversion-fns.R
make-hapmap-plus-popres.sh
make-lentab.R
makefile
mixup-by-region.sh
mixup-genomes.py
parse-sims-fns.R
plus-process-fibd.py
recode-hapmap-legend-for-popres.R
run-beagle.sh
sensitivity.R
sfig-8-sharing-rates.svg
writeup-plots.R

README

This is the collection of codes used in analysis of european POPRES data
(see Nelson et al, http://www.ncbi.nlm.nih.gov/pubmed/18760391; 
obtain access to data at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000145.v1.p1 .)
for the paper "The geography of recent genetic ancestry across Europe", 
by Peter Ralph and Graham Coop, 
published May 7, 2013, in PLoS Biology, at
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001555

To see what makes what, see the makefile.

Most of the files are pretty clean, but a few have some clutter.

None will actually do anything without obtaining the POPRES data and running BEAGLE http://faculty.washington.edu/browning/beagle/beagle.html on them.

Please use these if they are useful, and credit us if appropriate.


--------------

If you are looking to use the inference method, look for the functions sinv() and squoosh().

Also, linv() gives the least-squares approximation, useful for finding a good starting point.

For instance, see the block in inversion-writeup-plots.R that sets things up to infer histories for all the pairs of populations:

# Set-up: the operator L and the false positive rate estimates:
gens <- cumsum( rep(1:36, each=10) )
L <- error.disc.trans( lenbins=lenbins, gens=gens, chrlens=.chrlens[setdiff(seq_along(.chrlens),omitchroms)] )) # this takes a while!! do it once and save.
fp <- disc.fp(lenbins, chrlens=.chrlens)
fp <- runmed(fp,k=15)
# Set-up for intial least-squares estimate:
S <- as.vector( smooth( hist( blocks$maplen, breaks=c(lenbins,100), plot=FALSE )$counts ) )
S <- S/mean(S)
S[lenbins>20] <- exp( predict( lm( log(S) ~ lenbins + I(lenbins^2), subset=(lenbins>20) & (S>0) ), newdata=list(lenbins=lenbins[lenbins>20]) ) )
est.Sigma <- function ( np, X, alpha=100 ) {
    # estimate Sigma by combining the population average with the smoothed, observed data
    # where Sigma is the covariance matrix of X/np
    smX <- as.vector( smooth(X) )
    ppp <- (alpha*np/(2543640+alpha*np))  # 2543640 = total # pairs
    Sigma <- ppp*smX + (1-ppp)*sum(smX)*S/sum(S)
    return( Sigma/np^2 )
}
Ksvd <- svd( 1/sqrt(as.vector(S)) * L )  # pre-compute SVD of L
total.npairs <- choose( length(unique(blocks$id1)), 2 )
linverse <- linv( X=lendist/total.npairs, S=as.vector(S), Sigma=est.Sigma(np=total.npairs,X=lendist), Ksvd=Ksvd, maxk=30, fp=fp )
sminverse <- smoothed.nonneg( lS=linverse, Ksvd=Ksvd, bestk=15, lam=1 )

# Here is where we get the actual estimate:
ans <- sinv( lendist, sminverse, L, fp, npairs=total.npairs, lam=0, gamma=1e6 )

# squoosh iterates sinv, looking for the smoothest solution that drops no more than 2 units of log-likelihood
smooth = squoosh( lendist, xinit=ans, L, fp, npairs=ans$npairs, minlen=minlen, gamma=100, lam=0, fitfn="loglik", relthresh=2/ans$npairs, twid=.05 )


--------------

Notes:

* The .gmap files provide a mapping from the endpoint indices that BEAGLE uses to genetic distances (e.g. in centiMorgans).  Important: BEAGLE uses 0-based indices.

* BEAGLE 4.0 is out, and improved over the version we used (3.3).
Something went wrong with that request. Please try again.