Code to do data processing and create figures for "The geography of recent genetic ancestry across Europe", http://arxiv.org/abs/1207.3815
R Python Shell
Switch branches/tags
Nothing to show
Permalink
Failed to load latest commit information.
IBS_vs_dist.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
LICENSE switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
PCA_vs_IBD.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
README some clues for actually finding the relevant code... May 26, 2014
anonymize.R added everything Feb 20, 2013
beagle-false-positives.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
beagle-true-positives.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
blocks-eda.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
clean-sample-info.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
dbgap_In_Europe.out added everything Feb 20, 2013
dbgap_samples.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
endpoints-and-overlap.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
example-Europeans.out added missing example input files Aug 22, 2013
example-POPRES_chr22.beagle.gz added missing example input files Aug 22, 2013
example-europeans.imiss added missing example input files Aug 22, 2013
example-kin.genome.gz added missing example input files Aug 22, 2013
find_close_rels_IBD.R added everything Feb 20, 2013
fit-error-model.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
genetic.maps.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
hapmap-phased-to-beagle.py switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
ibd-blocklens.csv added everything Feb 20, 2013
ibd-blocks-fns.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
ibd-pca.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
ibd-pop-info.csv added everything Feb 20, 2013
insert-ibd-segments.py switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
inversion-writeup-plots.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
laplace-inversion-fns.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
make-hapmap-plus-popres.sh switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
make-lentab.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
makefile added missing example input files Aug 22, 2013
mixup-by-region.sh switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
mixup-genomes.py switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
parse-sims-fns.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
plus-process-fibd.py added everything Feb 20, 2013
recode-hapmap-legend-for-popres.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
run-beagle.sh switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
sensitivity.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013
sfig-8-sharing-rates.svg added everything Feb 20, 2013
writeup-plots.R switched GPL to CC0 to be able to upload to dryad Aug 20, 2013

README

This is the collection of codes used in analysis of european POPRES data
(see Nelson et al, http://www.ncbi.nlm.nih.gov/pubmed/18760391; 
obtain access to data at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000145.v1.p1 .)
for the paper "The geography of recent genetic ancestry across Europe", 
by Peter Ralph and Graham Coop, 
published May 7, 2013, in PLoS Biology, at
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001555

To see what makes what, see the makefile.

Most of the files are pretty clean, but a few have some clutter.

None will actually do anything without obtaining the POPRES data and running BEAGLE http://faculty.washington.edu/browning/beagle/beagle.html on them.

Please use these if they are useful, and credit us if appropriate.


--------------

If you are looking to use the inference method, look for the functions sinv() and squoosh().

Also, linv() gives the least-squares approximation, useful for finding a good starting point.

For instance, see the block in inversion-writeup-plots.R that sets things up to infer histories for all the pairs of populations:

# Set-up: the operator L and the false positive rate estimates:
gens <- cumsum( rep(1:36, each=10) )
L <- error.disc.trans( lenbins=lenbins, gens=gens, chrlens=.chrlens[setdiff(seq_along(.chrlens),omitchroms)] )) # this takes a while!! do it once and save.
fp <- disc.fp(lenbins, chrlens=.chrlens)
fp <- runmed(fp,k=15)
# Set-up for intial least-squares estimate:
S <- as.vector( smooth( hist( blocks$maplen, breaks=c(lenbins,100), plot=FALSE )$counts ) )
S <- S/mean(S)
S[lenbins>20] <- exp( predict( lm( log(S) ~ lenbins + I(lenbins^2), subset=(lenbins>20) & (S>0) ), newdata=list(lenbins=lenbins[lenbins>20]) ) )
est.Sigma <- function ( np, X, alpha=100 ) {
    # estimate Sigma by combining the population average with the smoothed, observed data
    # where Sigma is the covariance matrix of X/np
    smX <- as.vector( smooth(X) )
    ppp <- (alpha*np/(2543640+alpha*np))  # 2543640 = total # pairs
    Sigma <- ppp*smX + (1-ppp)*sum(smX)*S/sum(S)
    return( Sigma/np^2 )
}
Ksvd <- svd( 1/sqrt(as.vector(S)) * L )  # pre-compute SVD of L
total.npairs <- choose( length(unique(blocks$id1)), 2 )
linverse <- linv( X=lendist/total.npairs, S=as.vector(S), Sigma=est.Sigma(np=total.npairs,X=lendist), Ksvd=Ksvd, maxk=30, fp=fp )
sminverse <- smoothed.nonneg( lS=linverse, Ksvd=Ksvd, bestk=15, lam=1 )

# Here is where we get the actual estimate:
ans <- sinv( lendist, sminverse, L, fp, npairs=total.npairs, lam=0, gamma=1e6 )

# squoosh iterates sinv, looking for the smoothest solution that drops no more than 2 units of log-likelihood
smooth = squoosh( lendist, xinit=ans, L, fp, npairs=ans$npairs, minlen=minlen, gamma=100, lam=0, fitfn="loglik", relthresh=2/ans$npairs, twid=.05 )


--------------

Notes:

* The .gmap files provide a mapping from the endpoint indices that BEAGLE uses to genetic distances (e.g. in centiMorgans).  Important: BEAGLE uses 0-based indices.

* BEAGLE 4.0 is out, and improved over the version we used (3.3).