SampleAncestry documentation

The ancestry estimation is based on correlating the sample variants with population-specific SNPs.
For each population (AFR, EUR, SAS, EAS) the 1000 most informative exonic SNPs were selected for this purpose.
A benchmark on the 1000 Genomes variant data assigned 99.81% of the samples to the correct population (2153 of 2157).
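The correlation step can be illustrated with a small sketch. It assumes, purely for illustration, that each sample genotype is encoded as an alternate-allele count (0, 1 or 2) and that each population is described by per-SNP allele frequencies; the Pearson correlation between the two vectors then serves as the population score. The function names, the encoding and the choice of Pearson correlation are assumptions, not the documented SampleAncestry implementation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def ancestry_scores(genotypes, pop_freqs):
    """Score a sample against each population (hypothetical helper).

    genotypes: {snp_id: alternate-allele count (0, 1 or 2)}
    pop_freqs: {population: {snp_id: allele frequency in that population}}
    """
    scores = {}
    for pop, freqs in pop_freqs.items():
        # Only SNPs that were genotyped in the sample can contribute.
        shared = [snp for snp in freqs if snp in genotypes]
        scores[pop] = pearson(
            [genotypes[snp] for snp in shared],
            [freqs[snp] for snp in shared],
        )
    return scores
```

In this sketch, the population with the highest score is the one whose allele frequencies best track the sample genotypes.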

Because the populations differ in how similar they are to one another, the expected scores depend on the ancestry of the sample of interest.
This plot shows the score distribution on the 1000 Genomes data:

[Figure: sample ancestry score distribution]

This table shows the median and median absolute deviation (MAD) of each score, determined from the 1000 Genomes data and used internally to assign a population:

| population | AFR median / MAD | EUR median / MAD | SAS median / MAD | EAS median / MAD |
|------------|------------------|------------------|------------------|------------------|
| AFR | 0.5002 / 0.0291 | 0.0553 / 0.0280 | 0.1061 / 0.0267 | 0.0895 / 0.0274 |
| EUR | 0.0727 / 0.0271 | 0.3251 / 0.0252 | 0.1922 / 0.0249 | 0.0603 / 0.0264 |
| SAS | 0.0698 / 0.0264 | 0.1574 / 0.0295 | 0.3395 / 0.0291 | 0.1693 / 0.0288 |
| EAS | 0.08415 / 0.0275 | 0.06725 / 0.0269 | 0.21495 / 0.0228 | 0.47035 / 0.0242 |
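
One way such reference values could be used is sketched below. The sketch assumes that a sample is assigned to the population whose median score profile is closest once each difference is scaled by the corresponding MAD. The distance measure and the name `assign_population` are illustrative assumptions; the rule that SampleAncestry actually applies is not described here.

```python
# (median, MAD) per score, copied from the table above.
# Outer key: reference population; inner key: score column.
REFERENCE = {
    "AFR": {"AFR": (0.5002, 0.0291), "EUR": (0.0553, 0.0280),
            "SAS": (0.1061, 0.0267), "EAS": (0.0895, 0.0274)},
    "EUR": {"AFR": (0.0727, 0.0271), "EUR": (0.3251, 0.0252),
            "SAS": (0.1922, 0.0249), "EAS": (0.0603, 0.0264)},
    "SAS": {"AFR": (0.0698, 0.0264), "EUR": (0.1574, 0.0295),
            "SAS": (0.3395, 0.0291), "EAS": (0.1693, 0.0288)},
    "EAS": {"AFR": (0.08415, 0.0275), "EUR": (0.06725, 0.0269),
            "SAS": (0.21495, 0.0228), "EAS": (0.47035, 0.0242)},
}


def assign_population(scores):
    """Return (best population, distances) for a sample (hypothetical rule).

    scores: {"AFR": float, "EUR": float, "SAS": float, "EAS": float}
    The distance to a reference population is the sum of squared
    MAD-scaled deviations from that population's median scores.
    """
    distances = {}
    for pop, profile in REFERENCE.items():
        distances[pop] = sum(
            ((scores[col] - median) / mad) ** 2
            for col, (median, mad) in profile.items()
        )
    best = min(distances, key=distances.get)
    return best, distances
```

For example, `assign_population({"AFR": 0.07, "EUR": 0.33, "SAS": 0.19, "EAS": 0.06})` picks `"EUR"` under this rule, because those scores lie within a fraction of one MAD of every EUR median.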

Help and ChangeLog

The SampleAncestry command-line help and changelog can be found here.

back to ngs-bits