
v0.1.0


@nebfield released this on 26 Jul at 14:44
Commit: 4901a20

This is the first release of our fork, which is integrated with pgsc_calc.

Our main aims for the fork are to improve functionality when processing data at scale (e.g. 500,000 genomes in UK Biobank) and to perform QC that ensures the variants are identical, and correctly oriented, between the reference dataset and the study population being projected.
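The kind of variant QC described above can be illustrated with a minimal sketch. It assumes each variant is a biallelic record identified by chromosome and position, with at most one variant per position, and that study genotypes are ALT-allele dosages (0/1/2). The names `Variant`, `match_variants` and `align_dosages` are hypothetical and are not the package's API. Strand flips (complementary alleles) are not handled here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the package's own type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def match_variants(reference: list[Variant], study: list[Variant]) -> list[tuple[int, int, bool]]:
    """Pair each reference variant with the study variant at the same position.

    Returns (ref_index, study_index, flipped) tuples in *reference* order, so
    the study data can be reordered to match the reference. flipped=True means
    REF/ALT are swapped in the study data. Assumes one variant per position.
    """
    study_by_pos = {(v.chrom, v.pos): i for i, v in enumerate(study)}
    matches = []
    for ref_idx, rv in enumerate(reference):
        study_idx = study_by_pos.get((rv.chrom, rv.pos))
        if study_idx is None:
            continue  # variant absent from the study dataset
        sv = study[study_idx]
        if (sv.ref, sv.alt) == (rv.ref, rv.alt):
            matches.append((ref_idx, study_idx, False))  # identical orientation
        elif (sv.ref, sv.alt) == (rv.alt, rv.ref):
            matches.append((ref_idx, study_idx, True))  # REF/ALT swapped
        # any other allele combination is dropped as a mismatch
    return matches


def align_dosages(dosages: list[float], flipped: bool) -> list[float]:
    """Recode 0/1/2 ALT-allele dosages into the reference orientation."""
    return [2 - d for d in dosages] if flipped else list(dosages)


# Usage: the study lists variants in a different order, has one REF/ALT
# swap (chr2:50) and one allele mismatch (chr1:200).
ref = [Variant("1", 100, "A", "G"), Variant("1", 200, "C", "T"), Variant("2", 50, "G", "A")]
study = [Variant("2", 50, "A", "G"), Variant("1", 100, "A", "G"), Variant("1", 200, "C", "G")]
print(match_variants(ref, study))  # [(0, 1, False), (2, 0, True)]
```

Returning matches in reference order means sort-order differences between the two datasets are resolved in the same pass as the allele checks.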

Improvements:

  • Variant QC: added checks and minor fixes for variant matching, allele orientation, and the sort order of reference and study variants, so that results are consistent between the reference and study datasets
  • Refactored the original scripts into a Python package
  • Added an end-to-end test with pytest
  • Support batch-processing of study samples without splitting the original dataset into multiple files (useful for parallelising large datasets)
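One way to batch-process samples without splitting the original dataset is to give each parallel job a contiguous range of sample indices and have it read only those samples. The sketch below shows that partitioning step only. It is an assumption about how batching could work, not the package's implementation, and `sample_batches` and `select_batch` are hypothetical names.

```python
def sample_batches(n_samples: int, n_batches: int) -> list[range]:
    """Split sample indices 0..n_samples-1 into n_batches contiguous ranges.

    Batch sizes differ by at most one, and every sample lands in exactly one
    batch, so each range can go to a separate job that reads only those
    samples from the original, unsplit dataset.
    """
    if n_batches < 1:
        raise ValueError("n_batches must be >= 1")
    base, extra = divmod(n_samples, n_batches)
    batches = []
    start = 0
    for b in range(n_batches):
        size = base + (1 if b < extra else 0)  # first `extra` batches get one more
        batches.append(range(start, start + size))
        start += size
    return batches


def select_batch(sample_ids: list[str], batch_index: int, n_batches: int) -> list[str]:
    """Return the sample IDs that one (hypothetical) parallel job would project."""
    batch = sample_batches(len(sample_ids), n_batches)[batch_index]
    return [sample_ids[i] for i in batch]


# Usage
print(sample_batches(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
ids = [f"s{i}" for i in range(10)]
print(select_batch(ids, 1, 3))  # ['s4', 's5', 's6']
```

Because the batches are computed deterministically from the sample count, every job can derive its own slice independently, with no intermediate split files.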

Fixes:

  • Output PCs now have consistent precision
  • Deduplicate outputs when projecting study samples after the PCA space has been derived