pygscatalog

This repository contains Python applications and libraries for working with polygenic scores (PGS 🧬) and the PGS Catalog, an open database of polygenic scores and the relevant metadata required for accurate application and evaluation. It is based on a previous codebase of utilities (pgscatalog_utils) that has been converted to namespace packages for modularity and re-use.

User applications

These CLI applications are used internally by the PGS Catalog Calculator (pgsc_calc) workflow for calculating PGS and performing common adjustments for genetic ancestry.

If you want to calculate PGS automatically, including genetic ancestry similarity estimation and PGS normalisation, the pgsc_calc workflow is the easiest option.

| Application | Description | Install | Link |
| --- | --- | --- | --- |
| pgscatalog-download | Download scoring files from the PGS Catalog in specific genome builds | `pipx install pgscatalog-core` | README |
| pgscatalog-combine | Combine multiple scoring files into a consistent structure | `pipx install pgscatalog-core` | README |
| pgscatalog-relabel | Relabel values in a column based on values in a column in another file | `pipx install pgscatalog-core` | README |
| pgscatalog-match | Match structured scoring file to variants in target genomes | `pipx install pgscatalog-match` | README |
| pgscatalog-matchmerge | Merge variant match results, useful on larger datasets | `pipx install pgscatalog-match` | README |
| pgscatalog-aggregate | Aggregate calculated PGS split across multiple files | `pipx install pgscatalog-calc` | README |
| pgscatalog-ancestry-adjust | Adjust calculated PGS in the context of genetic ancestry | `pipx install pgscatalog-calc` | README |
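
Each application is provided by one of three packages, so it can help to check which ones are already available before installing. The sketch below is not part of pygscatalog: it is a hypothetical helper that copies the application and package names from the table above, looks for each executable on `PATH`, and prints the `pipx install` command for any package whose applications are missing. It assumes that an application counts as installed if its executable can be found on `PATH`.

```python
import shutil

# Application -> package that provides it, copied from the table above.
APP_PACKAGES = {
    "pgscatalog-download": "pgscatalog-core",
    "pgscatalog-combine": "pgscatalog-core",
    "pgscatalog-relabel": "pgscatalog-core",
    "pgscatalog-match": "pgscatalog-match",
    "pgscatalog-matchmerge": "pgscatalog-match",
    "pgscatalog-aggregate": "pgscatalog-calc",
    "pgscatalog-ancestry-adjust": "pgscatalog-calc",
}


def missing_install_commands(which=shutil.which):
    """Return pipx commands for packages whose applications are not on PATH.

    `which` is injectable so the lookup can be replaced in tests
    (hypothetical helper, not part of pygscatalog).
    """
    missing_packages = []
    for app, package in APP_PACKAGES.items():
        # Record each package once, in the order it first appears.
        if which(app) is None and package not in missing_packages:
            missing_packages.append(package)
    return [f"pipx install {package}" for package in missing_packages]


if __name__ == "__main__":
    for command in missing_install_commands():
        print(command)
```

If every application is already on `PATH`, the script prints nothing.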

Developer libraries

If you write Python code to work with PGS, the underlying libraries for the apps are documented and available for re-use:

| Library | Description | Link |
| --- | --- | --- |
| pgscatalog-core | Core classes and functions to work with PGS data | API reference |
| pgscatalog-match | Variant matching across scoring files and target genomes | API reference |
| pgscatalog-calc | Genetic ancestry similarity estimation and normalisation | API reference |
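
Before importing the libraries in your own code, you may want to confirm that they are installed in the current Python environment. The following is a minimal sketch using only the standard library's `importlib.metadata`. The function `installed_versions` is a hypothetical helper, not part of pygscatalog, and it assumes the distribution names match the library names listed above. Packages installed with `pipx` live in their own isolated environments, so they will normally not be visible to this check.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the library table above.
LIBRARIES = ("pgscatalog-core", "pgscatalog-match", "pgscatalog-calc")


def installed_versions(names=LIBRARIES, lookup=version):
    """Map each distribution name to its installed version, or None if absent.

    `lookup` is injectable so the metadata query can be replaced in tests
    (hypothetical helper, not part of pygscatalog).
    """
    found = {}
    for name in names:
        try:
            found[name] = lookup(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions().items():
        print(f"{name}: {installed or 'not installed'}")
```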

Documentation

Full documentation for the applications and libraries is available at https://pygscatalog.readthedocs.io/.

Credits & Licence

pygscatalog (aka pgscatalog_utils) is developed as part of the PGS Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).

This package contains code libraries and apps for working with PGS Catalog data and calculating PGS within the PGS Catalog Calculator (pgsc_calc) workflow. It is based on an earlier codebase (pgscatalog_utils) with contributions and input from members of the PGS Catalog team (Samuel Lambert, Benjamin Wingfield, Aoife McMahon, Laurent Gil) and the Inouye lab (Rodrigo Canovas, Scott Ritchie, Jingqin Wu).

A manuscript describing this package and the pgsc_calc pipeline is in preparation. In the meantime, if you use the tool, we ask you to cite the repo and the paper describing the PGS Catalog resource:

All of our code is open source and permissively licensed under the Apache License 2.0.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.