Pharmacogenetic analysis of EHRs

Welcome to the GitHub repository accompanying the manuscript "Leveraging large-scale biobank EHRs to enhance pharmacogenetics of cardiometabolic disease medications", which analyses the genetics of drug response for biomarker-medication pairs in the UK Biobank (UKBB).

Citation

If you use scripts from this repository, please consider citing the following preprint:

Sadler MC, Apostolov A, Cevallos C, Ribeiro DM, Altman RB, Kutalik Z. Leveraging large-scale biobank EHRs to enhance pharmacogenetics of cardiometabolic disease medications. medRxiv 2024.04.06.24305415

Usage

This repository contains the workflow pipeline used to extract medication records and biomarker measures from electronic health records (EHRs) of the UKBB, define pharmacogenetic (PGx) phenotypes, and run PGx-GWAS analyses. Code to run whole-exome burden tests on the UK Biobank DNAnexus Research Analysis Platform is also provided.

The workflow is organized as a snakemake pipeline, which makes the analysis reproducible and allows computations to run in parallel; this is especially useful for the GWAS computations. Individual scripts can also be run without the snakemake workflow manager by replacing the snakemake variables with hard-coded input and output paths.
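As an illustration of running a script outside the workflow manager, the sketch below falls back to hard-coded paths when no `snakemake` object is present. Snakemake injects an object named `snakemake` into Python scripts launched through its `script:` directive; the helper name `resolve_io` and the two fallback paths are hypothetical and not taken from this repository.

```python
def resolve_io(scope, fallback_input, fallback_output):
    """Return (input_path, output_path) for a script.

    `scope` is the script's global namespace, e.g. globals(). When the
    script is launched by snakemake, it contains a `snakemake` object
    exposing `.input` and `.output`; otherwise the fallbacks are used.
    """
    smk = scope.get("snakemake")
    if smk is None:
        # Stand-alone run: use the hard-coded paths.
        return fallback_input, fallback_output
    # Workflow run: take the first declared input and output file.
    return str(smk.input[0]), str(smk.output[0])


# Hypothetical paths, for illustration only.
input_path, output_path = resolve_io(
    globals(), "data/gp_scripts.txt", "output/prescriptions.txt"
)
```

With this pattern the same script body can be used in both settings, and only the two fallback paths need editing for a stand-alone run.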

Input data

Individual-level data from the UK Biobank are required, including access to the primary care records. The following files are needed:

GP prescription records (datafield #42039, gp_scripts.txt)
GP clinical event records (datafield #42040, gp_clinical.txt)
Sheet read_v2_drugs_lkp of all_lkps_maps_v3.xlsx (downloadable from Resource 592), referred to here as all_lkps_maps_v3_read_v2_drugs.txt
Sheet bnf_lkp of all_lkps_maps_v3.xlsx (downloadable from Resource 592), referred to here as all_lkps_maps_v3_bnf_lkp.txt
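Before running the workflow, it can help to check that an input file has the columns a script expects. The sketch below does this with the Python standard library only. It assumes the files are tab-delimited with a header row; the helper name `read_tab_delimited` and the column names in the toy example are assumptions and should be checked against your own UKBB download.

```python
import csv
import io


def read_tab_delimited(handle, required_columns):
    """Validate the header of a tab-delimited file and return a row iterator.

    Rows are yielded lazily as dictionaries, so large files such as the
    prescription records are never loaded into memory at once.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in required_columns if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return reader


# Toy stand-in for a prescription file; column names and values are made up.
toy = io.StringIO(
    "eid\tissue_date\tdrug_name\n"
    "1000001\t2010-06-01\texample statin 40mg tablets\n"
)
rows = list(read_tab_delimited(toy, ("eid", "issue_date", "drug_name")))
```

For a real file, the same function could be called on `open("gp_scripts.txt", newline="")` in place of the in-memory toy handle.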

Drug response phenotypes

Drug response phenotypes were derived by:

extracting drug prescriptions of the broad medication class and medication of interest (e.g. lipid-lowering medications and statins)
extracting biomarker measures from the EHRs and combining them with biomarker measures from the UKBB assessment visits
combining these two datasets to extract baseline and post-treatment measures using the timing of medication and biomarker records. Beyond having baseline and post-treatment measures within the defined time windows relative to medication start, individuals had to pass several further QC steps to be included in a PGx cohort, such as consistent drug prescriptions with no change in the drug regimen.
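The third step above can be sketched for a single individual as follows. This is a minimal illustration, not the pipeline's implementation: the function name, the window lengths, and the rule for choosing among several eligible measurements (latest before start, earliest after start) are all assumptions, and the further QC steps are not modelled.

```python
from datetime import date, timedelta


def baseline_and_post(start, measures, pre_days=365,
                      post_min_days=30, post_max_days=365):
    """Pick one baseline and one post-treatment biomarker value.

    start    : date of the first prescription of the medication
    measures : list of (date, value) biomarker measurements
    Returns (baseline_value, post_value), or None if either is missing.
    The window parameters are hypothetical defaults.
    """
    # Baseline candidates: measured on or before the start date, within the window.
    pre = [(d, v) for d, v in measures
           if timedelta(0) <= start - d <= timedelta(days=pre_days)]
    # Post-treatment candidates: measured inside the follow-up window.
    post = [(d, v) for d, v in measures
            if timedelta(days=post_min_days) <= d - start
            <= timedelta(days=post_max_days)]
    if not pre or not post:
        return None
    baseline = max(pre, key=lambda m: m[0])    # latest measure before start
    follow_up = min(post, key=lambda m: m[0])  # earliest measure in the window
    return baseline[1], follow_up[1]


# Toy example with made-up dates and values.
toy_measures = [
    (date(2010, 1, 10), 4.9),
    (date(2010, 5, 1), 4.5),
    (date(2010, 6, 15), 3.8),  # too soon after start for the assumed window
    (date(2010, 9, 1), 2.9),
]
pair = baseline_and_post(date(2010, 6, 1), toy_measures)
```

Individuals for whom the function returns None would be excluded from the cohort in this sketch.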

Genetic analyses

Genome-wide association analyses (GWAS) were run using regenie in a two-step procedure. The first step fits a whole-genome regression on genotyped SNPs passing several QC filters (scripts/GWAS/QC_SNPs.sh). The second step tests genotyped/imputed SNPs in a leave-one-chromosome-out (LOCO) scheme.
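To make the two-step structure concrete, the sketch below assembles the two regenie command lines as argument lists without executing them. It is not the invocation used in this repository: all file names and block sizes are hypothetical, and `--qt` assumes a quantitative phenotype. The flags themselves are standard regenie options, and step 1 writes a `<prefix>_pred.list` file that step 2 reads through `--pred`.

```python
def regenie_step1_cmd(bed_prefix, snp_list, pheno, covar, out_prefix, bsize=1000):
    """Step 1: whole-genome regression on QC-passing genotyped SNPs."""
    return [
        "regenie", "--step", "1",
        "--bed", bed_prefix,       # PLINK bed/bim/fam prefix of genotyped SNPs
        "--extract", snp_list,     # SNP IDs that passed QC
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--qt",                    # assumption: quantitative phenotype
        "--bsize", str(bsize),
        "--out", out_prefix,
    ]


def regenie_step2_cmd(bgen, sample, pheno, covar, step1_prefix, out_prefix, bsize=400):
    """Step 2: single-variant tests using the LOCO predictions from step 1."""
    return [
        "regenie", "--step", "2",
        "--bgen", bgen,            # genotyped/imputed variants to test
        "--sample", sample,
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--pred", f"{step1_prefix}_pred.list",  # written by step 1
        "--qt",
        "--bsize", str(bsize),
        "--out", out_prefix,
    ]


# Hypothetical file names, for illustration only.
step1 = regenie_step1_cmd("geno/ukb_cal", "qc/snps_pass.txt",
                          "pheno/pgx.txt", "pheno/covar.txt", "out/step1")
step2 = regenie_step2_cmd("imp/ukb_chr1.bgen", "imp/ukb.sample",
                          "pheno/pgx.txt", "pheno/covar.txt",
                          "out/step1", "out/chr1")
```

Each list could be passed to `subprocess.run(cmd, check=True)`; step 2 can then be launched once per chromosome, which is where parallel execution pays off.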

Rare variant burden tests on whole-exome sequencing data were performed on the UK Biobank DNAnexus Research Analysis Platform. The corresponding scripts are in the scripts/Exome folder.

Software requirements

This workflow has been tested with snakemake v7.30.1, python v3.9.13 and R v4.2.1. Versions of other software are listed in the Code availability section of the preprint.

License

MIT License
