masadler/PGxEHR

Pharmacogenetic analysis of EHRs

Welcome to the GitHub repository accompanying the manuscript "Leveraging large-scale biobank EHRs to enhance pharmacogenetics of cardiometabolic disease medications", which analyses the genetics of drug response for biomarker-medication pairs in the UK Biobank (UKBB).

Citation

If you use scripts from this repository, please consider citing the following preprint:

Sadler MC, Apostolov A, Cevallos C, Ribeiro DM, Altman RB, Kutalik Z. Leveraging large-scale biobank EHRs to enhance pharmacogenetics of cardiometabolic disease medications. medRxiv 2024.04.06.24305415

Usage

This repository contains the workflow pipeline that was used to extract medication records and biomarker measures from electronic health records (EHRs) of the UKBB, define pharmacogenetic (PGx) phenotypes, and run PGx-GWAS analyses. Code to run whole-exome burden tests on the UK Biobank DNAnexus Research Analysis Platform is also provided.

The workflow is organized as a Snakemake pipeline. This makes the analysis reproducible and allows computations to run in parallel, which is especially useful for the GWAS computations. Individual scripts can also be run without the Snakemake workflow manager by replacing the snakemake variables with hard-coded input and output paths.
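As a rough illustration of running a script outside the workflow manager, the sketch below uses the `snakemake` object when it is present (Snakemake injects it into Python scripts run through its `script:` directive) and otherwise falls back to hard-coded paths. The `resolve_paths` helper and all file paths are hypothetical and are not taken from this repository.

```python
from types import SimpleNamespace


def resolve_paths(namespace):
    """Return (input_path, output_path) from a `snakemake` object if one
    exists in `namespace`, otherwise from hard-coded defaults."""
    smk = namespace.get("snakemake")
    if smk is not None:
        # Inside a Snakemake `script:` rule: use the rule's first input/output.
        return smk.input[0], smk.output[0]
    # Standalone run: hypothetical hard-coded paths, adapt to your setup.
    return "data/gp_scripts.txt", "results/prescriptions.txt"


# Standalone usage: no `snakemake` object is defined here, so defaults apply.
input_path, output_path = resolve_paths(globals())
print(input_path, output_path)

# Simulated workflow usage with a stand-in for the injected object.
fake = SimpleNamespace(input=["in.txt"], output=["out.txt"])
workflow_paths = resolve_paths({"snakemake": fake})
print(workflow_paths)
```

Keeping the path lookup in one place means the rest of a script does not need to change between a workflow run and a standalone run.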

Input data

Individual-level data from the UK Biobank are required, including access to the primary care records. The following files are needed:

  • GP prescription records (datafield #42039, gp_scripts.txt)
  • GP clinical event records (datafield #42040, gp_clinical.txt)
  • The sheet read_v2_drugs_lkp of all_lkps_maps_v3.xlsx (downloadable from Resource 592), referred to herein as all_lkps_maps_v3_read_v2_drugs.txt
  • The sheet bnf_lkp of all_lkps_maps_v3.xlsx (downloadable from Resource 592), referred to herein as all_lkps_maps_v3_bnf_lkp.txt
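To give a feel for working with the prescription records, here is a minimal sketch that filters a tab-delimited prescription table by drug name. It assumes a tab-separated layout with `eid`, `issue_date` and `drug_name` columns; the sample rows are made up, and matching on drug names is a simplification (the repository's scripts and the lookup tables listed above may select medications differently, for example via Read v2 or BNF codes).

```python
import csv
import io

# Tiny in-memory stand-in for gp_scripts.txt.
# Column names are assumed and all values are invented for illustration.
SAMPLE = (
    "eid\tissue_date\tdrug_name\n"
    "1\t01/02/2010\tSimvastatin 40mg tablets\n"
    "1\t05/03/2010\tAtorvastatin 20mg tablets\n"
    "2\t10/04/2011\tMetformin 500mg tablets\n"
)


def extract_prescriptions(handle, keywords):
    """Yield rows whose drug name contains any keyword (case-insensitive)."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        name = (row.get("drug_name") or "").lower()
        if any(keyword in name for keyword in keywords):
            yield row


statin_rows = list(
    extract_prescriptions(io.StringIO(SAMPLE), ["simvastatin", "atorvastatin"])
)
statin_eids = sorted({row["eid"] for row in statin_rows})
print(len(statin_rows), statin_eids)
```

For the real file, `io.StringIO(SAMPLE)` would be replaced by an open file handle; streaming row by row avoids loading the full table into memory.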

Drug response phenotypes

Drug response phenotypes were derived by:

  1. extracting drug prescriptions of the broad medication class and medication of interest (e.g. lipid-lowering medications and statins)

  2. extracting biomarker measures from the EHRs and combining them with biomarker measures from the UKBB assessment visits

  3. combining these two datasets to extract baseline and post-treatment measures using temporal medication and biomarker information. To be included in a PGx cohort, individuals needed baseline and post-treatment measures within the defined time windows relative to medication start, and they had to pass several other QC steps, such as consistent drug prescriptions with no change in the drug regimen.
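The third step can be sketched as follows for a single individual. This is only an illustration under stated assumptions: the window lengths (`BASELINE_WINDOW`, `POST_MIN`, `POST_MAX`), the choice of the closest pre-treatment and earliest post-treatment measure, and the `baseline_and_post` helper are all hypothetical, and the regimen-consistency QC is not covered. The actual definitions are those in the manuscript and the repository scripts.

```python
from datetime import date, timedelta

# Hypothetical time windows relative to the first prescription date.
BASELINE_WINDOW = timedelta(days=365)  # baseline: up to 1 year before start
POST_MIN = timedelta(days=30)          # post-treatment: at least 30 days after
POST_MAX = timedelta(days=365)         # ... and at most 1 year after start


def baseline_and_post(first_rx, measures):
    """Pick (baseline, post_treatment) values from (date, value) pairs.

    Returns None if either window contains no measure, in which case the
    individual would be excluded from the cohort.
    """
    before = [m for m in measures if first_rx - BASELINE_WINDOW <= m[0] <= first_rx]
    after = [m for m in measures if first_rx + POST_MIN <= m[0] <= first_rx + POST_MAX]
    if not before or not after:
        return None
    baseline = max(before, key=lambda m: m[0])[1]  # latest measure before start
    post = min(after, key=lambda m: m[0])[1]       # earliest measure in post window
    return baseline, post


# Invented example: biomarker measures around a medication start on 2010-03-01.
measures = [
    (date(2009, 12, 1), 4.9),
    (date(2010, 2, 10), 5.1),
    (date(2010, 3, 15), 4.5),  # too soon after start, ignored
    (date(2010, 6, 1), 3.2),
    (date(2011, 1, 10), 3.0),
]
result = baseline_and_post(date(2010, 3, 1), measures)
print(result)
```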

Genetic analyses

Genome-wide association studies (GWAS) were run using regenie in a two-step procedure. In the first step, a whole-genome regression is fitted on genotyped SNPs that pass several QC filters (scripts/GWAS/QC_SNPs.sh). In the second step, genotyped/imputed SNPs are tested for association in a leave-one-chromosome-out (LOCO) scheme.
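The two-step structure can be sketched by assembling the two regenie command lines in Python. This is not the repository's actual invocation: the file names, block sizes and helper functions are hypothetical, and only commonly documented regenie options are used. The commands are printed rather than executed.

```python
import shlex


def regenie_step1(bed_prefix, snplist, pheno, covar, out):
    """Step 1: whole-genome regression on QC-passing genotyped SNPs."""
    return [
        "regenie", "--step", "1",
        "--bed", bed_prefix,      # PLINK bed/bim/fam prefix of genotyped SNPs
        "--extract", snplist,     # list of SNPs passing QC
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--bsize", "1000",        # hypothetical block size
        "--out", out,
    ]


def regenie_step2(bgen, sample, pheno, covar, step1_out, out):
    """Step 2: association testing using the LOCO predictions from step 1."""
    return [
        "regenie", "--step", "2",
        "--bgen", bgen,
        "--sample", sample,
        "--phenoFile", pheno,
        "--covarFile", covar,
        "--pred", f"{step1_out}_pred.list",  # prediction list written by step 1
        "--bsize", "400",                    # hypothetical block size
        "--out", out,
    ]


# All paths below are placeholders.
step1 = regenie_step1("geno/ukb_cal", "qc/snps_pass.snplist",
                      "pheno/pgx.txt", "pheno/covar.txt", "out/fit")
step2 = regenie_step2("imp/chr22.bgen", "imp/chr22.sample",
                      "pheno/pgx.txt", "pheno/covar.txt", "out/fit", "out/chr22")
print(shlex.join(step1))
print(shlex.join(step2))
```

Because step 2 only needs the prediction list from step 1, it can be launched once per chromosome, which is where a workflow manager's parallelism helps.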

Rare-variant burden tests on whole-exome sequencing data were performed on the UK Biobank DNAnexus Research Analysis Platform. The corresponding scripts are in the scripts/Exome folder.

Software requirements

This workflow has been tested with Snakemake v7.30.1, Python v3.9.13 and R v4.2.1. Versions of other software are listed in the Code availability section of the preprint.

License

MIT License

About

Workflow to analyze the genetics of drug response phenotypes derived from EHRs