mepittman/GCOD

GCOD

Gene Combinations in Oligogenic Disease (GCOD) is an efficient, simulation-based method to detect gene sets that carry co-occurring damaging variants in disease probands at a higher rate than expected given parental genotypes. GCOD can be viewed as an extension of the Transmission Disequilibrium Test (TDT) in which the unit being tested is an observed co-occurrence of variants in a group of genes, rather than observed transmissions at a single genomic location or within a single gene. GCOD also differs from the TDT in its use of simulations to assess significance rather than a chi-squared statistic, enabling the detection of rare gene combinations with very few expected observations.
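The idea of replacing a chi-squared statistic with simulations can be illustrated with a minimal sketch. This is not GCOD's implementation: the function name, the input format (one dict per family mapping each gene to the number of heterozygous parental damaging variants), and the assumption that each parental variant is transmitted independently with probability 0.5 are all hypothetical simplifications chosen for illustration.

```python
import random


def simulated_p_value(families, gene_set, observed, n_sim=10_000, seed=0):
    """Empirical p-value for seeing >= `observed` probands with a damaging
    variant in every gene of `gene_set`, given parental genotypes.

    Hypothetical input format (an assumption, not GCOD's):
    families -- list of dicts, gene -> number of heterozygous parental variants
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        count = 0
        for parental_counts in families:
            # A simulated proband "carries" a gene if at least one parental
            # variant in that gene is transmitted (probability 0.5 each).
            carries_all = all(
                any(rng.random() < 0.5 for _ in range(parental_counts.get(gene, 0)))
                for gene in gene_set
            )
            count += carries_all
        if count >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)
```

Because the null distribution is built by simulation, the estimate stays usable when only a handful of co-occurrences are expected, which is the situation where a chi-squared approximation tends to break down.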

GCOD takes as input a tab-delimited file of gene variants with proband and parental genotypes. It returns a dataframe of candidate oligogenic combinations, each with a list of carrier IDs and the simulation-based significance of the gene set's co-occurrence within the dataset. Simulations are performed for all digenic pairs for which at least two probands carry variant combinations not seen in a parent, as well as for higher-order gene sets, which are limited to the greatest set of common genes among distinct family combinations ("highest order"). By default, GCOD runs these tests on the full list of variants provided; it can also restrict the analysis to a user-supplied subset of genes.
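The digenic filtering step can be sketched as follows. This is an illustrative reading, not GCOD's code: the `trios` structure and function name are hypothetical, and "not seen in a parent" is interpreted here as "no single parent carries damaging variants in both genes", which is an assumption about the method.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(trios, min_carriers=2):
    """Return gene pairs carried by at least `min_carriers` probands in whom
    no single parent carries both genes.

    Hypothetical input format (an assumption, not GCOD's):
    trios -- dict: proband ID -> {"proband": set of genes,
                                   "parents": [set of genes, set of genes]}
    """
    carriers = defaultdict(list)
    for proband_id, trio in trios.items():
        for pair in combinations(sorted(trio["proband"]), 2):
            # Skip the pair if either parent already carries both genes.
            seen_in_parent = any(set(pair) <= parent for parent in trio["parents"])
            if not seen_in_parent:
                carriers[pair].append(proband_id)
    return {pair: ids for pair, ids in carriers.items() if len(ids) >= min_carriers}
```

Each surviving pair, together with its carrier IDs, would then be a candidate for the simulation-based significance test.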

Getting Started

Dependencies:

The necessary packages can be installed in a conda environment with the following commands:

$ conda env create --prefix gcod --file environment.yml
$ conda activate ./gcod

Example:

GCOD_walkthrough.ipynb demonstrates different use cases for GCOD analysis (pairs only, highest order, and gene subset analyses). This notebook also provides guidance for input file formatting.

GCOD can be launched via the script run_GCOD.py, which takes the input and output file paths, the gene set order (either pairs or highest), and, optionally, the path to a file listing the genes to which the analysis should be restricted:

python run_GCOD.py <input file path> <output file path> <order mode> [<path to gene list file>]

The example can be run on the command line as follows:

python run_GCOD.py ./data/input_variants.tsv ./output/example_results.tsv highest
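For batch runs, the same command can be assembled from Python. The sketch below only builds and launches the command line described above; the helper names `build_gcod_command` and `run_gcod` are hypothetical and are not part of GCOD.

```python
import subprocess

# Order modes named in this README.
VALID_ORDERS = {"pairs", "highest"}


def build_gcod_command(input_path, output_path, order, gene_list_path=None):
    """Assemble the run_GCOD.py command line as a list of arguments."""
    if order not in VALID_ORDERS:
        raise ValueError(f"order must be one of {sorted(VALID_ORDERS)}, got {order!r}")
    command = ["python", "run_GCOD.py", input_path, output_path, order]
    # The gene list is optional; append it only when provided.
    if gene_list_path is not None:
        command.append(gene_list_path)
    return command


def run_gcod(input_path, output_path, order, gene_list_path=None):
    """Run GCOD as a subprocess and raise if it exits with an error."""
    command = build_gcod_command(input_path, output_path, order, gene_list_path)
    subprocess.run(command, check=True)
```

Calling `run_gcod("./data/input_variants.tsv", "./output/example_results.tsv", "highest")` from the repository root would be equivalent to the command-line example above.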
