monarch-initiative/phenoCompare

phenoCompare

Phenotype Compare

We start with a set of case records annotated with HPO (Human Phenotype Ontology) terms, and a list of GPI pathway genes, each labeled early or late according to which part of the pathway the gene affects. Patients are separated into corresponding early and late groups according to which gene is responsible for disease in each case.

For each HPO term appearing in any patient record, the software counts the number of early and late patients annotated with that term. These counts are propagated upward in the HPO hierarchy, so that a patient annotated with term T is also included in the count for every term that subsumes T (that is, every ancestor of T in the ontology). The analysis therefore covers not only the terms explicitly referenced in case records, but also their supertypes, which are referenced implicitly through the structure of the ontology.

We calculate the chi-squared statistic to identify HPO terms whose prevalence differs significantly between the early and late groups of patients. To be considered for hypothesis testing, an HPO term must have an expected count of at least 5 in each cell of its 2×2 contingency table (early/late patient group vs. has/does not have the HPO term). We apply a Bonferroni correction for multiple comparisons to keep the overall significance level at α ≤ 0.05.
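The counting and testing steps described above can be illustrated with a minimal Java sketch. It is not taken from the phenoCompare source: the class name `PhenoCompareSketch`, all method names, and the child-to-parents map used to represent the ontology are assumptions made for illustration. It also assumes the plain Pearson chi-squared statistic without a continuity correction, and it stops at the statistic and the Bonferroni-adjusted per-test threshold (α divided by the number of terms tested) rather than computing p-values.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and data structures are illustrative only.
public class PhenoCompareSketch {

    // Returns the term plus all of its ancestors, following child -> parents links.
    static Set<String> withAncestors(String term, Map<String, List<String>> parents) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(term);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (seen.add(current)) {
                for (String parent : parents.getOrDefault(current, Collections.<String>emptyList())) {
                    stack.push(parent);
                }
            }
        }
        return seen;
    }

    // Counts patients per term after upward propagation.
    // Each patient contributes at most 1 to any term, even if several
    // of the patient's annotations share the same ancestor.
    static Map<String, Integer> countPatientsPerTerm(List<Set<String>> patients,
                                                    Map<String, List<String>> parents) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> annotations : patients) {
            Set<String> induced = new HashSet<>();
            for (String term : annotations) {
                induced.addAll(withAncestors(term, parents));
            }
            for (String term : induced) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Expected cell counts of the 2x2 table, in the order
    // [early with term, early without, late with term, late without].
    static double[] expectedCounts(int earlyWith, int earlyTotal, int lateWith, int lateTotal) {
        double n = earlyTotal + lateTotal;
        double with = earlyWith + lateWith;
        double without = n - with;
        return new double[] {
            earlyTotal * with / n, earlyTotal * without / n,
            lateTotal * with / n, lateTotal * without / n
        };
    }

    // A term is tested only if every expected cell count is at least 5.
    static boolean meetsExpectedCountRule(double[] expected) {
        for (double e : expected) {
            if (e < 5.0) {
                return false;
            }
        }
        return true;
    }

    // Pearson chi-squared statistic (no continuity correction; an assumption).
    static double chiSquared(int earlyWith, int earlyTotal, int lateWith, int lateTotal) {
        double[] expected = expectedCounts(earlyWith, earlyTotal, lateWith, lateTotal);
        double[] observed = {
            earlyWith, earlyTotal - earlyWith, lateWith, lateTotal - lateWith
        };
        double sum = 0.0;
        for (int i = 0; i < observed.length; i++) {
            double diff = observed[i] - expected[i];
            sum += diff * diff / expected[i];
        }
        return sum;
    }

    // Per-test significance threshold under a Bonferroni correction.
    static double bonferroniThreshold(double alpha, int numberOfTests) {
        return alpha / numberOfTests;
    }
}
```

In this sketch, `countPatientsPerTerm` would be called once for the early group and once for the late group, and the two resulting counts for a term feed into `chiSquared`. Converting the statistic to a p-value (one degree of freedom for a 2×2 table) is left to a statistics library.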

phenoCompare has four command line arguments:

-o     directory containing hp.obo file
-g     txt file containing lists of early, late GPI pathway genes
-p     tsv file of patient records
-r     directory for result files

example usage:

java -jar target/phenoCompare-1.0.0.jar \
    -o src/main/resources -g src/main/resources/gpiGenesTwoGroups.txt \
    -p src/main/resources/gpi_variants2018July06.tsv -r resultsTodaysDate