GADO Command line

Patrick Deelen edited this page Jun 12, 2024 · 22 revisions

The command line version of GADO can be used to prioritize genes based on the HPO terms of a patient. It yields the same results as our online version available at: http://genenetwork.nl/gado

When using our method please cite: https://www.nature.com/articles/s41467-019-10649-4/

Downloads

GADO Command line

Datasets

We recommend using this version:

Version 05-07-2021

Other versions are available here: https://github.com/molgenis/systemsgenetics/wiki/GADO-Command-line-datasets

Workflow

Input data

The input file with HPO terms per case is tab-separated. The first column is the sample ID and the subsequent columns contain the HPO terms for that case. The number of columns may differ between cases, since each case can have a different number of HPO terms.

Example:

case1	HP:0001644	HP:0004764				
case2	HP:0001644					
case3	HP:0001644	HP:0001882	HP:0002037	HP:0031123	HP:0001987	
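
As a minimal sketch of how such a file could be generated, the Python helper below writes one tab-separated line per case. The function names `format_case_line` and `write_case_hpo` are hypothetical and not part of GADO; the check that each term looks like `HP:` followed by seven digits, and the requirement of at least one term per case, are assumptions added here for illustration.

```python
import re

# HPO identifiers look like HP:0001644 (assumed pattern: "HP:" plus 7 digits)
HPO_ID = re.compile(r"^HP:\d{7}$")


def format_case_line(case_id, hpo_terms):
    """Return one tab-separated line: the case ID followed by its HPO terms."""
    if not hpo_terms:
        # Assumption: a case without any HPO term is not useful as input
        raise ValueError(f"{case_id}: at least one HPO term is required")
    for term in hpo_terms:
        if not HPO_ID.match(term):
            raise ValueError(f"{case_id}: malformed HPO term {term!r}")
    return "\t".join([case_id, *hpo_terms])


def write_case_hpo(cases, path):
    """Write a dict of case ID -> list of HPO terms to a tab-separated file."""
    with open(path, "w", encoding="utf-8") as handle:
        for case_id, hpo_terms in cases.items():
            handle.write(format_case_line(case_id, hpo_terms) + "\n")


# The first two cases of the example above
cases = {
    "case1": ["HP:0001644", "HP:0004764"],
    "case2": ["HP:0001644"],
}
lines = [format_case_line(case_id, terms) for case_id, terms in cases.items()]
```

Calling `write_case_hpo(cases, "hpo.txt")` would then produce a file that can be passed to `--caseHpo`. Rows do not need to be padded to the same number of columns.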

Process input HPO terms

First, the HPO terms of each case are checked. For terms for which the GeneNetwork predictions are not reliable, alternative parent terms are suggested. For details of this process, please see: https://www.biorxiv.org/content/10.1101/375766v4

This step will create a file with 6 columns:

| Column | Description |
| --- | --- |
| Sample | The case ID. If a case has multiple HPO terms, they are now distributed over multiple lines |
| SelectedHpo | The HPO term to be used in the prioritization |
| SelectedHpoDescription | Description of the HPO term |
| OriginalHpo | If an alternative HPO term was selected, the original term is listed here |
| OriginalHpoDescription | Description of the original HPO term |
| ExcludeFromPrioritisation | Empty by default. Set it manually to a non-empty value to ignore a term |
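
Excluding a term can also be scripted instead of editing the file by hand. The sketch below marks selected terms of one case with `x` in the last column. The function name `exclude_terms` is hypothetical, and the code assumes (this is not confirmed here) that the processed file is tab-separated, starts with a header line, and has its six columns in the order described above.

```python
def exclude_terms(lines, sample, terms, marker="x"):
    """Mark the given selected HPO terms of one sample as excluded.

    `lines` are the lines of the PROCESS output without trailing newlines.
    Assumption: the first line is a header and the columns are in the
    documented order (Sample first, ExcludeFromPrioritisation last).
    """
    header, *rows = lines
    result = [header]
    for row in rows:
        fields = row.split("\t")
        # Pad rows whose empty trailing columns were dropped
        fields += [""] * (6 - len(fields))
        if fields[0] == sample and fields[1] in terms:
            fields[5] = marker  # any non-empty value excludes the term
        result.append("\t".join(fields))
    return result
```

A possible use is to read `hpoProcessed.txt` with `splitlines()`, call `exclude_terms(lines, "case1", {"HP:0004764"})`, and write the returned lines back before running the prioritization step.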

Example command:

java -jar GADO.jar \
 --mode PROCESS \
 --output hpoProcessed.txt \
 --caseHpo hpo.txt \
 --hpoOntology hp.obo \
 --hpoPredictionsInfo hpo_predictions_info_01_02_2018.txt

Prioritize genes

The prioritization step uses the output file of the process HPO terms step. It ranks all genes in GeneNetwork by the prioritization Z-scores of the selected HPO terms. It is generally safe to use all the suggested alternatives; in that case, this second step can be run directly on the output of the process step.

Example command:

java -jar GADO.jar \
 --mode PRIORITIZE \
 --output ./result/ \
 --caseHpoProcessed hpoProcessed.txt \
 --genes hpo_predictions_genes_01_02_2018.txt \
 --hpoPredictions hpo_predictions_sigOnly_spiked_01_02_2018
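
When both steps are run unmodified, they can be chained from a script. The following is a minimal sketch that only assembles the two command lines shown on this page and runs them in order; the helper names (`process_command`, `prioritize_command`, `run_gado`) are hypothetical, and it assumes `java` is on the PATH and `GADO.jar` is in the working directory.

```python
import subprocess


def process_command(case_hpo, processed, ontology, predictions_info, jar="GADO.jar"):
    """Build the PROCESS command using only the arguments documented here."""
    return [
        "java", "-jar", jar,
        "--mode", "PROCESS",
        "--output", processed,
        "--caseHpo", case_hpo,
        "--hpoOntology", ontology,
        "--hpoPredictionsInfo", predictions_info,
    ]


def prioritize_command(processed, out_dir, genes, predictions, jar="GADO.jar"):
    """Build the PRIORITIZE command; `predictions` is given without .dat."""
    return [
        "java", "-jar", jar,
        "--mode", "PRIORITIZE",
        "--output", out_dir,
        "--caseHpoProcessed", processed,
        "--genes", genes,
        "--hpoPredictions", predictions,
    ]


def run_gado(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_gado([process_command("hpo.txt", "hpoProcessed.txt", "hp.obo", "hpo_predictions_info_01_02_2018.txt"), prioritize_command("hpoProcessed.txt", "./result/", "hpo_predictions_genes_01_02_2018.txt", "hpo_predictions_sigOnly_spiked_01_02_2018")])` mirrors the two example commands. Passing the arguments as a list avoids shell quoting issues with file paths.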

Output file

The final output is a single file per case with the ranking of all the genes in the prediction matrix. These results can be used to rank the genes that harbor candidate variants of a case.

| Column | Description |
| --- | --- |
| Ensg | Ensembl gene ID |
| Hgnc | Gene symbol |
| Rank | The overall rank of the gene |
| Zscore | The combined prioritization Z-score over the supplied HPO terms for this case. This score is used for the ranking |
| HP:###### | One or more columns with the prioritization Z-scores for each of the HPO terms supplied for this case |
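
To apply the ranking to the genes that harbor candidate variants, the per-case output can be filtered down to those genes. The sketch below assumes (not confirmed here) that the output file is tab-separated and starts with a header line using the column names listed above; `rank_candidates` is a hypothetical helper, and the rows in the example are made up purely for illustration.

```python
import csv
import io


def rank_candidates(handle, candidate_symbols):
    """Return the rows of the candidate genes, best-ranked first."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [row for row in reader if row["Hgnc"] in candidate_symbols]
    return sorted(hits, key=lambda row: float(row["Rank"]))


# Made-up rows for illustration only; this is not real GADO output.
example = io.StringIO(
    "Ensg\tHgnc\tRank\tZscore\n"
    "ENSG_X\tGENE_X\t1\t5.0\n"
    "ENSG_Y\tGENE_Y\t2\t4.0\n"
    "ENSG_Z\tGENE_Z\t3\t3.0\n"
)
top = rank_candidates(example, {"GENE_Z", "GENE_X"})
print([row["Hgnc"] for row in top])
```

With a real result file, the `io.StringIO` object would be replaced by an opened file from the PRIORITIZE output folder. Matching on the `Ensg` column instead of `Hgnc` may be more robust when gene symbols differ between annotation sources.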

Command line arguments overview

| Argument | Short | Description |
| --- | --- | --- |
| --mode | -m | One of the following modes: PROCESS (process the HPO terms of cases; suggests parent terms if needed) or PRIORITIZE (uses the output of PROCESS to prioritize genes) |
| --caseHpo | -ch | HPO terms per case. Single line per case. First column is the case ID, followed by tab-separated HPO terms |
| --caseHpoProcessed | -chp | The output of mode PROCESS. Type x in the last column to exclude a term |
| --output | -o | The output path. For mode PRIORITIZE, supply a folder for the output files per case |
| --genes | -g | File with gene info. Column 1: gene name (ENSG), column 2: HGNC symbol |
| --hpoOntology | -ho | HPO ontology file (.obo file) |
| --hpoPredictions | -hp | HPO prediction matrix in binary format (without .dat) |
| --hpoPredictionsInfo | -hpi | HPO predictions info |

Source code

The source code of the GADO command line tool can be found here: https://github.com/molgenis/systemsgenetics/tree/master/GadoCommandline

GeneNetwork code

The GeneNetwork code to make a prediction matrix can be found here: https://github.com/molgenis/systemsgenetics/tree/master/GeneNetworkBackend
