vibe-suppl

This repo contains supplemental files for the vibe Java application. These files are in no way needed to use the vibe tool; they were used to generate additional information (such as benchmarks). The scripts were written with the assumption that they are used exactly as intended, so while certain checks/validations are present, using them in any other way might result in unexpected behavior.

Paper

Please refer to the README.md at https://zenodo.org/record/3662470 for the exact commits used for the benchmarking. There, all required files for PaperPlots.R can be found as well.

Benchmarking

Data

There are several files used among these scripts. These include:

benchmark_data.tsv
- A dataset in which the first column is an ID and the fourth column contains one or more phenotypes separated by commas (the phenotype names should exist within the Human Phenotype Ontology).
hp.obo
- The Human Phenotype Ontology, used for converting phenotype names to their HPO IDs. Note that benchmark_data.tsv was made compatible with release 2018-03-08 specifically.
hgnc_complete_set.txt
- The HUGO Gene Nomenclature Committee file containing information about genes (primarily used to generate a list containing all genes).
benchmark_file_conversion_data.tsv
- A file generated through genenames.org that contains HGNC gene symbols with their previous symbols and their NCBI gene IDs.
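As a rough illustration of how the first two files fit together, the sketch below reads phenotype names from the fourth column of a benchmark-style TSV and looks up their HPO IDs in OBO-formatted text. It is not the code used by the benchmark runners: the function names, the `has_header` flag and the decision to fail on unknown names are assumptions made for this example.

```python
import csv
import io


def parse_obo_names(obo_text):
    """Map each term name in OBO-formatted text to its ID (e.g. HP:0000118)."""
    name_to_id = {}
    term_id = None
    in_term = False
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas describe phenotypes.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id: "):
            term_id = line[len("id: "):]
        elif in_term and line.startswith("name: ") and term_id is not None:
            name_to_id[line[len("name: "):]] = term_id
    return name_to_id


def read_benchmark_cases(tsv_text, name_to_id, has_header=True):
    """Return {case ID: [HPO IDs]} from the 1st and 4th tab-separated columns.

    Assumption: whether the real file starts with a header line is not stated
    in this README, hence the has_header flag.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    if has_header:
        next(reader, None)
    cases = {}
    for row in reader:
        if len(row) < 4:
            continue  # skip incomplete lines
        names = [name.strip() for name in row[3].split(",") if name.strip()]
        # A KeyError here means a phenotype name is absent from the ontology.
        cases[row[0]] = [name_to_id[name] for name in names]
    return cases
```

Raising on an unknown name (rather than skipping it) is a deliberate choice in this sketch, since benchmark_data.tsv is expected to match one specific HPO release.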

Running the benchmarks

Amelie

Run benchmark:

python3 AmelieBenchmarkRunner.py hp.obo hgnc_complete_set.txt benchmark_data.tsv amelie_output/

Process benchmark output:

python3 AmelieBenchmarkFileGenerator.py amelie_output/ amelie_results.tsv

Convert the HGNC gene symbols to NCBI gene IDs:

python3 BenchmarkFileGeneSymbolToIdConverter.py amelie_results.tsv benchmark_file_conversion_data.tsv 1> amelie.log 2> amelie.err
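The idea behind this conversion step (which recurs for the other tools below) can be sketched as a lookup that covers both current and previous gene symbols. This is only an illustration, not the implementation of BenchmarkFileGeneSymbolToIdConverter.py: the function names and the column order of the conversion file (current symbol, comma-separated previous symbols, NCBI gene ID, preceded by a header line) are assumptions.

```python
import csv
import io


def build_symbol_lookup(conversion_tsv_text):
    """Build {gene symbol: NCBI gene ID} covering current and previous symbols.

    Assumed (hypothetical) layout: one header line, then tab-separated columns
    current symbol, comma-separated previous symbols, NCBI gene ID.
    """
    reader = csv.reader(io.StringIO(conversion_tsv_text), delimiter="\t")
    next(reader, None)  # skip the header line
    # Keep only rows that actually have an NCBI gene ID.
    rows = [row[:3] for row in reader if len(row) >= 3 and row[2].strip()]
    lookup = {}
    # Previous symbols first, so that a current symbol always takes precedence.
    for _symbol, previous, gene_id in rows:
        for old_symbol in previous.split(","):
            if old_symbol.strip():
                lookup.setdefault(old_symbol.strip(), gene_id.strip())
    for symbol, _previous, gene_id in rows:
        lookup[symbol.strip()] = gene_id.strip()
    return lookup


def convert_symbols(symbols, lookup):
    """Return (converted NCBI gene IDs, symbols that could not be converted)."""
    converted, missing = [], []
    for symbol in symbols:
        if symbol in lookup:
            converted.append(lookup[symbol])
        else:
            missing.append(symbol)
    return converted, missing
```

Returning the unconvertible symbols separately makes them easy to report or log instead of silently dropping them.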

Exomiser

IMPORTANT: The Exomiser team supplied a custom .jar file that allows this benchmark to be run without a .vcf file. Exomiser has not yet made a public release of it. The custom .jar is, however, based on the exomiser-rest-prioritiser module of the Exomiser open-source code (release 12.1.0).

hiPHIVE

Run benchmark:

python3 ExomiserBenchmarkRunner.py hp.obo benchmark_data.tsv hiphive hiphive_output/

Process benchmark output:

python3 ExomiserBenchmarkFileGenerator.py hiphive_output/ hiphive_results.tsv

Convert the HGNC gene symbols to NCBI gene IDs:

python3 BenchmarkFileGeneSymbolToIdConverter.py hiphive_results.tsv benchmark_file_conversion_data.tsv 1> hiphive.log 2> hiphive.err

PhenIX

Run benchmark:

python3 ExomiserBenchmarkRunner.py hp.obo benchmark_data.tsv phenix phenix_output/

Process benchmark output:

python3 ExomiserBenchmarkFileGenerator.py phenix_output/ phenix_results.tsv

Convert the HGNC gene symbols to NCBI gene IDs:

python3 BenchmarkFileGeneSymbolToIdConverter.py phenix_results.tsv benchmark_file_conversion_data.tsv 1> phenix.log 2> phenix.err

GADO

We used the stand-alone command-line version of GADO (v1.0.1), available at: https://github.com/molgenis/systemsgenetics/wiki/GADO-Command-line. In cases where the supplied HPO term could not be used, we accepted all automatically suggested alternative HPO terms. We used the prediction matrix hpo_predictions_sigOnly_spiked_01_02_2018. The output was also converted to NCBI gene IDs through the following:

python3 BenchmarkFileGeneSymbolToIdConverter.py gado_results.tsv benchmark_file_conversion_data.tsv 1> gado.log 2> gado.err

Phenomizer

Install query_phenomizer (if not already installed):

git clone https://github.com/svandenhoek/query_phenomizer.git
cd query_phenomizer
pip install --editable .

Run benchmark:

python3 PhenomizerBenchmarkRunner.py username hp.obo benchmark_data.tsv phenomizer_output/

Process benchmark output:

python3 PhenomizerBenchmarkFileGenerator.py phenomizer_output/ phenomizer_results.tsv

Convert the HGNC gene symbols to NCBI gene IDs:

python3 BenchmarkFileGeneSymbolToIdConverter.py phenomizer_results.tsv benchmark_file_conversion_data.tsv 1> phenomizer.log 2> phenomizer.err

Phenotips

IMPORTANT: As of January 2020, Phenotips does not offer a stand-alone downloadable solution anymore and requires a paid cloud subscription to be used (source). While the GitHub repo is currently still online, it seems uncertain whether it will still be updated and the easy-to-use .dmg as offered on the old website is not available anymore. Therefore, this benchmark is deemed obsolete.

PubCaseFinder

Run benchmark:

python3 PubCaseFinderBenchmarkRunner.py hp.obo benchmark_data.tsv pubcasefinder_output/

Process benchmark output:

python3 PubCaseFinderBenchmarkFileGenerator.py pubcasefinder_output/ pubcasefinder_results.tsv

Convert the HGNC gene symbols to NCBI gene IDs:

python3 BenchmarkFileGeneSymbolToIdConverter.py pubcasefinder_results.tsv benchmark_file_conversion_data.tsv 1> pubcasefinder.log 2> pubcasefinder.err

Vibe

Follow steps for running vibe.

Generate the bash scripts:

python3 VibeBenchmarkParallelBashScriptsGenerator.py hp.obo benchmark_data.tsv ./ <MAX RUNS PER BASH SCRIPT>

Move the vibe-with-dependencies.jar, TDB and the bash scripts generated in the previous step (vibe_benchmark_x.sh where "x" indicates a number) to a folder where the benchmark will be done.

Rename/copy the TDB so that there are as many TDBs as there are benchmark bash scripts:

# Rename the first TDB to be coherent with parallel bash script requirements.
mv TDB/ TDB0/
# Create one TDB copy per generated bash script (might require fewer/more copies than below).
# The -r flag is needed because TDB0/ is a directory.
cp -r TDB0/ TDB1/
cp -r TDB0/ TDB2/
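When many bash scripts were generated, making the copies by hand gets tedious. The sketch below automates the same renaming and copying, assuming that script `vibe_benchmark_x.sh` uses the folder `TDBx` (as the numbering in the commands above suggests). The function name `prepare_tdb_copies` is hypothetical and not part of this repo.

```python
import re
import shutil
from pathlib import Path

# Matches the generated script names, e.g. vibe_benchmark_0.sh.
SCRIPT_PATTERN = re.compile(r"vibe_benchmark_(\d+)\.sh")


def prepare_tdb_copies(work_dir):
    """Create a TDB<x> folder for every vibe_benchmark_<x>.sh in work_dir.

    Returns the names of the folders that were newly copied.
    """
    work_dir = Path(work_dir)
    indices = []
    for path in work_dir.iterdir():
        match = SCRIPT_PATTERN.fullmatch(path.name)
        if match:
            indices.append(int(match.group(1)))
    indices.sort()

    source = work_dir / "TDB0"
    if not source.exists():
        # Equivalent to `mv TDB/ TDB0/`.
        (work_dir / "TDB").rename(source)

    created = []
    for index in indices:
        target = work_dir / f"TDB{index}"
        if not target.exists():
            # Equivalent to `cp -r TDB0/ TDB<index>/`.
            shutil.copytree(source, target)
            created.append(target.name)
    return created
```

Existing `TDBx` folders are left untouched, so the function can be re-run safely after generating additional scripts.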

Make final preparations for parallel benchmarking:
```
mkdir results
mkdir out
mkdir err
```

Run benchmark:

# Run all generated bash scripts (there might be fewer/more scripts than in this example).
sh vibe_benchmark_0.sh
sh vibe_benchmark_1.sh
sh vibe_benchmark_2.sh
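If the generated scripts are meant to run side by side (an assumption based on the separate TDB copies prepared above), they can also be started from a small launcher instead of being typed in one by one. The following is a sketch only: both function names are hypothetical, and it assumes `sh` is available on the PATH.

```python
import re
import subprocess
from pathlib import Path


def find_benchmark_scripts(work_dir):
    """Return the vibe_benchmark_<x>.sh paths in work_dir, sorted by number x."""
    pattern = re.compile(r"vibe_benchmark_(\d+)\.sh")
    found = []
    for path in Path(work_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so that script 10 comes after script 2.
    return [path for _, path in sorted(found, key=lambda item: item[0])]


def run_scripts_concurrently(scripts, work_dir):
    """Start every script with `sh` at once and wait for all of them.

    Returns {script name: exit code}.
    """
    processes = [
        (script.name, subprocess.Popen(["sh", script.name], cwd=work_dir))
        for script in scripts
    ]
    return {name: process.wait() for name, process in processes}
```

A non-zero exit code in the returned dictionary indicates which script should be inspected before processing the benchmark output.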

Process benchmark output:

python3 VibeBenchmarkFileGenerator.py results/ vibe_results.tsv none

LIRICAL

Download/prepare lirical.
Prepare biobesu.

Run biobesu:

biobesu hpo_generank lirical --jar /path/to/LIRICAL.jar --hpo /path/to/hp_2018-03-08.obo --input /path/to/benchmark_data.tsv --output /path/to/output/folder/ --lirical_data /path/to/lirical/data/folder/ --runner_data /path/to/folder/to/store/temporary/data

From the output folder, retrieve lirical_omim_gene_id.tsv, rename it to lirical.tsv and replace the header line with lovd\tsuggested_genes (where \t indicates a tab).
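The manual renaming and header replacement can also be scripted. The sketch below is one possible way to do so; the function name `write_lirical_tsv` is hypothetical, and it writes lirical.tsv as a copy next to the original file rather than renaming it.

```python
from pathlib import Path

# The header line described above: "lovd", a tab, then "suggested_genes".
NEW_HEADER = "lovd\tsuggested_genes"


def write_lirical_tsv(output_dir):
    """Copy lirical_omim_gene_id.tsv to lirical.tsv with a replaced header line."""
    source = Path(output_dir) / "lirical_omim_gene_id.tsv"
    target = Path(output_dir) / "lirical.tsv"
    lines = source.read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"{source} is empty, so there is no header to replace")
    lines[0] = NEW_HEADER
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Keeping the original file in place makes it possible to compare both versions if the result looks off.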
