Changes to rebuild PrimeKG and update the knowledge graph to include database releases up to July 2023. Note that 17 scripts in datasets/processing_scripts/ are re-run or updated to build a new version of PrimeKG, while scripts in datasets/feature_construction/ may be out of date. Re-run or updated primary data sources include Bgee, Comparative Toxicogenomics Database, DisGeNET, DrugBank, DrugCentral, NCBI Gene, Gene Ontology, Human Phenotype Ontology, MONDO, Reactome, SIDER, UBERON, and UMLS. For more information, see primary_data_resources.sh. Changes include the following:

General
Created script to automatically create directory structure, pull data, and run all necessary processing and feature extraction steps.
vocab/gene_names.csv and vocab/gene_map.csv.

Bgee
bgee.py.

Comparative Toxicogenomics Database
DisGeNET
DrugBank
parsexml_drugbank.py. Output to the new/parsed subdirectory. Removed extraneous lines in Parsed_feature.ipynb.
drugbank_drug_drug.py and drugbank_drug_protein.py.
parsexml_drugbank.py and Parsed_feature.ipynb may need updates.

DrugCentral
drugcentral_queries.txt to work on O2, the Harvard Medical School high-performance computing cluster.
drugcentral_feature.Rmd may need updates.

NCBI Gene
Gene Ontology
-L flag to follow redirects. No other changes needed.

Human Phenotype Ontology
-L flag to follow redirects. No other changes needed to hpo.py.
hpoa.py to replace old column names with new column names.

MONDO
Reactome
SIDER
UBERON
UMLS
umls.ipynb may need updates.
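The General section mentions a script that creates the directory structure, pulls data, and runs the processing and feature extraction steps. The Python sketch below shows one way such a driver could be organized; it is not the script from this PR. The subdirectory names in SUBDIRS and the helper names make_dirs and run_steps are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence

# Hypothetical layout; the actual PrimeKG build script may use other folder names.
SUBDIRS = ["primary_data", "processed", "vocab", "kg"]


def make_dirs(root: Path, subdirs: Sequence[str] = SUBDIRS) -> List[Path]:
    """Create root/<name> for each name; existing folders are left untouched."""
    created = []
    for name in subdirs:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to call again on re-runs
        created.append(path)
    return created


def run_steps(steps: Sequence[Sequence[str]], cwd: Path) -> None:
    """Run each command in order, stopping at the first non-zero exit code."""
    for cmd in steps:
        print("running:", " ".join(cmd), flush=True)
        # check=True raises subprocess.CalledProcessError if a step fails
        subprocess.run(list(cmd), cwd=cwd, check=True)
```

For example, calling make_dirs(Path("data")) and then run_steps([["bash", "primary_data_resources.sh"]], cwd=Path(".")) would run the download script once the folders exist. Because check=True raises on the first failing command, a failed download stops the run before later processing steps read incomplete files.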