WORK IN PROGRESS
Fig. Strain subgraph
Documentation below is obsolete
Prototype to create a Neo4j Knowledge Graph for Coronavirus outbreaks.
The goal of this Knowledge Graph project is to link heterogeneous data from publicly available resources relevant to the COVID-19 outbreak. Linking these disparate datasets may yield new insights.
Currently, this project integrates data from:
- Daily reported COVID-19 cases from Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE
- Real-time tracking of COVID-19 evolution from Nextstrain
- Genome, gene, and protein data from NCBI
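As a rough illustration of what linking these sources can mean, the sketch below groups case reports and strain records under a shared location key. It is not the code used by the notebooks. The record fields (`location`, `confirmed`, `strain`), the function name `link_by_location`, and all values are placeholders invented for this example, not real data.

```python
from collections import defaultdict

# Placeholder records, invented for illustration only (not real data).
case_reports = [
    {"location": "Region A", "date": "2020-01-01", "confirmed": 10},
    {"location": "Region B", "date": "2020-01-01", "confirmed": 3},
]
strain_records = [
    {"strain": "strain-1", "location": "Region A"},
    {"strain": "strain-2", "location": "Region A"},
    {"strain": "strain-3", "location": "Region C"},
]


def link_by_location(cases, strains):
    """Group records from both sources under their shared location key."""
    linked = defaultdict(lambda: {"cases": [], "strains": []})
    for row in cases:
        linked[row["location"]]["cases"].append(row)
    for row in strains:
        linked[row["location"]]["strains"].append(row["strain"])
    return dict(linked)


graph = link_by_location(case_reports, strain_records)
for location, data in sorted(graph.items()):
    total_confirmed = sum(r["confirmed"] for r in data["cases"])
    print(location, total_confirmed, data["strains"])
```

In a graph database the shared key becomes a node (for example, a location) that both case and strain nodes connect to, so a single traversal can answer questions that span the datasets.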
The initial focus of this repo is the novel coronavirus (2019-nCoV) that causes COVID-19. Data for SARS and MERS will be included in the future.
How to use this project?
- Launch this repo on MyBinder.org. Binder lets you run Jupyter Notebooks in your web browser without software installation.
Once Jupyter Lab launches (this may take a couple of minutes), navigate to the notebooks directory.
Run the following Jupyter Notebooks:
1-PrepareDatasets.ipynb (downloads public data about COVID-19)
2-CreateKnowledgeGraph (creates a Neo4j Knowledge Graph)
3-AddGeneProteinInfo (adds genome, gene, and protein information)
4-ExampleQueries (runs Cypher queries on the Knowledge Graph)
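For orientation, here is a minimal sketch of how a Cypher query might be run from Python with the official `neo4j` driver, modeled on the "strains found in Hubei province" view. The node labels (`Strain`, `Location`), the relationship type (`FOUND_IN`), the property `name`, and the helper names are assumptions made for this example. The actual schema is defined in the notebooks and may differ.

```python
def strains_in_location_query(location_name):
    """Build a parameterized Cypher query (labels and properties are assumed)."""
    query = (
        "MATCH (s:Strain)-[:FOUND_IN]->(l:Location {name: $name}) "
        "RETURN s.name AS strain"
    )
    return query, {"name": location_name}


def run_query(uri, user, password, query, params):
    """Run a query against a Neo4j instance and return rows as dicts."""
    # Imported here so the query builder works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return [record.data() for record in session.run(query, params)]
    finally:
        driver.close()


query, params = strains_in_location_query("Hubei")
print(query)
print(params)
```

Passing the location as a `$name` parameter instead of formatting it into the query string avoids quoting problems and lets Neo4j reuse the query plan. To execute the query, call `run_query` with the connection details of your own Neo4j instance.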
Coronavirus KG Views displayed in Neo4j Browser
Fig. 1: The whole Coronavirus KG
Fig. 2: Outbreaks by Country, State/Province, and City
Fig. 3: Pathogen, Genome, Genes, and Proteins
Fig. 4: Strains found in Hubei province
How can you help?
- Suggest complementary publicly accessible datasets to include in this Knowledge Graph
- Suggest queries and analyses
- Report bugs or issues
- Submit a pull request
Please send feedback or feature requests.