A Knowledge Graph of Clinical Trials (CTKG)

Clinical Trials Knowledge Graph (CTKG) is a knowledge graph constructed from the clinical trial data in the Access to Aggregate Content of ClinicalTrials.gov (AACT) database [1]. CTKG includes nodes representing medical entities in clinical trials (e.g., studies, drugs, conditions), and edges representing the relations among these entities (e.g., drugs used in studies). It contains 1,496,684 nodes belonging to 18 node types, and 3,667,750 triplets belonging to 21 relation types. The repository also provides notebooks showing how to explore and analyze CTKG using knowledge graph embeddings.

This work has been published in Scientific Reports (https://www.nature.com/articles/s41598-022-08454-z).

Schema

CTKG dataset

The directory rawdata contains all the entities and relations:

  • attributes.zip : the attributes of entities (e.g., "study").
  • relations.zip : the attributes of relations between two types of entities (e.g., "study" --- study-condition --- "condition").
  • reverse.zip : the attributes of reverse relations between two types of entities (e.g., "condition" --- condition-study --- "study").
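To illustrate how the forward and reverse relations above mirror each other, here is a minimal sketch. It assumes a triplet can be modeled as a (head, relation, tail) tuple and that a reverse relation name swaps the two entity types around the hyphen, as in the study-condition / condition-study example. The `Triplet` type, the helper functions, and the node identifiers are hypothetical; the actual file layout inside the archives is not described here.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    head: str      # identifier of the source entity (hypothetical format)
    relation: str  # relation type, e.g. "study-condition"
    tail: str      # identifier of the target entity (hypothetical format)


def reverse_relation_name(relation: str) -> str:
    """Swap the two entity types in a 'typeA-typeB' relation name."""
    head_type, tail_type = relation.split("-", 1)
    return f"{tail_type}-{head_type}"


def reverse_triplet(t: Triplet) -> Triplet:
    """Build the reverse triplet: swap head and tail, flip the relation name."""
    return Triplet(t.tail, reverse_relation_name(t.relation), t.head)


# Made-up identifiers, used only to show the shape of the data.
forward = Triplet("study:0001", "study-condition", "condition:asthma")
backward = reverse_triplet(forward)
print(backward)
```

Reversing a triplet twice returns the original one, which is a quick sanity check when pairing entries from the forward and reverse relation sets.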

Embedding analysis

The directory scripts contains the Jupyter notebooks for the embedding analysis:

  • loading_ctkg_in_dgl.ipynb is a notebook to load CTKG as a graph using the Deep Graph Library (https://www.dgl.ai/).
  • Train_embeddings.ipynb is a notebook to generate the embeddings for nodes and relations in CTKG.
  • Subtype_entity_similarity_analysis.ipynb is a notebook to retrieve similar nodes of a certain node type.
  • Crosstype_entity_similarity_analysis.ipynb is a notebook for the drug repurposing analysis in the manuscript.
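The idea of retrieving similar nodes of a given node type from learned embeddings can be sketched as follows. This is only an illustration under stated assumptions: it ranks candidates by cosine similarity, which may differ from the measure used in the notebooks, and the function names, node identifiers, and two-dimensional vectors are made up for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    query: str,
    embeddings: Dict[str, Sequence[float]],
    node_types: Dict[str, str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k nodes of the same type as `query` with the highest similarity."""
    query_vec = embeddings[query]
    query_type = node_types[query]
    scored = [
        (node, cosine_similarity(query_vec, vec))
        for node, vec in embeddings.items()
        if node != query and node_types[node] == query_type
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, not taken from CTKG).
embeddings = {
    "drug:A": [1.0, 0.0],
    "drug:B": [0.9, 0.1],
    "drug:C": [0.0, 1.0],
    "condition:X": [1.0, 0.0],
}
node_types = {
    "drug:A": "drug",
    "drug:B": "drug",
    "drug:C": "drug",
    "condition:X": "condition",
}
top = most_similar("drug:A", embeddings, node_types, k=2)
print(top)
```

Filtering by node type before ranking keeps "condition:X" out of the result even though its vector is identical to the query's, which is the point of a same-type search.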

Before running the notebooks, unzip rawdata/ctkg.zip and rawdata/attributes.zip, and install DGL (https://www.dgl.ai/) and PyTorch. If the embedding-training command in the notebook fails, run the same command in a terminal with DGL 0.4.3.
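The unzip step can also be done from Python with the standard library. The sketch below assumes the archives should be extracted in place, inside the rawdata directory; the `extract_archives` helper is hypothetical and not part of the repository. Archives that are missing are skipped with a message rather than raising an error.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List, Union


def extract_archives(
    rawdata_dir: Union[str, Path], archive_names: Iterable[str]
) -> List[Path]:
    """Extract each named zip archive into rawdata_dir; return those extracted."""
    rawdata = Path(rawdata_dir)
    extracted = []
    for name in archive_names:
        archive = rawdata / name
        if not archive.is_file():
            # Skip quietly so a partial download does not abort the whole setup.
            print(f"skipping missing archive: {archive}")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(rawdata)
        extracted.append(archive)
    return extracted


if __name__ == "__main__":
    # Archive names as listed in the setup instructions above.
    extract_archives("rawdata", ["ctkg.zip", "attributes.zip"])
```

Run it from the repository root so that the relative path rawdata resolves correctly.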

Citation

@Article{ctkg,
  author    = {Ziqi Chen and Bo Peng and Vassilis N. Ioannidis and Mufei Li and George Karypis and Xia Ning},
  journal   = {Scientific Reports},
  title     = {A knowledge graph of clinical trials (CTKG)},
  year      = {2022},
  month     = {mar},
  number    = {1},
  volume    = {12},
  doi       = {10.1038/s41598-022-08454-z},
  publisher = {Springer Science and Business Media {LLC}},
}

Reference

  1. Tasneem A, Aberle L, Ananth H, Chakraborty S, Chiswell K, McCourt BJ, et al. (2012) The Database for Aggregate Analysis of ClinicalTrials.gov (AACT) and Subsequent Regrouping by Clinical Specialty. PLoS ONE 7(3): e33677. https://doi.org/10.1371/journal.pone.0033677
