Investigate DisGeNet as a datasource for target disease associations #2601

Closed
ireneisdoomed opened this issue May 24, 2022 · 0 comments
Labels
Data (relates to Open Targets data team), Enhancement (update to existing feature), Platform (issues related to Open Targets Platform)

Comments

@ireneisdoomed

Context

DisGeNET is a discovery platform containing one of the largest publicly available collections of genes and variants associated with human diseases.

This aim is very much in line with that of Open Targets, so it would be useful to explore their gene-disease association (GDA) dataset. This is an extract of their documentation:

The gene-disease information in DisGeNET is organized according to the types of source databases:
  • CURATED: GDAs from UniProt, PsyGeNET, Orphanet, the CGI, CTD (human data), ClinGen, and the Genomics England PanelApp.
  • ANIMAL MODELS: GDAs from RGD, MGD, and CTD (mouse and rat data).
  • INFERRED: GDAs from the Human Phenotype Ontology, and GDAs inferred from variant-disease associations (VDAs) reported by ClinVar, the GWAS Catalog, and GWASdb.
  • ALL: GDAs from the previous sources and from LHGDN and BeFree.

Tasks

  1. Explore the overlap with Open Targets. Many of the sources are already in OT.
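A minimal sketch of how the overlap could be measured per source, assuming both datasets have already been reduced to gene/disease pairs in the same identifier namespace (that harmonisation step is not shown). The function name `overlap_by_source` and all records below are hypothetical placeholders, not real DisGeNET or Open Targets data.

```python
from collections import defaultdict


def overlap_by_source(disgenet_rows, ot_pairs):
    """Per DisGeNET source, count the gene/disease pairs also present in Open Targets.

    disgenet_rows: iterable of (gene, disease, source) tuples.
    ot_pairs: iterable of (gene, disease) tuples.
    Returns {source: (shared_pairs, total_pairs)}.
    """
    ot_set = set(ot_pairs)
    pairs_by_source = defaultdict(set)
    for gene, disease, source in disgenet_rows:
        # Using a set collapses repeated evidence for the same pair.
        pairs_by_source[source].add((gene, disease))
    return {
        source: (len(pairs & ot_set), len(pairs))
        for source, pairs in pairs_by_source.items()
    }


# Placeholder records for illustration only.
disgenet_rows = [
    ("GENE_A", "DISEASE_1", "SOURCE_X"),
    ("GENE_A", "DISEASE_1", "SOURCE_X"),  # duplicate evidence, counted once
    ("GENE_B", "DISEASE_2", "SOURCE_X"),
    ("GENE_C", "DISEASE_3", "SOURCE_Y"),
]
ot_pairs = [("GENE_A", "DISEASE_1"), ("GENE_C", "DISEASE_3")]

result = overlap_by_source(disgenet_rows, ot_pairs)
print(result)
```

Counting distinct pairs rather than evidence rows keeps the comparison independent of how many evidence strings each resource records per association.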

To access the data:

  • I've uploaded a copy of their database dump here: gs://ot-team/irene/disgenet/disgenet_2020.db.zip
    Evidence is in the geneDiseaseNetwork table. Entities are not mapped, but they can be extracted:
  • Gene symbols are in the geneAttributes table and can be fetched by joining on geneNID.
  • Disease labels are in the diseaseAttributes table and can be fetched by joining on diseaseNID. We can run OnToma on the labels. Alternatively, DisGeNET provides a lookup table of cross-references between disease labels and different ontologies, available at gs://ot-team/irene/disgenet/disease_mappings.tsv
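The two joins described above could look roughly like the sketch below. Only the table names and the geneNID/diseaseNID keys come from this issue; the `geneName`, `diseaseName`, and `source` columns are assumptions about the schema and should be checked against the actual dump. To keep the example runnable without the file, it builds a tiny in-memory SQLite database with placeholder rows; against the real data, the connection would point at the unzipped .db file instead.

```python
import sqlite3

# In-memory stand-in for the dump. Column names other than geneNID and
# diseaseNID are assumed, and every row is a placeholder.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE geneAttributes (geneNID INTEGER PRIMARY KEY, geneName TEXT);
    CREATE TABLE diseaseAttributes (diseaseNID INTEGER PRIMARY KEY, diseaseName TEXT);
    CREATE TABLE geneDiseaseNetwork (geneNID INTEGER, diseaseNID INTEGER, source TEXT);

    INSERT INTO geneAttributes VALUES (1, 'GENE_A'), (2, 'GENE_B');
    INSERT INTO diseaseAttributes VALUES (10, 'disease one'), (20, 'disease two');
    INSERT INTO geneDiseaseNetwork VALUES
        (1, 10, 'SOURCE_X'),
        (2, 20, 'SOURCE_Y'),
        (1, 20, 'SOURCE_X');
    """
)

# Resolve the internal NIDs of each evidence row to a gene symbol and a disease label.
QUERY = """
SELECT g.geneName, d.diseaseName, n.source
FROM geneDiseaseNetwork AS n
JOIN geneAttributes AS g ON g.geneNID = n.geneNID
JOIN diseaseAttributes AS d ON d.diseaseNID = n.diseaseNID
ORDER BY g.geneName, d.diseaseName
"""

rows = conn.execute(QUERY).fetchall()
for gene, disease, source in rows:
    print(gene, disease, source)
conn.close()
```

The resulting disease labels are what would then be passed to OnToma or matched against the cross-reference table.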

Notes

  • The database hasn't been updated since May 2020.
  • Their data is open access under an Attribution-NonCommercial-ShareAlike license.
  • For the overlapping sources, it is important to mention that DisGeNET doesn't capture as much granularity as we do. For PanelApp, for example, they have about 20k disease/target evidence, which is very similar to what we have. However, they don't report information on inheritance patterns as we do.
@ireneisdoomed added the Data, Enhancement, and Platform labels on May 24, 2022
@d0choa closed this as not planned on Mar 7, 2023