Pablo Alarcón edited this page Oct 26, 2023 · 16 revisions

Semantic data model of the set of common data elements for rare disease registration

The goal of this model is to make rare disease registry data Interoperable (the I in FAIR).

In this work we present a semantic data model for the set of common data elements for rare disease registration recommended by the European Commission Joint Research Centre.

IMPORTANT ANNOUNCEMENT!

This GitHub repository is now deprecated. The new stable version of this semantic model is maintained as the Clinical And Registry Entries (CARE) Semantic Model, in a separate GitHub repository.

CDE modules overview

The figures below give an overview of the upper-level concepts and properties used in our CDE model.

Figure 1: Common data element overall semantic model


Figure 2: Observation context metadata layer


You can browse different CDE modules by visiting the links below.

Patient personal information:

  • Birthyear - describes patient year of birth
  • Birthdate - describes patient date of birth
  • Sex - describes patient sex at birth
  • Body measurement - describes physical measurements of the patient's body

Participation status:

  • Status - describes whether the patient is alive or dead
  • Deathdate - describes patient date of death

Medical history:

Conditions and medical findings:

Research availability and consent:

  • Biobank - describes availability of subject's samples in a biobank
  • Consent - describes consent given by a subject

Treatment-related interventions:

  • Medications - describes patient medications based on a prescription.
  • Treatment/Therapy - describes any component of a treatment or therapy procedure

Clinical trials:

Moving to the new version 2.0.0

While considerable time was spent on the first generation of CDE models, the final published set remained inconsistent in a number of ways:

  1. Nodes had different numbers of ontological annotations, with no justification

  2. The CDE models adopted the high-level CDEs defined by the RD Platform, which were often aggregations of individual data elements. As a consequence:

    a) Registries did not always have all of the individual subcomponents to fulfil the model

    b) It was unclear what to do when a model couldn't be filled

    c) This led to data loss when those data elements were not FAIR-transformed

  3. Date/time were sometimes included in the model, and sometimes not

  4. The CSV files each had a distinct structure, so each one needed fairly specialized code to generate. For more information about how to implement our CDE semantic model, click here.

  5. There was no easy way to aggregate various observations together that might be related (e.g. the observations/interventions made during the course of a COVID infection)

Features of the new version:

  1. The overall model is identical to the original Core CDE model (Figure 1).
  2. Only one data element is modeled at a time; if you do not have that element, you do not use that model
  3. Every element of the model has an "upper ontology" type (e.g. "process") and a domain-specific type (e.g. "blood pressure measurement process"), so each node has exactly two types.
  4. Date/time is now considered metadata of the data model, even when the date/time is itself the core observation of the model (e.g. date of symptom onset). Thus, all models are identical in structure and metadata (Figure 2).
  5. This metadata takes the form of a "context" node (i.e. the model uses RDF quads rather than RDF triples), which is annotated with metadata such as date/time. In addition, the context node becomes "part of" a patient's overall timeline, which is itself modeled in RDF and groups together all observations about a patient.
  6. In addition to being "part of" a patient's timeline, context nodes can be grouped into other arbitrary collections reflecting other kinds of groupings (like the COVID-19 infection scenario described above). It is not mandatory to implement this in your model; it is merely made possible by the new model, which was not the case with the Version 1 models.
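
The features above can be illustrated with a small, library-free sketch. It is not the project's actual implementation or vocabulary: the `http://example.org/cde/` namespace, the `part-of` and `date` properties, and all function names are hypothetical stand-ins chosen for this example, and placing the metadata statements inside the context itself is also an assumption. It only shows the shape of the idea: one data element per model, exactly two types per node, date/time attached to a context (the fourth element of each quad), and contexts that are "part of" a patient timeline or an optional extra collection.

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical namespace and properties, invented for this sketch only.
EX = "http://example.org/cde/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PART_OF = EX + "part-of"  # stand-in for the model's real "part of" property


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    context: str  # fourth element: the context the statement belongs to


def observation_quads(patient_id, obs_id, upper_type, domain_type, date):
    """Build the quads for one data element inside its own context."""
    context = f"{EX}context/{obs_id}"
    process = f"{EX}process/{obs_id}"
    timeline = f"{EX}timeline/{patient_id}"
    return [
        # Two types per node: an upper-ontology type and a domain-specific type.
        Quad(process, RDF_TYPE, EX + upper_type, context),
        Quad(process, RDF_TYPE, EX + domain_type, context),
        # Date/time is metadata on the context node, not on the observation.
        Quad(context, EX + "date", date, context),
        # The context node is "part of" the patient's overall timeline.
        Quad(context, PART_OF, timeline, context),
    ]


def nodes_without_two_types(quads):
    """Return typed nodes whose number of distinct rdf:type values is not two."""
    types = defaultdict(set)
    for q in quads:
        if q.predicate == RDF_TYPE:
            types[q.subject].add(q.obj)
    return sorted(node for node, found in types.items() if len(found) != 2)


def group_contexts(quads, collection_id, context_ids):
    """Optionally attach existing context nodes to an arbitrary collection."""
    collection = f"{EX}collection/{collection_id}"
    return quads + [Quad(c, PART_OF, collection, c) for c in context_ids]


if __name__ == "__main__":
    quads = observation_quads(
        "p1", "bp1", "process", "blood-pressure-measurement-process", "2021-03-04"
    )
    # Optional grouping, e.g. everything observed during one infection episode.
    quads = group_contexts(quads, "covid-episode-1", [EX + "context/bp1"])
    for q in quads:
        print(q)
    print("nodes violating the two-type rule:", nodes_without_two_types(quads))
```

Because every data element produces the same quad shape, a single generator function can serve all elements, which is the practical benefit of making the models structurally identical.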

Cite us

To cite this model, please use this publication: Semantic modeling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data.

Feedback

Your feedback is more than welcome; it will help us improve our semantic data model. Please use GitHub issues to provide your feedback.

Acknowledgement

This work was done in the European Joint Programme on Rare Diseases (EJP RD) project which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement N°82557.
