OpenDDX: an open, distributed database of disease symptom observations

CloCkWeRX edited this page Sep 13, 2012 · 3 revisions

An open, reliable, global database that associates symptoms with diseases would be a foundational resource for medical researchers. It could eventually save millions of lives by facilitating physicians’ differential diagnoses and by permitting the public to better monitor their own health.

While the benefits of symptom-by-disease databases have been recognized for many years, and a number of proprietary diagnostic tools exist, there is no open, multi-stakeholder data resource that could form the basis of an ecosystem of research, public health and commercial applications.

However, recent developments in informatics now enable the creation of a distributed data network based on academic data mining and citizen science observations. These developments include ontologies for diseases and symptoms, semantic web concepts and tools, handheld data collection and dissemination platforms, and accessible data mining tools. We believe a global diagnostic health resource may be only a few years away.

Key features of this system will be:

The data will be generated in two forms:

  • Generic statements of the association of a symptom with a disease, with estimates of frequency and diagnostic importance. These data will be relatively easy to generate, from physicians’ experience and by mining academic papers. Where possible, citations will be listed for each record.
  • Patient-based records of a symptom at a time and place, with patient metadata such as ethnicity, experienced environment and other diseases. These are the core observations from which context-dependent disease-symptom matrices can be generated, but they will be more difficult to collect. However, having access to these raw data and applying sophisticated analysis should help alleviate many physicians’ worries that expert systems that mimic the pattern-recognition abilities of the human brain cannot be built.
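To make the two forms concrete, here is a minimal Python sketch of how such records might be represented, and how a simple disease-symptom frequency matrix could be derived from patient-based records. Every class, field, and function name below is a hypothetical illustration rather than part of any OpenDDX specification. The sketch also assumes each patient observation carries a single confirmed disease label, which real records may not.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GenericAssociation:
    """General statement linking a symptom to a disease (hypothetical fields)."""
    disease: str
    symptom: str
    frequency: Optional[float] = None   # estimated share of patients with the symptom
    importance: Optional[float] = None  # estimated diagnostic importance (scale assumed)
    citations: list = field(default_factory=list)


@dataclass
class PatientObservation:
    """One symptom observed in one patient at a time and place (hypothetical fields)."""
    patient_id: str
    disease: str       # assumption: a single confirmed diagnosis per observation
    symptom: str
    observed_on: str   # ISO 8601 date string
    location: str
    metadata: dict = field(default_factory=dict)  # e.g. ethnicity, environment


def symptom_frequencies(observations):
    """Return {disease: {symptom: fraction of that disease's patients showing it}}.

    Only patients who appear in at least one observation are counted.
    """
    patients = defaultdict(set)      # disease -> patient ids
    with_symptom = defaultdict(set)  # (disease, symptom) -> patient ids
    for obs in observations:
        patients[obs.disease].add(obs.patient_id)
        with_symptom[(obs.disease, obs.symptom)].add(obs.patient_id)

    matrix = defaultdict(dict)
    for (disease, symptom), ids in with_symptom.items():
        matrix[disease][symptom] = len(ids) / len(patients[disease])
    return dict(matrix)
```

A context-dependent matrix could be obtained the same way by first filtering the observations on a metadata field, such as location or environment, before calling `symptom_frequencies`.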

The presentation of the data will explain clearly their nature and their limitations. On its own, this combined dataset will not be a diagnostic system, but should be of fundamental value for creating such systems.

The data are distributed. No single database holds all of the data, and no single institution controls them. This makes collaboration more likely, since every institution can host its own data, but it does add exciting challenges for data integration.
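One of the integration challenges is combining records about the same disease-symptom pair from independently hosted datasets without losing track of where each estimate came from. The Python sketch below illustrates one possible approach under simplifying assumptions: the host names, the tuple layout, and the `merge_sources` function are all hypothetical, and plain string normalization stands in for the shared ontology identifiers a real system would more likely use.

```python
def merge_sources(sources):
    """Group frequency estimates by (disease, symptom), keeping provenance.

    `sources` maps a host name to an iterable of
    (disease, symptom, frequency) tuples. This layout is an assumption
    made for illustration only.
    """
    merged = {}
    for host, records in sources.items():
        for disease, symptom, frequency in records:
            # Naive normalization; a stand-in for ontology term matching.
            key = (disease.strip().lower(), symptom.strip().lower())
            merged.setdefault(key, []).append((host, frequency))
    return merged


# Two hypothetical hosts describing the same association differently.
sources = {
    "host-a.example": [("Influenza", "Fever", 0.90)],
    "host-b.example": [("influenza", "fever ", 0.85), ("influenza", "cough", 0.70)],
}
combined = merge_sources(sources)
```

The sketch keeps each host's estimate side by side instead of averaging them, so that downstream users can decide how to weigh sources that disagree.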

The data come from both within academia and from the public. Citizen science efforts are proving their value daily (e.g., in monitoring climate change), and people’s natural interest in their health makes the public a vast resource for meaningful scientific observations. A vital component of this citizen science project will be a careful assessment of which symptoms can be reliably observed by non-physicians, together with the development of training resources.

More information

Read the concept document: https://docs.google.com/document/d/1T05Ao-7uVtOW0rjn_qZray9DK5VtJiIoZ7aU7Q6tA8k/edit#heading=h.xlsnjbxwmwpx