# Shape of NIH_Clinical_Trials data

This notebook describes the shape of the NIH_Clinical_Trials data, including descriptions of entities and possible fields. This is not an assessment of data quality.

Each API response (i.e. a study) has two main blocks, a `ProtocolSection` and a `DerivedSection`. Those conducting a study must submit the information in the former. The latter contains MeSH terms and other keywords describing medical conditions. Besides the tables, I have also included an API response at the end of the notebook that shows all of the available entities.

## Core entities
*e.g. one entity per expected SQL table*

Note that this split into tables is a first estimate, and there are probably better ways to divide the data.


| Entity name | [Entity type](https://schema.org/docs/full.html) | Notes |
|:-----------:|:------------------------------------------------:|:-----:|
| Organization | [Organization](https://schema.org/Organization) | |
| Study | [study](https://schema.org/study) | |
| Conditions | [medical condition](https://schema.org/MedicalCondition) | |
| MeSH terms | [medical code](https://schema.org/MedicalCode) | |

### Fields for Organization
*e.g. one field per expected table column*


| Field name | [Data type](https://schema.org/DataType) | Belongs to another [entity](https://schema.org/docs/full.html) | Notes |
| ---------- |:----------------------------------------:|:--------------------------------------------------------------:|:-----:|
| OrgStudyId | string | | |
| OrgFullName | string | | |
| OrgClass | string | | |

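As an illustration of how these three fields could end up as one table row, here is a minimal sketch. It assumes the nesting seen in the example response at the end of this notebook (`OrgStudyId` under `OrgStudyIdInfo`, the other two under `Organization`, all inside `IdentificationModule`). The helper name `organization_row` is hypothetical and not part of the `collect` module.

```python
from typing import Any, Dict, Optional


def organization_row(protocol_section: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Flatten the organization fields of a ProtocolSection into one row.

    Missing keys yield None instead of raising, since not every study
    is guaranteed to carry every field (an assumption, not verified here).
    """
    ident = protocol_section.get("IdentificationModule", {})
    org = ident.get("Organization", {})
    return {
        "OrgStudyId": ident.get("OrgStudyIdInfo", {}).get("OrgStudyId"),
        "OrgFullName": org.get("OrgFullName"),
        "OrgClass": org.get("OrgClass"),
    }


# Sample values copied from the example response in this notebook.
sample_protocol = {
    "IdentificationModule": {
        "NCTId": "NCT04563832",
        "OrgStudyIdInfo": {"OrgStudyId": "APHP2020"},
        "Organization": {
            "OrgFullName": "Assistance Publique - Hôpitaux de Paris",
            "OrgClass": "OTHER",
        },
    }
}

row = organization_row(sample_protocol)
print(row)
```

Returning `None` for absent fields keeps the row shape stable, which would map naturally onto nullable SQL columns.
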
### Fields for Study
*e.g. one field per expected table column*


| Field name | [Data type](https://schema.org/DataType) | Belongs to another [entity](https://schema.org/docs/full.html) | Notes |
| ---------- |:----------------------------------------:|:--------------------------------------------------------------:|:-----:|
| BriefTitle | string | | |
| OfficialTitle | string | | |
| Acronym | string | | |
| StudyFirstSubmitDate | string | | |
| StudyFirstPostDateStruct.StudyFirstPostDate | string | | |
| OversightModule.OversightHasDMC | string | | |
| OversightModule.IsFDARegulatedDrug | string | | |
| OversightModule.IsFDARegulatedDevice | string | | |
| DescriptionModule.BriefSummary | string | | |
| DetailedDescription | string | | |
| EligibilityModule.HealthyVolunteers | string | | |
| EligibilityModule.Age | string | | |
| EligibilityModule.Gender | string | | |
| OutcomesModule | json | | Contains all study outcomes, each with a text description and time horizon |
| InterventionList | json | | Contains the study's interventions |
| DesignModule | json | | Contains the type of study, enrollment information, how it was designed (for example, randomised), the intervention model and a description |
| ContactsLocationsModule | json | | Personal information of the point of contact (POC) |

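Several field names in the table above use dotted paths such as `OversightModule.IsFDARegulatedDrug`, meaning a key nested inside a module. A minimal sketch of resolving such a path against a nested dictionary follows. The helper `get_path` is hypothetical, and the assumption that paths are rooted at `ProtocolSection` is mine; the sample values are taken from the `StatusModule` shown in the example response.

```python
from typing import Any


def get_path(record: dict, dotted: str, default: Any = None) -> Any:
    """Walk a nested dict along a dot-separated path.

    Returns `default` as soon as a key is missing or an intermediate
    value is not a dict, so absent modules do not raise.
    """
    current: Any = record
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Sample values copied from the example response in this notebook.
sample_protocol = {
    "StatusModule": {
        "OverallStatus": "Not yet recruiting",
        "StartDateStruct": {
            "StartDate": "October 2020",
            "StartDateType": "Anticipated",
        },
    }
}

start_date = get_path(sample_protocol, "StatusModule.StartDateStruct.StartDate")
missing = get_path(sample_protocol, "OversightModule.OversightHasDMC")
print(start_date, missing)
```

A helper like this lets the column list in the table double as the extraction spec: each dotted name is looked up directly, and studies that lack a module simply produce empty cells.
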
### Fields for MeSH terms
*e.g. one field per expected table column*


| Field name | [Data type](https://schema.org/DataType) | Belongs to another [entity](https://schema.org/docs/full.html) | Notes |
| ---------- |:----------------------------------------:|:--------------------------------------------------------------:|:-----:|
| ConditionMeshList | json | | Contains MeSH terms and their IDs |

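Since `ConditionMeshList` is stored as JSON, one option is to unpack it into one row per term so it can live in its own table. The sketch below assumes the structure seen in the example response (`ConditionMeshList` → `ConditionMesh` → items with `ConditionMeshId` and `ConditionMeshTerm`). The function name `mesh_rows` and the idea of keying rows by `NCTId` are illustrative assumptions, not an existing part of the pipeline.

```python
from typing import Any, Dict, List, Tuple


def mesh_rows(nct_id: str, condition_browse_module: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (study id, MeSH id, MeSH term) tuples for one study.

    A study without MeSH terms yields an empty list.
    """
    items = condition_browse_module.get("ConditionMeshList", {}).get("ConditionMesh", [])
    return [
        (nct_id, item["ConditionMeshId"], item["ConditionMeshTerm"])
        for item in items
    ]


# Sample values copied from the example response in this notebook.
sample_module = {
    "ConditionMeshList": {
        "ConditionMesh": [
            {"ConditionMeshId": "D000012141", "ConditionMeshTerm": "Respiratory Tract Infections"},
            {"ConditionMeshId": "D000009103", "ConditionMeshTerm": "Multiple Sclerosis"},
            {"ConditionMeshId": "D000012598", "ConditionMeshTerm": "Sclerosis"},
        ]
    }
}

rows = mesh_rows("NCT04563832", sample_module)
for r in rows:
    print(r)
```

Each tuple pairs a study with one MeSH term, which is the shape a many-to-many link table between Study and MeSH terms would need.
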
### Fields for Conditions
*e.g. one field per expected table column*


| Field name | [Data type](https://schema.org/DataType) | Belongs to another [entity](https://schema.org/docs/full.html) | Notes |
| ---------- |:----------------------------------------:|:--------------------------------------------------------------:|:-----:|
| ConditionAncestorList, ConditionBrowseLeafList, ConditionBrowseBranchList | json | | Contains keyword terms and their IDs. Not clear how they're linked in a hierarchy |

In [4]:
import sys

# Add the sibling collect directory to the import path so the local module can be imported
sys.path.append('../collect/')

from collect import collect_full_studies

In [7]:
# Fetch a single study: empty search expression, rank range 1 to 1
studies = collect_full_studies(expr='', min_rnk=1, max_rnk=1)

In [20]:
studies['FullStudiesResponse']['FullStudies'][0]['Study']['ProtocolSection']

{'IdentificationModule': {'NCTId': 'NCT04563832',
  'OrgStudyIdInfo': {'OrgStudyId': 'APHP2020'},
  'Organization': {'OrgFullName': 'Assistance Publique - Hôpitaux de Paris',
   'OrgClass': 'OTHER'},
  'BriefTitle': 'Self-administered Hyperinsufflation Chest Mobilization Randomized Study on the Risk of Low Respiratory Infection in Patients With Multiple Sclerosis With Sputum Capacity Deficit',
  'OfficialTitle': 'Self-administered Hyperinsufflation Chest Mobilization Randomized Study on the Risk of Low Respiratory Infection in Patients With Multiple Sclerosis With Sputum Capacity Deficit',
  'Acronym': 'MS-COUGH'},
 'StatusModule': {'StatusVerifiedDate': 'May 2020',
  'OverallStatus': 'Not yet recruiting',
  'ExpandedAccessInfo': {'HasExpandedAccess': 'No'},
  'StartDateStruct': {'StartDate': 'October 2020',
   'StartDateType': 'Anticipated'},
  'PrimaryCompletionDateStruct': {'PrimaryCompletionDate': 'October 2022',
   'PrimaryCompletionDateType': 'Anticipated'},
  'CompletionDateStru

In [29]:
studies['FullStudiesResponse']['FullStudies'][0]['Study']['DerivedSection']['ConditionBrowseModule']

{'ConditionMeshList': {'ConditionMesh': [{'ConditionMeshId': 'D000012141',
    'ConditionMeshTerm': 'Respiratory Tract Infections'},
   {'ConditionMeshId': 'D000009103',
    'ConditionMeshTerm': 'Multiple Sclerosis'},
   {'ConditionMeshId': 'D000012598', 'ConditionMeshTerm': 'Sclerosis'}]},
 'ConditionAncestorList': {'ConditionAncestor': [{'ConditionAncestorId': 'D000007239',
    'ConditionAncestorTerm': 'Infection'},
   {'ConditionAncestorId': 'D000010335',
    'ConditionAncestorTerm': 'Pathologic Processes'},
   {'ConditionAncestorId': 'D000020278',
    'ConditionAncestorTerm': 'Demyelinating Autoimmune Diseases, CNS'},
   {'ConditionAncestorId': 'D000020274',
    'ConditionAncestorTerm': 'Autoimmune Diseases of the Nervous System'},
   {'ConditionAncestorId': 'D000009422',
    'ConditionAncestorTerm': 'Nervous System Diseases'},
   {'ConditionAncestorId': 'D000003711',
    'ConditionAncestorTerm': 'Demyelinating Diseases'},
   {'ConditionAncestorId': 'D000001327',
    'ConditionAnce