# The CDISC Library API

There's no other way to put it - PDF is a terrible medium for data sharing.  There is a standard, but implementations of it vary widely.  Problems handling tables, paragraphs and similar structures make getting data out of a PDF a very manual (and thereby error-prone) process.

The CDISC Library exposes the core metadata via a RESTful Web API.  Access is available to members under fair use provisions, and also to eligible Open Source Contributors.  To request an account go to [CDISC Library](https://www.cdisc.org/cdisc-library) and click on `Request an account`.  Once you are set up you will be able to generate a token (a character string) that you send with each request to the API (this is primarily used to stop people abusing the service).

In [1]:
# as before, we import our requests library
import requests

# We use the python-dotenv library to merge in the CDISC_LIBRARY_API_TOKEN
#  - this reads from a formatted file and adds the values to the environment
from dotenv import load_dotenv
# this loads in the values from the .env file
load_dotenv()
import os

# prove it worked
print(f"Loaded token {os.environ['CDISC_LIBRARY_API_TOKEN'][:5]}...")

# Supply this globally
CDISC_API_URL = "https://library.cdisc.org/api" 

Loaded token ed516...


In [2]:
# the token we loaded needs to be added to the headers using the name 'api-key'

# if we don't pass the token, what happens?
unauth = requests.get(f"{CDISC_API_URL}/mdr/products")

print(f"Requesting {unauth.url} got status {unauth.status_code}")



Requesting https://library.cdisc.org/api/mdr/products got status 401


In [3]:
# we pass the `api-key` in the headers

response = requests.get(f"{CDISC_API_URL}/mdr/products", headers={'api-key': os.environ['CDISC_LIBRARY_API_TOKEN']})

print(f"Requesting {response.url} got status {response.status_code}")


Requesting https://library.cdisc.org/api/mdr/products got status 200
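
A missing token only shows up as a 401 once a request has been sent, so it can help to fail earlier with a clearer message. The sketch below is one way to do that; `auth_headers` and `TOKEN_VAR` are hypothetical names introduced here, and the only details taken from the cells above are the `api-key` header name and the `CDISC_LIBRARY_API_TOKEN` variable.

```python
import os

# name of the environment variable used in the cells above
TOKEN_VAR = "CDISC_LIBRARY_API_TOKEN"


def auth_headers(environ=os.environ):
    """Build the request headers, failing early if the token is missing."""
    token = environ.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(f"{TOKEN_VAR} is not set; check your .env file")
    return {"api-key": token}


# demonstrate with a stand-in environment rather than a real token
print(auth_headers({TOKEN_VAR: "not-a-real-token"}))
```

The returned dictionary can be passed as the `headers` argument of `requests.get`, exactly as the literal dictionary is above.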


In [4]:
# The request we made above was to the webservice that returns the list of products

# as we know the response is JSON we can access the model as a dictionary using the `json` method
results = response.json()

# What elements are in the result set?
print("Keys", results.keys())

Keys dict_keys(['_links'])


In [5]:
# Inspection of this object helps us find what we are looking for

links = results.get('_links')

print("Links:", links.keys())

Links: dict_keys(['data-analysis', 'data-collection', 'data-tabulation', 'draft-content', 'qrs', 'self', 'terminology'])


In [6]:
# the _links collection is a common pattern for representing where you can 'go' from a location

# self represents the current resource
print("self -> ", links['self']) 

# this particular resource has sets of related resources (organised by product group)
for product_group in links.keys():
    if product_group == "self":
        continue
    for product, details in links.get(product_group).get('_links').items():
        print(f"{product_group} -> {product} -> {details}")


self ->  {'href': '/mdr/products', 'title': 'CDISC Library Product List', 'type': 'CDISC Library Product List'}
data-analysis -> adam -> [{'href': '/mdr/adam/adam-2-1', 'title': 'Analysis Data Model Version 2.1', 'type': 'Foundational Model'}, {'href': '/mdr/adam/adam-adae-1-0', 'title': 'Analysis Data Model Data Structure for Adverse Event Analysis Version 1.0', 'type': 'Implementation Guide'}, {'href': '/mdr/adam/adam-occds-1-0', 'title': 'ADaM Structure for Occurrence Data (OCCDS) Version 1.0', 'type': 'Implementation Guide'}, {'href': '/mdr/adam/adam-tte-1-0', 'title': 'ADaM Basic Data Structure for Time-to-Event Analyses Version 1.0', 'type': 'Implementation Guide'}, {'href': '/mdr/adam/adamig-1-0', 'title': 'Analysis Data Model Implementation Guide Version 1.0', 'type': 'Implementation Guide'}, {'href': '/mdr/adam/adamig-1-1', 'title': 'Analysis Data Model Implementation Guide Version 1.1', 'type': 'Implementation Guide'}, {'href': '/mdr/adam/adamig-1-2', 'title': 'Analysis Data 
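
The nested loop above prints each product as a whole list of links. A small generator can flatten the structure so that each link is handled on its own. This is a sketch run against a hand-abbreviated copy of the response rather than a live call: `iter_links` is a hypothetical helper, and the `example-group` entry is made up to show that a relation may hold a single link (a dict) as well as a list.

```python
def iter_links(links):
    """Yield (group, name, link) for every link below the top-level groups."""
    for group, body in links.items():
        if group == "self":
            # 'self' is a plain link, not a group of products
            continue
        for name, target in body.get("_links", {}).items():
            # a relation holds either one link (a dict) or several (a list)
            targets = target if isinstance(target, list) else [target]
            for link in targets:
                yield group, name, link


# abbreviated sample, based on the output shown above
sample = {
    "self": {"href": "/mdr/products", "title": "CDISC Library Product List"},
    "data-analysis": {"_links": {"adam": [
        {"href": "/mdr/adam/adam-2-1", "title": "Analysis Data Model Version 2.1"},
        {"href": "/mdr/adam/adamig-1-1",
         "title": "Analysis Data Model Implementation Guide Version 1.1"},
    ]}},
    # hypothetical entry: a relation that holds a single link
    "example-group": {"_links": {"self": {"href": "/mdr/example", "title": "Example"}}},
}

for group, name, link in iter_links(sample):
    print(f"{group} -> {name} -> {link['href']}")
```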

In [7]:
# the href attribute is a value you can pick to resolve the entity
href = links["terminology"]["_links"]["self"]["href"]

response = requests.get(f"{CDISC_API_URL}{href}", headers={'api-key': os.environ['CDISC_LIBRARY_API_TOKEN']})

print(response.json())

{'_links': {'packages': [{'href': '/mdr/ct/packages/adamct-2014-09-26', 'title': 'ADaM Controlled Terminology Package 19 Effective 2014-09-26', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2015-12-18', 'title': 'ADaM Controlled Terminology Package 24 Effective 2015-12-18', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2016-03-25', 'title': 'ADaM Controlled Terminology Package 25 Effective 2016-03-25', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2016-09-30', 'title': 'ADaM Controlled Terminology Package 27 Effective 2016-09-30', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2016-12-16', 'title': 'ADaM Controlled Terminology Package 28 Effective 2016-12-16', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2017-03-31', 'title': 'ADaM Controlled Terminology Package 29 Effective 2017-03-31', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2017-09-29', 'title': 'ADaM Controlled Terminology Package 31 Effective 2017-09-29'

In [8]:
# you can reuse the client authentication details by creating a Session
from requests import Session
client = Session()
client.headers['api-key'] = os.getenv("CDISC_LIBRARY_API_TOKEN")

terminology = client.get(f"{CDISC_API_URL}{href}")
print(f"Packages: {terminology.json().get('_links').get('packages')}")

Packages: [{'href': '/mdr/ct/packages/adamct-2014-09-26', 'title': 'ADaM Controlled Terminology Package 19 Effective 2014-09-26', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2015-12-18', 'title': 'ADaM Controlled Terminology Package 24 Effective 2015-12-18', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2016-03-25', 'title': 'ADaM Controlled Terminology Package 25 Effective 2016-03-25', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2016-09-30', 'title': 'ADaM Controlled Terminology Package 27 Effective 2016-09-30', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2016-12-16', 'title': 'ADaM Controlled Terminology Package 28 Effective 2016-12-16', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2017-03-31', 'title': 'ADaM Controlled Terminology Package 29 Effective 2017-03-31', 'type': 'Terminology'}, {'href': '/mdr/ct/packages/adamct-2017-09-29', 'title': 'ADaM Controlled Terminology Package 31 Effective 2017-09-29', 'type': 'Ter

In [9]:
# let's request a particular standard (say SDTMIG version 3-2)

sdtm_ig_32 = client.get(f"{CDISC_API_URL}/mdr/sdtmig/3-2")

In [10]:
# decode the response
ig = sdtm_ig_32.json()

# inspect the response
print(ig.keys())
# this API call returns a class oriented structure
for ig_class in ig.get('classes'):
    # note, General Observations has no direct datasets, so we need to guard 
    if "datasets" in ig_class:
        # iterate over the datasets
        for dataset in ig_class.get('datasets'):
            # each dataset has a set of variables
            for dataset_variable in dataset.get('datasetVariables'):
                print(f"{ig_class.get('name')} -> {dataset.get('name')} -> [{dataset_variable.get('ordinal')}] {dataset_variable.get('name')}")
        

dict_keys(['_links', 'classes', 'description', 'effectiveDate', 'label', 'name', 'registrationStatus', 'source', 'version'])
Special-Purpose -> CO -> [1] STUDYID
Special-Purpose -> CO -> [2] DOMAIN
Special-Purpose -> CO -> [3] RDOMAIN
Special-Purpose -> CO -> [4] USUBJID
Special-Purpose -> CO -> [5] COSEQ
Special-Purpose -> CO -> [6] IDVAR
Special-Purpose -> CO -> [7] IDVARVAL
Special-Purpose -> CO -> [8] COREF
Special-Purpose -> CO -> [9] COVAL
Special-Purpose -> CO -> [10] COEVAL
Special-Purpose -> CO -> [11] CODTC
Special-Purpose -> DM -> [1] STUDYID
Special-Purpose -> DM -> [2] DOMAIN
Special-Purpose -> DM -> [3] USUBJID
Special-Purpose -> DM -> [4] SUBJID
Special-Purpose -> DM -> [5] RFSTDTC
Special-Purpose -> DM -> [6] RFENDTC
Special-Purpose -> DM -> [7] RFXSTDTC
Special-Purpose -> DM -> [8] RFXENDTC
Special-Purpose -> DM -> [9] RFICDTC
Special-Purpose -> DM -> [10] RFPENDTC
Special-Purpose -> DM -> [11] DTHDTC
Special-Purpose -> DM -> [12] DTHFL
Special-Purpose -> DM -> [13] SI
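
For anything beyond printing, it is often easier to flatten the class, dataset and variable hierarchy into a list of rows first. The sketch below does that against a small hand-built sample that mimics the shape of the response; `variable_rows` is a hypothetical helper, and the sample holds only a few of the variables printed above (the type of `ordinal` in the real response may differ).

```python
def variable_rows(ig):
    """Flatten an implementation guide into one row per dataset variable."""
    rows = []
    for ig_class in ig.get("classes", []):
        # classes without direct datasets simply contribute no rows
        for dataset in ig_class.get("datasets", []):
            for variable in dataset.get("datasetVariables", []):
                rows.append({
                    "class": ig_class.get("name"),
                    "dataset": dataset.get("name"),
                    "ordinal": variable.get("ordinal"),
                    "variable": variable.get("name"),
                })
    return rows


# hand-built sample in the shape of the response above
sample_ig = {
    "classes": [
        {"name": "General Observations"},  # no 'datasets' key
        {"name": "Special-Purpose", "datasets": [
            {"name": "CO", "datasetVariables": [
                {"ordinal": "1", "name": "STUDYID"},
                {"ordinal": "2", "name": "DOMAIN"},
            ]},
            {"name": "DM", "datasetVariables": [
                {"ordinal": "1", "name": "STUDYID"},
            ]},
        ]},
    ]
}

rows = variable_rows(sample_ig)
dm_variables = [row["variable"] for row in rows if row["dataset"] == "DM"]
print(len(rows), dm_variables)
```

Using `.get(..., [])` with an empty default replaces the explicit `if "datasets" in ig_class` guard from the loop above.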

In [11]:
# can go directly to the datasets
datasets = client.get(f"{CDISC_API_URL}/mdr/sdtmig/3-2/datasets").json()



In [12]:
dm = None
# the href attribute is a link that you can reuse with the BASE URI
for dataset_link in datasets.get('_links').get('datasets'):
    if dataset_link.get('title') == 'Demographics':
        dm = client.get(f"{CDISC_API_URL}{dataset_link.get('href')}").json()
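
The lookup above keeps scanning after it has found a match, and it leaves `dm` as `None` if the title is not present. A sketch of a slightly more defensive version follows; `find_link` and `resolve` are hypothetical helpers, and the sample links are hand-written in the shape of the `_links` entries (the `CO` entry in particular is illustrative).

```python
CDISC_API_URL = "https://library.cdisc.org/api"


def find_link(links, title):
    """Return the first link whose title matches, or None if there is none."""
    return next((link for link in links if link.get("title") == title), None)


def resolve(href, base=CDISC_API_URL):
    """Join a relative href from a `_links` entry onto the API base URL."""
    return f"{base.rstrip('/')}/{href.lstrip('/')}"


# hand-written sample in the shape of datasets['_links']['datasets']
dataset_links = [
    {"href": "/mdr/sdtmig/3-2/datasets/CO", "title": "Comments", "type": "SDTM Dataset"},
    {"href": "/mdr/sdtmig/3-2/datasets/DM", "title": "Demographics", "type": "SDTM Dataset"},
]

link = find_link(dataset_links, "Demographics")
if link is None:
    raise LookupError("No dataset titled 'Demographics'")
print(resolve(link["href"]))
```

The resolved URL can then be passed to `client.get(...)` as in the cell above.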



In [13]:
print(dm.get('name'))
print(dm.get('description'))
print(dm.get('_links'))

# pull the variables for the dataset
for dv in dm.get('datasetVariables'):
    print(f"{dm.get('name')} -> {dv.get('name')}")


DM
A special-purpose domain that includes a set of essential standard variables that describe each subject in a clinical study. It is the parent domain for all other observations for human clinical subjects. (Source: CDISC Controlled Terminology, DOMAIN, C49572, 2018-06-29)
{'modelDataset': {'href': '/mdr/sdtm/1-4/datasets/DM', 'title': 'Demographics', 'type': 'SDTM Dataset'}, 'parentClass': {'href': '/mdr/sdtmig/3-2/classes/SpecialPurpose', 'title': 'Special-Purpose Datasets', 'type': 'Class'}, 'parentProduct': {'href': '/mdr/sdtmig/3-2', 'title': 'Study Data Tabulation Model Implementation Guide: Human Clinical Trials Version 3.2 (Final)', 'type': 'Implementation Guide'}, 'priorVersion': {'href': '/mdr/sdtmig/3-1-3/datasets/DM', 'title': 'Demographics', 'type': 'SDTM Dataset'}, 'self': {'href': '/mdr/sdtmig/3-2/datasets/DM', 'title': 'Demographics', 'type': 'SDTM Dataset'}}
DM -> STUDYID
DM -> DOMAIN
DM -> USUBJID
DM -> SUBJID
DM -> RFSTDTC
DM -> RFENDTC
DM -> RFXSTDTC
DM -> RFXENDTC

In [14]:
import pprint
# The dataset variables also have accessible attributes

# pull the variables for the dataset
for dv in dm.get('datasetVariables'):
    if dv.get('name') == 'SEX':
        # pprint.pprint prints the structure itself (it returns None, so don't wrap it in print)
        pprint.pprint(dv)

{'_links': {'codelist': [{'href': '/mdr/root/ct/sdtmct/codelists/C66731',
                          'title': 'Version-agnostic anchor resource for '
                                   'codelist C66731',
                          'type': 'Root Value Domain'}],
            'modelDatasetVariable': {'href': '/mdr/sdtm/1-4/datasets/DM/variables/SEX',
                                     'title': 'Sex',
                                     'type': 'SDTM Dataset Variable'},
            'parentDataset': {'href': '/mdr/sdtmig/3-2/datasets/DM',
                              'title': 'Demographics',
                              'type': 'SDTM Dataset'},
            'parentProduct': {'href': '/mdr/sdtmig/3-2',
                              'title': 'Study Data Tabulation Model '
                                       'Implementation Guide: Human Clinical '
                                       'Trials Version 3.2 (Final)',
                              'type': 'Implementation Guide'},
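
The `codelist` entry under `_links` is what ties a variable to its controlled terminology. The sketch below pulls the codelist identifiers out of a variable, using the `SEX` links shown above as sample data. `codelist_ids` is a hypothetical helper, and it assumes that a variable without controlled terminology has no `codelist` entry.

```python
def codelist_ids(variable):
    """Return the codelist identifiers linked from a dataset variable."""
    links = variable.get("_links", {}).get("codelist", [])
    # the identifier is the last path segment of the href
    return [link["href"].rsplit("/", 1)[-1] for link in links]


# sample copied from the SEX output above (other links omitted)
sex = {
    "name": "SEX",
    "_links": {
        "codelist": [{
            "href": "/mdr/root/ct/sdtmct/codelists/C66731",
            "title": "Version-agnostic anchor resource for codelist C66731",
            "type": "Root Value Domain",
        }],
    },
}

# assumption: a variable with no controlled terminology has no 'codelist' link
no_terminology = {"name": "STUDYID", "_links": {}}

print(codelist_ids(sex))
print(codelist_ids(no_terminology))
```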
           

# API Specifications

The APIs themselves are laid out sensibly; we go from context (e.g. SDTM IG) to version (e.g. 3-3) to dataset (e.g. DM) to variable (e.g. SEX).  The API is documented using a standard called OpenAPI (previously known as Swagger), which is a format for publishing information about an API. That information includes:
* Routes (where to get the data)
* Methods (how to access the data, usually using HTTP Verbs)
* Formats (what parameters are required)
* Structures (how the data looks)
* Responses (what outcomes can you expect)
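
To make the list above concrete, here is a small sketch that reads those pieces out of an OpenAPI document held as a Python dictionary. The fragment is hand-written for illustration and is not the published CDISC Library specification: the route is modelled on the dataset hrefs seen earlier, and the parameter and response descriptions are assumptions. `describe` is a hypothetical helper.

```python
# HTTP verbs that may appear as keys under an OpenAPI path item
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

# hand-written fragment, NOT the real CDISC Library specification
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/mdr/sdtmig/{version}/datasets/{dataset}": {
            "get": {
                "parameters": [
                    {"name": "version", "in": "path", "required": True},
                    {"name": "dataset", "in": "path", "required": True},
                ],
                "responses": {
                    "200": {"description": "The requested dataset"},
                    "401": {"description": "Missing or invalid api-key"},
                },
            }
        }
    },
}


def describe(spec):
    """Return (method, route, parameter names, response codes) per operation."""
    rows = []
    for route, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as 'summary'
            params = [p["name"] for p in operation.get("parameters", [])]
            codes = sorted(operation.get("responses", {}))
            rows.append((method.upper(), route, params, codes))
    return rows


for method, route, params, codes in describe(spec):
    print(method, route, params, codes)
```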

The use of standard representations underpins the developer view of the documentation ([here](https://www.cdisc.org/cdisc-library/api-documentation)).

Using a standard OpenAPI based webservice makes it easier for developers to access and understand what the API exposes.  You can use tools to automatically generate clients so code like the following is possible (this was built using [openapi-python-client](https://github.com/openapi-generators/openapi-python-client)):

```python
import os

# create a module and then use the module; the generated package exposes
# the client class and one module per endpoint
from cdisc_library_api_client import AuthenticatedClient
from cdisc_library_api_client.api.sdtm.ig.sdtmig_get_dataset import sync_detailed

# read the token before building the client (same variable as used earlier)
token = os.getenv("CDISC_LIBRARY_API_TOKEN")

# URL construction and authentication are handled transparently
client = AuthenticatedClient(base_url="https://library.cdisc.org/api", token=token)

# Specifications
version = "3-3"
domain = "DM"

# make the call (sync_detailed is a blocking call, so it is not awaited)
mm = sync_detailed(client=client, version=version, dataset=domain)
dataset = mm.parsed

# dataset is an instance of an SDTMigDataset
print(f"Dataset: {dataset.name} ({dataset.label})")
```

