# Introduction

The Integrated Canine Data Commons (ICDC) provides multiple avenues for users to access and use data. In addition to the [ICDC Website](https://caninecommons.cancer.gov/#/), there are REST and GraphQL APIs. In particular, the GraphQL API allows users to discover what is in the ICDC database and to write custom queries that return exactly the data they are interested in.

In this notebook we go through some simple examples that demonstrate how to access data through the GraphQL API, including data that are not available in the web interface.

In [14]:
# pprint is part of the Python standard library and does not need to be installed
! pip install requests
! pip install pandas
! pip install plotly



In [15]:
import requests
import icdcQueries as icdc #This file contains the GraphQL queries used in this demonstration.
import pprint
import pandas as pd
import plotly
import plotly.express as px
import plotly.graph_objects as go

icdc.init()

In [16]:
#General query interface
def runQuery(query):
    #POST a GraphQL query to the ICDC endpoint and return the decoded JSON response
    
    endpoint = "https://caninecommons.cancer.gov/v1/graphql/"
    
    response = requests.post(endpoint, json={'query': query})
    
    if response.status_code == 200:
        return response.json()
    else:
        raise Exception("Query failed with status code {}. {}".format(response.status_code, query))
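
One caveat worth sketching here: GraphQL servers commonly report query-level problems (for example, an unknown field) in an `errors` array inside the JSON body, often alongside an HTTP 200 status, so a status check alone can miss them. The helper below is a hypothetical addition and not part of the original notebook; `check_graphql_payload` and both sample payloads are made up for illustration.

```python
def check_graphql_payload(payload):
    """Return the 'data' part of a decoded GraphQL response, or raise if it reports errors.

    `payload` is the decoded JSON body, e.g. the value returned by runQuery().
    """
    errors = payload.get("errors")
    if errors:
        # Each error is normally an object with a human-readable 'message' key.
        messages = "; ".join(
            str(e.get("message", e)) if isinstance(e, dict) else str(e)
            for e in errors
        )
        raise RuntimeError("GraphQL query returned errors: " + messages)
    return payload.get("data")


# Illustrative payloads (invented for this sketch, not real ICDC responses).
ok_payload = {"data": {"case": [{"case_id": "EXAMPLE-001"}]}}
bad_payload = {"errors": [{"message": "Unknown field 'foo'"}], "data": None}

print(check_graphql_payload(ok_payload))
try:
    check_graphql_payload(bad_payload)
except RuntimeError as err:
    print(err)
```

Wrapping the result of `runQuery` in a check like this would surface a malformed query as an exception instead of a later `KeyError` or `TypeError` when the data is flattened.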

In [17]:
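#Flatten a JSON object into a table
def flattenJSON(jsondata):
    #Keys are built by joining each dict key or list index on the path with '_',
    #so every flattened key ends with a trailing underscore (e.g. 'case_id_')
    flatdata = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            flatdata[name] = x

    flatten(jsondata)
    return flatdata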

In [18]:
def getDataframe(query):
    #Provide a GraphQL query and get a pandas dataframe back.  Assumes a query based on case
    
    #Run the GraphQL query
    jsondata = runQuery(query)
    
    #Flatten the JSON and push it into a dataframe
    finaldata = pd.DataFrame(flattenJSON(case) for case in jsondata['data']['case'])
    
    return finaldata

In [29]:
def scatterPlot(xaxis, yaxis):
    #Use Plotly to create a basic scatter plot
    
    dataframe = getDataframe(icdc.demo_query)
    
    figure = px.scatter(dataframe, x=xaxis, y=yaxis)
    figure.show()

In [32]:
def niceTable():
    #Use Plotly to create a presentation quality table
    
    icdc.init()
    
    df = getDataframe(icdc.table_demo)
    
    figure = go.Figure(data=[go.Table(header=dict(values=list(df.columns)), 
                                      cells=dict(values=[df[k].tolist() for k in df.columns]))])
    
    figure.show()

In [21]:
def queryList():
    # Demonstrate how to query for all fields in ICDC
    
    data = runQuery(icdc.all_queries)
    pprint.pprint(data)

In [25]:
def dataframePrint():
    
    dataframe = getDataframe(icdc.demo_query)
    pprint.pprint(dataframe)

# GraphQL Introspection
GraphQL provides a service called **introspection** that allows users to ask the system what fields are available to query. This service allows users to browse the database schema and construct queries that return the information they're interested in.

The result below shows all of the fields in ICDC that users can access.

In [26]:
queryList()

{'data': {'__schema': {'queryType': {'fields': [{'description': None,
                                                 'name': 'AgeCaseCount'},
                                                {'description': None,
                                                 'name': 'BreedCaseCount'},
                                                {'description': None,
                                                 'name': 'CaseDetail'},
                                                {'description': None,
                                                 'name': 'CaseOverview'},
                                                {'description': None,
                                                 'name': 'DiagnosisCaseCount'},
                                                {'description': None,
                                                 'name': 'DiseaseSiteCaseCount'},
                                                {'description': None,
                                                 'na
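
The contents of `icdc.all_queries` are not shown in this notebook. Judging from the shape of the output above, it is probably close to the standard GraphQL introspection query sketched below. Treat the query text and the helper name `field_names` as assumptions. The sample response is a short fragment copied from the output above so that the sketch runs without a network connection.

```python
# Standard GraphQL introspection query for the fields on the root query type.
# Assumption: icdc.all_queries looks something like this; the original is not shown.
INTROSPECTION_QUERY = """
{
  __schema {
    queryType {
      fields {
        name
        description
      }
    }
  }
}
"""


def field_names(response):
    """Pull the query field names out of an introspection response."""
    fields = response["data"]["__schema"]["queryType"]["fields"]
    return [field["name"] for field in fields]


# A fragment of the response printed above, used here so the sketch runs offline.
sample_response = {
    "data": {"__schema": {"queryType": {"fields": [
        {"description": None, "name": "AgeCaseCount"},
        {"description": None, "name": "BreedCaseCount"},
        {"description": None, "name": "CaseDetail"},
    ]}}}
}

print(field_names(sample_response))
```

With a live connection, passing the result of `runQuery(INTROSPECTION_QUERY)` to `field_names` should yield the full list of queryable fields, provided the server allows introspection.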

# Getting data from the API

Even if data aren't presented in the ICDC graphical interface, they are available via the API for users to access and analyze. In the example below, the ICDC API was queried for a variety of information about each case in ICDC, such as the cohort it was assigned to, the diagnosis, and any associated files.

Since the ICDC API returns data in JSON format, it is easy to load the data into commonly used analytical tools. In the example below, data about each case were retrieved from ICDC and put into a Pandas dataframe, a commonly used Python data management tool.

In [27]:
dataframePrint()

                 case_id_ cohort_cohort_description_ cohort_cohort_dose_  \
0            COTC007B0201     NSC 725776; 3mg/m2/day          3mg/m2/day   
1            COTC007B0501     NSC 725776; 3mg/m2/day          3mg/m2/day   
2            COTC007B0901     NSC 743400; 8mg/m2/day          8mg/m2/day   
3            COTC007B0502     NSC 725776; 3mg/m2/day          3mg/m2/day   
4            COTC007B0503     NSC 725776; 3mg/m2/day          3mg/m2/day   
..                    ...                        ...                 ...   
139  NCATS-COP01CCB070020        Pulmonary Neoplasms                       
140  NCATS-COP01CCB070034        Pulmonary Neoplasms                       
141  NCATS-COP01CCB070102                   Melanoma                       
142  NCATS-COP01CCB080012                   Melanoma                       
143  NCATS-COP01CCB080018                   Melanoma                       

    diagnoses_0_stage_of_disease_ diagnoses_0_concurrent_disease_  \
0                 
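
The column names above follow from how the nested JSON is flattened: each dictionary key and list index along the path is joined with an underscore, and a trailing underscore remains. Below is a minimal, self-contained sketch of the same idea applied to a made-up record. The values are illustrative, not ICDC data, and `flatten_record` is a hypothetical name for a compact variant of `flattenJSON`.

```python
def flatten_record(value, prefix=""):
    """Flatten nested dicts/lists into one dict keyed by the underscore-joined path."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_record(child, prefix + key + "_"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_record(child, prefix + str(index) + "_"))
    else:
        # Leaf value: the accumulated path (with its trailing '_') becomes the key.
        flat[prefix] = value
    return flat


# Made-up record shaped like one entry of a case query result.
record = {
    "case_id": "EXAMPLE-001",
    "cohort": {"cohort_description": "Example cohort"},
    "diagnoses": [{"stage_of_disease": "II"}, {"stage_of_disease": "III"}],
}

for column, value in flatten_record(record).items():
    print(column, "=", value)
```

This is why the axis names passed to `scatterPlot` end in an underscore, and why a case with several diagnoses produces separate `diagnoses_0_` and `diagnoses_1_` columns.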

# Analyzing data

Users can combine common tools such as Pandas and Plotly to graph data retrieved from the ICDC API. While this example uses Python, GraphQL is language agnostic and users can access the ICDC API using the language of their choice.

In [30]:
xaxis = "demographic_breed_"
yaxis = "diagnoses_0_primary_disease_site_"
scatterPlot( xaxis, yaxis)

Similarly, these tools can be used to produce presentation-ready tables of selected data.

In [33]:
niceTable()
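
Because the result is an ordinary dataframe, tabular summaries are available as well as plots. As a sketch, the snippet below counts cases per breed and disease site with `pandas.crosstab`. The rows are invented for illustration. With real data the dataframe would come from `getDataframe`, and the column names are the ones used for the scatter plot above.

```python
import pandas as pd

# Invented rows standing in for the dataframe returned by getDataframe().
cases = pd.DataFrame(
    {
        "demographic_breed_": ["Beagle", "Beagle", "Boxer", "Mixed Breed"],
        "diagnoses_0_primary_disease_site_": ["Lung", "Skin", "Lung", "Lung"],
    }
)

# Rows are breeds, columns are disease sites, cells are case counts.
counts = pd.crosstab(
    cases["demographic_breed_"], cases["diagnoses_0_primary_disease_site_"]
)
print(counts)
```

A count table like this can be a useful check before plotting, since categorical scatter plots draw repeated breed and site combinations on top of one another.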