# PKDB-REST API
This document provides examples of querying data from PK-DB via the REST API. The examples use the Python `requests` package to make the web service requests.

The complete API documentation is available from https://pk-db.com/api/v1/swagger/.

For questions and information please contact konigmatt@googlemail.com

In [60]:
# base URL of the API; a local development server is used here,
# the public instance is available at https://pk-db.com/api/v1
base_url = "http://0.0.0.0:8000/api/v1"

In [61]:
import requests
from requests import Response
from pprint import pprint
import pandas as pd


def json_print(r: Response):
    """Simple print for JSON content of response."""
    json = r.json()
    pprint(json, sort_dicts=False)
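
The examples in this document call `r.json()` directly and do not check the HTTP status. As a minimal sketch of a more defensive variant, the hypothetical helper `get_json` below (not part of PK-DB or `requests`) raises on error status codes and sets a timeout. It assumes it is given a `requests.Session` or any object with a compatible `get` method.

```python
from typing import Any, Optional


def get_json(session: Any, url: str, params: Optional[dict] = None,
             timeout: float = 30.0) -> Any:
    """Return the decoded JSON body, raising on HTTP error status codes."""
    response = session.get(url, params=params, timeout=timeout)
    # requests raises requests.HTTPError here for 4xx/5xx responses
    response.raise_for_status()
    return response.json()


# Usage with the requests package (not executed here):
#   import requests
#   stats = get_json(requests.Session(), f"{base_url}/statistics/")
```

Passing the session as an argument keeps the helper easy to exercise without network access.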

## Statistics
The `/statistics/` endpoint provides a basic overview of the content of PK-DB, consisting of counts and version information.

To try the query in your browser use  
<a href="https://pk-db.com/api/v1/statistics/?format=json" target="_blank">https://pk-db.com/api/v1/statistics/?format=json</a>

In [72]:
# query endpoint and print results
r = requests.get(f'{base_url}/statistics/')
json_print(r)

{'version': '0.9.2a3',
 'study_count': 507,
 'reference_count': 507,
 'group_count': 1447,
 'individual_count': 6212,
 'intervention_count': 1391,
 'output_count': 72068,
 'output_calculated_count': 11651,
 'timecourse_count': 3036,
 'scatter_count': 37}
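
The response mixes the version string with the counts. As a small sketch, the two can be separated by selecting the keys ending in `_count`. The dictionary below is copied from the output above so that the snippet runs without a server.

```python
# sample response copied from the output above
stats = {
    "version": "0.9.2a3",
    "study_count": 507,
    "reference_count": 507,
    "group_count": 1447,
    "individual_count": 6212,
    "intervention_count": 1391,
    "output_count": 72068,
    "output_calculated_count": 11651,
    "timecourse_count": 3036,
    "scatter_count": 37,
}

version = stats["version"]
# keep only the '*_count' entries and drop the suffix for readability
suffix = "_count"
counts = {
    key[: -len(suffix)]: value
    for key, value in stats.items()
    if key.endswith(suffix)
}

print(f"PK-DB version: {version}")
# list the counts from largest to smallest
for name, value in sorted(counts.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>18}: {value}")
```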


## Info nodes
Information in PK-DB is organized as info nodes. Each info node encodes the meta-information for a given field, such as description, synonyms, annotations and database cross-references. The information in the info nodes can be used to map data to other databases.

### Get info node information
Information on an info node can be retrieved by its `sid` from the `/info_nodes/` endpoint. An overview of the existing info nodes is available from the info nodes tab at https://pk-db.com/curation.

In the following example we query the information for the substance `caffeine` with the `sid=caf`.

To try the query in your browser use  
<a href="https://pk-db.com/api/v1/info_nodes/caf/?format=json" target="_blank">https://pk-db.com/api/v1/info_nodes/caf/?format=json</a>

In [62]:
# query caffeine info_node
r = requests.get(f'{base_url}/info_nodes/caf/')
json_print(r)

{'sid': 'caf',
 'name': 'caffeine',
 'label': 'caffeine',
 'deprecated': False,
 'ntype': 'substance',
 'dtype': 'undefined',
 'description': 'A methylxanthine alkaloid found in the seeds, nuts, or leaves '
                'of a number of plants native to South America and East Asia '
                'that is structurally related to adenosine and acts primarily '
                'as an adenosine receptor antagonist with psychotropic and '
                'anti-inflammatory activities.',
 'synonyms': ['1,3,7-TMX',
              '1,3,7-Trimethylxanthine',
              '1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione',
              '1,3,7-trimethylpurine-2,6-dione',
              '1,3,7-trimethylxanthine',
              '1-methyltheobromine',
              '137MX',
              '3,7-Dihydro-1,3,7-trimethyl-1H-purin-2,6-dion',
              '3,7-Dihydro-1,3,7-trimethyl-1H-purine-2,6-dione',
              '7-methyltheophylline',
              'CAF',
              'CAFFEINE',
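
One way to use such a record for mapping external names onto PK-DB identifiers is a case-insensitive lookup over `name`, `label` and `synonyms`. The following is a minimal sketch under that assumption. `build_synonym_index` is a hypothetical helper, and the sample record is an abbreviated copy of the caffeine info node shown above.

```python
from typing import Dict, Iterable


def build_synonym_index(info_nodes: Iterable[dict]) -> Dict[str, str]:
    """Map every lower-cased name, label and synonym to the info node sid."""
    index: Dict[str, str] = {}
    for node in info_nodes:
        terms = [node["name"], node["label"], *node.get("synonyms", [])]
        for term in terms:
            # setdefault keeps the first sid seen if two nodes share a term
            index.setdefault(term.lower(), node["sid"])
    return index


# abbreviated copy of the caffeine info node shown above
caffeine = {
    "sid": "caf",
    "name": "caffeine",
    "label": "caffeine",
    "synonyms": ["1,3,7-TMX", "1,3,7-Trimethylxanthine", "137MX", "CAF", "CAFFEINE"],
}

index = build_synonym_index([caffeine])
print(index["1,3,7-trimethylxanthine"])  # caf
```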
            

### Search info node
Info nodes can be searched via the `search` argument of the `/info_nodes/` endpoint.

In the following example, info nodes containing `caffeine` are searched. The results are paginated; if more than a single page of results exists, the results from multiple pages have to be combined.
We parse the JSON response into a pandas DataFrame and display `sid`, `name`, `label` and `description` for the top 10 results.

To try the query in your browser use  
<a href="https://pk-db.com/api/v1/info_nodes/?search=caffeine&format=json" target="_blank">https://pk-db.com/api/v1/info_nodes/?search=caffeine&format=json</a>

In [64]:
# query info nodes about caffeine
r = requests.get(f'{base_url}/info_nodes/?search=caffeine')
json = r.json()

# The 'data' key contains all the response data consisting of count and actual data
count = json["data"]["count"]
print(f"Number of info nodes on page: {count}")

# conversion of result data to a pandas DataFrame
data = json["data"]["data"]
df = pd.DataFrame.from_dict(data)

# printing selected columns
df[["sid", "name", "label", "description"]].head(10)

Number of info nodes on page: 41


| | sid | name | label | description |
|---|---|---|---|---|
| 0 | caf | caffeine | caffeine | A methylxanthine alkaloid found in the seeds, ... |
| 1 | caffeine-citrate | caffeine citrate | caffeine citrate | Commercial citrate of caffeine, though not a d... |
| 2 | caffeine-monohydrate | caffeine monohydrate | caffeine monohydrate | Caffeine monohydrate. |
| 3 | 17u | 17U | 17U | Metabolite of caffeine. |
| 4 | px | paraxanthine | paraxanthine | A dimethylxanthine having the two methyl group... |
| 5 | tp | theophylline | theophylline | A natural alkaloid derivative of xanthine isol... |
| 6 | 137mu | 137MU | 137MU | Metabolite of caffeine. |
| 7 | 137tmu | 137TMU | 137TMU | Metabolite of caffeine. |
| 8 | 13dmu | 13DMU | 13DMU | Metabolite of caffeine. |
| 9 | 13mu | 13MU | 13MU | Metabolite of caffeine. |
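
Search terms that contain spaces or special characters should be URL-encoded rather than pasted into an f-string. The sketch below builds the search URL with the standard library. `info_node_search_url` is a hypothetical helper name, and the public base URL is used only for illustration.

```python
from urllib.parse import urlencode


def info_node_search_url(base_url: str, term: str) -> str:
    """Build the info node search URL with a properly encoded search term."""
    query = urlencode({"search": term, "format": "json"})
    return f"{base_url}/info_nodes/?{query}"


print(info_node_search_url("https://pk-db.com/api/v1", "caffeine citrate"))
# https://pk-db.com/api/v1/info_nodes/?search=caffeine+citrate&format=json
```

With `requests`, the same encoding is applied automatically when the arguments are passed as `params={"search": term}` to `requests.get`.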


### Get all info nodes
To retrieve all available info nodes use the `/info_nodes/` endpoint.

To try the query in your browser use  
<a href="https://pk-db.com/api/v1/info_nodes/?format=json" target="_blank">https://pk-db.com/api/v1/info_nodes/?format=json</a>

In [65]:
r = requests.get(f'{base_url}/info_nodes/')
json = r.json()
print(f"Number of info nodes: {json['data']['count']}")

Number of info nodes: 1030


To access further pages of a paginated result use the `page` argument.

For instance, to access page 2 use
<a href="https://pk-db.com/api/v1/info_nodes/?page=2&format=json" target="_blank">https://pk-db.com/api/v1/info_nodes/?page=2&format=json</a>
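
Combining the pages can be sketched as a loop over the `page` argument. The helper `fetch_all` below is hypothetical. It assumes that each page has the layout `{"data": {"count": ..., "data": [...]}}` seen in the examples above and that `count` is the total number of entries across all pages, which should be verified against the API documentation.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int], dict], max_pages: int = 1000) -> List[dict]:
    """Collect the entries of all pages.

    `fetch_page(page)` must return the decoded JSON of one page, assumed to
    have the layout {"data": {"count": <total>, "data": [<entries>]}}.
    """
    results: List[dict] = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)["data"]
        entries = payload["data"]
        results.extend(entries)
        # stop when everything is collected or a page comes back empty
        if not entries or len(results) >= payload["count"]:
            break
    return results


# Usage with requests (not executed here):
#   nodes = fetch_all(
#       lambda page: requests.get(f"{base_url}/info_nodes/", params={"page": page}).json()
#   )
```

The page fetcher is passed in as a callable so that the loop can be tried with canned data before pointing it at a server.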

## Search and filter data
The `/filter/` endpoint is the main endpoint to search and filter data. The endpoint returns a `uuid` to access the results and an overview of the counts. The `studies__*`, `groups__*`, `individuals__*`, ... arguments allow searching and filtering on the respective information of the studies. These arguments correspond to the search flags in the web search.

In the following example we filter the information for the study with the name `Abernethy1982`. Importantly, the `uuid` is not permanent. To run the following queries, first request a current `uuid` from the `/filter/` endpoint.

To try the query in your browser use  
<a href="https://pk-db.com/api/v1/filter/?studies__name=Abernethy1982&format=json" target="_blank">https://pk-db.com/api/v1/filter/?studies__name=Abernethy1982&format=json</a>

In [69]:
r = requests.get(f'{base_url}/filter/?studies__name=Abernethy1982')
json_print(r)
uuid = r.json()['uuid']
print(f"\nuuid: {uuid}")

{'uuid': '5873b5e3-7a48-4288-80b4-553f721e3a51',
 'studies': 1,
 'groups': 4,
 'individuals': 46,
 'interventions': 1,
 'outputs': 147,
 'timecourses': 4,
 'scatter': 0}

uuid: 5873b5e3-7a48-4288-80b4-553f721e3a51


### Accessing data for search query 
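The `groups`, `individuals`, `interventions`, `outputs`, `timecourses` and `scatters` can now be loaded using the `uuid`. Timecourses and scatters are retrieved from the `/subsets/` endpoint via the `data_type` argument.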

In [75]:
# query information via uuid
for endpoint in ["groups", "individuals", "interventions", "outputs"]:
    url = f"{base_url}/{endpoint}/?uuid={uuid}&format=json"
    print(url)
    r = requests.get(url)
    count = r.json()["data"]["count"]
    print(f"{endpoint}: {count}")
    
# query timecourses and scatters
for data_type in ["timecourse", "scatter"]:
    url = f"{base_url}/subsets/?data_type={data_type}&uuid={uuid}&format=json"
    print(url)
    r = requests.get(url)
    count = r.json()["data"]["count"]
    print(f"{data_type}s: {count}")


http://0.0.0.0:8000/api/v1/groups/?uuid=5873b5e3-7a48-4288-80b4-553f721e3a51&format=json
groups: 4
http://0.0.0.0:8000/api/v1/individuals/?uuid=5873b5e3-7a48-4288-80b4-553f721e3a51&format=json
individuals: 46
http://0.0.0.0:8000/api/v1/interventions/?uuid=5873b5e3-7a48-4288-80b4-553f721e3a51&format=json
interventions: 1
http://0.0.0.0:8000/api/v1/outputs/?uuid=5873b5e3-7a48-4288-80b4-553f721e3a51&format=json
outputs: 147
http://0.0.0.0:8000/api/v1/subsets/?data_type=timecourse&uuid=5873b5e3-7a48-4288-80b4-553f721e3a51&format=json
timecourses: 4
http://0.0.0.0:8000/api/v1/subsets/?data_type=scatter&uuid=5873b5e3-7a48-4288-80b4-553f721e3a51&format=json
scatters: 0
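
The per-endpoint counts should agree with the overview returned by `/filter/`. A quick consistency check can be sketched as below, using the values printed above as sample data. Note that the overview uses the key `scatter` while the loop prints `scatters`.

```python
# overview returned by /filter/ (copied from the output above, without the uuid)
filter_summary = {
    "studies": 1,
    "groups": 4,
    "individuals": 46,
    "interventions": 1,
    "outputs": 147,
    "timecourses": 4,
    "scatter": 0,
}

# counts reported by the individual endpoints (copied from the output above)
endpoint_counts = {
    "groups": 4,
    "individuals": 46,
    "interventions": 1,
    "outputs": 147,
    "timecourses": 4,
    "scatters": 0,
}

# the overview uses 'scatter' where the loop above prints 'scatters'
key_map = {"scatters": "scatter"}

# collect (endpoint count, overview count) pairs that disagree
mismatches = {
    name: (count, filter_summary[key_map.get(name, name)])
    for name, count in endpoint_counts.items()
    if count != filter_summary[key_map.get(name, name)]
}
print(mismatches or "all counts agree")  # all counts agree
```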


### Download data
Data can be downloaded as a zip archive by adding the `download` argument to the `/filter/` query.
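
The archive handling can be tried offline before issuing the request. The sketch below reads a CSV member directly from an in-memory zip archive without extracting it to disk. `read_csv_member` is a hypothetical helper, the tiny archive is a stand-in for the bytes in `r.content`, and UTF-8 encoding of the CSV files is an assumption.

```python
import csv
import io
import zipfile


def read_csv_member(archive_bytes: bytes, member: str) -> list:
    """Return the rows of one CSV file inside a zip archive as dictionaries."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(member) as handle:
            # decode the binary member stream as text (UTF-8 is an assumption)
            text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# build a tiny stand-in archive; the real bytes would come from `r.content`
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("outputs.csv", "study_name,output_pk\nAbernethy1982,29\n")

rows = read_csv_member(buffer.getvalue(), "outputs.csv")
print(rows[0]["study_name"])  # Abernethy1982
```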

In [86]:
import io
import os
import tempfile
import zipfile

import requests

url = f"{base_url}/filter/?studies__name=Abernethy1982&download=true"
print(url)

r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))

with tempfile.TemporaryDirectory() as tmpdir:
    print('created temporary directory', tmpdir)
    z.extractall(tmpdir)
    
    # zip contains information on studies, groups, individuals, interventions, outputs, timecourses, scatters
    print(os.listdir(tmpdir))
    
    # loading the outputs as DataFrame
    df = pd.read_csv(os.path.join(tmpdir, "outputs.csv"), index_col=0)
    print(df)
    

http://0.0.0.0:8000/api/v1/filter/?studies__name=Abernethy1982&download=true
created temporary directory /tmp/tmpj0_b0owr
['studies.csv', 'scatter.csv', 'timecourses.csv', 'groups.csv', 'outputs.csv', 'interventions.csv', 'individuals.csv']
     study_sid     study_name  output_pk  intervention_pk  group_pk  \
0    PKDB00198  Abernethy1982         29                2       6.0   
1    PKDB00198  Abernethy1982         31                2       4.0   
2    PKDB00198  Abernethy1982         23                2       3.0   
3    PKDB00198  Abernethy1982         32                2       4.0   
4    PKDB00198  Abernethy1982         30                2       6.0   
..         ...            ...        ...              ...       ...   
142  PKDB00198  Abernethy1982        288                2       NaN   
143  PKDB00198  Abernethy1982        289                2       NaN   
144  PKDB00198  Abernethy1982        290                2       NaN   
145  PKDB00198  Abernethy1982        293         