## Analysis
This project demonstrates the use of PKDB, with caffeine as the test substance.
### Data Sources
- PKDB REST API: "http://0.0.0.0:8000/api/v1/"

### Outputs:
- data/processed/outputs.tsv
- data/processed/timecourses.tsv
- data/processed/interventions.tsv
- data/processed/individuals.tsv
- data/processed/groups.tsv

#### Groups and Individuals 
- data/processed/all_subjects.tsv

### Merged Dataframes
- data/processed/all_subjects.tsv

### Normalized Dataframes
- data/processed/all_subjects.tsv

### Changes
- 11-29-2018 : Started project

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from utils import PkdbModel, Preprocessed
import pandas as pd


In [3]:
queries = ['outputs','timecourses','interventions','individuals','groups',"studies","substances"]
#queries = ["studies","substances"]
#queries = ["substances"]
#queries = ["studies"]
#queries = ["individuals"]

### Query, Process and Save
For each query, the JSON data is loaded, converted to a dataframe, preprocessed, and saved as a tab-separated file.
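
As a rough illustration of that pipeline outside of `PkdbModel`, the following sketch converts a list of JSON-like records to a dataframe, applies one example preprocessing step, and writes tab-separated text. The records, their field names, and the helper `json_to_tsv` are hypothetical and are not taken from the PKDB schema or from `utils`; the real `load`, `preprocess`, and `save` methods may work differently.

```python
import io

import pandas as pd

# Hypothetical records standing in for an API response;
# the field names are illustrative, not the actual PKDB schema.
records = [
    {"pk": 1, "substance": "caffeine", "value": 1.5, "unit": "gram"},
    {"pk": 2, "substance": "caffeine", "value": None, "unit": "gram"},
]


def json_to_tsv(records):
    """Turn a list of JSON-like dicts into tab-separated text."""
    df = pd.json_normalize(records)            # JSON -> dataframe
    df = df.dropna(axis="columns", how="all")  # example step: drop empty columns
    buffer = io.StringIO()
    df.to_csv(buffer, sep="\t", index=False)   # dataframe -> TSV
    return buffer.getvalue()


tsv_text = json_to_tsv(records)
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing a file path to `to_csv` instead would produce a `.tsv` file on disk.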

In [4]:
for name in queries:
    # Use a separate variable for the model so the loop variable is not overwritten.
    pk_instance = PkdbModel(name)
    pk_instance.load()
    pk_instance.preprocess()
    pk_instance.save()
    pk_instance.report()

____________________________________________________________
Name: outputs
Loaded: True
Preprocessed: True
saved: True
outputs were succsesfully saved to </home/janekg/Dev/pkdb_analysis/data/1-preprocessed/outputs.tsv>
____________________________________________________________
Name: timecourses
Loaded: True
Preprocessed: True
saved: True
timecourses were succsesfully saved to </home/janekg/Dev/pkdb_analysis/data/1-preprocessed/timecourses.tsv>
____________________________________________________________
Name: interventions
Loaded: True
Preprocessed: True
saved: True
interventions were succsesfully saved to </home/janekg/Dev/pkdb_analysis/data/1-preprocessed/interventions.tsv>
____________________________________________________________
Name: individuals
Loaded: True
Preprocessed: True
saved: True
individuals were succsesfully saved to </home/janekg/Dev/pkdb_analysis/data/1-preprocessed/individuals.tsv>
____________________________________________________________
Name: groups
Loaded: 

In [5]:
studies = PkdbModel('studies',destination='1-preprocessed')
studies.read()

# Build one (study, substance) row per entry of the comma-separated "substances" field.
frames = []
for _, row in studies.data.dropna(subset=["substances"]).iterrows():
    substances = [{"study":row["name"],"substance":x.strip()} for x in row["substances"].split(",")]
    frames.append(pd.DataFrame(substances))

study_substance_map = pd.concat(frames)

def agg_studies(x):
    # Join the study names per substance and count them.
    return pd.Series([','.join(x["study"]),len(x["study"])], index = ["studies","study_number"])

substance_study = study_substance_map.groupby(['substance']).apply(agg_studies)
substance_study = substance_study.reset_index()

In [6]:
substances = PkdbModel('substances',destination='1-preprocessed')
substances.read()
substances.data = substance_study.merge(substances.data, left_on="substance", right_on="name")



substances.data.sort_values(by="study_number", ascending=False, inplace=True)
substances.save()

In [7]:
this_data = PkdbModel('outputs',destination='1-preprocessed')
this_data.read()
this_data.data.groupby(["calculated"]).apply(len)

calculated
False    7066
True     4170
dtype: int64

In [8]:
this_data = PkdbModel('interventions',destination='1-preprocessed')
this_data.read()
this_data.data.groupby(["unit"]).apply(len)

unit
gram               325
gram / hour          4
gram / kilogram     32
dtype: int64

In [9]:
this_data = PkdbModel('outputs',destination='1-preprocessed')
this_data.read()
this_data.data.groupby(["measurement_type","unit"]).count()

| measurement_type | unit | access | tissue | interventions | calculated | raw_pk | allowed_users | substance | individual_pk | individual_name | value | ... | timecourse_pk | time_unit | time | se | cv | sd | max | min | median | choice |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| age | yr | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 0 |
| amount | gram | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | ... | 0 | 30 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| auc_end | gram * hour / liter | 922 | 922 | 922 | 922 | 922 | 922 | 922 | 387 | 387 | 386 | ... | 571 | 918 | 918 | 157 | 157 | 132 | 14 | 14 | 11 | 0 |
| auc_end | hour * mole / liter | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 0 | 0 | 0 | ... | 2 | 4 | 4 | 2 | 2 | 2 | 0 | 0 | 0 | 0 |
| auc_inf | gram * hour / liter | 1078 | 1078 | 1078 | 1078 | 1078 | 1078 | 1078 | 537 | 537 | 535 | ... | 459 | 3 | 3 | 181 | 178 | 130 | 40 | 38 | 27 | 0 |
| auc_inf | hour * mole / liter | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 0 | 0 | 0 | ... | 2 | 0 | 0 | 4 | 4 | 2 | 0 | 0 | 0 | 0 |
| auc_relative | dimensionless | 27 | 27 | 27 | 27 | 27 | 27 | 27 | 24 | 24 | 22 | ... | 0 | 0 | 0 | 3 | 3 | 3 | 0 | 0 | 0 | 0 |
| aumc_inf | gram * hour ** 2 / liter | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 0 | 0 |
| bioavailability | dimensionless | 11 | 11 | 11 | 11 | 11 | 11 | 11 | 8 | 8 | 8 | ... | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| clearance | liter / hour | 815 | 815 | 815 | 815 | 815 | 815 | 815 | 464 | 464 | 464 | ... | 386 | 0 | 0 | 72 | 72 | 60 | 18 | 18 | 9 | 0 |


### Merge
 - merge individuals and groups --> all_subjects
 - merge outputs and timecourses --> all_results
 - merge all_results with interventions and individuals --> individuals_complete
 - merge all_results with interventions and groups --> groups_complete
 - merge all_results with interventions and all_subjects --> all_complete
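
The merge steps listed above can be sketched with plain pandas on toy data. This is only an assumption about how such a pipeline might look: the column names (`subject_pk`, `intervention_pk`, `subject_type`, `result_type`) are hypothetical, the first two steps are shown as row-wise concatenation rather than key-based joins, and the actual `Preprocessed.merge()` in `utils` may combine the tables differently.

```python
import pandas as pd

# Toy frames; all column names are assumptions for illustration only.
individuals = pd.DataFrame({"subject_pk": [1, 2], "name": ["P1", "P2"]})
groups = pd.DataFrame({"subject_pk": [10], "name": ["G1"]})
outputs = pd.DataFrame(
    {"subject_pk": [1, 10], "intervention_pk": [100, 100], "value": [1.2, 3.4]}
)
timecourses = pd.DataFrame(
    {"subject_pk": [2], "intervention_pk": [101], "value": [0.7]}
)
interventions = pd.DataFrame({"intervention_pk": [100, 101], "dose": [0.1, 0.2]})

# individuals + groups -> all_subjects (rows stacked, origin recorded)
all_subjects = pd.concat(
    [
        individuals.assign(subject_type="individual"),
        groups.assign(subject_type="group"),
    ],
    ignore_index=True,
)

# outputs + timecourses -> all_results (rows stacked, origin recorded)
all_results = pd.concat(
    [
        outputs.assign(result_type="output"),
        timecourses.assign(result_type="timecourse"),
    ],
    ignore_index=True,
)

# all_results + interventions + all_subjects -> all_complete (left joins on keys)
all_complete = all_results.merge(
    interventions, on="intervention_pk", how="left"
).merge(all_subjects, on="subject_pk", how="left")
```

Restricting `all_subjects` to individuals or to groups before the last join would give the analogues of `individuals_complete` and `groups_complete`. Left joins keep every result row even when a subject or intervention is missing, which makes gaps visible as missing values instead of silently dropping rows.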

In [10]:
preprocessed = Preprocessed()

In [11]:
preprocessed.read()
preprocessed.merge()
preprocessed.save()

  new_axis = axis.drop(labels, errors=errors)
