# Environment set-up

### Prerequisites
- Python 3.6 or later
- pip, the Python package manager, which ships with most Python installations ([pip installation instructions](https://pip.pypa.io/en/stable/installing/))
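
As a quick sanity check before installing anything, the version requirement above can be verified from within the notebook. This is a minimal sketch; the helper name `python_is_supported` and the constant `MIN_PYTHON` are illustrative and not part of the PIC-SURE tooling.

```python
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python 3.6 or later is required, found:", sys.version.split()[0])
```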

### Packages installation

Install the packages listed in the `requirements.txt` file, along with the two components of the PIC-SURE API hosted on GitHub: the PIC-SURE adapter and the PIC-SURE client.

In [None]:
!cat requirements.txt

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

In [None]:
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-adapter-hpds.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-client.git

Import all the external dependencies, as well as the user-defined functions stored in the `python_lib` folder.

In [None]:
import json
from pprint import pprint

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from scipy import stats

import PicSureHpdsLib
import PicSureClient

#from python_lib.utils import get_multiIndex_variablesDict, joining_variablesDict_onCol

##### Setting the display parameters for tables and plots

In [None]:
# Pandas DataFrame display options
# pd.set_option("display.max_rows", 100)

# Matplotlib display parameters
# plt.rcParams["figure.figsize"] = (14, 8)
font = {'weight' : 'bold',
        'size'   : 12}
plt.rc('font', **font)

## Connecting to a PIC-SURE resource

Several pieces of information are required to access data through the PIC-SURE API: a network URL, a resource ID, and a user-specific security token.

In [None]:
PICSURE_network_URL = "https://precisionlink-biobank4discovery.childrens.harvard.edu/picsure/"
resource_id = "6aa47730-3288-4c45-bfa1-5a8730666016"
token_file = "token.txt"

In [None]:
# Read the security token; strip() drops any trailing newline left by a text editor
with open(token_file, "r") as f:
    my_token = f.read().strip()
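
A token file that is empty or ends with a stray newline tends to surface later as an opaque authentication failure. The sketch below wraps the read in a small helper that trims whitespace and fails early on an empty file. The name `read_token` is hypothetical and not part of the PIC-SURE client.

```python
from pathlib import Path


def read_token(path):
    """Read a security token from a text file, trimming surrounding whitespace.

    Raises ValueError when the file contains nothing but whitespace.
    """
    token = Path(path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file {path} is empty")
    return token
```

With this helper, the cell above could be written as `my_token = read_token(token_file)`.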

In [None]:
client = PicSureClient.Client()
connection = client.connect(PICSURE_network_URL, my_token, True)
adapter = PicSureHpdsLib.Adapter(connection)
resource = adapter.useResource(resource_id)

Two objects are created here: a `connection` and a `resource` object.

As we will be using a single resource, **the `resource` object is the only one we need for the data analysis that follows**.

It is connected to the data source identified by `resource_id` and lets us query and retrieve data from that database.

In [None]:
# Look up every dictionary entry under "\Laboratory Results\" and save it to CSV
fullVariableDict = resource.dictionary().find("\\Laboratory Results\\").DataFrame()
fullVariableDict.to_csv("data_lab_to_csv.csv", chunksize=1000, index=True)

In [None]:
# Reload the saved API dictionary and sort it by variable key
dfapi = pd.read_csv("data_lab_to_csv.csv")
dfapi_sort = dfapi.sort_values(by=["KEY"])
dfapi_sort

## Combining the dataframes from the API and the database extract

In [None]:
# NOTE: dfdb_sort (the database extract, sorted by KEY) must be loaded before running this cell
combined_dfs = pd.concat([dfapi_sort, dfdb_sort])
combined_dfs = combined_dfs.sort_values(by=["KEY"])
combined_dfs

## Getting the difference between the API and database extract dataframes

In [None]:
# keep=False drops every row that appears more than once, leaving rows found in only one source
symmetric_difference = combined_dfs.drop_duplicates(keep=False)
symmetric_difference.to_csv("data_lab_to_csv_diff.csv", index=True)
print(symmetric_difference)
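
The `drop_duplicates(keep=False)` approach does not say which source each remaining row came from, and it also discards rows that happen to be duplicated within a single source. One possible alternative is an outer merge with `indicator=True`, sketched below on toy data. The frames `api_df` and `db_df` and their `count` column are made-up stand-ins, not real PIC-SURE output.

```python
import pandas as pd

# Toy stand-ins for the two sources (hypothetical keys and values).
api_df = pd.DataFrame({"KEY": ["\\A\\", "\\B\\", "\\C\\"], "count": [10, 20, 30]})
db_df = pd.DataFrame({"KEY": ["\\B\\", "\\C\\", "\\D\\"], "count": [20, 31, 40]})

# Outer merge on all shared columns; the `_merge` column records whether a
# row was found in the left frame only, the right frame only, or both.
merged = api_df.merge(db_df, how="outer", indicator=True)

# Rows present in exactly one source form the symmetric difference.
only_one_side = merged[merged["_merge"] != "both"]

# For comparison: the concat + drop_duplicates approach on the same data.
via_concat = pd.concat([api_df, db_df]).drop_duplicates(keep=False)

print(only_one_side)
```

Here `_merge` takes the value `left_only` for rows found only in `api_df` and `right_only` for rows found only in `db_df`, which makes it easier to see on which side a discrepancy originates.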


In [None]:
#. \Laboratory Results\Laboratory\Therapeutic Drug Monitoring Toxicology\Phenotype\