# Environment set-up

### Prerequisites
- Python 3.6 or later
- pip, the Python package manager, which is already available on most systems with a Python interpreter installed ([pip installation instructions](https://pip.pypa.io/en/stable/installing/))

### Package installation

The cells below install the packages listed in the `requirements.txt` file, as well as the two components of the PIC-SURE API from GitHub: the PIC-SURE adapter and the PIC-SURE client.

In [None]:
!cat requirements.txt

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

In [None]:
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-adapter-hpds.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-client.git
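
Once the installation cells have run, it can be useful to confirm that the modules imported later in this notebook are actually visible to the current interpreter. The following is a minimal sketch using only the standard library; the helper name `find_missing_modules` and the `REQUIRED_MODULES` list are illustrative and not part of PIC-SURE.

```python
import importlib.util

# Top-level module names imported later in this notebook.
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "scipy",
    "PicSureHpdsLib", "PicSureClient",
]

def find_missing_modules(module_names):
    """Return the names that the import system cannot locate."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]

missing = find_missing_modules(REQUIRED_MODULES)
print("Missing modules:", missing if missing else "none")
```

If the list is not empty, re-running the installation cells above (or restarting the kernel) is a reasonable first step.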

##### Import all the external dependencies.

In [None]:
import json
from pprint import pprint

import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt
from scipy import stats

import PicSureHpdsLib
import PicSureClient


##### Set the display parameters for tables and plots

In [None]:
# Pandas DataFrame display options
pd.set_option("display.max_rows", 100)

# Matplotlib display parameters
plt.rcParams["figure.figsize"] = (14,8)
font = {'weight' : 'bold',
        'size'   : 12}
plt.rc('font', **font)

## Connecting to a PIC-SURE resource

Several pieces of information are required to access data through the PIC-SURE API: a network URL, a resource ID, and a user-specific security token.

In [None]:
PICSURE_network_URL = "https://precisionlink-biobank4discovery.childrens.harvard.edu/picsure/"
resource_id = "6aa47730-3288-4c45-bfa1-5a8730666016"
token_file = "token.txt"

In [None]:
with open(token_file, "r") as f:
    my_token = f.read()
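
A token file saved from an editor often ends with a newline, which may cause authentication to fail. The sketch below shows one possible way to read the token defensively; `read_token` is a hypothetical helper, not part of the PIC-SURE libraries.

```python
from pathlib import Path

def read_token(path):
    """Read a security token from a text file, dropping surrounding whitespace."""
    token = Path(path).read_text().strip()
    if not token:
        # An empty file would otherwise only fail later, at connection time.
        raise ValueError(f"Token file {path!r} is empty")
    return token
```

With this helper, the cell above could be written as `my_token = read_token(token_file)`.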

In [None]:
client = PicSureClient.Client()
connection = client.connect(PICSURE_network_URL, my_token, True)
adapter = PicSureHpdsLib.Adapter(connection)
# Select the resource by the ID defined above
resource = adapter.useResource(resource_id)

Two objects are created here: a `connection` and a `resource` object.

As we will only be using one single resource, **the `resource` object is actually the only one we will need to proceed with data analysis hereafter**. 

It is connected to the data source identified by the resource ID specified above, and it lets us query and retrieve data from that database.
The cells below print patient counts for numerical filters, categorical filters, and combinations of the two.
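
Every validation cell that follows repeats the same pattern: create a query, add one or more filters, and request a count. As a sketch, that pattern could be wrapped in a small function. `count_patients` is a hypothetical helper; it relies only on the `resource.query()`, `filter().add(...)`, and `getCount()` calls used in this notebook.

```python
def count_patients(resource, categorical=None, numeric=None):
    """Build one query on `resource`, add every filter, and return the count.

    categorical: dict mapping a variable key to the required value
    numeric:     dict mapping a variable key to keyword bounds, e.g. {"max": 5}
    """
    query = resource.query()
    for key, value in (categorical or {}).items():
        query.filter().add(key, value)
    for key, bounds in (numeric or {}).items():
        query.filter().add(key, **bounds)
    return query.getCount()
```

For example, `count_patients(resource, numeric={"\\Demographics\\Age\\": {"max": 5}})` would issue the same query as the Age validation cell.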


In [None]:
# Look up Age variables in the data dictionary
dictionary = resource.dictionary()
dictionary_search = dictionary.find("Age")
dictionary_search.DataFrame()

In [None]:
# Validate a numeric filter on Age

my_query = resource.query()
my_query.filter().add("\\Demographics\\Age\\", max=5)
my_count = my_query.getCount()
print( "Patients less than 5 yrs count : " + str( my_count ))


In [None]:
# Look up Calcium variables in the data dictionary

dictionary = resource.dictionary()
dictionary_search = dictionary.find("Calcium")
dictionary_search.DataFrame()



In [None]:
# Validate a numeric filter on Calcium

my_query = resource.query()
my_query.filter().add("\\Laboratory Results\\Laboratory\\Chemistry\\Calcium\\", max=7)
my_count = my_query.getCount()
print( "Patients with calcium less than 7 count: " + str( my_count ))


In [None]:
# Look up CHD8 in the data dictionary

dictionary = resource.dictionary()
dictionary_search = dictionary.find("CHD8")
dictionary_search.DataFrame()



In [None]:
# Validate patient counts for the CHD8 variant filter

my_query = resource.query()
my_query.filter().add("Gene_with_variant", "CHD8")
my_count = my_query.getCount()
print( "Patients with CHD8 variant counts : " + str( my_count ))

In [None]:
# Look up Calcium variables in the data dictionary

dictionary = resource.dictionary()
dictionary_search = dictionary.find("Calcium")
dictionary_search.DataFrame()

In [None]:
# Validate two numeric filters added to the same query

my_query = resource.query()
my_query.filter().add("\\Laboratory Results\\Laboratory\\Chemistry\\Calcium\\", max=7)
my_count = my_query.getCount()
print( "Patients with calcium less than 7 count: " + str( my_count ))

my_query.filter().add("\\Demographics\\Age\\", max=5)
my_count = my_query.getCount()
print( "Patients with calcium less than 7 and Age lt 5 count: " + str( my_count ))
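
The second count above comes from adding the Age filter to a query that already holds the Calcium filter, so it should never exceed the first count. The toy example below mimics that narrowing on invented records. It assumes that filters on one query are combined with a logical AND and that a `max` bound is inclusive; neither point is confirmed here for PIC-SURE itself.

```python
# Toy records, invented purely for illustration (not PIC-SURE data).
patients = [
    {"id": 1, "calcium": 6.5, "age": 3},
    {"id": 2, "calcium": 6.8, "age": 40},
    {"id": 3, "calcium": 9.1, "age": 4},
    {"id": 4, "calcium": 8.7, "age": 62},
]

# Assumption: a `max` bound keeps values less than or equal to the bound.
low_calcium = [p for p in patients if p["calcium"] <= 7]

# Adding a second filter narrows the first result set (logical AND).
low_calcium_and_young = [p for p in low_calcium if p["age"] <= 5]

print(len(low_calcium))            # 2
print(len(low_calcium_and_young))  # 1
```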


In [None]:
# Look up the Gender variable in the data dictionary
dictionary = resource.dictionary()
dictionary_search = dictionary.find("\\Demographics\\Gender\\")
dictionary_search.DataFrame()

In [None]:
# Validate a categorical filter on Gender
my_query = resource.query()
my_query.filter().add("\\Demographics\\Gender\\", "Female")
my_count = my_query.getCount()
print( "Female Patients count:  " + str( my_count ))


In [None]:
# Validate a genomic filter combined with a clinical filter

my_query = resource.query()
my_query.filter().add("Gene_with_variant", "CHD8")
my_query.filter().add("\\Demographics\\Gender\\", "Female")
my_count = my_query.getCount()
print( "Female Patients with CHD8 variant count : " + str( my_count ))
