# PIC-SURE 102: A Complete Harmonization-Focused PIC-SURE Workflow

This tutorial notebook demonstrates how a cohort built in the PIC-SURE UI can be used for harmonization via the PIC-SURE API.
This notebook and its corresponding video walkthrough represent a complete, harmonization-focused PIC-SURE workflow that integrates the UI and the API.

Before running the code in this notebook, the user should follow the steps in this YouTube video to create a cohort using the PIC-SURE UI. The video covers:
1. Searching for variables of interest
2. Filtering the cohort based on XX
3. Filtering the cohort based on YY (genomic)
4. Adding the variables we are interested in harmonizing to the export
5. Copying the resulting query ID for the created cohort

In this notebook, the user will:
1. Use the PIC-SURE API to pull the selected data export into the analysis workspace using the generated query ID
2. Harmonize the variables of interest across studies
3. Visualize the results

---


For a more basic introduction to the Python PIC-SURE API, see the `1_PICSURE_API_101.ipynb` notebook.
 
**Before running this notebook, please be sure to get a user-specific security token. For more information about how to proceed, see the "Get your security token" instructions in the [README.md](https://github.com/hms-dbmi/Access-to-Data-using-PIC-SURE-API/tree/master/NHLBI_BioData_Catalyst#get-your-security-token).**

## Environment set-up

### System requirements
- Python 3.6 or later
- pip, the Python package manager, which is already available on most systems with a Python interpreter installed

### Install packages

In [None]:
import numpy as np
import pandas as pd
import sys
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-client.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-adapter-hpds.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-biodatacatalyst-python-adapter-hpds.git@new-search

import PicSureClient
import PicSureBdcAdapter

## Connecting to a PIC-SURE network

In [None]:
# Uncomment production URL when testing in production
# PICSURE_network_URL = "https://picsure.biodatacatalyst.nhlbi.nih.gov/picsure"
PICSURE_network_URL = "https://biodatacatalyst.integration.hms.harvard.edu/picsure"
token_file = "token.txt"

# Read the security token, dropping any trailing newline or whitespace
with open(token_file, "r") as f:
    my_token = f.read().strip()

bdc = PicSureBdcAdapter.Adapter(PICSURE_network_URL, my_token)

## 1. Export data from a query built in the PIC-SURE UI using the Query ID

In [None]:
# Replace the placeholder below with the ID of a query that you have run in the PIC-SURE UI.
queryID = '<<<Paste your Query ID here>>>'

resource = bdc.useDictionary() # Set up the resource
results = resource.retrieveQueryResults(queryID) # Retrieve data from UI

# Load the CSV-formatted results into a pandas DataFrame
from io import StringIO
df = pd.read_csv(StringIO(results), low_memory=False)

# View the first few records in the dataframe
df.head()
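
Before harmonizing, it can help to see which exported columns belong to which study. The sketch below assumes that exported variable columns are named with backslash-delimited concept paths whose first segment identifies the study, and that other columns (such as a patient identifier) are not. The helper `study_of` and every column name in `example` are hypothetical and only illustrate the idea; check them against the columns of your own `df`.

```python
import pandas as pd

def study_of(column_name):
    """Return the first segment of a backslash-delimited concept path, or None."""
    # Only names that start with a backslash are treated as concept paths.
    if not column_name.startswith("\\"):
        return None
    parts = [p for p in column_name.split("\\") if p]
    return parts[0] if parts else None

# Hypothetical export: these column names and values are invented for illustration.
example = pd.DataFrame({
    "Patient ID": [1, 2],
    "\\study_A\\demographics\\AGE\\": [54, None],
    "\\study_B\\baseline\\age_yrs\\": [None, 61],
})

# Group the variable columns by the study they come from.
columns_by_study = {}
for col in example.columns:
    study = study_of(col)
    if study is not None:
        columns_by_study.setdefault(study, []).append(col)

print(columns_by_study)
```

Applying the same loop to `df.columns` would give a per-study list of variables to review when deciding which ones measure the same thing.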

## 2. Harmonize the variables of interest across studies
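
One possible approach, shown here as a minimal sketch and not as the method used in the video, is to recode each study-specific column to a shared vocabulary and then keep the first non-missing value for each participant. The dataframe `df_example`, its column names, the raw codings, and the helper `harmonize` are all hypothetical; in practice the recoding table would be built from the data dictionaries of the studies in your export.

```python
import pandas as pd

# Hypothetical stand-in for the exported dataframe; every column name and
# value below is invented for illustration.
df_example = pd.DataFrame({
    "Patient ID": [1, 2, 3, 4],
    "\\study_A\\demographics\\SEX\\": ["Male", "Female", None, None],
    "\\study_B\\baseline\\gender\\": [None, None, "M", "F"],
})

# For each source column, a mapping from its raw values to the shared vocabulary.
sex_recoding = {
    "\\study_A\\demographics\\SEX\\": {"Male": "male", "Female": "female"},
    "\\study_B\\baseline\\gender\\": {"M": "male", "F": "female"},
}

def harmonize(df, recoding):
    """Recode each source column, then keep the first non-missing value per row."""
    harmonized = pd.Series(index=df.index, dtype="object")  # starts all-missing
    for column, value_map in recoding.items():
        recoded = df[column].map(value_map)  # unmapped or missing values become NaN
        harmonized = harmonized.combine_first(recoded)
    return harmonized

df_example["sex_harmonized"] = harmonize(df_example, sex_recoding)
print(df_example[["Patient ID", "sex_harmonized"]])
```

Because earlier columns in the recoding table take precedence, a participant with conflicting values in two studies would silently keep the first one; a fuller workflow would probably want to detect and report such conflicts.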

## 3. Visualize the results
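
As a rough illustration of this step, the sketch below tabulates a harmonized categorical variable by study and defines a helper that draws the table as a grouped bar chart. The dataframe `harmonized_example`, its `study` and `sex_harmonized` columns, and the helper `plot_counts` are hypothetical, and the plot assumes that matplotlib is installed in the workspace.

```python
import pandas as pd

# Hypothetical harmonized data; the study labels and values are invented.
harmonized_example = pd.DataFrame({
    "study": ["study_A", "study_A", "study_B", "study_B", "study_B"],
    "sex_harmonized": ["male", "female", "male", "female", "female"],
})

# Rows are studies, columns are harmonized categories, cells are participant counts.
counts = pd.crosstab(harmonized_example["study"], harmonized_example["sex_harmonized"])
print(counts)

def plot_counts(counts_table):
    """Draw a grouped bar chart of category counts per study and return the Axes."""
    # Imported here so the summary table above works even without matplotlib.
    import matplotlib.pyplot as plt

    ax = counts_table.plot(kind="bar", rot=0)
    ax.set_xlabel("Study")
    ax.set_ylabel("Number of participants")
    ax.set_title("Harmonized variable by study")
    plt.tight_layout()
    return ax
```

In a notebook, calling `plot_counts(counts)` in a cell should render the chart inline. Comparing the bars across studies is a quick check that the recoding produced the same categories everywhere.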