# Export data from the User Interface (UI) to an analysis workspace
This is a tutorial notebook that walks through the process of exporting selected data from the *NHLBI BioData Catalyst® (BDC) Powered by PIC-SURE* User Interface, or UI, into an analysis workspace. This is done using the PIC-SURE Application Programming Interface, or API.

 ------- 

## Introduction to exporting data into an analysis workspace with PIC-SURE

Two things are needed to export data into an analysis workspace:
1. Personalized access token: a user-specific token that tells PIC-SURE which studies a user is authorized to access
2. Query ID: a token that describes the specific query that was built in the UI. For example, a query might select females with a body mass index between 18 and 30 from the ARIC study

Using these two components, the API can be used to export the selected data into the analysis workspace (in this case, where this Jupyter Notebook is being run). 

## Step 1: Getting your user-specific security token
**Before running this notebook, please be sure to review the "Get your security token" documentation, which exists in the [`README.md` file](../README.md). It explains how to get a security token, which is mandatory to use the PIC-SURE API.**

To set up your token file, be sure to run the [`Workspace_setup.ipynb` file](./Workspace_setup.ipynb).

## Step 2: Setting up your notebook

### Pre-requisites for the notebook
* Python 3.6 or later
* pip, the Python package manager, which is already available on most systems with a Python interpreter installed
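
As a quick check of the first prerequisite, the sketch below compares the running interpreter against the 3.6 minimum listed above. The helper name `meets_minimum` is hypothetical and is not part of PIC-SURE.

```python
import sys

MINIMUM_VERSION = (3, 6)  # minimum stated in the prerequisites above


def meets_minimum(version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if meets_minimum(sys.version_info):
    print("Python version OK:", sys.version.split()[0])
else:
    print("Python", ".".join(map(str, MINIMUM_VERSION)), "or later is required.")
```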

### Install packages to connect to the PIC-SURE API
The first step to using the PIC-SURE API is to install the packages needed. The following code installs the PIC-SURE API components from GitHub, specifically:
* PIC-SURE Client
* PIC-SURE Adapter
* *BDC-PIC-SURE* Adapter

**Note that if you are using the dedicated PIC-SURE environment within the *BDC Powered by Seven Bridges* platform, the necessary packages have already been installed.**

In [None]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
# BDC Powered by Terra users: uncomment the following line to specify the package install location
# sys.path.insert(0, r"/home/jupyter/.local/lib/python3.7/site-packages")

In [None]:
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-client.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-adapter-hpds.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-biodatacatalyst-python-adapter-hpds.git
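
Because `--force-reinstall` reinstalls the packages on every run, it can be useful to first check whether the modules this notebook imports (`PicSureClient` and `PicSureBdcAdapter`) are already available. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical.

```python
import importlib.util

# Module names imported later in this notebook.
REQUIRED_MODULES = ["PicSureClient", "PicSureBdcAdapter"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not installed yet:", ", ".join(missing))
else:
    print("All PIC-SURE packages are importable.")
```

If the list is empty, the install cell can likely be skipped for the current session.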

In [None]:
import PicSureClient
import PicSureBdcAdapter

### Connecting to a PIC-SURE resource

The following is required to get access to the PIC-SURE API:
* a network URL
* a user-specific security token

The following code specifies the network URL as the *BDC-PIC-SURE* URL and references the user-specific token saved as `token.txt`.

If you have not already retrieved your user-specific token, please refer to the "Get your security token" section of the `README.md` and the `Workspace_setup.ipynb` file.

In [None]:
# Set up connection to PIC-SURE API
PICSURE_network_URL = "https://picsure.biodatacatalyst.nhlbi.nih.gov/picsure"
token_file = "token.txt"

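with open(token_file, "r") as f:
    # Strip surrounding whitespace so a trailing newline in the file is not sent as part of the token
    my_token = f.read().strip()
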
bdc = PicSureBdcAdapter.Adapter(PICSURE_network_URL, my_token)
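
A missing or empty token file is a common cause of connection errors. As a sketch, the hypothetical helper `read_token` below (not part of the PIC-SURE packages) reads the file, trims surrounding whitespace, and raises a clear error when the file is absent or empty.

```python
from pathlib import Path


def read_token(path="token.txt"):
    """Read a token file and return its contents without surrounding whitespace."""
    token_path = Path(path)
    if not token_path.is_file():
        raise FileNotFoundError(f"Token file not found: {token_path}")
    token = token_path.read_text().strip()
    if not token:
        raise ValueError(f"Token file is empty: {token_path}")
    return token
```

It could be used in place of the `open(...)` block, for example `my_token = read_token(token_file)`.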

## Step 3: Export data from a query built in the PIC-SURE UI using the Query ID

You can retrieve the results of a query that you previously built using the [PIC-SURE Authorized Access UI](https://picsure.biodatacatalyst.nhlbi.nih.gov/psamaui/). After you have built your query and filtered to your cohort of interest, open the **Select and Package Data** tool in the Tool Suite. This will allow you to copy your query ID and bring it into a Jupyter notebook. **Note that query IDs are not permanent and may expire.**

![alt How to copy PIC-SURE query ID](../imgs/get_query_ID.gif "How to copy PIC-SURE query ID")

*If you cannot view the image above:*
* BDC Powered by Seven Bridges users, please view the `get_query_ID.gif` in the `imgs` folder
* BDC Powered by Terra users, please [view the image in your browser](https://github.com/hms-dbmi/Access-to-Data-using-PIC-SURE-API/blob/bdc-branding/NHLBI_BioData_Catalyst/imgs/get_query_ID.gif)

In [None]:
# To run this in your notebook, replace the placeholder below with the ID of a query that you have built in the UI.
queryID = '<<<Paste your Query ID here>>>'
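
Before sending the request, it can help to confirm that the placeholder was actually replaced. The sketch below assumes that Query IDs are UUID-formatted strings, which the method name `getQueryByUUID` suggests but this notebook does not state outright. The helper name `looks_like_query_id` is hypothetical.

```python
import uuid

PLACEHOLDER = "<<<Paste your Query ID here>>>"


def looks_like_query_id(value):
    """Return True when `value` parses as a UUID (the assumed Query ID format)."""
    if not isinstance(value, str) or value == PLACEHOLDER:
        return False
    try:
        uuid.UUID(value.strip())
    except ValueError:
        return False
    return True


print(looks_like_query_id(PLACEHOLDER))  # the unedited placeholder is rejected
```

A passing check only means the string is well formed; an expired Query ID would still fail at retrieval time.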

In [None]:
resource = bdc.useResource('02e23f52-f354-4e8b-992c-d37c8b9ba140') # Set up the resource
dictionary = bdc.useDictionary().dictionary() # Set up the dictionary

results = resource.retrieveQueryResults(queryID) # Retrieve data from UI

# Do imports and save data as pandas dataframe
from io import StringIO
df_UI = pd.read_csv(StringIO(results), low_memory=False)
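
Since `results` is passed to `pd.read_csv` through `StringIO`, it is CSV text. For large exports, a lightweight look at that text can be a useful sanity check before or after parsing. The following sketch uses only the standard library; `sample_results` is made-up data standing in for `results`, and its column names are illustrative only.

```python
import csv
from io import StringIO

# Made-up CSV text standing in for `results`; not real PIC-SURE output.
sample_results = "Patient ID,sex,age\n1,Male,45\n2,Female,52\n"


def summarize_csv_text(text):
    """Return (header, row_count) for CSV text without loading it into pandas."""
    reader = csv.reader(StringIO(text))
    header = next(reader, [])  # first row holds the column names
    row_count = sum(1 for _ in reader)  # remaining rows are data records
    return header, row_count


header, row_count = summarize_csv_text(sample_results)
print(len(header), "columns,", row_count, "rows")
```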

In [None]:
# View the first few records in the dataframe
df_UI.head()

The data has now been exported as a dataframe saved as `df_UI` and is ready for analysis.

## Bonus: Edit a query built in the PIC-SURE UI using the Query ID
You can now use the PIC-SURE API to edit queries that were built in the PIC-SURE UI. To do this, follow the same steps outlined above to build a query, package the data, and retrieve a Query ID. 

In [None]:
# To run this in your notebook, replace the placeholder below with the ID of a query that you have built in the UI.
queryID = '<<<Paste your Query ID here>>>'
query = resource.getQueryByUUID(queryID)

You can use the following code to view the filters and variables added for export. There are several different fields shown in this output.

| Field | Meaning | Output | Example |
|--------|-------------------|-------|-------|
| Query.select() | All variables included in the list (no record subsetting) | Automatically generated PIC-SURE variables; variables added to export | `\\_Topmed Study Accession with Subject ID\\`, <br />`\\_Parent Study Accession with Subject ID\\`, <br />`\\phs000820\\pht004333\\phv00219059\\sampleID\\` |
| Query.require() | All variables; only records that do not contain null values for input variables | Variables filtered to include all values (such as selecting both "Is a tumor" and "Is not a tumor" for a "Tumor status" variable) | `\\phs000820\\pht004333\\phv00219063\\is_tumor\\` |
| Query.anyof() | All variables; only records that contain at least one non-null value for input variables | Variables added from the Dataset modal | `\\phs000820\\pht004332\\phv00219058\\AfibYes\\` |
| Query.filter() | Only records that match filter criteria for added variables | Automatically generated PIC-SURE variables; variables that have been filtered (such as selecting only "Male" for a "Sex" variable) | `categorical \| \\_consents\\ \| ['phs000820.c1']`,<br /> `categorical \| '\\phs000820\\pht004332\\phv00219057\\sex\\' \| ['Male']`, <br /> `minmax \| \\phs000820\\pht004332\\phv00219056\\age\\ \| 30 to 70` |
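
To make the record-level meaning of these fields concrete, the sketch below applies the same rules to a few made-up records in plain Python. It illustrates the semantics described in the table and is not how PIC-SURE implements them; the function names and data are hypothetical, and treating the min/max bounds as inclusive is an assumption.

```python
# Toy records; None marks a missing (null) value. All data here is made up.
records = [
    {"id": 1, "sex": "Male", "age": 45, "afib": None},
    {"id": 2, "sex": "Female", "age": None, "afib": "Yes"},
    {"id": 3, "sex": None, "age": None, "afib": None},
]


def require(rows, variables):
    """Keep rows with a non-null value for every listed variable."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


def anyof(rows, variables):
    """Keep rows with a non-null value for at least one listed variable."""
    return [r for r in rows if any(r.get(v) is not None for v in variables)]


def categorical_filter(rows, variable, allowed):
    """Keep rows whose value for `variable` is one of `allowed`."""
    return [r for r in rows if r.get(variable) in allowed]


def minmax_filter(rows, variable, low, high):
    """Keep rows whose non-null value lies in [low, high] (inclusive, by assumption)."""
    return [r for r in rows if r.get(variable) is not None and low <= r[variable] <= high]


print([r["id"] for r in require(records, ["sex", "age"])])
print([r["id"] for r in anyof(records, ["age", "afib"])])
print([r["id"] for r in categorical_filter(records, "sex", ["Male"])])
print([r["id"] for r in minmax_filter(records, "age", 30, 70)])
```

Note that `select` does not appear here because it only chooses which variables are exported and does not subset records.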

In [None]:
# View all of the "Select" fields added to the query
query.select().show()

# Note - this includes PIC-SURE required fields called "\\_Topmed Study Accession with Subject ID\\" and "\\_Parent Study Accession with Subject ID\\"

In [None]:
# View all of the "Require" fields added to the query
query.require().show()

In [None]:
# View all of the "Any Of" fields added to the query
query.anyof().show()

In [None]:
# View all of the "Filter" fields added to the query
query.filter().show()

# Note - this includes a PIC-SURE required field called "\\_consents\\", which informs which studies and consent codes you are authorized to access. For more information about this field, view the "1_PICSURE_API_101" notebook.

To edit the query fields, you can use the PIC-SURE API code to first delete the field, then add the field back to the query with the adjustments. The code to delete the field follows this format:
- `query.select().delete("<Insert Concept Path Here>")`
- `query.require().delete("<Insert Concept Path Here>")`
- `query.anyof().delete("<Insert Concept Path Here>")`
- `query.filter().delete("<Insert Concept Path Here>")`
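
Concept paths are backslash-delimited, so in a regular Python string literal each backslash must be written twice. As a sketch, the hypothetical helper `concept_path` below builds the path from its segments, which can reduce escaping mistakes. It assumes every path starts and ends with a backslash, as in the examples in this notebook.

```python
def concept_path(*parts):
    """Join path segments into a backslash-delimited concept path string."""
    return "\\" + "\\".join(parts) + "\\"


path = concept_path("phs000820", "pht004332", "phv00219057", "sex")

# The result equals the literal written with doubled backslashes.
print(path == "\\phs000820\\pht004332\\phv00219057\\sex\\")
```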

Once the field has been deleted, you can use the following code to add the variable back to the query. *Note: Please review the `1_PICSURE_API_101` notebook for more information and detailed examples about how to add to a query.*

| Method | Arguments / Input | Output |
|--------|-------------------|-------|
| query.select().add() | variable names (string) or list of strings | all variables included in the list (no record subsetting) |
| query.require().add() | variable names (string) or list of strings | all variables; only records that do not contain null values for input variables |
| query.anyof().add() | variable names (string) or list of strings | all variables; only records that contain at least one non-null value for input variables |
| query.filter().add() | variable name and additional filtering values | input variable; only records that match filter criteria |

In [None]:
# EXAMPLE CODE
# Note: This code may not work with your query and is only intended to show how to set up code. Please adjust to your query and research purposes.

# Suppose the query filters the "Sex" variable to "Male" only, and we also want to include "Female". We could accomplish this using the following code:

# First, delete the field
query.filter().delete("\\phs000820\\pht004332\\phv00219057\\sex\\")

# Then, add field with new criteria
query.filter().add("\\phs000820\\pht004332\\phv00219057\\sex\\", ["Male", "Female"])