# BioData Catalyst Powered by PIC-SURE: Using BDC to Search RECOVER Data
Thaweethai T, Jolley SE, Karlson EW, et al. Development of a Definition of Postacute Sequelae of SARS-CoV-2 Infection. JAMA. 2023;329(22):1934–1946. doi:10.1001/jama.2023.8823

PIC-SURE RECOVER Data Dictionary: https://docs.google.com/spreadsheets/d/1A-BGTOjEgaPRG0KqSNWLuFFHMRkflSMh4Y_wYL2AGag/edit?usp=sharing 

## Environment set-up

### Pre-requisites
* Python 3.6 or later
* pip, the Python package manager, which is already available on most systems with a Python interpreter installed
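
Both prerequisites can be confirmed from inside the notebook. The sketch below is one way to do it; `environment_ok` is a hypothetical helper name, not part of any PIC-SURE package, and it only checks the interpreter version and whether `pip` is importable.

```python
import importlib.util
import sys


def environment_ok(min_version=(3, 6)):
    """Return True when Python is at least min_version and pip is importable."""
    has_python = sys.version_info >= min_version
    has_pip = importlib.util.find_spec("pip") is not None
    return has_python and has_pip


print("Python", sys.version.split()[0], "| environment OK:", environment_ok())
```

If this prints `False`, upgrade Python or install pip before running the install cell.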

### Install packages
The first step in using the PIC-SURE API is to install the required packages. The following code installs the PIC-SURE API components from GitHub, specifically:
* PIC-SURE Client
* PIC-SURE Adapter
* *BDC-PIC-SURE* Adapter

**Note that if you are using the dedicated PIC-SURE environment within the *BDC Powered by Seven Bridges* platform, the necessary packages have already been installed.**

In [None]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# BDC Powered by Terra users: uncomment the following line to specify the package install location
# sys.path.insert(0, r"/home/jupyter/.local/lib/python3.7/site-packages")

In [None]:
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-client.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-python-adapter-hpds.git
!{sys.executable} -m pip install --upgrade --force-reinstall git+https://github.com/hms-dbmi/pic-sure-biodatacatalyst-python-adapter-hpds.git

In [None]:
import PicSureClient
import PicSureBdcAdapter

## Connecting to a PIC-SURE resource

The following is required to get access to the PIC-SURE API:
* a network URL
* a user-specific security token

The following code specifies the network URL as the *BDC Powered by PIC-SURE* URL and references the user-specific token saved as `token.txt`.

If you have not already retrieved your user-specific token, please refer to the "Get your security token" section of the `README.md` file and the `Workspace_setup.ipynb` file.

In [None]:
PICSURE_network_URL = "https://picsure.biodatacatalyst.nhlbi.nih.gov/picsure"
token_file = "token.txt"

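# Read the token, dropping any surrounding whitespace such as a trailing newline
with open(token_file, "r") as f:
    my_token = f.read().strip()

# Connect to the PIC-SURE resource
bdc = PicSureBdcAdapter.Adapter(PICSURE_network_URL, my_token)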
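
A token file saved from a text editor may end with a newline, or may have been left empty by mistake. The sketch below shows one way to guard against both before connecting. `read_token` is a hypothetical helper, not part of the PIC-SURE packages, and the demonstration writes a throwaway value to a temporary file rather than touching a real token.

```python
import tempfile
from pathlib import Path


def read_token(path="token.txt"):
    """Read a security token from a text file, dropping surrounding whitespace."""
    token = Path(path).read_text().strip()
    if not token:
        raise ValueError(f"{path} is empty; paste your security token into it first")
    return token


# Demonstration with a throwaway file (the value is not a real token)
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "token.txt"
    demo_file.write_text("abc123\n")
    print(repr(read_token(demo_file)))  # 'abc123'
```

In the notebook, the returned string would be passed to `PicSureBdcAdapter.Adapter` in place of the raw file contents.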

## Using the RECOVER Data

The PASC score defined by Thaweethai et al. (2023) lets us identify which participants met the PASC definition at each survey. RECOVER includes a derived variable holding the PASC score at the time of each survey, based on that definition.

Brainstorming interesting questions:
* Change in PASC score over time related to COVID severity?
* Change in any PASC symptoms/components over time?

PASC symptoms: postexertional malaise, fatigue, brain fog, dizziness, gastrointestinal symptoms, palpitations, changes in sexual desire or capacity, loss of or change in smell or taste, thirst, chronic cough, chest pain, and abnormal movements.
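
To make the score concrete, the sketch below adds up per-symptom weights and applies the threshold of 12 used later in this notebook. The weights are transcribed from the scoring table in Thaweethai et al. (2023) as best understood here; treat them as an assumption and confirm them against the publication before relying on them. The names `PASC_WEIGHTS`, `pasc_score`, and `is_pasc_positive` are illustrative and are not RECOVER or PIC-SURE identifiers.

```python
# Symptom weights: an assumption transcribed from Thaweethai et al. (2023);
# verify against the paper before use.
PASC_WEIGHTS = {
    "loss of or change in smell or taste": 8,
    "postexertional malaise": 7,
    "chronic cough": 4,
    "brain fog": 3,
    "thirst": 3,
    "palpitations": 2,
    "chest pain": 2,
    "fatigue": 1,
    "changes in sexual desire or capacity": 1,
    "dizziness": 1,
    "gastrointestinal symptoms": 1,
    "abnormal movements": 1,
}
PASC_THRESHOLD = 12  # a score of 12 or more is treated as PASC positive


def pasc_score(symptoms):
    """Sum the weights of the reported symptoms, counting each symptom once."""
    return sum(PASC_WEIGHTS[s] for s in set(symptoms))


def is_pasc_positive(symptoms):
    """Return True when the summed score reaches the threshold."""
    return pasc_score(symptoms) >= PASC_THRESHOLD


example = ["postexertional malaise", "brain fog", "fatigue"]
print(pasc_score(example), is_pasc_positive(example))  # 11 False
```

The derived variables used below already contain the computed score, so this sketch is only meant to show where the numbers come from.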

In [None]:
# Search for derived PASC score
dictionary = bdc.useDictionary().dictionary() # Set up the dictionary
pasc_search = dictionary.find("derived pasc score")
pasc_vars = pasc_search.dataframe()
pasc_vars.head()

In [None]:
# Keep only the variables whose derived_var_id contains "pasc_score_biostats_derived"
biostats_pasc_vars = pasc_vars[pasc_vars.derived_var_id.str.contains("pasc_score_biostats_derived")]
biostats_pasc_vars.head()

In [None]:
# Save the HPDS paths of the PASC score variables for baseline and the first three follow-ups
baseline_pasc = biostats_pasc_vars.HPDS_PATH[biostats_pasc_vars.derived_var_id.str.contains("baseline")].values[0]
f1_pasc = biostats_pasc_vars.HPDS_PATH[biostats_pasc_vars.derived_var_id.str.contains("f1_")].values[0]
f2_pasc = biostats_pasc_vars.HPDS_PATH[biostats_pasc_vars.derived_var_id.str.contains("f2_")].values[0]
f3_pasc = biostats_pasc_vars.HPDS_PATH[biostats_pasc_vars.derived_var_id.str.contains("f3_")].values[0]
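
Indexing with `.values[0]` raises a bare `IndexError` when no variable matches, which can be hard to trace back to the search pattern. The sketch below wraps the same lookup in a hypothetical helper, `first_path`, that reports which pattern failed. The column names `derived_var_id` and `HPDS_PATH` come from the cells above; the toy rows are invented for illustration and are not real RECOVER variables.

```python
import pandas as pd


def first_path(variables, pattern, id_col="derived_var_id", path_col="HPDS_PATH"):
    """Return the first path whose ID contains pattern, or raise a clear error."""
    mask = variables[id_col].str.contains(pattern, na=False, regex=False)
    matches = variables.loc[mask, path_col]
    if matches.empty:
        raise LookupError(f"no variable whose {id_col} contains {pattern!r}")
    return matches.iloc[0]


# Toy dictionary with invented IDs and paths, including a missing ID
toy = pd.DataFrame({
    "derived_var_id": ["baseline_score", "f1_score", None],
    "HPDS_PATH": ["\\toy\\baseline\\", "\\toy\\f1\\", "\\toy\\other\\"],
})

print(first_path(toy, "f1_"))
```

Passing `na=False` keeps rows with a missing ID from breaking the boolean mask, and `regex=False` treats the pattern as a literal substring.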

In [None]:
# Search for head pain variables
headpain_search = dictionary.find("head pain")
headpain_vars = headpain_search.dataframe()
headpain_vars = headpain_vars[headpain_vars.studyId == "phs003463"]
headpain_vars.head()

In [None]:
headpain_now_vars = headpain_vars[headpain_vars.varId.str.contains("pain_head___now")]
headpain_now_vars.head()

In [None]:
# Save the HPDS paths of the head pain variables for baseline and the first three follow-ups
baseline_headpain = headpain_now_vars.HPDS_PATH[headpain_now_vars.derived_var_id.str.contains("baseline")].values[0]
f1_headpain = headpain_now_vars.HPDS_PATH[headpain_now_vars.derived_var_id.str.contains("f1_")].values[0]
f2_headpain = headpain_now_vars.HPDS_PATH[headpain_now_vars.derived_var_id.str.contains("f2_")].values[0]
f3_headpain = headpain_now_vars.HPDS_PATH[headpain_now_vars.derived_var_id.str.contains("f3_")].values[0]

In [None]:
# Build a query
authPicSure = bdc.useAuthPicSure()
pasc_headpain_query = authPicSure.query()
pasc_headpain_query.require().add([baseline_pasc, f1_pasc, f2_pasc, f3_pasc, baseline_headpain, f1_headpain, f2_headpain, f3_headpain])

In [None]:
results = pasc_headpain_query.getResultsDataFrame(low_memory = False)

In [None]:
results.head()
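
Before counting participants above and below a threshold, it can help to check how many scores are missing at each survey. In pandas, both `< 12` and `>= 12` evaluate to `False` for a missing value, so a participant without a score would be counted in neither group. The sketch below tabulates all three cases on a toy frame; `summarize_scores` and the column names are hypothetical, and whether the real `results` contain missing scores depends on the query.

```python
import numpy as np
import pandas as pd

# Toy scores for four participants at two surveys (invented values)
toy_results = pd.DataFrame({
    "baseline": [14, 3, np.nan, 20],
    "followup_1": [12, np.nan, np.nan, 5],
})


def summarize_scores(frame, threshold=12):
    """Count scores below the threshold, at or above it, and missing, per column."""
    return pd.DataFrame({
        "negative": (frame < threshold).sum(),
        "positive": (frame >= threshold).sum(),
        "missing": frame.isna().sum(),
    })


summary = summarize_scores(toy_results)
print(summary)
```

Each row of `summary` adds up to the number of participants, which gives a quick consistency check against `results.shape`.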

In [None]:
# Count PASC-negative (score < 12) and PASC-positive (score >= 12) participants at each survey

survey = ("Baseline", "Follow-up 1", "Follow-up 2", "Follow-up 3")

baseline_neg = sum(results[baseline_pasc] < 12)
baseline_pos = sum(results[baseline_pasc] >= 12)
f1_neg = sum(results[f1_pasc] < 12)
f1_pos = sum(results[f1_pasc] >= 12)
f2_neg = sum(results[f2_pasc] < 12)
f2_pos = sum(results[f2_pasc] >= 12)
f3_neg = sum(results[f3_pasc] < 12)
f3_pos = sum(results[f3_pasc] >= 12)


pasc_results = {
    'PASC Negative': (baseline_neg, f1_neg, f2_neg, f3_neg),
    'PASC Positive': (baseline_pos, f1_pos, f2_pos, f3_pos)
}

x = np.arange(len(survey))  # the label locations
width = 1/3  # the width of the bars
multiplier = 0

fig, ax = plt.subplots(layout='constrained')

for attribute, participants in pasc_results.items():
    offset = width * multiplier
    rects = ax.bar(x + offset, participants, width, label=attribute)
    ax.bar_label(rects, padding=3)
    multiplier += 1

# Add the axis label, title, tick labels, and legend
ax.set_ylabel('Number of RECOVER participants')
ax.set_title('PASC status by survey')
ax.set_xticks(x + width / 2, survey)  # center each tick between the two bars
ax.legend(loc='upper left', ncols=2)
ax.set_ylim(0, 8500)

plt.show()
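
The bar chart shows totals at each survey but not how individual participants move between groups. One way to look at change over time is to cross-tabulate status at two surveys, sketched below on invented scores. `pasc_status` is a hypothetical helper and the column names are placeholders; with the real data, the columns would be the HPDS paths saved earlier.

```python
import numpy as np
import pandas as pd


def pasc_status(scores, threshold=12):
    """Label each score 'positive', 'negative', or 'missing'."""
    labels = np.where(scores >= threshold, "positive", "negative")
    return pd.Series(labels, index=scores.index).where(scores.notna(), "missing")


# Toy scores for four participants (invented values)
toy = pd.DataFrame({
    "baseline": [14, 3, 15, np.nan],
    "followup_1": [5, 2, 13, 12],
})

# Rows: status at baseline; columns: status at the first follow-up
transitions = pd.crosstab(
    pasc_status(toy["baseline"]),
    pasc_status(toy["followup_1"]),
    rownames=["baseline"],
    colnames=["followup_1"],
)
print(transitions)
```

Off-diagonal cells count participants whose status changed between the two surveys.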

In [None]:
results.shape

In [None]:
7232+1775