# RADx Data Access Requests in dbGaP
This notebook analyzes the authorized data access requests for dbGaP datasets submitted by projects in the following programs of the [RADx Initiative](https://www.nih.gov/research-training/medical-research-initiatives/radx) (Rapid Acceleration of Diagnostics for COVID-19):

- [RADx Tech](https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs#radx-tech)
- [RADx Underserved Populations (RADx-UP)](https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs#radx-up)
- [RADx Radical (RADx-rad)](https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs#radx-rad)
- [RADx Digital Health Technologies (RADx-DHT)](https://www.nih.gov/news-events/news-releases/nih-awards-contracts-develop-innovative-digital-health-technologies-covid-19)

Author: Peter W. Rose (pwrose@ucsd.edu)

Creation date: 2023-02-23

In [6]:
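import ipywidgets as widgets
import pandas as pd
from IPython.display import display  # implicit inside Jupyter, explicit elsewhere

# Show full cell contents (e.g., long research use statements) in DataFrame output
pd.set_option('display.max_colwidth', None)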

In [7]:
style = {"description_width": "initial"}
radx_studies = ["RADx-dht", "RADx-rad", "RADx-up", "RADx-tech"]
radx_widget = widgets.Dropdown(options=radx_studies, description="Select study:", value="RADx-rad", style=style)

In [9]:
display(radx_widget)


In [4]:
query = radx_widget.value
print("Query for:", query)

Query for: RADx-rad


In [5]:
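# download_dbgap_studies() is not defined in this notebook (see the TODO below), so read
# the pre-saved studies file for the selected program from the dbgap-reporter repository.
filepath = f"https://raw.githubusercontent.com/radxrad/dbgap-reporter/main/data/{query.lower()}_studies.csv"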

In [None]:
studies = pd.read_csv(filepath, usecols=["accession", "name", "description", "Study Design", "Study Consent"])

In [None]:
# How the list of studies was obtained
# TODO: automate the studies file download for any query term

# To download the list of studies for a query term, open this dbGaP advanced search
# and click the "Save Results" button:
# https://www.ncbi.nlm.nih.gov/gap/advanced_search/?TERM=<query_term>

# Example search:
# https://www.ncbi.nlm.nih.gov/gap/advanced_search/?TERM=radx-rad

# Pre-saved search results for each RADx program:
# study = "https://raw.githubusercontent.com/radxrad/dbgap-reporter/main/data/radx-rad_studies.csv"
# study = "https://raw.githubusercontent.com/radxrad/dbgap-reporter/main/data/radx-up_studies.csv"
# study = "https://raw.githubusercontent.com/radxrad/dbgap-reporter/main/data/radx-tech_studies.csv"
# study = "https://raw.githubusercontent.com/radxrad/dbgap-reporter/main/data/radx-dht_studies.csv"

# studies = pd.read_csv(study, usecols=["accession", "name", "description", "Study Design", "Study Consent"])
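As a small step toward the TODO above, the two URL patterns in that cell can be generated from a query term. This is a minimal sketch: `dbgap_search_url` and `studies_csv_url` are hypothetical helpers, and it assumes the pre-saved files keep the `<query_term>_studies.csv` naming shown above. It only builds URLs; clicking "Save Results" on the dbGaP page is still a manual step.

```python
from urllib.parse import urlencode

# Base URLs copied from the comments in the cell above.
DBGAP_SEARCH_URL = "https://www.ncbi.nlm.nih.gov/gap/advanced_search/"
STUDIES_CSV_BASE = "https://raw.githubusercontent.com/radxrad/dbgap-reporter/main/data/"


def dbgap_search_url(query_term: str) -> str:
    """Hypothetical helper: dbGaP advanced-search URL for a query term.

    The term is lowercased to match the example URL above; urlencode turns
    spaces into '+' and percent-encodes other special characters.
    """
    return DBGAP_SEARCH_URL + "?" + urlencode({"TERM": query_term.lower()})


def studies_csv_url(query_term: str) -> str:
    """Hypothetical helper: URL of a pre-saved studies CSV, assuming the
    <query_term>_studies.csv naming used in the dbgap-reporter repository."""
    return STUDIES_CSV_BASE + query_term.lower() + "_studies.csv"


print(dbgap_search_url("RADx-rad"))
print(studies_csv_url("RADx-up"))
```

Lowercasing the term keeps the generated URLs consistent with the lowercase file names in the repository, whichever casing the dropdown uses.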

## Table of Studies

In [None]:
print("Number of studies:", studies.shape[0])
studies

In [None]:
def get_download_url(accession):
    """Return the dbGaP URL for downloading the authorized requests of a study."""
    return f"https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetAuthorizedRequestDownload.cgi?study_id={accession}"
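Because `get_download_url` pastes the accession straight into the query string, a stray space or a wrong column value would silently produce a bad URL. The sketch below is a hypothetical stricter variant; the accession pattern (`phs` plus six digits, optionally followed by `.vN.pN`, as in `phs000007.v32.p13`) is an assumption about what the `accession` column contains, so relax it if your studies file differs.

```python
import re
from urllib.parse import urlencode

# Assumed dbGaP accession pattern, e.g. phs000007.v32.p13: "phs" plus six digits,
# optionally followed by a version (.vN) and a participant-set (.pN) suffix.
ACCESSION_RE = re.compile(r"phs\d{6}(\.v\d+\.p\d+)?")

AUTHORIZED_REQUESTS_CGI = (
    "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetAuthorizedRequestDownload.cgi"
)


def get_download_url_checked(accession: str) -> str:
    """Hypothetical variant of get_download_url that rejects malformed accessions."""
    accession = accession.strip()  # tolerate stray whitespace from the CSV
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"unexpected dbGaP accession: {accession!r}")
    return AUTHORIZED_REQUESTS_CGI + "?" + urlencode({"study_id": accession})


print(get_download_url_checked("phs000007.v32.p13"))
```

Failing early with a `ValueError` names the offending accession, which is easier to debug than a parsing error from `pd.read_csv` on an unexpected response.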

## Create a table of approved requests for datasets

In [None]:
# Download the tab-separated list of authorized requests for each study and
# tag every row with the study it belongs to.
frames = []
for _, row in studies.iterrows():
    df = pd.read_csv(get_download_url(row["accession"]),
                     sep="\t",
                     usecols=["Requestor", "Affiliation", "Project", "Date of approval", "Request status",
                              "Public Research Use Statement", "Technical Research Use Statement"])
    df["accession"] = row["accession"]
    df["name"] = row["name"]
    frames.append(df)

# Concatenate once at the end instead of growing a DataFrame inside the loop.
authorized_requests = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
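The loop above assumes every download parses as a table with the listed columns. A minimal sketch, assuming a study's download could come back completely empty (not confirmed for dbGaP), factors the per-study read into hypothetical helpers `read_requests` and `collect_requests` that skip empty downloads and concatenate once. It runs offline on made-up tab-separated text and made-up study IDs.

```python
import io

import pandas as pd

# Columns requested in the loop above.
COLUMNS = ["Requestor", "Affiliation", "Project", "Date of approval", "Request status",
           "Public Research Use Statement", "Technical Research Use Statement"]


def read_requests(source, accession, name):
    """Read one study's tab-separated request table, or return None if it is empty."""
    try:
        df = pd.read_csv(source, sep="\t", usecols=COLUMNS)
    except pd.errors.EmptyDataError:
        # Assumption: a download with no content at all means "no requests".
        return None
    df["accession"] = accession
    df["name"] = name
    return df


def collect_requests(sources):
    """Concatenate the non-empty tables from (source, accession, name) tuples once."""
    frames = [read_requests(src, acc, nm) for src, acc, nm in sources]
    frames = [f for f in frames if f is not None]
    if not frames:
        return pd.DataFrame(columns=COLUMNS + ["accession", "name"])
    return pd.concat(frames, ignore_index=True)


# Made-up example data standing in for two downloads; the second one is empty.
header = "\t".join(COLUMNS)
row = "\t".join(["A. Researcher", "Example University", "Example project",
                 "2023-01-15", "Approved", "Public statement", "Technical statement"])
demo = collect_requests([
    (io.StringIO(header + "\n" + row + "\n"), "phs000001.v1.p1", "Example study A"),
    (io.StringIO(""), "phs000002.v1.p1", "Example study B"),
])
print(demo[["Requestor", "accession", "name"]])
```

With the real downloads, the sources would be `(get_download_url(r["accession"]), r["accession"], r["name"]) for _, r in studies.iterrows()`.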

In [None]:
print("Number of authorized requests :", authorized_requests.shape[0])
print("Number of unique requestors   :", len(authorized_requests["Requestor"].unique()))
print("Number of unique studies      :", len(authorized_requests["accession"].unique()))
authorized_requests
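The totals above can be broken down per study. This sketch uses a hypothetical helper, `summarize_by_study`, that counts requests and distinct requestors for each `accession`/`name` pair with pandas named aggregation. The demo uses a tiny made-up frame; on the real data you would call `summarize_by_study(authorized_requests)`.

```python
import pandas as pd


def summarize_by_study(requests: pd.DataFrame) -> pd.DataFrame:
    """Count requests and distinct requestors per study, most-requested first."""
    return (
        requests.groupby(["accession", "name"])
        .agg(requests=("Requestor", "size"),            # rows per study
             unique_requestors=("Requestor", "nunique"))  # distinct requestors
        .reset_index()
        .sort_values("requests", ascending=False, ignore_index=True)
    )


# Made-up example rows; only the columns used by the helper are included.
demo = pd.DataFrame({
    "accession": ["phs000001.v1.p1"] * 3 + ["phs000002.v1.p1"],
    "name": ["Example study A"] * 3 + ["Example study B"],
    "Requestor": ["R1", "R1", "R2", "R3"],
})
summary = summarize_by_study(demo)
print(summary)
```

Counting distinct requestors alongside raw requests separates studies that many groups requested from studies that one group requested repeatedly.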