# RADx Reporter
This notebook reports studies submitted by the COVID Rapid Acceleration of Diagnostics ([RADx Initiative](https://www.nih.gov/research-training/medical-research-initiatives/radx)) to the database of Genotypes and Phenotypes ([dbGaP](https://www.ncbi.nlm.nih.gov/gap/)) and lists the data access requests for the following projects:

- [RADx Radical (RADx-rad)](https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs#radx-rad)
- [RADx Digital Health Technologies (RADx-DHT)](https://www.nih.gov/news-events/news-releases/nih-awards-contracts-develop-innovative-digital-health-technologies-covid-19)
- [RADx Tech (RADx-TECH)](https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs#radx-tech)
- [RADx Underserved Populations (RADx-UP)](https://www.nih.gov/research-training/medical-research-initiatives/radx/radx-programs#radx-up)



For a selected RADx project, the notebook retrieves the studies related to the project and their data access requests. Access requests used to test the RADx Data Hub functionality are excluded by default, but can optionally be included.
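As a minimal sketch of how the default exclusion works, the snippet below filters a small table of requests against a list of tester names. The `Requestor` column name matches the dbGaP download used later in this notebook; the tester names, the rows, and the helper `filter_requests` are invented for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical tester list and request table, for illustration only
testers = ["Tester, Alice", "Tester, Bob"]
requests = pd.DataFrame({
    "Requestor": ["Tester, Alice", "Doe, Jane", "Tester, Bob", "Roe, Richard"],
    "accession": ["phs000001.v1.p1", "phs000001.v1.p1",
                  "phs000002.v1.p1", "phs000002.v1.p1"],
})

def filter_requests(requests, testers, exclude_tests=True):
    """Return requests, dropping rows from known testers unless exclude_tests is False."""
    if not exclude_tests:
        return requests
    # Strip whitespace on both sides so names with trailing spaces still match
    is_tester = requests["Requestor"].str.strip().isin([t.strip() for t in testers])
    return requests[~is_tester]

filtered = filter_requests(requests, testers)
n_excluded = len(requests) - len(filtered)
print(f"{n_excluded} test requests excluded, {len(filtered)} remaining")
```

Passing `exclude_tests=False` returns the table unchanged, which corresponds to the optional inclusion of test requests described above.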

The number of access requests per study is a measure of data reuse. In addition, this notebook queries [Europe PMC](https://europepmc.org/) for publications and preprints that cite or mention dbGaP accession numbers or the RADx project name.
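To make the reuse measure concrete, here is a small sketch that counts access requests per study accession with pandas. The request table is made up; only the `Requestor` and `accession` column names follow the tables used in this notebook.

```python
import pandas as pd

# Made-up request table, for illustration only
requests = pd.DataFrame({
    "Requestor": ["Doe, Jane", "Roe, Richard", "Doe, Jane"],
    "accession": ["phs000001.v1.p1", "phs000001.v1.p1", "phs000002.v1.p1"],
})

# Count the requests for each study and list the most requested studies first
requests_per_study = (
    requests.groupby("accession")
    .size()
    .sort_values(ascending=False)
    .rename("Number of requests")
    .reset_index()
)
print(requests_per_study)
```

A study that does not appear in the request table has no requests at all, so a complete report would join these counts back onto the full list of studies.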

Created: 2023-03-09

Author: Peter W. Rose (pwrose@ucsd.edu)

In [1]:
#@title Select a RADx project and then select **Run All** from the **Runtime** menu { run: "auto", vertical-output: true, form-width: "50%", display-mode: "form" }
#@markdown ### Select a RADx project
project = 'RADx-rad' #@param ["RADx-rad","RADx-DHT","RADx-TECH","RADx-UP"]
print(f"Project: {project}")
query = project

testers = ["Rose, Peter ", "Ciofani, Danielle ", "Krishnamurthy, Ashok ", "Claypool, Kajal ", "Davis-Dusenbery, Brandi "]
#@markdown ### Exclude test requests
exclude_tests = True #@param {type:"boolean"}
print(f"Exclude test requests: {exclude_tests}")

Project: RADx-rad
Exclude test requests: True


In [2]:
%%capture
#@title Installing software on Google Colab
#
# Install Firefox (see https://github.com/googlecolab/colabtools/issues/3861)
#
![ ! -f "/opt/firefox/firefox" ] && wget 'https://download.mozilla.org/?product=firefox-latest&os=linux64&lang=en-US' -O firefox.tar.bz2 && tar -xjf firefox.tar.bz2 && mv firefox /opt/ && ln -s /opt/firefox/firefox /usr/local/bin/firefox

![ ! -f "/usr/local/bin/geckodriver" ] && wget 'https://github.com/mozilla/geckodriver/releases/download/v0.33.0/geckodriver-v0.33.0-linux64.tar.gz' && tar -xzf geckodriver*.tar.gz && chmod +x geckodriver && mv geckodriver /usr/local/bin/

# Install the shared libraries Firefox needs and the Python packages.
# These steps are not guarded by the geckodriver check: once geckodriver is in place
# the guard would skip them. Both commands are safe to re-run.
!apt-get install -y libxtst6 libdbus-glib-1-2

!pip install selenium webdriver_manager

In [3]:
#@title Importing packages
import os
import shutil
import glob
import time
from tqdm import tqdm
import pandas as pd
from urllib.request import urlopen
import json
from selenium import webdriver
from webdriver_manager.firefox import GeckoDriverManager
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.by import By
from google.colab import data_table

# Add firefox-related .so files to library path
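# (use .get so this also works when LD_LIBRARY_PATH is not set yet)
os.environ['LD_LIBRARY_PATH'] = os.environ.get('LD_LIBRARY_PATH', '') + ':/opt/firefox'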

In [4]:
#@title Running query
TMP_DIR = "/tmp"
filepath = os.path.join(TMP_DIR, "studies.csv")

def driversetup(download_dir):
    options = Options()
    options.binary_location = '/usr/local/bin/firefox'

    # run Selenium in headless mode
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")

    # set preference to download file to a specified directory
    # https://stackoverflow.com/questions/60170311/how-to-switch-download-directory-using-selenium-firefox-python
    # 0: download to the desktop, 1 download to the default "Downloads" directory, 2 use specified directory
    options.set_preference("browser.download.folderList", 2)
    options.set_preference("browser.download.manager.showWhenStarting", False)
    options.set_preference("browser.download.dir", download_dir)
    options.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")

    # instantiate driver
    service = Service(executable_path=GeckoDriverManager().install(), log_output="geckodriver.log")
    driver = webdriver.Firefox(service=service, options=options)
    driver.implicitly_wait(5)

    return driver

def download_dbgap_studies(query, filepath):
    # clean up any previously downloaded csv files
    files = glob.glob(os.path.join(TMP_DIR, "*.csv"))
    for file in files:
        os.remove(file)

    # run advanced search
    driver = driversetup(TMP_DIR)
    driver.get(f"https://www.ncbi.nlm.nih.gov/gap/advanced_search/?TERM={query}")
    time.sleep(3)

    # download csv file
    button = driver.find_element(By.CLASS_NAME, "usa-button-secondary")
    time.sleep(3)
    button.click()

    # wait until download is completed
    for step in tqdm(range(15)):
        time.sleep(1)

    # quit() closes the browser and also shuts down the geckodriver process
    driver.quit()

    # move downloaded csv file to a standard location
    move_studies_file(filepath)

def move_studies_file(filepath):
    """ Move downloaded file to a specified standard location"""
    # the file name of the downloaded csv file is unknown in advance,
    # but there should be only one csv file.
    files = glob.glob(os.path.join(TMP_DIR, "*.csv"))
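    if len(files) == 1:
        shutil.move(files[0], filepath)
    else:
        # fail early with a clear message instead of a later file-not-found error
        raise RuntimeError(f"Expected exactly one downloaded csv file in {TMP_DIR}, found {len(files)}")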

filepath = "studies.csv"
download_dbgap_studies(query, filepath)

studies = pd.read_csv(filepath, usecols=["accession", "name", "description", "Study Design", "Study Consent",])

100%|██████████| 15/15 [00:15<00:00,  1.00s/it]
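The fixed 15-second wait in the cell above is simple, but it can be too short for a slow download and longer than needed for a fast one. A possible alternative, sketched here under stated assumptions, is to poll the download directory until the file appears and its size stops changing. The helper name `wait_for_download` and its parameters are hypothetical. The check for `.part` files assumes Firefox's usual behaviour of writing in-progress downloads to a companion file with that extension.

```python
import glob
import os
import time

def wait_for_download(directory, pattern="*.csv", timeout=60, poll_interval=0.5):
    """Wait until exactly one matching file exists and its size is stable.

    Returns the file path, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        files = glob.glob(os.path.join(directory, pattern))
        # Assumption: an in-progress Firefox download leaves a "*.part" file behind
        partial = glob.glob(os.path.join(directory, "*.part"))
        if len(files) == 1 and not partial:
            size = os.path.getsize(files[0])
            # Accept the file only once it is non-empty and unchanged between two polls
            if size > 0 and size == last_size:
                return files[0]
            last_size = size
        time.sleep(poll_interval)
    return None
```

In `download_dbgap_studies`, a call such as `wait_for_download(TMP_DIR)` could stand in for the `tqdm` loop, with a `None` result treated as a failed query.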


In [5]:
#@title Table of studies
print(f"Number of studies for {project}:", studies.shape[0])
if studies.shape[0] > 0:
    display(data_table.DataTable(studies, include_index=False, num_rows_per_page=10))

Number of studies for RADx-rad: 48


Unnamed: 0,accession,name,description,Study Design,Study Consent
0,phs002964.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,Self-report of sudden loss of smell or taste s...,Prospective Longitudinal Cohort,GRU --- General Research Use
1,phs002945.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,The primary objective of this study was to des...,Cross-Sectional,"GRU --- General Research Use, GRU-IRB --- Gene..."
2,phs002924.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,This project will develop biomimetic olfaction...,Case Set,GRU --- General Research Use
3,phs002782.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,Infectious disease outbreaks like Coronavirus ...,Case Set,GRU --- General Research Use
4,phs002781.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,SARS-CoV-2 infection exhibits a wide range of ...,Case Set,GRU --- General Research Use
5,phs002778.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,"Robust, efficient, and reliable testing for SA...",Case-Control,HMB --- Health/Medical/Biomedical
6,phs002747.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,"Recent studies, including ours, have suggested...",Methods,GRU --- General Research Use
7,phs002744.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,The ultimate goal of this proposal is to devel...,Methods,GRU --- General Research Use
8,phs002729.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,"Presently, the application of molecular techno...",Case Set,GRU --- General Research Use
9,phs002709.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...,"COVID-19, a global pandemic due to its rapid p...",Methods,GRU --- General Research Use


In [6]:
#@title Summary of data access requests
def get_download_url(accession):
    return "https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetAuthorizedRequestDownload.cgi?study_id=" + accession

def get_authorized_requests(studies):
    authorized_requests = pd.DataFrame()

    for _, row in tqdm(studies.iterrows(), total=studies.shape[0]):
        try:
            df = pd.read_csv(get_download_url(row["accession"]),
                             usecols=["Requestor", "Affiliation", "Project", "Date of approval", "Request status",
                                      "Public Research Use Statement", "Technical Research Use Statement"],
                            sep="\t")
            df["accession"] = row["accession"]
            df["name"] = row["name"]
            authorized_requests = pd.concat([authorized_requests, df], ignore_index=True)
        except Exception:
            print(f"Skipping: {row['accession']} - no data access through dbGaP.")

    return authorized_requests

requests = get_authorized_requests(studies)

# exclude test requests
n_excluded = 0
if exclude_tests:
    n_excluded = requests.shape[0]
    requests = requests[~requests["Requestor"].isin(testers)]
    n_excluded = n_excluded - requests.shape[0]


# group requests to create a summary view
if requests.shape[0] > 0:
    summary = requests.groupby(["Requestor", "Affiliation", "Project", "Date of approval", "Request status",
                                "Public Research Use Statement", "Technical Research Use Statement"],
                                as_index=False)["accession"].agg(', '.join)

    summary["Number of requests"] = summary["accession"].str.count(",") + 1

# show results
print()
print()
if exclude_tests:
    print(f"{n_excluded} test requests have been excluded!")

print("Number of data access requests :", requests.shape[0])

if requests.shape[0] > 0:
    print("Number of unique requestors    :", len(requests["Requestor"].unique()))
    print("Number of unique studies       :", len(requests["accession"].unique()))
    display(data_table.DataTable(summary, include_index=False, num_rows_per_page=10))

100%|██████████| 48/48 [00:36<00:00,  1.32it/s]



49 test requests have been excluded!
Number of data access requests : 4
Number of unique requestors    : 3
Number of unique studies       : 4





Unnamed: 0,Requestor,Affiliation,Project,Date of approval,Request status,Public Research Use Statement,Technical Research Use Statement,accession,Number of requests
0,"Anwar, Mohd Mozharul",NIH,Exploration of Wearable Device Data for COVID-19,"Mar27, 2023",approved,Wearable devices collect various physiological...,The objective of the proposed research is to e...,phs002523.v1.p1,1
1,"Miguez, Maria-Jose",FLORIDA INTERNATIONAL UNIVERSITY,System analysis for COVID humoral response,"Feb10, 2023",approved,Multisystem inflammatory syndrome in children ...,Multisystem inflammatory syndrome in children ...,"phs002945.v1.p1, phs002781.v1.p1",2
2,"Solo-Gabriele, Helena",UNIVERSITY OF MIAMI CORAL GABLES,Request to access SF-RAD: Development of Proof...,"Jul26, 2023",approved,I am the Principal Investigator for the projec...,I am the Principal Investigator for the projec...,phs002525.v1.p1,1


In [7]:
#@title Detailed table of data access requests
if exclude_tests:
    print(f"{n_excluded} test requests have been excluded!")

print("Number of data access requests :", requests.shape[0])

if requests.shape[0] > 0:
    print("Number of unique requestors    :", len(requests["Requestor"].unique()))
    print("Number of unique studies       :", len(requests["accession"].unique()))
    display(data_table.DataTable(requests, include_index=False, num_rows_per_page=10))

49 test requests have been excluded!
Number of data access requests : 4
Number of unique requestors    : 3
Number of unique studies       : 4


Unnamed: 0,Requestor,Affiliation,Project,Date of approval,Request status,Public Research Use Statement,Technical Research Use Statement,accession,name
1,"Miguez, Maria-Jose",FLORIDA INTERNATIONAL UNIVERSITY,System analysis for COVID humoral response,"Feb10, 2023",approved,Multisystem inflammatory syndrome in children ...,Multisystem inflammatory syndrome in children ...,phs002945.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...
5,"Miguez, Maria-Jose",FLORIDA INTERNATIONAL UNIVERSITY,System analysis for COVID humoral response,"Feb10, 2023",approved,Multisystem inflammatory syndrome in children ...,Multisystem inflammatory syndrome in children ...,phs002781.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...
47,"Solo-Gabriele, Helena",UNIVERSITY OF MIAMI CORAL GABLES,Request to access SF-RAD: Development of Proof...,"Jul26, 2023",approved,I am the Principal Investigator for the projec...,I am the Principal Investigator for the projec...,phs002525.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...
49,"Anwar, Mohd Mozharul",NIH,Exploration of Wearable Device Data for COVID-19,"Mar27, 2023",approved,Wearable devices collect various physiological...,The objective of the proposed research is to e...,phs002523.v1.p1,Rapid Acceleration of Diagnostics - Radical (R...


In [8]:
#@title Publications that cite or mention dbGaP accession numbers (source: Europe PMC)
studies["dbgap"] = studies["accession"].apply(lambda s: s.split(".")[0])
# get list of publications from Europe PMC
dbgap_pub = pd.read_csv("ftp://ftp.ebi.ac.uk/pub/databases/pmc/TextMinedTerms/dbgap.csv")
pubs = studies.merge(dbgap_pub, on="dbgap")
pubs.fillna("", inplace=True)
print("Number of publications:", pubs.shape[0])
display(data_table.DataTable(pubs, include_index=False, num_rows_per_page=10))

Number of publications: 0


Unnamed: 0,accession,name,description,Study Design,Study Consent,dbgap,PMCID,EXTID,SOURCE
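The merge above works because the text-mined table is keyed by the base accession (for example `phs002964`), without the version (`.v1`) and participant-set (`.p1`) suffixes. The sketch below shows that normalization on toy data. It uses a left join with `indicator=True`, a variation on the inner join in the cell above, so that studies without any matching publication remain visible. The rows in `dbgap_pub` are invented; only the column names follow the output shown above.

```python
import pandas as pd

def base_accession(accession):
    """Strip version and participant-set suffixes: 'phs002964.v1.p1' -> 'phs002964'."""
    return accession.split(".")[0]

studies = pd.DataFrame({"accession": ["phs002964.v1.p1", "phs002945.v1.p1"]})
studies["dbgap"] = studies["accession"].apply(base_accession)

# Hypothetical text-mined table; the values are placeholders, not real records
dbgap_pub = pd.DataFrame({
    "dbgap": ["phs002945", "phs999999"],
    "PMCID": ["PMC0000001", "PMC0000002"],
    "EXTID": ["00000001", "00000002"],
    "SOURCE": ["MED", "MED"],
})

# The left join keeps every study; "_merge" reports whether a publication was found
pubs = studies.merge(dbgap_pub, on="dbgap", how="left", indicator=True)
print(pubs[["accession", "dbgap", "PMCID", "_merge"]])
```

Rows marked `left_only` are studies with no text-mined mention, which is the situation reported for every study in the run above.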


In [9]:
#@title Publications and preprints that mention the RADx project (source: Europe PMC)
url = f"https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=%22{project}%22%20AND%20(FIRST_PDATE:2020-06-01)%20AND%20sort_date:y&resultType=core&pageSize=1000&format=json&cursorMark=*"

response = urlopen(url)
data_json = json.loads(response.read())

df = pd.json_normalize(data_json["resultList"]["result"])
df.fillna("", inplace=True)

df.rename(columns={"fullTextIdList.fullTextId": "fullTextId"}, inplace=True)

if df.shape[0] > 0:
    df = df[["pmid", "fullTextId", "title", "authorString", "pubYear", "affiliation", "citedByCount"]]

    print(f"Number of publications and preprints containing the keyword: {project}: {df.shape[0]}")
    print(f"Number of citations: {df['citedByCount'].sum()}")
    display(data_table.DataTable(df, include_index=False, num_rows_per_page=10))
else:
    print(f"Number of publications and preprints containing the keyword: {project}: 0")

Number of publications and preprints containing the keyword: RADx-rad: 28
Number of citations: 264


Unnamed: 0,pmid,fullTextId,title,authorString,pubYear,affiliation,citedByCount
0,37429842.0,[PMC10333287],Real-time environmental surveillance of SARS-C...,"Puthussery JV, Ghumra DP, McBrearty KR, Dohert...",2023,"Center for Aerosol Science and Engineering, De...",0
1,,[PPR668764],Geospatially-resolved public-health surveillan...,"Tierney BT, Foox J, Ryon KA, Butler D, Damle N...",2023,,0
2,37208471.0,[PMC10199082],iOBPdb A Database for Experimentally Determine...,"Shukla S, Nakano-Baker O, Godin D, MacKenzie D...",2023,"University of Washington, Seattle, WA, USA. ss...",1
3,36965380.0,[PMC10027305],"Data-driven design of a multiplexed, peptide-s...","Nakano-Baker O, Fong H, Shukla S, Lee RV, Cai ...",2023,University of Washington Dept. of Materials Sc...,0
4,36933724.0,[PMC10017378],Wastewater surveillance uncovers regional dive...,"Fontenele RS, Yang Y, Driver EM, Magge A, Krab...",2023,"National Library of Medicine, National Institu...",0
5,36623667.0,[PMC9817413],Degradation rates influence the ability of com...,"Babler KM, Sharkey ME, Abelson S, Amirali A, B...",2023,"Department of Chemical, Environmental, and Mat...",1
6,36482179.0,[PMC9731983],Predominant SARS-CoV-2 variant impacts accurac...,"McCartney MM, Borras E, Rojas DE, Hicks TL, Ha...",2022,"Mechanical and Aerospace Engineering, UC Davis...",2
7,36493788.0,[PMC9725778],"Leveraging an established neighbourhood-level,...","Bowes DA, Driver EM, Kraberger S, Fontenele RS...",2023,The Biodesign Institute Center for Environment...,3
8,36354449.0,[PMC9688365],An Experimental Framework for Developing Point...,"Ullah SF, Moreira G, Datta SPA, McLamore E, Va...",2022,"Division of Glycoscience, Department of Chemis...",0
9,,[PPR541952],Portable Breath-Based Volatile Organic Compoun...,"Sharma R, Zang W, Tabartehfarahani A, Lam A, H...",2022,,0
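The query above requests a single page of up to 1000 results with a hand-encoded URL. As a hedged sketch, the URL could instead be assembled with `urllib.parse.urlencode`, and further pages could be fetched by following the `nextCursorMark` value that the Europe PMC search response returns. The function names `build_search_url` and `fetch_all_results` and the `max_pages` limit are assumptions for illustration, and the date filter and sort clause of the original query are left out for brevity.

```python
import json
from urllib.parse import urlencode, quote
from urllib.request import urlopen

BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

def build_search_url(query, cursor_mark="*", page_size=1000):
    """Build a Europe PMC search URL with percent-encoded parameters."""
    params = {
        "query": query,
        "resultType": "core",
        "pageSize": page_size,
        "format": "json",
        "cursorMark": cursor_mark,
    }
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

def fetch_all_results(query, max_pages=10):
    """Collect results page by page via nextCursorMark (requires network access)."""
    results, cursor = [], "*"
    for _ in range(max_pages):
        with urlopen(build_search_url(query, cursor)) as response:
            data = json.load(response)
        page = data.get("resultList", {}).get("result", [])
        results.extend(page)
        next_cursor = data.get("nextCursorMark")
        # Stop when the page is empty or the cursor no longer advances
        if not page or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return results

# Building the URL needs no network access
print(build_search_url('"RADx-rad"'))
```

The list returned by `fetch_all_results('"RADx-rad"')` could be passed to `pd.json_normalize` in the same way as `data_json["resultList"]["result"]` in the cell above.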
