# 1_get_publications_pubmed
This notebook retrieves publications from PubMed that mention a RADx-rad project (grant) number.

**Author:** Peter W Rose ([pwrose@ucsd.edu](mailto:pwrose@ucsd.edu))    
**Date:** 2025-03-13

TODO: Check Semantic Scholar for additional papers by searching for the PIs' publications and checking them for grant numbers.

In [1]:
import os
import pandas as pd
import ast
from datetime import datetime
import pubmed_query
import nih_utils

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# pd.set_option('display.max_colwidth', None)

In [2]:
DATA_PATH = "../data"
DERIVED_DATA_PATH = "../derived_data"

In [3]:
current_date = datetime.today().date()
print(f"Last update: {current_date}")

Last update: 2025-06-02


## Get the list of RADx-rad Grants

In [4]:
projects = pd.read_csv(os.path.join(DATA_PATH, "grants.csv"))
projects.query("research_initiative == 'RADx-rad'", inplace=True)

## Retrieve PubMed articles that mention RADx-rad project numbers
Because grant numbers are specified inconsistently in publications, we search PubMed with only the 8-character project serial number, i.e., the institute code plus six digits (e.g., `DA053976` from `1U01DA053976-01`). The NIH RePORTER refers to this number as the "project_serial_num". It identifies a project, although several awards can share it (see below).

The RADx-rad projects started towards the end of 2020, so the earliest publications
most likely appeared in 2021. Specifying a start year therefore reduces false positives.
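
As a rough illustration of how a year-bounded grant search could be phrased, the sketch below builds a PubMed E-utilities `esearch` URL without sending a request. The helper name `build_grant_query` and its parameters are hypothetical, and this is not necessarily how `pubmed_query.get_pubmed_data_new` constructs its query. The `[Grant Number]` and `[dp]` (date of publication) field tags are standard PubMed search tags.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_grant_query(project_serial_num, start_year, end_year, retmax=1000):
    """Build an esearch URL for one grant, limited to a publication-year range.

    Hypothetical helper for illustration only; no request is sent.
    """
    # Example term: DA053976[Grant Number] AND 2021:2100[dp]
    term = f"{project_serial_num}[Grant Number] AND {start_year}:{end_year}[dp]"
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_grant_query("DA053976", 2021, 2100))
```

Raising `start_year` narrows the date range in the search term, which is how publications that predate a RADx-rad award could be excluded at query time.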

False positive (non-RADx-rad related) publications arise because some awards share the same project serial number:
1. Some RADx-rad projects are supplements to another award.
2. Center grants fund a variety of projects, including RADx-rad projects.
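
To make the shared serial number concrete, here is a minimal sketch of how an NIH project number might be split into its parts. It assumes the usual layout (application type, activity code, institute code, six-digit serial number, support year, optional suffix). The function `parse_project_num` is hypothetical and may differ from what `nih_utils` does. In particular, treating application type `3` or an `S<n>` suffix as a supplement is an assumption of this sketch.

```python
import re

# Assumed layout: 1 U01 DA 053976 -01 [suffix], e.g. 1U01DA053976-01
PROJECT_NUM_PATTERN = re.compile(
    r"^(?P<application_type>\d)?"
    r"(?P<activity_code>[A-Z][A-Z0-9]{2})"
    r"(?P<institute_code>[A-Z]{2})"
    r"(?P<serial>\d{6})"
    r"(?:-(?P<support_year>\d{2})(?P<suffix>[A-Z0-9]*))?$"
)


def parse_project_num(project_num):
    """Split a project number into serial number, award type, and supplement flag."""
    match = PROJECT_NUM_PATTERN.match(project_num.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized project number: {project_num!r}")
    parts = match.groupdict()
    suffix = parts["suffix"] or ""
    # Assumption: supplements carry application type 3 or a suffix such as S1
    is_supplement = parts["application_type"] == "3" or bool(re.search(r"S\d", suffix))
    return {
        "project_serial_num": parts["institute_code"] + parts["serial"],
        "award_type": parts["activity_code"],
        "supplement": is_supplement,
    }


print(parse_project_num("1U01DA053976-01"))
# Made-up project numbers: a parent award and its supplement share one serial number
print(parse_project_num("1U54XX000000-01"))
print(parse_project_num("3U54XX000000-02S1"))
```

Because the parent award and the supplement resolve to the same `project_serial_num`, a PubMed search on that number returns publications from both.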

In [5]:
projects["project_serial_num"] = projects["project_num"].apply(nih_utils.get_project_serial_num)
projects["award_type"] = projects["project_num"].apply(nih_utils.get_award_type)
projects["supplement"] = projects["project_num"].apply(nih_utils.is_supplement)
projects.fillna("", inplace=True)
projects.head()

Unnamed: 0,project_num,research_initiative,sub_project,contact_pi,co_pis,comments,project_serial_num,award_type,supplement
0,1U01DA053976-01,RADx-rad,Wastewater,"Conroy-Ben, Otakuye","Halden, Rolf U|Hamilton, Kerry",AI/AN governed by the Tribal Data Repository,DA053976,U01,False
1,1U24LM013755-01,RADx-rad,Data Coordinating Center,"Ohno-Machado, Lucila","Aronoff-Spencer, Eliah S|Xu, Hua",RADx-rad Data Coordination Center,LM013755,U24,False
2,1U01HL152410-01,RADx-rad,Novel Biosensing and VOC,"Fay, William P","Grant, Sheila Ann|Turpin, William Monroe",,HL152410,U01,False
3,1R01NR020105-01,RADx-rad,Multimodal Surveillance,"Snyder, Michael P",,,NR020105,R01,False
4,1R01DE031114-01,RADx-rad,Multimodal Surveillance,"Jokerst, Jesse Vincent",,,DE031114,R01,False


In [6]:
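# start_year = 2020  # would restrict the search to the RADx-rad funding period
# Wide-open range: retrieve all publications regardless of year
start_year = 1900
end_year = 2100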

In [7]:
grant_list = projects["project_serial_num"].unique()
publications = pubmed_query.get_pubmed_data_new(grant_list, start_year, end_year)
publications.rename(columns={"grant_number": "project_serial_num"}, inplace=True)
publications["doi"] = "doi:" + publications["doi"]

Processed grant DA053976: 2 publications
Processed grant LM013755: 28 publications
Processed grant HL152410: 13 publications
Processed grant NR020105: 5 publications
Processed grant DE031114: 21 publications
Processed grant DA053941: 38 publications
Processed grant LM013129: 24 publications
Processed grant DA053903: 9 publications
Processed grant AA029324: 0 publications
Processed grant TR003778: 30 publications
Processed grant AA029328: 9 publications
Processed grant HD105610: 11 publications
Processed grant DE030841: 0 publications
Processed grant MD016526: 0 publications
Processed grant HD105590: 17 publications
Processed grant TR003787: 20 publications
Processed grant AA029316: 4 publications
Processed grant HD105619: 16 publications
Processed grant DE030829: 2 publications
Processed grant DE030832: 6 publications
Processed grant AA029348: 18 publications
Processed grant HL150852: 42 publications
Processed grant HD105593: 29 publications
Processed grant TR003775: 1 publications
Pro

In [8]:
publications.columns

Index(['pm_id', 'pmc_id', 'doi', 'title', 'abstract', 'keywords', 'mesh_ids',
       'mesh_terms', 'authors', 'journal', 'year', 'publication_date',
       'article_type', 'project_serial_num'],
      dtype='object')

In [9]:
publications.head()

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,mesh_ids,mesh_terms,authors,journal,year,publication_date,article_type,project_serial_num
0,38936269,PMC11657630,doi:10.1016/j.watres.2024.121858,Quantitative microbial risk assessment (QMRA) ...,Wastewater treatment plants (WWTPs) provide vi...,"[Exposure modeling, Pathogenic microorganisms ...","[D018570, D006801, D062065, D016273, D014865, ...","[Risk Assessment, Humans, Wastewater, Occupati...","[Heida, Ashley, Maal-Bared, Rasha, Veillette, ...",Water research,2024,2024-08-15,[Journal Article],DA053976
1,34520666,PMC8495893,doi:10.1021/acs.est.1c02580,Time: A Key Driver of Uncertainty When Assessi...,,"[microplastic, nanoplastic, plastic pollution,...","[D004784, D006801, D010969, D035501, D014874]","[Environmental Monitoring, Humans, Plastics, U...","[Halden, Rolf U, Rolsky, Charles, Khan, Farhan R]",Environmental science & technology,2021,2021-10-05,"[Journal Article, Research Support, N.I.H., Ex...",DA053976
2,40332956,,doi:10.1093/jamia/ocaf064,CDEMapper: enhancing National Institutes of He...,Common Data Elements (CDEs) standardize data c...,"[common data element, data collection, data sh...",[],[],"[Wang, Yan, Huang, Jimin, He, Huan, Zhang, Vin...",Journal of the American Medical Informatics As...,2025,2025-05-07,[Journal Article],LM013755
3,39613846,PMC11607336,doi:10.1038/s41598-024-81170-y,De-identification is not enough: a comparison ...,"For sharing privacy-sensitive data, de-identif...",[],"[D006801, D057286, D003219, D016494, D018907, ...","[Humans, Electronic Health Records, Confidenti...","[Sarkar, Atiquer Rahman, Chuang, Yao-Shun, Moh...",Scientific reports,2024,2024-11-29,"[Journal Article, Comparative Study]",LM013755
4,38832084,PMC11145658,doi:10.1145/3583780.3614739,DiscoverPath: A Knowledge Refinement and Retri...,The exponential growth in scholarly publicatio...,"[Biomedical, Healthcare, Information Retrieval...",[],[],"[Chuang, Yu-Neng, Wang, Guanchu, Chang, Chia-Y...",Proceedings of the ... ACM International Confe...,2023,2023-10-01,[Journal Article],LM013755


### Add publications that are not in PubMed or do not mention the grant number

In [10]:
non_pubmed_publications = pd.read_csv(os.path.join(DATA_PATH, "publications_other.csv"), dtype=str, keep_default_na=False)

In [11]:
# Convert string representations such as "[a, b, c]" to lists.
# Empty items are skipped, so "" and "[]" both yield an empty list.
def safe_split(value):
    return [item.strip() for item in value.strip("[]").split(",") if item.strip()]

columns_to_convert = ["keywords", "mesh_ids", "mesh_terms", "authors", "article_type"]
non_pubmed_publications[columns_to_convert] = non_pubmed_publications[columns_to_convert].map(safe_split)

In [12]:
non_pubmed_publications.head()

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,mesh_ids,mesh_terms,authors,journal,article_type,year,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17
0,,,doi:10.1109/ICHI54592.2022.00070,IMI-CDE: an interactive interface for collabor...,The National Institute of Health (NIH) launche...,"[COVID-19, Common Data Element, CDE Mapping, D...",[],[],"[Tao S, Chou WC, Li J, Du J, Ram PM, Abeysingh...",2022 IEEE 10th International Conference on Hea...,[Proceedings],2022,LM013755,,,,,
1,,,doi:10.1021/acs.jpcc.2c06434,Understanding Oligonucleotide Hybridization an...,Semiconducting single-walled carbon nanotubes ...,"[Carbon nanotubes, Genetics, Hybridization, Qu...",[],[],"[Cui J, Gong X, Cho S-Y, Jin X, Yang S, Khosra...",J Phys Chem C,[Journal Article],2023,DE030829,,,,,
2,38222877.0,PMC10784670,doi:10.1016/j.conctc.2023.101246,Moana: Alternate surveillance for COVID-19 in ...,"Objective: Create a longitudinal, multi-modal ...",[],[],[],"[Morgan ER, Dillard D, Lofgren E, Maddison BK,...",Contemp Clin Trials Commun.,[Journal Article],2023,MD016526,,,,,


In [13]:
publications = pd.concat([publications, non_pubmed_publications], ignore_index=True)
publications.fillna("", inplace=True)
print(f"Number of raw publications: {publications.shape[0]}")

Number of raw publications: 723


In [14]:
publications.tail()

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,mesh_ids,mesh_terms,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17
718,34646165.0,PMC8504676,doi:10.3389/fphys.2021.747789,Deciphering the Role of microRNAs in Large-Art...,"Large artery stiffness (LAS) is a major, indep...","[aging, arterial stiffness, miR-181b, microRNA...",[],[],"[Baraban, Jay M, Tuday, Eric, Berkowitz, Dan E...",Frontiers in physiology,2021,2021-01-01,"[Journal Article, Review]",TR003780,,,,,
719,34304585.0,PMC8363557,doi:10.1161/HYPERTENSIONAHA.120.16690,Degradation of Premature-miR-181b by the Trans...,[Figure: see text].,"[aorta, cardiovascular diseases, microRNA degr...","[D000818, D001011, D001127, D004268, D051379, ...","[Animals, Aorta, Arginine Vasopressin, DNA-Bin...","[Tuday, Eric, Nakano, Mitsunori, Akiyoshi, Kei...","Hypertension (Dallas, Tex. : 1979)",2021,2021-09-01,"[Journal Article, Research Support, N.I.H., Ex...",TR003780,,,,,
720,,,doi:10.1109/ICHI54592.2022.00070,IMI-CDE: an interactive interface for collabor...,The National Institute of Health (NIH) launche...,"[COVID-19, Common Data Element, CDE Mapping, D...",[],[],"[Tao S, Chou WC, Li J, Du J, Ram PM, Abeysingh...",2022 IEEE 10th International Conference on Hea...,2022,,[Proceedings],LM013755,,,,,
721,,,doi:10.1021/acs.jpcc.2c06434,Understanding Oligonucleotide Hybridization an...,Semiconducting single-walled carbon nanotubes ...,"[Carbon nanotubes, Genetics, Hybridization, Qu...",[],[],"[Cui J, Gong X, Cho S-Y, Jin X, Yang S, Khosra...",J Phys Chem C,2023,,[Journal Article],DE030829,,,,,
722,38222877.0,PMC10784670,doi:10.1016/j.conctc.2023.101246,Moana: Alternate surveillance for COVID-19 in ...,"Objective: Create a longitudinal, multi-modal ...",[],[],[],"[Morgan ER, Dillard D, Lofgren E, Maddison BK,...",Contemp Clin Trials Commun.,2023,,[Journal Article],MD016526,,,,,


In [15]:
publications = publications.merge(projects[["project_serial_num", "award_type", "supplement", "research_initiative", "sub_project"]], on="project_serial_num")
publications.tail()

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,mesh_ids,mesh_terms,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
736,34646165.0,PMC8504676,doi:10.3389/fphys.2021.747789,Deciphering the Role of microRNAs in Large-Art...,"Large artery stiffness (LAS) is a major, indep...","[aging, arterial stiffness, miR-181b, microRNA...",[],[],"[Baraban, Jay M, Tuday, Eric, Berkowitz, Dan E...",Frontiers in physiology,2021,2021-01-01,"[Journal Article, Review]",TR003780,,,,,,U18,False,RADx-rad,Exosome
737,34304585.0,PMC8363557,doi:10.1161/HYPERTENSIONAHA.120.16690,Degradation of Premature-miR-181b by the Trans...,[Figure: see text].,"[aorta, cardiovascular diseases, microRNA degr...","[D000818, D001011, D001127, D004268, D051379, ...","[Animals, Aorta, Arginine Vasopressin, DNA-Bin...","[Tuday, Eric, Nakano, Mitsunori, Akiyoshi, Kei...","Hypertension (Dallas, Tex. : 1979)",2021,2021-09-01,"[Journal Article, Research Support, N.I.H., Ex...",TR003780,,,,,,U18,False,RADx-rad,Exosome
738,,,doi:10.1109/ICHI54592.2022.00070,IMI-CDE: an interactive interface for collabor...,The National Institute of Health (NIH) launche...,"[COVID-19, Common Data Element, CDE Mapping, D...",[],[],"[Tao S, Chou WC, Li J, Du J, Ram PM, Abeysingh...",2022 IEEE 10th International Conference on Hea...,2022,,[Proceedings],LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center
739,,,doi:10.1021/acs.jpcc.2c06434,Understanding Oligonucleotide Hybridization an...,Semiconducting single-walled carbon nanotubes ...,"[Carbon nanotubes, Genetics, Hybridization, Qu...",[],[],"[Cui J, Gong X, Cho S-Y, Jin X, Yang S, Khosra...",J Phys Chem C,2023,,[Journal Article],DE030829,,,,,,R42,False,RADx-rad,Novel Biosensing and VOC
740,38222877.0,PMC10784670,doi:10.1016/j.conctc.2023.101246,Moana: Alternate surveillance for COVID-19 in ...,"Objective: Create a longitudinal, multi-modal ...",[],[],[],"[Morgan ER, Dillard D, Lofgren E, Maddison BK,...",Contemp Clin Trials Commun.,2023,,[Journal Article],MD016526,,,,,,R01,False,RADx-rad,Multimodal Surveillance


In [16]:
# Combine the keyword and MeSH term columns and create a vertical bar separated list
publications["keywords"] = publications.apply(lambda row: "|".join(sorted(set(row["keywords"] + row["mesh_terms"]))), axis=1)
publications.drop(columns=["mesh_ids", "mesh_terms"], inplace=True)

In [17]:
publications.fillna("", inplace=True)

In [18]:
# Convert lists into "|"-separated strings
publications["authors"] = publications.apply(lambda row: "|".join(row["authors"]), axis=1)
publications["article_type"] = publications.apply(lambda row: "|".join(sorted(row["article_type"])), axis=1)
publications.head()

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
0,38936269,PMC11657630,doi:10.1016/j.watres.2024.121858,Quantitative microbial risk assessment (QMRA) ...,Wastewater treatment plants (WWTPs) provide vi...,"Exposure modeling|Humans|Models, Theoretical|O...","Heida, Ashley|Maal-Bared, Rasha|Veillette, Mar...",Water research,2024,2024-08-15,Journal Article,DA053976,,,,,,U01,False,RADx-rad,Wastewater
1,34520666,PMC8495893,doi:10.1021/acs.est.1c02580,Time: A Key Driver of Uncertainty When Assessi...,,Environmental Monitoring|Humans|Plastics|Uncer...,"Halden, Rolf U|Rolsky, Charles|Khan, Farhan R",Environmental science & technology,2021,2021-10-05,"Journal Article|Research Support, N.I.H., Extr...",DA053976,,,,,,U01,False,RADx-rad,Wastewater
2,40332956,,doi:10.1093/jamia/ocaf064,CDEMapper: enhancing National Institutes of He...,Common Data Elements (CDEs) standardize data c...,common data element|data collection|data shari...,"Wang, Yan|Huang, Jimin|He, Huan|Zhang, Vincent...",Journal of the American Medical Informatics As...,2025,2025-05-07,Journal Article,LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center
3,39613846,PMC11607336,doi:10.1038/s41598-024-81170-y,De-identification is not enough: a comparison ...,"For sharing privacy-sensitive data, de-identif...",Algorithms|Computer Security|Confidentiality|E...,"Sarkar, Atiquer Rahman|Chuang, Yao-Shun|Mohamm...",Scientific reports,2024,2024-11-29,Comparative Study|Journal Article,LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center
4,38832084,PMC11145658,doi:10.1145/3583780.3614739,DiscoverPath: A Knowledge Refinement and Retri...,The exponential growth in scholarly publicatio...,Biomedical|Healthcare|Information Retrieval Sy...,"Chuang, Yu-Neng|Wang, Guanchu|Chang, Chia-Yuan...",Proceedings of the ... ACM International Confe...,2023,2023-10-01,Journal Article,LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center


In [19]:
publications.drop_duplicates(inplace=True)

### Assign article type for preprints

In [20]:
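# Exact match on the journal name. Longer names such as
# "medRxiv : the preprint server for health sciences" are not matched
# and keep their original article type (e.g., "Journal Article|Preprint").
preprint_journals = {"ArXiv", "bioRxiv", "medRxiv", "Res Sq"}
publications["article_type"] = publications.apply(
    lambda row: "Preprint" if row["journal"] in preprint_journals else row["article_type"],
    axis=1
)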

### Sort articles by priority in preparation for duplicate removal
Articles are sorted by year (newest first) and, within the same year, by article type:
* Journal articles have the highest priority
* Other article types (e.g., Letters, Reviews) have a lower priority
* Preprints or article corrections have the lowest priority
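
The ordering and the subsequent duplicate removal can be sketched on a few made-up records. This is an illustration under stated assumptions, not the notebook's implementation. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for `Levenshtein.ratio` (the two scores are similar but not identical), it checks for "Preprint" before "Journal Article", and the records and the `deduplicate` helper are hypothetical.

```python
from difflib import SequenceMatcher


def get_priority(article_type):
    """Return 2 for journal articles, 0 for preprints, 1 for everything else."""
    types = article_type.split("|")
    if "Preprint" in types:  # checked first so "Journal Article|Preprint" ranks lowest
        return 0
    if "Journal Article" in types:
        return 2
    return 1


def deduplicate(records, threshold=0.80):
    """Keep the highest-priority record among near-identical titles of one project."""
    ordered = sorted(
        records,
        key=lambda r: (r["year"], get_priority(r["article_type"])),
        reverse=True,  # newest year first, then highest type priority
    )
    kept = []
    for record in ordered:
        is_duplicate = any(
            record["project_serial_num"] == k["project_serial_num"]
            and SequenceMatcher(None, record["title"], k["title"]).ratio() > threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(record)
    return kept


# Made-up records for illustration
records = [
    {"title": "Aptamer assay for rapid antigen detection", "year": 2022,
     "article_type": "Journal Article|Preprint", "project_serial_num": "XX000001"},
    {"title": "Aptamer assay for rapid antigen detection.", "year": 2022,
     "article_type": "Journal Article", "project_serial_num": "XX000001"},
    {"title": "Aptamer assay for rapid antigen detection.", "year": 2022,
     "article_type": "Journal Article", "project_serial_num": "XX000002"},
    {"title": "Wastewater surveillance on a university campus.", "year": 2021,
     "article_type": "Review", "project_serial_num": "XX000001"},
]

for record in deduplicate(records):
    print(record["project_serial_num"], record["article_type"], record["title"])
```

Titles are compared only within the same project, so a paper that acknowledges two grants is kept once per grant, while the preprint version of a paper is dropped in favor of its journal version.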

In [21]:
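def get_priority(article_type):
    # Check for preprints first so that combined types such as
    # "Journal Article|Preprint" receive the lowest priority.
    if "Preprint" in article_type:
        return 0
    if "Journal Article" in article_type:
        return 2
    return 1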

publications["type_priority"] = publications["article_type"].apply(get_priority)
publications = publications.sort_values(by=["year", "type_priority"], ascending=[False, False])

In [22]:
print(publications.shape[0])
publications.query("pm_id == '38859985'")

741


Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project,type_priority
493,38859985,PMC11163376,doi:10.1093/ve/veae034,Within-host influenza viral diversity in the p...,Seasonal influenza virus predominantly evolves...,antigenic drift|influenza|next-generation sequ...,"Sobel Leonard, Ashley|Mendoza, Lydia|McFarland...",Virus evolution,2024,2024-01-01,Journal Article,HD105594,,,,,,R61,False,RADx-rad,PreVAIL kIds,2
494,38859985,PMC11163376,doi:10.1093/ve/veae034,Within-host influenza viral diversity in the p...,Seasonal influenza virus predominantly evolves...,antigenic drift|influenza|next-generation sequ...,"Sobel Leonard, Ashley|Mendoza, Lydia|McFarland...",Virus evolution,2024,2024-01-01,Journal Article,HD105594,,,,,,R33,False,RADx-rad,PreVAIL kIds,2


In [23]:
publications.drop_duplicates(inplace=True)
print(publications.shape[0])
publications.query("pm_id == '38859985'") # This publication is related to two award_types: R61 and R33

741


Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project,type_priority
493,38859985,PMC11163376,doi:10.1093/ve/veae034,Within-host influenza viral diversity in the p...,Seasonal influenza virus predominantly evolves...,antigenic drift|influenza|next-generation sequ...,"Sobel Leonard, Ashley|Mendoza, Lydia|McFarland...",Virus evolution,2024,2024-01-01,Journal Article,HD105594,,,,,,R61,False,RADx-rad,PreVAIL kIds,2
494,38859985,PMC11163376,doi:10.1093/ve/veae034,Within-host influenza viral diversity in the p...,Seasonal influenza virus predominantly evolves...,antigenic drift|influenza|next-generation sequ...,"Sobel Leonard, Ashley|Mendoza, Lydia|McFarland...",Virus evolution,2024,2024-01-01,Journal Article,HD105594,,,,,,R33,False,RADx-rad,PreVAIL kIds,2


In [24]:
publications.query("pm_id == '35532905'") # Matches two different grant numbers

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project,type_priority
25,35532905,PMC9112978,doi:10.1021/acs.analchem.2c00554,Aptamer Sandwich Lateral Flow Assay (AptaFlow)...,The COVID-19 pandemic is among the greatest he...,"Antibodies, Viral|Aptamers, Nucleotide|COVID-1...","Yang, Lucy F|Kacherovsky, Nataly|Panpradist, N...",Analytical chemistry,2022,2022-05-24,"Journal Article|Research Support, N.I.H., Extr...",LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center,2
229,35532905,PMC9112978,doi:10.1021/acs.analchem.2c00554,Aptamer Sandwich Lateral Flow Assay (AptaFlow)...,The COVID-19 pandemic is among the greatest he...,"Antibodies, Viral|Aptamers, Nucleotide|COVID-1...","Yang, Lucy F|Kacherovsky, Nataly|Panpradist, N...",Analytical chemistry,2022,2022-05-24,"Journal Article|Research Support, N.I.H., Extr...",AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing,2


In [25]:
publications.query("pm_id == '38127053'") # Matches two different grant numbers

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project,type_priority
257,38127053,PMC11001522,doi:10.1021/acs.jcim.3c00713,APIPred: An XGBoost-Based Method for Predictin...,Aptamers are single-stranded DNA or RNA oligos...,"Aptamers, Nucleotide|Molecular Docking Simulat...","Fang, Zheng|Wu, Zhongqi|Wu, Xinbo|Chen, Shixin...",Journal of chemical information and modeling,2024,2024-04-08,"Journal Article|Research Support, N.I.H., Extr...",AA029348,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing,2
603,38127053,PMC11001522,doi:10.1021/acs.jcim.3c00713,APIPred: An XGBoost-Based Method for Predictin...,Aptamers are single-stranded DNA or RNA oligos...,"Aptamers, Nucleotide|Molecular Docking Simulat...","Fang, Zheng|Wu, Zhongqi|Wu, Xinbo|Chen, Shixin...",Journal of chemical information and modeling,2024,2024-04-08,"Journal Article|Research Support, N.I.H., Extr...",DE030852,,,,,,R44,False,RADx-rad,Novel Biosensing and VOC,2


In [26]:
from Levenshtein import ratio

# Titles within the same project whose similarity exceeds this threshold
# are treated as duplicates (a stricter value of 0.95 was used previously).
SIMILARITY_THRESHOLD = 0.80

# Indices of the rows to keep. Rows are visited in priority order,
# so the first occurrence of a title is the preferred one.
rows_to_keep = []

for i, row in publications.iterrows():
    similar_found = False
    for keep_idx in rows_to_keep:
        kept = publications.loc[keep_idx]
        # Only compare titles that belong to the same project
        if row["project_serial_num"] != kept["project_serial_num"]:
            continue
        if ratio(row["title"], kept["title"]) > SIMILARITY_THRESHOLD:
            similar_found = True
            print("found similar paper:")
            print(row["title"])
            print(kept["title"])
            break

    # If no similar title was found, keep this row
    if not similar_found:
        rows_to_keep.append(i)

# Filter the DataFrame to only include the rows we want to keep
publications = publications.loc[rows_to_keep].drop(columns="type_priority").reset_index(drop=True)
print(f"Number of publications after eliminating duplicate titles: {publications.shape[0]}")

found similar paper:
A genetically modulated Toll-like-receptor-tolerant phenotype in peripheral blood cells of children with multisystem inflammatory syndrome.
A genetically modulated Toll-like receptor-tolerant phenotype in peripheral blood cells of children with multisystem inflammatory syndrome.
found similar paper:
Structurally-informed human interactome reveals proteome-wide perturbations by disease mutations.
A structurally informed human protein-protein interactome reveals proteome-wide perturbations caused by disease mutations.
found similar paper:
Within-host influenza viral diversity in the pediatric population as a function of age, vaccine, and health status.
Within-host influenza viral diversity in the pediatric population as a function of age, vaccine, and health status.
found similar paper:
Thinking Small, Stinking Big: The World of Microbial Odors.
Thinking Small, Stinking Big: The World of Microbial Odors.
found similar paper:
Geospatially-resolved public-health survei

In [27]:
publications[publications["title"].str.contains("Correction")]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project


In [28]:
publications.query("pm_id == '38859985'")

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
115,38859985,PMC11163376,doi:10.1093/ve/veae034,Within-host influenza viral diversity in the p...,Seasonal influenza virus predominantly evolves...,antigenic drift|influenza|next-generation sequ...,"Sobel Leonard, Ashley|Mendoza, Lydia|McFarland...",Virus evolution,2024,2024-01-01,Journal Article,HD105594,,,,,,R61,False,RADx-rad,PreVAIL kIds


In [29]:
publications[publications["pm_id"].isin(["32511591", "32357959"])]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
611,32511591,PMC7276018,doi:10.1101/2020.04.17.20069641,An 81 base-pair deletion in SARS-CoV-2 ORF7a i...,,,"Holland, LaRinda A|Kaelin, Emily A|Maqsood, Ra...",medRxiv : the preprint server for health sciences,2020,2020-04-22,Journal Article|Preprint,LM013129,,,,,,U01,True,RADx-rad,Wastewater


In [30]:
publications.sort_values(by="title", inplace=True)
publications

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
408,36275710.0,PMC9581391,doi:10.3389/fimmu.2022.1008390,"""Rogue"" neutrophil-subset [DEspR+CD11b+/CD66b+...",The correlation (Rs > 0.7) of neutrophils expr...,Acute Lung Injury|Animals|Brain Diseases|DEspR...,"Carstensen, Saskia|Müller, Meike|Tan, Glaiza L...",Frontiers in immunology,2022,2022-01-01,"Journal Article|Research Support, N.I.H., Extr...",HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
140,38214887.0,PMC10947818,doi:10.1002/anie.202316851,"""Turbo-Charged"" DNA Motors with Optimized Sequ...",DNA motors that consume chemical energy to gen...,DNA|DNA motors|DNA nanotechnology|Molecular Mo...,"Zhang, Luona|Piranej, Selma|Namazi, Arshiya|Na...",Angewandte Chemie (International ed. in English),2024,2024-03-22,"Journal Article|Research Support, N.I.H., Extr...",AA029345,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
599,34654633.0,,doi:10.1053/j.jvca.2021.09.032,2021 Clinical Practice Guidelines for Anesthes...,,Anesthesiologists|Anesthesiology|Blood Transfu...,"Huang, Jiapeng|Firestone, Scott|Moffatt-Bruce,...",Journal of cardiothoracic and vascular anesthesia,2021,2021-12-01,"Editorial|Research Support, N.I.H., Extramural",TR003787,,,,,,U18,False,RADx-rad,SCENT
15,39793745.0,,doi:10.1016/j.actbio.2025.01.006,3D bioprinting approaches for enhancing stem c...,Three-dimensional (3D) bioprinting holds immen...,3D bioprinting|Animals|Bioprinting|Humans|Nerv...,"Bektas, Cemile Kilic|Luo, Jeffrey|Conley, Bria...",Acta biomaterialia,2025,2025-01-24,"Journal Article|Research Support, N.I.H., Extr...",HL150852,,,,,,U01,False,RADx-rad,Novel Biosensing and VOC
163,38516674.0,PMC10956508,doi:10.36922/ijb.0118,3D-printed hydrogels dressings with bioactive ...,Recent advances in additive manufacturing have...,3D printing|Bioactive borate glass|Burn wound ...,"Fayyazbakhsh, Fateme|Khayat, Michael J|Sadler,...",International journal of bioprinting,2023,2023-10-15,Journal Article,HL152410,,,,,,U01,False,RADx-rad,Novel Biosensing and VOC
187,37098909.0,PMC10190252,doi:10.1128/mra.00069-23,<i>Rhizobium</i> Phage-Like Microvirus Genome ...,"We describe the genome (4,696 nucleotides [GC ...",,"Chapman, Ainsley R|Wright, Jillian M|Kaiser, N...",Microbiology resource announcements,2023,2023-05-17,Journal Article,LM013129,,,,,,U01,True,RADx-rad,Wastewater
321,34936725.0,PMC8854333,doi:10.1002/anie.202112995,A Charge-Switchable Zwitterionic Peptide for R...,The transmission of SARS-CoV-2 coronavirus has...,Biomarkers|Breath Tests|COVID-19|Colorimetric ...,"Jin, Zhicheng|Mantri, Yash|Retout, Maurice|Che...",Angewandte Chemie (International ed. in English),2022,2022-02-21,"Journal Article|Research Support, N.I.H., Extr...",DE031114,,,,,,R01,False,RADx-rad,Multimodal Surveillance
145,38507737.0,PMC11219269,doi:10.1158/1535-7163.MCT-23-0540,A Compound That Inhibits Glycolysis in Prostat...,Metastatic castration-resistant prostate cance...,"Animals|Antineoplastic Agents|Cell Line, Tumor...","Uo, Takuma|Ojo, Kayode K|Sprenger, Cynthia C T...",Molecular cancer therapeutics,2024,2024-07-02,Journal Article,HL152401,,,,,,U01,True,RADx-rad,Novel Biosensing and VOC
322,34889013.0,PMC8854376,doi:10.1002/anie.202113617,A Dual-Color Fluorescent Probe Allows Simultan...,The main protease (M<sup>pro</sup> ) and papai...,Color|Coronavirus 3C Proteases|Coronavirus Pap...,"Cheng, Yong|Borum, Raina M|Clark, Alex E|Jin, ...",Angewandte Chemie (International ed. in English),2022,2022-02-21,"Journal Article|Research Support, N.I.H., Extr...",DE031114,,,,,,R01,False,RADx-rad,Multimodal Surveillance
683,27729363.0,PMC5079441,doi:10.1161/CIRCIMAGING.116.005091,A Magnetic Resonance Imaging-Conditional Exter...,Subjects undergoing cardiac arrest within a ma...,"Animals|Defibrillators|Disease Models, Animal|...","Schmidt, Ehud J|Watkins, Ronald D|Zviman, Mene...",Circulation. Cardiovascular imaging,2016,2016-10-01,Journal Article,HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC


In [31]:
publications["pmc_id"].value_counts()

pmc_id
               38
PMC12045438     3
PMC8816963      2
PMC10538431     2
PMC9397568      2
PMC9344894      2
PMC10922791     2
PMC8658056      2
PMC8616712      2
PMC8426805      2
PMC11406294     2
PMC8604633      2
PMC8905934      2
PMC9112978      2
PMC11001522     2
PMC11952872     2
PMC11465841     1
PMC11665894     1
PMC9619439      1
PMC8774157      1
PMC7359533      1
PMC10984333     1
PMC8404464      1
PMC9047211      1
PMC11823613     1
PMC10121104     1
PMC6193849      1
PMC11865829     1
PMC9581391      1
PMC8126852      1
PMC7497212      1
PMC9578294      1
PMC7904456      1
PMC8250508      1
PMC8949778      1
PMC8642528      1
PMC8483217      1
PMC6223025      1
PMC5963257      1
PMC11874078     1
PMC11823678     1
PMC10576016     1
PMC10858653     1
PMC11794116     1
PMC10855671     1
PMC8442556      1
PMC6616999      1
PMC7997853      1
PMC10393269     1
PMC9975913      1
PMC6851426      1
PMC10330619     1
PMC9106980      1
PMC11102316     1
PMC9076410      1
PMC

In [32]:
publications["article_type"].value_counts()

article_type
Journal Article                                                                                                                                                        249
Journal Article|Research Support, N.I.H., Extramural|Research Support, Non-U.S. Gov't                                                                                  119
Journal Article|Research Support, N.I.H., Extramural                                                                                                                    62
Journal Article|Research Support, N.I.H., Extramural|Research Support, Non-U.S. Gov't|Research Support, U.S. Gov't, Non-P.H.S.                                          36
Journal Article|Preprint                                                                                                                                                30
Journal Article|Review                                                                                                              

In [33]:
print(f"Raw number of publications: {publications.shape[0]}")

Raw number of publications: 692


In [34]:
publications.query("project_serial_num == 'AA029316'")

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
372,35532905,PMC9112978,doi:10.1021/acs.analchem.2c00554,Aptamer Sandwich Lateral Flow Assay (AptaFlow)...,The COVID-19 pandemic is among the greatest he...,"Antibodies, Viral|Aptamers, Nucleotide|COVID-1...","Yang, Lucy F|Kacherovsky, Nataly|Panpradist, N...",Analytical chemistry,2022,2022-05-24,"Journal Article|Research Support, N.I.H., Extr...",AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
216,37206388,PMC10189874,doi:10.1039/d3sc00439b,Aptamers 101: aptamer discovery and <i>in vitr...,Aptamers are single-stranded nucleic acids tha...,,"Yang, Lucy F|Ling, Melissa|Kacherovsky, Nataly...",Chemical science,2023,2023-05-17,Journal Article|Review,AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
520,34328683,PMC8426805,doi:10.1002/anie.202107730,Discovery and Characterization of Spike N-Term...,The coronavirus disease 2019 (COVID-19) pandem...,"Aptamers, Nucleotide|COVID-19|Enzyme-Linked Im...","Kacherovsky, Nataly|Yang, Lucy F|Dang, Ha V|Ch...",Angewandte Chemie (International ed. in English),2021,2021-09-20,"Journal Article|Research Support, N.I.H., Extr...",AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
371,35972202,PMC9397568,doi:10.1021/acs.analchem.2c01993,SCORe: SARS-CoV-2 Omicron Variant RBD-Binding ...,During the COVID-19 (coronavirus disease 2019)...,"Angiotensin-Converting Enzyme 2|Antibodies, Vi...","Yang, Lucy F|Kacherovsky, Nataly|Liang, Joey|S...",Analytical chemistry,2022,2022-09-20,"Journal Article|Research Support, N.I.H., Extr...",AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing


In [35]:
publications[publications["pm_id"].isin({"34328683", "34607084", "35200361", "36292760", "37160974", "37851606", "38214887"})]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
140,38214887,PMC10947818,doi:10.1002/anie.202316851,"""Turbo-Charged"" DNA Motors with Optimized Sequ...",DNA motors that consume chemical energy to gen...,DNA|DNA motors|DNA nanotechnology|Molecular Mo...,"Zhang, Luona|Piranej, Selma|Namazi, Arshiya|Na...",Angewandte Chemie (International ed. in English),2024,2024-03-22,"Journal Article|Research Support, N.I.H., Extr...",AA029345,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
249,37851606,PMC10584126,doi:10.1371/journal.pone.0286988,A spatially uniform illumination source for wi...,Illumination uniformity is a critical paramete...,Lighting|Microscopy|Optical Devices,"Çelebi, İris|Aslan, Mete|Ünlü, M Selim",PloS one,2023,2023-01-01,Journal Article,HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
358,36292760,PMC9602126,doi:10.3390/genes13101874,Confounding Factors Impacting microRNA Express...,There is growing interest in saliva microRNAs ...,Biomarkers|Humans|MicroRNAs|Reproducibility of...,"Sullivan, Rhea|Montgomery, Austin|Scipioni, An...",Genes,2022,2022-10-16,"Journal Article|Research Support, N.I.H., Extr...",HD105610,,,,,,R61,False,RADx-rad,PreVAIL kIds
473,35200361,PMC8869940,doi:10.3390/bios12020101,Context-Aware Diagnostic Specificity (CADS).,Rapid detection of proteins is critical in a v...,"Humans|Models, Statistical|Proteins|Sensitivit...","McLamore, Eric S|Moreira, Geisianny|Vanegas, D...",Biosensors,2022,2022-02-07,Editorial,AA029328,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
489,34328683,PMC8426805,doi:10.1002/anie.202107730,Discovery and Characterization of Spike N-Term...,The coronavirus disease 2019 (COVID-19) pandem...,"Aptamers, Nucleotide|COVID-19|Enzyme-Linked Im...","Kacherovsky, Nataly|Yang, Lucy F|Dang, Ha V|Ch...",Angewandte Chemie (International ed. in English),2021,2021-09-20,"Journal Article|Research Support, N.I.H., Extr...",LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center
520,34328683,PMC8426805,doi:10.1002/anie.202107730,Discovery and Characterization of Spike N-Term...,The coronavirus disease 2019 (COVID-19) pandem...,"Aptamers, Nucleotide|COVID-19|Enzyme-Linked Im...","Kacherovsky, Nataly|Yang, Lucy F|Dang, Ha V|Ch...",Angewandte Chemie (International ed. in English),2021,2021-09-20,"Journal Article|Research Support, N.I.H., Extr...",AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
505,34607084,PMC8464352,doi:10.1016/j.watres.2021.117710,High-throughput sequencing of SARS-CoV-2 in wa...,Severe acute respiratory syndrome coronavirus ...,COVID-19|High-Throughput Nucleotide Sequencing...,"Fontenele, Rafaela S|Kraberger, Simona|Hadfiel...",Water research,2021,2021-10-15,Journal Article,LM013129,,,,,,U01,True,RADx-rad,Wastewater
183,37160974,PMC10169181,doi:10.1038/s41467-023-38400-0,Highly host-linked viromes in the built enviro...,Viruses in built environments (BEs) raise publ...,Alkanesulfonic Acids|Built Environment|Microbi...,"Du, Shicong|Tong, Xinzhao|Lai, Alvin C K|Chan,...",Nature communications,2023,2023-05-09,"Journal Article|Research Support, N.I.H., Extr...",DA053941,,,,,,U01,False,RADx-rad,Wastewater
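
The output above shows PubMed ID 34328683 listed under two project serial numbers (AA029316 and LM013755). As a minimal sketch of how such multi-grant publications could be listed systematically, the example below counts distinct serial numbers per `pm_id`. The `pubs` frame is a toy stand-in built from three rows of the output above, not the full `publications` table, and `grants_per_pub` and `multi_grant` are illustrative names.

```python
import pandas as pd

# Toy stand-in for `publications`, using (pm_id, project_serial_num)
# pairs copied from the output above.
pubs = pd.DataFrame(
    {
        "pm_id": ["34328683", "34328683", "34607084"],
        "project_serial_num": ["AA029316", "LM013755", "LM013129"],
    }
)

# Count the distinct project serial numbers attached to each PubMed ID.
grants_per_pub = pubs.groupby("pm_id")["project_serial_num"].nunique()

# Keep only the publications that are linked to more than one grant.
multi_grant = grants_per_pub[grants_per_pub > 1]
print(multi_grant)
```

Run against the real table, a check like this would show which rows must survive de-duplication because they carry different grant numbers.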


In [36]:
publications[publications["doi"].isin({"doi:10.1016/j.watres.2021.117710", "doi:10.3390/genes13101874"})]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
358,36292760,PMC9602126,doi:10.3390/genes13101874,Confounding Factors Impacting microRNA Express...,There is growing interest in saliva microRNAs ...,Biomarkers|Humans|MicroRNAs|Reproducibility of...,"Sullivan, Rhea|Montgomery, Austin|Scipioni, An...",Genes,2022,2022-10-16,"Journal Article|Research Support, N.I.H., Extr...",HD105610,,,,,,R61,False,RADx-rad,PreVAIL kIds
505,34607084,PMC8464352,doi:10.1016/j.watres.2021.117710,High-throughput sequencing of SARS-CoV-2 in wa...,Severe acute respiratory syndrome coronavirus ...,COVID-19|High-Throughput Nucleotide Sequencing...,"Fontenele, Rafaela S|Kraberger, Simona|Hadfiel...",Water research,2021,2021-10-15,Journal Article,LM013129,,,,,,U01,True,RADx-rad,Wastewater


In [37]:
# Remove duplicate (DOI, project serial number) pairs.
# The same publication may be linked to multiple grants,
# so we must keep one row per project serial number.
publications.drop_duplicates(["doi", "project_serial_num"], inplace=True)
print(f"Raw number of de-duplicated publications: {publications.shape[0]}")

Raw number of de-duplicated publications: 689
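
To illustrate why the subset is the pair `["doi", "project_serial_num"]` rather than `"doi"` alone, here is a small sketch on toy data. The DOI and serial numbers are taken from the output above, but the repeated first row is a hypothetical duplicate added for illustration.

```python
import pandas as pd

# Toy data: one DOI appearing three times. The first two rows are an
# exact (doi, project_serial_num) duplicate (hypothetical); the third
# row links the same DOI to a different grant.
toy = pd.DataFrame(
    {
        "doi": ["doi:10.1002/anie.202107730"] * 3,
        "project_serial_num": ["AA029316", "AA029316", "LM013755"],
    }
)

# Number of rows that repeat an earlier (doi, project_serial_num) pair.
n_dupes = int(toy.duplicated(["doi", "project_serial_num"]).sum())

# De-duplicating on the pair keeps one row per grant.
by_pair = toy.drop_duplicates(["doi", "project_serial_num"])

# De-duplicating on the DOI alone would lose the second grant link.
by_doi = toy.drop_duplicates(["doi"])

print(n_dupes, len(by_pair), len(by_doi))
```

De-duplicating on the pair preserves both grant links, whereas de-duplicating on the DOI alone collapses them into a single row.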


In [38]:
publications[publications["doi"].isin({"doi:10.1016/j.watres.2021.117710", "doi:10.3390/genes13101874"})]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
358,36292760,PMC9602126,doi:10.3390/genes13101874,Confounding Factors Impacting microRNA Express...,There is growing interest in saliva microRNAs ...,Biomarkers|Humans|MicroRNAs|Reproducibility of...,"Sullivan, Rhea|Montgomery, Austin|Scipioni, An...",Genes,2022,2022-10-16,"Journal Article|Research Support, N.I.H., Extr...",HD105610,,,,,,R61,False,RADx-rad,PreVAIL kIds
505,34607084,PMC8464352,doi:10.1016/j.watres.2021.117710,High-throughput sequencing of SARS-CoV-2 in wa...,Severe acute respiratory syndrome coronavirus ...,COVID-19|High-Throughput Nucleotide Sequencing...,"Fontenele, Rafaela S|Kraberger, Simona|Hadfiel...",Water research,2021,2021-10-15,Journal Article,LM013129,,,,,,U01,True,RADx-rad,Wastewater


In [39]:
publications[publications["title"].str.startswith("High-throughput sequencing of SARS-CoV-2")]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
505,34607084,PMC8464352,doi:10.1016/j.watres.2021.117710,High-throughput sequencing of SARS-CoV-2 in wa...,Severe acute respiratory syndrome coronavirus ...,COVID-19|High-Throughput Nucleotide Sequencing...,"Fontenele, Rafaela S|Kraberger, Simona|Hadfiel...",Water research,2021,2021-10-15,Journal Article,LM013129,,,,,,U01,True,RADx-rad,Wastewater


In [40]:
# Sort by year (newest first) so that the de-duplication in the next cell,
# which keeps the first occurrence, retains the most recent record.
publications.sort_values(by="year", ascending=False, inplace=True)

In [41]:
# Drop rows with duplicate title and project serial number
publications.drop_duplicates(["title", "project_serial_num"], inplace=True)
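
The two cells above rely on `drop_duplicates` keeping the first occurrence by default, so sorting newest-first beforehand means the most recent record survives. A minimal sketch on made-up data follows. The titles and serial numbers are hypothetical, and `kind="stable"` is an addition here to make the order of same-year rows deterministic; the notebook itself uses the default sort.

```python
import pandas as pd

# Hypothetical records: "Paper A" appears twice for the same grant,
# once in 2021 and once in 2022.
toy = pd.DataFrame(
    {
        "title": ["Paper A", "Paper A", "Paper B"],
        "project_serial_num": ["XX000001", "XX000001", "XX000002"],
        "year": [2021, 2022, 2023],
    }
)

# Sort newest first, then keep the first row of each
# (title, project_serial_num) group, i.e. the most recent one.
latest = toy.sort_values(by="year", ascending=False, kind="stable").drop_duplicates(
    ["title", "project_serial_num"], keep="first"
)
print(latest)
```

Note that sorting on `year` alone cannot distinguish records published in the same year; sorting on `publication_date` would be a finer-grained alternative if that column is populated consistently.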

In [42]:
print(f"Number of de-duplicated publications with DOI: {publications.shape[0]}")

Number of de-duplicated publications with DOI: 689


In [43]:
publications.to_csv(os.path.join(DERIVED_DATA_PATH, "publications_pubmed_raw.csv"), index=False)

In [44]:
publications[publications["pm_id"].isin({"34328683", "34607084", "35200361", "36292760", "37160974", "37851606", "38214887"})]

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
140,38214887,PMC10947818,doi:10.1002/anie.202316851,"""Turbo-Charged"" DNA Motors with Optimized Sequ...",DNA motors that consume chemical energy to gen...,DNA|DNA motors|DNA nanotechnology|Molecular Mo...,"Zhang, Luona|Piranej, Selma|Namazi, Arshiya|Na...",Angewandte Chemie (International ed. in English),2024,2024-03-22,"Journal Article|Research Support, N.I.H., Extr...",AA029345,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
183,37160974,PMC10169181,doi:10.1038/s41467-023-38400-0,Highly host-linked viromes in the built enviro...,Viruses in built environments (BEs) raise publ...,Alkanesulfonic Acids|Built Environment|Microbi...,"Du, Shicong|Tong, Xinzhao|Lai, Alvin C K|Chan,...",Nature communications,2023,2023-05-09,"Journal Article|Research Support, N.I.H., Extr...",DA053941,,,,,,U01,False,RADx-rad,Wastewater
249,37851606,PMC10584126,doi:10.1371/journal.pone.0286988,A spatially uniform illumination source for wi...,Illumination uniformity is a critical paramete...,Lighting|Microscopy|Optical Devices,"Çelebi, İris|Aslan, Mete|Ünlü, M Selim",PloS one,2023,2023-01-01,Journal Article,HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
473,35200361,PMC8869940,doi:10.3390/bios12020101,Context-Aware Diagnostic Specificity (CADS).,Rapid detection of proteins is critical in a v...,"Humans|Models, Statistical|Proteins|Sensitivit...","McLamore, Eric S|Moreira, Geisianny|Vanegas, D...",Biosensors,2022,2022-02-07,Editorial,AA029328,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
358,36292760,PMC9602126,doi:10.3390/genes13101874,Confounding Factors Impacting microRNA Express...,There is growing interest in saliva microRNAs ...,Biomarkers|Humans|MicroRNAs|Reproducibility of...,"Sullivan, Rhea|Montgomery, Austin|Scipioni, An...",Genes,2022,2022-10-16,"Journal Article|Research Support, N.I.H., Extr...",HD105610,,,,,,R61,False,RADx-rad,PreVAIL kIds
505,34607084,PMC8464352,doi:10.1016/j.watres.2021.117710,High-throughput sequencing of SARS-CoV-2 in wa...,Severe acute respiratory syndrome coronavirus ...,COVID-19|High-Throughput Nucleotide Sequencing...,"Fontenele, Rafaela S|Kraberger, Simona|Hadfiel...",Water research,2021,2021-10-15,Journal Article,LM013129,,,,,,U01,True,RADx-rad,Wastewater
520,34328683,PMC8426805,doi:10.1002/anie.202107730,Discovery and Characterization of Spike N-Term...,The coronavirus disease 2019 (COVID-19) pandem...,"Aptamers, Nucleotide|COVID-19|Enzyme-Linked Im...","Kacherovsky, Nataly|Yang, Lucy F|Dang, Ha V|Ch...",Angewandte Chemie (International ed. in English),2021,2021-09-20,"Journal Article|Research Support, N.I.H., Extr...",AA029316,,,,,,U01,False,RADx-rad,Automatic Detection & Tracing
489,34328683,PMC8426805,doi:10.1002/anie.202107730,Discovery and Characterization of Spike N-Term...,The coronavirus disease 2019 (COVID-19) pandem...,"Aptamers, Nucleotide|COVID-19|Enzyme-Linked Im...","Kacherovsky, Nataly|Yang, Lucy F|Dang, Ha V|Ch...",Angewandte Chemie (International ed. in English),2021,2021-09-20,"Journal Article|Research Support, N.I.H., Extr...",LM013755,,,,,,U24,False,RADx-rad,Data Coordinating Center


In [45]:
publications.tail()

Unnamed: 0,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,year,publication_date,article_type,project_serial_num,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,award_type,supplement,research_initiative,sub_project
686,27114509,PMC4868492,doi:10.1073/pnas.1525388113,Targeted erythropoietin selectively stimulates...,The design of cell-targeted protein therapeuti...,Anemia|Animals|Drug Design|Erythropoiesis|Eryt...,"Burrill, Devin R|Vernet, Andyna|Collins, James...",Proceedings of the National Academy of Science...,2016,2016-05-10,"Journal Article|Research Support, N.I.H., Extr...",HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
689,26506125,PMC4674364,doi:10.1016/j.bbadis.2015.10.019,S-adenosylhomocysteine induces inflammation th...,S-adenosylhomocysteine (SAH) can induce endoth...,Adhesion molecules|Cell Line|EZH2|Endothelial ...,"Barroso, Madalena|Kao, Derrick|Blom, Henk J|Ta...",Biochimica et biophysica acta,2016,2016-01-01,"Journal Article|Research Support, N.I.H., Extr...",HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
688,26968099,PMC5375104,doi:10.1016/j.bbalip.2016.03.007,miRNA regulation of LDL-cholesterol metabolism.,"In the past decade, microRNAs (miRNAs) have em...","Animals|Cardiovascular Diseases|Cholesterol, L...","Goedeke, Leigh|Wagschal, Alexandre|Fernández-H...",Biochimica et biophysica acta,2016,2016-12-01,"Journal Article|Research Support, N.I.H., Extr...",HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
691,26294733,PMC4666336,doi:10.3324/haematol.2015.132449,Homozygous knockout of the piezo1 gene in the ...,,Anemia|Animals|Gene Knockout Techniques|Homozy...,"Shmukler, Boris E|Huston, Nicholas C|Thon, Jon...",Haematologica,2015,2015-12-01,"Letter|Research Support, N.I.H., Extramural",HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
690,26149051,PMC5565795,doi:10.1111/jth.12942,Road blocks in making platelets for transfusion.,The production of laboratory-generated human p...,Bioreactors|Blood Platelets|Cell Culture Techn...,"Thon, J N|Medvetz, D A|Karlsson, S M|Italiano,...",Journal of thrombosis and haemostasis : JTH,2015,2015-06-01,"Journal Article|Research Support, N.I.H., Extr...",HL119145,,,,,,U54,False,RADx-rad,Novel Biosensing and VOC
