## Selection of Relevant Publications Using Likert Scoring

This notebook identifies and saves publications relevant to the objectives of the [RADx-rad program](https://www.radxrad.org/), based on Likert-scale evaluations provided by multiple large language models (LLMs).

## Workflow Overview

1. **Integration of Likert Scores**  
   - Load and merge Likert scores and associated scoring rationales from previous classification results ([classification notebook](../notebooks/2_classify_publications_likert.ipynb)).

2. **Exclusion of Special Cases**  
   - Exclude publications published before the start of the RADx-rad program (2020).
   - Remove publications associated with multi-study grants that are not directly related to RADx-rad objectives.

3. **Grouping the Results from Multiple LLMs**
   - Group the Likert scores and rationales obtained by multiple LLMs for each publication.

4. **Selection and Saving of RADx-rad Relevant Publications**
   - Select publications based on their mean Likert score computed across LLM evaluations.
   - Export publications whose mean score meets or exceeds the threshold as relevant.
   - Save publications below the threshold separately for transparency and further analysis.

**Author:** Peter W. Rose ([pwrose@ucsd.edu](mailto:pwrose@ucsd.edu))  
**Date:** 2025-03-12

In [1]:
import os
import glob
import pandas as pd

In [2]:
ANNOTATION = "annotation_full_text_5" # The full text of the FOA was used to evaluate the relevance of publications to the RADx-rad program
THRESHOLD = 3.5 # mean Likert score threshold (range 1 - 5) to select relevant publications
LAST_UPDATE = "2025-06-17" # Date publications were last retrieved from PubMed

In [3]:
PUBLICATIONS = "../publications"

## 1. Data Integration: Likert Scores and Rationales
This section loads the Likert-scale evaluations produced by the classification notebook, one CSV file per model, and concatenates them into a single DataFrame.
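
As a minimal sketch of this loading step, the example below reads two small in-memory CSV files instead of the real result files. The column values are made up for illustration, and only the columns needed to show the type conversions and the composite key are included.

```python
import io
import pandas as pd

# Two toy per-model result files (hypothetical contents, not real data)
csv_a = io.StringIO(
    "pm_id,dbgap_accession,year,result\n"
    "111,phs000001.v1.p1,2021,4\n"
    ",phs000002.v1.p1,2022,2\n"  # pm_id is missing for this entry
)
csv_b = io.StringIO(
    "pm_id,dbgap_accession,year,result\n"
    "111,phs000001.v1.p1,2021,5\n"
)

# Read everything as strings; keep_default_na=False keeps empty fields as ""
frames = [pd.read_csv(f, keep_default_na=False, dtype=str) for f in (csv_a, csv_b)]
toy = pd.concat(frames, ignore_index=True)

# Convert the numeric columns after loading
toy["result"] = toy["result"].astype(int)
toy["year"] = toy["year"].astype(int)

# Composite key: an empty pm_id yields a key that starts with "_"
toy["identifier"] = toy["pm_id"] + "_" + toy["dbgap_accession"]
print(toy["identifier"].tolist())
```

Because empty fields stay empty strings rather than becoming NaN, the string concatenation for the key never produces a missing identifier.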

In [4]:
# Models used to calculate Likert score and annotations
models = ["Llama-3.3-70B-Instruct", "DeepSeek-R1-Distill-Qwen-32B"]

In [5]:
files = glob.glob(f"../results/*/{ANNOTATION}/*.csv")
# Filter files to include only those containing one of the specified models in their path
files = [file for file in files if any(model in file for model in models)]
df = pd.concat([pd.read_csv(f, keep_default_na=False, dtype=str) for f in files], ignore_index=True)
df["result"] = df["result"].astype(int) # Likert score
df["year"] = df["year"].astype(int)

# Create a composite key. Note: pm_id is empty for a few entries, so their key starts with "_".
df["identifier"] = df["pm_id"] + "_" + df["dbgap_accession"]

## 2. Exclusion of Special Cases
Publications published before 2020, the year the RADx-rad program started, and publications from a multi-study grant that mostly supported work unrelated to RADx-rad are excluded here.
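
The exclusion rule can be illustrated on a toy table. This is a sketch, not the notebook's exact code: the rows, the `case` labels, and the grant number `AA000001` are hypothetical, and `regex=False` is an assumption made here so that author names are matched literally.

```python
import pandas as pd

# Toy rows; all values except the grant HL119145 and the exception pair are made up
toy = pd.DataFrame({
    "case": ["exception PI", "same grant, other PI", "before 2020", "other grant"],
    "authors": ["Shafiee, H|Doe, J", "Smith, A", "Lee, K", "Roe, R"],
    "dbgap_accession": ["phs002561.v1.p1", "phs002561.v1.p1",
                        "phs009999.v1.p1", "phs009999.v1.p1"],
    "project_serial_num": ["HL119145", "HL119145", "AA000001", "AA000001"],
    "year": [2021, 2022, 2019, 2021],
})

# (PI name, dbGaP study) pairs that remain eligible despite the excluded grant
studies = [("Shafiee", "phs002561.v1.p1")]

# Step 1: drop publications from before the program started
toy = toy[toy["year"] >= 2020]

# Step 2: mark rows that match any exception pair
exception = pd.Series(False, index=toy.index)
for author, accession in studies:
    exception |= (
        toy["authors"].str.contains(author, case=False, regex=False)
        & (toy["dbgap_accession"] == accession)
    )

# Step 3: keep rows from other grants, plus the exceptions
kept = toy[(toy["project_serial_num"] != "HL119145") | exception]
print(kept["case"].tolist())
```

Only the exception row and the row from the unrelated grant number survive both filters.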

In [6]:
# Exclude any publications before 2020. The RADx-rad program started in 2020.
df = df[df["year"] >= 2020]

# Exclude publications from grant HL119145 which supported many studies unrelated to RADx-rad.
# Only the following PIs and dbGaP studies are associated with the RADx-rad program:
studies = [
    ("Shafiee", "phs002561.v1.p1"),
    ("Ünlü", "phs002602.v1.p1"),
]

# Create a mask that marks rows matching any of the exceptions.
mask_exceptions = pd.concat(
    [(df["authors"].str.contains(author, case=False)) & (df["dbgap_accession"] == accession)
     for author, accession in studies],
    axis=1
).any(axis=1)

# Keep rows that do not cite grant HL119145, except if they meet the exception criteria.
df = df[(df["project_serial_num"] != "HL119145") | mask_exceptions]

## 3. Grouping the Results from Multiple LLMs
Group the Likert scores and rationales from the individual LLMs so that each publication - study pair occupies one row, with one set of columns per model.
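
The long-to-wide reshaping can be sketched on a toy table with two placeholder models. The names `model_a` and `model_b` and all values are hypothetical; the real table uses many more grouping columns.

```python
import pandas as pd

# Long format: one row per (publication - study pair, model)
long_df = pd.DataFrame({
    "identifier": ["1_phsA", "1_phsA", "2_phsB", "2_phsB"],
    "model": ["model_a", "model_b", "model_a", "model_b"],
    "result": [5, 4, 1, 2],
    "explanation": ["relevant", "mostly relevant", "unrelated", "unrelated"],
})

# Wide format: one row per pair, one column per (value, model) combination
wide = long_df.pivot_table(
    index=["identifier"],
    columns="model",
    values=["result", "explanation"],
    aggfunc="first",  # keep the first entry if a pair was scored twice by a model
)

# Flatten the two-level column index, e.g. ("result", "model_a") -> "result_model_a"
wide.columns = [f"{value}_{model}" for value, model in wide.columns]

# Strip "result_" so each score column is named after its model
wide.columns = wide.columns.str.removeprefix("result_")
wide = wide.reset_index()
print(wide)
```

Naming the score columns after the models is what later allows the scores to be selected with the `models` list directly.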

In [7]:
# Identify the columns that define the "group"
group_cols = [
    "identifier", "pm_id", "pmc_id", "doi", "title", "abstract", "keywords", "authors", "journal", 
    "article_type", "year", "award_type", "supplement", "project_serial_num",
    "sub_project", "project_num", "dbgap_accession", "research_initiative",
    "dbgap_title", "focus", "dbgap_abstract", "id", "name", "url", "summary"
]

# Identify the columns to pivot (the ones to transform into wide format)
value_cols = [
    "result", "explanation", "prompt_tokens", 
    "completion_tokens", "elapsed_time", "cost"
]

# Pivot the DataFrame so that each model creates new columns for the value columns above
grouped_df = df.pivot_table(
    index=group_cols,            # these become the index
    columns="model",             # these become the pivoted columns
    values=value_cols,           # these are the values to fill
    aggfunc="first"              # how to aggregate if multiple rows exist
)

# Flatten the multi-level columns so that each becomes e.g. "result_Llama-3.3-70B-Instruct"
grouped_df.columns = [
    f"{val_col}_{model}" for val_col, model in grouped_df.columns
]

# Drop the "result_" prefix so that each Likert score column is named after its model
grouped_df.columns = grouped_df.columns.str.removeprefix("result_")

# Reset the index to make it a regular column
grouped_df = grouped_df.reset_index()

In [8]:
print("Number of publication - study pairs:", grouped_df.shape[0])
grouped_df.head()

Number of publication - study pairs: 521


Unnamed: 0,identifier,pm_id,pmc_id,doi,title,abstract,keywords,authors,journal,article_type,...,cost_DeepSeek-R1-Distill-Qwen-32B,cost_Llama-3.3-70B-Instruct,elapsed_time_DeepSeek-R1-Distill-Qwen-32B,elapsed_time_Llama-3.3-70B-Instruct,explanation_DeepSeek-R1-Distill-Qwen-32B,explanation_Llama-3.3-70B-Instruct,prompt_tokens_DeepSeek-R1-Distill-Qwen-32B,prompt_tokens_Llama-3.3-70B-Instruct,DeepSeek-R1-Distill-Qwen-32B,Llama-3.3-70B-Instruct
0,31521672_phs002689.v1.p1,31521672,PMC7005378,doi:10.1016/j.chest.2019.08.2185,"Estimated Ventricular Size, Asthma Severity, a...",Relative enlargement of the pulmonary artery (...,Adult|Aorta|Asthma|CT imaging|Case-Control Stu...,"Ash, Samuel Y|Sanchez-Ferrero, Gonzalo Vegas|S...",Chest,"Journal Article|Research Support, N.I.H., Extr...",...,0.0,0.0,2.2,13.9,The publication abstract discusses a study on ...,The publication abstract is not related to the...,17202,16770,1,1
1,31604088_phs002689.v1.p1,31604088,PMC6949388,doi:10.1016/j.jaci.2019.09.018,Development and initial validation of the Asth...,Tools for quantification of asthma severity ar...,Adolescent|Adult|Asthma|Asthma control|Child|F...,"Fitzpatrick, Anne M|Szefler, Stanley J|Mauger,...",The Journal of allergy and clinical immunology,"Clinical Trial, Phase III|Journal Article|Rand...",...,0.0,0.0,1.8,11.6,The publication abstract focuses on the develo...,The publication abstract is not related to the...,17149,16727,1,1
2,32032631_phs002689.v1.p1,32032631,PMC7343602,doi:10.1016/j.jaci.2020.01.039,Baseline sputum eosinophil + neutrophil subgro...,Combined elevated sputum eosinophils+neutrophi...,Adult|Aged|Asthma|Cohort Studies|Eosinophils|F...,"Hastie, Annette T|Mauger, David T|Denlinger, L...",The Journal of allergy and clinical immunology,Clinical Trial|Letter|Multicenter Study|Resear...,...,0.0,0.0,2.0,10.9,The publication abstract discusses baseline sp...,The publication abstract is not related to the...,16835,16421,1,1
3,32479111_phs002689.v1.p1,32479111,PMC7528796,doi:10.1164/rccm.201909-1813OC,Evidence for Exacerbation-Prone Asthma and Pre...,<b>Rationale:</b> Cross-sectional studies sugg...,Adult|Asthma|Biomarkers|Comorbidity|Diabetes M...,"Peters, Michael C|Mauger, David|Ross, Kristie ...",American journal of respiratory and critical c...,"Journal Article|Research Support, N.I.H., Extr...",...,0.0,0.0,2.0,10.2,The publication abstract discusses asthma exac...,The publication abstract is not related to the...,17210,16783,1,1
4,32511591_phs002527.v1.p1,32511591,PMC7276018,doi:10.1101/2020.04.17.20069641,An 81 base-pair deletion in SARS-CoV-2 ORF7a i...,,,"Holland, LaRinda A|Kaelin, Emily A|Maqsood, Ra...",medRxiv : the preprint server for health sciences,Journal Article|Preprint,...,0.0,0.0,2.5,15.8,The publication abstract describes the identif...,The publication abstract is directly related t...,12016,11758,2,5


## 4. Selection and Saving of RADx-rad Relevant Publications
Publications are selected based on their mean Likert scores computed across LLM evaluations.
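
One detail worth checking when averaging across models is how a missing score is treated. The sketch below uses placeholder model names and made-up scores to compare dividing the row sum by the number of models with pandas' `mean`; which behavior is preferable is a design choice, not something this example decides.

```python
import numpy as np
import pandas as pd

models = ["model_a", "model_b"]  # placeholder model names

# The second row lacks a score from model_b
scores = pd.DataFrame({
    "model_a": [5, 4, 2],
    "model_b": [4, np.nan, 1],
})

# sum() skips NaN, so a missing score effectively counts as 0 here
sum_based = scores[models].sum(axis=1) / len(models)

# mean() also skips NaN, but averages over the scores that are present
mean_based = scores[models].mean(axis=1)

print(pd.DataFrame({"sum_based": sum_based, "mean_based": mean_based}))
```

The two approaches agree whenever every model has scored a publication and differ only for rows with a missing score.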

In [9]:
# Calculate the mean Likert score across models
# (sum() skips NaN, so a missing model score would count as 0)
grouped_df["likert_score"] = grouped_df[models].sum(axis=1)/len(models)

In [10]:
export_columns = ['identifier', 'pm_id', 'pmc_id', 'doi', 'title', 'abstract', 'keywords',
                  'authors', 'journal', 'article_type', 'year',
                  'project_serial_num', 'project_num',
                  'dbgap_accession', 'research_initiative', 'sub_project',
                  'likert_score', 'explanation_DeepSeek-R1-Distill-Qwen-32B',
                  'explanation_Llama-3.3-70B-Instruct']

#### 4.1. Relevant Publications (Mean Likert Score ≥ 3.5)
Publications deemed relevant to RADx-rad goals are selected and saved.

In [11]:
selected_df = grouped_df[grouped_df["likert_score"] >= THRESHOLD]
selected_df = selected_df[export_columns]

selected_path = os.path.join(PUBLICATIONS, f"radx_rad_related_publications_{LAST_UPDATE}.csv")
selected_df.to_csv(selected_path, index=False)
print(f"{selected_df.shape[0]} RADx-rad relevant publication - study pairs saved to: {selected_path}")

277 RADx-rad relevant publication - study pairs saved to: ../publications/radx_rad_related_publications_2025-06-17.csv


#### 4.2. Non-Relevant Publications (Mean Likert Score < 3.5)
Publications below the relevance threshold are stored separately for potential future review or transparency.
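
The two selections are meant to split the table into complementary parts. A small sketch with made-up scores shows the split and a sanity check that no row is lost; a row with a missing mean score would fall into neither part, which the check would catch.

```python
import pandas as pd

THRESHOLD = 3.5  # same threshold value as used in this notebook

# Toy mean scores (hypothetical values)
toy = pd.DataFrame({
    "identifier": ["a", "b", "c", "d"],
    "likert_score": [5.0, 3.5, 3.0, 1.0],
})

selected = toy[toy["likert_score"] >= THRESHOLD]  # relevant
other = toy[toy["likert_score"] < THRESHOLD]      # not relevant

# Every row must land in exactly one of the two parts
assert len(selected) + len(other) == len(toy)
print(selected["identifier"].tolist(), other["identifier"].tolist())
```

In this notebook the two reported counts, 277 and 244, add up to the 521 publication - study pairs, so the split is complete.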

In [12]:
other_df = grouped_df[grouped_df["likert_score"] < THRESHOLD]
other_df = other_df[export_columns]
other_path = os.path.join(PUBLICATIONS, f"radx_rad_other_publications_{LAST_UPDATE}.csv")
other_df.to_csv(other_path, index=False)
print(f"{other_df.shape[0]} RADx-rad non-relevant publication - study pairs saved to: {other_path}")

244 RADx-rad non-relevant publication - study pairs saved to: ../publications/radx_rad_other_publications_2025-06-17.csv
