# PubMedQA: A Dataset for Biomedical Research Question Answering

Group: TLDR

* Federica Maria Laudizi
* Francesca Visalli
* Margherita Marino
* Tomaz Maia Suller

This notebook provides functions for performing inference on Large Language Models (LLMs) and saving resulting predictions on the PubMedQA labelled dataset for later analysis.

This notebook is kept separate from the others mainly because of environment issues with the vLLM library, which some members of the group were unable to install. Keeping it separate also gave us more flexibility to run experiments independently.

## Environment setup

We use [vLLM](https://vllm.ai) to perform inference with the models we load from HuggingFace, as it provided the best inference performance when generating text in batches. This is largely due to its optimisation of the model before execution, its prefix cache (which caches activations for the system prompt), and its ability to seamlessly use more than one GPU. Inference was performed on Kaggle with 2 T4 GPUs.

We also explored the Optimum runtime, including with quantisation, but it performed significantly worse overall and gave us less control over caching and system resource usage.

In [None]:
!pip install vllm

In [None]:
import pickle
from functools import partial

import torch
from datasets import load_dataset
from vllm import LLM, SamplingParams
from torch.utils.data import DataLoader
from tqdm.auto import tqdm

## Prompts

All prompts start by giving the model a role:

```
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
```

Zero-shot, non-reasoning prompts add only instructions on how to answer, and end with `ANSWER:` to nudge the model into answering immediately instead of rambling until its token limit is exhausted.

```
Write ANSWER: and then your answer.
Be extremely concise in your answer: questions must be answered only with "yes", "no" or "maybe". Avoid answering "maybe" whenever possible, but use it when you are not sure of the answer.
```

Meanwhile, prompts meant to elicit reasoning add a request to "think step-by-step" to the instructions and, except for the instruct variant, end with an opening `<think>` tag.

```
Take your time to think step-by-step. After you are done thinking, write ANSWER: [...]
```

Some prompts provide options for the model to choose from to make decoding answers easier.
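As an illustration of why numbered options make decoding easier, the sketch below maps a generated text to one of the three labels. It is not part of our analysis code: `decode_answer`, `OPTION_TO_LABEL` and both regular expressions are hypothetical names introduced here, and the sketch assumes the model follows the `ANSWER:` convention requested in the prompts.

```python
import re
from typing import Optional

# Hypothetical helper, not part of the original notebook.
OPTION_TO_LABEL = {"1": "yes", "2": "no", "3": "maybe"}

# An explicit "ANSWER:" marker followed by a label or an option number.
ANSWER_PATTERN = re.compile(r"ANSWER:\s*(yes|no|maybe|[123])\b", re.IGNORECASE)
# Zero-shot prompts already end with "ANSWER:", so the generated text
# may start directly with the label or option number.
LEADING_PATTERN = re.compile(r"\s*(yes|no|maybe|[123])\b", re.IGNORECASE)


def decode_answer(generated_text: str) -> Optional[str]:
    """Return "yes", "no" or "maybe", or None if no answer can be found."""
    matches = ANSWER_PATTERN.findall(generated_text)
    if matches:
        raw = matches[-1]  # keep the last marker if the model restates it
    else:
        leading = LEADING_PATTERN.match(generated_text)
        if leading is None:
            return None
        raw = leading.group(1)
    raw = raw.lower()
    return OPTION_TO_LABEL.get(raw, raw)
```

Texts for which the helper returns `None` would need to be inspected manually or counted as unanswered.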

Finally, RAG prompts fill the `context` field with the text and answer of the question in the artificial dataset that is most similar to the one we provide for inference. Similarity was measured as the dot product between embeddings computed by BioSentVec.
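The retrieval step can be sketched as follows. This is a minimal illustration rather than the code we used to build the dataset: `most_similar_index` is a hypothetical helper, and the toy 3-dimensional vectors merely stand in for BioSentVec embeddings.

```python
import numpy as np


def most_similar_index(query: np.ndarray, candidates: np.ndarray) -> int:
    """Index of the candidate row with the largest dot product with the query."""
    scores = candidates @ query  # one score per candidate, shape (n_candidates,)
    return int(np.argmax(scores))


# Toy vectors standing in for sentence embeddings (assumption for illustration).
candidate_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
])
query_embedding = np.array([0.5, 0.9, 0.0])

best = most_similar_index(query_embedding, candidate_embeddings)
```

Note that, unlike cosine similarity, the dot product is not normalised, so embeddings with larger norms tend to receive higher scores.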

In [None]:
COT_PROMPT_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Take your time to think step-by-step. After you are done thinking, write ANSWER: and then your answer.
Be extremely concise in your answers: questions must be answered only with "yes", "no" or "maybe". Avoid answering "maybe" whenever possible, but use it when you are not sure of the answer.

{abstract}

QUESTION
{question}

<think>
""".strip()

COT_RAG_PROMPT_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Take your time to think step-by-step. After you are done thinking, write ANSWER: and then your answer.
Be extremely concise in your answers: questions must be answered only with "yes", "no" or "maybe". Avoid answering "maybe" whenever possible, but use it when you are not sure of the answer.

{context}

{abstract}

QUESTION
{question}

<think>
""".strip()

COT_RAG_INSTRUCT_PROMPT_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Take your time to think step-by-step and reason out loud. After you are done thinking, write ANSWER: and then your answer.
Be extremely concise in your answers: questions must be answered only with "yes", "no" or "maybe". Avoid answering "maybe" whenever possible, but use it when you are not sure of the answer.

The following question and its answer may help you in answering your question:

{context}

Now, answer the following question given the abstract:

{abstract}

QUESTION
{question}
""".strip()

ZERO_SHOT_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Write ANSWER: and then your answer.
Be extremely concise in your answer: questions must be answered only with "yes", "no" or "maybe". Avoid answering "maybe" whenever possible, but use it when you are not sure of the answer.

{abstract}

QUESTION
{question}

ANSWER:
""".strip()

ZERO_SHOT_MULTIPLE_CHOICE_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Write ANSWER: and then your answer.
Be extremely concise in your answer: questions must be answered only with the number of their option. Avoid option 3) ("maybe") whenever possible, but use it when you are not sure of the answer.

{abstract}

QUESTION
{question}

OPTIONS:
1) Yes
2) No
3) Maybe

ANSWER:
""".strip()

COT_MULTIPLE_CHOICE_PROMPT_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Take your time to think step-by-step. Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Write ANSWER: and then your answer.
Be extremely concise in your answer: questions must be answered only with the number of their option. Try to avoid option 3) ("maybe"), but use it when you are not sure of the answer. 

{abstract}

QUESTION
{question}

OPTIONS:
1) Yes
2) No
3) Maybe

<think>
""".strip()

COT_RAG_MULTIPLE_CHOICE_PROMPT_TEMPLATE = """
You are an expert biomedical researcher who specializes in answering questions about research papers given their abstracts.
Take your time to think step-by-step. Use the provided abstract to answer the question in the end, and complement it with your own domain knowledge and expertise.
Write ANSWER: and then your answer.
Be extremely concise in your answer: questions must be answered only with the number of their option. Try to avoid option 3) ("maybe"), but use it when you are not sure of the answer. 

{context}

{abstract}

QUESTION
{question}

OPTIONS:
1) Yes
2) No
3) Maybe

<think>
""".strip()

## Set experiment parameters

Here we select the prompt template, the generative model to load from HuggingFace, and the experiment name to which we save inference results.

In [None]:
PROMPT_TEMPLATE = COT_RAG_INSTRUCT_PROMPT_TEMPLATE

In [None]:
MODEL = "Qwen/Qwen2.5-1.5B-Instruct"
EXPERIMENT = "conclusion-instruct-rag-cot-multiple-choice-numbers"

## Inference

In [None]:
ds = load_dataset("parquet", data_files={"train": "/kaggle/input/nlp-pubmedqa-rag/labeled.parquet"})

In [None]:
def format_question(sample: dict, rag: bool = False) -> dict:
    """Format a PubMedQA sample into a prompt using PROMPT_TEMPLATE.

    Set rag=True for templates that contain a {context} placeholder.
    """
    contexts = list(sample["context.contexts"])
    labels = list(sample["context.labels"])
    # Append the long answer to the abstract as its CONCLUSION section.
    labels.append("conclusion")
    contexts.append(sample["long_answer"])

    # Render each section as an upper-case label followed by its text.
    abstract = "\n\n".join(
        f"{label.upper()}\n{context}"
        for label, context in zip(labels, contexts)
    )
    fields = {"abstract": abstract, "question": sample["question"]}
    if rag:
        # Text and answer of the most similar question, used as RAG context.
        fields["context"] = sample["closest_abstract"] + "\n\n"
    return {"text": PROMPT_TEMPLATE.format(**fields)}
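To make the resulting layout concrete, the sketch below reproduces only the abstract-assembly step on a made-up sample. `build_abstract` is a hypothetical stand-alone helper and the section texts are invented placeholders, not entries from PubMedQA.

```python
from typing import List


def build_abstract(labels: List[str], contexts: List[str], long_answer: str) -> str:
    """Join labelled sections, adding the long answer as a CONCLUSION section."""
    all_labels = labels + ["conclusion"]
    all_contexts = contexts + [long_answer]
    return "\n\n".join(
        f"{label.upper()}\n{context}"
        for label, context in zip(all_labels, all_contexts)
    )


# Invented placeholder sample (assumption for illustration only).
toy_abstract = build_abstract(
    labels=["background", "results"],
    contexts=["Drug X is widely used.", "Symptoms improved in most patients."],
    long_answer="Drug X appears effective.",
)
print(toy_abstract)
```

Each section appears as an upper-case label on its own line followed by its text, with a blank line between sections and the conclusion last.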

In [None]:
text_ds = ds.map(partial(format_question, rag=True)).with_format(columns=["text"], type="torch")
text_ds

In [None]:
print(text_ds["train"][0])

In [None]:
dataloader = DataLoader(
    text_ds["train"],
    batch_size=128,
)
dataloader

In [None]:
sampling_params = SamplingParams(max_tokens=2000)

In [None]:
llm = LLM(
    model=MODEL,
    dtype="float16",                      # Half precision for T4 Tensor Cores
    # For interactive runs that can use 2xT4
    tensor_parallel_size=2,               # Shard model across both GPUs
    gpu_memory_utilization=0.8,           # Leave 20% buffer to avoid OOM
    enforce_eager=False,                  # Use CUDA graphs by default for speed
    enable_prefix_caching=True,           # Reuse cached computation for shared prompt prefixes
)
llm

In [None]:
all_outputs = []

with torch.no_grad():
    for batch in tqdm(dataloader):
        generated_text = llm.generate(
            batch["text"],
            sampling_params=sampling_params,
            use_tqdm=True,
        )
        all_outputs.extend(generated_text)

In [None]:
print(all_outputs[0].prompt)
print("\n", "-"*80, "\n")
print(all_outputs[0].outputs[0].text)

In [None]:
all_text = [
    output.outputs[0].text
    for output in all_outputs
]

In [None]:
with open(f"{EXPERIMENT}.pkl", "wb") as f:
    pickle.dump(all_text, f)
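For later analysis, the saved predictions can be read back with `pickle.load`. The sketch below shows a round trip on placeholder strings in a temporary directory; `load_predictions` and the file name are hypothetical and only mirror the way results are saved above.

```python
import pickle
import tempfile
from pathlib import Path
from typing import List


def load_predictions(path: Path) -> List[str]:
    """Read back a list of generated texts saved with pickle.dump."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round trip with placeholder strings standing in for model outputs.
placeholder_texts = ["ANSWER: yes", "ANSWER: no"]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "toy-experiment.pkl"
    with open(path, "wb") as f:
        pickle.dump(placeholder_texts, f)
    reloaded = load_predictions(path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.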