<a href="https://colab.research.google.com/github/paulokuriki/prompt_engineering/blob/main/prompt_engineering.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📑 **Radiology Report Classification with Large Language Models**

## **Learning Objectives**

By the end of this session, you will be able to:


✅ Understand how Large Language Models (LLMs) assist in radiology report classification.  
✅ Apply effective prompting techniques for medical report analysis.  
✅ Develop specialized classifiers for different radiological findings.  

---

## **Introduction**
Radiologists frequently need to classify reports based on specific findings or conditions. **Large Language Models (LLMs)** can streamline this process, improving efficiency and consistency in medical report classification. This tutorial demonstrates how to leverage LLMs for various classification tasks in radiology.

---

## **Dataset: Indiana Chest X-ray Collection**
This notebook processes the **Indiana Chest X-ray Collection**, a publicly available dataset provided by the **National Library of Medicine (NLM), National Institutes of Health (NIH),** in collaboration with **Indiana University**.

### 🎯 **Acknowledgment**
- **Dataset Source**: [Open-i (NLM)](https://openi.nlm.nih.gov/)  
- **Reference Paper**:  
  > **Demner-Fushman D, Kohli MD, Rosenman MB, Shooshan SE, Rodriguez L, Antani S, Thoma GR, McDonald CJ.**  
  > *Preparing a collection of radiology examinations for distribution and retrieval.*  
  > J Am Med Inform Assoc. 2016 Mar;23(2):304-10.  
  > DOI: [10.1093/jamia/ocv080](https://doi.org/10.1093/jamia/ocv080)  
  > PMID: [26133894](https://pubmed.ncbi.nlm.nih.gov/26133894/) | PMCID: [PMC5009925](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5009925/)  

## **Setup and Data Loading**
To begin, we'll install the required libraries and load the dataset.

In [1]:
%pip install langchain_openai



In [2]:
import os
import getpass
import random
import json
import requests
import tarfile
import glob
import csv
import shutil
import xml.etree.ElementTree as ET

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
from tqdm import tqdm
from langchain_openai import ChatOpenAI
from IPython.display import HTML, display

## **Dataset Processing**
### **Steps**
1. **Download**: Retrieve the dataset from the official NLM repository.  
2. **Extract**: Unpack the `.tgz` archive.  
3. **Parse**: Extract the **Findings** section from each radiology report.  
4. **Label**: Assign **normal** when a report's MeSH (Medical Subject Headings) major term is exactly `normal`, and **abnormal** otherwise.  
5. **Export**: Save the processed data as a **CSV file**.  

In [3]:
# Define URLs and paths
TGZ_URL = "https://openi.nlm.nih.gov/imgs/collections/NLMCXR_reports.tgz"
TGZ_FILE = "NLMCXR_reports.tgz"
EXTRACT_DIR = "NLMCXR_reports_extracted"
XML_FOLDER = os.path.join(EXTRACT_DIR, "ecgen-radiology")
OUTPUT_CSV = "converted_reports.csv"

# Clean up old files and folders
def clean_up():
    if os.path.exists(TGZ_FILE):
        os.remove(TGZ_FILE)
    if os.path.exists(EXTRACT_DIR):
        shutil.rmtree(EXTRACT_DIR)
    if os.path.exists(OUTPUT_CSV):
        os.remove(OUTPUT_CSV)

# Download the TGZ file
def download_file():
    print("Downloading dataset...")
    response = requests.get(TGZ_URL, stream=True)
    response.raise_for_status()
    with open(TGZ_FILE, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Dataset downloaded successfully.")

# Extract the TGZ file
def extract_tgz():
    print("Extracting files...")
    with tarfile.open(TGZ_FILE, "r:gz") as tar:
        tar.extractall(EXTRACT_DIR)
    print("Files extracted successfully.")

# Parse a single XML file
def parse_xml_file(xml_path):
    try:
        tree = ET.parse(xml_path)
        root = tree.getroot()

        # Extract AbstractText elements
        abstract_texts = root.findall(".//AbstractText")

        # Extract only the "Findings" section
        findings_text = ""
        for abstract in abstract_texts:
            label = abstract.attrib.get("Label", "").lower()
            if label == "findings":
                findings_text = abstract.text.strip() if abstract.text else ""
                break  # Stop after finding the "Findings" section

        # Extract MeSH terms
        mesh_major_list = [m.text.strip() for m in root.findall(".//MeSH/major") if m.text]
        mesh_major = "|".join(mesh_major_list) if mesh_major_list else ""

        return findings_text, mesh_major
    except Exception:
        # Skip files that are malformed or unreadable; empty rows are dropped later.
        return "", ""

# Main function
def download_prepare_dataset():
    clean_up()
    download_file()
    extract_tgz()

    xml_files = glob.glob(os.path.join(XML_FOLDER, "*.xml"))
    reports = []
    for xml_file in tqdm(xml_files, desc="Processing XML files"):
        reports.append(parse_xml_file(xml_file))

    with open(OUTPUT_CSV, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["Report", "MeSH Major"])
        for report, mesh_major in tqdm(reports):
            writer.writerow([report, mesh_major])

    df = pd.read_csv(OUTPUT_CSV)

    df["label"] = df["MeSH Major"].apply(lambda x: "normal" if x == "normal" else "abnormal")
    df.rename(columns={"Report": "report"}, inplace=True)
    df = df.map(lambda x: x.strip() if isinstance(x, str) else x)
    df = df.replace('', pd.NA).dropna().reset_index(drop=True)
    df.to_csv(OUTPUT_CSV, index=False)
    print("Extraction completed.")
    return df

df = download_prepare_dataset()

# Show data distribution
print(f'Dataset downloaded successfully. Total sample reports: {len(df)}\n')
print(df.label.value_counts())

Downloading dataset...
Dataset downloaded successfully.
Extracting files...
Files extracted successfully.


Processing XML files: 100%|██████████| 3955/3955 [00:01<00:00, 1987.46it/s]
100%|██████████| 3955/3955 [00:00<00:00, 26454.45it/s]


Extraction completed.
Dataset downloaded successfully. Total sample reports: 3425

label
abnormal    2219
normal      1206
Name: count, dtype: int64
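
Before judging any classifier on this dataset, it helps to know what a trivial baseline would score. The sketch below hard-codes the label counts printed above (it does not read `df`) and computes the class proportions and the accuracy of always predicting the majority class. The helper name `majority_baseline` is illustrative and not part of the notebook.

```python
# Label counts copied from the output above (not recomputed from df).
label_counts = {"abnormal": 2219, "normal": 1206}


def majority_baseline(counts):
    """Return (label, accuracy) for a classifier that always predicts the most common label."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total


total = sum(label_counts.values())
for label, n in label_counts.items():
    print(f"{label:<8} {n:>5}  ({n / total:.1%})")

baseline_label, baseline_acc = majority_baseline(label_counts)
print(f"Always predicting '{baseline_label}' would score {baseline_acc:.1%}")
```

Because roughly two thirds of the reports are labeled abnormal, accuracy figures from the sections that follow are easier to interpret against this baseline than against 50%.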


## **Classification with Large Language Models (LLMs)**
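We define three helper functions: `classify_report` sends one report to the LLM and parses the JSON reply, `classify_multiple_reports` classifies a random sample of reports, and `display_results` prints each prediction and, when the original labels are shown, the accuracy.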

In [4]:
def classify_report(report, template, looking_for=None, examples=None):
    """
    Classify a radiology report using LLM
    """
    llm = ChatOpenAI(
        model="gpt-4o-mini-2024-07-18",
        temperature=0,
        seed=42,
        model_kwargs={"response_format": {"type": "json_object"}}
    )

    # Format the template with the report and any findings
    if looking_for and examples:
        prompt = template.format(report=report, looking_for=looking_for, examples=examples)
    else:
        prompt = template.format(report=report)

    try:
        response = llm.invoke(prompt)
        result = json.loads(response.content.lower())
        return result['classification']
    except Exception as e:
        print(f"Error during classification: {e}")
        return 'error'

def classify_multiple_reports(df, n_reports=5, template=None, looking_for=None, examples=None, seed=44):
    """Classify multiple random reports"""
    results = []
    random.seed(seed)  # Set seed for reproducibility
    report_indices = random.sample(range(len(df)), n_reports)

    for idx in tqdm(report_indices, desc="Classifying reports"):
        report = df.iloc[idx]['report']
        label = df.iloc[idx]['label']
        prediction = classify_report(report, template, looking_for, examples)

        results.append({
            'index': idx,
            'report': report,
            'original_label': label,
            'predicted_label': prediction
        })

    return results

def display_results(results, show_original_label=True):
    """Display classification results in a readable format"""

    for r in results:
        print(f"\nReport #{r['index']}:")
        print("=" * 50)
        print("Report text:")
        display(r['report'])
        print("-" * 50)

        if show_original_label:
            # Compare prediction with the original label
            is_correct = r['original_label'].lower() == r['predicted_label'].lower()
            classification = 'Correct' if is_correct else 'Wrong'
            classification_icon = '✅' if is_correct else '👎'
            print(f"Original Label : {r['original_label']}")
            print(f"Predicted Label: {r['predicted_label']}")
            print(f"Classification : {classification} {classification_icon}")

        else:
            # Labels are hidden, so only report whether the finding was predicted as present
            is_present = r['predicted_label'].lower() == "present"
            classification_icon = '✅' if is_present else '👎'
            print(f"Finding: {r['predicted_label']} {classification_icon}")

        print()


    total = len(results)
    print("SUMMARY:")
    print("=" * 50)
    print(f"\nAnalyzed {total} reports")

    # Show accuracy if original labels are available
    if show_original_label:
        correct = sum(1 for r in results if r['original_label'].lower() == r['predicted_label'].lower())

        print(f"Correct predictions: {correct}")
        print(f"Accuracy: {(correct/total)*100:.1f}%\n")
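
`classify_report` returns whatever value the model puts under the `classification` key, so an unexpected value (for example `"uncertain"`) would simply be counted as a wrong answer. The following is a minimal sketch of a stricter parser, assuming you want anything outside an allowed set to be reported as `'error'`. The helper `parse_classification` is hypothetical, is not part of the notebook, and does not call the API.

```python
import json


def parse_classification(content, allowed=("normal", "abnormal")):
    """Return the 'classification' value from a JSON reply, or 'error' if the reply is unusable."""
    try:
        data = json.loads(content.lower())
    except (json.JSONDecodeError, AttributeError):
        return "error"  # not valid JSON, or content was not a string
    if not isinstance(data, dict):
        return "error"
    value = data.get("classification")
    if not isinstance(value, str):
        return "error"
    value = value.strip()
    return value if value in allowed else "error"


print(parse_classification('{"classification": "Abnormal"}'))
print(parse_classification('{"classification": "maybe"}'))
print(parse_classification("not json"))
```

For the condition-specific prompts later in the notebook, the same helper could be called with `allowed=("present", "absent")`.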


# OpenAI API Key Setup

## What is an API Key?
An API key functions like a password that grants access to OpenAI's services. It allows you to:
- Use OpenAI's models (e.g., GPT-4).
- Submit reports for analysis.
- Monitor usage and manage access securely.

## How to Obtain an OpenAI API Key:
1. Visit [OpenAI's platform](https://platform.openai.com/signup) and sign up or log in.
2. Navigate to the [API Keys page](https://platform.openai.com/api-keys).
3. Click **"Create new secret key"**.
4. Copy and securely store your API key (you won't be able to view it again!).

### Important Notes:
- **Keep your API key private** – do not share or store it in public repositories.
- **OpenAI services require payment** – ensure you have valid billing information.

In [5]:
if 'OPENAI_API_KEY' not in os.environ:
    print("Please enter your OpenAI API key (it won't be displayed as you type):")
    api_key = getpass.getpass()
    os.environ['OPENAI_API_KEY'] = api_key
    print("OpenAI API key set successfully.")
else:
    print("OpenAI API key already set.")

Please enter your OpenAI API key (it won't be displayed as you type):
··········
OpenAI API key set successfully.


# **Let's Try It Out!** 🏥

## 1. Zero-Shot Classification
In **zero-shot classification**, the model determines whether a given chest X-ray report describes a normal or abnormal case without prior examples.


### How It Works:
**Prompt Template:**  
- The model receives a structured instruction to classify reports.
- The response format is **JSON**, containing the key **"classification"** with possible values: `"normal"` or `"abnormal"`.
- The `{report}` placeholder is replaced with an actual chest X-ray report before passing it to the model.
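
As a small illustration of the placeholder substitution described above, the sketch below fills a shortened template with Python's `str.format`. The template and the sample report are hypothetical and exist only for this example. One detail worth knowing: if a template contains literal JSON braces, they must be doubled (`{{` and `}}`), because `str.format` treats single braces as placeholders.

```python
# A shortened, hypothetical template used only to illustrate substitution.
demo_template = """### INSTRUCTION
Reply in JSON, for example: {{"classification": "normal"}}

### REPORT TO CLASSIFY
{report}
"""

sample_report = "The lungs are clear. Heart size is normal."

# Doubled braces become literal braces; {report} is replaced by the report text.
prompt = demo_template.format(report=sample_report)
print(prompt)
```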


In [6]:
zero_shot_prompt_template = """
### INSTRUCTION
You are a specialist in chest X-ray reports. Your task is to classify whether a report is normal or abnormal.
Your response should be in JSON format with the key "classification" and the possible values: "normal" or "abnormal".

### REPORT TO CLASSIFY
{report}
"""


results = classify_multiple_reports(df, n_reports=1, template=zero_shot_prompt_template, seed=73)
display_results(results, show_original_label=True)


Classifying reports: 100%|██████████| 1/1 [00:00<00:00,  1.49it/s]


Report #1146:
Report text:





'The cardiomediastinal contours are within normal limits. Pulmonary vasculature is unremarkable. There is no focal airspace opacity. No pleural effusion or pneumothorax is seen. There are mild degenerative changes along the thoracic spine. No acute bony abnormality is identified.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: normal
Classification : Wrong 👎

SUMMARY:

Analyzed 1 reports
Correct predictions: 0
Accuracy: 0.0%



### Testing Classification on Multiple Reports to Measure Performance

In [7]:
results = classify_multiple_reports(df, n_reports=10, template=zero_shot_prompt_template, seed=44)

display_results(results, show_original_label=True)

Classifying reports: 100%|██████████| 10/10 [00:08<00:00,  1.15it/s]


Report #1673:
Report text:





'The cardiomediastinal silhouette and vasculature are within normal limits for size and contour. Note is XXXX of an XXXX closure device which appears grossly appropriate The lungs are normally inflated and clear. Osseous structures are within normal limits for patient age.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: normal
Classification : Wrong 👎


Report #2130:
Report text:


'Chest. No focal consolidation. No visualized pneumothorax. No large pleural effusions. The heart size is normal. No focal thoracic bony findings. Hand. There is also cortical and trabecular irregularity through the XXXX of the scaphoid. There is a small cortical lucency through the base of the fourth metacarpal that may be a vascular XXXX.'

--------------------------------------------------
Original Label : normal
Predicted Label: abnormal
Classification : Wrong 👎


Report #2219:
Report text:


'No focal consolidation. No visualized pneumothorax. The heart size and cardia mediastinal silhouette is grossly unremarkable. No large effusions.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: normal
Classification : Wrong 👎


Report #2873:
Report text:


'The cardiac contours are normal. Calcified tortuous thoracic aorta. Emphysema. Mild apical scarring. The lungs are otherwise clear. Thoracic spondylosis.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #477:
Report text:


'Cardiomediastinal silhouette is within normal limits. No acute bony abnormality is identified. There is slightly increased XXXX opacity of the right base compared to the left which may minimal right basilar airspace disease, XXXX in the right middle lobe. The left lung is clear. No pneumothorax or effusion identified.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #723:
Report text:


'The lungs are clear. There is no pleural effusion or pneumothorax. The heart and mediastinum are normal. The skeletal structures are normal.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅


Report #1554:
Report text:


'There are T-spine osteophytes. Small nodule projecting near the left heart XXXX is unchanged from XXXX and appears calcified. This XXXX represents a calcified granuloma. The cardiomediastinal silhouette and pulmonary vasculature are within normal limits. There is no pneumothorax or pleural effusion. There are no focal areas of consolidation.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #922:
Report text:


'Frontal and lateral views of the chest show an unchanged cardiomediastinal silhouette. Normal mediastinal contour, pulmonary XXXX and vasculature, central airways and aeration of the lungs. The inferior posterior sulcus is excluded.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: normal
Classification : Wrong 👎


Report #1186:
Report text:


'Lungs are clear bilaterally. Cardiac and mediastinal silhouettes are normal. Pulmonary vasculature is normal. No pneumothorax or pleural effusion. No acute bony abnormality.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅


Report #119:
Report text:


'Lungs are clear bilaterally. Cardiac and mediastinal silhouettes are normal. Pulmonary vasculature is normal. No pneumothorax or pleural effusion. No acute bony abnormality.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅

SUMMARY:

Analyzed 10 reports
Correct predictions: 6
Accuracy: 60.0%
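
Accuracy alone hides which kind of mistake the model makes. The sketch below copies the ten (original label, predicted label) pairs from the output above and counts true and false positives and negatives, treating `abnormal` as the positive class. The helper `confusion_counts` is illustrative and not part of the notebook; with a live `results` list you could build the pairs from `r['original_label']` and `r['predicted_label']` instead.

```python
# (original_label, predicted_label) pairs copied from the ten results above.
pairs = [
    ("abnormal", "normal"),    # 1673
    ("normal", "abnormal"),    # 2130
    ("abnormal", "normal"),    # 2219
    ("abnormal", "abnormal"),  # 2873
    ("abnormal", "abnormal"),  # 477
    ("normal", "normal"),      # 723
    ("abnormal", "abnormal"),  # 1554
    ("abnormal", "normal"),    # 922
    ("normal", "normal"),      # 1186
    ("normal", "normal"),      # 119
]


def confusion_counts(pairs, positive="abnormal"):
    """Count TP/FP/FN/TN, treating any prediction other than `positive` as negative."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in pairs:
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


counts = confusion_counts(pairs)
sensitivity = counts["tp"] / (counts["tp"] + counts["fn"])
specificity = counts["tn"] / (counts["tn"] + counts["fp"])
print(counts)
print(f"Sensitivity: {sensitivity:.0%}, Specificity: {specificity:.0%}")
```

In this sample, three of the six reports labeled abnormal were predicted as normal, so most of the zero-shot errors are missed abnormalities rather than false alarms.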



## 2. Few-Shot Classification

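Zero-shot classification is not always accurate: in the run above it got 6 of 10 reports right. To improve on this, we use **few-shot prompting**, where the prompt includes examples that guide the model. In this notebook the examples are a short list of findings that should count as abnormal, rather than full labeled reports.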

### How It Works:
**Prompt Template:**  
- The `{examples}` placeholder contains relevant instances of abnormalities.
- The `{report}` placeholder is replaced with an actual chest X-ray report.
- The model is instructed to classify a report as **abnormal** if it contains specific **findings**.

In [8]:
few_shot_template = """
### INSTRUCTION
You are a specialist in chest X-ray reports.
Your task is to classify a report as abnormal if it describes signs of {looking_for}, such as:
{examples}
Consider the finding positive even if it is mild.
If multiple X-rays are reported together, focus only on the chest X-ray report.
Your response should be in JSON format with the key 'classification' and the possible values: 'normal' or 'abnormal'.

### REPORT TO CLASSIFY
{report}
"""

looking_for = 'any abnormalities'

examples = """
- Low or high lung volume.
- Abnormalities in the lungs, bones, heart, mediastinum, or pleura.
- Abnormal calcifications, granulomas, or calcified lymph nodes.
- Post-surgical changes overlying the axilla, neck, or abdominal regions.
- Presence of surgical or closure devices.
- Part of the lung was not evaluated.
"""


print(f"\n{looking_for.capitalize()} Classifier")
print(f"Examples: {examples}")


results = classify_multiple_reports(df, n_reports=10,
                                    template=few_shot_template,
                                    looking_for=looking_for,
                                    examples=examples,
                                    seed=44)

display_results(results, show_original_label=True)


Any abnormalities Classifier
Examples: 
- Low or high lung volume.
- Abnormalities in the lungs, bones, heart, mediastinum, or pleura.
- Abnormal calcifications, granulomas, or calcified lymph nodes.
- Post-surgical changes overlying the axilla, neck, or abdominal regions.
- Presence of surgical or closure devices.
- Part of the lung was not evaluated.



Classifying reports: 100%|██████████| 10/10 [00:10<00:00,  1.02s/it]


Report #1673:
Report text:





'The cardiomediastinal silhouette and vasculature are within normal limits for size and contour. Note is XXXX of an XXXX closure device which appears grossly appropriate The lungs are normally inflated and clear. Osseous structures are within normal limits for patient age.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: normal
Classification : Wrong 👎


Report #2130:
Report text:


'Chest. No focal consolidation. No visualized pneumothorax. No large pleural effusions. The heart size is normal. No focal thoracic bony findings. Hand. There is also cortical and trabecular irregularity through the XXXX of the scaphoid. There is a small cortical lucency through the base of the fourth metacarpal that may be a vascular XXXX.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅


Report #2219:
Report text:


'No focal consolidation. No visualized pneumothorax. The heart size and cardia mediastinal silhouette is grossly unremarkable. No large effusions.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: normal
Classification : Wrong 👎


Report #2873:
Report text:


'The cardiac contours are normal. Calcified tortuous thoracic aorta. Emphysema. Mild apical scarring. The lungs are otherwise clear. Thoracic spondylosis.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #477:
Report text:


'Cardiomediastinal silhouette is within normal limits. No acute bony abnormality is identified. There is slightly increased XXXX opacity of the right base compared to the left which may minimal right basilar airspace disease, XXXX in the right middle lobe. The left lung is clear. No pneumothorax or effusion identified.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #723:
Report text:


'The lungs are clear. There is no pleural effusion or pneumothorax. The heart and mediastinum are normal. The skeletal structures are normal.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅


Report #1554:
Report text:


'There are T-spine osteophytes. Small nodule projecting near the left heart XXXX is unchanged from XXXX and appears calcified. This XXXX represents a calcified granuloma. The cardiomediastinal silhouette and pulmonary vasculature are within normal limits. There is no pneumothorax or pleural effusion. There are no focal areas of consolidation.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #922:
Report text:


'Frontal and lateral views of the chest show an unchanged cardiomediastinal silhouette. Normal mediastinal contour, pulmonary XXXX and vasculature, central airways and aeration of the lungs. The inferior posterior sulcus is excluded.'

--------------------------------------------------
Original Label : abnormal
Predicted Label: abnormal
Classification : Correct ✅


Report #1186:
Report text:


'Lungs are clear bilaterally. Cardiac and mediastinal silhouettes are normal. Pulmonary vasculature is normal. No pneumothorax or pleural effusion. No acute bony abnormality.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅


Report #119:
Report text:


'Lungs are clear bilaterally. Cardiac and mediastinal silhouettes are normal. Pulmonary vasculature is normal. No pneumothorax or pleural effusion. No acute bony abnormality.'

--------------------------------------------------
Original Label : normal
Predicted Label: normal
Classification : Correct ✅

SUMMARY:

Analyzed 10 reports
Correct predictions: 8
Accuracy: 80.0%
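
Ten reports is a very small sample, so the jump from 6 to 8 correct should be read with caution. As a rough illustration, the sketch below computes an approximate 95% Wilson score interval for each accuracy. The function `wilson_interval` is an illustrative helper, not part of the notebook, and the counts are copied from the two summaries above.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Correct counts copied from the zero-shot and few-shot summaries above.
for name, correct in [("zero-shot", 6), ("few-shot", 8)]:
    low, high = wilson_interval(correct, 10)
    print(f"{name}: {correct}/10 correct, 95% interval roughly {low:.2f} to {high:.2f}")
```

The two intervals overlap substantially, so a larger `n_reports` would be needed before concluding that the few-shot prompt is reliably better.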



## 3. Condition-Specific Classification

In some cases, you may need to classify reports based on specific medical conditions rather than general abnormalities.

### How It Works:
**Prompt Template:**  
- The `{looking_for}` placeholder specifies the condition to be detected (e.g., cardiomegaly, COPD).
- The `{examples}` placeholder contains key indicators related to the condition.
- The model classifies the report as **"present"** or **"absent"** based on the findings.

### Example: Cardiomegaly Detection
- **Signs of Cardiomegaly:**
  - Increased heart size
  - Enlarged cardiac silhouette
  - Increased cardiomediastinal silhouette

In [9]:
finding_specific_template = """
### INSTRUCTION
You are a specialist in chest X-ray reports.
Your task is to classify a report as positive if it describes signs of {looking_for}, such as:
{examples}
Consider the finding positive even if it is mild.
Your response should be in JSON format with the key 'classification' and the possible values: 'present' or 'absent'.

### REPORT TO CLASSIFY
{report}
"""

looking_for = 'cardiomegaly'

examples = """
- Increased heart size.
- Enlarged cardiac silhouette.
- Increased cardiomediastinal silhouette.
"""

print(f"\n{looking_for.capitalize()} Detection")
print(f"Examples: {examples}")


results = classify_multiple_reports(df,
                                    n_reports=10,
                                    template=finding_specific_template,
                                    looking_for=looking_for,
                                    examples=examples,
                                    seed=42)

display_results(results, show_original_label=False)


Cardiomegaly Detection
Examples: 
- Increased heart size.
- Enlarged cardiac silhouette.
- Increased cardiomediastinal silhouette.



Classifying reports: 100%|██████████| 10/10 [00:06<00:00,  1.43it/s]


Report #2619:
Report text:





'Heart size is stable. There is focal airspace consolidation in the lateral aspect of the right upper lobe. There is no pneumothorax or effusion. No acute bony abnormalities.'

--------------------------------------------------
Finding: absent 👎


Report #456:
Report text:


'The cardiac contours are normal. The lungs are clear. Thoracic spondylosis. Prior cholecystectomy'

--------------------------------------------------
Finding: absent 👎


Report #102:
Report text:


'There is borderline cardiomegaly. Mediastinum and pulmonary vasculature are unremarkable. Lungs are clear. No pleural fluid or pneumothorax is appreciated.'

--------------------------------------------------
Finding: present ✅


Report #3037:
Report text:


'Heart size is normal lungs are clear. No edema or effusions.'

--------------------------------------------------
Finding: absent 👎


Report #1126:
Report text:


'Cardiomegaly. No pneumothorax or pleural effusion. Clear lung XXXX bilaterally.'

--------------------------------------------------
Finding: present ✅


Report #1003:
Report text:


'Left lung clear. Questioned some vague right lower lobe opacity on the frontal film. XXXX subsegmental atelectasis or scar seen on lateral view. Cardiac silhouette borderline in size within LV contour. XXXX sulci. XXXX unremarkable.'

--------------------------------------------------
Finding: present ✅


Report #914:
Report text:


'Lucency crosses the 10th left posterior rib. Visualized portions of the thoracic spine are unremarkable. Mediastinal contours are normal. Lungs are clear. There is no pneumothorax or large pleural effusion.'

--------------------------------------------------
Finding: absent 👎


Report #571:
Report text:


'The heart and lungs have XXXX XXXX in the interval. Both lungs are clear and expanded. Heart and mediastinum normal.'

--------------------------------------------------
Finding: absent 👎


Report #3016:
Report text:


'The lungs are mildly hyperexpanded. There is no focal airspace consolidation. No suspicious pulmonary mass or nodule is identified. Heart size and mediastinal contour are within normal limits. There are degenerative changes of the spine.'

--------------------------------------------------
Finding: absent 👎


Report #419:
Report text:


'There are low lung volumes with bronchovascular crowding. There is no focal consolidation. No visualized pneumothorax. Heart size is within normal limits. The cardiomediastinal contours is grossly normal in size and contour.'

--------------------------------------------------
Finding: absent 👎

SUMMARY:

Analyzed 10 reports
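
When the original labels are hidden, the summary only reports how many reports were analyzed. A minimal sketch for tallying the predictions is shown below; the labels are copied in order from the ten results above, and `tally_findings` is a hypothetical helper rather than part of the notebook. Note that in this mode the 👎 icon marks a report where the finding is absent, not a wrong prediction.

```python
from collections import Counter

# Predicted labels copied in order from the ten cardiomegaly results above.
predictions = [
    "absent", "absent", "present", "absent", "present",
    "present", "absent", "absent", "absent", "absent",
]


def tally_findings(predicted_labels):
    """Count how often each predicted label occurs (case-insensitive)."""
    counts = Counter(label.lower() for label in predicted_labels)
    return counts, sum(counts.values())


counts, total = tally_findings(predictions)
for label, n in counts.most_common():
    print(f"{label}: {n} of {total}")
```

With a live `results` list, the same helper could be called as `tally_findings(r['predicted_label'] for r in results)`.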


# Create Your Own Specialized Classifier! 🚀

You can create a custom classifier for any specific condition by following these steps:

1. **Define the condition** you want to detect (e.g., COPD, pleural effusion).
2. **Specify key indicators** associated with the condition.
3. **Modify the template** to reflect the new condition and findings.
4. **Run the model** to classify reports based on your selected condition.

### Example: COPD Detection
- **Signs of COPD:**
  - Hyperinflated lungs
  - Flattened diaphragm
  - Increased retrosternal airspace
  - Narrowed cardiac silhouette
  - Emphysematous changes
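
One way to organize several conditions, sketched under the assumption that you want to reuse the same template for each, is to keep the indicators in a dictionary and build the `{examples}` string from it. The names `CONDITIONS` and `build_examples` are illustrative and not part of the notebook; the indicators are the cardiomegaly and COPD lists given in this notebook.

```python
# Indicators copied from the cardiomegaly and COPD examples in this notebook.
CONDITIONS = {
    "cardiomegaly": [
        "Increased heart size",
        "Enlarged cardiac silhouette",
        "Increased cardiomediastinal silhouette",
    ],
    "COPD (Chronic Obstructive Pulmonary Disease)": [
        "Hyperinflated lungs",
        "Flattened diaphragm",
        "Increased retrosternal airspace",
        "Narrowed cardiac silhouette",
        "Emphysematous changes",
    ],
}


def build_examples(indicators):
    """Format indicators as the bulleted string used for the {examples} placeholder."""
    return "\n" + "\n".join(f"- {item}." for item in indicators) + "\n"


for looking_for, indicators in CONDITIONS.items():
    examples = build_examples(indicators)
    print(f"{looking_for} Detection")
    print(f"Examples: {examples}")
```

Each `looking_for` and `examples` pair can then be passed to `classify_multiple_reports` together with the condition-specific template.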


In [10]:
finding_specific_template = """
### INSTRUCTION
You are a specialist in chest X-ray reports.
Your task is to classify a report as positive if it describes signs of {looking_for}, such as:
{examples}
Consider the finding positive even if it is mild.
Your response should be in JSON format with the key 'classification' and the possible values: 'present' or 'absent'.

### REPORT TO CLASSIFY
{report}
"""

looking_for = "COPD (Chronic Obstructive Pulmonary Disease)"

examples = """
- Hyperinflated lungs.
- Flattened diaphragm.
- Increased retrosternal airspace.
- Narrowed cardiac silhouette.
- Emphysematous changes.
"""

print(f"\n{looking_for} Detection")
print(f"Examples: {examples}")


results = classify_multiple_reports(df,
                                    n_reports=10,
                                    template=finding_specific_template,
                                    looking_for=looking_for,
                                    examples=examples,
                                    seed=42)

display_results(results, show_original_label=False)


COPD (Chronic Obstructive Pulmonary Disease) Detection
Examples: 
- Hyperinflated lungs.
- Flattened diaphragm.
- Increased retrosternal airspace.
- Narrowed cardiac silhouette.
- Emphysematous changes.



Classifying reports: 100%|██████████| 10/10 [00:06<00:00,  1.52it/s]


Report #2619:
Report text:





'Heart size is stable. There is focal airspace consolidation in the lateral aspect of the right upper lobe. There is no pneumothorax or effusion. No acute bony abnormalities.'

--------------------------------------------------
Finding: absent 👎


Report #456:
Report text:


'The cardiac contours are normal. The lungs are clear. Thoracic spondylosis. Prior cholecystectomy'

--------------------------------------------------
Finding: absent 👎


Report #102:
Report text:


'There is borderline cardiomegaly. Mediastinum and pulmonary vasculature are unremarkable. Lungs are clear. No pleural fluid or pneumothorax is appreciated.'

--------------------------------------------------
Finding: absent 👎


Report #3037:
Report text:


'Heart size is normal lungs are clear. No edema or effusions.'

--------------------------------------------------
Finding: absent 👎


Report #1126:
Report text:


'Cardiomegaly. No pneumothorax or pleural effusion. Clear lung XXXX bilaterally.'

--------------------------------------------------
Finding: absent 👎


Report #1003:
Report text:


'Left lung clear. Questioned some vague right lower lobe opacity on the frontal film. XXXX subsegmental atelectasis or scar seen on lateral view. Cardiac silhouette borderline in size within LV contour. XXXX sulci. XXXX unremarkable.'

--------------------------------------------------
Finding: absent 👎


Report #914:
Report text:


'Lucency crosses the 10th left posterior rib. Visualized portions of the thoracic spine are unremarkable. Mediastinal contours are normal. Lungs are clear. There is no pneumothorax or large pleural effusion.'

--------------------------------------------------
Finding: absent 👎


Report #571:
Report text:


'The heart and lungs have XXXX XXXX in the interval. Both lungs are clear and expanded. Heart and mediastinum normal.'

--------------------------------------------------
Finding: absent 👎


Report #3016:
Report text:


'The lungs are mildly hyperexpanded. There is no focal airspace consolidation. No suspicious pulmonary mass or nodule is identified. Heart size and mediastinal contour are within normal limits. There are degenerative changes of the spine.'

--------------------------------------------------
Finding: present ✅


Report #419:
Report text:


'There are low lung volumes with bronchovascular crowding. There is no focal consolidation. No visualized pneumothorax. Heart size is within normal limits. The cardiomediastinal contours is grossly normal in size and contour.'

--------------------------------------------------
Finding: absent 👎

SUMMARY:

Analyzed 10 reports


# Conclusion & Next Steps 📌

Through this notebook, we've explored how **prompt engineering** enhances the classification of radiology reports using LLMs.

### Key Takeaways:
✅ **Zero-shot prompting** enables classification without prior examples.  
✅ **Few-shot prompting** can improve accuracy by adding guiding examples to the prompt (from 6 to 8 correct out of 10 in the sample shown here).  
✅ **Condition-specific classification** allows targeted detection of medical conditions.  

### What’s Next?
Now that you understand the fundamentals, here are some next steps:
1. **Test additional conditions** by modifying the `{looking_for}` and `{examples}` variables.
2. **Experiment with different LLM models** to compare performance.
3. **Refine your prompts** to improve classification accuracy.
4. **Evaluate on larger samples and real-world datasets** before relying on the results in clinical settings.

By leveraging **LLMs and prompt engineering**, we can enhance efficiency in **radiology report classification**, aiding medical professionals in making informed decisions. 🏥💡

---

### Acknowledgment  
This project utilizes an **open-access chest X-ray collection** from **Indiana University**, sourced from the **OpenI repository** ([OpenI](https://openi.nlm.nih.gov/)). We acknowledge and appreciate the efforts of the dataset creators in making this valuable resource available for research and analysis.

---

### Credits
This notebook is adapted from work by **Paulo Kuriki** and **Felipe Kitamura**.

