# Summarization Pipeline for Chest X-Ray Reports

This notebook builds an automated summarization pipeline using LangChain and OpenAI's language models to generate concise summaries of chest X-ray findings.

## What This Notebook Does:
- Loads cleaned chest X-ray reports.
- Builds a prompt to summarize the "findings" section of each report.
- Composes the prompt and the LLM with LangChain's pipe operator (`|`), which produces a `RunnableSequence`.
- Automatically generates summaries for each report.
- Saves the summarized dataset for further analysis or application use.

## Key Technologies:
- **LangChain (RunnableSequence)**: Modular chaining for prompt and LLM.
- **OpenAI GPT API**: For language generation and summarization.
- **Pandas**: For data manipulation and saving results.

Summaries like these can speed up the review of reports and make them easier to index for retrieval. They could also serve as a starting point for downstream applications such as search, question answering, or clinical decision support.

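```python
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI
import pandas as pd

# Read OPENAI_API_KEY (and any other settings) from a local .env file
load_dotenv()

# Completion-style OpenAI model; temperature=0 keeps outputs as deterministic as possible
llm = OpenAI(temperature=0)

# Load the cleaned data
df_clean = pd.read_csv('../data/cleaned_reports.csv')
```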
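Each API call costs time and tokens, so it can help to confirm that the `findings` column exists and to set aside empty reports before summarizing. The sketch below assumes the cleaned CSV may still contain missing or blank `findings` values; `valid_findings` is a hypothetical helper and the demo frame is made up, not taken from the dataset.

```python
import pandas as pd


def valid_findings(df: pd.DataFrame, column: str = "findings") -> pd.DataFrame:
    """Return only rows whose `column` holds non-empty text (hypothetical helper)."""
    if column not in df.columns:
        raise KeyError(f"expected a '{column}' column, got {list(df.columns)}")
    # Treat missing values as empty strings, then ignore surrounding whitespace
    text = df[column].fillna("").astype(str).str.strip()
    # Keep rows with at least one non-whitespace character; reset the index for clean iteration
    return df[text != ""].reset_index(drop=True)


# Small hand-made frame for illustration only
demo = pd.DataFrame({"findings": ["No acute disease.", None, "   ", "Mild cardiomegaly."]})
filtered = valid_findings(demo)
print(len(filtered))  # 2
```

In the notebook this could be applied as `df_clean = valid_findings(df_clean)` before the summarization loop.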

```python
# Create the summarization prompt; {report} is filled in at invocation time
summarization_prompt = PromptTemplate(
    input_variables=["report"],
    template="Please provide a concise summary of the following chest X-ray findings:\n\n{report}\n\nSummary:"
)
```
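`PromptTemplate` uses Python `str.format`-style placeholders by default, so the exact text sent to the model for a given report can be previewed with plain string formatting. A minimal sketch with a made-up report string; in the notebook, `summarization_prompt.format(report=...)` should produce the same text.

```python
# Same template text as the PromptTemplate above
TEMPLATE = (
    "Please provide a concise summary of the following chest X-ray findings:"
    "\n\n{report}\n\nSummary:"
)

# Made-up example report, not taken from the dataset
report = "The lungs are clear. Heart size is normal."
rendered = TEMPLATE.format(report=report)
print(rendered)
```

Ending the prompt with `Summary:` gives a completion-style model an obvious place to continue, which is presumably why the template is written that way.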

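```python
# Build the summarization chain.
# The pipe operator composes the prompt and the LLM into a RunnableSequence:
# the filled-in prompt string is passed to the LLM, whose text output is returned.
summarization_chain = summarization_prompt | llm
```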
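The `|` operator works because LangChain runnables overload `__or__` to build a sequence whose `invoke` feeds each step's output into the next. The toy classes below are a simplified illustration of that idea in plain Python, not LangChain's actual implementation; `Step`, `Pipeline`, and the fake "LLM" step are hypothetical names made up for this sketch.

```python
from typing import Any, Callable, List


class Step:
    """Toy stand-in for a runnable: wraps one function and supports `|`."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Pipeline":
        return Pipeline([self, other])


class Pipeline(Step):
    """Runs its steps left to right, feeding each output into the next step."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def invoke(self, value: Any) -> Any:
        for step in self.steps:
            value = step.invoke(value)
        return value

    def __or__(self, other: Step) -> "Pipeline":
        # Flatten so that a | b | c becomes one three-step pipeline
        return Pipeline(self.steps + [other])


# Hypothetical prompt step: dict of variables -> prompt string
prompt = Step(lambda inputs: f"Summarize:\n\n{inputs['report']}\n\nSummary:")
# Hypothetical "LLM" step: echoes the report line with leading spaces, as completion models often do
fake_llm = Step(lambda text: "  " + text.splitlines()[2])
# Post-processing step: trim surrounding whitespace
clean = Step(str.strip)

chain = prompt | fake_llm | clean
print(chain.invoke({"report": "No acute findings."}))  # prints: No acute findings.
```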

```python
# Run summarization on a sample report
sample_text = df_clean['findings'].iloc[0]

summary = summarization_chain.invoke({"report": sample_text})
print("Original Report:\n", sample_text)
print("\nGenerated Summary:\n", summary)
```

```text
Original Report:
 The cardiac silhouette and mediastinum size are within normal limits. There is no pulmonary edema. There is no focal consolidation. There are no XXXX of a pleural effusion. There is no evidence of pneumothorax.

Generated Summary:
  The chest X-ray shows a normal cardiac silhouette and mediastinum size, no signs of pulmonary edema or focal consolidation, and no evidence of a pleural effusion or pneumothorax.
```
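Completion-style models often return text with leading whitespace or newlines, as in the generated summary for the sample report. A small normalization step keeps saved summaries tidy; `clean_summary` is a hypothetical helper, shown as a sketch.

```python
import re


def clean_summary(text: str) -> str:
    """Collapse runs of whitespace to single spaces and trim the ends (hypothetical helper)."""
    return re.sub(r"\s+", " ", text).strip()


raw = "  The chest X-ray shows a normal cardiac silhouette.\n\n"
print(repr(clean_summary(raw)))
```

LangChain coerces plain Python functions placed in a `|` chain into runnables, so something like `summarization_prompt | llm | clean_summary` should also work if the cleanup is meant to happen inside the chain.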


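```python
from pathlib import Path

# Summarize all reports and save
summaries = []
for report in df_clean['findings']:
    try:
        # Pass the current report (not the sample) to the chain
        summary = summarization_chain.invoke({"report": report}).strip()
    except Exception as e:
        # Record the failure so one bad call does not abort the whole run
        summary = f"Error: {e}"
    summaries.append(summary)

# Add summaries to the dataframe
df_clean['generated_summary'] = summaries

# Save, creating the output directory if needed
Path('../results').mkdir(parents=True, exist_ok=True)
df_clean.to_csv('../results/summarized_reports.csv', index=False)
```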

```text
KeyboardInterrupt:
```
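The cell output ends with a `KeyboardInterrupt`, so the loop most likely stopped before `to_csv` ran and no summaries were written to disk. One way to avoid losing work is to checkpoint results periodically and skip already-summarized reports when the loop is restarted. The sketch below is plain Python and assumes summaries are keyed by row position in a JSON file; `summarize_with_checkpoints`, the checkpoint path, and `fake_summarize` are hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def summarize_with_checkpoints(
    reports: List[str],
    summarize: Callable[[str], str],
    checkpoint_path: str,
    every: int = 25,
) -> List[str]:
    """Summarize reports in order, saving successes to a JSON checkpoint (hypothetical helper)."""
    done: Dict[str, str] = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)  # keys are row positions stored as strings
    errors: Dict[str, str] = {}

    def save() -> None:
        # Write to a temp file first, then replace the checkpoint in one step
        tmp = checkpoint_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(done, f)
        os.replace(tmp, checkpoint_path)

    try:
        for i, report in enumerate(reports):
            key = str(i)
            if key in done:
                continue  # already summarized in a previous run
            try:
                done[key] = summarize(report)
            except Exception as e:
                errors[key] = f"Error: {e}"  # not checkpointed, so it is retried next run
            if (i + 1) % every == 0:
                save()
    finally:
        # Runs on normal completion and on KeyboardInterrupt alike
        save()

    return [done.get(str(i), errors.get(str(i), "")) for i in range(len(reports))]


def fake_summarize(text: str) -> str:
    # Stand-in for summarization_chain.invoke({"report": text}); fails on empty input
    if not text:
        raise ValueError("empty report")
    return text.split(".")[0] + "."


path = os.path.join(tempfile.mkdtemp(), "summaries.json")
reports = ["No effusion. Clear lungs.", "", "Normal heart."]
results = summarize_with_checkpoints(reports, fake_summarize, path, every=2)
print(results)
```

In the notebook, `summarize` could be `lambda r: summarization_chain.invoke({"report": r}).strip()` and `reports` could be `df_clean['findings'].tolist()`. Because the checkpoint is keyed by position, the row order must stay the same between runs. LangChain runnables also offer `batch(..., return_exceptions=True)` for concurrent calls, but that alone does not persist partial results.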