**Value-Based Healthcare (VBHC) in Breast Reconstruction Surgery**

This notebook demonstrates how to use a Retrieval-Augmented Generation (RAG) pipeline to extract structured clinical insights from PubMed articles. It walks through loading PDFs, embedding them into a FAISS vector store, and querying the content using GPT-4 to support evidence-based analysis of breast reconstruction outcomes.

**Install Required Dependencies**

These packages are required to build a Retrieval-Augmented Generation (RAG) pipeline.

langchain provides tools to load, split, embed, and query documents.

openai connects to the OpenAI API, which provides GPT-4 for generating answers and a separate embedding model for vectorizing text.

faiss-cpu stores document embeddings and enables fast semantic search.

PyPDF2 extracts text from research PDFs.

tiktoken counts tokens so that inputs can be kept within model input limits.

In [1]:
!pip install langchain openai faiss-cpu PyPDF2 tiktoken

Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Collecting tiktoken
  Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl (30.7 MB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: PyPDF2, faiss-cpu, tiktoken
Successfully installed PyPDF2

**Mount Google Drive**

We mount Google Drive to access research PDFs stored in the cloud. This allows us to load documents directly from your Drive folder into the notebook, so we can process them, embed them, and use them for question answering with the RAG pipeline.

In [2]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


**Locate and List PDF Files in Google Drive**

In this step, we define the path to the folder in Google Drive that contains our PubMed research PDFs. We then list all PDF files in that folder using Python’s os module. This prepares the documents for loading and processing in the next steps of the pipeline.

In [3]:
import os

pdf_folder = "/content/drive/MyDrive/Project Work/RAG_PDFs"
pdf_files = [os.path.join(pdf_folder, f) for f in os.listdir(pdf_folder) if f.endswith('.pdf')]
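
A slightly more defensive way to collect the files can be sketched with pathlib. It matches the extension case-insensitively (so a file ending in `.PDF` is not skipped) and sorts the result so the processing order is reproducible. The helper name `list_pdfs` is hypothetical and not part of the original notebook.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return sorted paths (as strings) of the PDF files directly inside `folder`."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"  # case-insensitive match
    )
```

It could be used as a drop-in replacement for the list comprehension above: `pdf_files = list_pdfs(pdf_folder)`.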


**Upgrade PDF Processing Dependencies**

In this step, we upgrade langchain-community to its latest version and install pypdf, the PDF parsing library that LangChain's PyPDFLoader relies on. This gives us current document loaders and more robust handling of scientific PDFs, which matters for accurate chunking and embedding later in the pipeline.

In [4]:
!pip install -U langchain-community pypdf

Collecting langchain-community
  Downloading langchain_community-0.3.21-py3-none-any.whl.metadata (2.4 kB)
Collecting pypdf
  Downloading pypdf-5.4.0-py3-none-any.whl.metadata (7.3 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading marshmallow-3.26.1-py3-none-any.whl.metadata (7.3 kB)
Collecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community)
  Downloading typing_inspect-0.9.0-py3-none-any.whl.metadata (1.5 kB)
Collecting python-dotenv>=0.21.0 (from pydantic-settings<3.0.0,>=2.4.0->langchain

**Load and Split PDF Documents into Chunks**

In this step, we use LangChain’s PyPDFLoader to load each research article and extract its text. Then, we split each document into overlapping chunks using RecursiveCharacterTextSplitter. This is important because language models like GPT-4 have context length limits, so breaking the documents into manageable pieces ensures we can embed and retrieve them effectively.

In [5]:
# Import necessary classes from LangChain
from langchain.document_loaders import PyPDFLoader  # Used to load and extract text from PDF files
from langchain.text_splitter import RecursiveCharacterTextSplitter  # Used to split long texts into manageable chunks

# Initialize an empty list to store all the document chunks
all_chunks = []

# Loop through a list of PDF file paths
for file in pdf_files:
    # Load the content of each PDF file
    loader = PyPDFLoader(file)
    docs = loader.load()  # Extracts the text and returns one Document per PDF page

    # Initialize the text splitter
    # Each chunk will have up to 1000 characters with a 200-character overlap between consecutive chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

    # Split the loaded documents into smaller overlapping chunks
    chunks = splitter.split_documents(docs)

    # Add the generated chunks to the overall list
    all_chunks.extend(chunks)

# Print the total number of chunks created across all PDFs
print(f"Total chunks created: {len(all_chunks)}")


Total chunks created: 618
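
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch in plain Python. It is a simplification: RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and sentences, which this sketch ignores. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split `text` into fixed-size character windows that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Small values make the overlap visible: each chunk repeats the last 2 characters
# of the previous one.
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.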


In [6]:
import os
os.environ["OPENAI_API_KEY"] = "sk-proj-......."  # Placeholder: substitute your own key and never commit a real key to a shared notebook
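
Hard-coding a key in a cell makes it easy to leak when the notebook is shared. As an alternative, the following minimal sketch reads the key from the environment and prompts for it (without echoing) only if it is missing. The helper name `ensure_api_key` and its parameters are hypothetical; the `env` and `prompt_fn` arguments exist only so the function can be exercised without a real prompt.

```python
import os
from getpass import getpass


def ensure_api_key(env=os.environ, prompt_fn=getpass, name="OPENAI_API_KEY"):
    """Return the key stored in `env`, prompting for it only when it is not set."""
    if not env.get(name):
        # getpass hides the typed value, so the key never appears in the cell output
        env[name] = prompt_fn(f"Enter {name}: ")
    return env[name]
```

Calling `ensure_api_key()` once near the top of the notebook would make the key available to later cells through `os.environ["OPENAI_API_KEY"]`.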


**Generate Embeddings and Store in FAISS Vector Database**

In this step, we convert the text chunks into vector embeddings using OpenAI's embedding model. These embeddings represent the semantic meaning of each chunk, allowing us to perform similarity search later. The resulting vectors are stored in a FAISS vector database, which we save to Google Drive so the system can be reused without repeating the entire processing pipeline.

In [7]:
# Import the embedding model and FAISS vector store from LangChain
from langchain.embeddings import OpenAIEmbeddings  # Interface to OpenAI's text embedding API
from langchain.vectorstores import FAISS  # Wrapper for FAISS-based similarity search

# Initialize the OpenAI embedding model (default uses 'text-embedding-ada-002')
embedding = OpenAIEmbeddings()

# Create a FAISS vector store from the previously created document chunks
# This indexes all_chunks using their semantic vector representations
vectorstore = FAISS.from_documents(all_chunks, embedding)

# Define the path in Google Drive where the FAISS index will be saved
save_path = "/content/drive/MyDrive/Project Work/RAG_Vs"

# Save the FAISS index and associated data locally to the specified path
vectorstore.save_local(save_path)

# Confirm successful saving of the vector store
print("FAISS vectorstore saved to Google Drive.")



FAISS vectorstore saved to Google Drive.
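
Conceptually, similarity search ranks stored vectors by their distance to the query vector. The sketch below shows the idea with a brute-force Euclidean search in plain Python. It is an illustration only, not how FAISS is implemented, and the three-dimensional vectors and their labels are invented for the example (real embeddings have many more dimensions).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, indexed, k=2):
    """Return the texts of the k entries in `indexed` closest to `query_vec`.

    `indexed` is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed, key=lambda item: euclidean(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "index": made-up vectors standing in for chunk embeddings.
toy_index = [
    ("length of stay and cost", [0.9, 0.1, 0.0]),
    ("BREAST-Q satisfaction scores", [0.1, 0.9, 0.1]),
    ("flap complication rates", [0.0, 0.2, 0.9]),
]

print(top_k([0.8, 0.2, 0.1], toy_index, k=1))
```

A vector store does the same ranking, but with index structures that avoid comparing the query against every stored vector one by one in Python.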


**Reload FAISS Vector Store for Querying**

In this step, we reload the previously saved FAISS vector store from Google Drive. We use the same OpenAI embedding model so that query vectors are comparable with the stored ones. LangChain saves part of the store as a pickle file, and unpickling can execute arbitrary code, so load_local requires an explicit allow_dangerous_deserialization=True; this is acceptable here only because we created the files ourselves. Once loaded, we convert the vector store into a retriever, which will later be used to find the most relevant text chunks in response to user queries.

In [8]:
from getpass import getpass  # Useful for securely entering API keys if needed

# Import OpenAI embeddings and FAISS vector store functionality from LangChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Initialize the embedding model using the OpenAI API key stored in environment variables
# Ensure that OPENAI_API_KEY is already set via os.environ before this line
embedding = OpenAIEmbeddings(openai_api_key=os.environ["OPENAI_API_KEY"])

# Load the previously saved FAISS vector store from local storage
# 'allow_dangerous_deserialization=True' is an explicit opt-in: the store includes a pickle file, and unpickling can execute code
loaded_vectorstore = FAISS.load_local(
    folder_path=save_path,              # Path where the FAISS store was saved
    embeddings=embedding,               # Embedding object to match chunk vectors
    allow_dangerous_deserialization=True  # Caution: Use only with trusted data sources
)

# Convert the loaded vector store into a retriever
# This object can now be used in RetrievalQA or similar pipelines to fetch relevant chunks given a query
retriever = loaded_vectorstore.as_retriever()
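
Why the deserialization flag carries a warning can be shown with a harmless standard-library sketch. A pickled object may instruct the loader to call an arbitrary callable; here the callable is the built-in `len`, but in a file from an untrusted source it could be anything. The class name `Payload` is hypothetical.

```python
import pickle


class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # "call this callable with these arguments".
    def __reduce__(self):
        return (len, ("abc",))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # calls len("abc") while loading

# The result is whatever the callable returned, not a Payload instance.
print(type(restored).__name__, restored)
```

This is why a saved vector store should only be loaded when it comes from a source you control, as is the case in this notebook.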


**Set Up GPT-4 Question Answering Chain**

In this step, we create a Retrieval-Augmented Generation (RAG) chain that connects GPT-4 to our vector store. When a question is asked, the retriever first finds the most relevant document chunks from the FAISS vector store. Then, GPT-4 uses those chunks as context to generate a structured, evidence-based answer. This setup allows us to ask clinical or research questions and receive context-aware responses.

In [9]:
# Import necessary components from LangChain
from langchain.chains import RetrievalQA  # Builds a QA chain that retrieves documents and generates answers
from langchain.prompts import PromptTemplate  # Used to create a custom prompt for more precise model guidance
from langchain.chat_models import ChatOpenAI  # Wrapper for accessing OpenAI's GPT chat models

# Define a custom prompt template for answering biomedical/surgical questions
custom_prompt = PromptTemplate.from_template("""
You are a clinical research assistant tasked with answering surgical questions based on biomedical literature.

Use the following retrieved context to answer the question. If the context does not contain exact numbers, use approximate reasoning and mention that. Always be specific and reference the supporting evidence when possible.

Question: {question}

Context:
{context}

Answer:
""")

# Construct a RetrievalQA chain using GPT-4 and the retriever
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        model_name="gpt-4",  # Specify the LLM to use (here, GPT-4 via OpenAI)
        openai_api_key=os.environ["OPENAI_API_KEY"]  # Load API key securely from environment variable
    ),
    retriever=retriever,  # Pass in the FAISS-based retriever to fetch relevant chunks from your document set
    chain_type_kwargs={"prompt": custom_prompt}  # Apply the custom prompt to guide how GPT-4 uses the retrieved context
)

# This custom RetrievalQA chain now:
# 1. Accepts a clinical query.
# 2. Retrieves relevant document chunks using semantic similarity.
# 3. Feeds the context and question into GPT-4 using your instructive prompt.
# 4. Returns a detailed, grounded answer (if possible) based on the available context.
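
The way retrieved chunks reach the model can be sketched without LangChain: the chunk texts are joined into one context string and substituted into the prompt template together with the question. This is a simplified illustration of that "stuffing" step, not LangChain's actual code, and the names `TEMPLATE` and `build_prompt` are hypothetical.

```python
TEMPLATE = """Use the following retrieved context to answer the question.

Question: {question}

Context:
{context}

Answer:"""


def build_prompt(question, chunks):
    """Fill the template with the question and the concatenated chunk texts."""
    context = "\n\n".join(chunks)  # blank line between chunks keeps them distinguishable
    return TEMPLATE.format(question=question, context=context)


example = build_prompt(
    "Which discharge day is most cost-effective?",
    ["First retrieved chunk of text.", "Second retrieved chunk of text."],
)
print(example)
```

Because every retrieved chunk is pasted into a single prompt, the number and size of chunks directly determine how much of the model's context window each query consumes.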




**Run a Clinical Query Using the RAG Pipeline**

In this final step, we run a structured clinical query through the RAG pipeline. The retriever searches the vector store for the most relevant content from the uploaded PubMed articles, and GPT-4 generates a summarized, evidence-based response. This allows us to extract key insights (e.g., complications, cost impact, and patient-reported outcomes) without manually reviewing each paper.

In [None]:
query = "What complications are associated with increasing length of stay after microvascular breast reconstruction? 1a. How do hospital costs increase with each additional day of stay? 1b. How does length of stay affect patient-reported outcomes?"
result = qa_chain.run(query)
print(result)


The context does not provide specific complications associated with increasing length of stay after microvascular breast reconstruction. However, it mentions that certain risk factors and comorbidities such as obesity, diabetes, malignancy history, operative time, a history of radiation therapy, smoking, and bilateral reconstruction may necessitate a longer length of stay. 

The context also does not provide specific details on how hospital costs increase with each additional day of stay. However, it suggests that a shortened length of stay is safe and effective, and that from a cost-utility perspective, a discharge on postoperative day 3 is the most advantageous.

Regarding patient-reported outcomes, the context doesn't provide direct information. However, it mentions that an earlier discharge was supported not only from a cost perspective but also in terms of quality-adjusted life-years, implying that shorter hospital stays may lead to better patient-reported outcomes. 

Please note 

In [None]:
query = "How do patient-reported outcomes compare between implant-based and autologous (tissue-based) breast reconstruction?"
result = qa_chain.run(query)
print(result)

According to the systematic review, patient-reported outcomes were generally higher for autologous (tissue-based) breast reconstruction compared to implant-based reconstruction. Using the BREAST-Q validated measurement tool, patients who underwent autologous reconstruction reported higher satisfaction with their breasts and greater psychosocial well-being than those who underwent implant-based reconstruction. Differences in physical well-being between the two groups were less significant and the least significant difference was noted for sexual well-being. The EORTC-QLQ-BR23/C30 PROMs also noted similar trends. The SF-36 measure, however, noted virtually no difference between the two methods of reconstruction regarding similar quality of life domains. Therefore, from the patient perspective, autologous reconstruction is either equal to or superior to implant-based reconstruction. The context does not provide exact numbers for these outcomes.


In [None]:
query = "Are TRAM flaps associated with higher complication rates and costs compared to DIEP flaps?"
result = qa_chain.run(query)
print(result)

The rates of postoperative complications overall between patients receiving DIEP vs TRAM flap surgery were fairly similar (5.3% and 5.5% respectively). However, wound dehiscence immediately postoperatively occurred significantly more in the TRAM flaps as compared to the DIEP flaps. Regarding costs, the total hospital charges to costs using cost-to-charge ratio were comparable between DIEP and TRAM flaps. In fact, contrary to the prevailing assumption, the study found that TRAM flaps are not more cost-effective than DIEP flaps. The total hospital charges to costs for patients in the DIEP and the TRAM subgroups were $29,775 and $28,466, respectively. These findings contradict the notion that TRAM flaps are less expensive procedures when compared to DIEP flaps.


In [None]:
query = "What are the most important predictors of patient satisfaction in the BREAST-Q across the following groups: 4a. DIEP flaps 4b. TRAM flaps 4c. Implant-based reconstruction 4d. Total mastectomy without reconstruction"
result = qa_chain.run(query)
print(result)


The most important predictors of patient satisfaction in the BREAST-Q across the groups are:

4a. DIEP flaps: Patient scores following DIEP flap surgery were reported to be high (mean score, 83) on the BREAST-Q abdominal well-being scale. This indicates that patients undergoing DIEP flap surgery generally had a good satisfaction rate (Context: "Using the BREAST-Q abdominal well-being scale, patient scores following DIEP flap surgery (mean score, 83) were included in the breast health-related quality-adjusted life-year calculation").

4b. TRAM flaps: The context suggests that TRAM reconstruction was preferred over implant reconstruction, indicating a higher level of patient satisfaction with TRAM flaps (Context: "Hu et al. (19) reported that TRAM reconstruction was preferred over implant in this regard").

4c. Implant-based reconstruction: The context suggests that there was less satisfaction with breast implant reconstruction when compared to autologous reconstruction methods like DIEP

In [10]:
subqueries = [
    "What are the most important predictors of patient satisfaction in the BREAST-Q for DIEP flap reconstruction?",
    "What are the most important predictors of patient satisfaction in the BREAST-Q for TRAM flap reconstruction?",
    "What are the most important predictors of patient satisfaction in the BREAST-Q for implant-based reconstruction?",
    "What are the most important predictors of patient satisfaction in the BREAST-Q for total mastectomy without reconstruction?"
]

for i, q in enumerate(subqueries, 1):
    print(f"\n--- Query {i} ---\n{q}")
    print("\nAnswer:")
    result = qa_chain.run(q)
    print(result)

    print("\nTop Retrieved Chunks:")
    retrieved_docs = retriever.get_relevant_documents(q)
    for j, doc in enumerate(retrieved_docs[:2]):  # show top 2 chunks
        print(f"\nChunk {j+1}:\n{doc.page_content[:500]}...\n")
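
Since the four sub-queries differ only in the patient group, they can also be generated from a single template, which keeps the wording identical across groups and makes it easy to add a group later. A minimal sketch; the names `GROUPS`, `QUERY_TEMPLATE`, and `build_subqueries` are hypothetical.

```python
GROUPS = [
    "DIEP flap reconstruction",
    "TRAM flap reconstruction",
    "implant-based reconstruction",
    "total mastectomy without reconstruction",
]

QUERY_TEMPLATE = (
    "What are the most important predictors of patient satisfaction "
    "in the BREAST-Q for {group}?"
)


def build_subqueries(template, groups):
    """Return one focused question per group by filling the template."""
    return [template.format(group=g) for g in groups]


for q in build_subqueries(QUERY_TEMPLATE, GROUPS):
    print(q)
```

Asking one focused question per group gives the retriever a narrower target than the combined four-part query, so each answer is grounded in chunks retrieved for that group alone.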


--- Query 1 ---
What are the most important predictors of patient satisfaction in the BREAST-Q for DIEP flap reconstruction?

Answer:


The context does not provide specific predictors of patient satisfaction in the BREAST-Q for DIEP flap reconstruction. However, it does mention the use of the BREAST-Q abdominal well-being scale to measure patient scores following DIEP flap surgery. It also suggests that patient-centered care and patient input are important aspects of determining patient satisfaction. Therefore, it can be inferred that individual patient experiences and concerns, such as potential abdominal weakness, might play a significant role in patient satisfaction. However, without more specific information, this is an approximate interpretation.

Top Retrieved Chunks:



Chunk 1:
service using the BREAST-Q patient reported outcomes 
measure: A cohort study. J Plast Reconstr Aesthet Surg 
2016;69:1469-77. 
22. Tønseth KA, Hokland BM, Tindholdt TT , et al. Quality 
of life, patient satisfaction and cosmetic outcome after 
breast reconstruction using DIEP flap or expandable breast 
implant. J Plast Reconstr Aesthet Surg 2008;61:1188-94. 
23. Thorarinsson A, Fröjd V , Kölby L, et al. Long-T erm 
Health-Related Quality of Life after Breast Reconstruction: 
Comparing 4 Differe...


Chunk 2:
It is well known that patients choose prosthetic 
techniques, citing potential abdominal weakness 
as a concern. 24 Using the BREAST-Q abdominal 
well-being scale, patient scores following DIEP 
flap surgery (mean score, 83) were included in 
the breast health-related quality-adjusted life-
year calculation to directly contrast with implant 
reconstructions (score, 100). Although results of 
the current study are in agreement with previous 
research that demonstrates the