# What This Notebook Does

1. Loads the Vector Store
	* Loads the FAISS index created earlier, which holds the embeddings of the chunked chest X-ray reports.
	* The index supports fast similarity search, so the text chunks closest to a user's query can be found quickly.


2. Sets Up a Retriever
	* The retriever searches the vector store for the chunks most similar to the user's question.
	* Here `k=3`, so it returns the three most similar chunks.

3. Connects to the LLM
	* The LangChain RetrievalQA chain passes the retrieved chunks, together with the question, to an OpenAI model.
	* The model generates the final answer from that retrieved context.

4. Builds the RetrievalQA Pipeline
	* Combines two components:
    	- The retriever, which finds the relevant chunks
    	- The LLM, which generates the answer
	* Setting `return_source_documents=True` also returns the chunks that were passed to the LLM, so each answer can be traced back to its source reports.

5. Tests the System
	* Provide natural language questions like: “What are the key findings in this chest X-ray report?”
	* The system:
    	- Retrieves relevant text from the dataset
    	- Generates a concise answer
    	- Displays the answer and the supporting source documents
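
The retrieval in steps 1 and 2 can be sketched without any external services. This is a minimal illustration, not the FAISS implementation: it assumes cosine similarity as the metric (the actual index may use a different one, such as Euclidean distance), and the vectors and chunk ids are made-up toy values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=3):
    """Return the ids of the k chunks most similar to the query, best first."""
    scored = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in scored[:k]]

# Hypothetical 3-dimensional "embeddings"; real ones have far more dimensions.
toy_chunks = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.2, 0.0]

top_chunks = top_k(toy_query, toy_chunks, k=3)
print(top_chunks)
```

A vector index does the same ranking, but with data structures that avoid scoring every stored vector one by one in pure Python.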

In [1]:
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Initialize embeddings
embedding_model = OpenAIEmbeddings()

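# Load vector store from disk. The index metadata is stored with pickle, so
# allow_dangerous_deserialization=True is only safe for files you created yourself.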
vectorstore = FAISS.load_local(
    "../results/faiss_vectorstore",
    embedding_model,
    allow_dangerous_deserialization=True
)
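
`OpenAIEmbeddings` reads its API key from the `OPENAI_API_KEY` environment variable, which `load_dotenv()` fills in from a local `.env` file if one exists. A minimal sketch of a check to run before loading the index is shown below; the helper name `has_openai_key` is hypothetical and not part of any library.

```python
import os

def has_openai_key(env=None):
    """Return True if a non-empty OPENAI_API_KEY is present in the mapping."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())

print("OPENAI_API_KEY set:", has_openai_key())
```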

# Set Up Retriever

In [2]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

In [4]:
# Initialize the LLM
llm = OpenAI(temperature=0)

# Build the RetrievalQA Chain

In [5]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
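
No `chain_type` is passed above, so `RetrievalQA.from_chain_type` falls back to its default, `"stuff"`, which places all retrieved chunks into a single prompt. The sketch below illustrates that idea only; the function `build_stuffed_prompt` and the template wording are assumptions for illustration, not LangChain's actual prompt.

```python
def build_stuffed_prompt(question, chunks):
    """Join all chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Two sentences taken from the report text shown in this notebook's output.
example_chunks = [
    "The lungs are clear.",
    "Heart size and mediastinal contour are normal.",
]
prompt = build_stuffed_prompt("What are the key findings?", example_chunks)
print(prompt)
```

Because every chunk goes into one prompt, the value of `k` and the chunk size together determine how much of the model's context window each query uses.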

# Test the QA System

In [6]:
query = "What are the key findings in this chest X-ray?"
# Calling the chain directly (qa_chain(query)) is deprecated; use invoke instead
result = qa_chain.invoke({"query": query})

print("Answer:")
print(result['result'])

print("\nSources:")
for doc in result['source_documents']:
    print(doc.metadata)

Answer:
 The key findings in this chest X-ray include a small stable foreign body in the left chest, vascular calcifications in the aortic XXXX, mild degenerative changes of the spine, sclerotic lesions within the XXXX, several bilateral rib fractures with evidence of callus formation, and calcified subcarinal and right hilar lymph XXXX. There is also a vague nodular opacity in the right midlung that may be an artifact.

Sources:
{'source': 'report_2234'}
{'source': 'report_2154'}
{'source': 'report_136'}
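
The three sources above are different reports, so the answer merges findings from more than one report even though the query says "this chest X-ray". A small sketch for counting how many distinct reports contributed is shown below. The `Doc` class is a hypothetical stand-in that mimics only the `page_content` and `metadata` attributes used in this notebook; in practice `result['source_documents']` would be passed instead.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in exposing the two attributes used in this notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def count_sources(docs):
    """Count how many retrieved chunks came from each report."""
    return Counter(doc.metadata.get("source", "unknown") for doc in docs)

# Source ids copied from the output above; the chunk text is a placeholder.
docs = [
    Doc("(chunk text)", {"source": "report_2234"}),
    Doc("(chunk text)", {"source": "report_2154"}),
    Doc("(chunk text)", {"source": "report_136"}),
]
counts = count_sources(docs)
print(f"{len(counts)} distinct reports:", dict(counts))
```

If answers should describe a single report, one option is to filter the retrieved chunks by `source` before they reach the LLM.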


# RetrievalQA Pipeline

In [9]:
# Set Up the Retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# k=3 → Retrieves the top 3 most relevant chunks

# Initialize the LLM
llm = OpenAI(temperature=0)

# Build the RetrievalQA Chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

In [10]:
# Test with Example Query
query = "What are the key findings in this chest X-ray report?"
# Calling the chain directly (qa_chain(query)) is deprecated; use invoke instead
result = qa_chain.invoke({"query": query})

print("Answer:")
print(result['result'])

print("\nSources:")
for doc in result['source_documents']:
    print(f"Source: {doc.metadata}")
    print(f"Content: {doc.page_content[:200]}")  # Print first 200 chars
    print("-" * 40)

Answer:
 The key findings in this chest X-ray report include a small stable foreign body over the left chest, vascular calcifications over the aortic XXXX, mild degenerative changes of the spine, sclerotic lesions within the XXXX, several bilateral rib fractures with evidence of callus formation, and normal heart, pulmonary XXXX, and mediastinum.

Sources:
Source: {'source': 'report_2234'}
Content: The heart, pulmonary XXXX and mediastinum are within normal limits. There is no pleural effusion or pneumothorax. There is no focal air space opacity to suggest a pneumonia. There is a small stable XX
----------------------------------------
Source: {'source': 'report_2154'}
Content: The lungs are clear. No suspicious pulmonary mass or nodule is identified. There is no pleural effusion or pneumothorax. Heart size and mediastinal contour are normal. There are sclerotic lesions with
----------------------------------------
Source: {'source': 'report_2356'}
Content: The heart, pulmonary XXXX an