# Lesson 4: Q&A over Documents

### **1. Components of the Retrieval Q&A Chain**

#### **1.1 Document Store**
The document store is where all your documents are stored and indexed for retrieval. Popular options include:

- **FAISS (Facebook AI Similarity Search):** For vector-based similarity search.
- **Pinecone:** A scalable vector database for high-performance retrieval.
- **Weaviate or Chroma:** Modern alternatives with feature-rich capabilities.

Here, we're using **DocArrayInMemorySearch**, which suits a small-scale application like this one; the options above are better suited to large-scale applications.

The document store allows for the efficient retrieval of documents based on vector similarity.

#### **1.2 Embedding Model**
The embedding model converts documents and user queries into dense vector representations. These embeddings capture semantic meaning and are essential for similarity searches.
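
To make the idea of "similarity in vector space" concrete, here is a minimal sketch. It assumes a hypothetical `toy_embed` function that just counts words from a tiny vocabulary; a real embedding model produces dense, learned vectors instead, but the comparison step (cosine similarity) looks the same.

```python
import math
from collections import Counter


def toy_embed(text: str, vocabulary: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: counts vocabulary words."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Documents and the query are embedded into the same vector space.
vocabulary = ["robot", "surgery", "needle", "weather", "rain"]
doc_surgery = toy_embed("robot surgery with a needle", vocabulary)
doc_weather = toy_embed("rain and more rain in the weather report", vocabulary)
query = toy_embed("needle handling in robot surgery", vocabulary)

print(cosine_similarity(query, doc_surgery))  # high: shares all topic words
print(cosine_similarity(query, doc_weather))  # zero: no topic words in common
```

The query lands close to the surgery document and far from the weather one, which is the property the retriever relies on.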

#### **1.3 Retriever**
The retriever is responsible for searching the document store and returning the most relevant documents based on the query embedding. Two main types of retrieval methods are used:

- **Similarity-based retrieval:** Finds documents closest to the query in vector space.
- **Hybrid retrieval:** Combines traditional keyword search with vector similarity.
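
The difference between the two retrieval types can be sketched as follows. This is only an illustration: the weighted sum with `alpha`, the `keyword_score` helper, and the hand-picked vector similarities are all my assumptions, not what any particular vector database does (other schemes, such as rank fusion, exist too).

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of distinct query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def hybrid_score(vector_score: float, kw_score: float, alpha: float = 0.5) -> float:
    """Weighted blend: alpha=1.0 is pure vector similarity, alpha=0.0 pure keyword."""
    return alpha * vector_score + (1.0 - alpha) * kw_score


# Hypothetical documents, each with a made-up vector similarity to the query.
docs = {
    "doc_a": ("suturing with a robotic arm", 0.80),
    "doc_b": ("the dVRK platform moves the needle", 0.75),
}
query = "dVRK needle"

ranked = sorted(
    docs,
    key=lambda name: hybrid_score(docs[name][1], keyword_score(query, docs[name][0])),
    reverse=True,
)
print(ranked)  # ['doc_b', 'doc_a']
```

With vector similarity alone, `doc_a` would rank first (0.80 vs 0.75); the exact keyword matches in `doc_b` push it to the top once the keyword score is blended in.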

#### **1.4 Large Language Model (LLM)**
The LLM interprets the retrieved documents and generates an accurate and contextually appropriate answer to the user’s query.

#### **1.5 Chain Logic**
Chains in LangChain enable the combination of multiple components into a coherent pipeline. For Retrieval Q&A, the chain typically involves:

- Embedding the query.
- Retrieving relevant documents.
- Generating an answer using the LLM.


Again, Andrew Ng's lesson used classes that are now deprecated, so here I use the replacements recommended in LangChain's migration guide: https://python.langchain.com/docs/versions/migrating_chains/retrieval_qa/

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [2]:
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama, OllamaLLM, OllamaEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain import hub

### 1. Creating Document Store

#### 1.1 Loading Document

In [3]:
loader = PyPDFLoader(
    file_path="SuFIA.pdf",
    extract_images=True,
    )

pages = list(loader.lazy_load())  # one Document per PDF page

print(len(pages))  # the PDF has 8 pages, so this prints 8
print(pages[0])

8
page_content='SUFIA: Language-Guided Augmented Dexterity
for Robotic Surgical Assistants
Masoud Moghani1, Lars Doorenbos 2, William Chung-Ho Panitch 3,
Sean Huver4, Mahdi Azizian 4, Ken Goldberg 3, Animesh Garg 1,4,5
Abstract— In this work, we present SUFIA , the first frame-
work for natural language-guided augmented dexterity for
robotic surgical assistants. SUFIA incorporates the strong
reasoning capabilities of large language models (LLMs) with
perception modules to implement high-level planning and low-
level control of a robot for surgical sub-task execution. This
enables a learning-free approach to surgical augmented dexterity
without any in-context examples or motion primitives. SUFIA
uses a human-in-the-loop paradigm by restoring control to
the surgeon in the case of insufficient information, mitigating
unexpected errors for mission-critical tasks. We evaluate SUFIA
on four surgical sub-tasks in a simulation environment and two
sub-tasks on a physical surgical robotic platfo

In [4]:
print(type(pages[0]))

<class 'langchain_core.documents.base.Document'>


In [5]:
print(len(pages[0].page_content))

4681


#### 1.2 Configuring the DB with documents and embeddings

In [6]:
embeddings = OllamaEmbeddings(model="llama3.2")

db = DocArrayInMemorySearch.from_documents(pages, embeddings)




In [7]:
llm = ChatOllama(model="llama3.2")  

### 2. Creating the QnA chain

`db.as_retriever()` wraps the vector DB in a retriever and returns a `VectorStoreRetriever`.\
It acts as a bridge between the vector store and the query-processing logic in a chain or pipeline.\
It converts the user query into an embedding (using the embedding model associated with the vector store) and \
retrieves the top-k most relevant documents (default k=4) by performing a similarity search in the vector store.
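
Roughly, the wrapping described above looks like the sketch below. `ToyRetriever` and `fake_embed` are hypothetical names I made up for illustration; this is not LangChain's `VectorStoreRetriever`, and it scores with a plain dot product to keep things short.

```python
from dataclasses import dataclass
from typing import Callable

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ToyRetriever:
    """Hypothetical stand-in for a vector-store retriever (not LangChain's class)."""

    texts: list[str]
    vectors: list[Vector]  # one stored embedding per text
    embed_query: Callable[[str], Vector]
    k: int = 4  # mirrors the default top-k mentioned above

    def invoke(self, query: str) -> list[str]:
        # 1. Embed the query, 2. score every stored vector, 3. keep the top k.
        query_vector = self.embed_query(query)
        scored = sorted(
            zip(self.texts, self.vectors),
            key=lambda pair: dot(query_vector, pair[1]),
            reverse=True,
        )
        return [text for text, _ in scored[: self.k]]


def fake_embed(query: str) -> Vector:
    # Hypothetical 2-d "embedding": [mentions surgery?, mentions weather?]
    return [float("surgery" in query), float("weather" in query)]


retriever = ToyRetriever(
    texts=["surgical robots", "suturing needles", "rain forecast"],
    vectors=[[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    embed_query=fake_embed,
    k=2,
)
print(retriever.invoke("LLMs in surgery"))  # ['surgical robots', 'suturing needles']
```

In LangChain itself, the number of returned documents can be changed with `db.as_retriever(search_kwargs={"k": 2})`.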

In [8]:
#See the full prompt at: https://smith.langchain.com/hub/langchain-ai/retrieval-qa-chat?tab=0
retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)
rag_chain = create_retrieval_chain(db.as_retriever(), combine_docs_chain)

response = rag_chain.invoke({"input": "How can LLMs be used in a surgical setting?"})



In [9]:
print(response["answer"])

Large Language Models (LLMs) have the potential to transform various fields, including surgery. Here are some ways LLMs can be used in a surgical setting:

1. **Preoperative planning**: LLMs can help surgeons analyze patient data, such as medical history, imaging results, and laboratory tests, to provide personalized treatment plans.
2. **Surgical documentation**: AI-powered LLMs can assist with surgical documentation by automatically generating reports, summaries of procedures, and even assisting with medical dictation.
3. **Procedural guidance**: LLMs can be used to provide step-by-step instructions for surgeons during complex procedures, such as minimally invasive surgery or robotic-assisted surgeries.
4. **Patient education**: AI-powered chatbots or virtual assistants built on LLMs can educate patients about their conditions, treatment options, and post-operative care, improving patient engagement and outcomes.
5. **Annotated medical texts**: LLMs can be trained to analyze large vo

In [10]:
response["context"]

[Document(metadata={'source': 'SuFIA.pdf', 'page': 6}, page_content='SuFIA LLM Planner: To begin the task of finding the suturing site, I will first need to detect \nthe location of the suturing site within the environment. I will call the `detect_object` function \nwith the parameter "suturing site" to obtain its position.\n```python\ndetect_object("suturing site")\n```\nSystem execution output: Object not found!\nSuFIA LLM Planner: Since the "suturing site" object was not found in the environment, I \nwill defer control to the surgeon to provide further instructions or to adjust the environment so \nthat the suturing site can be detected.\n```python\ntransfer_control()\n```\nControl of the robot is transferred to the Surgeon.\n(a) (b)\n(c)\nSurgeon: Please find the suturing site.\nFig. 6: Interactive human-in-the-loop approach. (a) An overview of the environment showing the dVRK robotic arm and endoscope\ncamera as well as a needle and a suturing pad in ORBIT -Surgical, (b) RGB image

### Explanation

The final `rag_chain` builds a full Retrieval-Augmented Generation (RAG) chain.

**Key Components:**
The chain runs in the following order:
1. **Retriever (`db.as_retriever()`):**
   - Converts `db` (a vector database) into a retriever object.
   - Responsible for retrieving the most relevant documents from the vector store based on the query embedding.

2. **Combiner (`combine_docs_chain`):**
   - Takes the documents retrieved by the retriever.
   - Uses the LLM to generate a final response.

**Result:** 
- `rag_chain` orchestrates the entire process:
  - Takes a user query.
  - Uses the retriever to fetch relevant documents.
  - Passes those documents to the `combine_docs_chain` for response generation.

`create_retrieval_chain()` and `create_stuff_documents_chain()` are helper functions that build the chains their names describe; each returns a LangChain Runnable.\
So, in the `rag_chain` above, `combine_docs_chain` takes the documents returned by the retriever, inserts them into its prompt as the `context` input variable, and then passes that prompt to the \
specified LLM, which returns a response.
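
The data flow described above can be imitated in plain Python. This is a sketch under my own assumptions: `fake_retriever`, `fake_llm`, and `PROMPT_TEMPLATE` are hypothetical stand-ins (the real prompt comes from the hub), and the functions only mimic the shape of the chain, not LangChain's implementation.

```python
def format_docs(docs: list[str], separator: str = "\n\n") -> str:
    """'Stuff' all retrieved documents into one context string."""
    return separator.join(docs)


# Hypothetical template; the notebook pulls the real one from the LangChain hub.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {input}"
)


def fake_llm(prompt: str) -> str:
    # Stand-in for the chat model: just reports the prompt size.
    return f"(answer based on a {len(prompt)}-character prompt)"


def fake_retriever(query: str) -> list[str]:
    # Stand-in for db.as_retriever(): always returns the same two "pages".
    return ["Page about the LLM planner.", "Page about human-in-the-loop control."]


def combine_docs(inputs: dict) -> str:
    """Mimics combine_docs_chain: format docs, fill the prompt, call the LLM."""
    prompt = PROMPT_TEMPLATE.format(
        context=format_docs(inputs["context"]), input=inputs["input"]
    )
    return fake_llm(prompt)


def rag(inputs: dict) -> dict:
    """Mimics rag_chain: add 'context' first, then add 'answer'."""
    with_context = {**inputs, "context": fake_retriever(inputs["input"])}
    return {**with_context, "answer": combine_docs(with_context)}


response = rag({"input": "How can LLMs be used in a surgical setting?"})
print(sorted(response))  # ['answer', 'context', 'input']
```

This is why `response` in the notebook has both `response["context"]` (the retrieved documents) and `response["answer"]` (the LLM output): each step adds a key to the dictionary rather than replacing it.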


In [11]:
print(combine_docs_chain)

bound=RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableLambda(format_docs)
}), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
| ChatPromptTemplate(input_variables=['context', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessageChunk')], typing.Annotated[langchain_core.messages.human.HumanMessageChunk, Tag(tag='HumanMessageChunk')], typing.Annota

In [12]:
print(rag_chain)

bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['DocArrayInMemorySearch'], vectorstore=<langchain_community.vectorstores.docarray.in_memory.DocArrayInMemorySearch object at 0x000001B3F51FE420>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], optional_variables=['chat_history'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatM