# LangChain: Q&A over Documents

Question answering over documents lets an LLM answer queries about text it was not trained on. One example is a tool that lets you query a product catalog for items of interest.

In [10]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI

os.environ["GOOGLE_API_KEY"] = os.environ["GEMINI_API_KEY"]
llm_model = "gemini-2.0-flash-lite"  # alternative: "gemma-3-27b-it"

llm = ChatGoogleGenerativeAI(
    model=llm_model,
    temperature=0.9,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

!["4.1"](img/4.1.png)

In [11]:
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.indexes import VectorstoreIndexCreator
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])
# index = VectorstoreIndexCreator().from_loaders([loader])

query = (
    "Please list all your shirts with sun protection "
    "in a table in markdown and summarize each one."
)

response = index.query(query, llm=llm)

In [13]:
display(Markdown(response))

| Name                             | Description Summary                                                                                                                                                              |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Sun Shield Shirt by              | High-performance sun shirt with UPF 50+ sun protection, wicks moisture, abrasion resistant.                                                                                   |
| Sunrise Tee                      | Women's UV-protective button-down shirt with UPF 50+ sun protection, lightweight, wicks moisture, wrinkle-free.                                                                 |
| Women's Tropical Tee, Sleeveless | Sleeveless button-up shirt with SunSmart™ UPF 50+ sun protection, wrinkle resistant.                                                                                           |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable sun protection rated to UPF 50+, wrinkle-free, quickly evaporates perspiration.                                                                                         |

## Step By Step

In [15]:
from langchain_community.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)

docs = loader.load()
docs[0]

Document(metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0}, page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.")

In [16]:
embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")
embed = embeddings.embed_query("Hi my name is Harrison")
print(len(embed))
print(embed[:5])

768
[0.0057015432976186275, 0.003848188789561391, -0.057218100875616074, -0.03507888689637184, -0.036717262119054794]


In [17]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

In [18]:
query = "Please suggest a shirt with sunblocking"
docs = db.similarity_search(query)
print(len(docs))
print(docs[0])

4
page_content=': 255
name: Sun Shield Shirt by
description: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. 

Size & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.

Fabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.

Additional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.

Sun Protection That Won't Wear Off
Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.' metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255}
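
The search above returned four documents, which is the default `k` for `similarity_search`. As a rough illustration of what a vector store does with the embeddings, here is a minimal sketch that ranks stored vectors against a query vector. It assumes cosine similarity as the scoring metric, and the three-dimensional vectors, `toy_store`, and `top_k` are made up for illustration (the real embeddings in this notebook have 768 dimensions).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; not produced by any real model.
toy_store = [
    ("sun shirt", [0.9, 0.1, 0.0]),
    ("rain jacket", [0.1, 0.9, 0.0]),
    ("hiking boots", [0.0, 0.2, 0.9]),
]
toy_query = [1.0, 0.0, 0.0]  # stands in for the embedded query text

results = top_k(toy_query, toy_store, k=2)
print([text for text, _ in results])
```

A production vector store typically replaces the full sort with an index structure, but the ranking idea is the same.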


In [19]:
retriever = db.as_retriever()

In [24]:
qdocs = "".join(doc.page_content for doc in docs)
# print(qdocs)

response = llm.invoke(
    f"{qdocs} Question: Please list all your "
    "shirts with sun protection in a table in markdown and summarize each one."
)

display(Markdown(response.content))

Here's a table summarizing the shirts with sun protection, based on the provided descriptions:

| Shirt Name                     | Description Summary                                                                                                                                                           | Features                                                                                                                                                                                             | Fabric & Care                                                                                                                                                              |
| ------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Sun Shield Shirt by            | Women's high-performance sun shirt designed to block UV rays.                                                                                                                 | UPF 50+ rated, wicks moisture, abrasion resistant.                                                                                                                                                  | 78% nylon, 22% Lycra Xtra Life fiber. Handwash, line dry.                                                                                                                 |
| Men's TropicVibe Shirt, Short-Sleeve | Men's sun-protection shirt with built-in UPF 50+ for hot weather.                                                                                                             | UPF 50+ rated, wrinkle resistant, front and back cape venting, two front bellows pockets.                                                                                                               | Shell: 71% Nylon, 29% Polyester. Lining: 100% Polyester knit mesh. Machine wash and dry.                                                                                  |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable sun protection shirt rated to UPF 50+, designed for hot weather and travel.                                                                                | UPF 50+ rated, wrinkle-free, front and back cape venting, two front bellows pockets.                                                                                                                    | 52% polyester and 48% nylon. Machine wash and dry.                                                                                                                        |
| Sunrise Tee                    | Women's UV-protective button-down shirt designed for hot weather.                                                                                                                 | UPF 50+ rated, wicks moisture, wrinkle-free, front and back cape venting, two front pockets, tool tabs, eyewear loop, smoother buttons, low-profile pockets, and side shaping for a flattering fit. | Shell: 71% nylon, 29% polyester. Cape lining: 100% polyester. Machine wash and dry.                                                                                                  |

In [26]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)

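query = (
    "Please list all your shirts with sun protection in a table "
    "in markdown and summarize each one."
)

# `run` is deprecated in recent LangChain releases; `invoke` returns a dict
# whose "result" key holds the answer text.
response = qa_stuff.invoke({"query": query})["result"]
display(Markdown(response))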



> Entering new RetrievalQA chain...

> Finished chain.


| Name                                   | Description Summary                                                                                                                                                                                                                                                        |
| :------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sun Shield Shirt by                  | High-performance sun shirt with UPF 50+ protection, wicks moisture, and is abrasion-resistant.                                                                                                                                                                        |
| Sunrise Tee                            | Women's UV-protective button-down shirt with UPF 50+ protection, lightweight, wicks moisture, wrinkle-resistant, and has front/back cape venting.                                                                                                                   |
| Women's Tropical Tee, Sleeveless      | Sleeveless button-up shirt with UPF 50+ protection, wrinkle-resistant, has front/back cape venting.                                                                                                                                                                 |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable sun protection rated to UPF 50+, wrinkle-free, quickly evaporates perspiration, and has front/back cape venting.                                                                                                                                          |

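The `"stuff"` chain type used above places every retrieved document into a single prompt and makes one LLM call. A minimal sketch of that idea follows. The helper `build_stuff_prompt` and its prompt wording are hypothetical, not the template LangChain uses, and the two document strings are shortened stand-ins for catalog rows.

```python
def build_stuff_prompt(documents, question):
    """Concatenate every document into one context block, then append the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Shortened stand-ins for retrieved catalog entries.
retrieved = [
    "name: Sun Shield Shirt by\ndescription: UPF 50+ rated sun shirt.",
    "name: Sunrise Tee\ndescription: UV-protective button-down shirt.",
]

prompt_text = build_stuff_prompt(retrieved, "Which shirts offer sun protection?")
print(prompt_text)
```

The trade-off is that the combined text has to fit within the model's context window, so this approach suits a small number of short documents.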
### LangChain 0.3

In [31]:
# Load docs
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

embeddings = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

# Store splits
vectorstore = FAISS.from_documents(documents=all_splits, embedding=embeddings)

# LLM
llm = ChatGoogleGenerativeAI(
    model="gemma-3-27b-it",
    temperature=0.9,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)
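
The splitting step in the cell above uses `chunk_size=500` and `chunk_overlap=0`. To show what those two parameters mean, here is a simplified sketch that cuts text into fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that class splits on a list of separators recursively), and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=0):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters of placeholder text

no_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=0)
with_overlap = chunk_text(sample, chunk_size=10, chunk_overlap=5)

print(len(no_overlap), len(with_overlap))
```

With no overlap, the chunks partition the text exactly. A non-zero overlap repeats the tail of each chunk at the start of the next one, which can help when a relevant sentence straddles a chunk boundary, at the cost of more chunks to embed.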

#### LangChain Expression Language (LCEL)

In [32]:
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# See full prompt at https://smith.langchain.com/hub/rlm/rag-prompt
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


qa_chain = (
    {
        "context": vectorstore.as_retriever() | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

qa_chain.invoke("What are autonomous agents?")

'Autonomous agents use large language models (LLMs) as their core controller, functioning as the agent’s “brain”. These agents can solve problems by planning tasks, decomposing them into steps, and reflecting on past actions to improve iteratively. Examples like AutoGPT and BabyAGI demonstrate the potential of LLMs beyond simple text generation.'
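
In the chain above, the leading dict sends the input question to two branches: one retrieves and formats documents, the other (`RunnablePassthrough`) forwards the question unchanged. The results then flow through the prompt, the model, and the output parser. The sketch below mirrors that data flow with plain Python functions. It is only an analogy: `fake_retriever`, `fill_prompt`, `fake_llm`, and `run_chain` are hypothetical stand-ins, and no LangChain code or model call is involved.

```python
def fake_retriever(question):
    """Stand-in for vectorstore.as_retriever(): returns canned passages."""
    return [
        "The LLM functions as the agent's brain.",
        "The agent breaks down large tasks into smaller subgoals.",
    ]


def format_docs(passages):
    """Join passages with blank lines, like format_docs in the cell above."""
    return "\n\n".join(passages)


def fill_prompt(inputs):
    """Stand-in for the prompt template: slots context and question into text."""
    return f"Context:\n{inputs['context']}\n\nQuestion: {inputs['question']}\nAnswer:"


def fake_llm(prompt_text):
    """Stand-in for the chat model: reports the prompt size instead of answering."""
    return f"(model output for a {len(prompt_text)}-character prompt)"


def run_chain(question):
    # Equivalent of the dict step: both keys are computed from the same input.
    inputs = {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }
    return fake_llm(fill_prompt(inputs))


print(run_chain("What are autonomous agents?"))
```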

The LCEL implementation exposes the internals of retrieving documents, formatting them, and passing them through a prompt to the LLM, but it is more verbose. You can customize and wrap this composition logic in a helper function, or use the higher-level `create_retrieval_chain` and `create_stuff_documents_chain` helper functions:

In [45]:
from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# LLM
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-lite",
    temperature=0.9,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)
# See full prompt at https://smith.langchain.com/hub/langchain-ai/retrieval-qa-chat
retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)
rag_chain = create_retrieval_chain(vectorstore.as_retriever(), combine_docs_chain)

rag_chain.invoke({"input": "What are autonomous agents?"})

{'input': 'What are autonomous agents?',
 'context': [Document(id='f8607017-194e-4637-9ba7-63a344413148', metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, l