# LangChain: Q&A over Documents

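As a running example, this notebook builds a tool that lets you query a product catalog for items of interest. Besides placing all retrieved documents into a single prompt (the "stuff" method used in the code below), there are three other ways to combine retrieved chunks into an answer: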

1️⃣ Map Reduce

This method follows a two-step process:

Map Step: The LLM processes each chunk of retrieved text independently and generates partial answers.
Reduce Step: The outputs from all chunks are combined to form a final, coherent answer. The reduction step can involve summarization, majority voting, or another aggregation method.

✅ Advantages:

Handles document sets that are too large to fit into a single prompt.
Each chunk is processed separately, so the Map calls can run in parallel and the method scales well.

❌ Disadvantages:

Chunks are processed in isolation during the Map step, so information that spans several chunks (cross-references) can be missed.
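
The Map and Reduce steps described in this section can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `map_reduce_answer`, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns a string) are assumptions made for the example.

```python
from typing import Callable, List


def map_reduce_answer(
    chunks: List[str], question: str, llm: Callable[[str], str]
) -> str:
    """Answer `question` by querying each chunk independently, then merging."""
    # Map step: one independent LLM call per chunk (these could run in parallel).
    partial_answers = [
        llm(f"Context:\n{chunk}\n\nQuestion: {question}") for chunk in chunks
    ]
    # Reduce step: one final LLM call that combines the partial answers.
    combined = "\n".join(f"- {answer}" for answer in partial_answers)
    return llm(
        f"Partial answers:\n{combined}\n\n"
        f"Combine these into one answer to: {question}"
    )


# Hypothetical stand-in for a real model so the sketch runs offline.
def stub_llm(prompt: str) -> str:
    return f"(answer derived from a {len(prompt)}-character prompt)"


print(map_reduce_answer(["chunk one", "chunk two"], "What is sold?", stub_llm))
```

With N chunks, the sketch makes N + 1 LLM calls: N in the Map step and one in the Reduce step.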

2️⃣ Refine

This approach is more sequential and iterative:

The LLM processes the first retrieved document and generates an initial answer.
Each subsequent document modifies or enhances the previous answer.
The final answer is generated after all chunks have been processed in order.

✅ Advantages:

Can refine and build a more comprehensive response by integrating new information step by step.
Helps in handling multi-document reasoning.

❌ Disadvantages:

Can be slow, because each step depends on the previous one and the LLM calls cannot run in parallel.
Errors in early stages may propagate through the refining process.
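
The sequential loop described in this section can be sketched as follows. It is a simplified illustration under assumptions: `refine_answer`, the prompt wording, and the `llm` callable (prompt string in, answer string out) are hypothetical and not taken from LangChain.

```python
from typing import Callable, List


def refine_answer(
    chunks: List[str], question: str, llm: Callable[[str], str]
) -> str:
    """Build an answer from the first chunk, then refine it chunk by chunk."""
    if not chunks:
        return ""
    # Initial answer from the first retrieved chunk only.
    answer = llm(f"Context:\n{chunks[0]}\n\nQuestion: {question}")
    # Every later chunk sees the previous answer, so the calls must run in order.
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer:\n{answer}\n\n"
            f"New context:\n{chunk}\n\n"
            f"Refine the existing answer to the question: {question}"
        )
    return answer


# Hypothetical stand-in for a real model so the sketch runs offline.
def stub_llm(prompt: str) -> str:
    return f"(answer derived from a {len(prompt)}-character prompt)"


print(refine_answer(["chunk one", "chunk two", "chunk three"], "What is sold?", stub_llm))
```

Because each prompt embeds the previous answer, a mistake in an early step is carried into every later prompt, which is the error-propagation risk noted above.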

3️⃣ Map Rerank

This method also follows a two-step process, but with a ranking mechanism:

Map Step: The LLM generates an answer for each chunk independently (like in Map Reduce).
Rerank Step: Instead of combining the answers, the method ranks them by a relevance score that the LLM assigns to each one and returns the highest-scoring answer.

✅ Advantages:

Returns the answer the LLM rates as most relevant, without diluting it with weaker answers from other chunks.
Works well when retrieving multiple potential answers.

❌ Disadvantages:

May discard useful information if the best-scored answer isn't comprehensive.
Depends on the LLM producing reliable scores that are comparable across chunks.
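
The Map and Rerank steps can be sketched as below. To keep the example short, the hypothetical `llm` callable is assumed to return an `(answer, score)` tuple directly; a real chain would have to parse the score out of the model's text output. `map_rerank_answer` and the prompt wording are likewise illustrative names, not LangChain's API.

```python
from typing import Callable, List, Tuple


def map_rerank_answer(
    chunks: List[str],
    question: str,
    llm: Callable[[str], Tuple[str, float]],
) -> str:
    """Answer per chunk, then keep only the answer with the highest score."""
    if not chunks:
        return ""
    # Map step: each chunk yields an (answer, score) pair independently.
    scored = [llm(f"Context:\n{chunk}\n\nQuestion: {question}") for chunk in chunks]
    # Rerank step: select the best-scored answer; all others are discarded.
    best_answer, _best_score = max(scored, key=lambda pair: pair[1])
    return best_answer


# Hypothetical scorer: pretends chunks mentioning "UPF" are the relevant ones.
def stub_scoring_llm(prompt: str) -> Tuple[str, float]:
    if "UPF" in prompt:
        return ("This shirt offers sun protection.", 0.9)
    return ("No relevant information found.", 0.1)


print(map_rerank_answer(
    ["Wool hiking socks", "Shirt rated UPF 50+", "Rain jacket"],
    "Which items have sun protection?",
    stub_scoring_llm,
))
```

The sketch makes the trade-off visible: only one chunk's answer survives, so a question whose answer is spread over several chunks is poorly served.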

In [36]:
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

In [37]:
# account for deprecation of LLM model
import datetime
# Get the current date
current_date = datetime.datetime.now().date()

# Define the date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2024, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-0301"

In [38]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.embeddings import OpenAIEmbeddings  # needed for the FAISS index below
from langchain.indexes import VectorstoreIndexCreator  # needed in the final cells
from langchain.vectorstores import FAISS
from IPython.display import display, Markdown

In [42]:
# Load CSV data
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
docs = loader.load()

In [43]:
# Initialize OpenAI embeddings
embeddings = OpenAIEmbeddings()

In [44]:
# Create FAISS vectorstore from documents
faiss_index = FAISS.from_documents(docs, embeddings)


In [45]:
# Create a retriever from the FAISS index
retriever = faiss_index.as_retriever()

In [46]:
# Define the LLM, reusing the model name selected in the deprecation check above
llm = ChatOpenAI(temperature=0.0, model=llm_model)

In [56]:
# Query for shirts with sun protection
query = "Please list all your shirts with sun protection"

In [57]:
# Retrieve relevant documents from FAISS index
retrieved_docs = faiss_index.similarity_search(query)
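
To give an intuition for what a similarity search does, here is a toy version in plain Python: it ranks document vectors by cosine similarity to a query vector and returns the indices of the top `k`. This is only a sketch under assumptions. The function names and the tiny 2-dimensional vectors are made up for illustration, and a real FAISS index uses far more efficient data structures and may use a different distance metric.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_indices(
    query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 4
) -> List[int]:
    """Indices of the k document vectors most similar to the query vector."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Made-up embeddings: document 0 points the same way as the query.
doc_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k_indices([1.0, 0.0], doc_vectors, k=2))
```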

In [58]:
# Combine retrieved document content
qdocs = "\n".join([doc.page_content for doc in retrieved_docs])

In [59]:
# Generate response using LLM
response = llm.call_as_llm(f"{qdocs} Question: {query}")
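
The two cells above amount to a manual version of the "stuff" method: all retrieved text is placed into one prompt together with the question. A slightly more structured prompt builder might look like the sketch below. `build_stuff_prompt` and its wording are hypothetical and not part of LangChain; the sketch only shows how clearer delimiters between context and question could be added.

```python
from typing import List


def build_stuff_prompt(documents: List[str], question: str) -> str:
    """Place every document into a single prompt, followed by the question."""
    # Blank lines between documents keep catalog entries visually separate.
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_stuff_prompt(
    ["name: Sun Shield Shirt", "name: Trail Socks"],
    "Please list all your shirts with sun protection",
)
print(prompt)
```

This approach only works while the combined documents fit into the model's context window, which is why the other three strategies exist.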

In [60]:
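# Create RetrievalQA chain; chain_type="stuff" puts all retrieved documents into one prompt
# (the alternatives are "map_reduce", "refine", and "map_rerank", described above)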
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True
)

In [61]:
# Run query using RetrievalQA
qa_response = qa_chain.run(query)


> Entering new RetrievalQA chain...

> Finished chain.

In [62]:
# Create FAISS index using VectorstoreIndexCreator
index = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
).from_loaders([loader])


In [63]:
# Query the FAISS index
final_response = index.query(query, llm=llm)

In [64]:
# Display the final markdown formatted response
display(Markdown(final_response))

1. Men's Plaid Tropic Shirt, Short-Sleeve
2. Men's Tropical Plaid Short-Sleeve Shirt
3. Men's TropicVibe Shirt, Short-Sleeve
4. Girls' Ocean Breeze Long-Sleeve Stripe Shirt