# 4. Question Answering over Custom Data (RAG)

This notebook builds a question-answering system with the **Retrieval-Augmented Generation (RAG)** pattern. It walks through loading custom documents (a CSV product catalog), creating vector embeddings for semantic search, and building a `RetrievalQA` chain. The LLM answers each question from the catalog entries retrieved for it, so its answers are grounded in our own data rather than only in what the model learned during training.

In [15]:
import os
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import CSVLoader
from langchain.vectorstores import DocArrayInMemorySearch
from IPython.display import display, Markdown
from langchain.llms import OpenAI
from langchain.indexes import VectorstoreIndexCreator


from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

In [16]:
llm_model = "gpt-4o"

In [17]:
file = 'OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)

In [18]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [19]:
query = "Please list all your shirts with sun protection \
in a table in markdown and summarize each one."

**Note**:
- The notebook uses `langchain==0.0.179` and `openai==0.27.7`
- For these library versions, `VectorstoreIndexCreator` uses `text-davinci-003` as the base model, which has been deprecated since 1 January 2024.
- The replacement model, `gpt-3.5-turbo-instruct`, is used instead for the `query`.
- Because of this replacement, the `response` format may differ from the one shown in the video.

In [20]:
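# Completion model that stands in for the deprecated default
llm_replacement_model = OpenAI(temperature=0,
                               model='gpt-3.5-turbo-instruct')

# Pass the replacement model explicitly when querying the index
response = index.query(query,
                       llm=llm_replacement_model)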

In [23]:
display(Markdown(response))



| Name | Description | Sun Protection Rating |
| --- | --- | --- |
| Men's Tropical Plaid Short-Sleeve Shirt | Made of 100% polyester, UPF 50+ rating, wrinkle-resistant, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Men's Plaid Tropic Shirt, Short-Sleeve | Made of 52% polyester and 48% nylon, UPF 50+ rating, SunSmart technology, wrinkle-free, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Men's TropicVibe Shirt, Short-Sleeve | Made of 71% nylon and 29% polyester, UPF 50+ rating, front and back cape venting, two front bellows pockets | SPF 50+, blocks 98% of harmful UV rays |
| Sun Shield Shirt | Made of 78% nylon and 22% Lycra Xtra Life fiber, UPF 50+ rating, moisture-wicking, abrasion-resistant, fits over swimsuit | SPF 50+, blocks 98% of harmful UV rays |

## Step By Step

### Loading Data from the CSV File

In [53]:
from langchain.document_loaders import CSVLoader
loader = CSVLoader(file_path=file)

In [54]:
docs = loader.load()

In [55]:
docs[0]

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 0})

In [56]:
len(docs)

1000
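
The `page_content` printed above has one `column: value` line per CSV field, and the leading `: 0` comes from the unnamed index column. The sketch below imitates that layout with the standard library only. It is an illustration of the format, not `CSVLoader`'s actual implementation, and `rows_to_documents`, `sample_csv`, and the two catalog rows are hypothetical.

```python
import csv
import io


def rows_to_documents(csv_text, source="example.csv"):
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for row_number, row in enumerate(reader):
        # One "column: value" line per field, like the page_content shown above.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content,
             "metadata": {"source": source, "row": row_number}}
        )
    return documents


# Hypothetical two-row catalog; the empty first header is the unnamed index column.
sample_csv = ",name,description\n0,Trail Hat,Wide brim\n1,Camp Mug,Enamel steel\n"
sample_docs = rows_to_documents(sample_csv)
print(sample_docs[0]["page_content"])
```

Each row becomes its own document, which is why the 1000-row catalog yields 1000 documents.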

### Turning Text into Numbers (Embeddings)

In [57]:
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()

In [58]:
embed = embeddings.embed_query("Hi my name is Harrison")

In [59]:
print(len(embed))

1536


In [60]:
print(embed[:5])

[-0.021964654326438904, 0.006758837960660458, -0.01824948936700821, -0.03923514857888222, -0.014007173478603363]
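
An embedding is useful because texts with similar meaning end up with vectors that point in similar directions. A common way to measure this is cosine similarity. The sketch below uses hand-made three-dimensional vectors instead of real 1536-dimensional embeddings, so the numbers in `toy_embeddings` are assumptions chosen for illustration, not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
toy_embeddings = {
    "sun shirt": [0.9, 0.1, 0.0],
    "UV-blocking top": [0.8, 0.2, 0.1],
    "hiking boots": [0.1, 0.9, 0.2],
}

related = cosine_similarity(toy_embeddings["sun shirt"],
                            toy_embeddings["UV-blocking top"])
unrelated = cosine_similarity(toy_embeddings["sun shirt"],
                              toy_embeddings["hiking boots"])
print(f"related: {related:.3f}, unrelated: {unrelated:.3f}")
```

With real embeddings the same comparison applies to the full-length vectors returned by `embed_query`.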


### Creating a Searchable Vector Database

In [61]:
db = DocArrayInMemorySearch.from_documents(
    docs, 
    embeddings
)

### Performing a Semantic Search (Retrieval)

In [62]:
query = "Please suggest a shirt with sunblocking"

In [63]:
docs = db.similarity_search(query)

In [64]:
len(docs)

4

In [65]:
docs[0]

Document(page_content=': 255\nname: Sun Shield Shirt by\ndescription: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. \n\nSize & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.\n\nFabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.\n\nAdditional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.\n\nSun Protection That Won\'t Wear Off\nOur high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun\'s harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.', metadata={'source': 'OutdoorClothingCatalog_1000.csv', 'row': 255})
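
Conceptually, a similarity search scores every stored vector against the query vector and returns the `k` best matches. The four results above are consistent with the default `k=4` of `similarity_search` in LangChain, as far as we are aware. The sketch below is a minimal in-memory version of that idea. `ToyVectorStore` and its two-dimensional vectors are hypothetical and do not reflect how `DocArrayInMemorySearch` is implemented.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (unit_vector, text) pairs."""

    def __init__(self):
        self._entries = []

    def add(self, text, vector):
        self._entries.append((normalize(vector), text))

    def similarity_search(self, query_vector, k=4):
        query = normalize(query_vector)
        scored = (
            (sum(a * b for a, b in zip(query, vector)), text)
            for vector, text in self._entries
        )
        # Keep only the k highest-scoring texts, best first.
        best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
        return [text for _, text in best]


store = ToyVectorStore()
store.add("Sun Shield Shirt", [0.95, 0.05])
store.add("Tropic Shirt", [0.9, 0.2])
store.add("Rain Jacket", [0.4, 0.6])
store.add("Camp Mug", [0.05, 0.95])
store.add("Wool Socks", [0.1, 0.9])

top_two = store.similarity_search([1.0, 0.0], k=2)
print(top_two)
```

A production vector store replaces the linear scan with an index, but the contract is the same: a query vector in, the nearest documents out.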

### Generating an Answer with the LLM

We've retrieved the relevant documents. Now we'll use a Large Language Model (LLM) to synthesize a final answer based on them.

In [66]:
retriever = db.as_retriever()

In [67]:
llm = ChatOpenAI(temperature = 0.0, model=llm_model)

In [68]:
qdocs = "".join(doc.page_content for doc in docs)

In [70]:
print(qdocs)

: 255
name: Sun Shield Shirt by
description: "Block the sun, not the fun – our high-performance sun shirt is guaranteed to protect from harmful UV rays. 

Size & Fit: Slightly Fitted: Softly shapes the body. Falls at hip.

Fabric & Care: 78% nylon, 22% Lycra Xtra Life fiber. UPF 50+ rated – the highest rated sun protection possible. Handwash, line dry.

Additional Features: Wicks moisture for quick-drying comfort. Fits comfortably over your favorite swimsuit. Abrasion resistant for season after season of wear. Imported.

Sun Protection That Won't Wear Off
Our high-performance fabric provides SPF 50+ sun protection, blocking 98% of the sun's harmful rays. This fabric is recommended by The Skin Cancer Foundation as an effective UV protectant.: 374
name: Men's Plaid Tropic Shirt, Short-Sleeve
description: Our Ultracomfortable sun protection is rated to UPF 50+, helping you stay cool and dry. Originally designed for fishing, this lightest hot-weather shirt offers UPF 50+ coverage and is gr
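
In the printed text above, the end of the first document runs straight into the start of the next (`...UV protectant.: 374`), because the texts are joined with an empty string. One possible variation, sketched below, joins them with a visible separator so the model can tell where one catalog entry ends and the next begins. `join_documents`, the separator string, and the shortened `retrieved` texts are assumptions made for this illustration.

```python
def join_documents(page_contents, separator="\n\n---\n\n"):
    """Concatenate retrieved texts with a visible boundary between them."""
    return separator.join(text.strip() for text in page_contents)


# Hypothetical, shortened stand-ins for the retrieved page_content strings.
retrieved = [
    ": 255\nname: Sun Shield Shirt\ndescription: UPF 50+ sun shirt.",
    ": 374\nname: Men's Plaid Tropic Shirt\ndescription: UPF 50+ fishing shirt.",
]

context = join_documents(retrieved)
print(context)
```

Whether the separator changes the answer depends on the model, so this is a readability safeguard rather than a required step.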

In [71]:
response = llm.call_as_llm(f"{qdocs} Question: Please list all your \
shirts with sun protection in a table in markdown and summarize each one.")

In [72]:
print(response)

Certainly! Below is a markdown table summarizing the shirts with sun protection:

| Name                                | Description                                                                                                           | Size & Fit                  | Fabric & Care                                                                 | Additional Features                                                                 |
|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------|-----------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| Sun Shield Shirt                    | High-performance sun shirt with UPF 50+ protection. Softly shapes the body and falls at the hip.                      | Slightly Fitted             | 78% nylon, 22% L

In [73]:
display(Markdown(response))

Certainly! Below is a markdown table summarizing the shirts with sun protection:

| Name                                | Description                                                                                                           | Size & Fit                  | Fabric & Care                                                                 | Additional Features                                                                 |
|-------------------------------------|-----------------------------------------------------------------------------------------------------------------------|-----------------------------|-------------------------------------------------------------------------------|-------------------------------------------------------------------------------------|
| Sun Shield Shirt                    | High-performance sun shirt with UPF 50+ protection. Softly shapes the body and falls at the hip.                      | Slightly Fitted             | 78% nylon, 22% Lycra Xtra Life fiber. Handwash, line dry.                     | Moisture-wicking, abrasion-resistant, fits over swimsuits.                          |
| Men's Plaid Tropic Shirt, Short-Sleeve | Ultracomfortable shirt with UPF 50+ protection, originally designed for fishing.                                      | Not specified               | 52% polyester, 48% nylon. Machine washable and dryable.                       | Wrinkle-free, quick-drying, front and back cape venting, two front bellows pockets. |
| Men's TropicVibe Shirt, Short-Sleeve | Lightweight sun-protection shirt with UPF 50+ protection. Relaxed fit through chest, sleeve, and waist.               | Traditional Fit             | Shell: 71% Nylon, 29% Polyester. Lining: 100% Polyester knit mesh. Machine wash and dry. | Wrinkle-resistant, front and back cape venting, two front bellows pockets.          |
| Men's Tropical Plaid Short-Sleeve Shirt | Lightest hot-weather shirt with UPF 50+ protection. Relaxed fit through chest, sleeve, and waist.                      | Traditional Fit             | 100% polyester. Wrinkle-resistant.                                            | Front and back cape venting, two front bellows pockets.                             |

Each shirt provides UPF 50+ sun protection, blocking 98% of the sun's harmful rays, ensuring you stay protected while enjoying outdoor activities.

### Automating the Process with a QA Chain

A `RetrievalQA` chain automates the entire "retrieve then generate" process.

In [75]:
qa_stuff = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    verbose=True
)
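
With `chain_type="stuff"`, all retrieved documents are placed into a single prompt, which is then sent to the LLM in one call. The sketch below mirrors that flow with plain functions so it can run without an API key. The prompt wording, `answer_with_stuffing`, `fake_retrieve`, and `echo_llm` are all hypothetical and are not LangChain's actual template or classes.

```python
def answer_with_stuffing(question, retrieve, generate):
    """Retrieve documents, stuff them into one prompt, and call the model once."""
    documents = retrieve(question)
    context = "\n\n".join(documents)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return generate(prompt)


def fake_retrieve(question):
    # Stand-in for retriever lookups; returns fixed texts regardless of the question.
    return ["Sun Shield Shirt: UPF 50+", "Tropic Shirt: UPF 50+"]


def echo_llm(prompt):
    # Stand-in for the LLM; returns the prompt so it can be inspected.
    return prompt


stuffed_prompt = answer_with_stuffing(
    "Which shirts offer sun protection?", fake_retrieve, echo_llm
)
print(stuffed_prompt)
```

Because everything goes into one prompt, this approach is simple and uses a single LLM call, but the retrieved text has to fit within the model's context window.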

In [76]:
query = "Please list all your shirts with sun protection in a table \
in markdown and summarize each one."

In [77]:
response = qa_stuff.run(query)



> Entering new RetrievalQA chain...

> Finished chain.


In [78]:
display(Markdown(response))

Here is a table summarizing the shirts with sun protection:

| Name                                      | Summary                                                                                           |
|-------------------------------------------|---------------------------------------------------------------------------------------------------|
| Men's Tropical Plaid Short-Sleeve Shirt   | A lightweight, wrinkle-resistant shirt with UPF 50+ sun protection, made of 100% polyester. Features front and back cape venting and two front bellows pockets. Imported. |
| Men's Plaid Tropic Shirt, Short-Sleeve    | A comfortable, wrinkle-free shirt with UPF 50+ sun protection, made of 52% polyester and 48% nylon. Designed for fishing and travel, it features cape venting and two front bellows pockets. Machine washable and dryable. Imported. |
| Men's TropicVibe Shirt, Short-Sleeve      | A lightweight shirt with UPF 50+ sun protection, made of 71% nylon and 29% polyester. Features wrinkle resistance, cape venting, and two front bellows pockets. Machine washable and dryable. Imported. |
| Sun Shield Shirt                          | A slightly fitted shirt with UPF 50+ sun protection, made of 78% nylon and 22% Lycra Xtra Life fiber. Moisture-wicking and abrasion-resistant, it fits comfortably over swimsuits. Handwash and line dry. Imported. |

Each shirt provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays.

In [80]:
index = VectorstoreIndexCreator(
    vectorstore_cls=DocArrayInMemorySearch,
    embedding=embeddings,
).from_loaders([loader])

In [81]:
response = index.query(query, llm=llm)

In [82]:
display(Markdown(response))

Here is a table summarizing the shirts with sun protection:

| Name                                      | Summary                                                                                           |
|-------------------------------------------|---------------------------------------------------------------------------------------------------|
| Men's Tropical Plaid Short-Sleeve Shirt   | A lightweight, wrinkle-resistant shirt with UPF 50+ sun protection, made of 100% polyester. Features front and back cape venting and two front bellows pockets. Imported. |
| Men's Plaid Tropic Shirt, Short-Sleeve    | A comfortable, wrinkle-free shirt with UPF 50+ sun protection, made of 52% polyester and 48% nylon. Designed for fishing and travel, it features cape venting and two front bellows pockets. Machine washable and dryable. Imported. |
| Men's TropicVibe Shirt, Short-Sleeve      | A lightweight shirt with UPF 50+ sun protection, made of 71% nylon and 29% polyester. Features wrinkle resistance, cape venting, and two front bellows pockets. Machine washable and dryable. Imported. |
| Sun Shield Shirt                          | A slightly fitted shirt with UPF 50+ sun protection, made of 78% nylon and 22% Lycra Xtra Life fiber. Moisture-wicking and abrasion-resistant, it fits comfortably over swimsuits. Handwash and line dry. Imported. |

Each shirt provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays.