# Lab | Agent & Vector store

**Replace the State of the Union dataset with another text (here, Shakespeare's sonnets) and replicate this lab, updating the prompts accordingly.**

# Shakespeare 

# 🎯 What's the goal of this lab?
Basically, we're learning how to make an AI agent that can talk to data.

We do this by:

- Loading some text (like Shakespeare's sonnets or State of the Union speeches),
- Splitting it into chunks and embedding each chunk,
- Storing those chunks in a vector database (so we can search them by meaning, not just by exact keywords),
- And finally creating an agent that uses the vector DB to answer our questions like a brainy assistant.
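To make the embed, store, and search steps concrete, here is a toy sketch. It assumes a simple word-count vector as a stand-in for `OpenAIEmbeddings` and a plain Python list as a stand-in for Chroma; `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, and real embeddings match on meaning rather than on exact word overlap.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (chunk, vector) pairs and ranks them by similarity."""

    def __init__(self, chunks):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def search(self, query: str, k: int = 2):
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


store = ToyVectorStore([
    "Shall I compare thee to a summer's day?",
    "Let me not to the marriage of true minds admit impediments.",
    "Ruff is an extremely fast Python linter.",
])
print(store.search("Which Python linter is fast?", k=1))
# ['Ruff is an extremely fast Python linter.']
```

Chroma plays the role of `ToyVectorStore` in this lab, just with far better vectors and indexing.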



## Create the vector store

In [None]:
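# python-dotenv loads the .env file; beautifulsoup4 is required by WebBaseLoader
!pip install chromadb langchain langchain_community langchain_openai python-dotenv beautifulsoup4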

In [None]:
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader

# Load the .env file to get API keys securely

In [None]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

OPENAI_API_KEY  = os.getenv('OPENAI_API_KEY')

# Initialize the LLM 

In [None]:
llm = OpenAI(temperature=0) 

# Locate sonnets.txt: walk up the current path and stop at `langchain/docs/modules` if it exists; otherwise fall back to the current directory

In [None]:
from pathlib import Path  # pathlib is part of Python's standard library

relevant_parts = []
for p in Path(".").absolute().parts:
    relevant_parts.append(p)
    if relevant_parts[-3:] == ["langchain", "docs", "modules"]:
        break
doc_path = str(Path(*relevant_parts) / "sonnets.txt")
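The loop above only stops early if the current path contains `langchain/docs/modules`; otherwise it uses the current directory. If you would rather search for the file itself, a small helper can check the current directory and each of its parents. `find_upwards` is a hypothetical helper, not part of LangChain or pathlib:

```python
from pathlib import Path
from typing import Optional


def find_upwards(filename: str, start: Path = Path(".")) -> Optional[Path]:
    """Return the first `filename` found in `start` or any of its parent directories."""
    start = start.resolve()
    for folder in (start, *start.parents):
        candidate = folder / filename
        if candidate.is_file():
            return candidate
    return None  # not found anywhere up the tree
```

Something like `found = find_upwards("sonnets.txt")` could then replace the loop; check that `found` is not `None` before calling `str(found)`.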

In [None]:
loader = TextLoader(doc_path)
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings, collection_name="shakespeare_sonnets")
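`CharacterTextSplitter` cuts the text on a separator (a blank line, `"\n\n"`, by default) and packs the pieces into chunks of up to `chunk_size` characters. The sketch below is a simplified, hypothetical `split_text` that loosely mirrors that behaviour with `chunk_overlap=0`; it is not LangChain's implementation.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    """Greedy splitter: cut on `separator`, then pack pieces into chunks of at most
    `chunk_size` characters. A single piece longer than `chunk_size` becomes its own chunk.
    No overlap between chunks (like chunk_overlap=0)."""
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate  # still fits: keep growing this chunk
        else:
            if current:
                chunks.append(current)  # close the full chunk
            current = piece  # start a new chunk with this piece
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10))
# ['aaaa\n\nbbbb', 'cccc']
```

With sonnets separated by blank lines and `chunk_size=1000`, each chunk ends up holding several whole sonnets.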

- `RetrievalQA` is a ready-made pipeline that:
  1. Retrieves relevant chunks using semantic search (via `docsearch`).
  2. Feeds those chunks into the LLM.
  3. Returns a natural-language answer.
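The three steps can be sketched with plain functions. This is a hypothetical, offline stand-in for `RetrievalQA` with `chain_type="stuff"`: `stuff_prompt`, `retrieval_qa`, `fake_retrieve`, and `fake_llm` are illustrative names, and the prompt wording is only loosely modeled on LangChain's default, not copied from it.

```python
from typing import Callable, List


def stuff_prompt(question: str, chunks: List[str]) -> str:
    """Put every retrieved chunk into one prompt (the "stuff" strategy)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\nQuestion: {question}\nHelpful Answer:"
    )


def retrieval_qa(question: str,
                 retrieve: Callable[[str], List[str]],
                 llm: Callable[[str], str]) -> str:
    """Retrieve relevant chunks, stuff them into one prompt, and ask the LLM once."""
    chunks = retrieve(question)
    return llm(stuff_prompt(question, chunks))


# Hypothetical stand-ins so the sketch runs without an API key
def fake_retrieve(question: str) -> List[str]:
    return ["Sonnet 18 compares the beloved to a summer's day."]


def fake_llm(prompt: str) -> str:
    return f"(LLM saw a {len(prompt)}-character prompt)"


print(retrieval_qa("What is Sonnet 18 about?", fake_retrieve, fake_llm))
```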

In [None]:
sonnets = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)

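- Other `chain_type` options:
  - `"map_reduce"`: processes each chunk individually, then combines the answers.
  - `"refine"`: refines the answer step by step, one chunk at a time.
  - `"map_rerank"`: answers from each chunk separately, scores those answers, and returns the best one.

`"stuff"` puts all retrieved chunks into a single prompt, so it is a good fit when the retrieved text isn't huge (it has to fit in the model's context window).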
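The practical difference between `stuff` and `map_reduce` is how many times the LLM gets called. The toy sketch below counts calls with a fake LLM; the `stuff` and `map_reduce` functions and their prompts are hypothetical illustrations of the two strategies, not LangChain's chains.

```python
from typing import Callable, List

LLM = Callable[[str], str]


def stuff(question: str, chunks: List[str], llm: LLM) -> str:
    """'stuff': a single LLM call with every chunk in the prompt."""
    return llm("Context:\n" + "\n\n".join(chunks) + f"\n\nQuestion: {question}")


def map_reduce(question: str, chunks: List[str], llm: LLM) -> str:
    """'map_reduce': one LLM call per chunk (map), then one call to merge the partial answers (reduce)."""
    partials = [llm(f"Context:\n{chunk}\n\nQuestion: {question}") for chunk in chunks]
    return llm("Combine these partial answers:\n" + "\n".join(partials) + f"\n\nQuestion: {question}")


calls: List[str] = []


def counting_llm(prompt: str) -> str:
    """Fake LLM that only records the prompts it receives."""
    calls.append(prompt)
    return f"partial answer {len(calls)}"


chunks = ["chunk A", "chunk B", "chunk C"]
stuff("What is Sonnet 18 about?", chunks, counting_llm)
print(len(calls))  # 1
calls.clear()
map_reduce("What is Sonnet 18 about?", chunks, counting_llm)
print(len(calls))  # 4: three map calls plus one reduce call
```

`stuff` costs one call but needs every chunk to fit in one prompt; `map_reduce` scales to more text at the price of extra calls.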

In [None]:
from langchain_community.document_loaders import WebBaseLoader

In [None]:
loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")

In [None]:
docs = loader.load()
ruff_texts = text_splitter.split_documents(docs)
ruff_db = Chroma.from_documents(ruff_texts, embeddings, collection_name="ruff")
ruff = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ruff_db.as_retriever()
)
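`WebBaseLoader` downloads the page and turns its HTML into plain text before the splitter sees it. The sketch below shows only that HTML-to-text step, using the standard library's `html.parser` instead of the BeautifulSoup parser that `WebBaseLoader` relies on, and an inline HTML string so no network access is needed. `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = ("<html><head><style>h1{color:red}</style></head>"
        "<body><h1>FAQ</h1><p>Why use Ruff?</p><script>x=1</script></body></html>")
print(html_to_text(page))
```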

## Create the Agent

In [None]:
# Import things that are needed generically
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import OpenAI

# 💥 Building a multi-tool AI agent - 2 tools below 

In [None]:
tools = [
    Tool(
        name="Shakespeare QA System",
        func=sonnets.run,  # Make sure this matches your RetrievalQA object
        description="Useful for answering questions about Shakespeare's sonnets, such as themes, metaphors, and interpretations. Input should be a fully formed question like 'What is the theme of Sonnet 18?'",
        # return_direct defaults to False: the agent can keep reasoning after this tool runs
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Useful for answering questions about Ruff (a Python linter), including features, rules, and behavior. Input should be a fully formed question.",
    ),
]
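With `ZERO_SHOT_REACT_DESCRIPTION`, the LLM picks a tool by reading each tool's `name` and `description`, which is why the descriptions above are so specific. As a rough, hypothetical stand-in for that LLM decision, the sketch below routes a question to the tool whose description shares the most words with it; `ToyTool` and `pick_tool` are illustrative names, not LangChain APIs.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class ToyTool:
    """Hypothetical stand-in for LangChain's Tool: a name, a function, and a description."""
    name: str
    func: Callable[[str], str]
    description: str


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_tool(question: str, tools: List[ToyTool]) -> ToyTool:
    """Choose the tool whose description shares the most words with the question."""
    q = words(question)
    return max(tools, key=lambda t: len(q & words(t.description)))


tools_demo = [
    ToyTool("Shakespeare QA System", lambda q: "sonnet answer",
            "Useful for answering questions about Shakespeare's sonnets, such as themes and metaphors."),
    ToyTool("Ruff QA System", lambda q: "ruff answer",
            "Useful for answering questions about Ruff (a Python linter), including features and rules."),
]
print(pick_tool("Why use ruff over flake8?", tools_demo).name)  # Ruff QA System
```

A real LLM does much better than word overlap (it understands that "flake8" is a linter), but the dependency on clear descriptions is the same.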

In [None]:
# Construct the agent (uses ZERO_SHOT_REACT_DESCRIPTION by default)

# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True # see how it thinks - a full play-by-play.
)
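With `verbose=True` the agent prints a Thought / Action / Action Input / Observation trace. The sketch below mimics that ReAct control flow with a scripted fake LLM so it runs offline; `react_loop`, `make_scripted_llm`, and the parsing regexes are simplified assumptions, not LangChain's actual prompt or output parser.

```python
import re
from typing import Callable, Dict, List


def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Minimal ReAct-style loop: the LLM writes Thought/Action/Action Input,
    we run the named tool, append its Observation, and repeat until 'Final Answer:'."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        final = re.search(r"Final Answer:\s*(.*)", step, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*)", step)
        action_input = re.search(r"Action Input:\s*(.*)", step)
        if not (action and action_input):
            raise ValueError(f"Could not parse LLM output: {step!r}")
        observation = tools[action.group(1).strip()](action_input.group(1).strip())
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("Agent stopped after max_steps without a final answer")


def make_scripted_llm(steps: List[str]) -> Callable[[str], str]:
    """Hypothetical fake LLM that returns pre-written steps in order, ignoring the prompt."""
    remaining = iter(steps)
    return lambda prompt: next(remaining)


fake_llm = make_scripted_llm([
    "Thought: This is a question about Ruff.\nAction: Ruff QA System\nAction Input: Why use ruff over flake8?",
    "Thought: I now know the final answer.\nFinal Answer: (answer based on the Ruff FAQ)",
])
fake_tools = {"Ruff QA System": lambda q: "(retrieved Ruff FAQ text)"}
print(react_loop("Why use ruff over flake8?", fake_llm, fake_tools))
```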

In [None]:
# Confirm the key was loaded from .env without printing the secret itself.
# langchain_openai reads OPENAI_API_KEY from the environment, so no key should be hard-coded here.
assert OPENAI_API_KEY, "OPENAI_API_KEY is not set; add it to your .env file"

- Send a question to the agent:

In [None]:
agent.invoke(
    "Which of Shakespeare's poems is very popular?"
)

In [None]:
agent.invoke("Why use ruff over flake8?")

## Use the Agent solely as a router

You can also set `return_direct=True` if you intend to use the agent as a router and just want to return the result of the `RetrievalQA` chain directly.

Notice that in the examples above, the agent did some extra work after querying the `RetrievalQA` chain. You can avoid that and just return the result directly.
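The effect of `return_direct` can be sketched as a single agent step. This toy version ignores the LLM call that chooses the tool and uses a hypothetical `answer_with_tool` helper with fake tool and LLM functions: with `return_direct=True` the tool's output is returned untouched, and with `return_direct=False` the LLM gets one more call to write the final answer.

```python
from typing import Callable, List


def answer_with_tool(question: str,
                     tool: Callable[[str], str],
                     llm: Callable[[str], str],
                     return_direct: bool) -> str:
    """One agent step: call the tool, then either return its output as-is
    (return_direct=True) or let the LLM write a final answer from it."""
    observation = tool(question)
    if return_direct:
        return observation  # no further LLM call: the tool output is the answer
    return llm(f"Question: {question}\nObservation: {observation}\nFinal Answer:")


llm_calls: List[str] = []


def recording_llm(prompt: str) -> str:
    """Fake LLM that records prompts and returns a fixed string."""
    llm_calls.append(prompt)
    return "LLM-polished answer"


def ruff_tool(question: str) -> str:
    """Fake stand-in for ruff.run."""
    return "raw RetrievalQA output"


print(answer_with_tool("Why use ruff over flake8?", ruff_tool, recording_llm, return_direct=True))
print(answer_with_tool("Why use ruff over flake8?", ruff_tool, recording_llm, return_direct=False))
print(len(llm_calls))  # 1: only the return_direct=False call reached the LLM
```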

In [None]:
tools = [
    Tool(
        name="Shakespeare QA System",
        func=sonnets.run,  # Make sure this matches your RetrievalQA object
        description="Useful for answering questions about Shakespeare's sonnets, such as themes, metaphors, and interpretations. Input should be a fully formed question like 'What is the theme of Sonnet 18?'",
        return_direct=True,
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Useful for answering questions about Ruff (a Python linter), including features, rules, and behavior. Input should be a fully formed question.",
        return_direct=True,
    ),
]


# return_direct=True tells the agent: don't overthink it. Just run this tool and return whatever it gives you directly.
# No extra thought chain, no commentary, just the answer.

In [None]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [None]:
agent.invoke("Why use ruff over flake8?")

## Why vector stores work well here: multi-hop reasoning

- Give the agent a complex question that requires combining multiple facts (for example, from two different sonnets).
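A multi-hop question needs several lookups before the answer can be assembled. In the real agent the LLM decides the sub-questions itself; in this toy sketch they are hard-coded (an assumption), and `multi_hop`, `sonnet_tool`, and `join_notes` are hypothetical stand-ins. It also shows why `return_direct=True` cuts such a chain short: the first tool result is returned before the second hop ever runs.

```python
from typing import Callable, List


def multi_hop(sub_questions: List[str],
              tool: Callable[[str], str],
              combine: Callable[[List[str]], str],
              return_direct: bool = False) -> str:
    """Ask the tool each sub-question in turn, then combine the observations.
    With return_direct=True the first observation is returned immediately."""
    observations: List[str] = []
    for sub_q in sub_questions:
        observations.append(tool(sub_q))
        if return_direct:
            return observations[0]  # later hops never happen
    return combine(observations)


notes = {
    "Sonnet 18": "Sonnet 18: verse preserves the beloved's beauty forever.",
    "Sonnet 116": "Sonnet 116: true love does not change with time.",
}


def sonnet_tool(question: str) -> str:
    """Hypothetical stand-in for sonnets.run: returns the note for the sonnet named in the question."""
    for name, note in notes.items():
        if name in question:
            return note
    return "No relevant sonnet found."


def join_notes(observations: List[str]) -> str:
    """Stand-in for the LLM's final comparison step."""
    return " | ".join(observations)


subs = [
    "What does Sonnet 18 say about eternal love?",
    "What does Sonnet 116 say about eternal love?",
]
print(multi_hop(subs, sonnet_tool, join_notes))
print(multi_hop(subs, sonnet_tool, join_notes, return_direct=True))
```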

In [None]:
tools = [
    Tool(
        name="Shakespeare QA System",
        func=sonnets.run,
        description="Useful for answering questions about Shakespeare's sonnets, such as themes, metaphors, and interpretations. Input should be a fully formed question like 'What is the theme of Sonnet 18?'",
        return_direct=False,  # multi-hop needs the agent to keep reasoning after each lookup
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Useful for answering questions about Ruff (a Python linter), including features, rules, and behavior. Input should be a fully formed question.",
        return_direct=False,
    ),
]

In [None]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [None]:
agent.invoke("Compare what Shakespeare says about eternal love in Sonnet 18 and Sonnet 116. Are their views aligned?")

In [None]:
agent.invoke("What are two different views on time from Shakespeare's sonnets, and which one presents a more optimistic view?")

# 🛠️ TOOLS BUILT in this Notebook

| Tool | Data source |
|---|---|
| Shakespeare QA System | `sonnets.txt` (via `TextLoader`) |
| Ruff QA System | Ruff docs (via `WebBaseLoader`) |

# 💬 TYPES OF QUESTIONS HANDLED

1. ❓ Single-hop:
   "What is Sonnet 18 about?"
2. 🔁 Multi-hop:
   "How does Sonnet 18's message about love compare to Sonnet 116's?"
3. 🤖 Agent routing:
   "What is Ruff's default behavior for unused imports?" (the agent knows to use the Ruff tool)
4. 🧪 Direct tool response:
   Using `return_direct=True` to skip the extra agent reasoning