# Lab | Agent & Vector store

**Change the State of the Union dataset and replicate this lab, updating the prompts accordingly.**

One option is the [sonnets.txt](https://github.com/martin-gorner/tensorflow-rnn-shakespeare/blob/master/shakespeare/sonnets.txt) dataset, or any other data file of your choice from the same repository.

# Combine agents and vector stores

This notebook covers how to combine agents and vector stores. The use case for this is that you've ingested your data into a vector store and want to interact with it in an agentic manner.

The recommended method is to create a `RetrievalQA` chain and then use it as a tool in the overall agent, as shown below. You can do this with several different vector databases and use the agent to route between them. There are two ways of doing this: you can let the agent use the vector stores as normal tools, or you can set `return_direct=True` to use the agent purely as a router.

## Create the vector store

In [1]:
!pip install -U \
    pydantic \
    "langchain>=0.1.0" \
    langchain_community \
    langchain_openai \
    langchain_text_splitters \
    chromadb \
    tiktoken \
    python-dotenv \
    beautifulsoup4


In [2]:
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAI, OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader


In [4]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())

OPENAI_API_KEY  = os.getenv('OPENAI_API_KEY')

In [5]:
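# If you're using Colab (no .env file), replace the placeholder with your actual key.
# The guard keeps a key already loaded from .env from being overwritten.
if not OPENAI_API_KEY:
    os.environ['OPENAI_API_KEY'] = "<your-openai-api-key>"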

In [6]:
llm = OpenAI(temperature=0)

In [10]:
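from pathlib import Path

# Walk down the current working directory's path components and stop at
# .../langchain/docs/modules if it is present; otherwise the full cwd is used.
relevant_parts = []
for p in Path(".").absolute().parts:
    relevant_parts.append(p)
    if relevant_parts[-3:] == ["langchain", "docs", "modules"]:
        break
# The text file is expected to sit in the resulting directory.
doc_path = str(Path(*relevant_parts) / "state_of_the_union.txt")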

In [11]:

loader = TextLoader(doc_path)
documents = loader.load()


text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# embeddings
embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_documents(texts, embeddings, collection_name="ai-2024")
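
To make the splitting step above more concrete, here is a minimal pure-Python sketch of greedy character-based chunking with no overlap. It is a simplified illustration, not LangChain's implementation: the function name `split_text` and the `"\n\n"` separator are assumptions chosen for the example.

```python
def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> list[str]:
    """Split on `separator`, then pack consecutive pieces into chunks of at
    most `chunk_size` characters. A single piece longer than `chunk_size`
    is kept whole rather than cut."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        # Characters this piece would add, including a joining separator.
        extra = len(piece) + (len(separator) if current else 0)
        if current and current_len + extra > chunk_size:
            # The current chunk is full: emit it and start a new one.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            extra = len(piece)
        current.append(piece)
        current_len += extra
    if current:
        chunks.append(separator.join(current))
    return chunks


sample = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph."
chunks = split_text(sample, chunk_size=40)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

With `chunk_size=40` the first two paragraphs fit into one chunk and the third starts a new one. A larger `chunk_size` gives the retriever fewer, longer chunks to choose from.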




In [12]:
llm = OpenAI(temperature=0)
ai_qa = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)
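
The `chain_type="stuff"` option above means that the retrieved chunks are placed ("stuffed") into a single prompt together with the question. The sketch below illustrates that idea offline. It rests on several assumptions: word counts stand in for real embeddings, a sorted list stands in for Chroma, and the prompt wording and all function names are hypothetical rather than LangChain's actual template or API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_stuff_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Concatenate the retrieved chunks and the question into one prompt."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return (
        "Use the following context to answer the question. "
        "If you don't know the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


toy_chunks = [
    "Ruff is a Python linter written in Rust.",
    "Chroma stores embeddings for similarity search.",
    "Flake8 is a Python linter with a plugin ecosystem.",
]
prompt = build_stuff_prompt("Which Python linter is written in Rust?", toy_chunks)
print(prompt)
```

Only the two most similar chunks reach the prompt, so a question that does not match the stored text gives the model nothing to work with. That is one way a chain can end up answering "I don't know."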


In [13]:
from langchain_community.document_loaders import WebBaseLoader

In [14]:
loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")

In [15]:
docs = loader.load()
ruff_texts = text_splitter.split_documents(docs)
ruff_db = Chroma.from_documents(ruff_texts, embeddings, collection_name="ruff")
ruff = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ruff_db.as_retriever()
)



## Create the Agent

In [16]:
# Import things that are needed generically
from langchain.agents import AgentType, Tool, initialize_agent
from langchain_openai import OpenAI

In [17]:
tools = [
    Tool(
        name="AI 2024 QA System",
        func=ai_qa.run,
        description="Useful for questions about the AI landscape in 2024. Input should be a full question.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Useful for questions about Ruff (a Python linter). Input should be a full question.",
    ),
]

In [18]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)


In [19]:
agent.invoke("What were the main concerns about AI in 2024?")



> Entering new AgentExecutor chain...
We should consider the potential ethical and societal implications of AI.
Action: AI 2024 QA System
Action Input: "What were the main concerns about AI in 2024?"
Observation: The main concerns about AI in 2024 were related to surveillance, data privacy, job displacement, and the authenticity of AI-generated content.
Thought: We should also look into any advancements or breakthroughs in AI technology.
Action: AI 2024 QA System
Action Input: "What were the main advancements in AI technology in 2024?"
Observation: Some of the main advancements in AI technology in 2024 included its integration into public systems such as healthcare, education, and transportation, the development of multi-modal AI models, and the launch of open-data initiatives and a Global AI Governance Framework.
Thought: It's important to also consider the global perspective on AI.
Action: A

{'input': 'What were the main concerns about AI in 2024?',
 'output': 'The main concerns about AI in 2024 were related to ethical and societal implications, advancements in AI technology, and the global perspective on AI.'}

In [20]:
agent.invoke("What did President Biden say about AI education in 2024?")



> Entering new AgentExecutor chain...
We should use the AI 2024 QA System to get information about President Biden's statements.
Action: AI 2024 QA System
Action Input: "What did President Biden say about AI education in 2024?"
Observation: President Biden announced increased funding for AI education and workforce transition programs.
Thought: This is helpful information, but let's also check with the Ruff QA System to see if there are any updates on AI education policies.
Action: Ruff QA System
Action Input: "What are the latest updates on AI education policies?"
Observation: I don't know.
Thought: It seems like the Ruff QA System is not equipped to answer this question. Let's stick with the AI 2024 QA System for now.
Action: AI 2024 QA System
Action Input: "What did President Biden say about AI education in 2024?"
Observation: President Biden announced increased funding for

{'input': 'What did President Biden say about AI education in 2024?',
 'output': 'President Biden announced increased funding for AI education and workforce transition programs in 2024.'}

## Use the Agent solely as a router

You can also set `return_direct=True` if you intend to use the agent as a router and just want to return the result of the `RetrievalQA` chain directly.

Notice that in the examples above the agent did some extra work after querying the `RetrievalQA` chain. You can avoid that and return the result directly.
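
The difference between the two modes can be sketched with a toy loop. This is an illustration under stated assumptions, not LangChain's `AgentExecutor`: `ToyTool`, `run_toy_agent`, `toy_lookup`, and `toy_finalize` are hypothetical names, and a real agent lets the LLM choose the tool and may loop several times.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    """Hypothetical stand-in for a tool; not LangChain's Tool class."""
    name: str
    func: Callable[[str], str]
    return_direct: bool = False


def run_toy_agent(question: str, tool: ToyTool, finalize: Callable[[str], str]) -> str:
    # Step 1: call the chosen tool (a real agent lets the LLM pick it).
    observation = tool.func(question)
    # Step 2: with return_direct=True the observation is the final answer.
    if tool.return_direct:
        return observation
    # Otherwise the observation goes back for further processing.
    return finalize(observation)


def toy_lookup(question: str) -> str:
    """Pretend retrieval chain with a canned answer."""
    return "Ruff is written in Rust."


def toy_finalize(observation: str) -> str:
    """Pretend extra reasoning step performed by the agent."""
    return f"Final answer after extra reasoning: {observation}"


question = "Why is Ruff fast?"
routed = run_toy_agent(question, ToyTool("Ruff QA", toy_lookup, return_direct=True), toy_finalize)
reasoned = run_toy_agent(question, ToyTool("Ruff QA", toy_lookup), toy_finalize)
print(routed)
print(reasoned)
```

In router mode the tool's answer is passed through untouched, so its quality depends entirely on picking the right tool on the first attempt.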

In [24]:
tools = [
    Tool(
        name="AI 2024 QA System",
        func=ai_qa.run,
        description="Answer questions about the AI developments in 2024.",
        return_direct=True,
    ),
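    Tool(
        # Note: this tool still queries the Ruff index (func=ruff.run), while its
        # name and description advertise climate topics. The agent chooses tools
        # from the name and description, so align them with the wrapped data.
        name="Climate QA System",
        func=ruff.run,
        description="Answer questions about climate change and sustainability.",
        return_direct=True,
    ),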
]

In [25]:
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [27]:
agent.invoke("What are the global challenges related to AI and climate change in 2024?")




> Entering new AgentExecutor chain...
We should use the AI 2024 QA System to answer this question.
Action: AI 2024 QA System
Action Input: "What are the global challenges related to AI and climate change in 2024?"
Observation: I don't know.

> Finished chain.


{'input': 'What are the global challenges related to AI and climate change in 2024?',
 'output': " I don't know."}

In [28]:
agent.invoke("What makes Ruff faster than Flake8?")




> Entering new AgentExecutor chain...
Ruff and Flake8 are both code linters, so I should use the AI 2024 QA System to answer this question.
Action: AI 2024 QA System
Action Input: "What makes Ruff faster than Flake8?"
Observation: I don't know.

> Finished chain.


{'input': 'What makes Ruff faster than Flake8?', 'output': " I don't know."}
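
One plausible reading of the run above is that the agent never had a tool advertised for Ruff questions: the second tool calls `ruff.run` but is named and described as a climate tool. The sketch below assumes the agent prompt lists each tool roughly as `name: description` and shows what the model gets to see. `render_tool_list` and `mentions` are hypothetical helpers written for this illustration.

```python
# Names and descriptions copied from the router tools defined above.
tools_spec = [
    ("AI 2024 QA System", "Answer questions about the AI developments in 2024."),
    ("Climate QA System", "Answer questions about climate change and sustainability."),
]


def render_tool_list(spec):
    """Format tools the way an agent prompt might present them (assumed format)."""
    return "\n".join(f"{name}: {description}" for name, description in spec)


def mentions(spec, keyword):
    """Return the names of tools whose name or description contains the keyword."""
    keyword = keyword.lower()
    return [
        name
        for name, description in spec
        if keyword in name.lower() or keyword in description.lower()
    ]


print(render_tool_list(tools_spec))
print(mentions(tools_spec, "ruff"))
```

No tool mentions Ruff, so the model has to guess. With `return_direct=True` a wrong guess is returned as the final answer, with no chance to try the other tool.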

## Multi-Hop vector store reasoning

Because vector stores are easily usable as tools in agents, it is easy to answer multi-hop questions that depend on vector stores using the existing agent framework.

In [29]:
tools = [
    Tool(
        name="AI QA System",
        func=ai_qa.run,
        description="Answer questions about the 2024 State of the Union, especially AI-related topics. Ask full questions.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="Answer questions about the Ruff Python linter. Ask clear, specific questions.",
    ),
]


In [32]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [30]:
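
# agent.invoke returns a dict; take its 'output' string before interpolating it,
# otherwise the whole dict lands in the prompt (as in the recorded run below).
tool_name = agent.invoke("What tool does ruff use to run over Jupyter Notebooks?")["output"]

agent.invoke(f"Did the president mention {tool_name} in the state of the union?")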




> Entering new AgentExecutor chain...
I am not sure what ruff is or what it does. I will need to do some research to find out.
Action: AI 2024 QA System
Action Input: "What is ruff?"
Observation: I don't know.

> Finished chain.


> Entering new AgentExecutor chain...
I should use the AI 2024 QA System to answer this question.
Action: AI 2024 QA System
Action Input: Did the president mention {'input': 'What tool does ruff use to run over Jupyter Notebooks?', 'output': " I don't know."} in the state of the union?
Observation: I don't know.

> Finished chain.


{'input': 'Did the president mention {\'input\': \'What tool does ruff use to run over Jupyter Notebooks?\', \'output\': " I don\'t know."} in the state of the union?',
 'output': " I don't know."}
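
In the recorded run above, the follow-up question contains the entire result dictionary rather than just the answer text. The sketch below reproduces the effect offline. `fake_invoke` is a hypothetical stand-in for `agent.invoke`, and `"example-tool"` is a placeholder answer, not a real result.

```python
def fake_invoke(question: str) -> dict:
    """Stand-in for agent.invoke: returns a dict shaped like the outputs shown above."""
    return {"input": question, "output": " example-tool"}


result = fake_invoke("What tool does ruff use to run over Jupyter Notebooks?")

# Interpolating the dict itself puts its full repr into the next prompt.
wrong = f"Did the president mention {result} in the state of the union?"

# Taking the 'output' value (and trimming whitespace) yields a clean question.
right = f"Did the president mention {result['output'].strip()} in the state of the union?"

print(wrong)
print(right)
```

Chaining two agent calls for a multi-hop question only helps if the first hop returns a usable string, so it is worth checking the intermediate value before sending the second question.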