# Combine agents and vector stores

This notebook shows how to combine agents and vector stores. The use case is that you have ingested your data into a vector store and want to interact with it in an agentic manner.

The recommended approach is to create a `RetrievalQA` chain and then use it as a tool in the overall agent, as shown below. You can do this with several vector stores and use the agent to route between them. There are two ways to do this: let the agent use the vector stores as normal tools, or set `return_direct=True` so that the agent acts purely as a router.
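To make the routing idea concrete, here is a minimal, library-free sketch. It is not LangChain code: `ToyTool` and `pick_tool` are hypothetical names, and the word-overlap scoring is only a stand-in for the choice that the LLM makes from each tool's description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyTool:
    """Hypothetical stand-in for an agent tool: a name, a description, a callable."""
    name: str
    description: str
    func: Callable[[str], str]


def pick_tool(question: str, tools: List[ToyTool]) -> ToyTool:
    """Pick the tool whose description shares the most words with the question.

    A real agent asks the LLM to choose; word overlap is an illustrative assumption.
    """
    q_words = set(question.lower().split())
    return max(tools, key=lambda t: len(q_words & set(t.description.lower().split())))


tools = [
    ToyTool("Comunicado QA", "answers questions about the insurance comunicado",
            lambda q: "comunicado answer"),
    ToyTool("Ruff QA", "answers questions about ruff a python linter",
            lambda q: "ruff answer"),
]

chosen = pick_tool("Why use ruff over flake8?", tools)
print(chosen.name, "->", chosen.func("Why use ruff over flake8?"))
```

The point of the sketch is that the description is what steers the choice, which is why each tool in this notebook gets a description stating what it is useful for.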

## Create the vector store

In [36]:
from langchain.chains import RetrievalQA
from langchain_chroma import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter

llm = ChatOllama(model="qwen2:0.5b-instruct-fp16", temperature=0)

In [59]:
from pathlib import Path

relevant_parts_pdf = []
for p in Path(".").absolute().parts:
    relevant_parts_pdf.append(p)
    if relevant_parts_pdf[-3:] == ["langchain", "docs", "modules"]:
        break
print(relevant_parts_pdf)
doc_path_pdf_01 = str(Path(*relevant_parts_pdf) / "CCI1010.pdf")
doc_path_pdf_02 = str(Path(*relevant_parts_pdf) / "CCI1477.pdf")
doc_path_pdf_01, doc_path_pdf_02

['/', 'home', 'rogerio_rodrigues', 'python-workspace', 'rag_python', 'notes']


('/home/rogerio_rodrigues/python-workspace/rag_python/notes/CCI1010.pdf',
 '/home/rogerio_rodrigues/python-workspace/rag_python/notes/CCI1477.pdf')

In [61]:
from langchain_community.document_loaders import TextLoader, PyPDFium2Loader

def get_documents(file: str):
    loader        = PyPDFium2Loader(file)
    documents     = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=3)
    texts         = text_splitter.split_documents(documents)
    return texts

documents = []
documents.extend(get_documents(doc_path_pdf_01))
documents.extend(get_documents(doc_path_pdf_02))
embeddings    = OllamaEmbeddings(model="nomic-embed-text")
docsearch     = Chroma.from_documents(documents, embeddings, collection_name="cci_pdf")



In [62]:
for doc in documents:
    print("\n# -------------------------------------------------\n")
    print(doc)
    print("\n# -------------------------------------------------\n")



# -------------------------------------------------

page_content='#RESTRITA#
CCI – 1.010/2024 - CCS Brasília/DF, 31 de julho de 2024.
Às entidades do Sicoob.
Evolução da funcionalidade Multicálculo Automóvel no Sistema Integrado para Gestão e 
Aquisição de Seguros do Sicoob (SicoobSigas) do Sisbr 3.0, referente às mudanças na 
integração das seguradoras Porto e Azul.
Senhores(as),
1. Com as mudanças na integração das seguradoras Porto e Azul, informamos a 
evolução da funcionalidade Multicálculo Automóvel no SicoobSigas do Sisbr 3.0, 
realizada em 24/7/2024, contemplando as seguintes alterações nas companhias:
a) a partir de 24/7/2024, os cálculos realizados não serão salvos. Após essa data, 
será necessário criar uma simulação para calcular ou contratar seguros;
b) nas renovações de seguros com vigência a partir de 23/8/2024, será 
apresentado um pacote fechado de condição exclusiva de cada seguradora. 
Não será válido o recálculo ou ajustes de coberturas;' metadata={'source': '/hom

In [82]:
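# A retriever object is not iterable; `search` returns a list of Documents directly.
relevant_context = docsearch.search(query="*", search_type="mmr")
# Arguments that were commented out in the original cell:
#     query="Faça um resumo da CCI 1.477"  (English: "Summarize CCI 1.477")
#     where_document={"$contains": "CCI - 1.477"}
#     fetch_k=20, k=2
print(relevant_context)
for doc in relevant_context:
    print(doc.page_content.replace('\n', '\n\n'))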

[Document(metadata={'page': 1, 'source': '/home/rogerio_rodrigues/python-workspace/rag_python/notes/CCI1010.pdf'}, page_content='#RESTRITA#\r\nvistoria da seguradora:'), Document(metadata={'page': 1, 'source': '/home/rogerio_rodrigues/python-workspace/rag_python/notes/CCI1010.pdf'}, page_content='seguinte caminho: Sisbr 2.0 → Sigas (Direcionamento para o Sisbr 3.0) → Nome ou'), Document(metadata={'page': 0, 'source': '/home/rogerio_rodrigues/python-workspace/rag_python/notes/CCI1010.pdf'}, page_content='a) a partir de 24/7/2024, os cálculos realizados não serão salvos. Após essa data,'), Document(metadata={'page': 1, 'source': '/home/rogerio_rodrigues/python-workspace/rag_python/notes/CCI1010.pdf'}, page_content='Atendimento I, II, III e IV; Crédito I, II, III, IV e V; Retaguarda I, II, III e IV.')]
#RESTRITA#

vistoria da seguradora:
seguinte caminho: Sisbr 2.0 → Sigas (Direcionamento para o Sisbr 3.0) → Nome ou
a) a partir de 24/7/2024, os cálculos realizados não serão salvos. Após e
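The cell above asks for `search_type='mmr'`, that is, maximal marginal relevance: results are picked one at a time, trading relevance to the query against similarity to what has already been picked. The following is a small pure-Python sketch of that selection rule under stated assumptions. The two-dimensional vectors are made up, and `mmr` is a hypothetical helper, not the vector store's implementation.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query: Sequence[float], docs: List[Sequence[float]],
        k: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of k documents chosen by maximal marginal relevance."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Highest similarity to anything already selected (0 if nothing is selected yet).
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[0.9, 0.1], [0.9, 0.12], [0.7, -0.7]]  # docs 0 and 1 are near-duplicates

diverse = mmr(query, docs, k=2, lambda_mult=0.5)   # balances relevance and diversity
greedy = mmr(query, docs, k=2, lambda_mult=1.0)    # pure relevance ranking
print(diverse, greedy)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of a less similar document. With `lambda_mult=1.0` the selection reduces to plain similarity ranking.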

In [7]:
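from langchain_community.document_loaders import TextLoader

# NOTE: `doc_path` is not defined in the cells shown here; it is assumed to hold
# the path to a plain-text copy of the comunicado.
loader        = TextLoader(doc_path)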
documents     = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts         = text_splitter.split_documents(documents)

embeddings    = OllamaEmbeddings(model="nomic-embed-text")
docsearch     = Chroma.from_documents(texts, embeddings, collection_name="cci")

In [22]:
comunicado_retrieve = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=docsearch.as_retriever()
)

In [9]:
from langchain_community.document_loaders import WebBaseLoader

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [10]:
loader = WebBaseLoader("https://beta.ruff.rs/docs/faq/")

In [11]:
docs       = loader.load()
ruff_texts = text_splitter.split_documents(docs)
ruff_db    = Chroma.from_documents(ruff_texts, embeddings, collection_name="ruff")
ruff       = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=ruff_db.as_retriever()
)

Created a chunk of size 2134, which is longer than the specified 1000
Created a chunk of size 2470, which is longer than the specified 1000
Created a chunk of size 1259, which is longer than the specified 1000
Created a chunk of size 1053, which is longer than the specified 1000
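The warnings above come from splitting on a single separator: a piece that contains no separator cannot be cut, so it is emitted whole even when it exceeds the requested size. Below is a simplified, library-free model of that behavior. `split_on_separator` is a hypothetical helper written for illustration, not the splitter's actual code.

```python
from typing import List


def split_on_separator(text: str, chunk_size: int, separator: str = "\n\n") -> List[str]:
    """Greedily merge separator-delimited pieces up to chunk_size.

    Pieces are never cut internally, so a single long piece yields an oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece  # may itself be longer than chunk_size
    if current:
        chunks.append(current)
    return chunks


text = "short intro\n\n" + "x" * 30 + "\n\nshort outro"
chunks = split_on_separator(text, chunk_size=20)
print([len(c) for c in chunks])
```

A splitter that falls back to finer separators, such as the `RecursiveCharacterTextSplitter` used for the PDFs earlier in this notebook, is one way to avoid oversized chunks.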


## Create the Agent

In [19]:
# Import things that are needed generically
from langchain.agents import AgentType, Tool, initialize_agent, ZeroShotAgent, create_react_agent

In [23]:
tools = [
    Tool(
        name="Comunicado 1.010/2024 QA System",
        func=comunicado_retrieve.run,
        description="útil para quando você precisa responder a perguntas sobre o comunicado. A entrada deve ser uma pergunta totalmente formada.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.",
    ),
]

In [24]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [21]:
agent.run(
    "A evolução da funcionalidade Multicálculo Automóvel se referente às mudanças na integração das seguradoras Porto e Azul?"
)

> Entering new AgentExecutor chain...
I need to find information about the evolution of the Multi-Calculator Vehicle function and its connection with the integration of Porto Seguro and Azul Insurance.
Action: Comunicado 1.010/2024 QA System
Action Input:

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2



Observation: A evolução da funcionalidade Multicálculo Automóvel no Sistema Integrado para Gestão e Aquisição de Seguros do Sicoob (SicoobSigas) do Sisbr 3.0, referente às mudanças na integração das seguradoras Porto e Azul, inclui a alteração da funcionalidade de salvar cálculos realizados após 24/7/2024. Além disso, houve a apresentação de um pacote fechado de condição exclusiva para cada seguradora nas renovações de seguros com vigência a partir de 23/8/2024 e a disponibilização da Cotação para Todos como oferta alternativa. O Desconto Cota Prêmio está disponível na opção Condições Especiais, conforme o saldo acumulado da corretora.
Thought: The evolution of the Multi-Calculator Vehicle function in the Sicoob Sigas System for Porto and Azul Insurance integration is related to the changes in the system's functionality. The observation indicates that after 24/7/2024, the ability to save calculations made during 24/7/2024 has been added. Additionally, a cl

"The evolution of the Multi-Calculator Vehicle function in the Sicoob Sigas System for Porto and Azul Insurance integration refers to changes in the system's functionality after 24/7/2024. This includes the ability to save calculations made during this period, as well as a closed condition exclusive package presented for each insurance company starting from August 23, 2024, and the All-inclusive quote being available as an alternative offer. The Discount Cota Prêmio option is also available under Special Conditions, based on the accumulated balance of the insurer."

In [47]:
agent.run("Why use ruff over flake8?")

> Entering new AgentExecutor chain...
I need to find out the advantages of using ruff over flake8
Action: Ruff QA System
Action Input: What are the advantages of using ruff over flake8?
Observation: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.
Thought: I now know the final answer
Final Answer: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, 

'Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'

## Use the Agent solely as a router

You can also set `return_direct=True` if you intend to use the agent as a router and just want to return the result of the `RetrievalQA` chain directly.

Notice that in the examples above the agent did some extra work after querying the `RetrievalQA` chain. You can avoid that and return the result directly.
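The difference can be pictured with a toy loop. This is only a sketch under stated assumptions: `run_once`, `toy_ruff_tool` and `toy_llm_rewrite` are hypothetical helpers, and the real agent's extra step is another LLM call rather than a string prefix.

```python
from typing import Callable


def run_once(question: str,
             tool: Callable[[str], str],
             post_process: Callable[[str], str],
             return_direct: bool) -> str:
    """Call the tool once, then either return its output as-is or post-process it."""
    observation = tool(question)
    if return_direct:
        # Router mode: the tool's output is the final result.
        return observation
    # Normal mode: the agent does more work on the observation before answering.
    return post_process(observation)


def toy_ruff_tool(question: str) -> str:
    """Stand-in for a RetrievalQA chain; returns a fixed answer."""
    return "Ruff can automatically fix its own lint violations."


def toy_llm_rewrite(observation: str) -> str:
    """Stand-in for the agent's extra reasoning step."""
    return "Final Answer: " + observation


routed = run_once("Why use ruff over flake8?", toy_ruff_tool, toy_llm_rewrite, return_direct=True)
reasoned = run_once("Why use ruff over flake8?", toy_ruff_tool, toy_llm_rewrite, return_direct=False)
print(routed)
print(reasoned)
```

Returning directly avoids the extra step, which matters when that step could rewrite or distort the retrieved answer.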

In [25]:
tools = [
    Tool(
        name="Comunicado 1.010/2024 QA System",
        func=comunicado_retrieve.run,
        description="útil para quando você precisa responder a perguntas sobre o comunicado. A entrada deve ser uma pergunta totalmente formada.",
        return_direct=True
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question.",
        return_direct=True
    ),
]

In [27]:
agent = initialize_agent(
    tools, llm, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [28]:
agent.run(
    "A evolução da funcionalidade Multicálculo Automóvel se referente às mudanças na integração das seguradoras Porto e Azul?"
)

> Entering new AgentExecutor chain...
Thought: The user is asking about the evolution of the MultiCalculation Vehicle function regarding changes in the integration of Porto and Azul insurance companies. I need to find information on this topic.

Action:
```
{
  "action": "Comunicado 1.010/2024 QA System",
  "action_input": "A evolução da funcionalidade Multicálculo Automóvel se referente às mudanças na integração das seguradoras Porto e Azul?"
}
```

Number of requested results 4 is greater than number of elements in index 2, updating n_results = 2



Observation: Sim, a evolução da funcionalidade Multicálculo Automóvel no Sistema Integrado para Gestão e Aquisição de Seguros do Sicoob (SicoobSigas) do Sisbr 3.0 se refere às mudanças na integração das seguradoras Porto e Azul. Essa evolução inclui alterações nas companhias, como a necessidade de criar uma simulação para calcular ou contratar seguros após 24/7/2024, apresentação de pacotes fechados de condição exclusiva de cada seguradora, disponibilização de uma oferta alternativa chamada Cotação para Todos com condições adequadas para cada cooperado e a aplicação do Desconto Cota Prêmio na opção Condições Especiais.

> Finished chain.


'Sim, a evolução da funcionalidade Multicálculo Automóvel no Sistema Integrado para Gestão e Aquisição de Seguros do Sicoob (SicoobSigas) do Sisbr 3.0 se refere às mudanças na integração das seguradoras Porto e Azul. Essa evolução inclui alterações nas companhias, como a necessidade de criar uma simulação para calcular ou contratar seguros após 24/7/2024, apresentação de pacotes fechados de condição exclusiva de cada seguradora, disponibilização de uma oferta alternativa chamada Cotação para Todos com condições adequadas para cada cooperado e a aplicação do Desconto Cota Prêmio na opção Condições Especiais.'

In [51]:
agent.run("Why use ruff over flake8?")

> Entering new AgentExecutor chain...
I need to find out the advantages of using ruff over flake8
Action: Ruff QA System
Action Input: What are the advantages of using ruff over flake8?
Observation: Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.

> Finished chain.


' Ruff can be used as a drop-in replacement for Flake8 when used (1) without or with a small number of plugins, (2) alongside Black, and (3) on Python 3 code. It also re-implements some of the most popular Flake8 plugins and related code quality tools natively, including isort, yesqa, eradicate, and most of the rules implemented in pyupgrade. Ruff also supports automatically fixing its own lint violations, which Flake8 does not.'

## Multi-Hop vector store reasoning

Because vector stores are easily usable as tools in agents, it is easy to answer multi-hop questions that depend on vector stores using the existing agent framework.
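A multi-hop question is one where the answer from one source becomes the input to a query against another. The sketch below shows that shape with plain Python. Everything in it is a toy assumption: `ruff_qa` and `speech_qa` stand in for two `RetrievalQA` tools, and `speech_notes` is placeholder text, not the actual address.

```python
# Toy knowledge for the first source; the value matches the tool named in this notebook's trace.
ruff_notes = {"jupyter": "nbQA"}

# Placeholder text for the second source (NOT the real speech).
speech_notes = "placeholder text that mentions jobs and infrastructure"


def ruff_qa(question: str) -> str:
    """First hop: look up a fact by keyword in the toy ruff notes."""
    for keyword, fact in ruff_notes.items():
        if keyword in question.lower():
            return fact
    return "unknown"


def speech_qa(term: str) -> bool:
    """Second hop: check whether the term appears in the toy speech text."""
    return term.lower() in speech_notes.lower()


first_hop = ruff_qa("What tool does ruff use to run over Jupyter Notebooks?")
second_hop = speech_qa(first_hop)  # the first answer feeds the second query
answer = (
    f"{'Yes' if second_hop else 'No'}, {first_hop} is "
    f"{'mentioned' if second_hop else 'not mentioned'} in the toy speech text."
)
print(answer)
```

In the agent, the LLM performs this chaining itself: it reads the first observation and writes the second tool input from it.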

In [57]:
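# NOTE: `state_of_union` is not defined in the cells above; it is assumed to be a
# RetrievalQA chain over the State of the Union text, built like `comunicado_retrieve`.
tools = [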
    Tool(
        name="State of Union QA System",
        func=state_of_union.run,
        description="useful for when you need to answer questions about the most recent state of the union address. Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.",
    ),
    Tool(
        name="Ruff QA System",
        func=ruff.run,
        description="useful for when you need to answer questions about ruff (a python linter). Input should be a fully formed question, not referencing any obscure pronouns from the conversation before.",
    ),
]

In [58]:
# Construct the agent. We will use the default agent type here.
# See documentation for a full list of options.
agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

In [59]:
agent.run(
    "What tool does ruff use to run over Jupyter Notebooks? Did the president mention that tool in the state of the union?"
)

> Entering new AgentExecutor chain...
I need to find out what tool ruff uses to run over Jupyter Notebooks, and if the president mentioned it in the state of the union.
Action: Ruff QA System
Action Input: What tool does ruff use to run over Jupyter Notebooks?
Observation: Ruff is integrated into nbQA, a tool for running linters and code formatters over Jupyter Notebooks. After installing ruff and nbqa, you can run Ruff over a notebook like so: > nbqa ruff Untitled.html
Thought: I now need to find out if the president mentioned this tool in the state of the union.
Action: State of Union QA System
Action Input: Did the president mention nbQA in the state of the union?
Observation: No, the president did not mention nbQA in the state of the union.
Thought: I now know the final answer.
Final Answer: No, the president did not mention nbQA in the state of the union.

> Finished chain.


'No, the president did not mention nbQA in the state of the union.'