# Multi Doc Chat with reference by ID using the [Self Querying Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/how_to/self_query)

This notebook demonstrates how to chat with multiple documents by referencing them by their IDs in your database.

Example Query: `How are @doc:bw456 and @doc:cg789 different in their main ideas on learning?`

In [2]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.document_loaders import TextLoader
from langchain.docstore.document import Document

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [4]:
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)

files = [
    {
        "path": "/home/bharath/Projects/langchain-experiments/Behaviorism.txt",
        "id": "bw456",
    },
    {
        "path": "/home/bharath/Projects/langchain-experiments/Cognitivism.txt",
        "id": "cg789",
    },
]
docs = []

for file in files:
    loader = TextLoader(file["path"])
    documents = loader.load()
    chunks = text_splitter.split_documents(documents)

    for chunkIndex, chunk in enumerate(chunks):
        docs.append(
            Document(
                page_content=chunk.page_content,
                metadata={
                    "id": file["id"],
                    "source": chunk.metadata["source"],
                    "chunkIndex": chunkIndex,
                },
            )
        )

Created a chunk of size 506, which is longer than the specified 500
Created a chunk of size 793, which is longer than the specified 500


In [5]:
db = Chroma.from_documents(docs, OpenAIEmbeddings())

Create the self querying retriever by passing it the metadata attributes of the documents you want to reference.

In [6]:
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

metadata_field_info = [
    AttributeInfo(
        name="id",
        description="The ID of the doc",
        type="string",
    ),
    AttributeInfo(
        name="source",
        description="The path to the file of the source document",
        type="string",
    ),
]
document_content_description = "Info of a doc"
llm = OpenAI(temperature=0)
retriever = SelfQueryRetriever.from_llm(
    llm, db, document_content_description, metadata_field_info, verbose=True
)

Using the retirever we create a Retrieval QA Chain that can apply appropriate metadata filters to the documents.

In [7]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQAWithSourcesChain

# chat completion llm
llm = ChatOpenAI(
    model_name='gpt-3.5-turbo-0613',
    temperature=0.7
)

# retrieval qa chain
qa = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

Create a tool that uses the QA chain to answer questions about a single doc

In [8]:
from langchain.agents import Tool
from pydantic import BaseModel, Field


class DocQaInput(BaseModel):
    question: str = Field()

def docQA(question):
    return qa({"question": question})

tools = [
    Tool(
        name='Doc Info',
        func=docQA,
        args_schema=DocQaInput,
        description=(
            'use this tool to answer queries about a single doc'
        )
    )
]

Create a chat agent using the `STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION` agent type that uses the tool we just made.

We can chat with this agent with queries that reference the documents by their IDs.

In [10]:
from langchain.agents import initialize_agent
from langchain.agents.agent_types import AgentType

agent = initialize_agent(
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=ConversationBufferWindowMemory(
        memory_key='chat_history',
        k=5,
        return_messages=True
    )
)

QUERY="""\
Instructions:
If the query references multiple docs then:
1. create a task for each doc
2. combine the results from each task.

User Query:
How are @doc:bw456 and @doc:cg789 different in their main ideas on learning?
"""

result = agent(QUERY)
print(result["output"])



[1m> Entering new  chain...[0m
[32;1m[1;3mThought: The user query references two different documents, "bw456" and "cg789". To answer the question, I will need to create a task for each document and combine the results to identify the differences in their main ideas on learning.

Action 1: Retrieve information about the first document.
```
{
  "action": "Doc Info",
  "action_input": {
    "question": "What are the main ideas on learning in @doc:bw456?"
  }
}
```

Observation 1: The result will contain information about the main ideas on learning in document "bw456".

Action 2: Retrieve information about the second document.
```
{
  "action": "Doc Info",
  "action_input": {
    "question": "What are the main ideas on learning in @doc:cg789?"
  }
}
```

Observation 2: The result will contain information about the main ideas on learning in document "cg789".

Thought: Now that we have retrieved the main ideas on learning from both documents, we can combine the results and identify the



query='learning' filter=Comparison(comparator=<Comparator.EQ: 'eq'>, attribute='id', value='bw456') limit=None

Observation: [36;1m[1;3m{'question': 'What are the main ideas on learning in @doc:bw456?', 'answer': 'The main ideas on learning in @doc:bw456 are behaviorism and its two main types of conditioning: classical conditioning and operant conditioning. Classical conditioning involves pairing a specific action or stimulus with an automatic behavioral response, while operant conditioning uses positive and negative reinforcements to encourage or discourage specific behaviors. Positive reinforcement is a consequence that aims to strengthen or encourage certain behavior by providing a reward. \n', 'sources': '/home/bharath/Projects/langchain-experiments/Behaviorism.txt', 'source_documents': [Document(page_content='Behaviorism', metadata={'id': 'bw456', 'source': '/home/bharath/Projects/langchain-experiments/Behaviorism.txt', 'chunkIndex': 0}), Document(page_content='Our behavior is l