
Asking about uploaded documents only works when I first ask the AI about a specific one #9451

Closed
4 of 14 tasks
Zylon989 opened this issue Aug 18, 2023 · 2 comments
Labels
Ɑ: agent Related to agents module Ɑ: models Related to LLMs or chat model modules 🤖:question A specific question about the codebase, product, project, or how to use a feature

Comments

@Zylon989

System Info

Hello!
I wrote code that is very similar to this example:
https://python.langchain.com/docs/integrations/toolkits/document_comparison_toolkit

Who can help?

No response

Information

  • The official example notebooks/scripts
  • My own modified scripts

Related Components

  • LLMs/Chat Models
  • Embedding Models
  • Prompts / Prompt Templates / Prompt Selectors
  • Output Parsers
  • Document Loaders
  • Vector Stores / Retrievers
  • Memory
  • Agents / Agent Executors
  • Tools / Toolkits
  • Chains
  • Callbacks/Tracing
  • Async

Reproduction

from pydantic import BaseModel, Field

from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS


class DocumentInput(BaseModel):
    question: str = Field()


llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")

tools = []
files = [
    # https://abc.xyz/investor/static/pdf/2023Q1_alphabet_earnings_release.pdf
    {
        "name": "alphabet-earnings",
        "path": "/Users/harrisonchase/Downloads/2023Q1_alphabet_earnings_release.pdf",
    },
    # https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q1-2023-Update
    {
        "name": "tesla-earnings",
        "path": "/Users/harrisonchase/Downloads/TSLA-Q1-2023-Update.pdf",
    },
]

for file in files:
    loader = PyPDFLoader(file["path"])
    pages = loader.load_and_split()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(pages)
    embeddings = OpenAIEmbeddings()
    retriever = FAISS.from_documents(docs, embeddings).as_retriever()
    # Wrap each retriever in a Tool (inside the loop, so every file gets one)
    tools.append(
        Tool(
            args_schema=DocumentInput,
            name=file["name"],
            description=f"useful when you want to answer questions about {file['name']}",
            func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
        )
    )

agent = initialize_agent(
    agent=AgentType.OPENAI_FUNCTIONS,
    tools=tools,
    llm=llm,
    verbose=True,
)

agent({"input": "did alphabet or tesla have more revenue?"})

Expected behavior

It works so far, but when I directly ask the AI to compare the documents, mentioning them by name, it does not know what I am talking about!
I first have to ask questions specifically about each document, mentioning their names separately; only then can the LLM compare them and answer my questions.
Any idea how to solve this issue?

I would appreciate any help! :)

cheers

@dosubot dosubot bot added Ɑ: agent Related to agents module Ɑ: models Related to LLMs or chat model modules 🤖:question A specific question about the codebase, product, project, or how to use a feature labels Aug 18, 2023

dosubot bot commented Aug 18, 2023

🤖

Hello,

From your code, it seems like you're trying to use the LangChain framework to compare two documents and answer questions about them. However, you're facing an issue where the AI doesn't recognize the documents unless you ask about them individually first.

This behavior might be due to the way the LangChain framework handles memory and document retrieval. As per the LangChain memory management, the AI model doesn't inherently know about the documents unless they are brought into the conversation context. When you ask a question about a specific document by name, the corresponding tool is invoked, and the document is loaded into the memory, allowing the AI to answer questions about it.

However, when you ask a question that involves comparing two documents, the AI might not be able to handle it correctly if both documents are not already in the memory. This is because each document is loaded into memory by a separate tool, and these tools are invoked individually when their specific document is mentioned.

To solve this issue, you might need to modify your code to ensure that both documents are loaded into memory at the start of the conversation. You could do this by invoking the tools for both documents at the start, or by modifying the tools to load the documents into memory when they are initialized.
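
An alternative worth sketching (my own suggestion, not from this thread: the helper name `tag_with_source` and the placeholder chunk strings are purely illustrative, not LangChain API) is to avoid one-tool-per-file entirely. If every chunk carries its file name in its metadata and all chunks go into one shared index, a comparison question retrieves from both PDFs in a single call. The pure-Python part of the idea:

```python
# Sketch only: `tag_with_source` and the sample chunks are illustrative.
# The idea: keep the file name in each chunk's metadata and build ONE index
# over everything, instead of one tool per file.

def tag_with_source(chunks_by_file):
    """Flatten {file_name: [chunk_text, ...]} into one list of
    {"text": ..., "metadata": {"source_name": ...}} records."""
    tagged = []
    for name, chunks in chunks_by_file.items():
        for chunk in chunks:
            tagged.append({"text": chunk, "metadata": {"source_name": name}})
    return tagged


docs = tag_with_source({
    "alphabet-earnings": ["<chunk from the Alphabet PDF>"],
    "tesla-earnings": ["<chunk from the Tesla PDF>"],
})
# With real Documents you would then build a single retriever over `docs`
# (e.g. FAISS.from_documents) and wrap it in ONE Tool, so the agent never
# has to pick the right per-file tool before it can compare the reports.
```

A single shared tool also keeps the agent's function-calling schema small, which the 0613 models may handle more reliably.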

Here's a simple example of how you might modify your code to load both documents into memory at the start:

# Initialize the tools and load the documents into memory
for file in files:
    loader = PyPDFLoader(file["path"])
    pages = loader.load_and_split()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    docs = text_splitter.split_documents(pages)
    embeddings = OpenAIEmbeddings()
    retriever = FAISS.from_documents(docs, embeddings).as_retriever()
    # Wrap retrievers in a Tool
    tools.append(
        Tool(
            args_schema=DocumentInput,
            name=file["name"],
            description=f"useful when you want to answer questions about {file['name']}",
            func=RetrievalQA.from_chain_type(llm=llm, retriever=retriever),
        )
    )

# Prime each chain once; note that RetrievalQA expects a query, so calling
# tool.func() with no arguments would raise an error
for tool in tools:
    tool.func("Give a one-sentence summary of this document.")

# Now you can ask questions that involve comparing the two documents
agent({"input": "did alphabet or tesla have more revenue?"})

Please note that this is a simplified example and the actual implementation might need to be more complex, depending on your specific requirements and the details of how the LangChain framework handles memory and document retrieval.
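
A lighter-weight tweak worth trying first (my suggestion, not part of the bot's answer; the helper and wording below are illustrative) is to make each tool's description advertise comparisons. The OPENAI_FUNCTIONS agent picks tools purely from these description strings, so a description that only mentions one document gives the model no reason to call both tools for a cross-document question:

```python
# Illustrative only: richer descriptions give the agent a reason to call
# BOTH per-file tools when the question mentions more than one document.
def comparison_description(name, all_names):
    others = ", ".join(n for n in all_names if n != name)
    return (
        f"useful when you want to answer questions about {name}; "
        f"also call this tool when comparing {name} against {others}"
    )

names = ["alphabet-earnings", "tesla-earnings"]
desc = comparison_description("tesla-earnings", names)
print(desc)
```

Passing such a string as the `description=` argument of each `Tool` requires no other changes to the setup.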

I hope this helps! Let me know if you have any other questions.

Best,
Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.


dosubot bot commented Nov 18, 2023

Hi, @Zylon989! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were experiencing an issue where the AI is unable to compare documents unless they are specifically mentioned by name. Dosubot suggested modifying the code to ensure that both documents are loaded into memory at the start of the conversation. This behavior is due to the way the LangChain framework handles memory and document retrieval. It seems that both you and Dosubot have reacted positively to this suggestion.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!
