How can Document metadata be passed into prompts? #1136

Closed

batmanscode opened this issue Feb 18, 2023 · 17 comments

Comments

@batmanscode

Here is an example:

  • I have created vector stores from several podcasts
  • metadata = {"guest": guest_name}
  • question = "which guests have talked about <topic>?"

Using VectorDBQA, this would be possible if {context} contained the text plus its metadata.

@batmanscode
Author

Another format for retrieving text with metadata could be:

TEXT: <what the guest said>
GUEST: <guest_name>

Or maybe even:

TEXT: <what the guest said>
METADATA: {"guest": guest_name}

This way, when asking questions, I could ask things like "what did <guest_name> say about <topic>?"
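
A minimal sketch of how such documents might be built, assuming the langchain Document class and a FAISS store (the transcript snippets and guest names below are made up):

from langchain.docstore.document import Document
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Hypothetical transcript snippets, each tagged with the guest who said them.
docs = [
    Document(
        page_content="I think open source models will catch up quickly.",
        metadata={"guest": "Alice"},
    ),
    Document(
        page_content="Vector databases are becoming commodity infrastructure.",
        metadata={"guest": "Bob"},
    ),
]

vector_store = FAISS.from_documents(docs, OpenAIEmbeddings())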

@sbc-max

sbc-max commented Mar 6, 2023

I have a number of different use cases where this would also be helpful. I considered just adding the metadata directly to the text before embedding, but that's not ideal.

@flash1293
Contributor

Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt:

document_prompt = PromptTemplate(input_variables=["page_content", "id"], template="{page_content}, id: {id}")
qa = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff", retriever=vector_store.as_retriever(), chain_type_kwargs={"document_prompt": document_prompt})

If there is an id field in your document's metadata, it will be injected correctly.
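
Adapted to the podcast example above, a sketch of what that might look like (assuming every stored document has a "guest" metadata key; otherwise the template raises a missing-variable error):

from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Render each retrieved chunk as TEXT/GUEST so the LLM sees the metadata
# alongside the page content.
document_prompt = PromptTemplate(
    input_variables=["page_content", "guest"],
    template="TEXT: {page_content}\nGUEST: {guest}",
)

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=vector_store.as_retriever(),
    chain_type_kwargs={"document_prompt": document_prompt},
)

print(qa.run("Which guests have talked about open source models?"))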

@batmanscode
Author

Not 100% sure whether applicable to your case, but if you are using the stuff chain, you can do this by adjusting the document_prompt:

document_prompt = PromptTemplate(input_variables=["page_content", "id"], template="{page_content}, id: {id}")
qa = RetrievalQA.from_chain_type(llm=OpenAI(temperature=0), chain_type="stuff", retriever=vector_store.as_retriever(), chain_type_kwargs={"document_prompt": document_prompt})

If there is an id field in your document's metadata, it will be injected correctly.

Wow that's cool, didn't know about that kwarg! Thanks, will try this 😃

@connorjoleary

This won't change the docs grabbed by the retriever, right? For example, if I have a guest (Greg) stored in the metadata and I ask "what did Greg say?", the retriever won't take the guest metadata into account when fetching sources; it will only match on something like similarity.

@flash1293
Contributor

No, that only affects how the retrieved context documents are presented to the LLM; it doesn't change which documents the retriever returns.
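
If retrieval itself needs to take metadata into account, one option is to filter on metadata at search time; a sketch, assuming a vector store whose similarity search supports a metadata filter (e.g. Chroma, or newer FAISS versions):

# Only consider chunks whose "guest" metadata is "Greg" when ranking by similarity.
retriever = vector_store.as_retriever(
    search_kwargs={"k": 5, "filter": {"guest": "Greg"}},
)
docs = retriever.get_relevant_documents("What did Greg say about vector databases?")

Another route is langchain's SelfQueryRetriever, which has the LLM translate the question into a metadata filter automatically.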

@joe-barhouch

joe-barhouch commented Aug 10, 2023

Is there a way I could do the same with a ConversationalRetrievalChain?
I keep running into the error: ValueError: Missing some input keys
This is my function:
def get_conversation_chain(vectorstore: FAISS):
    llm = ChatOpenAI(model="gpt-4-0613", temperature=0.5, streaming=False)

    templates = [
        SystemMessagePromptTemplate.from_template(
            prompts.system_prompt_v1,
            input_variables=["context", "source", "page_number"],
        ),
        HumanMessagePromptTemplate.from_template(
            prompts.user_prompt,
            input_variables=["context", "source", "page_number"],
        ),
    ]
    qa_template = ChatPromptTemplate.from_messages(templates)

    memory = ConversationSummaryBufferMemory(
        llm=llm, max_token_limit=5000, memory_key="chat_history", return_messages=True
    )
    memory.input_key = "question"
    memory.output_key = "answer"

    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(
            k=5, search_type="mmr", fetch_k=20, lambda_mult=0.5
        ),
        memory=memory,
        return_source_documents=True,
        chain_type="stuff",
        combine_docs_chain_kwargs={"prompt": qa_template},
    )

    return conversation_chain

@Robs-Git-Hub

Is there a way I could do the same with a ConversationalRetrievalChain? I keep running into the error: ValueError: Missing some input keys [...]

@joe-barhouch Did you solve this? I want to use metadata as an input_variable but it only seems to allow 'context', which is page_content.

@joe-barhouch

joe-barhouch commented Sep 11, 2023

@Robs-Git-Hub I had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

I ended up implementing my own version with an LLMChain plus memory. All of the document retrieval is handled by calling similarity_search or similar methods directly on the vector store.
Then I can get the metadata I have created and pass it into the prompt.

At the end of the day, the RAG application just copy-pastes the retrieved results into the prompt, so I handled it on my own without the abstraction layer of Conversational Agents.
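
A rough sketch of that approach (the prompt wording, vector_store, and the "guest" metadata key are placeholders, not the original poster's code):

from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["chat_history", "context", "question"],
    template=(
        "Answer using the excerpts below.\n\n{context}\n\n"
        "Chat history:\n{chat_history}\n\nQuestion: {question}\nAnswer:"
    ),
)

# input_key tells the memory which variable holds the user's message.
memory = ConversationBufferMemory(memory_key="chat_history", input_key="question")
chain = LLMChain(llm=ChatOpenAI(temperature=0), prompt=prompt, memory=memory)

question = "What did Alice say about open source models?"

# Retrieve directly from the vector store and render metadata next to the text.
docs = vector_store.similarity_search(question, k=5)
context = "\n\n".join(
    f"TEXT: {d.page_content}\nGUEST: {d.metadata.get('guest', 'unknown')}"
    for d in docs
)

answer = chain.run(context=context, question=question)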

@Robs-Git-Hub

@Robs-Git-Hub I had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

Thanks for the quick reply. Very helpful, and I was reaching a similar conclusion.

@theekshanamadumal

For ConversationalRetrievalChain:

document_combine_prompt = PromptTemplate(
    input_variables=["source", "year", "page", "page_content"],
    template="""source: {source}
year: {year}
page: {page}
page content: {page_content}""",
)
qa = ConversationalRetrievalChain.from_llm(
    ...,
    combine_docs_chain_kwargs={
        "prompt": retrieval_qa_chain_prompt,
        "document_prompt": document_combine_prompt,
    },
)

@joe-barhouch

@theekshanamadumal
Unless the retrieved documents actually contain those metadata fields, this will give an error about missing input variables for the prompt template.

@AI-General

What is the difference between "prompt" and "document_prompt"?

@theekshanamadumal

theekshanamadumal commented Oct 12, 2023

@theekshanamadumal Unless the retrieved documents actually contain those metadata fields, this will give an error about missing input variables for the prompt template.

Yes. You should know what metadata fields the documents have before creating the document prompt.

@theekshanamadumal

What is the difference between "prompt" and "document_prompt"?

document_prompt is the prompt template used to format each retrieved document, including its metadata.
The formatted documents are concatenated and end up in the main prompt as the 'context'; prompt is the main question-answering prompt that receives that context.
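
As an illustration, a sketch with made-up template wording:

from langchain.prompts import PromptTemplate

# document_prompt: how each retrieved Document is rendered, including metadata.
document_prompt = PromptTemplate(
    input_variables=["page_content", "source"],
    template="[{source}] {page_content}",
)

# prompt: the final question-answering prompt; the rendered documents are
# concatenated and substituted into {context}.
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="Use the context below to answer.\n\n{context}\n\nQuestion: {question}",
)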


dosubot bot commented Feb 8, 2024

Hi, @batmanscode! I'm helping the LangChain team manage their backlog and am marking this issue as stale.

It looks like you opened this issue to discuss passing Document metadata into prompts when using VectorDBQA. There have been contributions from other users sharing similar use cases and suggesting potential solutions. However, the issue remains unresolved.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to LangChain!

@dosubot dosubot bot added the stale label Feb 8, 2024
@dosubot dosubot bot closed this as not planned Feb 15, 2024
@dosubot dosubot bot removed the stale label Feb 15, 2024
@sgautam666

@Robs-Git-Hub I had to step back from Conversational Agents. The layer of abstraction helps with prototypes but hurts full-fledged apps.

I ended up implementing my own version with an LLMChain plus memory. All of the document retrieval is handled by calling similarity_search or similar methods directly on the vector store. Then I can get the metadata I have created and pass it into the prompt.

At the end of the day, the RAG application just copy-pastes the retrieved results into the prompt, so I handled it on my own without the abstraction layer of Conversational Agents.

Hello, I am looking at a similar use case. I am extracting some metadata using similarity_search, and now I want to use it in another QA chain. Can you show me the code snippet you used?
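
One possible sketch, not the original poster's code: retrieve with similarity_search, then hand the documents (with their metadata) to a stuff chain via load_qa_chain (the "guest" metadata key is an assumption):

from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

question = "Which guests talked about vector databases?"

# Retrieve directly from the vector store; each Document keeps its metadata.
docs = vector_store.similarity_search(question, k=5)

# Format each document together with its metadata before stuffing it into the prompt.
document_prompt = PromptTemplate(
    input_variables=["page_content", "guest"],
    template="TEXT: {page_content}\nGUEST: {guest}",
)

chain = load_qa_chain(
    ChatOpenAI(temperature=0),
    chain_type="stuff",
    document_prompt=document_prompt,
)
answer = chain({"input_documents": docs, "question": question})["output_text"]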
