Call Cohere RAG inference with documents argument #13196
Conversation
There are a few other templates that you'd probably want a "cohere" version of.
Trying to think of a few ways so that the user doesn't have to think about this. All our query engines actually use selector prompts -- based on the LLM being used, select an appropriate template. We could add cohere to this logic.
I think that's ok-ish? I don't know how many LLMs will end up with specific RAG prompts, but this feels fairly scalable. The other option is having cohere-specific response synthesizers, but that feels like mostly overkill to implement the same algorithm with slightly different LLM calls.
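In sketch form, that selector logic could look like this (names and templates here are illustrative, not the exact llama-index internals):

```python
# Illustrative sketch of a selector prompt: choose a QA template based on
# the LLM in use. The predicate and template bodies are placeholders.
from llama_index.core.prompts import PromptTemplate, SelectorPromptTemplate

def is_cohere_llm(llm) -> bool:
    # hypothetical check; a real implementation might inspect llm.metadata
    return "cohere" in type(llm).__name__.lower()

DEFAULT_QA_TEMPLATE = PromptTemplate(
    "Context information is below.\n{context_str}\n"
    "Given the context, answer the query.\nQuery: {query_str}\n"
)
COHERE_QA_TEMPLATE = PromptTemplate("...")  # Cohere-specific template (placeholder)

QA_TEMPLATE_SELECTOR = SelectorPromptTemplate(
    default_template=DEFAULT_QA_TEMPLATE,
    conditionals=[(is_cohere_llm, COHERE_QA_TEMPLATE)],
)
```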
@logan-markewich That's a great suggestion, thanks a lot. We implemented it, and it works!

```python
from pprint import pprint
from llama_index.core import VectorStoreIndex
from llama_index.llms.cohere import Cohere

llm = Cohere(model="command-r-plus")
# Define your vector_index... e.g.
vector_index = VectorStoreIndex.from_documents(documents, transformations=[splitter], embed_model=embed_model)
engine = vector_index.as_query_engine(llm=llm, similarity_top_k=3)
pprint(engine._response_synthesizer._text_qa_template)
```

This will show the new template. If you're happy with the shape of the PR, let's start the review? Who should I mark for reviewers?

One last question: our lives would have been easier if we'd added the core package as a dependency to poetry. Something like
@co-antwan awesome, glad it works!! Hmm, does that poetry command work? I know coordinating changes that involve both core and another external package is a little annoying, but it's only annoying until we coordinate a release. I can try to give this a more thorough review today at some point!
llama-index-integrations/llms/llama-index-llms-cohere/llama_index/llms/cohere/base.py
I'm generally on board with this, thanks for the fix! Just one comment about using this in more methods.
llama-index-integrations/llms/llama-index-llms-cohere/llama_index/llms/cohere/utils.py
@logan-markewich Thanks for the review! We've implemented your change requests (thanks for catching these). In addition, @harry-cohere and I reworked our function for converting DocumentMessages into Cohere documents. Is there a list of keywords that the different retrievers introduce when formatting retrieved documents? If so, I'd like to add those to our PR before merging (at least the most common ones). Thanks as always!
@logan-markewich One last question: I'm not sure if we need to bump the version in pyproject.toml (I don't know what counts as a sufficient change). You have the rights to edit if you want to make the change yourself :)
@co-antwan these "keyword" fields are just the metadata from the nodes/documents. This metadata is the text that is used in the synthesizers. Since it's metadata, and user-defined, it really could be anything. The most common fields are the ones inserted by the SimpleDirectoryReader, but I'm not sure if all of these make sense to track.
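As a hedged illustration, the SimpleDirectoryReader typically attaches fields like these (values made up):

```python
from llama_index.core import SimpleDirectoryReader

docs = SimpleDirectoryReader("./data").load_data()
print(docs[0].metadata)
# Typical output for a PDF (illustrative values):
# {'file_path': 'data/report.pdf', 'file_name': 'report.pdf',
#  'file_type': 'application/pdf', 'file_size': 123456,
#  'creation_date': '2024-05-01', 'last_modified_date': '2024-05-01'}
```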
Normally, any change should bump the toml version for an integration. But since this depends on core being updated, I'll have to coordinate a release for the two after this merges.
Hello @co-antwan! Thanks for your implementation :). I was actually considering how to implement support for the "documents" field myself, so this has been very helpful. However, I have a concern regarding the handling of non-document text when using the Chat Engine with a custom prompt. In my setup, I use a CondensePlusContextChatEngine with a _context_prompt_template that includes specific instructions followed by document data. The template looks something like this:
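A hypothetical template of that shape:

```python
# Hypothetical context prompt of the shape described: instructions first,
# then the retrieved document data via {context_str}.
context_prompt_template = (
    "You are a support assistant. Answer strictly from the documents below;\n"
    "if the answer is not in them, say you don't know.\n"
    "Here are the relevant documents for the context:\n"
    "{context_str}"
)
```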
Could you clarify what happens when such a message is processed using the current implementation of remove_documents_from_messages and document_message_to_cohere_document? My main concern is that the instructions in the prompt might be removed, leaving only the document data to be passed to the LLM. Does the logic in the PR ensure that instructional or other non-document text within the same message as document data is preserved, or do we need to implement additional safeguards to ensure that such text is not lost during processing? Thank you for any guidance you can provide!
I think I can simply add my instructions directly to the last message in the chat_messages :)
Hey @ulan-yisaev, it's good that you raise those kinds of concerns now :)
Hey @co-antwan, thanks a lot for the 'preamble' hint; that's really helpful! I appreciate your guidance on how to best use the new features. My setup is pretty straightforward. Here's a minimal working example that you can try, to see how I'm currently integrating everything:
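A sketch along those lines, with placeholder paths, models, and question (not the original example):

```python
# Sketch of the described setup; paths, models, and the question are
# placeholders. Assumes Cohere API keys are set in the environment.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.llms.cohere import Cohere

llm = Cohere(model="command-r-plus")
embed_model = CohereEmbedding(model_name="embed-english-v3.0")

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever=index.as_retriever(similarity_top_k=3),
    llm=llm,
    context_prompt=context_prompt_template,  # the template sketched above
)
print(chat_engine.chat("How do I reset my password?"))
```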
I've conducted some tests with my setup and noticed something interesting that might be worth discussing. It seems the documents are not being split as expected, likely due to how the message roles are assigned. The function remove_documents_from_messages() expects a DocumentMessage, which has the role MessageRole.SYSTEM. However, in my setup, the role is assigned as MessageRole.CHATBOT. Here's what my debug logs show (with …).
I think aligning the message roles or adjusting how they are interpreted might be necessary?
Upon further investigation, I've identified a potential configuration mismatch that might be contributing to the issue with document recognition and handling. This role discrepancy causes all our system messages, which include both instructions and document data, to be categorized as 'remaining' rather than 'documents', leading to none of the document data being processed as intended.
@ulan-yisaev Good catch, thanks a ton -- this was gnarly 🙏 Can you test whether the current change fixes your prompt formatting? For outstanding issues, e.g. integrating …
@ulan-yisaev on second thought: are you sure that your issue is caused by the SYSTEM role? Our check depends only on checking the instance, not the role. I just checked, and the instance check behaves as expected. Could something else be causing your issue inside the chat engine?
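For reference, the splitting amounts to something like this (a simplified sketch; the real helper lives in the package's utils.py):

```python
# Simplified sketch of the message splitting: partition by isinstance,
# so MessageRole never enters into the check.
from typing import List, Tuple
from llama_index.core.llms import ChatMessage

class DocumentMessage(ChatMessage):
    """Stand-in for the ChatMessage subclass this PR introduces."""

def remove_documents_from_messages(
    messages: List[ChatMessage],
) -> Tuple[List[ChatMessage], List[DocumentMessage]]:
    remaining: List[ChatMessage] = []
    documents: List[DocumentMessage] = []
    for message in messages:
        if isinstance(message, DocumentMessage):  # instance, not role
            documents.append(message)
        else:
            remaining.append(message)
    return remaining, documents
```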
@co-antwan I was just about to merge this (forgot that it was still waiting), and it seems like there's maybe other things going on 😅 Seems like it's actually good to merge though?
@co-antwan, you're right; my apologies for the confusion. The issue isn't related to the MessageRole. The chat engine assigns the formatted context directly to a plain chat message, and this direct assignment bypasses the use of any specialized document handling mechanisms, like those defined in the COHERE_QA_TEMPLATE.
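In other words, the engine effectively does something like this before calling the LLM (illustrative; retrieved_nodes, chat_history, and user_query stand in for the engine's internal state):

```python
# Illustrative sketch of the bypass: the engine formats retrieved context
# into a plain ChatMessage itself, so it never becomes a DocumentMessage
# and never reaches Cohere's documents= handling.
from llama_index.core.llms import ChatMessage, MessageRole

context_str = "\n\n".join(node.get_content() for node in retrieved_nodes)
system_message = ChatMessage(
    role=MessageRole.SYSTEM,
    content=context_prompt_template.format(context_str=context_str),
)
messages = [
    system_message,
    *chat_history,
    ChatMessage(role=MessageRole.USER, content=user_query),
]
```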
Hello @logan-markewich, in your condense_plus_context setup, we currently handle context strings and document data separately before passing them to the chat function. Given that context_str already encapsulates the necessary document information in a structured format, does it make sense to modify the chat method to accept this context directly? This would simplify Cohere's current implementation by removing the need for additional parsing or keyword searching. Could this approach be considered a reasonable solution within the llama_index framework? Or perhaps you have another suggestion on how we might better integrate or handle this context data?
Seems to me there should just be a cohere context chat engine, rather than trying to shoe-horn the existing chat engine into this specific API 🤔 Having a context-specific kwarg doesn't generalize to all LLMs in this case.
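Hypothetically, such an engine could hand retrieved nodes straight to the documents argument, along these lines (nothing like this exists in the PR; it only illustrates the design idea):

```python
# Hypothetical sketch of a Cohere-specific context chat engine.
import cohere

class CohereContextChatEngine:
    def __init__(self, retriever, client: cohere.Client, model: str = "command-r-plus"):
        self._retriever = retriever
        self._client = client
        self._model = model

    def chat(self, message: str):
        nodes = self._retriever.retrieve(message)
        # hand retrieved chunks straight to documents=, not into message=
        documents = [{"snippet": n.get_content()} for n in nodes]
        return self._client.chat(model=self._model, message=message, documents=documents)
```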
@ulan-yisaev No worries, I appreciate the double check :)
Cool! Just note I will need to coordinate a release with core for this once it merges. I should get to it tomorrow. Thanks for all the help and effort here, everyone! 👍🏻
Thanks @logan-markewich for your advice throughout!
Description
Adds support for Cohere.chat's documents argument when used in RAG pipelines. This ensures proper formatting on Cohere's client side, and leads to better downstream performance.

This is a 'drafty' PR. Our implementation works and improves on the current state of the code. However, it's also somewhat brittle. We're looking for guidance from the LlamaIndex team on how to best tackle the problem (especially if you have ongoing efforts to add customisation by model family).

EDIT: Fixed thanks to @logan-markewich's suggestions ❤️
The problem
When called from inside retrievers (e.g. for RAG evaluations), Cohere.chat doesn't follow the intended route for RAG inference: retrieved documents are concatenated into the message argument, when they should be passed through the documents keyword instead.
The proposed solution
The only light-touch fix we could find was to create a new ChatPromptTemplate along with a new ChatMessage subclass called DocumentMessage. We adjusted the logic inside Cohere.chat to parse these DocumentMessages properly and pass them to the documents keyword.

Example snippet:
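Something along these lines (a hedged reconstruction; the exact import path for COHERE_QA_TEMPLATE is an assumption):

```python
from llama_index.llms.cohere import Cohere
from llama_index.llms.cohere.utils import COHERE_QA_TEMPLATE  # assumed path

llm = Cohere(model="command-r-plus")
# vector_index built as in the conversation above
engine = vector_index.as_query_engine(
    llm=llm,
    text_qa_template=COHERE_QA_TEMPLATE,  # emits DocumentMessages for retrieved nodes
    similarity_top_k=3,
)
response = engine.query("What does the report conclude?")
```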
Where we would like guidance
~~Our proposed solution works, but is brittle for a number of reasons. Could you advise us on whether you can think of a better solution that's similarly light-touch? Users must explicitly pass COHERE_QA_TEMPLATE, which won't be obvious. How can we make some templates the default for a model family?~~

See @logan-markewich's proposed changes and our implementation. This should ensure that Cohere templates get called whenever a retriever uses a Cohere LLM. Thanks @logan-markewich!
Fixes # (issue)
New Package?
Did I fill in the tool.llamahub section in the pyproject.toml and provide a detailed README.md for my new integration or package?

Version Bump?
Did I bump the version in the pyproject.toml file of the package I am updating? (Except for the llama-index-core package)

Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.

Suggested Checklist:
I ran make format; make lint to appease the lint gods