
Added ChatMessage class to generalize chat interface #652

Draft · wants to merge 2 commits into base: main

Conversation

kokuyouwind
Contributor

Description

This PR is the first prototype based on discussion #629.

The purpose is to make it possible to use ChatMessage instead of a raw Hash for chat messages, and to generalize it so that it can be used regardless of which LLM is in use.
As a first step, this PR aims to handle the basic roles: system, user, and assistant.
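
For illustration, a minimal sketch of the intended call site, assuming the keyword arguments of the class shown further down and a hypothetical llm adapter instance; this is not a final API:

# Hypothetical usage: build messages as objects instead of raw hashes.
messages = [
  Langchain::Messages::ChatMessage.new(role: :system, content: "You are a helpful assistant."),
  Langchain::Messages::ChatMessage.new(role: :user, content: "Hello!")
]

# Adapters can keep accepting hashes, so objects convert via to_hash.
llm.chat(messages: messages.map(&:to_hash))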

kokuyouwind marked this pull request as draft June 2, 2024 04:25
module Messages
  # Generic messages that can be used in the `chat` method of the LLM adapter
  # It is recommended that `assistants/messages/*_messages.rb` be unified with this one in the future.
  class ChatMessage
Collaborator

Does it make sense to simplify the namespace structure to just Langchain::ChatMessage?

Contributor Author

I had placed it in a namespace aligned with lib/langchain/assistants/messages/openai_message.rb. I think it should be fine to put it at the root level.

end

def to_hash
  {role: role.to_s, content:}
Collaborator

How do you envision we would account for cases like this?

Contributor Author

While I don't intend to go that far within the scope of this PR, if we expand this in the future, wouldn't it be desirable to split ChatMessage into two classes, such as ChatMessage::Basic and ChatMessage::ToolCall?
The former would be a simple class that only holds role and content, while the latter would additionally hold the information necessary for tool calls.

By defining to_hash on both classes, they can be handled transparently.
Anything the default to_hash cannot fully express can be absorbed by each LLM adapter. This way, I believe we can reduce the need for per-LLM handling within ChatMessage and Assistant.
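
As a rough sketch of that split (the names and fields here are illustrative, not part of this PR):

# Illustrative only: both message types expose to_hash so callers can treat them uniformly.
module ChatMessage
  class Basic
    def initialize(role:, content:)
      @role = role
      @content = content
    end

    def to_hash
      { role: @role.to_s, content: @content }
    end
  end

  class ToolCall
    def initialize(tool_call_id:, content:)
      @tool_call_id = tool_call_id
      @content = content
    end

    def to_hash
      # Assumes an OpenAI-style flat shape; other providers would need adapter-side conversion.
      { role: "tool", tool_call_id: @tool_call_id, content: @content }
    end
  end
end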

Collaborator

> The former would be a simple class that only holds role and content, while the latter would be a class that additionally holds information necessary for tool calls.

Take a look at, for example, how tool results are sent to Anthropic (inside the content key): https://github.com/patterns-ai-core/langchainrb/blob/main/lib/langchain/assistants/messages/anthropic_message.rb#L32-L39
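
For reference, the Anthropic shape is roughly the following (values illustrative), which is why a flat role/content/tool_call_id hash does not generalize:

# Anthropic nests tool results inside the content array of a user message:
anthropic_tool_result = {
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: "toolu_abc123", # illustrative ID
      content: "42"                # illustrative tool output
    }
  ]
}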

Collaborator

I think for the purposes of this PR I would just collapse:

  • Langchain::Messages::AnthropicMessage
  • Langchain::Messages::GoogleGeminiMessage
  • Langchain::Messages::OpenAIMessage

into:

class Langchain::ChatMessage
  def initialize(role:, content:)
  end

  def to_openai_hash
  end

  def to_google_gemini_hash
  end

  def to_anthropic_hash
  end

  def to_hash
  end
end

What do you think? I can take a stab at it!
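
To make the shape concrete, the per-provider methods might diverge roughly like this (the field mappings are assumptions sketched from the existing per-provider message classes, not verified against every API):

module Langchain
  class ChatMessage
    def initialize(role:, content:)
      @role = role
      @content = content
    end

    # OpenAI and Anthropic both accept a flat role/content pair for basic messages.
    def to_openai_hash
      { role: @role.to_s, content: @content }
    end

    def to_anthropic_hash
      { role: @role.to_s, content: @content }
    end

    # Gemini uses "model" instead of "assistant" and nests text under parts.
    def to_google_gemini_hash
      role = @role == :assistant ? "model" : @role.to_s
      { role: role, parts: [{ text: @content }] }
    end

    def to_hash
      { role: @role.to_s, content: @content }
    end
  end
end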

Contributor Author

I was not very familiar with function calling, so I was surprised to find that the message formats are completely incompatible. Although I haven't been able to check all of the documentation, is my understanding below correct?

  • In messages where the LLM instructs the use of a tool, the details of the tool to be called are specified in tool_calls (content and tool_call_id are unused)
  • In messages representing the result of the Assistant calling a tool as instructed, the call ID is stored in tool_call_id and the result of the call is stored in content (tool_calls is unused)
  • While the above concepts are broadly shared among LLMs, the hash keys and formats differ

If this understanding is correct, couldn't we generalize it by creating classes like the following? (The namespace and implementation details of the classes are shown in a very simplified manner.)

# Message for the LLM to instruct the use of a tool
class ToolCalls < ChatMessage::Base
  # Single tool call
  ToolCall = Data.define(:id, :tool_name, :method_name, :arguments)

  def initialize(tool_calls:)
    @tool_calls = tool_calls
  end
  # ...
end

# Message representing the result of a tool call
class ToolCallResult < ChatMessage::Base
  def initialize(tool_call_id:, result:)
    @tool_call_id = tool_call_id
    @result = result
  end
  # ...
end

Since it seems difficult to generalize to_hash, how about performing the conversion for each LLM model as needed?

class LLM::OpenAI
  def chat(params = {})
    if params[:messages].all? { _1.is_a?(Langchain::Messages::ChatMessage) }
      convert_chat_messages!(params)
    end
    # ...
  end

  def convert_chat_messages!(params)
    params[:messages] = params[:messages].map do |message|
      case message
      in ToolCalls(tool_calls:)
        { role: "assistant", tool_calls: tool_calls.map { tool_call_to_hash(_1) } }
      in ToolCallResult(tool_call_id:, result:)
        { role: "tool", tool_call_id:, content: result }
      else
        message.to_hash
      end
    end
  end
end

Although the details need more consideration, the policy I have in mind is to keep the message structure as common as possible in this way and hide the differences among LLMs inside each LLM class. What do you think?

class ChatMessage
  attr_reader :role, :content

  ROLES = [:system, :user, :assistant].freeze
Collaborator

These roles would be hidden, right? The user would only use these 3 when creating an instance, right? I think we'd still need one for :tool.

Contributor Author

I overlooked that because I hadn't thoroughly investigated LLMs other than OpenAI and Anthropic. In Gemini it's model instead of assistant, right?
To maximize compatibility between LLMs, I think it's best not to allow the model role and instead have the LLM adapter layer convert the assistant role.

As mentioned earlier, I believe it's best to handle tool or function with specialized classes, since the data they hold is different.
However, since this is an area where the differences in specifications between LLMs are significant, would it be okay to focus on supporting only the user, system, and assistant roles in this PR for now?
I believe proceeding incrementally poses no functional issues, as the interface still accepts the previous Hash array.
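
A sketch of that adapter-side conversion (the class and method names are illustrative):

module LLM
  class GoogleGemini
    GEMINI_ROLES = { assistant: "model" }.freeze

    # Illustrative: the generic :assistant role is translated at the adapter boundary,
    # so ChatMessage itself never needs to know about Gemini's "model" role.
    def convert_role(role)
      GEMINI_ROLES.fetch(role, role.to_s)
    end
  end
end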
