
Server: add support for "tool_calls" (MeetKai/functionary model) #5695

Draft: wants to merge 3 commits into master

Conversation

@ngxson (Collaborator) commented Feb 23, 2024

Support "tool_calls" OAI-compatible via MeetKai/functionary model

Motivation

Following my research in #5588, I tried to implement the ability to use https://github.com/MeetKai/functionary

The idea is that the user can use the same OAI "tool_calls" field in /v1/chat/completions to interact with the model. A translation layer converts between the OAI schema and the model's prompt format.

Implementation

My implementation is self-contained inside functionary.hpp, with a simple functionary-test.cpp that allows me to test it without building the server.

The current call stack looks like this (without tool_calls):

  • oaicompat_completion_params_parse
  • llama.request_completion
  • llama.queue_results.recv
  • format_final_response_oaicompat
  • res.set_content <== the final response is sent back to user

With tool_calls enabled:

  • oaicompat_completion_params_parse
  • 🔴 convert_oai_to_prompt <== format the request into functionary's template
  • llama.request_completion
  • llama.queue_results.recv
  • format_final_response_oaicompat
  • 🔴 convert_response_to_oai_choices <== extract tool calls and assistant message
  • res.set_content <== the final response is sent back to user

Upon loading the model, the chat template stored inside the model is read; if it is functionary's template, tool_calls is enabled automatically. No additional configuration is required.
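
A minimal sketch of what that auto-detection could look like, assuming the chat template is available as a string; the marker checked here is an assumption for illustration, not the PR's actual detection logic.

#include <string>

// Enable tool_calls automatically when the model's embedded chat template
// looks like functionary's (hypothetical marker string).
static bool is_functionary_template(const std::string & chat_template) {
    return chat_template.find("<|recipient|>") != std::string::npos;
}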

For the demo, see the comment section below.

Testing

For now, I have no idea how to test this in CI. These changes are needed:

  • Compile functionary-test.cpp in CI
  • For the E2E test, we need a way for the server (or only the model part) to respond with pre-defined data, since we don't want to download the whole 4 GB model in CI (a sketch follows this list)
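
One possible shape for the pre-defined-data approach, sketched under the assumption that convert_response_to_oai_choices takes the raw model output as a string: the test links against functionary.hpp but is fed a canned completion instead of running the model.

#include <cassert>
#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Assumed to come from functionary.hpp (signature is a guess).
namespace llama_functionary {
    json convert_response_to_oai_choices(const std::string & raw_output);
}

int main() {
    // Pre-defined raw output standing in for what the model would generate;
    // a real test would use a fixture captured from an actual run.
    const std::string canned_output = "...";

    const json choices = llama_functionary::convert_response_to_oai_choices(canned_output);

    // Assert on the parsed structure rather than on model behaviour.
    assert(choices.is_array());
    assert(choices[0]["finish_reason"] == "tool_calls");
    return 0;
}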

@ngxson linked an issue Feb 23, 2024 that may be closed by this pull request
@ngxson (Collaborator, Author) commented Feb 24, 2024

Demo

GGUF model is downloaded from this link: https://huggingface.co/meetkai/functionary-small-v2.2-GGUF/tree/main

I'm using functionary-small-v2.2.q4_0.gguf in this demo.

Turn 1: User asks and assistant wants to call a tool

Request:

{
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather like in Paris and Lyon?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "get the weather of a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "name of city to get weather"
                        }
                    }
                }
            }
        }
    ]
}

Response:

{
    "choices": [
        {
            "finish_reason": "tool_calls",
            "index": 0,
            "message": {
                "content": null,
                "role": "assistant",
                "tool_calls": [
                    {
                        "function": {
                            "arguments": "{\"city\": \"Paris\"}",
                            "name": "get_weather"
                        },
                        "id": "get_weather",
                        "type": "function"
                    },
                    {
                        "function": {
                            "arguments": "{\"city\": \"Lyon\"}",
                            "name": "get_weather"
                        },
                        "id": "get_weather",
                        "type": "function"
                    }
                ]
            }
        }
    ],
    "created": 1708774160,
    "id": "chatcmpl-sh6ACjjtwn4s3xZlHCmmrVwN5bgIMXZf",
    "model": "unknown",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 34,
        "prompt_tokens": 106,
        "total_tokens": 140
    }
}

Turn 2: The functions are called and return data to the assistant

Request:

{
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather like in Paris and Lyon?"},
        {
            "role": "tool",
            "tool_call_id": "get_weather",
            "name": "get_weather",
            "content": "{\"weather\": \"Cloudy, 6°C\"}"
        },
        {
            "role": "tool",
            "tool_call_id": "get_weather",
            "name": "get_weather",
            "content": "{\"weather\": \"Sunny, 12°C\"}"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "get the weather of a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "name of city to get weather"
                        }
                    }
                }
            }
        }
    ]
}

Response:

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The weather in Paris is cloudy with a temperature of 6°C, while the weather in Lyon is sunny with a temperature of 12°C.",
                "role": "assistant"
            }
        }
    ],
    "created": 1708774354,
    "id": "chatcmpl-QgvImRLqsP9ETs3LxJ0f1jkzeCC8B6Cn",
    "model": "unknown",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 39,
        "prompt_tokens": 155,
        "total_tokens": 194
    }
}

The final conversation should look like this:

  • User: What is the weather like in Paris and Lyon?
  • Assistant: Let me call get_weather({city: "Paris"})
  • Assistant: Let me call get_weather({city: "Lyon"})
  • Function: Paris: Cloudy, 6°C
  • Function: Lyon: Sunny, 12°C
  • Assistant: The weather in Paris is cloudy with a temperature of 6°C, while the weather in Lyon is sunny with a temperature of 12°C.

@ngxson (Collaborator, Author) commented Feb 24, 2024

@ggerganov @phymbert Could you take a bit of time to give me some input regarding the testing part? Thanks in advance!

@ggerganov (Owner) commented:

This is an interesting application, but keep in mind I consider it low priority to merge in the short term as it adds even more functionality to server. I think server is in a bad state atm and needs significant rework to make the existing functionality stable (#4216). With yours, @phymbert's and other recent contributions, I'm much more optimistic that we'll be able to fix all issues and improve the implementation. Once this is done we can focus on adding more stuff such as this PR and also improving vision support

@ngxson (Collaborator, Author) commented Feb 24, 2024

Thanks for the info. As I'm not expecting it to be merged very soon, my work has already been done in a self-contained manner to prevent conflicts with future reworks on the server side.

I'll keep this PR in a draft state though, as some parts are still missing. I will come back to this when the server code becomes more stable.

@545089467 commented:

This is a super wonderful feature, exactly what I'm looking for! Without tool use, many other useful features would not be possible. Hoping the feature can be merged soon!

if (enable_tool_calls) {
    choices = llama_functionary::convert_response_to_oai_choices(content);
} else {
    choices = streaming

Collaborator:

Is streaming mode not supported for tool_calls?

@ngxson (Collaborator, Author):

No, it does not, because convert_response_to_oai_choices can only parse a fully-constructed response.

@ngxson (Collaborator, Author):

The code that throws an error in streaming mode is not implemented yet, but I left a // TODO below: "enable_tool_calls" cannot be used with "stream" mode.
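
A minimal sketch of what that guard could look like once implemented; the function name and the error shape are assumptions.

#include <stdexcept>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Reject streaming when tool_calls handling is enabled, since the raw output
// can only be parsed once the response is fully constructed.
static void validate_tool_call_params(const json & body, bool enable_tool_calls) {
    const bool streaming = body.value("stream", false);
    if (enable_tool_calls && streaming) {
        // TODO from this PR: "enable_tool_calls" cannot be used with "stream" mode.
        throw std::invalid_argument("\"tool_calls\" cannot be used with \"stream\": true");
    }
}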

using json = nlohmann::json;

/**
 * A simple test program that allows testing functionary.hpp without using the server.

@phymbert (Collaborator) commented Feb 27, 2024

As it is a simple unit test, better to go with the ctest approach as in the root repo.

@ngxson (Collaborator, Author) commented Feb 27, 2024

Yeah, you're right, it should be a ctest. The problem is that the ctest in the root CMakeLists is for the core library, not for the examples.

I believe I'll need to convert this file to a ctest anyway; maybe the ctest will run alongside behave. I'll see what the best approach is when I have time to continue working on this PR.

Collaborator:

In the CI server workflow, we can plug in a ctest target for the server only. I will push you an example on a side branch.

@jmtatsch commented:

More and more function-calling-capable models are becoming available:
https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF
https://huggingface.co/cognitivecomputations/fc-dolphin-2.6-mistral-7b-dpo-laser
Really looking forward to having this integrated in the llama.cpp server.

@ngxson added the "demo" label (Demonstrate some concept or idea, not intended to be merged) on Mar 16, 2024
@adrianliechti commented:

Since Hermes 2 Pro has chosen OpenAI-compatible schemas for its training, an implementation on top of llama.cpp was pretty easy:

  • convert or insert a system prompt in their template format (if there are any tools)
  • when reading the response from the model, convert any <tool_call> content and map it to the OpenAI API tool calls (a sketch follows the link below)
  • convert incoming messages with role "tool" to their expected <tool_response> format

https://github.com/adrianliechti/llama/blob/main/pkg/adapter/hermesfn/adapter.go
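
For reference, a rough C++ sketch of the second bullet above: extract <tool_call> blocks from the model output and map them to OpenAI-style tool calls. The tag names follow the description above; everything else (function name, assumed JSON shape of each block) is an assumption, and the real adapter linked above is written in Go.

#include <string>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

// Pull <tool_call>...</tool_call> JSON blocks out of a Hermes-style completion
// and map them to OAI "tool_calls" entries. Illustrative only.
static json extract_tool_calls(const std::string & output) {
    json tool_calls = json::array();
    const std::string open_tag  = "<tool_call>";
    const std::string close_tag = "</tool_call>";

    size_t pos = 0;
    while ((pos = output.find(open_tag, pos)) != std::string::npos) {
        const size_t start = pos + open_tag.size();
        const size_t end   = output.find(close_tag, start);
        if (end == std::string::npos) {
            break; // unterminated block: stop parsing
        }

        // Each block is assumed to contain {"name": ..., "arguments": {...}}.
        const json call = json::parse(output.substr(start, end - start));
        tool_calls.push_back({
            {"type", "function"},
            {"function", {
                {"name", call.value("name", "")},
                {"arguments", call.value("arguments", json::object()).dump()}
            }}
        });
        pos = end + close_tag.size();
    }
    return tool_calls;
}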

@vladfaust commented:

I understand this might not be the most appropriate forum for my observation, but while researching llama.cpp, I noticed that the server component of this repository seems to receive a lot of attention. It appears to me that this might be breaking the principle of isolation of concerns. Would @ggerganov consider extracting the server component to a separate repository? If this issue has already been raised, could you please direct me to it? Thank you.

Labels: demo (Demonstrate some concept or idea, not intended to be merged), server/webui
Projects: none yet
Development: successfully merging this pull request may close the issue "Server: add function calling API"
7 participants