
feat(tools): support Tool calls in the API #1715

Merged: 9 commits merged into master from tools_api_support on Feb 17, 2024

Conversation

@mudler (Owner) commented Feb 15, 2024

Description

Part of #1712

Notes for Reviewers

still missing stream support

netlify bot commented Feb 15, 2024

Deploy Preview for localai canceled.

Name | Link
🔨 Latest commit | d44c8ba
🔍 Latest deploy log | https://app.netlify.com/sites/localai/deploys/65cfe68ed3fc8000080b7915

[Resolved review threads on api/openai/request.go and api/schema/openai.go]
Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com>
@stippi2 (Contributor) commented Feb 15, 2024

As for the missing streaming support... Do you think it might be an idea to stream contents as implemented now and then wrap the entire function call output in one big streaming event? In my application, I would have to wait for the function anyway, before I can do anything. (I guess there are other use-cases where you force a JSON response and the chat response is one argument that should also stream. But one big event for the function would already enable my case.)

@mudler (Owner, Author) commented Feb 15, 2024

As for the missing streaming support... Do you think it might be an idea to stream contents as implemented now and then wrap the entire function call output in one big streaming event? In my application, I would have to wait for the function anyway, before I can do anything. (I guess there are other use-cases where you force a JSON response and the chat response is one argument that should also stream. But one big event for the function would already enable my case.)

yes, that's exactly what I'm looking at now. At least for the moment we provide compatibility; we can come back to this later on and try to stream the whole result from the LLM directly.
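
A rough sketch of what a single wrapped streaming event could look like, using the OpenAI-style chat.completion.chunk shape (the id, name, and argument values below are made-up placeholders, not output from this PR):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"mistral","choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"Boston, MA\"}"}}]},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"mistral","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}

data: [DONE]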

[Resolved review thread on api/openai/request.go]
@stippi2 (Contributor) commented Feb 15, 2024

I'm amazed by your quick progress! This is the project I'm working on: https://github.com/stippi/voice-assistant
I've added a section to the README for what to configure in order to use LocalAI.

@mudler (Owner, Author) commented Feb 15, 2024

I'm amazed by your quick progress! This is the project I'm working on: https://github.com/stippi/voice-assistant I've added a section to the README for what to configure in order to use LocalAI.

that's super cool! I'd be happy to add it to the community section of the README :)

@mudler (Owner, Author) commented Feb 15, 2024

I didn't give this PR a shot yet myself, but I think most of the pieces should be in place by now. If you have some cycles to spend testing this out, that would be great!

Thanks for the quick feedback @stippi2, that really helps; much appreciated!

@mudler changed the title from "feat(tools): support Tools in the API" to "feat(tools): support Tool calls in the API" on Feb 15, 2024
@stippi2 (Contributor) commented Feb 15, 2024

Ok, I'll give it a shot. Maybe two questions.

  1. When I reply with a tool result, the payload is also a bit different compared to the older functions. I didn't notice that in the diff yet.
  2. GPT-4 will, depending on the situation, call multiple tools at once and optionally give a chat reply, too. Is that already supported/transported from the local LLMs in LocalAI?

@mudler (Owner, Author) commented Feb 15, 2024

Ok, I'll give it a shot. Maybe two questions.

1. When I reply with a tool result, the payload is also a bit different compared to the older functions. I didn't notice that in the diff yet.

This should be covered by dddd67d (non-streaming mode). For streaming mode we just reply with the tools format (as we didn't support streaming functions before, I would not port deprecated APIs to the new SSE feature).

2. GPT-4 will, depending on the situation, call multiple tools at once _and_ optionally give a chat reply, too. Is that already supported/transported from the local LLMs in LocalAI?

Multiple tools at once is not supported (yet), see #1275, as it requires changes to the BNF grammar and would probably need a few rounds of tests first. Replies in case a tool is not selected are supported; however, at the moment I haven't wired that up with streaming responses, but that's easy to add after this bunch of changes.

@stippi2 (Contributor) commented Feb 15, 2024

This is an example messages array I see in the network tab of the Chrome dev tools:

  {
      "role": "user",
      "content": "Play songs from Peter Fox, please."
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
          "type": "function",
          "function": {
            "name": "find_artists_and_play_top_songs_on_spotify",
            "arguments": "{\"queries\":[\"Peter Fox\"]}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "name": "find_artists_and_play_top_songs_on_spotify",
      "tool_call_id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
      "content": "{\"result\":\"playback started\"}"
    }

@mudler (Owner, Author) commented Feb 15, 2024


Ok, I'll give it a shot. Maybe two questions.

1. When I reply with a tool result, the payload is also a bit different compared to the older functions. I didn't notice that in the diff yet.

🤦 I didn't read it right; I completely missed the part about replying with a tool result. Will try to have a look at it later/tomorrow.

Update:

Well, thinking about it again, that should actually be covered already without any changes: just map the "tool" role in the model config. We should pass the name too, but for a first implementation it should already work.
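
For illustration, a roles mapping in the model YAML could look roughly like the sketch below. The exact labels (and whether a dedicated "tool" entry helps at all) depend on how the model was fine-tuned; the strings here are only placeholders:

roles:
  system: "SYSTEM:"
  user: "USER:"
  assistant: "ASSISTANT:"
  # placeholder label for tool results; use whatever wording the model was trained on
  tool: "TOOL RESULT:"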

@stippi2 (Contributor) commented Feb 15, 2024

Had a chance to try it out, but it crashes. I attached a debugger at the spot and this is what I see:

[screenshot: debugger output at the crash site]

@stippi2 (Contributor) commented Feb 15, 2024

Well, thinking about it again, that should actually be covered already without any changes: just map the "tool" role in the model config. We should pass the name too, but for a first implementation it should already work.

To what string should it be mapped?

@mudler (Owner, Author) commented Feb 16, 2024

Well, thinking about it again, that should actually be covered already without any changes: just map the "tool" role in the model config. We should pass the name too, but for a first implementation it should already work.

To what string should it be mapped?

Ideally it would depend on how the model is fine-tuned; if it hasn't seen any "tool" or function role it might be problematic. The role is used when constructing the prompt that is fed back to the LLM.

Had a chance to try it out, but it crashes. I attached a debugger at the spot and this is what I see:
[screenshot: debugger output at the crash site]

Going to have a look soon

[Resolved review thread on api/openai/chat.go]
@mudler (Owner, Author) commented Feb 16, 2024

okay, this should be working now at least. I didn't take a deep look at the API diffs, but I think we are much closer now.

@stippi2 (Contributor) commented Feb 16, 2024

Very close, but not quite:
[screenshot: the response is not valid JSON]

@mudler (Owner, Author) commented Feb 16, 2024

mm, that looks more like a model misconfiguration, or somehow the LLM output is not entirely JSON at the end.

Did you try to set stopwords on the model?

Example:

stopwords:
- "<dummy32000>"

@stippi2 (Contributor) commented Feb 16, 2024

This is my config:

name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
threads: 4

@mudler (Owner, Author) commented Feb 16, 2024

This is my config:

name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
threads: 4

I think you should add the string you see at the end of the output as a stopword. It definitely should not be part of the response, since we force JSON, but it seems the output is not entirely JSON in your case.

@stippi2 (Contributor) commented Feb 16, 2024

Will try, thanks.

@mudler (Owner, Author) commented Feb 16, 2024

For reference:

name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
- <dummy32000>
threads: 4

@stippi2 (Contributor) commented Feb 16, 2024

With that change, the function call works. Still waiting for the chat response after sending the tool result to the model. Note that I didn't add a new role mapping anywhere. Should I?

@stippi2 (Contributor) commented Feb 16, 2024

Hm. It keeps calling the tool. Are you sure the LLM is being forwarded the tool result? I'll try to add a mapping for the "tool" role, if I can find where it needs to be added.

    {
      "role": "user",
      "content": "Hi. How are you?"
    },
    {
      "content": "I'm doing well, thank you for asking. How can I assist you today?\n\n",
      "role": "assistant"
    },
    {
      "role": "user",
      "content": "Can you tell me the current weather?"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"latitude\":52.3497672,\"longitude\":13.3008244}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"latitude\":52.349785,\"longitude\":13.300821}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
    }

@mudler (Owner, Author) commented Feb 17, 2024

   {
      "role": "tool",
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
    },

It does. If you turn off streaming you should be able to see the prompt being passed, for instance:

9:49AM DBG Prompt (before templating): What is the weather like in Boston?
test result sent back from the client
9:49AM DBG Prompt (after templating): What is the weather like in Boston?
test result sent back from the client

And the request is:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{                                                                                                   
  "model": "phi-2", 
  "messages": [          
    {              
      "role": "user",                 
      "content": "What is the weather like in Boston?"                                                                                                                                                                             
    },    
   {            
      "role": "tool",      
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "test result sent back from the client"
    }                                                                    
  ],          
  "tools": [         
    {                          
      "type": "function",                      
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },       

What you are seeing is the LLM's capacity to do follow-ups, which is a bit limited. For function calling I can tell you need models bigger than 30b to perform "good enough". But I can tell from experience that if you want something meaningful you need to aim at 70b models or leverage speculative sampling with even bigger ones.

I've summed up some of my experience with it in https://github.com/mudler/LocalAGI - feel free to have a look in there. It basically forces the LLM to reason over the results, and it improves results even with smaller models (but still, you need at least a 30b model; 7b won't be enough).

Edit: You can alternatively also play with the roles mapping to check how the prompt is formatted back to the LLM. It improves accuracy by a long shot if the model can correctly recognize the results. Proper templating helps with that, but my suggestion above is still relevant (use bigger models, 7b won't cut it).

@mudler (Owner, Author) commented Feb 17, 2024

I'm going to merge this as it looks good from a functional perspective; LLM/model limitations are out of scope for this PR. I'll add tests as soon as I get a bit more time to test this from master images in my home setup.

@mudler merged commit c72808f into master on Feb 17, 2024 (27 checks passed)
@mudler deleted the tools_api_support branch on February 17, 2024 at 09:00
@stippi2 (Contributor) commented Feb 17, 2024

Ok, thanks for the insight, and especially for implementing this so quickly!! Awesome!

truecharts-admin added a commit to truecharts/charts that referenced this pull request Feb 24, 2024
….0 by renovate (#18546)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda11-ffmpeg-core` -> `v2.9.0-cublas-cuda11-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda11-core` -> `v2.9.0-cublas-cuda11-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda12-ffmpeg-core` -> `v2.9.0-cublas-cuda12-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda12-core` -> `v2.9.0-cublas-cuda12-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-ffmpeg-core` -> `v2.9.0-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2` -> `v2.9.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

### [`v2.9.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.9.0)

[Compare Source](https://togithub.com/mudler/LocalAI/compare/v2.8.2...v2.9.0)

This release brings many enhancements, fixes, and a special thanks to
the community for the amazing work and contributions!

We now have sycl images for Intel GPUs, ROCm images for AMD GPUs, and
much more:

- You can find the AMD GPU image tags among the available container images - look for `hipblas`. For example, [master-hipblas-ffmpeg-core](https://quay.io/repository/go-skynet/local-ai/tag/master-hipblas-ffmpeg-core). Thanks to [@fenfir](https://togithub.com/fenfir) for this nice contribution!
- Intel GPU images are tagged with `sycl`. You can find images in two flavors, sycl-f16 and sycl-f32 respectively. For example, [master-sycl-f16](https://quay.io/repository/go-skynet/local-ai/tag/master-sycl-f16-core). Work is in progress to also support diffusers and transformers on Intel GPUs.
- Thanks to [@christ66](https://togithub.com/christ66), first efforts in supporting the Assistant API were made, and we are planning to support the Assistant API! Stay tuned for more!
- LocalAI now supports the Tools API endpoint - it also supports the (now deprecated) functions API call as usual. We now also have support for SSE with function calling. See mudler/LocalAI#1726 for more.
- Support for Gemma models - did you hear? Google released OSS models and LocalAI already supports them!
- Thanks to [@dave-gray101](https://togithub.com/dave-gray101) in mudler/LocalAI#1728 for putting effort into refactoring parts of the code - we are soon going to support more ways to interface with LocalAI, not only the REST API!

##### Support the project

First off, a massive thank you to each and every one of you who've
chipped in to squash bugs and suggest cool new features for LocalAI.
Your help, kind words, and brilliant ideas are truly appreciated - more
than words can say!

And to those of you who've been heroes, giving up your own time to help out fellow users on Discord and in our repo, you're absolutely amazing.
We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate
sponsors behind it. It's all us, folks. So, if you've found value in
what we're building together and want to keep the momentum going,
consider showing your support. A little shoutout on your favorite social
platforms using [@&#8203;LocalAI_OSS](https://twitter.com/LocalAI_API)
and [@&#8203;mudler_it](https://twitter.com/mudler_it) or joining our
sponsorship program can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the
link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us
keep this ship sailing. Let's keep making LocalAI awesome together!

Thanks a ton, and here's to more exciting times ahead with LocalAI! 🚀

##### What's Changed

##### Bug fixes 🐛

- Add TTS dependency for cuda based builds, fixes [#1727](https://togithub.com/mudler/LocalAI/issues/1727), by [@blob42](https://togithub.com/blob42) in mudler/LocalAI#1730

##### Exciting New Features 🎉

- Build docker container for ROCm by [@fenfir](https://togithub.com/fenfir) in mudler/LocalAI#1595
- feat(tools): support Tool calls in the API by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1715
- Initial implementation of upload files api. by [@christ66](https://togithub.com/christ66) in mudler/LocalAI#1703
- feat(tools): Parallel function calling by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1726
- refactor: move part of api packages to core by [@dave-gray101](https://togithub.com/dave-gray101) in mudler/LocalAI#1728
- deps(llama.cpp): update, support Gemma models by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1734

##### 👒 Dependencies

- deps(llama.cpp): update by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1714
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1740

##### Other Changes

- ⬆️ Update docs version mudler/LocalAI by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1718
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1705
- Update README.md by [@lunamidori5](https://togithub.com/lunamidori5) in mudler/LocalAI#1739
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1750

##### New Contributors

- [@fenfir](https://togithub.com/fenfir) made their first contribution in mudler/LocalAI#1595
- [@christ66](https://togithub.com/christ66) made their first contribution in mudler/LocalAI#1703
- [@blob42](https://togithub.com/blob42) made their first contribution in mudler/LocalAI#1730

**Full Changelog**: mudler/LocalAI@v2.8.2...v2.9.0

</details>

Labels: area/tools, enhancement (New feature or request)