
feat(tools): support Tool calls in the API #1715

Merged: 9 commits merged into master from tools_api_support on Feb 17, 2024

Conversation

@mudler (Owner) commented Feb 15, 2024

Description

Part of #1712

Notes for Reviewers

still missing stream support

netlify bot commented Feb 15, 2024

Deploy Preview for localai canceled.

Name | Link
🔨 Latest commit | d44c8ba
🔍 Latest deploy log | https://app.netlify.com/sites/localai/deploys/65cfe68ed3fc8000080b7915

[Resolved review threads on api/openai/request.go and api/schema/openai.go]
Co-authored-by: Stephan Aßmus <stephan.assmus@sap.com>
@stippi2 (Contributor) commented Feb 15, 2024

As for the missing streaming support... Do you think it might be an idea to stream contents as implemented now and then wrap the entire function call output in one big streaming event? In my application, I would have to wait for the function anyway, before I can do anything. (I guess there are other use-cases where you force a JSON response and the chat response is one argument that should also stream. But one big event for the function would already enable my case.)

@mudler (Owner, Author) commented Feb 15, 2024

As for the missing streaming support... Do you think it might be an idea to stream contents as implemented now and then wrap the entire function call output in one big streaming event? In my application, I would have to wait for the function anyway, before I can do anything. (I guess there are other use-cases where you force a JSON response and the chat response is one argument that should also stream. But one big event for the function would already enable my case.)

yes, that's exactly what I'm looking at now. At least for the moment we provide compatibility; we can come back to this later on and try to stream the whole result from the LLM directly.
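
A rough sketch of what a single wrapped streaming event could look like, using the OpenAI-style chat.completion.chunk shape (the id, name, and argument values below are made-up placeholders, not output from this PR):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"mistral","choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_current_weather","arguments":"{\"location\":\"Boston, MA\"}"}}]},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","model":"mistral","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}

data: [DONE]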

[Resolved review thread on api/openai/request.go]
@stippi2 (Contributor) commented Feb 15, 2024

I'm amazed by your quick progress! This is the project I'm working on: https://github.com/stippi/voice-assistant
I've added a section to the README for what to configure in order to use LocalAI.

@mudler (Owner, Author) commented Feb 15, 2024

I'm amazed by your quick progress! This is the project I'm working on: https://github.com/stippi/voice-assistant I've added a section to the README for what to configure in order to use LocalAI.

that's super cool! I'd be happy to add it to the community section of the README :)

@mudler (Owner, Author) commented Feb 15, 2024

I didn't give this PR a shot yet myself, but I think most of the pieces should be in place by now. If you have some cycles to spend testing this out, that would be great!

Thanks for the quick feedback @stippi2, that really helps; much appreciated!

@mudler changed the title from "feat(tools): support Tools in the API" to "feat(tools): support Tool calls in the API" on Feb 15, 2024
@stippi2 (Contributor) commented Feb 15, 2024

Ok, I'll give it a shot. Maybe two questions.

  1. When I reply with a tool result, the payload is also a bit different compared to the older functions. I didn't notice that in the diff yet.
  2. GPT-4 will, depending on the situation, call multiple tools at once and optionally give a chat reply, too. Is that already supported/transported from the local LLMs in LocalAI?

@mudler (Owner, Author) commented Feb 15, 2024

Ok, I'll give it a shot. Maybe two questions.

1. When I reply with a tool result, the payload is also a bit different compared to the older functions. I didn't notice that in the diff yet.

This should be covered by dddd67d (non-streaming mode). For streaming mode we just reply with the tools format (as we didn't support streaming functions before, I would not port deprecated APIs to the new SSE feature).

2. GPT-4 will, depending on the situation, call multiple tools at once _and_ optionally give a chat reply, too. Is that already supported/transported from the local LLMs in LocalAI?

Multiple tools at once is not supported (yet), see #1275, as it requires changes to the BNF grammar and would probably need a few rounds of tests first. Replies in case a tool is not selected are supported; however, at the moment I haven't wired that up with streaming responses, but that's easy to add after this bunch of changes.

@stippi2 (Contributor) commented Feb 15, 2024

This is an example messages array I see in the network tab of the Chrome dev tools:

  {
      "role": "user",
      "content": "Play songs from Peter Fox, please."
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
          "type": "function",
          "function": {
            "name": "find_artists_and_play_top_songs_on_spotify",
            "arguments": "{\"queries\":[\"Peter Fox\"]}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "name": "find_artists_and_play_top_songs_on_spotify",
      "tool_call_id": "call_C8tn5mmpCnlWgaSBvIYXv85p",
      "content": "{\"result\":\"playback started\"}"
    }

@mudler (Owner, Author) commented Feb 15, 2024


Ok, I'll give it a shot. Maybe two questions.

1. When I reply with a tool result, the payload is also a bit different compared to the older functions. I didn't notice that in the diff yet.

🤦 I didn't read it right; I completely missed the part about replying with a tool result. Will try to have a look at it later/tomorrow.

Update:

Well, thinking about it again, that should actually be covered already without any changes: just map the "tool" role in the model config. We should pass the name too, but for a first implementation it should already work.
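
For illustration, a roles mapping in the model YAML could look roughly like the sketch below. The exact labels (and whether a dedicated "tool" entry helps at all) depend on how the model was fine-tuned; the strings here are only placeholders:

roles:
  system: "SYSTEM:"
  user: "USER:"
  assistant: "ASSISTANT:"
  # placeholder label for tool results; use whatever wording the model was trained on
  tool: "TOOL RESULT:"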

@stippi2 (Contributor) commented Feb 15, 2024

Had a chance to try it out, but it crashes. I attached a debugger at the spot and this is what I see:

[screenshot: debugger output at the crash site]

@stippi2 (Contributor) commented Feb 15, 2024

Well, thinking about it again, that should actually be covered already without any changes: just map the "tool" role in the model config. We should pass the name too, but for a first implementation it should already work.

To what string should it be mapped?

@mudler (Owner, Author) commented Feb 16, 2024

Well, thinking about it again, that should actually be covered already without any changes: just map the "tool" role in the model config. We should pass the name too, but for a first implementation it should already work.

To what string should it be mapped?

Ideally it would depend on how the model is fine-tuned; if it hasn't seen any "tool" or function role it might be problematic. The role is used when constructing the prompt that is fed back to the LLM.

Had a chance to try it out, but it crashes. I attached a debugger at the spot and this is what I see:
[screenshot: debugger output at the crash site]

Going to have a look soon

[Resolved review thread on api/openai/chat.go]
@mudler (Owner, Author) commented Feb 16, 2024

okay, this should be working now at least. I didn't take a deep look at the API diffs, but I think we are much closer now.

@stippi2 (Contributor) commented Feb 16, 2024

Very close, but not quite:
[screenshot: the response is not valid JSON]

@mudler (Owner, Author) commented Feb 16, 2024

mm, that looks more like a model misconfiguration, or somehow the LLM output is not entirely JSON at the end.

Did you try to set stopwords on the model?

Example:

stopwords:
- "<dummy32000>"

@stippi2 (Contributor) commented Feb 16, 2024

This is my config:

name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
threads: 4

@mudler (Owner, Author) commented Feb 16, 2024

This is my config:

name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
threads: 4

I think you should add the string you see at the end of the output as a stopword. It definitely should not be part of the response, since we force JSON, but it seems the output is not entirely JSON in your case.

@stippi2 (Contributor) commented Feb 16, 2024

Will try, thanks.

@mudler (Owner, Author) commented Feb 16, 2024

For reference:

name: mistral
mmap: true
parameters:
  model: mistral-7b-openorca.Q6_K.gguf
  temperature: 0.2
  top_k: 40
  top_p: 0.95
template:
  chat_message: chatml
  chat: chatml-block
  completion: completion
context_size: 4096
f16: true
stopwords:
- <|im_end|>
- <dummy32000>
threads: 4

@stippi2 (Contributor) commented Feb 16, 2024

With that change, the function call works. Still waiting for the chat response after sending the tool result to the model. Note that I didn't add a new role mapping anywhere. Should I?

@stippi2 (Contributor) commented Feb 16, 2024

Hm. It keeps calling the tool. Are you sure the LLM is being forwarded the tool result? I'll try to add a mapping for the "tool" role, if I can find where it needs to be added.

    {
      "role": "user",
      "content": "Hi. How are you?"
    },
    {
      "content": "I'm doing well, thank you for asking. How can I assist you today?\n\n",
      "role": "assistant"
    },
    {
      "role": "user",
      "content": "Can you tell me the current weather?"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"latitude\":52.3497672,\"longitude\":13.3008244}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
    },
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
          "type": "function",
          "function": {
            "name": "get_current_weather",
            "arguments": "{\"latitude\":52.349785,\"longitude\":13.300821}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
    }

@mudler (Owner, Author) commented Feb 17, 2024

   {
      "role": "tool",
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "{\"result\":{\"weather\":\"light intensity shower rain\",\"temperature\":12,\"temperature_feels_like\":12,\"humidity\":93,\"wind_speed\":4.63,\"wind_direction\":270}}"
    },

It does. If you turn off streaming you should be able to see the prompt being passed, for instance:

9:49AM DBG Prompt (before templating): What is the weather like in Boston?
test result sent back from the client
9:49AM DBG Prompt (after templating): What is the weather like in Boston?
test result sent back from the client

And the request is:

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{                                                                                                   
  "model": "phi-2", 
  "messages": [          
    {              
      "role": "user",                 
      "content": "What is the weather like in Boston?"                                                                                                                                                                             
    },    
   {            
      "role": "tool",      
      "name": "get_current_weather",
      "tool_call_id": "9618f662-741e-49b3-8a92-87b6d2b4567e",
      "content": "test result sent back from the client"
    }                                                                    
  ],          
  "tools": [         
    {                          
      "type": "function",                      
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },       

What you are seeing is the LLM's capacity to do follow-ups, which is a bit limited. For function calling I can tell you need models bigger than 30b to perform "good enough". But I can tell from experience that if you want something meaningful you need to aim at 70b models or leverage speculative sampling with even bigger ones.

I've summed up some of my experience with it in https://github.com/mudler/LocalAGI - feel free to have a look in there. It basically forces the LLM to reason over the results, and it improves results even with smaller models (but still, you need at least a 30b model; 7b won't be enough).

Edit: You can alternatively also play with the roles mapping to check how the prompt is formatted back to the LLM. It improves accuracy by a long shot if the model can correctly recognize the results. Proper templating helps with that, but my suggestion above is still relevant (use bigger models, 7b won't cut it).

@mudler (Owner, Author) commented Feb 17, 2024

I'm going to merge this as it looks good from a functional perspective; LLM/model limitations are out of scope for this PR. I'll add tests as soon as I get a bit more time to test this from master images in my home setup.

@mudler merged commit c72808f into master on Feb 17, 2024 (27 checks passed)
@mudler deleted the tools_api_support branch on February 17, 2024 at 09:00
@stippi2 (Contributor) commented Feb 17, 2024

Ok, thanks for the insight, and especially for implementing this so quickly!! Awesome!

truecharts-admin added a commit to truecharts/charts that referenced this pull request Feb 24, 2024
….0 by renovate (#18546)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda11-ffmpeg-core` -> `v2.9.0-cublas-cuda11-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda11-core` -> `v2.9.0-cublas-cuda11-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda12-ffmpeg-core` -> `v2.9.0-cublas-cuda12-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-cublas-cuda12-core` -> `v2.9.0-cublas-cuda12-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2-ffmpeg-core` -> `v2.9.0-ffmpeg-core` |
| [docker.io/localai/localai](https://togithub.com/mudler/LocalAI) | minor | `v2.8.2` -> `v2.9.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>mudler/LocalAI (docker.io/localai/localai)</summary>

### [`v2.9.0`](https://togithub.com/mudler/LocalAI/releases/tag/v2.9.0)

[Compare Source](https://togithub.com/mudler/LocalAI/compare/v2.8.2...v2.9.0)

This release brings many enhancements, fixes, and a special thanks to
the community for the amazing work and contributions!

We now have sycl images for Intel GPUs, ROCm images for AMD GPUs, and
much more:

- You can find the AMD GPU image tags among the available container images - look for `hipblas`. For example, [master-hipblas-ffmpeg-core](https://quay.io/repository/go-skynet/local-ai/tag/master-hipblas-ffmpeg-core). Thanks to [@fenfir](https://togithub.com/fenfir) for this nice contribution!
- Intel GPU images are tagged with `sycl`. You can find images in two flavors, sycl-f16 and sycl-f32 respectively. For example, [master-sycl-f16](https://quay.io/repository/go-skynet/local-ai/tag/master-sycl-f16-core). Work is in progress to also support diffusers and transformers on Intel GPUs.
- Thanks to [@christ66](https://togithub.com/christ66), first efforts in supporting the Assistant API were made, and we are planning to support the Assistant API! Stay tuned for more!
- LocalAI now supports the Tools API endpoint - it also supports the (now deprecated) functions API call as usual. We now also have support for SSE with function calling. See mudler/LocalAI#1726 for more.
- Support for Gemma models - did you hear? Google released OSS models and LocalAI already supports them!
- Thanks to [@dave-gray101](https://togithub.com/dave-gray101) in mudler/LocalAI#1728 for putting effort into refactoring parts of the code - we are soon going to support more ways to interface with LocalAI, not only the REST API!

##### Support the project

First off, a massive thank you to each and every one of you who've
chipped in to squash bugs and suggest cool new features for LocalAI.
Your help, kind words, and brilliant ideas are truly appreciated - more
than words can say!

And to those of you who've been heroes, giving up your own time to help out fellow users on Discord and in our repo, you're absolutely amazing.
We couldn't have asked for a better community.

Just so you know, LocalAI doesn't have the luxury of big corporate
sponsors behind it. It's all us, folks. So, if you've found value in
what we're building together and want to keep the momentum going,
consider showing your support. A little shoutout on your favorite social
platforms using [@&#8203;LocalAI_OSS](https://twitter.com/LocalAI_API)
and [@&#8203;mudler_it](https://twitter.com/mudler_it) or joining our
sponsorship program can make a big difference.

Also, if you haven't yet joined our Discord, come on over! Here's the
link: https://discord.gg/uJAeKSAGDy

Every bit of support, every mention, and every star adds up and helps us
keep this ship sailing. Let's keep making LocalAI awesome together!

Thanks a ton, and here's to more exciting times ahead with LocalAI! 🚀

##### What's Changed

##### Bug fixes 🐛

- Add TTS dependency for cuda based builds, fixes [#1727](https://togithub.com/mudler/LocalAI/issues/1727), by [@blob42](https://togithub.com/blob42) in mudler/LocalAI#1730

##### Exciting New Features 🎉

- Build docker container for ROCm by [@fenfir](https://togithub.com/fenfir) in mudler/LocalAI#1595
- feat(tools): support Tool calls in the API by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1715
- Initial implementation of upload files api. by [@christ66](https://togithub.com/christ66) in mudler/LocalAI#1703
- feat(tools): Parallel function calling by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1726
- refactor: move part of api packages to core by [@dave-gray101](https://togithub.com/dave-gray101) in mudler/LocalAI#1728
- deps(llama.cpp): update, support Gemma models by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1734

##### 👒 Dependencies

- deps(llama.cpp): update by [@mudler](https://togithub.com/mudler) in mudler/LocalAI#1714
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1740

##### Other Changes

- ⬆️ Update docs version mudler/LocalAI by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1718
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1705
- Update README.md by [@lunamidori5](https://togithub.com/lunamidori5) in mudler/LocalAI#1739
- ⬆️ Update ggerganov/llama.cpp by [@localai-bot](https://togithub.com/localai-bot) in mudler/LocalAI#1750

##### New Contributors

- [@fenfir](https://togithub.com/fenfir) made their first contribution in mudler/LocalAI#1595
- [@christ66](https://togithub.com/christ66) made their first contribution in mudler/LocalAI#1703
- [@blob42](https://togithub.com/blob42) made their first contribution in mudler/LocalAI#1730

**Full Changelog**: mudler/LocalAI@v2.8.2...v2.9.0

</details>

Labels: area/tools, enhancement (New feature or request)