Berkeley Function-Calling Leaderboard #772

Open
1 task
irthomasthomas opened this issue Mar 16, 2024 · 1 comment
Labels
- ai-leaderboards: leaderboards for LLMs and other ML models
- llm-function-calling: Function Calling with Large Language Models
- openai: OpenAI APIs, LLMs, Recipes and Evals

Comments

@irthomasthomas
Owner

Berkeley Function-Calling Leaderboard

Description

This live leaderboard evaluates an LLM's ability to call functions (also known as tools) accurately. It is built from real-world data and will be updated periodically. For more information on the evaluation dataset and methodology, please refer to our blog post and code release.
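
As a rough illustration of what "calling a function accurately" can mean, the sketch below parses a model's output as a Python-style call and compares the function name and keyword arguments against an expected answer. This is not the leaderboard's evaluation code, which is described in the blog post and code release. The function `get_weather`, its parameters, and the helper names `parse_call` and `call_matches` are hypothetical and exist only for this example.

```python
import ast


def parse_call(source: str) -> tuple[str, dict]:
    """Parse one Python-style call such as `f(a=1)` into (name, kwargs)."""
    node = ast.parse(source, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("expected a single function call")
    if node.args:
        # Assumption: this sketch only supports keyword arguments.
        raise ValueError("positional arguments are not handled here")
    name = ast.unparse(node.func)  # requires Python 3.9+
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def call_matches(model_output: str, expected_name: str, expected_args: dict) -> bool:
    """Return True if the output is a call with the expected name and arguments."""
    try:
        name, kwargs = parse_call(model_output)
    except (SyntaxError, ValueError, TypeError):
        # Output that is not a parseable call with literal arguments fails.
        return False
    return name == expected_name and kwargs == expected_args


# Hypothetical expected answer for a prompt like "What's the weather in Berkeley?"
expected = {"city": "Berkeley", "unit": "celsius"}

ok = call_matches('get_weather(city="Berkeley", unit="celsius")', "get_weather", expected)
reordered = call_matches('get_weather(unit="celsius", city="Berkeley")', "get_weather", expected)
wrong_value = call_matches('get_weather(city="Berkeley", unit="kelvin")', "get_weather", expected)
not_a_call = call_matches("I cannot help with that.", "get_weather", expected)
```

Comparing the parsed structure rather than the raw string means argument order does not matter, while a wrong value or a free-text reply counts as a miss.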

Leaderboard

| Rank | Overall Acc | Model | Organization | License | AST Summary | Exec Summary | Relevance |
|---|---|---|---|---|---|---|---|
| 1 | 84.28 | GPT-4-1106-Preview | OpenAI | Proprietary | 86.06 | 65.53 | 88.75 |
| 2 | 84.16 | GPT-4-0125-Preview | OpenAI | Proprietary | 85.61 | 67.24 | 87.50 |
| 3 | 84.16 | Gorilla-OpenFunctions-v2 | Gorilla LLM | Apache 2.0 | 84.33 | 72.72 | 71.67 |
| 4 | 83.67 | Claude-3-Opus-20240229 | Anthropic | Proprietary | 79.82 | 73.73 | 84.58 |
| 5 | 81.75 | Mistral-medium-2312 | Mistral AI | Proprietary | 78.67 | 66.93 | 90.00 |
| 6 | 80.30 | Claude-3-Sonnet-20240229 | Anthropic | Proprietary | 84.91 | 76.15 | 41.25 |
| 7 | 80.30 | GPT-3.5-Turbo-0125 | OpenAI | Proprietary | 81.55 | 69.43 | 68.33 |
| 8 | 79.07 | Functionary-Medium-v2.2 | MeetKai | N/A | 82.25 | 61.97 | 61.97 |
| 9 | 77.41 | Claude-2.1 | Anthropic | Proprietary | 76.53 | 53.93 | 78.33 |
| 10 | 61.75 | Mistral-tiny-2312 | Mistral AI | Proprietary | 55.28 | 53.42 | 77.08 |
| 11 | 61.02 | Claude-instant-1.2 | Anthropic | Proprietary | 57.06 | 49.88 | 61.67 |
| 12 | 56.87 | Mistral-small-2312 | Mistral AI | Proprietary | 57.01 | 36.18 | 89.58 |
| 13 | 56.81 | Mistral-large-2402 | Mistral AI | Proprietary | 40.58 | 38.49 | 84.58 |
| 14 | 55.90 | Nexusflow-Raven-v2 | Nexusflow | Apache 2.0 | 58.01 | 63.67 | 0.00 |
| 15 | 55.87 | Firefunction-v1 | Fireworks-ai | Apache 2.0 | 40.05 | 37.31 | 81.25 |
| 16 | 55.68 | Gemini-1.0-Pro | Google | Proprietary | 42.18 | 29.30 | 78.30 |
| 17 | 54.52 | GPT-4-0613 | OpenAI | Proprietary | 40.14 | 27.12 | 87.08 |
| 18 | 45.96 | Deepseek-v1.5 | Deepseek | Deepseek License | 48.59 | 8.55 | 66.25 |
| 19 | 44.40 | Gemma-7B-IT | Google | gemma-term-of-use | 48.61 | 40.43 | 0.42 |
| 20 | 33.37 | Gorilla-OpenFunctions-v0 | Gorilla LLM | Apache 2.0 | 29.88 | 24.06 | 4.58 |
| 21 | 24.58 | Glaive-v1 | Glaive | cc-by-sa-4.0 | 15.14 | 14.92 | 46.25 |

Source

Berkeley Function-Calling Leaderboard

Suggested labels

@irthomasthomas added the ai-leaderboards, llm-function-calling, and openai labels on Mar 16, 2024
@irthomasthomas
Owner Author

Related content

#331 similarity score: 0.89
#625 similarity score: 0.89
#456 similarity score: 0.89
#358 similarity score: 0.88
#366 similarity score: 0.88
#725 similarity score: 0.88
