
[Issue]: unable to make calculator example work with local models #2953

Open
geoffroy-noel-ddh opened this issue Jun 17, 2024 · 6 comments
Labels
alt-models Pertains to using alternate, non-GPT, models (e.g., local models, llama, etc.)

Comments

@geoffroy-noel-ddh

Describe the issue

No matter which local model I use, the calculator example from the autogen Tool Use tutorial fails in various ways:

  1. the agent often fails to produce inputs that match the function signature (e.g. passing an extra parameter d, or a number instead of the operator: {"a": 44232, "b": 13312, "operator": "/", "c": 232, "d": 32});
  2. even when the agent does match the function signature, the mathematical expression isn't decomposed into the correct elementary binary operations (e.g. {"a": 44232, "b": 13312, "operator": "+"});
  3. the agents get stuck in a loop and never terminate (even when they manage to return the correct result of a simpler expression);
  4. the agents never factor the result of the calculator function into the following messages (which might be the cause of failure 2 above).

I would like to know:

  1. what the cause of each problem is (is it autogen, the local model, the prompt, my configuration, ...?);
  2. whether anyone has managed to make the calculator example work with a local model;
  3. if so, how? Could a complete working example be provided here or in the tutorial?

More generally, the issue for me is that although the autogen tutorial gives examples and ways to adapt them for local models, those instructions don't seem to work with a large number of local models. So perhaps the documentation should either acknowledge that limitation upfront (and point to the technical reasons) or be more specific about how to make the example work. Knowing where the limitations lie (autogen, the model, litellm or ollama?) would be very helpful.

Steps to reproduce

  1. take the Python code of the calculator example exactly as it is written in the Tool Use tutorial (the first version of the code at the top of the tutorial page);
  2. change the assistant's llm_config to use LiteLLM + Ollama exactly as described in the user guide (see the sketch just after this list);
  3. configure ollama to use llama3:instruct (or another similar model, e.g. mistral:v0.3, phi3:latest, ...);
  4. run the adapted calculator example.
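
For reference, the changed llm_config in step 2 looks roughly like this (a sketch only: it assumes the LiteLLM proxy was started with "litellm --model ollama/llama3:instruct" and listens on its default port 4000; adjust host, port and model tag to your setup):

local_llm_config = {
    "config_list": [
        {
            # LiteLLM forwards requests to whichever Ollama model it was started
            # with, so the model name and api_key here are essentially placeholders.
            "model": "llama3:instruct",
            "api_key": "NotRequired",
            "base_url": "http://0.0.0.0:4000",
        }
    ],
    "cache_seed": None,  # disable caching so every run actually hits the model
}

# This dict is then passed as llm_config to the tutorial's assistant agent
# in place of the OpenAI configuration.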

The output log below exhibits all four of the issues described above.

Screenshots and logs

User (to Assistant):

What is (44232 + 13312 / (232 - 32)) * 5?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_a1ff8853-5428-409f-805a-a9e1bf22ca42): calculator *****
Arguments: 
{"a": 44232, "b": 13312, "c": 200, "d": 32, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_a1ff8853-5428-409f-805a-a9e1bf22ca42) *****
Error: calculator() got an unexpected keyword argument 'c'
**********************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_f989593b-36ac-4788-bb22-30162c6462f8): calculator *****
Arguments: 
{"a": 44232, "b": 13312, "operator": "/", "c": "unexpected"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_f989593b-36ac-4788-bb22-30162c6462f8) *****
Error: calculator() got an unexpected keyword argument 'c'
**********************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_a602644e-4192-4168-a511-128f1131440e): calculator *****
Arguments: 
{"a": null, "b": null, "operator": null}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_a602644e-4192-4168-a511-128f1131440e) *****
Error: Invalid operator
**********************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_d839acb3-6898-4ed9-b36f-5c94cf4c46d1): calculator *****
Arguments: 
{"a": 44232, "b": 13312, "operator": "/"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_d839acb3-6898-4ed9-b36f-5c94cf4c46d1) *****
3
**********************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_a21cf647-2c00-4702-8288-fcb3fa78ffc7): calculator *****
Arguments: 
{"a": 44232, "b": 13312.0, "operator": "/"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_a21cf647-2c00-4702-8288-fcb3fa78ffc7) *****
3
**********************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_c5121502-c973-44b9-aa2e-f33aebf4f6f5): calculator *****
Arguments: 
{"a": 44232, "b": 13312.0, "operator": "**"}
***************************************************************************************

--------------------------------------------------------------------------------

Additional Information

  • pyautogen==0.2.28
  • OS: Ubuntu 22.04.4 LTS
  • python: 3.10.12
  • litellm: 1.40.0
  • ollama: 0.1.39
@geoffroy-noel-ddh
Author

Another run with a trivial calculation (everything else unchanged, including the underlying model 'llama3:instruct'): the calculator is now called correctly but the conversation still never terminates. This shows issues 3 & 4 as described at the top.

User (to Assistant):

What is (232 - 32)?

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_68407b7f-9752-4df2-9a42-fa3a5c76cc25): calculator *****
Arguments: 
{"a": 232, "b": 32, "operator": "-"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
User (to Assistant):

User (to Assistant):

***** Response from calling tool (call_68407b7f-9752-4df2-9a42-fa3a5c76cc25) *****
200
**********************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

***** Suggested tool call (call_f01a0f53-b313-468f-a2c1-334642c71d34): calculator *****
Arguments: 
{"a": 232, "b": 32, "operator": "-"}
***************************************************************************************

@geoffroy-noel-ddh
Author

geoffroy-noel-ddh commented Jun 17, 2024

I noticed that the last output in the autogen LiteLLM with Ollama user guide seems to make similarly redundant calls despite getting the right answer. That's a different behaviour from the one demonstrated and expected by the Tool Use tutorial, where the agent terminates the conversation with a plain-English summary of the response.

>>>>>>>> USING AUTO REPLY...
Assistant (to User):

The result of the calculation is 221490. TERMINATE

@qingyun-wu added the alt-models label Jun 18, 2024
@yonitjio

yonitjio commented Jun 18, 2024

No matter which local model I use, the calculator example from the autogen Tool Use tutorial fails in various ways:

  1. the agent often fails to produce inputs that match the function signature (e.g. passing an extra parameter d, or a number instead of the operator: {"a": 44232, "b": 13312, "operator": "/", "c": 232, "d": 32});
  2. even when the agent does match the function signature, the mathematical expression isn't decomposed into the correct elementary binary operations (e.g. {"a": 44232, "b": 13312, "operator": "+"});
  3. the agents get stuck in a loop and never terminate (even when they manage to return the correct result of a simpler expression);
  4. the agents never factor the result of the calculator function into the following messages (which might be the cause of failure 2 above).

I would like to know:

  1. what the cause of each problem is (is it autogen, the local model, the prompt, my configuration, ...?);
  2. whether anyone has managed to make the calculator example work with a local model;
  3. if so, how? Could a complete working example be provided here or in the tutorial?

I'll try to answer this, but please take it with a big grain of salt, because this is based on my own experience and my limited understanding of how AI works. So this might be entirely correct or entirely wrong. LOL.

My conclusion after several experiments is that this issue lies more with the models than anything else.

There are two main factors:

  1. The model itself.
  2. The AI platform.

Please remember that function calling is not actually the AI calling functions directly; it's just another form of text generation. The main problem here is "hallucination" (I'm not sure if hallucination is the correct word for this case, hence the double quotes).
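
To illustrate with a toy sketch (generic Python, not tied to autogen, litellm or any particular platform): the model only emits text that looks like a call; the framework has to parse that text and run the real function, which is exactly where extra or malformed arguments blow up.

import json

# Hypothetical raw text a model might generate when "calling" a tool.
generated_text = '{"name": "calculator", "arguments": {"a": 232, "b": 32, "operator": "-"}}'

def calculator(a, b, operator):
    return {"+": a + b, "-": a - b, "*": a * b, "/": a // b}[operator]

# The framework, not the model, parses the generated text and executes the function.
call = json.loads(generated_text)
result = calculator(**call["arguments"])  # raises TypeError if the model invented extra arguments
print(result)  # 200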

If the model is not trained to use the specific function we provide, it will try to "understand" how to use it first by calling it multiple times.

Unfortunately, this gets worse with complex tasks. Imo, the reason is the same: if the model is not trained to solve complex tasks (in this case, complex calculations), it will try the function even more.

The second factor is the AI platform: platforms implement function calling differently. Take localai.io and litellm as examples. Litellm does not enforce how models generate function calls, which is why we can get extra arguments for the function. Localai.io is different: it can enforce the function definition.

But that alone only solves the extra-arguments problem; it won't stop the AI from calling the same function multiple times.

In addition to not being trained on our function, the model also seems to get confused about how to answer. Once it uses a function, it seems to expect that everything must be solved via function calling.

Localai.io solves this by adding a "no action" function (users won't see it, though).

Litellm doesn't have this. Perhaps litellm expects the model to be smart enough to create replies without function calling. To work around this, we can add our own "no action" function.

But, unfortunately, that's about all we can do. We can try to minimize the probability of calling a function multiple times with prompts, or with how we define the functions, but the underlying issue is still there.

My conclusion is that to make function calling work with a local AI model we need the "correct" model for the task. Perhaps this issue can only be solved with fine-tuning.

To demonstrate this issue, I made a small Python script and tested it with litellm, ollama and mistral.
I ask "What is (1 + 1) * 2?". It can solve it, but not immediately.

Another example is here: https://github.com/yonitjio/exploring-odoo/tree/main/calendar_bot. It demonstrates a workaround for a similar issue by wrapping several functions into one function with a command argument.
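
That wrapping pattern looks roughly like this (a sketch with hypothetical names, not the actual code from that repository; the full script for this issue follows under "Code:" below):

from typing import Annotated, Literal

Command = Literal["add", "subtract", "multiply", "divide"]

# One registered tool instead of several: the model only has to pick a command,
# which gives it less room to invent function names or extra arguments.
def math_tool(
    command: Annotated[Command, "Operation to perform"],
    a: Annotated[int, "First number"],
    b: Annotated[int, "Second number"],
) -> int:
    if command == "add":
        return a + b
    if command == "subtract":
        return a - b
    if command == "multiply":
        return a * b
    if command == "divide":
        return int(a / b)
    raise ValueError(f"Unknown command: {command}")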

Code:

from textwrap import dedent

from typing import Annotated, Literal

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager, filter_config, register_function

config_list = [
    {
        "model": "mistral",
        "base_url": "http://localhost:4000/v1",
        "api_key": "__localkey__",
    },
]

llm_config = {"config_list": filter_config(config_list, {"model": ["mistral"]}), "cache_seed": None}

Operator = Literal["+", "-", "*", "/"]

# The chat is considered finished when a message's content ends with "TERMINATE".
def termination_msg(x):
    return isinstance(x, dict) and "TERMINATE" == str(x.get("content", ""))[-9:].upper()

admin = UserProxyAgent(
    "admin",
    description="A proxy who act on behalf of user.",
    human_input_mode="NEVER",
    llm_config=False,
    is_termination_msg=termination_msg,
    code_execution_config={
        "work_dir": ".",
        "use_docker": False,
    },
)

ASSISTANT_SYSTEM_MESSAGE =\
"""
You are a helpful assistant.

### RULES
1. Do not use function unless it is necessary.
2. Avoid calling the same function with the same parameter multiple times.
3. Only answer what you are asked.
4. Carefully inspect the available functions before calling a function. Take your time, there's no need to hurry.
5. Split a complex task into simpler tasks when necessary. Plan your execution first.
6. Read the function definition carefully, do not add arguments that aren't described in the definition.

### EXAMPLES
To calculate (1 + 2) + (2 * 3), you need to split it to three tasks.
First calculate (1 + 2) which will result to 3
Second calculate (2 * 3) which will result to 6
Finally calculate the result, 3 + 6, which will result to 9.
Therefore the result of (1 + 2) + (2 * 3) is equal to 9.
"""
assistant = AssistantAgent(
    "assistant",
    description="A useful AI assistant.",
    system_message=dedent(ASSISTANT_SYSTEM_MESSAGE),
    human_input_mode="NEVER",
    llm_config=llm_config,
    code_execution_config=False,
)

# A "no action" tool: lets the model reply in plain text; returning "TERMINATE"
# is picked up by termination_msg and ends the chat.
def answer_or_reply(answer: Annotated[str, "Your answer or reply."]) -> str:
    return "TERMINATE"

def calculator(
    a: Annotated[int, "First number"],
    b: Annotated[int, "Second number"],
    operator: Annotated[Operator, "operator"],
) -> Annotated[int, "Result"]:
    if operator == "+":
        return a + b
    elif operator == "-":
        return a - b
    elif operator == "*":
        return a * b
    elif operator == "/":
        return int(a / b)
    else:
        raise ValueError("Invalid operator")

CALCULATOR_DESCRIPTION = """\
    A simple calculator. Only accepts two numbers and one operator.
    Args:
        a (int): First number.
        b (int): Second number.
        operator (str): Operator to be applied to the numbers. A single character, can be * or / or - or +.
    Returns:
        int: result of the operation on the numbers.

"""
register_function(calculator, caller=assistant, executor=admin, description=dedent(CALCULATOR_DESCRIPTION))

ANSWER_OR_REPLY_DESCRIPTION = """\
    Use this function to reply without calling any other function, e.g., to ask for more information or to give the final answer after calling a function.
"""
register_function(answer_or_reply, caller=assistant, executor=admin, description=dedent(ANSWER_OR_REPLY_DESCRIPTION))

def reset_agents():
    admin.reset()
    assistant.reset()

def main():
    reset_agents()

    question = "What is (1 + 1) * 2?"
    admin.initiate_chat(assistant, message=question)

if __name__ == "__main__":
    main()

The result:

admin (to assistant):

What is (1 + 1) * 2?

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_d28c5ddd-50e6-4e06-98da-e103157345f6): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "+"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_d28c5ddd-50e6-4e06-98da-e103157345f6) *****
2
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_611188ed-7e6e-4082-aeee-5dd0daa013f1): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "+"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_611188ed-7e6e-4082-aeee-5dd0daa013f1) *****
2
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_77f1ec68-fb25-4dba-90e9-ce0b6d70aee5): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "+"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_77f1ec68-fb25-4dba-90e9-ce0b6d70aee5) *****
2
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_d36f8276-a7b2-43a1-a1c5-26f3a7800095): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_d36f8276-a7b2-43a1-a1c5-26f3a7800095) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_ad5cfbd7-b1ab-4e69-82cf-86fb4e7e3ad6): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_ad5cfbd7-b1ab-4e69-82cf-86fb4e7e3ad6) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_0ae8f877-738e-46b3-abc7-886969c3038f): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_0ae8f877-738e-46b3-abc7-886969c3038f) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_aebe8377-8378-47c7-9681-74adf548e51f): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_aebe8377-8378-47c7-9681-74adf548e51f) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_6a67011b-2e66-4e2b-84cd-0c51b9d89ad4): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "+"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_6a67011b-2e66-4e2b-84cd-0c51b9d89ad4) *****
2
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_94a33f76-9d98-4e12-8bfb-0a040dd9533e): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_94a33f76-9d98-4e12-8bfb-0a040dd9533e) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_39e0ecfb-699a-4f2f-8fa5-2f77621ffffb): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_39e0ecfb-699a-4f2f-8fa5-2f77621ffffb) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_5029da31-11eb-4cc5-b9fe-abbc50ded211): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_5029da31-11eb-4cc5-b9fe-abbc50ded211) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_fda1a294-ceca-4424-b30f-3d41d9b864b1): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_fda1a294-ceca-4424-b30f-3d41d9b864b1) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_27989cd8-2708-4c67-9f7a-77da3a4c56ce): calculator *****
Arguments:
{"a": 1, "b": 1, "operator": "*"}
***************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION calculator...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_27989cd8-2708-4c67-9f7a-77da3a4c56ce) *****
1
**********************************************************************************

--------------------------------------------------------------------------------
assistant (to admin):

***** Suggested tool call (call_2c5af7a8-b8ae-44dc-a83a-2b2bc3f771c6): answer_or_reply *****
Arguments:
{"answer": "First, calculate (1 + 1) which will result to 2\nSecond, multiply that by 2.\nTherefore the result of (1 + 1) * 2 is equal to 4."}
********************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION answer_or_reply...
admin (to assistant):

admin (to assistant):

***** Response from calling tool (call_2c5af7a8-b8ae-44dc-a83a-2b2bc3f771c6) *****
TERMINATE
**********************************************************************************

@geoffroy-noel-ddh
Author

@yonitjio Thank you for taking the time to reproduce the code and give some answers. That's really appreciated.

From what you explained and showed, it looks like local models can overcome problems 1 (calls not matching the function signature) and 3 (never terminating). I've now experimented with larger models (e.g. mixtral:8x22b) and they are indeed better at producing inputs that match the function arguments.

However, the causes of problems 2 (correct decomposition of the expression into smaller parts) and 4 (agents not taking the function output into account) remain unclear to me.

It looks like local models are able to properly decompose the expression when they don't have to call a function.

Here's an example of a chat with mixtral (using ollama):

Prompt: Given a function f(operand1, operand2, operator), call f as many times as necessary to resolve the following calculation: (44232 + 13312 / (232 - 32)) * 5.

Response:

BODMAS rule (Brackets, Orders [i.e., powers and square roots], Division and Multiplication, Addition and Subtraction).

1. Calculate value inside parentheses: 232 - 32 = f(232, 32, '-') = 200
2. Perform the division operation: 13312 / 200 = f(13312, 200, '/')
3. Add the result of step 2 with 44232: 44232 + (result of step 2) = f(44232, result of step 2, '+')
4. Multiply the result from step 3 by 5: (result of step 3) * 5 = f(result of step 3, 5, '*')

This is logically correct. The question is why the agent is unable to work as well when producing function inputs (problem 2).

Problem 4 remains a mystery to me. In the execution log you have shared, my understanding is that the function never returns 4, because the agent never builds on the previous output (2) to send the following input (2, 2, *). In your example, the model gives the correct answer "4" at the end by bypassing the calculator entirely. I don't see much sign that it uses previous outputs.

{"answer": "First, calculate (1 + 1) which will result to 2\nSecond, multiply that by 2.\nTherefore the result of (1 + 1) * 2 is equal to 4."}

I don't understand why no local model seems able to build on previous output to progress with resolving the expression, the way the Autogen calculator example using OpenAI does. And I still wonder if this problem is because:

  • no open model has been trained to do that (i.e. to look at the call history that is passed to it);
  • this is a shortcoming of litellm or ollama;
  • this is a shortcoming of autogen.

So my questions remain. If anyone is able to make the example work with a local model, I'd love to hear how. If not, I'd like to know exactly where the limitation lies.

@yonitjio

yonitjio commented Jun 18, 2024

Well, so far my understanding is that it's mainly the models. They just aren't that good at this kind of task, i.e., they are not trained for it.

This should also be the reason why it can answer 2 * 2 = 4 without using the calculator: 2 * 2 = 4 was in the training data. I guess it's similar to how we know 2 * 2 = 4 by heart. Silly, but I can't think of any other answer. There's also the possibility that it was just winging it... LOL.

As I mentioned before, there are other factors. In your case, using litellm helps, since Ollama's OpenAI API implementation doesn't support function calling yet, but litellm does, with a limitation: it works by adding prompts describing the function call. Unfortunately, different models may use different prompt formats, and afaik, using a format other than the one the model was trained on degrades its performance.
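
Very roughly, "adding prompts for the function call" means something like the following (an illustrative, simplified sketch only, not litellm's actual prompt template):

import json

# Hypothetical schema of the tool we want the model to be able to call.
tool_schema = {
    "name": "calculator",
    "parameters": {
        "a": "int, first number",
        "b": "int, second number",
        "operator": "one of + - * /",
    },
}

# The platform serializes the schema into the prompt and asks for JSON back.
system_prompt = (
    "You can use the following function. To call it, reply only with a JSON object "
    'of the form {"name": ..., "arguments": {...}}.\n'
    + json.dumps(tool_schema, indent=2)
)

# A model fine-tuned on a different tool-call format (special tokens, XML tags, ...)
# may ignore or mangle these instructions and produce malformed calls.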

And as for Autogen, well, I would not call it a shortcoming, but since this is all generative text after all, it does contribute in general through its default prompts, which may not be suitable for the model being used. Imo, though, in this example it has nothing at all to do with the issue.

So, to answer your question: imo, the only way to make the calculator example work "perfectly" is to fine-tune the model to use the calculator function, and to use the prompt format that the model was trained on.

Otherwise, it's the same as with everything else: finding a better prompt and hoping it works every time.

Btw, I did add this to the prompt: "1. Do not use function unless it is necessary."

@scruffynerf

scruffynerf commented Jun 19, 2024

Interesting... I hadn't considered the 'extra' arguments problem, but it's a good point. I just wrote 'tools for the toolless' (so essentially for the very models under discussion), and I will look at adding some guardrails to avoid picking up extra arguments, such as rejecting the call entirely (which I already do if the model makes a tool call for something not in the tool list, like making up a 'talk_to_user' tool with an argument of message="Yes, I did the thing").
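
A minimal sketch of that kind of guardrail (hypothetical helper names, generic Python, not the actual 'tools for the toolless' code): validate a suggested call against the registered tool's signature before executing it, and feed any mismatch back to the model as an error.

import inspect

def calculator(a: int, b: int, operator: str) -> int:
    # Stand-in for the real tool; only its signature matters for the check below.
    ...

REGISTERED_TOOLS = {"calculator": calculator}

def validate_tool_call(name, arguments):
    """Return an error message to feed back to the model, or None if the call looks OK."""
    func = REGISTERED_TOOLS.get(name)
    if func is None:
        return f"Unknown tool '{name}'. Available tools: {sorted(REGISTERED_TOOLS)}"
    expected = set(inspect.signature(func).parameters)
    extra = set(arguments) - expected
    missing = expected - set(arguments)
    if extra or missing:
        return f"Bad arguments for '{name}': unexpected {sorted(extra)}, missing {sorted(missing)}"
    return None  # safe to execute

# validate_tool_call("calculator", {"a": 1, "b": 2, "operator": "+", "c": 3})
# -> "Bad arguments for 'calculator': unexpected ['c'], missing []"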

Do not use function unless it is necessary

Btw, I find the opposite problem: it'll use tools only if prompted to do so....
