When is the code executed? #414
Comments
Yes, code is executed when it appears as a code block in Markdown, generated in the previous message. However, this only occurs if code execution is configured. Can you confirm that your user_proxy is instantiated something like this?
Note that `code_execution_config` can be a Dict or a Boolean, i.e., `False`. When `False`, that agent does not execute code.

```python
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)
```
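For comparison, a minimal sketch of the Dict form that does enable execution (the `work_dir`/`use_docker` keys below are the commonly used options; treat them as an example, not an exhaustive spec):

```python
from autogen import UserProxyAgent

# Sketch of a proxy that WILL execute code blocks it receives.
# The exact keys accepted by code_execution_config can vary across versions.
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={
        "work_dir": "coding",   # where generated scripts and their outputs are written
        "use_docker": False,    # set True to execute inside a Docker container
    },
    max_consecutive_auto_reply=10,
)
```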
Thank you, I understand. In my usage, the user agent and the assistant send code back and forth, but the generated code is never executed. Even though I entered "please run it" and my configuration files were set up correctly, the code provided by the language model sometimes still did not execute. Why is that?
This is an important observation. It has to do with how code is extracted. The short story is that we have some logic to extract code from a response: CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```". Some models may generate code that is formatted differently, causing this extraction to fail. The result is that you may see code generated, but no code block is extracted and run. What model are you using, and can you post some examples of the chat history where code was generated but not executed? That way we can debug/verify that extraction is the culprit here. Overall, as we gather more information on the behaviors of different models (how they generate code), we can improve default prompting for agents (e.g., steer models toward well-formed code blocks) and also improve our code extraction logic (e.g., #399).
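To make that concrete, here is a small sketch of how that pattern behaves (the pattern is the one quoted above; applying it with re.DOTALL and the sample strings are my assumptions for illustration):

```python
import re

# Pattern quoted above: a fenced block with an optional language tag,
# where the closing fence must start immediately after a newline.
CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"

well_formed = "Run this:\n```python\nprint('hello')\n```"
indented = "Run this:\n    ```python\n    print('hello')\n    ```"

print(re.findall(CODE_BLOCK_PATTERN, well_formed, flags=re.DOTALL))
# [('python', "print('hello')")]  -> a block is extracted and can be executed

print(re.findall(CODE_BLOCK_PATTERN, indented, flags=re.DOTALL))
# []  -> the indented closing fence never matches, so nothing is executed
```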
I'm also wondering if indentation or other factors are playing into this failure to detect. As an example, I have a trace from the testbed that looks like this (below); despite many code blocks, none are extracted. (Note: I replaced ` with ' in the listing to escape it for rendering here on GH; in reality all the single quotes are backticks.) [edit] Actually, re-reading the regular expression, these fail because there is whitespace before the closing backticks, due to indentation. More generally, I think we should consider even a small Python code classifier, and toss the code block back to the agent for reformatting in case it looks suspiciously like Python code but fails extraction -- similar to how we handle incorrectly formatted JSON.
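As a rough sketch of that idea (the lenient pattern and the sample text below are hypothetical, not AutoGen's actual extraction code), a pattern that tolerates leading whitespace before the fences would recover those indented blocks:

```python
import re
import textwrap

# Hypothetical variant of the extraction pattern that allows indentation
# before both the opening and closing fences.
LENIENT_CODE_BLOCK_PATTERN = r"[ \t]*```(\w*)\n(.*?)\n[ \t]*```"

indented = "Plan:\n    ```python\n    print('hello')\n    ```"

for lang, body in re.findall(LENIENT_CODE_BLOCK_PATTERN, indented, flags=re.DOTALL):
    # Strip the common indentation before handing the code to the executor.
    print(lang, repr(textwrap.dedent(body)))
# prints: python "print('hello')"
```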
I actually followed the Jupyter notebook example from the official repo, 'agentchat_auto_feedback_from_code_execution.ipynb', and ran it step by step. But I cannot find result.txt or stock_price_ytd.png, even though the chat proceeds normally without any error. I'm not sure the code is really executed. Do I need to make any modifications to this notebook?
It seems unstable: with the same code, it sometimes generates the files and sometimes does not.
I use Mistral-7B and WizardCoder-13B as the base models. AutoGen's prompt tells the assistant to append 'TERMINATE' to the generated text when the task is done, and the default configuration finishes the conversation when the keyword 'TERMINATE' is detected. So whenever my agent generates code, it also emits 'TERMINATE' because it thinks the job is done; the proxy agent detects 'TERMINATE' and ends the flow without running the code. If I remove the 'TERMINATE' detection from the proxy agent, the code does get executed, but the conversation easily falls into a weird loop and never stops. Then a maximum number of conversation turns has to be set, which is not very smart: sometimes more steps are needed, and sometimes they are not, so redundant actions are forced in even though the task has already been completed.
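One thing that might help (a sketch, assuming the `is_termination_msg` hook that `UserProxyAgent` accepts): only treat a reply as terminal when it ends with TERMINATE and carries no code block, so code that arrives together with the keyword still gets executed.

```python
from autogen import UserProxyAgent

def is_termination(msg):
    """Treat a reply as terminal only if it ends with TERMINATE
    and does not also carry a fenced code block to execute."""
    content = (msg.get("content") or "").rstrip()
    return content.endswith("TERMINATE") and "```" not in content

user_proxy = UserProxyAgent(
    "user_proxy",
    is_termination_msg=is_termination,
    code_execution_config={"work_dir": "coding"},  # hypothetical working directory
    max_consecutive_auto_reply=10,
)
```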
I would like to know how the system determines when to execute the code generated by the language model. I understand that there is a file that parses the strings generated by the model and, if they contain code, extracts and executes it. However, I'm unsure about the conditions that trigger this operation. In my application, I have seen situations where the Assistant and User Proxy (auto-reply) modify the code many times but never execute it.