When is the code executed? #414
Comments
Yes, code is executed when it appears as a code block in Markdown, generated in the previous message. However, this only occurs if code execution is configured. Can you confirm that your user_proxy is instantiated something like this?
Note that `code_execution_config` can be a Dict or a Boolean, i.e., `False`. When `False`, that agent does not execute code.

```python
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)
```
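For comparison, a minimal sketch of the Dict form that does enable execution (the `work_dir`/`use_docker` keys below are the commonly used options; treat them as an example, not an exhaustive spec):

```python
from autogen import UserProxyAgent

# Sketch of a proxy that WILL execute code blocks it receives.
# The exact keys accepted by code_execution_config can vary across versions.
user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config={
        "work_dir": "coding",   # where generated scripts and their outputs are written
        "use_docker": False,    # set True to execute inside a Docker container
    },
    max_consecutive_auto_reply=10,
)
```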
Thank you, I understand. In my usage, the user agent and the assistant send code back and forth, but the generated code is never executed. Even though I entered "please run it" and my configuration files were set up correctly, the code provided by the language model sometimes still did not execute. Why is that?
This is an important observation. It has to do with how code is extracted. The short story is that we have some logic to extract code from a response: CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```". Some models may generate code that is formatted differently, causing this extraction to fail. The result is that you may see code generated, but no code block is extracted and run. What model are you using, and can you post some examples of the chat history where code was generated but not executed? That way we can debug/verify that extraction is the culprit here. Overall, as we gather more information on the behaviors of different models (how they generate code), we can improve default prompting for agents (e.g., steer models toward well-formed code blocks) and also improve our code extraction logic (e.g., #399).
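To make that concrete, here is a small sketch of how that pattern behaves (the pattern is the one quoted above; applying it with re.DOTALL and the sample strings are my assumptions for illustration):

```python
import re

# Pattern quoted above: a fenced block with an optional language tag,
# where the closing fence must start immediately after a newline.
CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"

well_formed = "Run this:\n```python\nprint('hello')\n```"
indented = "Run this:\n    ```python\n    print('hello')\n    ```"

print(re.findall(CODE_BLOCK_PATTERN, well_formed, flags=re.DOTALL))
# [('python', "print('hello')")]  -> a block is extracted and can be executed

print(re.findall(CODE_BLOCK_PATTERN, indented, flags=re.DOTALL))
# []  -> the indented closing fence never matches, so nothing is executed
```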
I'm also wondering if indentation or other factors are playing into this failure to detect. As an example, I have a trace from the testbed that looks like this (below); despite many code blocks, none are extracted. (Note: I replaced ` with ' in the listing to escape it for rendering here on GH; in reality all the single quotes are backticks.) [edit] Actually, re-reading the regular expression, these fail because there is whitespace before the closing backticks, due to indentation. More generally, I think we should consider even a small Python code classifier, and toss the code block back to the agent for reformatting in case it looks suspiciously like Python code but fails extraction -- similar to how we handle incorrectly formatted JSON.
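As a rough sketch of that idea (the lenient pattern and the sample text below are hypothetical, not AutoGen's actual extraction code), a pattern that tolerates leading whitespace before the fences would recover those indented blocks:

```python
import re
import textwrap

# Hypothetical variant of the extraction pattern that allows indentation
# before both the opening and closing fences.
LENIENT_CODE_BLOCK_PATTERN = r"[ \t]*```(\w*)\n(.*?)\n[ \t]*```"

indented = "Plan:\n    ```python\n    print('hello')\n    ```"

for lang, body in re.findall(LENIENT_CODE_BLOCK_PATTERN, indented, flags=re.DOTALL):
    # Strip the common indentation before handing the code to the executor.
    print(lang, repr(textwrap.dedent(body)))
# prints: python "print('hello')"
```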
I actually followed the Jupyter notebook example from the official repo, 'agentchat_auto_feedback_from_code_execution.ipynb', and ran it step by step. But I cannot find result.txt or stock_price_ytd.png, even though the chat proceeds normally without any error. I'm not sure the code is really executed. Do I need to make any modifications to this notebook?
It seems unstable: with the same code, it sometimes generates the files and sometimes does not.
I use Mistral-7B and WizardCoder-13B as the base models. AutoGen's prompt tells the assistant to append 'TERMINATE' to the generated text when the task is done, and the default configuration finishes the conversation when the keyword 'TERMINATE' is detected. So whenever my agent generates code, it also emits 'TERMINATE' because it thinks the job is done; the proxy agent detects 'TERMINATE' and ends the flow without running the code. If I remove the 'TERMINATE' detection from the proxy agent, the code does get executed, but the conversation easily falls into a weird loop and never stops. Then a maximum number of conversation turns has to be set, which is not very smart: sometimes more steps are needed, and sometimes they are not, so redundant actions are forced in even though the task has already been completed.
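One thing that might help (a sketch, assuming the `is_termination_msg` hook that `UserProxyAgent` accepts): only treat a reply as terminal when it ends with TERMINATE and carries no code block, so code that arrives together with the keyword still gets executed.

```python
from autogen import UserProxyAgent

def is_termination(msg):
    """Treat a reply as terminal only if it ends with TERMINATE
    and does not also carry a fenced code block to execute."""
    content = (msg.get("content") or "").rstrip()
    return content.endswith("TERMINATE") and "```" not in content

user_proxy = UserProxyAgent(
    "user_proxy",
    is_termination_msg=is_termination,
    code_execution_config={"work_dir": "coding"},  # hypothetical working directory
    max_consecutive_auto_reply=10,
)
```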
I would like to know how the system determines when to execute the code generated by the language model. I understand that there is a file that parses the strings generated by the model and, if they contain code, extracts and executes it. However, I'm unsure about the conditions that trigger this operation. In my application, I have seen situations where the Assistant and User Proxy (auto-reply) modify the code many times but never execute it.