When is the code executed? #414

Open
MrSchnappi opened this issue Oct 25, 2023 · 8 comments
Labels
code-execution execute generated code

Comments

@MrSchnappi

I would like to know how the system determines when to execute the code generated by the language model. I understand that there is a file that parses the generated strings from the model, and if it contains code, the code is extracted and executed. However, I'm unsure about the conditions that trigger this operation. In my application, I have experienced situations where Assistant and User Proxy (auto-reply) modify the code many times but do not execute the code portion.

@MrSchnappi MrSchnappi changed the title from "hen is the code executed?" to "When is the code executed?" Oct 25, 2023
@afourney
Member

Yes, code is executed when it appears as a code block in Markdown, generated in the previous message. However this only occurs if code execution is configured.

Can you confirm that your user_proxy is instantiated something like this?

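from autogen import UserProxyAgent

user_proxy = UserProxyAgent(
    "user_proxy",
    # A dict here enables code execution; "work_dir" is the directory the code runs in.
    code_execution_config={
        "work_dir": "coding",
    },
    # Upper bound on consecutive automatic replies from this agent.
    max_consecutive_auto_reply=10,
)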

@afourney afourney added the code-execution execute generated code label Oct 25, 2023
@victordibia
Collaborator

Note that code_execution_config can be either a dict or the boolean False. When it is False, that agent does not execute code.

user_proxy = UserProxyAgent(
    "user_proxy",
    code_execution_config=False,
    max_consecutive_auto_reply=10,
)

@MrSchnappi
Author

MrSchnappi commented Oct 25, 2023 via email

@victordibia
Collaborator

This is an important observation.
I know of at least one reason why code might not execute even when your code_execution_config is set correctly.

It has to do with how code is extracted. The short story is that we have some logic to extract code from a response.
This logic makes some assumptions about the structure of that code, e.g. that it is wrapped in a Markdown code block.

CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n```"

Some models may generate code that is formatted differently causing this extraction to fail. The result is that you may see code generated but no code block is extracted and run.
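
As a rough illustration of that extraction step, here is a minimal sketch that applies the pattern quoted above with `re.findall`. The `re.DOTALL` flag and the helper name `extract_fenced_code` are assumptions made for this example and are not necessarily how the library does it. The fence is built from a `FENCE` variable only so the example does not contain literal triple backticks.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically for this example

# Same expression as the CODE_BLOCK_PATTERN quoted above.
CODE_BLOCK_PATTERN = FENCE + r"(\w*)\n(.*?)\n" + FENCE


def extract_fenced_code(text):
    """Return (language, code) pairs found in text (hypothetical helper)."""
    # re.DOTALL (assumed) lets "." span newlines so multi-line blocks are captured.
    return re.findall(CODE_BLOCK_PATTERN, text, flags=re.DOTALL)


well_formed = f"Here is the script:\n{FENCE}python\nprint('hello')\n{FENCE}\n"
unfenced = "Here is the script:\nprint('hello')\n"

print(extract_fenced_code(well_formed))  # [('python', "print('hello')")]
print(extract_fenced_code(unfenced))     # []
```

In the second case the reply clearly contains code, yet nothing is returned, which matches the symptom of code being generated but never run.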

What model are you using, and can you post some examples of the chat history where code was generated but not executed? This way we can debug/verify that extraction is the culprit here.

Overall, as we gather more information on the behaviors of different models (how they generate code), we can improve default prompting for agents (e.g., steer models better towards well-formed code blocks) and also improve our code extraction logic (e.g., #399).

@afourney
Member

afourney commented Oct 25, 2023

I'm also wondering if indentation or other factors are playing into this failure to detect. As an example, I have a trace from the testbed that looks like this (below). Despite many code blocks, none are extracted.

[edit] Actually, re-reading the regular expression, these fail because there is whitespace before the closing backticks, due to indentation. CODE_BLOCK_PATTERN = r"```(\w*)\n(.*?)\n\s*```" would solve the problem here.
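
To make the whitespace issue concrete, here is a small sketch comparing the two expressions on a reply shaped like one step of the trace, with the fences indented inside a numbered list. The variable names and the use of `re.DOTALL` are assumptions for this example.

```python
import re

FENCE = "`" * 3  # three backticks, built programmatically for this example

ORIGINAL_PATTERN = FENCE + r"(\w*)\n(.*?)\n" + FENCE
PROPOSED_PATTERN = FENCE + r"(\w*)\n(.*?)\n\s*" + FENCE  # allows whitespace before the closing fence

# One list item with an indented code block, as in the trace.
indented_reply = (
    "2. Import the necessary libraries:\n"
    f"   {FENCE}python\n"
    "   import pandas as pd\n"
    f"   {FENCE}\n"
)

original_matches = re.findall(ORIGINAL_PATTERN, indented_reply, flags=re.DOTALL)
proposed_matches = re.findall(PROPOSED_PATTERN, indented_reply, flags=re.DOTALL)

print(original_matches)  # []
print(proposed_matches)  # [('python', '   import pandas as pd')]
```

Note that the captured code keeps the list indentation, so something like `textwrap.dedent` would probably be needed before the block could run as Python.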

More generally, I think we should consider even a small Python code classifier, and toss the code block back to the agent for reformatting in case it looks suspiciously like Python code but fails extraction, similar to how we handle incorrectly formatted JSON.
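
A very rough sketch of what such a check could look like is below. Everything in it is hypothetical: the line heuristics, the `min_hits` threshold, the function names, and the wording of the follow-up prompt are illustrative assumptions, not an existing feature.

```python
import re

# Line shapes that often indicate Python source (illustrative, not exhaustive).
PYTHON_LINE_HINTS = re.compile(
    r"^\s*(import \w+|from \w[\w.]* import |def \w+\(|class \w+"
    r"|\w[\w.]*\s*=\s*\S|\w[\w.]*\(.*\)\s*$)"
)


def looks_like_python(text, min_hits=2):
    """Heuristic: True if at least min_hits lines resemble Python statements."""
    hits = sum(1 for line in text.splitlines() if PYTHON_LINE_HINTS.match(line))
    return hits >= min_hits


def reformat_request(text, extracted_blocks):
    """Return a follow-up prompt if code seems present but none was extracted, else None."""
    if not extracted_blocks and looks_like_python(text):
        return (
            "Your reply appears to contain Python code, but no code block "
            "could be extracted. Please resend it in a single fenced code block."
        )
    return None


reply_with_code = "Run this:\nimport pandas as pd\ndf = pd.DataFrame()\n"
plain_reply = "The task is complete.\nLet me know if you need more.\n"

print(reformat_request(reply_with_code, []) is not None)  # True
print(reformat_request(plain_reply, []) is not None)      # False
```

A line-pattern heuristic like this will misfire on some prose, so a real classifier would likely need something sturdier.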

(Note I replaced ` with ' in the listing to escape it for rendering here on GH. In reality all the single quotes are backticks):


assistant (to user_proxy):

To plot and save a chart of NVDA (NVIDIA Corporation) and TESLA stock price year-to-date (YTD), you can use the Python library matplotlib. You will also need to install the pandas and pandas_datareader libraries to fetch the stock data.

Here is a step-by-step plan to accomplish this task:

1. Install the required libraries:
   - pandas: '!pip install pandas'
   - pandas_datareader: '!pip install pandas_datareader'
   - matplotlib: '!pip install matplotlib'

2. Import the necessary libraries:
   '''python
   import pandas as pd
   import pandas_datareader as pdr
   import matplotlib.pyplot as plt
   '''

3. Fetch the stock data for NVDA and TSLA using the 'pdr.DataReader' function:
   '''python
   # Specify the start and end dates for the data
   start_date = '2022-01-01'
   end_date = '2022-12-31'

   # Fetch the stock data for NVDA and TSLA
   nvda_data = pdr.DataReader('NVDA', 'yahoo', start_date, end_date)
   tsla_data = pdr.DataReader('TSLA', 'yahoo', start_date, end_date)
   '''

4. Plot the stock prices using matplotlib:
   '''python
   # Create a new figure
   plt.figure(figsize=(12, 6))

   # Plot NVDA stock price
   plt.plot(nvda_data.index, nvda_data['Close'], label='NVDA')

   # Plot TSLA stock price
   plt.plot(tsla_data.index, tsla_data['Close'], label='TSLA')

   # Set the x-axis label and the title
   plt.xlabel('Date')
   plt.title('NVDA and TSLA Stock Price YTD')

   # Rotate the x-axis tick labels for better readability
   plt.xticks(rotation=45)

   # Add a legend
   plt.legend()

   # Show the plot
   plt.show()
   '''

5. Save the plot to a file using the 'savefig' function:
   '''python
   # Specify the file path to save the plot
   file_path = 'stock_price_ytd.png'

   # Save the plot to the specified file path
   plt.savefig(file_path, bbox_inches='tight')
   '''

You can execute the code snippet above in a Python environment to plot and save the chart of NVDA and TSLA stock price YTD. Make sure to replace the 'file_path' variable with your desired file path for saving the plot.

Give it a try and let me know if you encounter any issues.

--------------------------------------------------------------------------------
user_proxy (to assistant):

--------------------------------------------------------------------------------

@ruifengma
Collaborator

I followed the Jupyter notebook example from the official repo, 'agentchat_auto_feedback_from_code_execution.ipynb', and ran it step by step. However, I cannot find result.txt or stock_price_ytd.png, even though the chat proceeds normally without any error. I'm not sure whether the code is really executed. Do I need to make any modifications to this notebook?

@tianyalangzi

It seems unstable: the same code sometimes generates files and sometimes does not.

@ruifengma
Collaborator

> It seems unstable: the same code sometimes generates files and sometimes does not.

I use Mistral-7B and WizardCoder 13B as the base models. AutoGen's prompt for the assistant tells it to put 'TERMINATE' at the end of the generated text when the task is done, and there is a default configuration that finishes the task when the keyword 'TERMINATE' is detected. So each time my agent generated code, it also output 'TERMINATE' because it thought the job was done; the proxy agent detected 'TERMINATE' and ended the flow without executing the code. If I remove the proxy agent's 'TERMINATE' check, the code does get executed, but the conversation can easily fall into a weird loop that never stops. A maximum number of turns then has to be set, which is not very smart: sometimes the task needs more steps and sometimes fewer, in which case extra pointless turns are forced even though the task has already been completed perfectly.
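
One possible middle ground, sketched here as an assumption rather than a tested fix, is a termination check that honors 'TERMINATE' only when the same message carries no fenced code. The function below is hypothetical; a callable of this shape could be passed as the `is_termination_msg` argument of the proxy agent.

```python
FENCE = "`" * 3  # three backticks, built programmatically for this example


def is_termination_msg(message):
    """Stop only on a trailing TERMINATE that is not accompanied by a code block."""
    # Messages are assumed to be dicts with an optional "content" string.
    content = (message.get("content") or "").rstrip()
    if not content.endswith("TERMINATE"):
        return False
    # If the message still contains fenced code, let the proxy try to run it first.
    return FENCE not in content


done = {"content": "The chart has been saved.\nTERMINATE"}
code_then_done = {"content": f"{FENCE}python\nprint('hi')\n{FENCE}\nTERMINATE"}

print(is_termination_msg(done))            # True
print(is_termination_msg(code_then_done))  # False
```

A model that appends 'TERMINATE' to every code reply could still loop under this check, so a cap such as max_consecutive_auto_reply would remain useful as a backstop.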


5 participants