
Is there a better way to execute the generated code? #768

Closed
thinkall opened this issue Nov 26, 2023 · 13 comments
Labels
0.2 Issues which were filed before re-arch to 0.4 code-execution execute generated code

Comments

@thinkall (Collaborator):

The current code execution logic has an issue when code is generated in several blocks and later blocks depend on earlier ones.

For example:

User_Proxy (to Assistant_Agent):

Write a python function to compute the sum of numbers.


Assistant_Agent (to User_Proxy):

Sure! Here's a Python function that computes the sum of numbers:

def sum_numbers(numbers):
    return sum(numbers)

You can use this function by passing a list of numbers to it, like this:

numbers = [1, 2, 3, 4, 5]
sum_result = sum_numbers(numbers)
print(sum_result)  # Output: 15

This function uses the sum() built-in function to calculate the sum of the numbers in the list. It returns the result as a single number.

If you want to compute the sum of an iterable such as a tuple or an iterator, you can also modify the function to accept those inputs instead of a list.

For example, if you have a tuple of numbers like (1, 2, 3, 4) you can pass it to the function like this:

numbers_tuple = (1, 2, 3, 4)
sum_result = sum_numbers(numbers_tuple)
print(sum_result)  # Output: 10

This will work for any iterable that contains numbers and you can modify the function accordingly to handle different types of inputs.


USING AUTO REPLY...xxxx

EXECUTING CODE BLOCK 0 (inferred language is python)...

EXECUTING CODE BLOCK 1 (inferred language is python)...
User_Proxy (to Assistant_Agent):

exitcode: 1 (execution failed)
Code output:

Traceback (most recent call last):
File "", line 2, in
sum_result = sum_numbers(numbers)
NameError: name 'sum_numbers' is not defined

The generated code is good, but execution fails because the code blocks are executed separately. With the feedback NameError: name 'sum_numbers' is not defined, GPT-3.5-turbo can usually merge the blocks into one, but weaker models will not merge them and keep failing. Either way, it would be better to execute the blocks correctly in the first place.
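The failure can be reproduced outside the framework: each block run in its own process starts with a fresh namespace, so names defined in block 0 are gone when block 1 runs. Concatenating the blocks (or reusing one namespace) makes the combined code succeed. A minimal sketch of the failure and the fix:

```python
# The two code blocks as extracted from the assistant's reply above.
block_0 = """
def sum_numbers(numbers):
    return sum(numbers)
"""
block_1 = """
numbers = [1, 2, 3, 4, 5]
sum_result = sum_numbers(numbers)
print(sum_result)
"""

# Running block_1 in a fresh namespace mimics separate subprocess execution:
# it fails with NameError because sum_numbers was defined in block_0's namespace.
try:
    exec(block_1, {})
except NameError as e:
    print(f"separate execution: {e}")

# Concatenating the blocks (or sharing one namespace) restores the state.
shared = {}
exec(block_0 + block_1, shared)  # prints 15
```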

@thinkall thinkall added the code-execution execute generated code label Nov 26, 2023
@victordibia (Collaborator):

Good observation here.
My initial intuition is that the model thinks the code being generated is part of a human-in-the-loop chat (which is expected for a chat-finetuned model), where the human will aggregate and execute the blocks. One solution might be to improve prompting (e.g., asking the model to return full executable blocks of code, maybe with some few-shot examples).

I recall that @afourney also noted a few cases where the order of code blocks had some impact on the results (e.g. installing deps before code)

@afourney (Member) commented Nov 28, 2023:

Yes @victordibia that is issue #430. I think prompting can help, but the existing prompt is pretty strong. It states:

"The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes."

I think one problem is that the system prompt sits at the very top of the conversation and can be forgotten in longer conversations. Perhaps a floating system prompt would be better (moving it dynamically to just before generation).
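A floating system prompt can be sketched as a message-list transform applied right before each completion call: pull the system message out of its original position and re-append it at the end, closest to the generation. The helper below is a hypothetical illustration, not AutoGen's API:

```python
def float_system_prompt(messages):
    """Move any system messages to the end of the message list, so that
    long conversations don't push the instructions out of the model's focus."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return rest + system  # system prompt now sits right before generation

history = [
    {"role": "system", "content": "Always return one self-contained code block."},
    {"role": "user", "content": "Write a sum function."},
    {"role": "assistant", "content": "def sum_numbers(numbers): ..."},
]
print(float_system_prompt(history)[-1]["role"])  # system
```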

@thinkall (Collaborator, Author):

Thanks @victordibia, @afourney. It seems that many weaker models will not follow the prompt as expected.
It would be great if we could merge the code blocks from a single response in a post-processing step, but this is a non-trivial process.

@afourney (Member) commented Nov 28, 2023:

What if, instead of executing code, we have the user proxy return a static message when the extracted block count is > 1 (and languages match). Something like "Please consolidate this into only one self-contained code block."

This would result in an extra call, but would use the LLMs coding abilities to hopefully correctly synthesize the code.
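One hedged sketch of that guard: extract fenced code blocks from the reply and, when more than one block shares a language, short-circuit with the static consolidation message instead of executing. The regex and helper below are illustrative, not AutoGen's actual implementation:

```python
import re
from collections import Counter

CODE_BLOCK_RE = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)
CONSOLIDATE_MSG = "Please consolidate this into only one self-contained code block."

def pre_execution_check(reply: str):
    """Return the static consolidation message if the reply contains more than
    one code block of the same language; otherwise None (proceed to execute)."""
    langs = Counter(lang or "python" for lang, _ in CODE_BLOCK_RE.findall(reply))
    if any(count > 1 for count in langs.values()):
        return CONSOLIDATE_MSG
    return None

reply = "```python\ndef f(): ...\n```\nand then\n```python\nprint(f())\n```"
print(pre_execution_check(reply))  # the consolidation message
```

This spends one extra round trip when triggered, but delegates the merging to the model itself, as suggested above.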

@thinkall (Collaborator, Author):

> What if, instead of executing code, we have the user proxy return a static message when the extracted block count is > 1 (and languages match). Something like "Please consolidate this into only one self-contained code block."
>
> This would result in an extra call, but would use the LLMs coding abilities to hopefully correctly synthesize the code.

Sounds good! Maybe we can create a function to consolidate the code, and have that function call the LLM to do it. That way, we could save some token usage.

@jackgerrits (Member):

Is this solved by using the stateful jupyter code executor?

@thinkall (Collaborator, Author):

> Is this solved by using the stateful jupyter code executor?

I get AttributeError: 'WebSocket' object has no attribute 'send_text'; the failing code is

self._websocket.send_text(json.dumps(message))

Replacing send_text with send worked for me.

@ekzhu (Collaborator) commented Apr 17, 2024:

@jackgerrits please see the above comment.

@jackgerrits (Member):

> Is this solved by using the stateful jupyter code executor?
>
> I get AttributeError: 'WebSocket' object has no attribute 'send_text', the code is
>
> self._websocket.send_text(json.dumps(message))
>
> Replace send_text with send worked for me.

What version of websocket-client is installed?

@thinkall (Collaborator, Author):

> Is this solved by using the stateful jupyter code executor?
>
> I get AttributeError: 'WebSocket' object has no attribute 'send_text', the code is
>
> self._websocket.send_text(json.dumps(message))
>
> Replace send_text with send worked for me.
>
> What version of websocket-client is installed?

websocket-client 1.6.4
websockets 12.0

@jackgerrits (Member):

Could you retry using websocket-client==1.7.0?

@thinkall (Collaborator, Author):

> Could you retry using websocket-client==1.7.0?

1.7.0 works well. I checked the websocket-client source; send_text was added in 1.7.0. Since send works in our case as well, I'd suggest using send instead of send_text, or at least updating the extras in setup.py.
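Until the dependency pin is settled, the call site could also hedge over the API difference: send_text only exists in websocket-client >= 1.7.0, while send is available in both. A sketch of such a version-tolerant wrapper, with a stub class standing in for the real WebSocket object:

```python
class OldWebSocket:
    """Stub mimicking websocket-client < 1.7.0: it has send but no send_text."""
    def send(self, payload):
        return f"sent: {payload}"

def send_text_compat(ws, payload: str):
    """Prefer send_text (added in websocket-client 1.7.0), falling back to
    send for older releases where the method does not exist."""
    sender = getattr(ws, "send_text", None) or ws.send
    return sender(payload)

print(send_text_compat(OldWebSocket(), '{"msg": "hi"}'))  # sent: {"msg": "hi"}
```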

@rysweet rysweet added 0.2 Issues which were filed before re-arch to 0.4 needs-triage labels Oct 2, 2024
@rysweet (Collaborator) commented Oct 18, 2024:

Closing as resolved.

@rysweet rysweet closed this as completed Oct 18, 2024