
Is there a better way to execute the generated code? #768

Closed
thinkall opened this issue Nov 26, 2023 · 13 comments
Labels
0.2 Issues which were filed before re-arch to 0.4 code-execution execute generated code

Comments

@thinkall (Collaborator):

The current code execution logic has an issue when code is generated in several blocks and later blocks depend on earlier ones.

For example:

User_Proxy (to Assistant_Agent):

Write a python function to compute the sum of numbers.


Assistant_Agent (to User_Proxy):

Sure! Here's a Python function that computes the sum of numbers:

def sum_numbers(numbers):
    return sum(numbers)

You can use this function by passing a list of numbers to it, like this:

numbers = [1, 2, 3, 4, 5]
sum_result = sum_numbers(numbers)
print(sum_result)  # Output: 15

This function uses the sum() built-in function to calculate the sum of the numbers in the list. It returns the result as a single number.

If you want to compute the sum of an iterable such as a tuple or an iterator, you can also modify the function to accept those inputs instead of a list.

For example, if you have a tuple of numbers like (1, 2, 3, 4) you can pass it to the function like this:

numbers_tuple = (1, 2, 3, 4)
sum_result = sum_numbers(numbers_tuple)
print(sum_result)  # Output: 10

This will work for any iterable that contains numbers and you can modify the function accordingly to handle different types of inputs.


USING AUTO REPLY...xxxx

EXECUTING CODE BLOCK 0 (inferred language is python)...

EXECUTING CODE BLOCK 1 (inferred language is python)...
User_Proxy (to Assistant_Agent):

exitcode: 1 (execution failed)
Code output:

Traceback (most recent call last):
File "", line 2, in
sum_result = sum_numbers(numbers)
NameError: name 'sum_numbers' is not defined

The generated code is good, but execution fails because the code blocks are executed separately. With the feedback NameError: name 'sum_numbers' is not defined, GPT-3.5-turbo can usually merge the blocks into one, but weaker models will not merge them and keep failing. Either way, it would be better to execute the blocks correctly in the first place.
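The failure can be reproduced outside the framework: each block run in its own process starts with a fresh namespace, so names defined in block 0 are gone when block 1 runs. Concatenating the blocks (or reusing one namespace) makes the combined code succeed. A minimal sketch of the failure and the fix:

```python
# The two code blocks as extracted from the assistant's reply above.
block_0 = """
def sum_numbers(numbers):
    return sum(numbers)
"""
block_1 = """
numbers = [1, 2, 3, 4, 5]
sum_result = sum_numbers(numbers)
print(sum_result)
"""

# Running block_1 in a fresh namespace mimics separate subprocess execution:
# it fails with NameError because sum_numbers was defined in block_0's namespace.
try:
    exec(block_1, {})
except NameError as e:
    print(f"separate execution: {e}")

# Concatenating the blocks (or sharing one namespace) restores the state.
shared = {}
exec(block_0 + block_1, shared)  # prints 15
```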

@thinkall thinkall added the code-execution execute generated code label Nov 26, 2023
@victordibia (Collaborator):

Good observation here.
My initial intuition is that the model thinks the code being generated is part of a human-in-the-loop chat (which is expected for a chat-finetuned model), where the human will aggregate and execute the blocks. One solution might be to improve prompting (e.g., asking the model to return full executable blocks of code, maybe with some few-shot examples).

I recall that @afourney also noted a few cases where the order of code blocks had some impact on the results (e.g. installing deps before code)

@afourney (Member) commented Nov 28, 2023:

Yes @victordibia that is issue #430. I think prompting can help, but the existing prompt is pretty strong. It states:

"The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes."

I think one problem is that the system prompt sits at the very top of the conversation and can be forgotten in longer conversations. Perhaps a floating system prompt would be better (moving it dynamically to just before generation).
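A floating system prompt can be sketched as a message-list transform applied right before each completion call: pull the system message out of its original position and re-append it at the end, closest to the generation. The helper below is a hypothetical illustration, not AutoGen's API:

```python
def float_system_prompt(messages):
    """Move any system messages to the end of the message list, so that
    long conversations don't push the instructions out of the model's focus."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return rest + system  # system prompt now sits right before generation

history = [
    {"role": "system", "content": "Always return one self-contained code block."},
    {"role": "user", "content": "Write a sum function."},
    {"role": "assistant", "content": "def sum_numbers(numbers): ..."},
]
print(float_system_prompt(history)[-1]["role"])  # system
```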

@thinkall (Collaborator, Author):

Thanks @victordibia, @afourney. It seems that many weaker models will not follow the prompt as expected.
It would be great if we could merge the code blocks from a single response in a post-processing step, but this is a non-trivial process.

@afourney (Member) commented Nov 28, 2023:

What if, instead of executing code, we have the user proxy return a static message when the extracted block count is > 1 (and languages match). Something like "Please consolidate this into only one self-contained code block."

This would result in an extra call, but would use the LLMs coding abilities to hopefully correctly synthesize the code.
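One hedged sketch of that guard: extract fenced code blocks from the reply and, when more than one block shares a language, short-circuit with the static consolidation message instead of executing. The regex and helper below are illustrative, not AutoGen's actual implementation:

```python
import re
from collections import Counter

CODE_BLOCK_RE = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)
CONSOLIDATE_MSG = "Please consolidate this into only one self-contained code block."

def pre_execution_check(reply: str):
    """Return the static consolidation message if the reply contains more than
    one code block of the same language; otherwise None (proceed to execute)."""
    langs = Counter(lang or "python" for lang, _ in CODE_BLOCK_RE.findall(reply))
    if any(count > 1 for count in langs.values()):
        return CONSOLIDATE_MSG
    return None

reply = "```python\ndef f(): ...\n```\nand then\n```python\nprint(f())\n```"
print(pre_execution_check(reply))  # the consolidation message
```

This spends one extra round trip when triggered, but delegates the merging to the model itself, as suggested above.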

@thinkall (Collaborator, Author):

> What if, instead of executing code, we have the user proxy return a static message when the extracted block count is > 1 (and languages match). Something like "Please consolidate this into only one self-contained code block."
>
> This would result in an extra call, but would use the LLMs coding abilities to hopefully correctly synthesize the code.

Sounds good! Maybe we can create a function to consolidate the code, and have that function call the LLM to do it. That way, we could save some token usage.

@jackgerrits (Member):

Is this solved by using the stateful jupyter code executor?

@thinkall (Collaborator, Author):

> Is this solved by using the stateful jupyter code executor?

I get AttributeError: 'WebSocket' object has no attribute 'send_text'; the failing code is

self._websocket.send_text(json.dumps(message))

Replacing send_text with send worked for me.

@ekzhu (Collaborator) commented Apr 17, 2024:

@jackgerrits please see the above comment.

@jackgerrits (Member):

> Is this solved by using the stateful jupyter code executor?
>
> I get AttributeError: 'WebSocket' object has no attribute 'send_text', the code is
>
> self._websocket.send_text(json.dumps(message))
>
> Replace send_text with send worked for me.

What version of websocket-client is installed?

@thinkall (Collaborator, Author):

> Is this solved by using the stateful jupyter code executor?
>
> I get AttributeError: 'WebSocket' object has no attribute 'send_text', the code is
>
> self._websocket.send_text(json.dumps(message))
>
> Replace send_text with send worked for me.
>
> What version of websocket-client is installed?

websocket-client 1.6.4
websockets 12.0

@jackgerrits (Member):

Could you retry using websocket-client==1.7.0?

@thinkall (Collaborator, Author):

> Could you retry using websocket-client==1.7.0?

1.7.0 works well. I checked the websocket-client source; send_text was added in 1.7.0. Since send works in our case as well, I'd suggest using send instead of send_text, or at least updating the extras in setup.py.
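Until the dependency pin is settled, the call site could also hedge over the API difference: send_text only exists in websocket-client >= 1.7.0, while send is available in both. A sketch of such a version-tolerant wrapper, with a stub class standing in for the real WebSocket object:

```python
class OldWebSocket:
    """Stub mimicking websocket-client < 1.7.0: it has send but no send_text."""
    def send(self, payload):
        return f"sent: {payload}"

def send_text_compat(ws, payload: str):
    """Prefer send_text (added in websocket-client 1.7.0), falling back to
    send for older releases where the method does not exist."""
    sender = getattr(ws, "send_text", None) or ws.send
    return sender(payload)

print(send_text_compat(OldWebSocket(), '{"msg": "hi"}'))  # sent: {"msg": "hi"}
```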

@rysweet rysweet added 0.2 Issues which were filed before re-arch to 0.4 needs-triage labels Oct 2, 2024
@rysweet (Collaborator) commented Oct 18, 2024:

Closing as resolved.

@rysweet rysweet closed this as completed Oct 18, 2024