-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support ask_review by llm #959
base: code_interpreter
Are you sure you want to change the base?
support ask_review by llm #959
Conversation
Codecov ReportAttention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## code_interpreter #959 +/- ##
====================================================
+ Coverage 82.70% 82.73% +0.03%
====================================================
Files 223 223
Lines 13129 13164 +35
====================================================
+ Hits 10858 10891 +33
- Misses 2271 2273 +2 ☔ View full report in Codecov by Sentry. |
metagpt/strategy/planner.py
Outdated
if confirmed and task_result and task_result.code: | ||
review_msg = ( | ||
"The code was executed successfully. Please review the code and execution results to evaluate" | ||
f"execution results are: {task_result.result}" | ||
"whether the task was completed." | ||
) | ||
elif not confirmed and task_result and task_result.code: | ||
review_msg = ( | ||
"The code execution failed. Please reflect on the reason for the failure based on the above content" | ||
f"execution results are: {task_result.result}" | ||
"and try another method to generate new code." | ||
) | ||
else: | ||
review_msg = "Pleas review above content." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
建议全部写进一个prompt,没必要在代码上做条件分叉。从上下文,llm应该能知道前面的代码是成功了还是失败了
metagpt/actions/mi/ask_review.py
Outdated
context: list[Message] = [], | ||
plan: Plan = None, | ||
trigger: str = ReviewConst.TASK_REVIEW_TRIGGER, | ||
review_type: Literal["human", "llm", "confirm_all"] = "human", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirm_all可以改为disabled。另外,这里多了一个入参,上层planner和interpreter却没看到有引入。可以去掉auto_run,使用review_type属性,并向下做透传
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
planner和interpreter中的auto_run也删掉吗?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
对的
Prefer a more practical example. Review is useful when errors occur. Perhaps give an example showing how it handles errors, such as suggesting "redo" with feedback, or "change" with updated current task instruction? |
…upyter notebook. - add 3 review_type : ["human", "llm", "confirm_all"]; - update DEFAULT_SYSTEM_MSG in BaseWriteAnalysisCode, Make it clear that the executor is jupyter notebook.
- Refine sys_msg as a class ReviewConst variable SYS_MSG - Use rsp.strip() instead of removing \n explicitly - Rename _rsp to llm_rsp for clarity - Correct indentation
- Remove `confirm_all` parameter and use `review_type` instead - Consolidate prompts into a single one for better context - Add tests for the changes This commit addresses the code review comments: 1. Removed the `confirm_all` parameter and introduced the `review_type` attribute which is propagated down from the planner and interpreter layers. The `auto_run` parameter has been removed as well. 2. Consolidated the prompts into a single one to provide better context to the LLM about the state of the previous code execution.
f4f68b1
to
30641fe
Compare
add ask_review by llm, and adjust the generated code to fit the jupyter notebook.
Features
Result
If the generated code is executed successfully, but the code does not match the task requirements, it can be detected through
llm's ask review
, and it will make modification suggestions. The example is as follows,