support ask_review by llm #959

orange-crow · 2024-03-05T01:57:55Z

add ask_review by llm, and adjust the generated code to fit the jupyter notebook.

Features

Add 3 review_type options: ["human", "llm", "confirm_all"].

Result

If the generated code executes successfully but does not meet the task requirements, the LLM review can detect the mismatch and suggest modifications. For example:

@pytest.mark.asyncio
async def test_ask_review_llm():
    context = [
        Message("Train a model to predict wine class using the training set."),
        Message(
               """
               from sklearn.datasets import load_wine
               wine_data = load_wine()
               plt.hist(wine_data.target, bins=len(wine_data.target_names))
               plt.xlabel('Class')
               plt.ylabel('Number of Samples')
               plt.title('Distribution of Wine Classes')
               plt.xticks(range(len(wine_data.target_names)), wine_data.target_names)
               plt.show()
               """
        ),
    ]
    rsp, confirmed = await AskReview().run(context, review_type="llm")
    assert rsp.startswith(("redo", "change"))   # -> True
    assert not confirmed                        # -> True
    print(rsp)
    # ```
    # redo the task, the provided code only includes data loading and visualization, but does not include any steps related 
    # to training a model.
    # ```
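The assertions above rely on the review reply starting with "redo" or "change". A minimal sketch of how such a reply could be split into an action and its feedback follows; `parse_review` is a hypothetical helper, and treating any other reply as a confirmation is an assumption for illustration only.

```python
from typing import Tuple


def parse_review(rsp: str) -> Tuple[str, str, bool]:
    """Split a review reply into (action, feedback, confirmed)."""
    text = rsp.strip()
    lowered = text.lower()
    for action in ("redo", "change"):
        if lowered.startswith(action):
            # Everything after the keyword is treated as reviewer feedback.
            feedback = text[len(action):].lstrip(" ,:")
            return action, feedback, False
    # Assumption: anything else is treated as approval.
    return "confirm", text, True


if __name__ == "__main__":
    action, feedback, confirmed = parse_review(
        "redo the task, the provided code only includes data loading and visualization"
    )
    print(action, confirmed)
    print(feedback)
```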

codecov-commenter · 2024-03-05T02:10:17Z

Codecov Report

Attention: Patch coverage is 92.30769%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 82.73%. Comparing base (0271cd7) to head (5ff3cd1).
Report is 17 commits behind head on code_interpreter.

Files	Patch %	Lines
metagpt/actions/di/ask_review.py	91.30%	2 Missing ⚠️
metagpt/roles/di/data_interpreter.py	75.00%	1 Missing ⚠️
metagpt/strategy/planner.py	93.33%	1 Missing ⚠️


Additional details and impacted files

@@                 Coverage Diff                  @@
##           code_interpreter     #959      +/-   ##
====================================================
+ Coverage             82.70%   82.73%   +0.03%     
====================================================
  Files                   223      223              
  Lines                 13129    13164      +35     
====================================================
+ Hits                  10858    10891      +33     
- Misses                 2271     2273       +2
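The percentages in the coverage diff follow directly from the hits and line counts in the table. A small sketch of that arithmetic, where `coverage_pct` is a name introduced here for illustration:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of total lines."""
    return 100.0 * hits / lines


base = coverage_pct(10858, 13129)  # base (0271cd7)
head = coverage_pct(10891, 13164)  # head (5ff3cd1)

print(f"base={base:.2f}% head={head:.2f}% delta={head - base:+.2f}%")
# -> base=82.70% head=82.73% delta=+0.03%

# Misses are simply lines minus hits.
print("misses:", 13129 - 10858, 13164 - 10891)
# -> misses: 2271 2273
```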


metagpt/actions/mi/ask_review.py

tests/metagpt/actions/mi/test_ask_review.py

garylin2099 · 2024-03-05T02:26:27Z

metagpt/strategy/planner.py

+            if confirmed and task_result and task_result.code:
+                review_msg = (
+                    "The code was executed successfully. Please review the code and execution results to evaluate "
+                    "whether the task was completed. "
+                    f"Execution results are: {task_result.result}"
+                )
+            elif not confirmed and task_result and task_result.code:
+                review_msg = (
+                    "The code execution failed. Please reflect on the reason for the failure based on the above content "
+                    "and try another method to generate new code. "
+                    f"Execution results are: {task_result.result}"
+                )
+            else:
+                review_msg = "Please review the above content."


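I suggest writing all of this into a single prompt; there is no need to branch conditionally in the code. From the context, the LLM should be able to tell whether the preceding code succeeded or failed.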
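One way to picture the reviewer's suggestion is a single prompt that covers both outcomes and leaves the success-or-failure judgment to the LLM. This is only a sketch: `REVIEW_PROMPT` and `build_review_msg` are hypothetical names and the wording is illustrative, not the text that was eventually merged.

```python
from typing import Optional

# One template for both outcomes; the LLM infers success or failure from context.
REVIEW_PROMPT = (
    "Review the code and its execution result shown above. "
    "If execution succeeded, judge whether the task was completed. "
    "If execution failed, reflect on the cause and suggest another approach.\n"
    "Execution result: {result}"
)


def build_review_msg(result: Optional[str]) -> str:
    """Build the review message without branching on success or failure."""
    if not result:
        # Nothing was executed, so there is no result to attach.
        return "Please review the content above."
    return REVIEW_PROMPT.format(result=result)


if __name__ == "__main__":
    print(build_review_msg("NameError: name 'plt' is not defined"))
```

The only remaining branch guards against a missing result; the success and failure cases share one message.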

garylin2099 · 2024-03-05T02:32:26Z

metagpt/actions/mi/ask_review.py

+        context: list[Message] = [],
+        plan: Plan = None,
+        trigger: str = ReviewConst.TASK_REVIEW_TRIGGER,
+        review_type: Literal["human", "llm", "confirm_all"] = "human",


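confirm_all could be renamed to disabled. Also, this adds a new parameter here, but I don't see the upper-level planner and interpreter introducing it. You could remove auto_run, use a review_type attribute instead, and pass it down through the layers.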

Should auto_run in the planner and interpreter be removed as well?
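The idea of replacing auto_run with a review_type attribute that is passed down can be sketched as below. The `Interpreter` and `Planner` classes here are simplified stand-ins written for this example, not the MetaGPT classes, and the "disabled" value follows the rename proposed in the review.

```python
from dataclasses import dataclass, field
from typing import Literal

ReviewType = Literal["human", "llm", "disabled"]


@dataclass
class Planner:
    """Simplified stand-in: decides whether a review step is needed."""
    review_type: ReviewType = "disabled"

    def needs_review(self) -> bool:
        return self.review_type != "disabled"


@dataclass
class Interpreter:
    """Simplified stand-in: owns a Planner and forwards review_type to it."""
    review_type: ReviewType = "disabled"
    planner: Planner = field(init=False)

    def __post_init__(self) -> None:
        # Pass the setting down so there is one source of truth.
        self.planner = Planner(review_type=self.review_type)


if __name__ == "__main__":
    print(Interpreter().planner.needs_review())
    print(Interpreter(review_type="llm").planner.review_type)
```

With a single attribute set at the top level, a separate boolean such as auto_run is no longer needed to express "skip review".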

garylin2099 · 2024-03-05T02:41:07Z

add ask_review by llm, and adjust the generated code to fit the jupyter notebook.

Features

Add 3 review_type options: ["human", "llm", "confirm_all"].
Update DEFAULT_SYSTEM_MSG in BaseWriteAnalysisCode to make it clear that the executor is a Jupyter notebook.

Result

If the generated code executes successfully but does not meet the task requirements, the LLM review can detect the mismatch and suggest modifications. For example:

@pytest.mark.asyncio
async def test_ask_review_llm():
    context = [
        Message("Train a model to predict wine class using the training set."),
        Message(
               """
               from sklearn.datasets import load_wine
               wine_data = load_wine()
               plt.hist(wine_data.target, bins=len(wine_data.target_names))
               plt.xlabel('Class')
               plt.ylabel('Number of Samples')
               plt.title('Distribution of Wine Classes')
               plt.xticks(range(len(wine_data.target_names)), wine_data.target_names)
               plt.show()
               """
        ),
    ]
    rsp, confirmed = await AskReview().run(context, review_type="llm")
    assert rsp.startswith(("redo", "change"))   # -> True
    assert not confirmed                        # -> True
    print(rsp)
    # ```
    # change task current task, split the task into two separate tasks: 
    # 1. Train a model to predict wine class using the training set.
    # 2. Visualize the distribution of wine classes with a histogram.
    # ```

Prefer a more practical example. Review is useful when errors occur. Perhaps give an example showing how it handles errors, such as suggesting "redo" with feedback, or "change" with updated current task instruction?


- Remove `confirm_all` parameter and use `review_type` instead
- Consolidate prompts into a single one for better context
- Add tests for the changes

This commit addresses the code review comments:
1. Removed the `confirm_all` parameter and introduced the `review_type` attribute which is propagated down from the planner and interpreter layers. The `auto_run` parameter has been removed as well.
2. Consolidated the prompts into a single one to provide better context to the LLM about the state of the previous code execution.

orange-crow had a problem deploying to unittest March 5, 2024 01:57 — with GitHub Actions Failure

orange-crow changed the title ~~support ask_review by llm, and adjust the generated code to fit the j…~~ support ask_review by llm, and adjust the generated code to fit the jupyter notebook. Mar 5, 2024

garylin2099 reviewed Mar 5, 2024


geekan assigned garylin2099 Mar 5, 2024

geekan added the enhancement New feature or request label Mar 5, 2024

orange-crow had a problem deploying to unittest March 6, 2024 02:54 — with GitHub Actions Failure

orange-crow had a problem deploying to unittest March 8, 2024 08:00 — with GitHub Actions Failure

orange-crow added 5 commits March 16, 2024 17:02

support ask_review by llm, and adjust the generated code to fit the j…

fbc0d97

…upyter notebook. - add 3 review_type : ["human", "llm", "confirm_all"]; - update DEFAULT_SYSTEM_MSG in BaseWriteAnalysisCode, Make it clear that the executor is jupyter notebook.

Refactor sys_msg usage and response handling

81944de

- Refine sys_msg as a class ReviewConst variable SYS_MSG - Use rsp.strip() instead of removing \n explicitly - Rename _rsp to llm_rsp for clarity - Correct indentation

chore.

1111c5f

refine: set default value of review_type as disabled.

30641fe

orange-crow force-pushed the feature_llm_review branch from f4f68b1 to 30641fe Compare March 16, 2024 14:15

orange-crow had a problem deploying to unittest March 16, 2024 14:15 — with GitHub Actions Failure

feat: add PLAN_REVIEW_INSTRUCTION.

5d5ddd3

orange-crow had a problem deploying to unittest March 16, 2024 16:55 — with GitHub Actions Failure

chore.

27c6ffc

orange-crow had a problem deploying to unittest March 16, 2024 16:56 — with GitHub Actions Failure

feature: review plan by llm.

33a97f9

orange-crow had a problem deploying to unittest March 18, 2024 04:31 — with GitHub Actions Failure

refine PLAN_REVIEW_INSTRUCTION.

4fe1e66

orange-crow had a problem deploying to unittest March 18, 2024 05:54 — with GitHub Actions Failure

refine PLAN_REVIEW_INSTRUCTION.

508b046

orange-crow had a problem deploying to unittest March 18, 2024 07:16 — with GitHub Actions Failure

orange-crow changed the title ~~support ask_review by llm, and adjust the generated code to fit the jupyter notebook.~~ support ask_review by llm Mar 19, 2024

refine: system message for review code, task.

e2d88cb

orange-crow had a problem deploying to unittest March 19, 2024 07:24 — with GitHub Actions Failure

chore.

5ff3cd1

orange-crow had a problem deploying to unittest March 19, 2024 07:29 — with GitHub Actions Failure
