
Improving Resilience of MRKL Agent #3269

Closed

Conversation

svdeepak99
Contributor

Finally solved the ValueError(f"Could not parse LLM output: `{llm_output}`") error that occurs whenever the LLM (especially gpt-3.5-turbo) does not follow the MRKL agent's format when returning "Action:" & "Action Input:".

Note: If this pull request gets approved, I can then apply this feature to the react, self_ask_with_search & conversational agents too.

The way I am solving this error is by responding back to the LLM with the message "Invalid Format: Missing 'Action:' after 'Thought:'" or "Invalid Format: Missing 'Action Input:' after 'Action:'" whenever "Action:" or "Action Input:" is missing from the LLM output, respectively.
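The check described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code; the function and constant names are made up for the example:

```python
import re

# Illustrative constants matching the corrective messages described above.
MISSING_ACTION = "Invalid Format: Missing 'Action:' after 'Thought:'"
MISSING_ACTION_INPUT = "Invalid Format: Missing 'Action Input:' after 'Action:'"

def parse_with_retry_message(llm_output: str):
    """Return (tool, tool_input) when the format is followed, or
    ("observation", corrective_message) to feed back to the LLM."""
    if "Final Answer:" in llm_output:
        return ("final_answer", llm_output.split("Final Answer:")[-1].strip())
    match = re.search(
        r"Action\s*:\s*(.*?)\nAction\s*Input\s*:\s*(.*)", llm_output, re.DOTALL
    )
    if match:
        return (match.group(1).strip(), match.group(2).strip())
    if "Action:" not in llm_output:
        # The model never emitted an Action line at all.
        return ("observation", MISSING_ACTION)
    # An Action line exists, but no Action Input follows it.
    return ("observation", MISSING_ACTION_INPUT)
```

Instead of raising on a malformed response, the parser hands back a corrective string that the agent can present as the next Observation.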

The following are two errors that kept coming from the Pandas DataFrame agent and the VectorStore agent, respectively (both of them use the MRKL agent):

Error-1: (pandas dataframe agent - error message at the end)

df_agent.run("How many policies do Aaron Pope have, contain 'Homeowners'?")

> Entering new AgentExecutor chain...
Thought: We need to filter the dataframe to only include policies for Aaron Pope that contain 'Homeowners', and then count the number of rows.
Action: python_repl_ast
Action Input:
```
aaron_pope_df = df[(df['Insured Name'] == 'Aaron Pope') & (df['Policy Type'].str.contains('Homeowners'))]
len(aaron_pope_df)
```
Observation: KeyError: 'Insured Name'
Thought:The error message indicates that the column 'Insured Name' does not exist in the dataframe. I need to check the column names to make sure I am using the correct ones.
Action: python_repl_ast
Action Input: `print(df.columns)`
Observation: Index(['Account ID', 'Account Name', 'Policy ID', 'Policy Name',
       'Description of Risk', 'Status'],
      dtype='object')
Thought:The column name for the insured name is actually 'Policy Name'. I need to update my filter.
Action: python_repl_ast
Action Input:
```
aaron_pope_df = df[(df['Policy Name'] == 'Aaron Pope') & (df['Policy Type'].str.contains('Homeowners'))]
len(aaron_pope_df)
```
Observation: Traceback (most recent call last): KeyError: 'Policy Type'
Thought:The error message indicates that the column 'Policy Type' does not exist in the dataframe. I need to check the column names to make sure I am using the correct ones.
Action: python_repl_ast
Action Input: `print(df.columns)`
Observation: Index(['Account ID', 'Account Name', 'Policy ID', 'Policy Name',
       'Description of Risk', 'Status'],
      dtype='object')
Thought:The column name for the policy type is actually 'Description of Risk'. I need to update my filter.
Action: python_repl_ast
Action Input:
```
aaron_pope_df = df[(df['Policy Name'] == 'Aaron Pope') & (df['Description of Risk'].str.contains('Homeowners'))]
len(aaron_pope_df)
```
Observation: 0
Thought:Traceback (most recent call last):
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\IPython\core\interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-9ff9c120b0a9>", line 1, in <module>
    df_agent.run("How many policies do Aaron Pope have, contain 'Homeowners'?")
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\chains\base.py", line 213, in run
    return self(args[0])[self.output_keys[0]]
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\chains\base.py", line 116, in __call__
    raise e
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\chains\base.py", line 113, in __call__
    outputs = self._call(inputs)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 812, in _call
    next_step_output = self._take_next_step(
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 692, in _take_next_step
    output = self.agent.plan(intermediate_steps, **inputs)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 403, in plan
    action = self._get_next_action(full_inputs)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 365, in _get_next_action
    parsed_output = self._extract_tool_and_input(full_output)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\mrkl\base.py", line 140, in _extract_tool_and_input
    return get_action_and_input(text)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\mrkl\base.py", line 48, in get_action_and_input
    raise ValueError(f"Could not parse LLM output: `{llm_output}`")
ValueError: Could not parse LLM output: `There are no policies for Aaron Pope that contain 'Homeowners'.`

Error-2: (VectorStore agent - error message at the end)

VectorStore_agent.run("Do condo units have pools? (as mentioned in the insurance docs)")

> Entering new AgentExecutor chain...

This question seems to be related to insurance coverage for condo units. I should use the insurance_quotes_faq tool to find the answer.
Action: insurance_quotes_faq
Action Input: "Are condo unit pools covered under insurance policies?"
This question seems to be related to insurance coverage for condo units. I should use the insurance_quotes_faq tool to find the answer.
Observation:  No, condo unit pools are not typically covered under insurance policies.
Thought:
Traceback (most recent call last):
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\IPython\core\interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-92714a8d910d>", line 1, in <module>
    vs_agent.run("Do condo units have pools? (as mentioned in the insurance docs)")
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\chains\base.py", line 213, in run
    return self(args[0])[self.output_keys[0]]
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\chains\base.py", line 116, in __call__
    raise e
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\chains\base.py", line 113, in __call__
    outputs = self._call(inputs)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 812, in _call
    next_step_output = self._take_next_step(
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 692, in _take_next_step
    output = self.agent.plan(intermediate_steps, **inputs)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 403, in plan
    action = self._get_next_action(full_inputs)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\agent.py", line 365, in _get_next_action
    parsed_output = self._extract_tool_and_input(full_output)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\mrkl\base.py", line 140, in _extract_tool_and_input
    return get_action_and_input(text)
  File "D:\Deepak\mambaforge\envs\torch_1_13\lib\site-packages\langchain\agents\mrkl\base.py", line 48, in get_action_and_input
    raise ValueError(f"Could not parse LLM output: `{llm_output}`")
ValueError: Could not parse LLM output: `The answer was straightforward and did not require any sources.`

Successful run after making this pull request's change (pandas dataframe agent, after modifying the MRKL agent):

df_agent.run("How many policies that Aaron Pope have, contain 'Homeowners'?")

> Entering new AgentExecutor chain...
Thought: We need to filter the dataframe to only include policies for Aaron Pope and then count how many of those policies contain 'Homeowners'.
Action: 
python_repl_ast
```
df[df['Account Name'] == 'Aaron Pope']['Policy Type'].str.contains('Homeowners').sum()
```
I need to provide the input for the action.
Action: python_repl_ast
Action Input: df[df['Account Name'] == 'Aaron Pope']['Policy Type'].str.contains('Homeowners').sum()
Observation: KeyError: 'Policy Type'
Thought: The column name 'Policy Type' does not exist in the dataframe. I need to check the column names to see what the correct name is.
Action: python_repl_ast
Action Input: df.columns
Observation: Index(['Account ID', 'Account Name', 'Policy ID', 'Policy Name',
       'Description of Risk', 'Status'],
      dtype='object')
Thought: The correct column name for policy type is 'Policy Name'. I need to update the input for the action.
Action: python_repl_ast
Action Input: df[df['Account Name'] == 'Aaron Pope']['Policy Name'].str.contains('Homeowners').sum()
Observation: 4
Thought: The final answer is that Aaron Pope has 4 policies that contain 'Homeowners'. 
Final Answer: 4
> Finished chain.
Out[4]: '4'

I also ran the callback function and logged the final prompt sent to the LLM (gpt-3.5-turbo), along with its output:

You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
You should use the tools below to answer the question posed of you:
python_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [python_repl_ast]
Action Input: the input to the action
Observation: the result of the action
 (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
This is the result of `print(df.head())`:
        Account ID  ...     Status
0  0015x00002GS9tM  ...  Cancelled
1  0015x00002GSF8u  ...     Active
2  0015x00002GSF8u  ...    Expired
3  0015x00002GSF8u  ...     Active
4  0015x000025lH0H  ...  Cancelled
[5 rows x 6 columns]
Begin!
Question: How many policies that Aaron Pope have, contain 'Homeowners'?
Thought: We need to filter the dataframe to only include policies for Aaron Pope and then count how many of those policies contain 'Homeowners'.
Action: 
python_repl_ast
```
df[df['Account Name'] == 'Aaron Pope']['Policy Type'].str.contains('Homeowners').sum()
```
Observation: Invalid Format: Missing 'Action Input:' after 'Action:'
Thought:I need to provide the input for the action.
Action: python_repl_ast
Action Input: df[df['Account Name'] == 'Aaron Pope']['Policy Type'].str.contains('Homeowners').sum()
Observation: KeyError: 'Policy Type'
Thought:The column name 'Policy Type' does not exist in the dataframe. I need to check the column names to see what the correct name is.
Action: python_repl_ast
Action Input: df.columns
Observation: Index(['Account ID', 'Account Name', 'Policy ID', 'Policy Name',
       'Description of Risk', 'Status'],
      dtype='object')
Thought:The correct column name for policy type is 'Policy Name'. I need to update the input for the action.
Action: python_repl_ast
Action Input: df[df['Account Name'] == 'Aaron Pope']['Policy Name'].str.contains('Homeowners').sum()
Observation: 4
Thought:The final answer is that Aaron Pope has 4 policies that contain 'Homeowners'. 
Final Answer: 4

As you can see from the prompt log above, the model made the mistake of outputting the code without the Action Input: keyword. But after the error message Invalid Format: Missing 'Action Input:' after 'Action:' was sent back as an observation, the LLM self-corrected its output format in its next response, allowing the agent to progress to the final answer without errors.

Let me know your thoughts, and if this pull request gets approved, I can apply this feature to the other three agents as well.

Updated formatting by running the "poetry run black ." command.
Solved the `langchain\agents\agent.py:703: error: Incompatible types in assignment (expression has type "Union[str, Dict[Any, Any]]", variable has type "str")  [assignment]` error that was raised when running the `poetry run mypy .` command.
Updated test_bad_action_input_line() and test_bad_action_line() to expect a self-correction prompt instead of raising an exception.

In the commit (langchain-ai@b48bb11), I added the ability for the MRKL agent to communicate back to the LLM when the "Action: & Action Input:" format is not followed, and to get it self-corrected. This is very efficient in terms of the number of calls, since an additional LLM call is made only if the format is not followed, which would otherwise raise an OutputParserException.
@svdeepak99
Contributor Author

The code is ready to be merged now.

I made 3 new commits, ensuring that all tests & linting pass (I followed all instructions in the contributing guidelines). The last 3 commits include the following changes:

(i) Corrected a lint formatting issue in the mrkl/output_parser.py file.
(ii) Corrected a minor typecasting issue raised by poetry run mypy . command in the agents/agent.py file.
(iii) Updated the test_bad_action_input_line() and test_bad_action_line() cases in the test_mrkl.py file to expect a self-correction prompt instead of raising an exception.

As a summary: in this pull request, I added the ability for the MRKL agent to communicate back to the LLM if the "Action: & Action Input:" format is not followed, and get it self-corrected. This is very efficient in terms of the number of calls, since an additional LLM call is made only if the format is not followed, which would otherwise raise an OutputParserException.
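To illustrate why this costs at most one extra LLM call per malformed response, here is a minimal, hypothetical agent step; `take_step`, `parse`, and the callables passed in are stand-ins for this sketch, not LangChain APIs:

```python
CORRECTION = "Invalid Format: Missing 'Action Input:' after 'Action:'"

def parse(llm_output):
    """Return (tool, tool_input) if the format is followed, else None."""
    if "Action:" in llm_output and "Action Input:" in llm_output:
        tool = llm_output.split("Action:")[1].split("\n")[0].strip()
        tool_input = llm_output.split("Action Input:")[1].strip()
        return tool, tool_input
    return None

def take_step(llm, run_tool, scratchpad):
    """One agent iteration: call the LLM, then either run the tool or
    append the corrective message as the next Observation."""
    output = llm(scratchpad)
    parsed = parse(output)
    if parsed is None:
        # Format not followed: feed the correction back so the model
        # can self-correct on its next call.
        return scratchpad + output + f"\nObservation: {CORRECTION}\nThought:"
    tool, tool_input = parsed
    return scratchpad + output + f"\nObservation: {run_tool(tool, tool_input)}\nThought:"
```

On a well-formed response the loop proceeds exactly as before; the extra call happens only on the malformed branch.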

@Davidkloving
Contributor

This is a badly-needed improvement!

Contributor

@hwchase17 left a comment


Would a more general solution be to try/except on output parsing errors?

e.g.

```
try:
    output = self.agent.plan(intermediate_steps, **inputs)
except OutputParserException as e:
    if self.catch_errors:
        output = AgentAction("Error parsing", ...)
    else:
        raise e
```

@svdeepak99
Contributor Author

@hwchase17 I could do that. But the only issue with that approach is that AgentAction carries both the 'Action_Input' and the 'log' (two parameters in total).

In Action_Input, I would pass the error-correction message, such as Invalid Format: Missing 'Action Input:' after 'Action:'.
In log, we have to pass the raw output from the LLM, in order to attach it to the subsequent prompt history (for future LLM calls).

So that would require me to pass a dictionary or tuple, as in raise OutputParserException(tuple or dictionary here), in order to catch those two parameters in the code block you provided (method: https://stackoverflow.com/questions/63366430/pass-a-dictionary-in-try-except-clause). If I do that, I will also have to modify the default OutputParserException(f"Could not parse LLM output: `{text}`") string exception present in all the agents to a dict or tuple.

Do you want me to do that, or do you think of a more effective way to do it? (to pass both parameters during a parsing error)

Note: This wouldn't be a problem if AgentAction required only one of action_input or log. But since it needs both parameters for the code to function, the problem occurs with the template you provided.

@svdeepak99
Contributor Author

@hwchase17
Update: Since @Davidkloving and others have pointed out that this is a needed feature right now, I have fixed the minor lint error and pushed again, and the PR is ready to merge now. (In case the lint error still persists, please help me fix it, as I fixed every error that was raised locally.) Regarding my previous comment on your suggestion, I will try to implement it in another pull request based on your feedback.

@Davidkloving
Contributor

Davidkloving commented May 1, 2023

I actually worked around this issue by subclassing MRKLOutputParser and overriding parse to still return something when OutputParserException is raised. It seems that the LLMs I'm using are robust to this, and will adapt and continue with reasonable behavior.
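A minimal sketch of this workaround, with a stand-in base class so the example is self-contained (a real subclass would override langchain's MRKLOutputParser instead):

```python
class OutputParserException(ValueError):
    """Stand-in for langchain's parsing exception."""

class MRKLOutputParser:
    """Stand-in for langchain's MRKL parser: raises on malformed output."""
    def parse(self, text):
        raise OutputParserException(f"Could not parse LLM output: `{text}`")

class ForgivingMRKLOutputParser(MRKLOutputParser):
    """Return something usable instead of raising, so the chain continues."""
    def parse(self, text):
        try:
            return super().parse(text)
        except OutputParserException:
            # Treat unparseable text as a final answer rather than crashing;
            # robust LLMs tend to continue with reasonable behavior.
            return ("Final Answer", text)
```

This trades strictness for resilience: the chain never dies on a parse failure, at the cost of occasionally accepting free-form text as an answer.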

@svdeepak99
Contributor Author

@hwchase17, is there anything blocking this merge?

dev2049 pushed a commit that referenced this pull request May 22, 2023
This is a highly optimized update to the pull request
#3269

Summary:
1) Added the ability for the MRKL agent to self-solve the ValueError(f"Could not
parse LLM output: `{llm_output}`") error whenever the LLM (especially
gpt-3.5-turbo) does not follow the MRKL agent format when returning
"Action:" & "Action Input:".
2) The way this error is solved is by responding back to the LLM with
the message "Invalid Format: Missing 'Action:' after 'Thought:'" or
"Invalid Format: Missing 'Action Input:' after 'Action:'" whenever
"Action:" or "Action Input:" is missing from the LLM output,
respectively.

For a detailed explanation, look at the previous pull request.

New Updates:
1) Since @hwchase17 , requested in the previous PR to communicate the
self correction (error) message, using the OutputParserException, I have
added new ability to the OutputParserException class to store the
observation & previous llm_output in order to communicate it to the next
Agent's prompt. This is done, without breaking/modifying any of the
functionality OutputParserException previously performs (i.e.
OutputParserException can be used in the same way as before, without
passing any observation & previous llm_output too).

---------

Co-authored-by: Deepak S V <svdeepak99@users.noreply.github.com>
@svdeepak99
Contributor Author

Reopened this PR & merged in #5014

@svdeepak99 svdeepak99 closed this May 23, 2023