Typing of tool input in AgentAction is broken for StructuredTool, input error handling and StructuredChatOutputParser #2710
Comments
|
🤖 Thank you for your detailed analysis and suggestions on improving type safety and consistency in the LangChainJS framework. I agree with your points and here's how we can address them:
async call(
arg: z.input<T>,
configArg?: Callbacks | RunnableConfig,
/** @deprecated */
tags?: string[]
): Promise<string> {
let parsed;
try {
parsed = await this.schema.parseAsync(arg);
} catch (e) {
this.handleError(e);
}
// ...
}
class AgentAction {
toolInput: z.input<T>;
// ...
}
class AgentActionExecutor {
// ...
private handleParsingErrors(e: Error): void {
// Handle parsing errors here
}
private _call(action: AgentAction): void {
try {
// ...
} catch (e) {
this.handleParsingErrors(e);
}
}
}
class StructuredChatOutputParser {
// ...
parse(output: string): z.input<T> {
// Parse the output to JSON here
}
}
Please note that these changes will require updating all places where these classes and their methods are used. Also, these changes will make the code more type safe, but they may not cover all edge cases. Therefore, thorough testing is recommended after implementing these changes. I hope this helps! If you have any further questions or need more clarification, please let me know.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot. |
|
@dosu-bot Your recommendations are exactly what I already proposed. |
|
🤖 Thank you for your detailed proposal. Here are my suggestions on how to implement these changes without causing breaking changes:
Please note that these are just suggestions and might need to be adapted to your specific use case. Also, remember to thoroughly test all changes to ensure that they do not introduce any new bugs. |
|
I'm also experiencing this issue. Any news on that? |
|
Hi, @RunOrVeith, I'm helping the langchainjs team manage their backlog and am marking this issue as stale.
From what I understand, the issue you raised highlighted type inconsistency and parsing errors in the LangChainJS framework. Proposed improvements included modifying the
Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, please let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.
Thank you! |
This issue requires a bit of a lengthy explanation, but the overall problem is:
TLDR:
The types of StructuredTool, AgentAction, the parsing error handling in AgentExecutor, and StructuredChatOutputParser don't fit together, and the code only type checks more or less by accident at the moment.
See also the summary at the bottom.
Explanation
I am going to explain the issue with an example. Let's assume we have a StructuredTool that can click on an element on a web page:
When you look at the signature and implementation of StructuredTool.call, it seems like we already know what the input is, but in reality, the validation only happens inside of that function:
In our case, our schema cannot be a string, so this simplifies to the type { selector: string }.
The signature reads like we have already made some assumptions about the input, whereas in reality all we have is a signature that looks like this:
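A minimal sketch of this point, using plain TypeScript types in place of zod so that it is self-contained. It assumes the argument type of StructuredTool.call has the shape (z.output<T> extends string ? string : never) | z.input<T>. The names CallArg, ClickInput and validateClickInput are hypothetical and not part of LangChain:

```typescript
// Stand-in for the input/output type of a zod object schema { selector: string }.
type ClickInput = { selector: string };

// Assumed shape of the argument type: a string is only allowed if the
// schema's output type is itself a string.
type CallArg<Out, In> = (Out extends string ? string : never) | In;

// For an object schema the string branch collapses to never, so the
// argument type is exactly ClickInput.
type ClickCallArg = CallArg<ClickInput, ClickInput>;
const looksValidated: ClickCallArg = { selector: "myCoolButton" };

// What a caller can actually guarantee before validation has run: nothing.
// The honest parameter type is therefore unknown, and the check happens inside.
function validateClickInput(arg: unknown): ClickInput {
  if (typeof arg === "object" && arg !== null) {
    const selector = (arg as Record<string, unknown>).selector;
    if (typeof selector === "string") {
      return { selector };
    }
  }
  throw new Error("Tool input does not match the schema");
}
```

The declared type promises an already validated object, while the runtime check shows that the function has to be prepared for any value.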
But even that has more issues:
For z.output<T> extends string to hold, something somewhere must have already decided that a string is an acceptable input format for the tool, and that we do not need to parse JSON. (A bare string by itself is not valid JSON.) This does not actually happen anywhere, so the case z.output<T> extends string can never be true. Also, z.string() is not a subtype of z.ZodObject, which is required by the generic constraint in StructuredTool.
This is where the OutputParser comes in. The part that we really care about is the parse method:
This is where the string that the model produces is parsed to JSON, and errors are thrown if it is not valid JSON.
A non-JSON string will throw a SyntaxError if passed into JSON.parse.
From parsing, we get an AgentAction (we can ignore AgentFinish for now) that looks like this:
toolInput is typed as string, even though we just used JSON.parse to get it!
Consider the positive case where the model produced output that matches our schema, let's say the string "{\"selector\": \"myCoolButton\"}".
Using JSON.parse, this will deserialize to an object { selector: "myCoolButton" }, and not a string.
But because JSON.parse's return type is any, the TypeScript compiler has no chance of realizing this.
To understand why this is troublesome, we need to look into the execution loop where the AgentActions are used to actually invoke the tool.
This happens here in AgentExecutor._call. I've split the relevant parts of the method into these two smaller methods and simplified them a bit to show my point:
This is where the parsing using the OutputParser comes in, and where its exceptions are handled.
You can see that in the case of an error, the toolInput field will always be a string (if this.handleParsingErrors is a function, the return type is also string).
But we have just seen above that in the non-error case toolInput will be parsed JSON!
This is inconsistent behavior: we never parse the output of handleParsingErrors to JSON, so we are now in a state where toolInput is sometimes a string, and sometimes parsed JSON.
The next step is to call the selected tool with the given input:
We only pass the previously computed output on to the tool in tool.call(action.toolInput)! We do not actually have any guarantees for the input types to the tool!
And in case this causes another error, we reuse the same function to handle parsing errors, which returns a string that is supposed to be the tool output in the error case.
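The inconsistency described above can be reproduced in isolation. The following is a simplified sketch, not the actual LangChain code: AgentActionLike, parseAction and planStep are hypothetical stand-ins, and the action / action_input keys and the "_Exception" tool name are assumptions about the format in use.

```typescript
// Simplified stand-in for AgentAction: toolInput is declared as string.
interface AgentActionLike {
  tool: string;
  toolInput: string;
  log: string;
}

// Stand-in for the output parser. JSON.parse returns any, so assigning the
// parsed action_input to a string field type checks even when it is an object.
function parseAction(text: string): AgentActionLike {
  const parsed = JSON.parse(text);
  return { tool: parsed.action, toolInput: parsed.action_input, log: text };
}

// Stand-in for the planning step: on a parsing error, the handler's string
// result becomes the toolInput, without any JSON parsing.
function planStep(
  text: string,
  handleParsingErrors: (e: Error) => string
): AgentActionLike {
  try {
    return parseAction(text);
  } catch (e) {
    return {
      tool: "_Exception", // assumed placeholder tool name for the error case
      toolInput: handleParsingErrors(e as Error),
      log: text,
    };
  }
}

const ok = planStep(
  '{"action": "click", "action_input": {"selector": "myCoolButton"}}',
  (e) => e.message
);
const failed = planStep("not json at all", () => "Please answer with valid JSON");

// Both values are typed as string, but at runtime only one of them is.
const runtimeTypes = [typeof ok.toolInput, typeof failed.toolInput];
```

The compiler accepts both branches because any is assignable to string, which is why the mismatch only shows up at runtime.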
Summary
- Using const stringSchema = z.string() as the tool schema will not type check at all, since the generic argument of StructuredTool is T extends z.ZodObject<any, any, any, any>, and typeof stringSchema does not fulfill that constraint.
- tool.call allows this to type check, since we don't know specifically which tool we have at the moment, so a string and any JSON are potentially valid.
- StructuredTool._call is abstract, so you will always get what the schema indicates, but StructuredTool.call will fail, even if you have supplied a function handleParsingErrors.
- AgentAction.toolInput is declared as string, which is not correctly typed; it is actually either a string or JSON at the moment.
- Users can end up with AgentSteps with wrongly typed AgentActions, since it is possible to request them as a return value of the overall loop using returnIntermediateSteps=true. Whatever the user now does with the AgentSteps is not type safe.
When is that an actual issue?
We noticed that the model often makes the same errors with the tool input format.
Recall our ClickSchema, which is just { selector: string }.
In our clicking example, it would either generate input according to the schema, or { element: string }, or just a string holding the value we want, like "myCoolButton".
So we built an auto-fixer for these common error cases.
The fixer basically just checks whether it can fix the input using either of the options above.
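A minimal sketch of such a fixer for the click example. The function name fixClickInput and its exact rules are hypothetical, chosen only to illustrate the three input shapes mentioned above; this is not the actual implementation.

```typescript
type ClickToolInput = { selector: string };

// Tries to repair the common malformed inputs; returns undefined if it cannot.
function fixClickInput(raw: unknown): ClickToolInput | undefined {
  // The model produced only the bare value, e.g. "myCoolButton".
  if (typeof raw === "string") {
    return raw.length > 0 ? { selector: raw } : undefined;
  }
  if (typeof raw !== "object" || raw === null) {
    return undefined;
  }
  const record = raw as Record<string, unknown>;
  // The input already matches the schema.
  const selector = record.selector;
  if (typeof selector === "string") {
    return { selector };
  }
  // The model used the key "element" instead of "selector".
  const element = record.element;
  if (typeof element === "string") {
    return { selector: element };
  }
  return undefined;
}
```

Note that the parameter is typed as unknown on purpose: the fixer has to run before any schema validation, so it cannot rely on the declared input type.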
The earliest we can inject this code without overriding a lot of the planning logic that LangChain provides is in StructuredTool.call.
We cannot handle it using handleParsingErrors, since that receives only the error as input, and not the text that caused it.
Once you are overriding StructuredTool.call, you are relying on the signature of that function to be correct, which we just saw is not the case.
It would also be great if the corrected tool input could be serialized in the intermediate steps, which we can only do through some hacks at the moment, because the steps are not part of the error handling process. Separate issue for this: #2711.
At this point, you are stuck having to figure out all of the above to see why you are getting wrongly typed inputs to call and in the resulting intermediateSteps.
Improvement ideas
Unfortunately, anything that really fixes this is a breaking change. Nonetheless, this is what I would propose:
- Change the signature of StructuredTool.call, so that the option of this being a string at all is gone and we only get JSON.
- Move the parsing in StructuredTool.call into its own method that can be overridden in specific implementations of StructuredTool.
- Change AgentAction.toolInput to JSON (probably problematic with the non-structured agents, I have not looked into those; this could be solved by making the toolInput type generic), or keep it as string but use an explicit JSON.stringify when creating an AgentAction.
- The handleParsingErrors type needs to match the type of toolInput. I want to be able to provide parsed JSON as a solution to a parsing error, so either add a JSON.parse around that or change the type of the callback and the string case for that argument to Json.
- Split the parts of AgentExecutor._call into their own methods for easier customization and improved readability (see also my other issue, Feature suggestion: Improve customizability of handling tool input parsing error handling #2711, for some improvement ideas here).
I would be willing to contribute here if we can find a good solution.
I have not looked into the python code for this, but assume it has the same problem.
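To make the proposal above more concrete, here is a rough sketch of a generic toolInput together with a parsing error handler whose return type matches it. All names (Json, TypedAgentAction, ParsingErrorHandler, recoverFromParsingError) are hypothetical and do not exist in LangChain, and passing the causing text to the handler is an assumption borrowed from the idea in #2711. This is one possible shape, not a worked-out design.

```typescript
// A JSON value, as produced by JSON.parse.
type Json =
  | string
  | number
  | boolean
  | null
  | Json[]
  | { [key: string]: Json };

// toolInput is generic: structured agents can use a JSON object type, while
// string-based agents keep the default of string.
interface TypedAgentAction<TInput extends Json = string> {
  tool: string;
  toolInput: TInput;
  log: string;
}

// The handler receives the error and the text that caused it, and returns a
// value of the same type as toolInput.
type ParsingErrorHandler<TInput extends Json> = (
  error: Error,
  causingText: string
) => TInput;

function recoverFromParsingError<TInput extends Json>(
  tool: string,
  error: Error,
  causingText: string,
  handler: ParsingErrorHandler<TInput>
): TypedAgentAction<TInput> {
  return { tool, toolInput: handler(error, causingText), log: causingText };
}

// Usage: the model answered with a bare string instead of { selector: ... }.
const recovered = recoverFromParsingError<{ selector: string }>(
  "click",
  new Error("Input is not valid JSON"),
  "myCoolButton",
  (_error, text) => ({ selector: text })
);
```

With this shape, the recovered action carries the same toolInput type as a successfully parsed one, so intermediate steps stay consistently typed.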