-
Notifications
You must be signed in to change notification settings - Fork 425
Description
Problem Statement
To implement standalone input-side guardrails (for e.g. PII, toxic content, prompt attack prevention), we'd like to place a Hook as early as possible in the invocation. In particular, we want to make sure it runs (and has the opportunity to redact the user's input message) before the message gets added to memory by e.g. AgentCoreMemorySessionManager, which hooks on MessageAddedEvent
.
However, the current BeforeInvocationEvent
hook only receives a reference to the agent and has no visibility of the incoming messages
because they haven't been added to agent.messages
yet.
Proposed Solution
Extend BeforeInvocationEvent
to also include the messages
received during _run_loop()
start-up
- The simplest implementation of this would implicitly allow hook authors to edit the messages in-place, giving them the opportunity to redact the message but go ahead with the invocation. 👍
- A guardrail hook could also choose to just raise error and abort the whole agent invocation, which could save some money in cases where the manner of redaction leaves nothing useful the agent could do (see also [FEATURE] Add ability to bypass LLM invocation and provide custom responses in hooks #758)
In structured_output_async()
, this event is also currently raised but before agent.messages
and the optional prompt
argument have been combined to form the temp_messages
that'll ultimately be used in the invocation.
- AFAICT there's nothing stateful about the construction of temp_messages and its only failure modes seem to be for malformed inputs - so I'd suggest to move the invocation of the BeforeInvocationEvent hooks to straight after
temp_messages
is set up. - Specifically, I'd suggest to pull
temp_messages
and theif not self.messages and not prompt
guard clause forward out of the tracing span - treating them as input validation activities that don't count towards the span duration (negligible anyway) but also wouldn't triggerAfterInvocationEvent
in case they fail due to malformed input.
Use Case
As mentioned above, primary use-case here is for input guardrail checks to prevent PII / toxic / prompt-attack content from entering the agent as early as possible in the invocation lifecycle.
Alternatives Solutions
Today I think the next-earliest workaround is for an input guardrail to hook on to MessageAddedEvent
instead (since this'll get called as soon as the agent initializes its messages list, before BeforeModelCallEvent
)... But this is not ideal because MessageAdded is a typical place for session/memory managers (like AgentCoreMemorySessionManager) to hook - so relies on users to connect their guardrail and memory hooks in the right sequence to avoid leakage of sensitive/malicious input into memory. It should work, but is easy to mis-configure without realizing.
Additional Context
No response