Development and Testing of REST MVC‐enabled Agent
This document explains how to integrate Embabel agents with Spring MVC to create Human-in-the-Loop (HITL) workflows. The pattern allows agents to pause execution, wait for user input via HTTP REST APIs, and then resume processing with the user's response.
https://github.com/embabel/embabel-agent/blob/main/embabel-agent-api/src/test/kotlin/com/embabel/agent/api/hitl/WaitForMvcIntegrationTest.kt
waitFor() is a function that interrupts agent execution to request user input. When an agent action calls waitFor(), the following happens:
- waitFor() throws an exception (AwaitableResponseException) instead of returning normally
- The exception carries an Awaitable object containing the question/form for the user
- The agent framework catches this exception and transitions the process to WAITING state
- The Awaitable is stored in the blackboard for later retrieval
- Execution stops - the agent literally pauses mid-action
A waitFor() call looks like it returns a value:
val userChoice: UserChoice = waitFor(choiceAwaitable)
But waitFor() always throws an exception and never actually returns! The return type serves a different purpose:
- It's a type contract telling the framework what type to expect in the blackboard after the user responds
- It enables action chaining - the next action can receive this type as a parameter
- It allows the planner to formulate a plan - it can see that this action "produces" UserChoice
Think of it like this: The action promises to eventually produce a UserChoice, but not right now - only after the user responds.
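The "declares a return type but always throws" behaviour can be modelled in a few lines of plain Kotlin. This is a toy sketch, not the Embabel implementation: ToyAwaitable, ToyAwaitableException, toyWaitFor, and runToyAction are hypothetical names invented for illustration.

```kotlin
// Toy model only: these types are hypothetical stand-ins, not Embabel classes.
data class UserChoice(val choice: String)

// Carries the question for the user; P is the type promised to the next action.
class ToyAwaitable<P>(val prompt: String)

// Thrown instead of returning, so the caller can park the process.
class ToyAwaitableException(val awaitable: ToyAwaitable<*>) :
    RuntimeException("Awaitable response exception")

// Declared to return P so that callers type-check, but it always throws.
fun <P> toyWaitFor(awaitable: ToyAwaitable<P>): P = throw ToyAwaitableException(awaitable)

// An action whose signature promises a UserChoice.
fun getChoice(): UserChoice = toyWaitFor(ToyAwaitable<UserChoice>("Castle or Forest?"))

// Plays the role of the framework: run the action, catch the exception, report a status.
fun runToyAction(): String =
    try {
        getChoice()
        "COMPLETED"
    } catch (e: ToyAwaitableException) {
        "WAITING: ${e.awaitable.prompt}"
    }
```

Calling runToyAction() never reaches the "COMPLETED" branch: the exception is what signals the wait, while the declared UserChoice return type is only visible to the compiler (and, in the real framework, to the planner).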
Agents work by chaining actions where Action A's output becomes Action B's input parameter:
Action 1: getChoice()
- Declares return type: UserChoice
- Actually: throws exception, enters WAITING
- Framework records: "This action produces UserChoice"
[User responds via HTTP]
Action 2: processChoice(choice: UserChoice)
- Needs UserChoice as input
- Framework injects UserChoice from blackboard
- Executes and produces final result
The framework's planner examines available actions and creates a plan:
- "getChoice produces UserChoice"
- "processChoice needs UserChoice"
- "So I'll execute getChoice, then processChoice"
Without the correct return type, the planner cannot formulate a plan and the agent gets STUCK.
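The planner's reasoning above can be approximated with a small type-matching loop. This is a deliberately simplified sketch under the assumption that each action needs at most one input type; the real planner is more sophisticated, and ToyAction and planToy are hypothetical names.

```kotlin
// Toy planner: each action is described only by the type it needs and the type it produces.
// ToyAction and planToy are hypothetical names, not part of Embabel.
data class ToyAction(val name: String, val needs: String?, val produces: String)

// Greedily chains actions whose input type is already available.
// Returns the ordered action names, or null when no chain reaches the goal (the STUCK case).
fun planToy(actions: List<ToyAction>, goal: String): List<String>? {
    val available = mutableSetOf<String>()
    val plan = mutableListOf<String>()
    var progressed = true
    while (goal !in available && progressed) {
        progressed = false
        for (action in actions) {
            val needs = action.needs
            val runnable = needs == null || needs in available
            if (runnable && action.produces !in available) {
                available += action.produces
                plan += action.name
                progressed = true
            }
        }
    }
    return if (goal in available) plan else null
}
```

With getChoice producing UserChoice and processChoice needing it, planToy returns the two actions in order. If getChoice is declared as producing some other type, no chain reaches the goal and the function returns null, which corresponds to the STUCK outcome.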
An Awaitable has two key responsibilities:
1. Payload (what the next action needs)
- The awaitable's generic payload type MUST match what the next action expects
- This is NOT what you show the user, but what you'll put in the blackboard after they respond
- Example: AbstractAwaitable<UserChoice, UserChoiceResponse> - payload is UserChoice
2. onResponse (transformation logic)
- When the user responds via HTTP, onResponse() is called
- It transforms the HTTP response into a domain object
- It adds that domain object to the blackboard
- This is what makes the "return value" of waitFor magically appear
Think of the Awaitable as a suspended function call - it captures what needs to happen when the answer arrives.
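The onResponse responsibility can be sketched as follows. This is a toy model rather than the AbstractAwaitable API: ToyBlackboard and ChoiceAwaitable are hypothetical, and the two validation checks are an assumption added for illustration.

```kotlin
// Toy model: ToyBlackboard and ChoiceAwaitable are hypothetical, not Embabel classes.
data class UserChoice(val choice: String)
data class UserChoiceResponse(val awaitableId: String, val choice: String)

class ToyBlackboard {
    private val objects = mutableListOf<Any>()

    fun addObject(obj: Any) {
        objects += obj
    }

    // Returns the most recently added object of the requested type, if any.
    fun <T : Any> last(type: Class<T>): T? = objects.filterIsInstance(type).lastOrNull()
}

class ChoiceAwaitable(val id: String, val prompt: String, val options: List<String>) {
    // Converts the REST-layer response into the domain object and publishes it.
    fun onResponse(response: UserChoiceResponse, blackboard: ToyBlackboard) {
        // Assumed validation: reject responses for another awaitable or an unknown option.
        require(response.awaitableId == id) { "Response is for a different awaitable" }
        require(response.choice in options) { "Unknown option: ${response.choice}" }
        blackboard.addObject(UserChoice(response.choice))
    }
}
```

After onResponse runs, the blackboard holds a UserChoice, which is exactly the type the next action declares as its parameter.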
Every action should accept ActionContext as a parameter:
fun getChoice(input: UserInput, context: ActionContext): UserChoice
The framework uses ActionContext to:
- Provide access to the current process state
- Allow actions to set values for later use
- Enable logging and debugging
- Pass configuration and services
Without ActionContext, the action cannot properly integrate with the framework.
The REST controller manages the lifecycle of waiting agent processes. It has two main responsibilities: starting a process, and continuing it once the user has responded.
Starting a process - what it does:
- Creates an agent process - instantiates the agent, creates a blackboard, sets up initial state
- Runs the agent - calls agentProcess.run(), which executes actions until the process completes, fails, gets stuck, or has to wait
- Detects the state - checks if the process is WAITING, COMPLETED, FAILED, or STUCK
- Saves the process - persists to repository so it can be resumed later
- Returns appropriate response:
  - If WAITING: extracts the Awaitable from blackboard, returns the question/choices to user
  - If COMPLETED: extracts the final result from blackboard, returns it to user
Why it might enter WAITING:
When the first action calls waitFor(), the framework catches the exception and sets status to WAITING instead of continuing to the next action.
Why it might complete immediately:
If the agent doesn't call waitFor() in the first action, it continues executing all actions and completes.
Continuing a process - what it does:
- Loads the waiting process - retrieves from repository using processId
- Validates it's waiting - confirms status is WAITING (not already completed)
- Finds the awaitable - retrieves the specific Awaitable from the blackboard
- Calls onResponse - transforms user's HTTP input into domain object, adds to blackboard
- Resumes execution - calls agentProcess.run() again to continue from where it paused
- Detects the new state - checks if still WAITING or now COMPLETED
- Saves the updated process - persists the new state
- Returns appropriate response:
  - If WAITING again: there's another waitFor() call, return the next question
  - If COMPLETED: return the final result
Why it might enter WAITING again:
The agent can have multiple waitFor() calls in sequence (e.g., ask for username, then password). Each requires a separate HTTP call.
The critical detail:
The onResponse() call is what makes the magic happen - it puts the user's choice into the blackboard with the exact type the next action expects, so the framework can inject it as a parameter.
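The start and continue steps described above can be condensed into a plain-Kotlin model. It is a sketch only: ToyProcess, ToyController, and the map-based responses are hypothetical and stand in for the agent process, the Spring controller, and ResponseEntity. The 400 answer for a process that is not waiting is an assumption.

```kotlin
// Toy model of the start/continue lifecycle; every name here is hypothetical,
// not an Embabel or Spring API.
enum class ToyStatus { WAITING, COMPLETED }

class ToyProcess(val id: String, val playerName: String) {
    var status = ToyStatus.WAITING
    var pendingPrompt: String? = null
    var choice: String? = null
    var result: String? = null

    // Runs until the process has to wait for input or can finish.
    fun run() {
        val answer = choice
        if (answer == null) {
            pendingPrompt = "Where to, $playerName? Castle or Forest?"
            status = ToyStatus.WAITING
        } else {
            pendingPrompt = null
            result = "You chose: $answer"
            status = ToyStatus.COMPLETED
        }
    }
}

class ToyController {
    private val repository = mutableMapOf<String, ToyProcess>()
    private var nextId = 0

    // Start: create, run, save, then answer according to the status.
    fun start(playerName: String): Map<String, Any?> {
        val process = ToyProcess("p${++nextId}", playerName)
        process.run()
        repository[process.id] = process
        return describe(process)
    }

    // Continue: load, validate, apply the response, resume, save, answer.
    fun continueProcess(processId: String, choice: String): Map<String, Any?> {
        val process = repository[processId]
            ?: return mapOf("httpStatus" to 404, "error" to "Process not found: $processId")
        if (process.status != ToyStatus.WAITING) {
            // Assumed behaviour for a process that is no longer waiting.
            return mapOf("httpStatus" to 400, "error" to "Process is not waiting")
        }
        process.choice = choice // stands in for awaitable.onResponse(...)
        process.run()
        repository[process.id] = process
        return describe(process)
    }

    private fun describe(process: ToyProcess): Map<String, Any?> = when (process.status) {
        ToyStatus.WAITING ->
            mapOf("httpStatus" to 200, "processId" to process.id, "prompt" to process.pendingPrompt)
        ToyStatus.COMPLETED ->
            mapOf("httpStatus" to 200, "processId" to process.id, "result" to process.result)
    }
}
```

A call to start("Player1") yields a prompt and a process ID; passing that ID and "Castle" to continueProcess yields the final result, and an unknown ID yields the 404-style answer.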
Both endpoints return ResponseEntity<Any> because they can return different types:
-
Awaitable response when status = WAITING
- Contains: processId, awaitableId, prompt, options
- Tells user: "Here are your choices, call /continue with your answer"
-
Final result when status = COMPLETED
- Contains: the actual result (AdventureResult, LoginResult, etc.)
- Tells user: "Done! Here's your answer"
This flexibility enables multi-step workflows where you don't know in advance how many user interactions are needed.
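To make the two response shapes concrete, here is one possible way to write them down as Kotlin types. This is an illustrative sketch, not the controller's actual DTOs: ToyResponse, AwaitingInput, Finished, and nextStep are hypothetical names, and the field names simply follow the lists above.

```kotlin
// Hypothetical response shapes; the real controller returns ResponseEntity<Any>.
sealed interface ToyResponse

// Returned while the process is WAITING.
data class AwaitingInput(
    val processId: String,
    val awaitableId: String,
    val prompt: String,
    val options: List<String>
) : ToyResponse

// Returned once the process is COMPLETED.
data class Finished(val processId: String, val result: Any) : ToyResponse

// What a client would do next for each shape.
fun nextStep(response: ToyResponse): String = when (response) {
    is AwaitingInput -> "POST /continue for ${response.processId} with one of ${response.options}"
    is Finished -> "Done: ${response.result}"
}
```

A client loops on nextStep until it receives a Finished value, which is how a workflow with an unknown number of user interactions can be driven.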
The controller uses InMemoryAgentProcessRepository to:
- Persist agent state between HTTP calls (otherwise, state is lost)
- Enable resumption - load the exact same process instance with all its history
- Support multiple concurrent processes - each user can have their own process ID
In production, you'd use a real database (PostgreSQL, MongoDB, etc.) instead of in-memory storage.
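The role of the repository can be illustrated with a minimal in-memory store behind an interface, so that a database-backed implementation could replace it later. This is a sketch under stated assumptions: ToyProcessRepository, InMemoryToyRepository, and SavedProcess are hypothetical and do not mirror the actual InMemoryAgentProcessRepository API.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical snapshot of a process; the real repository stores full agent processes.
data class SavedProcess(val id: String, val status: String)

// Hypothetical contract; a database-backed class could implement the same interface.
interface ToyProcessRepository {
    fun save(process: SavedProcess): SavedProcess
    fun findById(id: String): SavedProcess?
}

// Keeps processes in a thread-safe map so concurrent users do not interfere with each other.
class InMemoryToyRepository : ToyProcessRepository {
    private val store = ConcurrentHashMap<String, SavedProcess>()

    override fun save(process: SavedProcess): SavedProcess {
        store[process.id] = process
        return process
    }

    override fun findById(id: String): SavedProcess? = store[id]
}
```

Saving under the same ID overwrites the previous snapshot, which is how the WAITING entry becomes the COMPLETED one after the continue call.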
The MockMVC integration test validates the complete workflow end-to-end.
The test creates a minimal Spring Boot application to avoid scanning unnecessary components:
- WaitForTestApplication - minimal @SpringBootApplication that only scans the test package
- WaitForTestConfig - provides InMemoryAgentProcessRepository as a bean
- Avoids AgentApiTestApplication which scans too much and causes dependency issues
This approach ensures:
- Fast test startup (no unnecessary beans)
- No mocking needed (except MockMvc itself)
- Real component integration (controller, repository, agent)
Test 1 - what it validates:
Phase 1 - Starting:
1. POST /start with playerName
2. Expects HTTP 200 with awaitable response (processId, awaitableId, prompt, options)
3. Verifies process exists in repository with status = WAITING
4. Verifies awaitable exists in blackboard
Phase 2 - Continuing:
5. POST /continue with processId and user's choice
6. Expects HTTP 200 with final result
7. Verifies process status changed to COMPLETED
8. Verifies final result exists in blackboard with correct value
What it proves:
- Agent successfully pauses at waitFor()
- State persists between HTTP calls
- User input correctly flows through onResponse → blackboard → next action
- Agent resumes and completes successfully
- Action chaining works (getChoice output → processChoice input)
Test 2 - what it validates:
Same flow as Test 1, but with a different choice ("Forest" instead of "Castle").
Why it matters:
- Proves the system isn't hardcoded to one specific choice
- Validates that user input truly drives the outcome
- Ensures onResponse correctly handles different values
Test 3 - what it validates:
- POST /continue with non-existent processId
- Expects HTTP 404 with error message
Why it matters:
- Validates error handling for invalid process IDs
- Ensures controller fails gracefully (no crashes)
- Proves repository lookup works correctly
The test includes extensive logging at key points:
- Controller entry points - when HTTP requests arrive
- Agent action start/complete - when actions execute
- Status transitions - when process enters WAITING or COMPLETED
- onResponse calls - when user input is processed
This logging serves two purposes:
- During development - see exactly what's happening step by step
- During debugging - diagnose issues when tests fail
Example log sequence for successful flow:
=== POST /adventure/start - player: Player1 ===
=== getChoice action starting for player: Player1 ===
Action method entering wait state: Awaitable response exception
=== Agent process status after run: WAITING ===
=== POST /adventure/{id}/continue - choice: Castle ===
=== onResponse called, resuming agent process ===
=== processChoice action starting with choice: Castle ===
=== processChoice action completed with result: You chose: Castle ===
=== Agent process status after resume: COMPLETED ===
This trace shows the complete journey from start to finish.
The awaitable's payload type determines what the planner thinks the action produces:
- AbstractAwaitable<ChoiceRequest, ...> → planner thinks action produces ChoiceRequest
- AbstractAwaitable<UserChoice, ...> → planner thinks action produces UserChoice
If the next action expects UserChoice but the awaitable payload is ChoiceRequest, the planner cannot connect them and the agent gets STUCK.
The solution: Always make the payload type match what the next action's parameter expects.
Understanding how data flows through the system:
- User submits choice via HTTP → controller receives JSON
- Controller creates UserChoiceResponse → contains awaitableId + choice
- Controller calls awaitable.onResponse() → receives UserChoiceResponse
- onResponse creates UserChoice → domain object with just the choice value
- onResponse adds to blackboard → blackboard.addObject(UserChoice("Castle"))
- Agent resumes → planner looks for next action
- Framework injects parameter → finds UserChoice in blackboard, passes to processChoice
- processChoice executes → receives UserChoice("Castle"), produces result
Each step transforms the data:
- HTTP JSON → UserChoiceResponse (REST layer)
- UserChoiceResponse → UserChoice (domain layer)
- UserChoice → AdventureResult (business logic layer)
This separation of concerns keeps the agent code clean and focused on business logic.
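The three transformations can be written as three small functions, one per layer. This is a minimal sketch: the type names come from the text above, but their fields, the map-based request body, and the toResponse and toDomain helpers are assumptions made for illustration.

```kotlin
// Field layouts are assumed for illustration; only the type names come from the text.
data class UserChoiceResponse(val awaitableId: String, val choice: String)
data class UserChoice(val choice: String)
data class AdventureResult(val message: String)

// REST layer: parsed request body -> UserChoiceResponse.
fun toResponse(awaitableId: String, body: Map<String, String>): UserChoiceResponse =
    UserChoiceResponse(awaitableId, body.getValue("choice"))

// Domain layer: what onResponse does before adding the object to the blackboard.
fun toDomain(response: UserChoiceResponse): UserChoice = UserChoice(response.choice)

// Business logic layer: the action that consumes the injected UserChoice.
fun processChoice(choice: UserChoice): AdventureResult =
    AdventureResult("You chose: ${choice.choice}")
```

Composing the three, processChoice(toDomain(toResponse("a1", mapOf("choice" to "Castle")))), gives an AdventureResult whose message matches the "You chose: Castle" line in the log sequence.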
Think of the agent as having a "pause button":
- Agent executes action getChoice
- Hits waitFor() - PAUSE ⏸️
- State saved, HTTP returns to user
- (Time passes... seconds, minutes, hours)
- User responds via HTTP
- State loaded from repository
- onResponse injects user's answer
- Agent resumes - PLAY ▶️
- Continues with processChoice
The agent doesn't "remember" it was paused - from the perspective of the next action, the UserChoice is simply present in the blackboard, as if getChoice had returned it normally. The framework handles all the complexity of pausing, saving, loading, and resuming.
The WaitFor pattern with Spring MVC enables interactive, conversational agents. In summary:
- Agents pause execution when calling waitFor(), entering WAITING state
- Controllers manage lifecycle - start processes, save state, resume on user input
- Awaitables bridge HTTP and domain - transform user responses into domain objects
- Action chaining connects steps - output of one action becomes input of next
- Tests validate end-to-end - prove the complete flow works with real components
This pattern unlocks powerful use cases:
- Multi-step forms and wizards
- Approval workflows
- Interactive troubleshooting
- Conversational interfaces
- Human-in-the-loop AI systems
The key innovation is treating user input as just another data source - the agent doesn't care if data comes from a database, API, or human. It just waits for the framework to provide it.
(c) Embabel Software Inc 2024-2025.