improve session resilience #1612

Open
joshadambell wants to merge 2 commits into kagent-dev:main from joshadambell:feat/session-resilience
Conversation

Contributor

@joshadambell joshadambell commented Apr 1, 2026

Summary

Adds session resilience: automatic session recovery on the backend and stream reconnection on the frontend, so the experience stays seamless when a session is deleted or the page reloads during an active task.

  • Backend session recovery: Automatically recreates sessions on 404 errors during event append, with in-flight task detection
  • Frontend stream resumption: Reconnects to in-flight tasks (working/submitted state) when page is reloaded
  • Bug fix: Session events now properly populated on load (events=events instead of events=[])

What Changed

Backend (python/packages/kagent-adk)

  • _session_service.py: New _recreate_session() method handles a 404 during append_event(). It recreates the session while preserving the ID, user, and agent, handles 409 conflicts from concurrent recreation, and fetches and logs in-flight tasks after recovery. A single retry prevents infinite recursion.
  • _session_service.py: Bug fix: get_session() now returns events=events instead of an empty list
  • test_session_service.py: 9 new tests covering append 404 recovery, session recreation, conflict handling, and in-flight task detection
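
The recovery path described in the backend bullets can be sketched roughly as follows. This is a minimal illustration, not the code in _session_service.py: `ApiError`, `FakeBackend`, and the `_retried` flag are hypothetical names, the backend is an in-memory stand-in for the real HTTP API, and the in-flight task lookup that follows recovery is left out.

```python
from dataclasses import dataclass, field


class ApiError(Exception):
    """Hypothetical HTTP error that carries a status code."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


@dataclass
class FakeBackend:
    """In-memory stand-in for the session API (illustrative only)."""

    sessions: dict = field(default_factory=dict)

    def create_session(self, session_id: str, user_id: str) -> None:
        if session_id in self.sessions:
            raise ApiError(409)  # another writer already recreated it
        self.sessions[session_id] = {"user_id": user_id, "events": []}

    def append_event(self, session_id: str, event: str) -> None:
        if session_id not in self.sessions:
            raise ApiError(404)  # session was deleted
        self.sessions[session_id]["events"].append(event)


def recreate_session(backend: FakeBackend, session_id: str, user_id: str) -> None:
    """Recreate a session under its original ID; a 409 counts as success."""
    try:
        backend.create_session(session_id, user_id)
    except ApiError as err:
        if err.status != 409:
            raise


def append_event(backend, session_id, user_id, event, _retried=False):
    """Append an event, recreating the session once if it has gone missing."""
    try:
        backend.append_event(session_id, event)
    except ApiError as err:
        # Anything other than a first 404 is re-raised, so the retry happens
        # at most once and the recursion cannot loop.
        if err.status != 404 or _retried:
            raise
        recreate_session(backend, session_id, user_id)
        append_event(backend, session_id, user_id, event, _retried=True)
```

Treating a 409 as success means two writers that race to recreate the same session both end up appending to it, rather than one of them failing.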

Frontend (ui/src)

  • a2aClient.ts: New resubscribeTask() method sends tasks/resubscribe JSON-RPC to resume SSE streams
  • messageHandlers.ts: extractMessagesFromTasks() now returns TaskExtractionResult with optional pendingTask for in-flight task detection
  • ChatInterface.tsx: On page load, detects pending tasks and auto-resubscribes to their streams. Cleans up the abort controller on unmount. HITL approval takes priority over resubscribe when both apply.
  • AgentCallDisplay.tsx: Updated for new extractMessagesFromTasks return type
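
For orientation, the request that resubscribeTask() sends might look like the envelope below. The actual client is TypeScript; this is a Python sketch of the wire format only. It assumes the task ID travels under `params.id`, as in the A2A specification's task-ID parameters, and `build_resubscribe_request` is a hypothetical helper, not a function in a2aClient.ts.

```python
import json
import uuid


def build_resubscribe_request(task_id: str) -> dict:
    """Build a JSON-RPC 2.0 envelope for tasks/resubscribe (assumed shape)."""
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),  # request id, unrelated to the task id
        "method": "tasks/resubscribe",
        "params": {"id": task_id},  # assumption: task id is sent as params.id
    }


if __name__ == "__main__":
    print(json.dumps(build_resubscribe_request("task-123"), indent=2))
```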

Tests

  • a2aClient.test.ts: New test suite for resubscribeTask() and existing client methods
  • messageHandlers.test.ts: New tests for pending task detection (working, submitted, completed states)

How It Works

  1. Session deleted during active task → Backend 404 handler recreates session, logs in-flight tasks
  2. Page reload during active task → Frontend detects pending task via extractMessagesFromTasks
  3. Frontend calls resubscribeTask() → Reconnects to SSE stream for the in-flight task
  4. Stream resumes → User sees continued updates without interruption
  5. If HITL approval is pending → Approval takes priority over resubscribe (gated by !hasPendingApproval)
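
The page-load decision in steps 2 to 5 can be sketched as below. The real logic lives in TypeScript (extractMessagesFromTasks and ChatInterface.tsx); Python is used here only to keep the examples in one language. The function names, the returned strings, and the choice to pick the most recent pending task are all assumptions made for illustration.

```python
from typing import Optional

# States that the PR description treats as "in flight".
PENDING_STATES = {"working", "submitted"}


def find_pending_task(tasks: list[dict]) -> Optional[dict]:
    """Return the most recent task that is still in flight, if any."""
    for task in reversed(tasks):
        state = task.get("status", {}).get("state")
        if state in PENDING_STATES:
            return {"task_id": task["id"], "state": state}
    return None


def next_action(tasks: list[dict], has_pending_approval: bool) -> str:
    """Decide what to do on page load; approval wins over resubscribe."""
    if has_pending_approval:
        return "await_approval"
    pending = find_pending_task(tasks)
    if pending is not None:
        return f"resubscribe:{pending['task_id']}"
    return "idle"
```

With one completed task and one working task, `next_action` returns a resubscribe action for the working task unless an approval is pending, in which case it waits for the approval.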

@joshadambell joshadambell force-pushed the feat/session-resilience branch 3 times, most recently from 314094b to 14683da Compare April 1, 2026 23:48
Signed-off-by: Josh Bell <joshadambell@me.com>
Signed-off-by: jobell <jobell@ancestry.com>
@joshadambell joshadambell force-pushed the feat/session-resilience branch from 14683da to fb3d207 Compare April 1, 2026 23:51
@joshadambell joshadambell marked this pull request as ready for review April 2, 2026 23:39
Copilot AI review requested due to automatic review settings April 2, 2026 23:39
Contributor

Copilot AI left a comment

Pull request overview

This pull request implements session resilience features to automatically recover from deleted sessions and reconnect to in-flight tasks after page reloads.

Changes:

  • Backend session recovery: Added _recreate_session() method in _session_service.py that recreates deleted sessions and detects in-flight tasks, with 404 handling in append_event() to trigger recreation before retrying
  • Frontend stream reconnection: Implemented resubscribeTask() method and enhanced extractMessagesFromTasks() to detect pending tasks (working/submitted states), with auto-resubscription on page load in ChatInterface
  • Bug fix: Corrected get_session() to populate events (changed events=[] to events=events)

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| python/packages/kagent-adk/src/kagent/adk/_session_service.py | Added session recreation logic and 404 handling; fixed events population bug |
| python/packages/kagent-adk/tests/unittests/test_session_service.py | Complete test rewrite with comprehensive coverage for append_event recovery and session recreation |
| ui/src/lib/messageHandlers.ts | Enhanced to return TaskExtractionResult with pending task detection |
| ui/src/lib/a2aClient.ts | Added resubscribeTask() method and updated request type parameters |
| ui/src/lib/__tests__/messageHandlers.test.ts | Updated tests for new return type and added pending task detection tests |
| ui/src/lib/__tests__/a2aClient.test.ts | New comprehensive test suite for A2A client methods |
| ui/src/components/chat/ChatInterface.tsx | Added pending task detection and auto-resubscription logic with proper cleanup |
| ui/src/components/chat/AgentCallDisplay.tsx | Updated to use new extractMessagesFromTasks return type |

  id=session_data["id"],
  user_id=session_data["user_id"],
- events=[],
+ events=events,
Copilot AI Apr 2, 2026

The bug fix in get_session() changing events=[] to events=events lacks test coverage. While new tests were added for the append_event recovery logic, the tests that previously validated event loading in get_session were removed entirely. There should be at least one test ensuring that events returned from the API are properly populated in the session object.
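
A regression test along the lines requested here might look like the sketch below. It does not use the real kagent or ADK types: `Session`, `session_from_response`, and the payload shape are hypothetical stand-ins for whatever get_session() builds from the API response.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Simplified stand-in for the session object (not the ADK class)."""

    id: str
    user_id: str
    events: list = field(default_factory=list)


def session_from_response(payload: dict) -> Session:
    """Build a Session from an API payload (hypothetical payload shape)."""
    session_data = payload["session"]
    events = list(payload.get("events", []))
    return Session(
        id=session_data["id"],
        user_id=session_data["user_id"],
        events=events,  # the fix: previously this was always []
    )


def test_get_session_populates_events():
    payload = {
        "session": {"id": "s1", "user_id": "u1"},
        "events": [{"id": "e1"}, {"id": "e2"}],
    }
    session = session_from_response(payload)
    # Fails if events is hard-coded to an empty list.
    assert [e["id"] for e in session.events] == ["e1", "e2"]
```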

Comment on lines +8 to +10
export interface TaskExtractionResult {
messages: Message[];
pendingTask?: { taskId: string; state: string };
Copilot AI Apr 2, 2026

The pendingTask return type uses state: string which is less precise than needed. Since the state is guaranteed to be either 'working' or 'submitted' (from the conditions on lines 22-23), consider typing this as { taskId: string; state: 'working' | 'submitted' } or { taskId: string; state: TaskState } for better type safety. This would eliminate the need for the as TaskState cast on line 162 of ChatInterface.tsx.


2 participants