Redis-backed session service and orchestration plugin (state tracking, abort, crash recovery) #5048

@h-network

Is your feature request related to a specific problem?

ADK currently has no Redis-backed session service and lacks runtime orchestration primitives for production agent deployments. Specifically:

  1. No Redis session backend. Existing options are InMemory (lost on restart), SQLite (single-node), Database (heavy), and VertexAI (vendor-locked). Redis is a common choice for distributed, low-latency session state, but ADK has no backend for it. (Related: Add support for additional Memory Bank services: DatabaseMemoryService/RedisMemoryService #2524)
  2. No external abort/kill mechanism — there is no way to stop a running agent mid-execution from outside. Users have been requesting this since Feature Request: Add an Endpoint to Explicitly Stop/Terminate a Conversation #1621 and again in "Stop generating" — ability to stop run_async() from outside the agent #4796. In production, when an agent goes off-rails, you need an immediate kill switch, not a graceful timeout.
  3. No crash recovery. If a process dies mid-task, the task state is lost, and there is no mechanism to detect orphaned tasks on restart and either recover them or fail them cleanly.
  4. No task lifecycle state tracking. There is no built-in way to track whether a task is running, completed, failed, timed out, or aborted.

Describe the Solution You'd Like

A RedisSessionService implementing BaseSessionService and a RedisOrchestrationPlugin extending BasePlugin, which together provide:

  • Redis session service: create_session, get_session, list_sessions, delete_session, append_event backed by Redis with configurable TTL
  • Task state machine: running → completed / failed / timed_out / aborted, tracked in Redis with Pub/Sub notifications
  • External abort: publish to a Redis channel to kill a running agent mid-execution. Not a polite stop — an immediate process-level kill
  • Crash recovery: on startup, scan for tasks stuck in "running" state and mark them failed with a recovery message
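The task state machine and the startup recovery scan described above can be sketched as follows. This is a minimal illustration, not the proposed implementation: a plain dict stands in for the Redis task records, and `TaskState`, `transition`, and `recover_orphans` are hypothetical names that do not exist in ADK.

```python
from enum import Enum


class TaskState(str, Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    TIMED_OUT = "timed_out"
    ABORTED = "aborted"


# Only RUNNING has outgoing transitions; every other state is terminal.
ALLOWED = {
    TaskState.RUNNING: {
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.TIMED_OUT,
        TaskState.ABORTED,
    },
}


def transition(store: dict, task_id: str, new_state: TaskState) -> None:
    """Move a task to new_state, rejecting moves out of a terminal state."""
    current = TaskState(store[task_id]["state"])
    if new_state not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current.value} -> {new_state.value}")
    store[task_id]["state"] = new_state.value


def recover_orphans(store: dict, reason: str = "recovered after restart") -> list:
    """On startup, mark every task still recorded as running as failed."""
    recovered = []
    for task_id, record in store.items():
        if record["state"] == TaskState.RUNNING.value:
            record["state"] = TaskState.FAILED.value
            record["error"] = reason
            recovered.append(task_id)
    return recovered
```

With several workers sharing one Redis instance, a scan like this would presumably need some liveness signal (for example a per-worker heartbeat key) so that a restarting process does not fail tasks that are still running elsewhere. The sketch leaves that out.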

Plugin hooks mapping:

  • before_run_callback → register task as RUNNING
  • after_run_callback → mark COMPLETED/FAILED
  • before_agent_callback → check abort signal
  • on_model_error_callback → handle crash recovery
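To make the hook mapping concrete, here is a rough sketch of the lifecycle it implies. It deliberately does not subclass ADK's BasePlugin, and the method signatures are simplified assumptions (the real callbacks receive ADK context objects rather than a bare task ID). A dict and a set stand in for Redis task records and received abort signals.

```python
import asyncio


class OrchestrationHooksSketch:
    """Stand-in for the proposed plugin; not the real BasePlugin interface."""

    def __init__(self, store: dict, abort_flags: set):
        self.store = store              # stand-in for Redis task records
        self.abort_flags = abort_flags  # stand-in for received abort signals

    async def before_run(self, task_id: str) -> None:
        self.store[task_id] = {"state": "running"}

    async def before_agent(self, task_id: str) -> bool:
        """Return True if the run may continue, False if it was aborted."""
        if task_id in self.abort_flags:
            self.store[task_id]["state"] = "aborted"
            return False
        return True

    async def on_model_error(self, task_id: str, error: Exception) -> None:
        self.store[task_id] = {"state": "failed", "error": str(error)}

    async def after_run(self, task_id: str) -> None:
        # Only a task that is still running completes; aborted/failed stay as-is.
        if self.store[task_id]["state"] == "running":
            self.store[task_id]["state"] = "completed"


async def demo() -> dict:
    store, flags = {}, set()
    hooks = OrchestrationHooksSketch(store, flags)

    await hooks.before_run("task-1")
    await hooks.before_agent("task-1")
    await hooks.after_run("task-1")

    await hooks.before_run("task-2")
    flags.add("task-2")  # an external abort arrives mid-run
    await hooks.before_agent("task-2")
    await hooks.after_run("task-2")
    return store
```

Running `asyncio.run(demo())` leaves `task-1` completed and `task-2` aborted, which is the behavior the mapping above describes for the normal and aborted paths.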

Impact on your work

I run multi-agent systems that manage infrastructure via Telegram, Discord, Slack, and web interfaces. Without these primitives, every production ADK deployment has to build them from scratch. I have already built and battle-tested this with different frameworks, running 100 concurrent agents across 10,000 rounds. See my GitHub.

The agent safety model (dual-gate firewall with deterministic denylist + independent LLM judge) is documented in a separate IETF Internet-Draft: https://datatracker.ietf.org/doc/draft-baysal-asimov-safety-architecture/

Willingness to contribute

Yes. I have a working implementation ready to port to adk-python. This addresses #2524, #4796, and #1621.


Describe Alternatives You've Considered

  • SQLite/Database session service — exists but doesn't provide the low-latency Pub/Sub needed for real-time abort signals
  • InMemory session service — lost on restart, not suitable for production
  • Building it outside ADK — works but fragments the ecosystem. This belongs in the framework.

Proposed API / Implementation

Session service

from google.adk.sessions import RedisSessionService

session_service = RedisSessionService(
    redis_url="redis://localhost:6379",
    key_prefix="adk:",
    session_ttl=3600,
)
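One way `key_prefix` and `session_ttl` might map onto stored data is sketched below. The key layout, the `session_key` and `append_event` helpers, and the JSON encoding are all assumptions made for illustration. `TtlStoreSketch` is an in-memory stand-in that imitates set-with-expiry and get, so the example runs without a Redis server.

```python
import json
import time


def session_key(prefix: str, app_name: str, user_id: str, session_id: str) -> str:
    """Build a namespaced key (hypothetical layout)."""
    return f"{prefix}session:{app_name}:{user_id}:{session_id}"


class TtlStoreSketch:
    """In-memory stand-in for set-with-expiry / get semantics."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key: str, value: str, ttl: int) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[1]:
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]


def append_event(store: TtlStoreSketch, key: str, event: dict, ttl: int) -> None:
    """Append an event to the stored session, creating it if missing."""
    raw = store.get(key)
    session = json.loads(raw) if raw else {"events": []}
    session["events"].append(event)
    store.set(key, json.dumps(session), ttl)  # rewriting also refreshes the TTL
```

Refreshing the TTL on every append keeps active sessions alive and lets idle ones expire. Whether the proposed service uses a sliding or a fixed expiry is not stated in this issue.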

Orchestration plugin

from google.adk.plugins import RedisOrchestrationPlugin

plugin = RedisOrchestrationPlugin(
    redis_url="redis://localhost:6379",
    enable_state_tracking=True,
    enable_abort=True,
    enable_crash_recovery=True,
)

App with both

from google.adk.apps import App

app = App(
    name="my_app",
    agent=my_agent,
    plugins=[plugin],
)

Runner

from google.adk.runners import Runner

runner = Runner(
    app=app,
    session_service=session_service,
)

External abort from anywhere

import redis
r = redis.from_url("redis://localhost:6379")
r.publish("adk:abort:task-123", '{"action": "abort"}')
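On the receiving side, the process running the agent has to turn such a message into a stop. The sketch below shows one possible shape, under stated assumptions: an `asyncio.Queue` stands in for a subscription to a channel like `adk:abort:task-123`, `run_with_abort` is a hypothetical helper, and the stop is done by cancelling an asyncio task. That is cooperative cancellation inside one process, which is weaker than the process-level kill this proposal describes.

```python
import asyncio
import json


async def run_with_abort(work, abort_messages: asyncio.Queue) -> str:
    """Run the `work` coroutine until it finishes or an abort message arrives."""
    task = asyncio.create_task(work)
    while True:
        getter = asyncio.create_task(abort_messages.get())
        done, _ = await asyncio.wait(
            {task, getter}, return_when=asyncio.FIRST_COMPLETED
        )
        if task in done:
            getter.cancel()
            task.result()  # re-raises if the work itself failed
            return "completed"
        payload = json.loads(getter.result())
        if payload.get("action") == "abort":
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
            return "aborted"
        # Any other message is ignored and the loop keeps waiting.


async def demo() -> str:
    queue = asyncio.Queue()
    await queue.put('{"action": "abort"}')  # same payload as the publish call
    return await run_with_abort(asyncio.sleep(60), queue)
```

Calling `asyncio.run(demo())` returns `"aborted"` without waiting for the 60-second sleep, because the abort message is already queued when the wait starts.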

Additional Context

Labels: services ([Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.)