
RFC: Alembic Adoption for Database Migrations in ADK-Python #4387

Status: Draft

Author: @achilleasatha

Date: 2026-02-05

Related Issue: #3343 - Update database schema automatically


1. Summary

Adopt Alembic as the database migration framework for adk-python to enable automatic schema updates, migration-coordinated deployments with minimal service disruption, and proper rollback support for enterprise production environments.

2. Motivation

2.1 Current Problems

From GitHub issue #3343 and community feedback:

  1. Manual migrations are not viable for production: Users must manually execute migration scripts when upgrading ADK versions. For enterprise deployments with multiple environments and multiple ADK-based agents per environment, this means substantial manual work: SSHing into servers and running scripts by hand.

  2. Breaking changes in minor versions: Schema changes in ADK 1.14.0, 1.17.0, and 1.19.0 broke deployments without clear migration paths or automation.

  3. No support for Kubernetes deployments: Cannot run migrations via Helm hooks or init containers. SSH access to every pod is not scalable or feasible.

  4. Race conditions: Multiple pods starting simultaneously could attempt migrations concurrently, causing failures. The migration script currently provided has no locking mechanism to guard against this.

  5. No rollback capability: The current system does not support downgrade migrations, making failed deployments difficult to recover.

  6. Permission constraints: Service accounts may lack DDL execution permissions, requiring manual intervention.

3. Proposal

3.1 Overview

Integrate Alembic as the primary database migration framework with:

  • Automatic migrations on startup (feature-flagged with ADK_AUTO_MIGRATE_DB)
  • Distributed locking to prevent race conditions across multiple pods
  • Kubernetes Helm hook support for pre-deployment migrations
  • Rollback capability via Alembic's downgrade functionality
  • Comprehensive testing across PostgreSQL, MySQL, and SQLite (and any other databases adk-python chooses to support)

Responsibility model: ADK owns schema definition and migration logic; operators retain full control over when and how migrations are executed (via environment variables or Helm hooks).

3.2 Key Features

For Developers:

  • Generate migrations: adk migrate generate --message "add_column"
  • Alembic autogenerate compares SQLAlchemy models to database
  • Clear upgrade/downgrade paths with version tracking

For Operations:

  • Helm pre-install/pre-upgrade hooks run migrations before app deployment
  • Database locking (e.g., PostgreSQL advisory locks)

For Enterprise:

  • Migration-coordinated deployments with proper synchronization
  • Audit logging of all migration events
  • Support for Cloud SQL Proxy, Workload Identity, and secret management
  • Comprehensive documentation with Helm chart examples

4. Design Principles

  1. Safe by default: Auto-migration disabled by default (ADK_AUTO_MIGRATE_DB=false) to prevent unexpected schema changes
  2. Backward compatible: Phased rollout across releases with parallel support (e.g., deprecation warnings in minor and patch releases, destructive operations only in major releases)
  3. Well-tested: Integration tests with real databases (PostgreSQL, MySQL, SQLite)
  4. Production-ready: Distributed locking, timeout handling, error recovery
  5. Well-documented: Migration guides, Helm examples, troubleshooting docs

5. Technical Approach

5.1 Architecture

Application Startup
        ↓
DatabaseSessionService._prepare_tables()
        ↓
Check: ADK_AUTO_MIGRATE_DB?
        ↓ (true)
AlembicRunner.check_needs_migration()
        ↓ (yes)
Acquire DatabaseMigrationLock
        ↓
Run Alembic upgrade to "head"
        ↓
Alembic updates alembic_version table
        ↓
Migration script updates adk_internal_metadata.schema_version
        ↓
Release lock
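
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the proposed implementation: `FakeRunner`, `fake_lock`, and `prepare_tables` are hypothetical stand-ins, and the `upgrade()` method name is an assumption. The second `check_needs_migration()` call under the lock is an addition that does not appear in the diagram; it covers the case where another instance finished the migration while this one was waiting.

```python
from contextlib import contextmanager


class FakeRunner:
    """Stand-in for the proposed AlembicRunner (method names are assumed)."""

    def __init__(self, pending: bool):
        self.pending = pending
        self.upgraded_to = None

    def check_needs_migration(self) -> bool:
        return self.pending

    def upgrade(self, revision: str) -> None:
        self.upgraded_to = revision
        self.pending = False


@contextmanager
def fake_lock(events: list):
    """Stand-in for DatabaseMigrationLock; records acquire/release order."""
    events.append("acquire")
    try:
        yield
    finally:
        events.append("release")


def prepare_tables(auto_migrate: bool, runner, lock) -> bool:
    """Return True if this call ran a migration."""
    if not auto_migrate:
        return False  # operator runs migrations out of band (e.g. Helm hook)
    if not runner.check_needs_migration():
        return False  # schema already at head, no lock needed
    with lock:
        # Re-check under the lock: another instance may have migrated
        # while this one was waiting to acquire it.
        if not runner.check_needs_migration():
            return False
        runner.upgrade("head")
        return True
```

Passing the runner and lock in as arguments keeps the control flow testable without a database.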

5.2 Distributed Locking

Version Tracking: Alembic uses the alembic_version table to track migration revisions. ADK additionally maintains adk_internal_metadata.schema_version as a higher-level compatibility layer for existing tooling and schema detection logic.

PostgreSQL: Advisory locks (non-blocking, session-scoped)

SELECT pg_try_advisory_lock(1234567890);
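
As a rough sketch of how the query above might be wrapped, the context manager below tries the lock once and releases it on exit. It assumes a DB-API connection to PostgreSQL (for example from psycopg2, whose parameter style is `%s`); the `advisory_lock` helper and `MIGRATION_LOCK_KEY` constant are hypothetical names. Because the lock is session-scoped, the same connection has to stay open for the duration of the migration.

```python
from contextlib import contextmanager

MIGRATION_LOCK_KEY = 1234567890  # key from the example query above


@contextmanager
def advisory_lock(conn, key: int = MIGRATION_LOCK_KEY):
    """Yield True if this session got the lock, False otherwise.

    `conn` is any DB-API connection to PostgreSQL (e.g. psycopg2).
    """
    cur = conn.cursor()
    # pg_try_advisory_lock returns immediately instead of blocking.
    cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
    acquired = bool(cur.fetchone()[0])
    try:
        yield acquired
    finally:
        if acquired:
            # Session-scoped locks persist until unlocked or the session ends.
            cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.close()
```

A caller would check the yielded flag and fall back to waiting when it is `False`, rather than treating a failed acquisition as an error.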

MySQL/SQLite: Table-based locks with expiration

  • Lock expires after 5 minutes (default, configurable via ADK_MIGRATION_TIMEOUT)
  • Automatic cleanup of stale locks
  • Note: Best-effort locking to prevent accidental concurrent migrations. Not a strict consensus mechanism. Clock skew or stale locks may affect correctness in edge cases.

SQLite: Migrations assume single-writer deployments (e.g., local development)
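
A table-based lock with expiration could look roughly like the following, shown against SQLite so it runs with the standard library alone. The table name `adk_migration_lock`, its columns, and the helper functions are all hypothetical. The sketch relies on wall-clock timestamps, so it inherits the clock-skew caveat noted above.

```python
import sqlite3
import time
from typing import Optional

LOCK_TTL_SECONDS = 300  # 5-minute default mentioned above


def init_lock_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS adk_migration_lock ("
        " name TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn: sqlite3.Connection, owner: str,
                now: Optional[float] = None,
                ttl: float = LOCK_TTL_SECONDS) -> bool:
    now = time.time() if now is None else now
    # Stale-lock cleanup: drop the row if its lease has expired.
    conn.execute(
        "DELETE FROM adk_migration_lock"
        " WHERE name = 'migration' AND expires_at <= ?",
        (now,),
    )
    try:
        # The primary key makes this insert fail while another owner holds the lock.
        conn.execute(
            "INSERT INTO adk_migration_lock (name, owner, expires_at)"
            " VALUES ('migration', ?, ?)",
            (owner, now + ttl),
        )
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
    conn.commit()
    return True


def release(conn: sqlite3.Connection, owner: str) -> None:
    # Only the current owner can release the lock.
    conn.execute(
        "DELETE FROM adk_migration_lock WHERE name = 'migration' AND owner = ?",
        (owner,),
    )
    conn.commit()
```

Putting the uniqueness constraint in the database, rather than checking and then inserting, avoids a window in which two instances both believe the lock is free.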

Behavior:

  • Instance A acquires lock → runs migration
  • Instances B-E wait → poll for completion every 2 seconds
  • All instances verify schema after migration completes
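
The waiting side of this behavior might be sketched as a polling loop with a deadline. `wait_for_migration` and its parameters are hypothetical; the 2-second interval and 5-minute timeout mirror the defaults described in this section. The `sleep` and `clock` arguments are injectable only so the loop can be tested without real delays.

```python
import time
from typing import Callable


def wait_for_migration(
    is_up_to_date: Callable[[], bool],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until the schema is current, or raise TimeoutError."""
    deadline = clock() + timeout
    while not is_up_to_date():
        if clock() >= deadline:
            raise TimeoutError("schema migration did not complete in time")
        sleep(poll_interval)
```

Here `is_up_to_date` would be the same schema check every instance runs after the migration completes, so a waiting instance never proceeds on the lock state alone.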

When is locking used?

| Deployment Mode | Locking Needed? | Reason |
|---|---|---|
| Helm hook (K8s) | ❌ No | Single Job pod runs migration before app pods start - sequential by design |
| Auto-migration (ADK_AUTO_MIGRATE_DB=true) | ✅ Yes | Multiple app instances may start simultaneously and call _prepare_tables() |

Use cases requiring locks:

  • Cloud Run: Multiple instances scaling up simultaneously with shared Cloud SQL
  • Docker Compose: Multiple service replicas sharing a database
  • K8s without Helm hooks: Deploying with ADK_AUTO_MIGRATE_DB=true on multiple pods
  • Local development: Running multiple instances for testing

5.3 Kubernetes Integration

Migration Job (Helm pre-install/pre-upgrade hook):

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migration
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
spec:
  template:
    spec:
      restartPolicy: Never  # Jobs require Never or OnFailure
      containers:
      - name: migration
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command: ["python", "-m", "google.adk.cli.migration_entrypoint"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: adk-db-secret
              key: url

Application Deployment:

env:
- name: ADK_AUTO_MIGRATE_DB
  value: "false"  # Disabled when using migration Job

6. Migration Workflow

6.1 For Developers

Creating a new migration:

# 1. Generate migration template for session database schema
adk migrate generate --message "add_session_tags"

# 2. Review auto-generated migration file
# alembic/versions/004_add_session_tags_v2.py

# 3. Customize if needed (data transformations, etc.)

# 4. Test migration
pytest tests/unittests/sessions/migration/

# 5. Commit migration file
git add alembic/versions/004_add_session_tags_v2.py
git commit -m "feat(migration): add session tags column"

Applying migration:

# Local development (applies to session database schema)
adk migrate session

# Production (via Helm)
helm upgrade my-app ./chart --set image.tag=v1.26.0
# Migration runs automatically via pre-upgrade hook

6.2 For Operations

Migration happens automatically in the library - users don't write migration code or operate migration scripts. There are two deployment modes:

Mode 1: Helm Pre-Upgrade Hook (Recommended for Production/Kubernetes)

# Migration runs as separate Job BEFORE app pods start
# See section 5.3 for full example
# In application deployment, disable auto-migration
env:
  - name: ADK_AUTO_MIGRATE_DB
    value: "false"

Mode 2: Auto-Migration on Startup (Local dev, testing, simple deployments)

# Just set environment variable - no code needed
export ADK_AUTO_MIGRATE_DB=true

# Migration runs automatically when DatabaseSessionService initializes
# This happens in _prepare_tables() method inside the library
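
Reading the flag in a way that keeps the safe default could look like this. It is a small sketch: `auto_migrate_enabled` is a hypothetical helper, and the set of accepted truthy strings is an assumption, since the RFC only shows `true` and `false`.

```python
import os

# Accepted truthy spellings are an assumption; the RFC only shows "true"/"false".
_TRUE_VALUES = {"1", "true", "yes", "on"}


def auto_migrate_enabled(environ=os.environ) -> bool:
    """Return True only when ADK_AUTO_MIGRATE_DB is explicitly truthy."""
    raw = environ.get("ADK_AUTO_MIGRATE_DB", "false")
    return raw.strip().lower() in _TRUE_VALUES
```

Anything unset, empty, or unrecognized is treated as disabled, which matches the "safe by default" principle in section 4.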

Monitoring:

  • Track migration duration (e.g., Prometheus histogram)
  • Alert on migration failures
  • Monitor schema version lag across environments

Rollback (automated via Helm pre-rollback hook):

⚠️ Warning: Downgrades are best-effort only. Not all schema or data migrations are reversible (e.g., dropping columns, destructive transformations). Operators must verify downgrade safety before relying on automated rollback in production.

# templates/migration-rollback-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migration-rollback
  annotations:
    "helm.sh/hook": pre-rollback
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never  # Jobs require Never or OnFailure
      containers:
      - name: migration-rollback
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
        command:
          - python
          - -m
          - google.adk.cli.migration_entrypoint
          - --db-url
          - $(DATABASE_URL)
          - --downgrade
          - "1"  # Number of migrations to rollback (should auto-calculate from app version)
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: {{ .Values.database.secretName }}
              key: url
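
For illustration, an entrypoint accepting the `--db-url` and `--downgrade` flags used in the Job above might be structured as follows. This is a sketch under assumptions, not the actual `migration_entrypoint` module: the fallback to the `DATABASE_URL` environment variable, the `script_location` default, and the `parse_args`/`run` function names are hypothetical. The Alembic calls (`Config.set_main_option`, `command.upgrade`, `command.downgrade`) are part of Alembic's public API, and a relative revision such as `-1` steps back one migration. Whether the URL set here takes effect also depends on how the project's `env.py` reads its configuration.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="migration_entrypoint")
    # Falling back to DATABASE_URL is an assumption made for this sketch.
    parser.add_argument("--db-url", default=os.environ.get("DATABASE_URL"))
    parser.add_argument(
        "--downgrade", type=int, default=0, metavar="STEPS",
        help="number of revisions to roll back (0 = upgrade to head)",
    )
    args = parser.parse_args(argv)
    if not args.db_url:
        parser.error("--db-url or DATABASE_URL is required")
    if args.downgrade < 0:
        parser.error("--downgrade must be >= 0")
    return args


def run(args: argparse.Namespace, script_location: str = "alembic") -> None:
    # Imported lazily so argument parsing works without Alembic installed.
    from alembic import command
    from alembic.config import Config

    cfg = Config()
    cfg.set_main_option("script_location", script_location)
    # Percent signs must be doubled because the value goes through ConfigParser.
    cfg.set_main_option("sqlalchemy.url", args.db_url.replace("%", "%%"))
    if args.downgrade:
        command.downgrade(cfg, f"-{args.downgrade}")  # relative revision, e.g. "-1"
    else:
        command.upgrade(cfg, "head")
```

A module entry point would then call `run(parse_args())`.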

Workflow:

# Deploy v1.26.0 (migration runs automatically)
helm upgrade my-app ./chart --set image.tag=v1.26.0

# Rollback to v1.25.0 (DB downgrade runs automatically)
helm rollback my-app
# ↑ Helm pre-rollback hook downgrades DB before rolling back app

7. Testing Strategy

7.1 Test Coverage

Unit Tests (tests/unittests/sessions/migration/):

  • AlembicMigrationRunner functionality
  • DatabaseMigrationLock (PostgreSQL + table-based)
  • Individual migration scripts (upgrade/downgrade)
  • Schema version detection and synchronization

Integration Tests (tests/integration/sessions/):

  • Full migration paths (V0 → V1 → V2)
  • Auto-migration on startup
  • Concurrent migrations
  • Rollback scenarios
  • Data preservation through migrations
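
A data-preservation test for an upgrade/downgrade round trip might follow the pattern below. To stay self-contained it uses raw SQL against in-memory SQLite rather than Alembic, and the `sessions` table, its columns, and the `tags` migration are hypothetical. The downgrade rebuilds the table instead of dropping the column, which works on SQLite versions without `DROP COLUMN` support.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """Hypothetical upgrade: add a nullable `tags` column."""
    conn.execute("ALTER TABLE sessions ADD COLUMN tags TEXT")


def downgrade(conn: sqlite3.Connection) -> None:
    """Hypothetical downgrade: rebuild the table without `tags`."""
    conn.execute("CREATE TABLE sessions_old (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO sessions_old SELECT id, state FROM sessions")
    conn.execute("DROP TABLE sessions")
    conn.execute("ALTER TABLE sessions_old RENAME TO sessions")


def columns(conn: sqlite3.Connection) -> list:
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]


def test_round_trip_preserves_rows() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO sessions VALUES ('s1', '{\"k\": 1}')")
    before = conn.execute("SELECT id, state FROM sessions").fetchall()

    upgrade(conn)
    assert columns(conn) == ["id", "state", "tags"]
    assert conn.execute("SELECT id, state FROM sessions").fetchall() == before

    downgrade(conn)
    assert columns(conn) == ["id", "state"]
    assert conn.execute("SELECT id, state FROM sessions").fetchall() == before
```

The same shape (snapshot rows, upgrade, compare, downgrade, compare) would apply to the PostgreSQL and MySQL runs in the test matrix.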

GitHub Actions Workflow (.github/workflows/test-migrations.yml):

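strategy:
  matrix:
    db: ["postgres:15", "postgres:14", "mysql:8.0"]
    python: ["3.10", "3.11", "3.12"]

services:
  db:
    image: ${{ matrix.db }}
    env:
      POSTGRES_PASSWORD: testpass    # read by the postgres images
      MYSQL_ROOT_PASSWORD: testpass  # read by the mysql image
    ports:
      - 5432:5432  # PostgreSQL
      - 3306:3306  # MySQL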

8. Backward Compatibility

8.1 Phased Rollout

ADK 1.25.0 (Parallel Support):

  • Add Alembic support
  • Keep existing manual migration scripts
  • Document both approaches
  • CLI supports both: adk migrate session (Alembic) and adk migrate session --legacy (manual)

ADK 1.26.0 (Soft Deprecation):

  • Alembic becomes default
  • Deprecation warnings for manual migrations

ADK 1.27.0 (Hard Deprecation):

  • Remove manual migrations from main codebase
  • CLI only supports Alembic

8.2 Migration Path for Existing Users

V1 databases (without Alembic):

# Bootstrap Alembic - stamps baseline revision without running migrations
adk migrate session --bootstrap-alembic

This stamps the database with the 002_baseline_v1 revision, indicating it is already at the V1 schema.

V0 databases (legacy pickle-based):

# Migrate V0 → V1 via Alembic
adk migrate session --upgrade

This runs the 003_v0_to_v1_migration.py script to convert pickle to JSON.
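
The per-value conversion at the heart of such a script might look like the sketch below. It covers only the pickle-to-JSON step for a single column value; the `pickled_to_json` helper is hypothetical, and the actual table layout and the fallback for values that JSON cannot represent are left open by this RFC.

```python
import json
import pickle


def pickled_to_json(blob: bytes) -> str:
    """Convert one pickled column value to a JSON string.

    Only unpickle data from a database you trust: pickle can execute code.
    """
    value = pickle.loads(blob)
    # sort_keys gives stable output; default=str is a lossy fallback for
    # values JSON cannot represent (e.g. datetime), chosen here for brevity.
    return json.dumps(value, sort_keys=True, default=str)
```

A real migration would need to decide whether a lossy fallback like `default=str` is acceptable or whether unconvertible rows should fail the migration.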

Automatic detection:
When ADK_AUTO_MIGRATE_DB=true, the system automatically detects the schema version and bootstraps Alembic if needed (stamping the appropriate baseline revision).

9. Documentation Plan

9.1 New Documentation

User Guides:

  • docs/migration_guide.md - Developer workflow for creating migrations
  • docs/helm_migration_guide.md - Kubernetes deployment patterns

Helm Chart Examples:

  1. Simple PostgreSQL migration job
  2. Cloud SQL with Workload Identity
  3. Migration with health check init containers

Migration Template:
Enhanced alembic/script.py.mako with ADK-specific fields:

  • Database schema version
  • Compatible ADK versions
  • Description and data migration notes
  • Rollback considerations

9.2 Updated Documentation

  • src/google/adk/sessions/migration/README.md - Add Alembic workflow
  • Release notes for each version with migration instructions
