Status: Draft
Author: @achilleasatha
Date: 2026-02-05
Related Issue: #3343 - Update database schema automatically
1. Summary
Adopt Alembic as the database migration framework for adk-python to enable automatic schema updates, migration-coordinated deployments with minimal service disruption, and proper rollback support for enterprise production environments.
2. Motivation
2.1 Current Problems
From GitHub issue #3343 and community feedback:
- Manual migrations are not viable for production: Users must manually execute migration scripts when upgrading ADK versions. For enterprise deployments with multiple environments and multiple ADK-based agents per environment, this translates into a lot of manual work: SSHing into servers and running scripts.
- Breaking changes in minor versions: Schema changes in ADK 1.14.0, 1.17.0, and 1.19.0 broke deployments without clear migration paths or automation.
- No support for Kubernetes deployments: Migrations cannot be run via Helm hooks or init containers. SSH access to every pod is neither scalable nor feasible.
- Race conditions: Multiple pods starting simultaneously could attempt migrations concurrently, causing failures. The migration script currently provided has no locking mechanism to guard against this.
- No rollback capability: The current system doesn't support downgrade migrations, making failed deployments difficult to recover.
- Permission constraints: Service accounts may lack DDL execution permissions, requiring manual intervention.
3. Proposal
3.1 Overview
Integrate Alembic as the primary database migration framework with:
- Automatic migrations on startup (feature-flagged with ADK_AUTO_MIGRATE_DB)
- Distributed locking to prevent race conditions across multiple pods
- Kubernetes Helm hook support for pre-deployment migrations
- Rollback capability via Alembic's downgrade functionality
- Comprehensive testing across PostgreSQL, MySQL, and SQLite (and/or any other DBs that ADK-python wants to support)
Responsibility model: ADK owns schema definition and migration logic; operators retain full control over when and how migrations are executed (via environment variables or Helm hooks).
3.2 Key Features
For Developers:
- Generate migrations: adk migrate generate --message "add_column"
- Alembic autogenerate compares SQLAlchemy models to the database
- Clear upgrade/downgrade paths with version tracking
For Operations:
- Helm pre-install/pre-upgrade hooks run migrations before app deployment
- Database locking (e.g., PostgreSQL advisory locks)
For Enterprise:
- Migration-coordinated deployments with proper synchronization
- Audit logging of all migration events
- Support for Cloud SQL Proxy, Workload Identity, and secret management
- Comprehensive documentation with Helm chart examples
4. Design Principles
- Safe by default: Auto-migration disabled by default (ADK_AUTO_MIGRATE_DB=false) to prevent unexpected schema changes
- Backward compatible: Phased rollout over releases with parallel support (e.g., deprecation warnings on minor/patch releases, destructive operations only on major releases)
- Well-tested: Integration tests with real databases (PostgreSQL, MySQL, SQLite)
- Production-ready: Distributed locking, timeout handling, error recovery
- Well-documented: Migration guides, Helm examples, troubleshooting docs
5. Technical Approach
5.1 Architecture
Application Startup
↓
DatabaseSessionService._prepare_tables()
↓
Check: ADK_AUTO_MIGRATE_DB?
↓ (true)
AlembicRunner.check_needs_migration()
↓ (yes)
Acquire DatabaseMigrationLock
↓
Run Alembic upgrade to "head"
↓
Alembic updates alembic_version table
↓
Migration script updates adk_internal_metadata.schema_version
↓
Release lock
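The flow above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the actual ADK implementation: the function names, the callables passed in, and the set of accepted truthy spellings for ADK_AUTO_MIGRATE_DB are all hypothetical. The second check after the lock is acquired is also an addition not shown in the diagram; it covers the case where another instance finished the migration while this one was waiting.

```python
import os
from typing import Callable, ContextManager, Mapping

# Assumption: the RFC does not say which spellings count as "true".
_TRUTHY = {"1", "true", "yes", "on"}


def auto_migrate_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True only when ADK_AUTO_MIGRATE_DB is set to a truthy value."""
    return environ.get("ADK_AUTO_MIGRATE_DB", "false").strip().lower() in _TRUTHY


def prepare_tables(
    needs_migration: Callable[[], bool],
    migration_lock: Callable[[], ContextManager],
    upgrade_to_head: Callable[[], None],
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical stand-in for the startup hook; returns what happened."""
    if not auto_migrate_enabled(environ):
        return "skipped"  # safe default: never touch the schema
    if not needs_migration():
        return "up-to-date"
    with migration_lock():
        # Re-check under the lock: a peer may have migrated while we waited.
        if needs_migration():
            upgrade_to_head()
            return "migrated"
    return "up-to-date"
```

Passing the lock and the upgrade step in as callables keeps the control flow testable without a database.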
5.2 Distributed Locking
Version Tracking: Alembic uses the alembic_version table to track migration revisions. ADK additionally maintains adk_internal_metadata.schema_version as a higher-level compatibility layer for existing tooling and schema detection logic.
PostgreSQL: Advisory locks (non-blocking, session-scoped)
SELECT pg_try_advisory_lock(1234567890);
MySQL/SQLite: Table-based locks with expiration
- Lock expires after 5 minutes (default, configurable via ADK_MIGRATION_TIMEOUT)
- Automatic cleanup of stale locks
- Note: Best-effort locking to prevent accidental concurrent migrations. Not a strict consensus mechanism. Clock skew or stale locks may affect correctness in edge cases.
SQLite: Migrations assume single-writer deployments (e.g., local development)
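A table-based lock with expiration might look roughly like the following. This is a sketch only: the table name adk_migration_lock, its columns, and the function names are hypothetical, and sqlite3 is used simply because it ships with Python; a MySQL version would need equivalent SQL. As the note above says, correctness depends on the callers' clocks agreeing.

```python
import sqlite3
import time

DEFAULT_TTL_SECONDS = 300  # mirrors the 5-minute default described above


def ensure_lock_table(conn: sqlite3.Connection) -> None:
    """Create the single-row lock table (hypothetical name and layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS adk_migration_lock ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " owner TEXT NOT NULL,"
        " expires_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, owner, ttl=DEFAULT_TTL_SECONDS, now=None) -> bool:
    """Try to take the lock without blocking; return True on success."""
    now = time.time() if now is None else now
    with conn:  # commits on normal exit, rolls back on an uncaught error
        # Stale-lock cleanup: drop a lock whose expiry has passed.
        conn.execute("DELETE FROM adk_migration_lock WHERE expires_at <= ?", (now,))
        try:
            conn.execute(
                "INSERT INTO adk_migration_lock (id, owner, expires_at)"
                " VALUES (1, ?, ?)",
                (owner, now + ttl),
            )
        except sqlite3.IntegrityError:
            return False  # the row already exists: someone else holds the lock
    return True


def release(conn, owner) -> None:
    """Release the lock, but only if this owner still holds it."""
    with conn:
        conn.execute(
            "DELETE FROM adk_migration_lock WHERE id = 1 AND owner = ?", (owner,)
        )
```

The primary-key constraint on the single row is what makes acquisition atomic; the expiry only bounds how long a crashed holder can block others.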
Behavior:
- Instance A acquires lock → runs migration
- Instances B-E wait → poll for completion every 2 seconds
- All instances verify schema after migration completes
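The waiting side of this behavior can be sketched as a simple polling loop. The 2-second interval comes from the list above; the function name, the overall timeout, and the injectable sleep and clock parameters are assumptions made so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_migration(
    is_complete: Callable[[], bool],
    timeout_seconds: float = 300.0,  # assumption: reuse the 5-minute lock default
    poll_interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the schema is at the expected version or the timeout passes."""
    deadline = clock() + timeout_seconds
    while True:
        if is_complete():
            return True
        if clock() >= deadline:
            return False  # caller decides whether to fail startup or retry
        sleep(poll_interval)
```

Here is_complete would be whatever schema verification each instance performs after the migration, so a waiting instance never proceeds on a half-migrated database.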
When is locking used?
| Deployment Mode | Locking Needed? | Reason |
|---|---|---|
| Helm hook (K8s) | ❌ No | Single Job pod runs migration before app pods start - sequential by design |
| Auto-migration (ADK_AUTO_MIGRATE_DB=true) | ✅ Yes | Multiple app instances may start simultaneously and call _prepare_tables() |
Use cases requiring locks:
- Cloud Run: Multiple instances scaling up simultaneously with shared Cloud SQL
- Docker Compose: Multiple service replicas sharing a database
- K8s without Helm hooks: Deploying with ADK_AUTO_MIGRATE_DB=true on multiple pods
- Local development: Running multiple instances for testing
5.3 Kubernetes Integration
Migration Job (Helm pre-install/pre-upgrade hook):
apiVersion: batch/v1
kind: Job
metadata:
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-5"
spec:
  template:
    spec:
      containers:
        - name: migration
          command: ["python", "-m", "google.adk.cli.migration_entrypoint"]
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: adk-db-secret
                  key: url
Application Deployment:
env:
  - name: ADK_AUTO_MIGRATE_DB
    value: "false"  # Disabled when using migration Job
6. Migration Workflow
6.1 For Developers
Creating a new migration:
# 1. Generate migration template for session database schema
adk migrate generate --message "add_session_tags"
# 2. Review auto-generated migration file
# alembic/versions/004_add_session_tags_v2.py
# 3. Customize if needed (data transformations, etc.)
# 4. Test migration
pytest tests/unittests/sessions/migration/
# 5. Commit migration file
git add alembic/versions/004_add_session_tags_v2.py
git commit -m "feat(migration): add session tags column"
Applying migration:
# Local development (applies to session database schema)
adk migrate session
# Production (via Helm)
helm upgrade my-app ./chart --set image.tag=v1.26.0
# Migration runs automatically via pre-upgrade hook
6.2 For Operations
Migration happens automatically inside the library; users don't write migration code or operate migration scripts. There are two deployment modes:
Mode 1: Helm Pre-Upgrade Hook (Recommended for Production/Kubernetes)
# Migration runs as separate Job BEFORE app pods start
# See section 5.3 for full example
# In application deployment, disable auto-migration
env:
  - name: ADK_AUTO_MIGRATE_DB
    value: "false"
Mode 2: Auto-Migration on Startup (Local dev, testing, simple deployments)
# Just set environment variable - no code needed
export ADK_AUTO_MIGRATE_DB=true
# Migration runs automatically when DatabaseSessionService initializes
# This happens in _prepare_tables() method inside the library
Monitoring:
- Track migration duration (e.g., Prometheus histogram)
- Alert on migration failures
- Monitor schema version lag across environments
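One way to capture duration and failures is a small context manager around the migration call. This is an illustrative sketch using only the standard library: the logger name, the function name, and the on_duration callback are hypothetical. The callback is the place where a metrics client (for example a Prometheus histogram) could record the elapsed time.

```python
import logging
import time
from contextlib import contextmanager
from typing import Callable, Optional

logger = logging.getLogger("adk.migration")  # hypothetical logger name


@contextmanager
def observe_migration(
    revision: str,
    on_duration: Optional[Callable[[str, float], None]] = None,
):
    """Log the outcome and duration of one migration run, then report it."""
    start = time.monotonic()
    outcome = "success"
    try:
        yield
    except Exception:
        outcome = "failure"
        logger.exception("migration to %s failed", revision)
        raise  # never swallow the error; alerting depends on it surfacing
    finally:
        elapsed = time.monotonic() - start
        logger.info(
            "migration to %s finished: outcome=%s duration=%.3fs",
            revision, outcome, elapsed,
        )
        if on_duration is not None:
            on_duration(outcome, elapsed)
```

Wrapping the upgrade call in `with observe_migration("head"):` yields one log line per run, which is enough to drive a failure alert from log-based monitoring.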
Rollback (automated via Helm pre-rollback hook):
⚠️ Warning: Downgrades are best-effort only. Not all schema or data migrations are reversible (e.g., dropping columns, destructive transformations). Operators must verify downgrade safety before relying on automated rollback in production.
# templates/migration-rollback-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-migration-rollback
  annotations:
    "helm.sh/hook": pre-rollback
    "helm.sh/hook-weight": "-5"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never  # Job pods must use Never or OnFailure
      containers:
        - name: migration-rollback
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command:
            - python
            - -m
            - google.adk.cli.migration_entrypoint
            - --db-url
            - $(DATABASE_URL)
            - --downgrade
            - "1"  # Number of migrations to roll back (should auto-calculate from app version)
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: {{ .Values.database.secretName }}
                  key: url
Workflow:
# Deploy v1.26.0 (migration runs automatically)
helm upgrade my-app ./chart --set image.tag=v1.26.0
# Roll back to v1.25.0 (DB downgrade runs automatically)
helm rollback my-app
# ↑ Helm pre-rollback hook downgrades DB before rolling back app
7. Testing Strategy
7.1 Test Coverage
Unit Tests (tests/unittests/sessions/migration/):
- AlembicMigrationRunner functionality
- DatabaseMigrationLock (PostgreSQL + table-based)
- Individual migration scripts (upgrade/downgrade)
- Schema version detection and synchronization
Integration Tests (tests/integration/sessions/):
- Full migration paths (V0 → V1 → V2)
- Auto-migration on startup
- Concurrent migrations
- Rollback scenarios
- Data preservation through migrations
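A rollback and data-preservation test could take roughly the following shape. This sketch uses sqlite3 and hand-written upgrade/downgrade functions as stand-ins for real Alembic revision scripts; the simplified sessions table, its columns, and the tags column are hypothetical and do not reflect the actual ADK schema.

```python
import sqlite3


def upgrade(conn: sqlite3.Connection) -> None:
    """Stand-in for a revision's upgrade(): add a nullable column."""
    conn.execute("ALTER TABLE sessions ADD COLUMN tags TEXT")


def downgrade(conn: sqlite3.Connection) -> None:
    """Stand-in for downgrade(): rebuild the table without the column."""
    conn.execute("CREATE TABLE sessions_old (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO sessions_old SELECT id, state FROM sessions")
    conn.execute("DROP TABLE sessions")
    conn.execute("ALTER TABLE sessions_old RENAME TO sessions")


def columns(conn: sqlite3.Connection, table: str) -> list:
    """Return column names; PRAGMA table_info puts the name at index 1."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def test_roundtrip_preserves_rows() -> None:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO sessions VALUES ('s1', '{\"k\": 1}')")
    before = conn.execute("SELECT id, state FROM sessions").fetchall()

    upgrade(conn)
    assert columns(conn, "sessions") == ["id", "state", "tags"]
    assert conn.execute("SELECT id, state FROM sessions").fetchall() == before

    downgrade(conn)
    assert columns(conn, "sessions") == ["id", "state"]
    assert conn.execute("SELECT id, state FROM sessions").fetchall() == before
```

Note that this downgrade discards whatever was written to tags, which is exactly the kind of irreversible data loss a rollback test should make visible.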
GitHub Actions Workflow (.github/workflows/test-migrations.yml):
strategy:
  matrix:
    db: [postgres:15, postgres:14, mysql:8.0]
    python: ["3.10", "3.11", "3.12"]
services:
  postgres:
    image: ${{ matrix.db }}
    env:
      POSTGRES_PASSWORD: testpass
    ports:
      - 5432:5432
8. Backward Compatibility
8.1 Phased Rollout
ADK 1.25.0 (Parallel Support):
- Add Alembic support
- Keep existing manual migration scripts
- Document both approaches
- CLI supports both: adk migrate session (Alembic) and adk migrate session --legacy (manual)
ADK 1.26.0 (Soft Deprecation):
- Alembic becomes default
- Deprecation warnings for manual migrations
ADK 1.27.0 (Hard Deprecation):
- Remove manual migrations from main codebase
- CLI only supports Alembic
8.2 Migration Path for Existing Users
V1 databases (without Alembic):
# Bootstrap Alembic - stamps baseline revision without running migrations
adk migrate session --bootstrap-alembic
This stamps the database with the 002_baseline_v1 revision, indicating that it is already at the V1 schema.
V0 databases (legacy pickle-based):
# Migrate V0 → V1 via Alembic
adk migrate session --upgrade
This runs the 003_v0_to_v1_migration.py script to convert pickle to JSON.
Automatic detection:
When ADK_AUTO_MIGRATE_DB=true, the system automatically detects schema version and bootstraps Alembic if needed (stamps appropriate baseline revision).
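The detection step might be sketched as below. The decision rules are assumptions, since the RFC does not spell them out: a database with an alembic_version table is treated as already managed, one with adk_internal_metadata but no alembic_version as V1 needing a baseline stamp, and one with only a sessions table as legacy V0. The function names and returned labels are hypothetical, and sqlite_master is SQLite-specific; other backends would need their own catalog query.

```python
import sqlite3

BASELINE_REVISION = "002_baseline_v1"  # revision name taken from this RFC


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check the SQLite catalog for a table with the given name."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def plan_bootstrap(conn: sqlite3.Connection) -> str:
    """Classify the database so the caller knows how to bring it under Alembic."""
    if table_exists(conn, "alembic_version"):
        return "already_managed"  # just upgrade to head
    if table_exists(conn, "adk_internal_metadata"):
        return "stamp_baseline"   # stamp BASELINE_REVISION, then upgrade
    if table_exists(conn, "sessions"):
        return "migrate_v0"       # legacy pickle-based schema (assumed marker)
    return "fresh_database"       # nothing exists yet: create from scratch
```

Keeping the classification separate from the action makes it easy to log the decision before any schema change is attempted.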
9. Documentation Plan
9.1 New Documentation
User Guides:
- docs/migration_guide.md - Developer workflow for creating migrations
- docs/helm_migration_guide.md - Kubernetes deployment patterns
Helm Chart Examples:
- Simple PostgreSQL migration job
- Cloud SQL with Workload Identity
- Migration with health check init containers
Migration Template:
Enhanced alembic/script.py.mako with ADK-specific fields:
- Database schema version
- Compatible ADK versions
- Description and data migration notes
- Rollback considerations
9.2 Updated Documentation
- src/google/adk/sessions/migration/README.md - Add Alembic workflow
- Release notes for each version with migration instructions
10. References
- GitHub Issue: #3343 - Update database schema automatically
- Alembic Documentation: https://alembic.sqlalchemy.org/
- LiteLLM Prisma Example: https://docs.litellm.ai/docs/proxy/prod#9-use-prisma-migrate-deploy
- Helm Hooks: https://helm.sh/docs/topics/charts_hooks/
- PostgreSQL Advisory Locks: https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS