Skip to content

kbagent check: Platform observability with AI-powered investigation #74

@padak

Description

@padak

Problem

Keboola users run dozens of components on schedules. When something goes wrong, they find out too late — and even when they notice, understanding why is the painful part.

Today's workflow:

  1. Write a SQL transformation to detect anomalies (tedious, per-metric)
  2. Wait for it to run on schedule
  3. When it fires: manually dig through GCP Cost Explorer / Keboola UI to figure out what happened
  4. 30-60 minutes of clicking around to find root cause
  5. Repeat for every metric you care about

What users actually want:

  • "Credits consumption doubled this week — who caused it and what changed?"
  • "This extractor is getting slower every day — since when and why?"
  • "Data imports dropped 80% on Tuesday — which pipeline broke?"

The gap: detection is possible with SQL, but investigation (root cause analysis) requires iterative exploration that's tedious for humans and perfect for AI.

Two New Building Blocks

1. kbagent observe — Local Platform Metadata Warehouse

A local DuckDB database that mirrors your Keboola project's operational metadata. Syncs incrementally from APIs, queryable in milliseconds, works offline.

2. kbagent check — Automated Assertions on Observe Data

YAML-defined checks (from templates) that run against the local observe database. No AI at runtime. When a check fails, it produces a structured briefing JSON that any AI agent can pick up for investigation.

┌─────────────────────────────────────────────────────────────┐
│  PHASE 1: OBSERVE SYNC (automated, incremental)              │
│                                                             │
│  Keboola APIs ──→ .kbagent/observe.db (DuckDB)             │
│  (jobs, storage events, config versions, table stats)       │
│                                                             │
│  First run: 30-60s │ Next runs: 2-5s (watermark-based)     │
└──────────────┬──────────────────────────────────────────────┘
               │
       ┌───────┴────────────┐
       │                    │
┌──────▼──────────┐  ┌──────▼──────────────────────────────┐
│ kbagent check   │  │ Your local AI                       │
│ run --all       │  │                                     │
│                 │  │ Claude Code / Agent SDK / KAI       │
│ Deterministic   │  │                                     │
│ SQL assertions  │  │ Ad-hoc: "Why did credits spike?"    │
│ from YAML       │  │ → kbagent observe query --sql "..." │
│ templates       │  │ → iterative drill-down              │
│                 │  │ → milliseconds, offline, free       │
│ → briefing JSON │  │                                     │
└─────────────────┘  └─────────────────────────────────────┘

kbagent observe — The Foundation

What's in observe.db

All data comes from existing Keboola APIs. Volume is small (thousands of records), perfect for DuckDB.

DuckDB table API source Key columns
jobs Queue API /search/jobs component_id, config_id, config_name, status, duration_seconds, credits_consumed, start_time, end_time, result_message
storage_events Storage API /events type (tableImport, tableExport, ...), table_id, bytes, created, creator_token_description
config_versions Storage API /components/*/configs/*/versions component_id, config_id, version, created, creator_token_description, change_description
table_stats Storage API /tables id, bucket_id, name, rows_count, data_size_bytes, last_import_date, last_change_date

Why DuckDB

Workspace (Snowflake/BQ) DuckDB (in-process)
Startup 5-15s + cloud compute ~0ms
Cost Snowflake credits / BQ slots Zero
SQL dialect Depends on project backend Always PostgreSQL-compatible
Dependency keboola-mcp-server + workspace pip install duckdb (12MB)
Offline No Yes
History Limited to API retention Permanent local archive

Incremental Sync

# First run: fetch all jobs for lookback period (30-60s)
# Subsequent runs: fetch only since last watermark (2-5s)
# .kbagent/observe.db grows ~5-20MB per project

Keboola storage events have ~30-day retention. The local observe.db keeps data forever, becoming a permanent archive of platform activity that outlives API retention.

CLI Commands

# Sync metadata (incremental by default)
kbagent observe sync [--project ALIAS]           # delta since last sync, 2-5s
kbagent observe sync --full [--project ALIAS]    # full refresh, 30-60s

# Query the local database directly
kbagent observe query --sql "
  SELECT component_id, DATE_TRUNC('day', start_time) as day,
         SUM(credits_consumed) as credits, COUNT(*) as runs
  FROM jobs
  WHERE start_time >= '2026-03-01'
  GROUP BY 1, 2
  ORDER BY credits DESC
  LIMIT 20
"

# Or connect directly from Python / AI agent
# duckdb.connect('.kbagent/observe.db').sql('SELECT ...')

# Stats about the local database
kbagent observe stats [--project ALIAS]
# → jobs: 12,450 (2025-12-01 to 2026-03-29)
# → storage_events: 34,200
# → config_versions: 890
# → db size: 8.2 MB
# → last sync: 2026-03-29T07:00:12Z

Relationship with sync pull

These are complementary, not competing commands:

sync pull observe sync
Purpose Edit and push-back configurations Analyze platform activity
Downloads Configs (YAML), storage metadata, 5 recent jobs per config Full job history, storage events, config version history
Format Files on disk (human-readable, git-friendly) Single DuckDB file (query-friendly)
Queryable No (read JSON files manually) Yes (DuckDB SQL, milliseconds)
Question it answers "What does my project look like now?" "What has been happening in my project?"

sync pull = state snapshot (for editing).
observe sync = time series (for analytics).


kbagent check — Assertions on Observe Data

Three-Phase Design, AI Only Where Needed

Phase 1: DEFINE (human + AI, one-time)
User and AI pick from strategy templates, customize thresholds → YAML file in .kbagent/checks/. AI is involved here. After this, everything is automated.

Phase 2: RUN (automated, NO AI, deterministic)
kbagent check run --all queries the local observe.db, evaluates pass/fail, produces briefing on failure. No AI calls. No cloud compute. Pure SQL.

Phase 3: INVESTIGATE (AI picks up the briefing, only on failure)
The briefing JSON is self-contained. Any AI agent can pick it up: Claude Code interactively, Claude Code /schedule automated, KAI bot, or custom Agent SDK app.

Strategy Templates

Users don't write SQL — they pick a template and set parameters.

credit-spike — Component burning more credits than usual

apiVersion: kbagent/v1
kind: Check
metadata:
  name: credit-spike
  description: "Alert when any component burns >50% more credits than usual"
  severity: warning
  tags: [credits, cost-monitoring]

spec:
  project: my-project
  template: credit-spike
  params:
    lookback_days: 14          # baseline period
    window_hours: 24           # detection window
    threshold_pct: 50          # alert if >50% above average
    min_daily_credits: 0.5     # ignore negligible usage
    group_by: component_id     # or: [component_id, config_id]

job-slowdown — Jobs running slower than their baseline

spec:
  template: job-slowdown
  params:
    lookback_days: 30
    slowdown_factor: 3         # alert if duration > 3x baseline
    baseline_metric: p95       # p50 | p95 | mean
    min_duration_seconds: 60
    status_filter: [success]

import-volume-drop — Data pipeline stopped or degraded

spec:
  template: import-volume-drop
  params:
    lookback_weeks: 4
    drop_threshold_pct: 50
    compare_by: weekday        # Mon vs Mon, Tue vs Tue (seasonality)
    timezone: "Europe/Prague"  # for correct weekday comparison

error-rate — Component failing too often

spec:
  template: error-rate
  params:
    window_hours: 24
    threshold_pct: 20
    min_jobs: 5                # ignore components with <5 runs (noisy)

data-freshness — Tables not updated on time

spec:
  template: data-freshness
  params:
    staleness_factor: 2        # alert if gap > 2x usual interval
    tables:                    # specific tables (optional: all if omitted)
      - in.c-billing.daily_costs
      - in.c-crm.contacts

custom — Raw DuckDB SQL for power users

spec:
  template: custom
  query: |
    -- DuckDB SQL against: jobs, storage_events, config_versions, table_stats
    -- Returns rows that VIOLATE the rule (empty result = pass)
    SELECT component_id, daily_credits, avg_credits, pct_change
    FROM ( ... )
    WHERE pct_change > 50
  expect:
    type: empty

The Briefing (failure output for AI)

When a check fails, it produces a self-contained JSON briefing designed for AI consumption:

{
  "briefing_version": "1",
  "generated_at": "2026-03-29T07:00:12Z",
  "project": {
    "alias": "billing-prod",
    "id": 12345,
    "stack": "connection.north-europe.azure.keboola.com"
  },
  "check": {
    "name": "credit-spike",
    "description": "Alert when any component burns >50% more credits than usual",
    "severity": "warning",
    "template": "credit-spike"
  },
  "result": {
    "status": "fail",
    "violations": [
      {
        "component_id": "keboola.ex-google-bigquery-v2",
        "config_id": "894231",
        "actual": 42.5,
        "expected": {
          "mean": 8.3,
          "p50": 7.9,
          "p95": 12.1
        },
        "pct_change": 412.0,
        "z_score": 4.8
      }
    ]
  },
  "context": {
    "metric_history": {
      "data_points": [
        {"date": "2026-03-25", "credits": 8.1},
        {"date": "2026-03-26", "credits": 7.9},
        {"date": "2026-03-27", "credits": 9.2},
        {"date": "2026-03-28", "credits": 42.5}
      ],
      "trend": "spike",
      "volatility": "low"
    },
    "recent_config_changes": [
      {
        "config_id": "894231",
        "config_name": "BQ Marketing Extract",
        "changed_at": "2026-03-27T14:30:00Z",
        "changed_by": "betka@company.com",
        "change_description": "Updated query parameters"
      }
    ]
  },
  "suggested_actions": [
    {
      "command": "kbagent --json config detail --project billing-prod --component-id keboola.ex-google-bigquery-v2 --config-id 894231",
      "description": "Inspect config that changed 1 day before spike",
      "priority": "high",
      "rationale": "Config 894231 was modified by betka@company.com on 2026-03-27, credits spiked on 2026-03-28"
    },
    {
      "command": "kbagent --json job list --project billing-prod --component-id keboola.ex-google-bigquery-v2 --limit 50",
      "description": "List recent jobs for this component",
      "priority": "medium",
      "rationale": "Compare job durations and credits before/after config change"
    },
    {
      "command": "kbagent observe query --sql \"SELECT DATE_TRUNC('day', start_time) as day, COUNT(*), SUM(credits_consumed) FROM jobs WHERE component_id = 'keboola.ex-google-bigquery-v2' AND start_time >= '2026-03-20' GROUP BY 1 ORDER BY 1\"",
      "description": "Daily trend for this component",
      "priority": "medium",
      "rationale": "See exact day the spike started"
    }
  ]
}

Key properties:

  • Self-contained: violation data + statistical context (mean, p50, p95) + config change correlation
  • Executable actions: not prose hints, but actual commands with priority and rationale — AI can run them directly
  • Human-readable: a human can scan it and understand the problem without AI

CLI Commands

# === Define (with AI assistance) ===
kbagent check templates                                    # list available templates
kbagent check create --template credit-spike --project X   # AI-assisted creation
kbagent check validate .kbagent/checks/credit-spike.yaml   # validate YAML

# === Run (no AI, deterministic) ===
kbagent check run --name credit-spike                       # single check
kbagent check run --all [--project ALIAS]                   # all checks
kbagent --json check run --all                              # JSON output with briefings
kbagent check run --name credit-spike --dry-run             # show SQL + what would run
kbagent check render --name credit-spike                    # show generated DuckDB SQL

# === Sync checks to/from Keboola ===
kbagent check push [--name NAME | --all] --project ALIAS   # → Storage Files (permanent)
kbagent check pull [--project ALIAS]                        # ← Storage Files
kbagent check list [--project ALIAS] [--remote]

# === History ===
kbagent check history [--name NAME] [--limit 10]

Who Runs It?

kbagent check run is a plain CLI command. Anything can schedule it:

Runner How AI investigation
Local cron 0 7 * * * kbagent observe sync && kbagent check run --all Manual (read briefing)
Claude Code /schedule Daily agent prompt Agent auto-investigates on failure
Keboola transformation Python subprocess call Write briefings to output table
KAI bot Platform-native trigger Bot investigates + notifies

Scope: Platform Observability, NOT Data Quality

This is an important distinction. kbagent check monitors platform behavior (credits, jobs, configs, pipelines), not data inside tables (row counts, null rates, column distributions).

┌─────────────────────┐
│ Data Quality Tools  │  Soda, Great Expectations, Elementary, Monte Carlo
│ (row counts, nulls, │  → monitor DATA inside tables
│  schema drift, etc) │  → requires warehouse access
└─────────────────────┘

┌─────────────────────┐
│ kbagent observe +   │  ← THIS PROPOSAL
│ kbagent check       │  → monitor PLATFORM behavior
│ (jobs, credits,     │  → zero infrastructure, local DuckDB
│  configs, imports)  │  → AI-ready investigation handoff
└─────────────────────┘

If users need data-inside-tables monitoring (e.g., "conversion rate is dropping"), that's a separate system built on top of kbagent — using kbagent workspace query or MCP tools for data access. The CLI provides the toolkit; the orchestration layer (Claude Code agent, Agent SDK app, KAI) handles the investigation loop.


Implementation Phases

Phase 1 — MVP: observe + 2 templates + briefing

  • kbagent observe sync with incremental DuckDB caching
  • kbagent observe query --sql "..." for ad-hoc queries
  • kbagent observe stats for database info
  • 2 check templates: credit-spike + custom
  • kbagent check run / check list / check validate
  • kbagent check render --name X (show generated SQL)
  • kbagent check run --dry-run
  • Briefing output with suggested_actions and statistical context

Phase 2 — Full templates + persistence

  • Remaining templates: error-rate, job-slowdown, import-volume-drop, data-freshness
  • Config version history in observe.db (config change correlation in briefings)
  • Storage Files push/pull for check definitions (permanent, tagged)
  • check history command
  • check run --store-results

Phase 3 — Assisted creation + plugin

  • kbagent check create with AI-assisted parameter tuning
  • kbagent check templates with descriptions and examples
  • kbagent plugin playbook for investigation patterns
  • Integration guide: Claude Code /schedule + check run + investigation loop

Open Questions

  1. Observe data scope: How many days of job history to fetch by default? 30? 90? Queue API has pagination for large projects.

  2. Multi-project observe: Should observe.db contain data from all projects (one DB) or per-project (one DB per alias)?

  3. Multi-project checks: Should one check span multiple projects? MVP: single project per check.

  4. Notification: Part of check YAML or runner responsibility? Leaning toward runner-handles-notification.

  5. Config versions fetch: Extra API calls per component/config for change correlation. Fetch eagerly during observe sync, or lazily only for components in violations?

  6. Noise controls: cooldown, min_consecutive_failures, snooze_until — should these be in the check YAML or in the runner?


Review Notes

This proposal was reviewed by OpenAI (Codex gpt-5.3), Google (Gemini), and Anthropic (Claude Sonnet). Key consensus points:

  • Templates are the right abstraction — with custom as escape hatch and check render for trust/debugging
  • Briefing with executable actions is the core value prop — not prose hints, but commands AI can run directly
  • Incremental caching is critical for UX — without it, every check run takes 30-60s (unacceptable)
  • Positioning: "Keboola-native operational guardrails + AI-ready root-cause handoff" — not competing with data quality tools (Soda, GE, Monte Carlo)
  • Phase 1 must feel magical end-to-end — one template (credit-spike) that detects, briefs, and enables AI investigation in under 5 seconds

Feedback welcome — especially on the observe/check split, template selection, briefing format, and whether this covers real monitoring use cases.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions