kbagent check: Platform observability with AI-powered investigation

## Problem

Keboola users run dozens of components on schedules. When something goes wrong, they find out too late — and even when they notice, understanding **why** is the painful part.

Today's workflow:
1. Write a SQL transformation to detect anomalies (tedious, per-metric)
2. Wait for it to run on schedule
3. When it fires: manually dig through GCP Cost Explorer / Keboola UI to figure out what happened
4. 30-60 minutes of clicking around to find root cause
5. Repeat for every metric you care about

What users actually want:
- "Credits consumption doubled this week — **who** caused it and **what** changed?"
- "This extractor is getting slower every day — **since when** and **why**?"
- "Data imports dropped 80% on Tuesday — **which pipeline** broke?"

The gap: detection is possible with SQL, but **investigation** (root cause analysis) requires iterative exploration that's tedious for humans and perfect for AI.

## Two New Building Blocks

### 1. `kbagent observe` — Local Platform Metadata Warehouse

A local DuckDB database that mirrors your Keboola project's operational metadata. Syncs incrementally from APIs, queryable in milliseconds, works offline.

### 2. `kbagent check` — Automated Assertions on Observe Data

YAML-defined checks (from templates) that run against the local observe database. No AI at runtime. When a check fails, it produces a structured **briefing** JSON that any AI agent can pick up for investigation.

```
┌─────────────────────────────────────────────────────────────┐
│  PHASE 1: OBSERVE SYNC (automated, incremental)              │
│                                                             │
│  Keboola APIs ──→ .kbagent/observe.db (DuckDB)             │
│  (jobs, storage events, config versions, table stats)       │
│                                                             │
│  First run: 30-60s │ Next runs: 2-5s (watermark-based)     │
└──────────────┬──────────────────────────────────────────────┘
               │
       ┌───────┴────────────┐
       │                    │
┌──────▼──────────┐  ┌──────▼──────────────────────────────┐
│ kbagent check   │  │ Your local AI                       │
│ run --all       │  │                                     │
│                 │  │ Claude Code / Agent SDK / KAI       │
│ Deterministic   │  │                                     │
│ SQL assertions  │  │ Ad-hoc: "Why did credits spike?"    │
│ from YAML       │  │ → kbagent observe query --sql "..." │
│ templates       │  │ → iterative drill-down              │
│                 │  │ → milliseconds, offline, free       │
│ → briefing JSON │  │                                     │
└─────────────────┘  └─────────────────────────────────────┘
```

## `kbagent observe` — The Foundation

### What's in observe.db

All data comes from existing Keboola APIs. Volume is small (thousands of records), perfect for DuckDB.

| DuckDB table | API source | Key columns |
|-------------|-----------|-------------|
| `jobs` | Queue API `/search/jobs` | component_id, config_id, config_name, status, duration_seconds, credits_consumed, start_time, end_time, result_message |
| `storage_events` | Storage API `/events` | type (tableImport, tableExport, ...), table_id, bytes, created, creator_token_description |
| `config_versions` | Storage API `/components/*/configs/*/versions` | component_id, config_id, version, created, creator_token_description, change_description |
| `table_stats` | Storage API `/tables` | id, bucket_id, name, rows_count, data_size_bytes, last_import_date, last_change_date |

### Why DuckDB

| | Workspace (Snowflake/BQ) | DuckDB (in-process) |
|---|---|---|
| Startup | 5-15s + cloud compute | ~0ms |
| Cost | Snowflake credits / BQ slots | Zero |
| SQL dialect | Depends on project backend | Always PostgreSQL-compatible |
| Dependency | keboola-mcp-server + workspace | `pip install duckdb` (12MB) |
| Offline | No | Yes |
| History | Limited to API retention | Permanent local archive |

### Incremental Sync

```python
# First run: fetch all jobs for lookback period (30-60s)
# Subsequent runs: fetch only since last watermark (2-5s)
# .kbagent/observe.db grows ~5-20MB per project
```

Keboola storage events have ~30-day retention. The local observe.db keeps data **forever**, becoming a permanent archive of platform activity that outlives API retention.

### CLI Commands

```bash
# Sync metadata (incremental by default)
kbagent observe sync [--project ALIAS]           # delta since last sync, 2-5s
kbagent observe sync --full [--project ALIAS]    # full refresh, 30-60s

# Query the local database directly
kbagent observe query --sql "
  SELECT component_id, DATE_TRUNC('day', start_time) as day,
         SUM(credits_consumed) as credits, COUNT(*) as runs
  FROM jobs
  WHERE start_time >= '2026-03-01'
  GROUP BY 1, 2
  ORDER BY credits DESC
  LIMIT 20
"

# Or connect directly from Python / AI agent
# duckdb.connect('.kbagent/observe.db').sql('SELECT ...')

# Stats about the local database
kbagent observe stats [--project ALIAS]
# → jobs: 12,450 (2025-12-01 to 2026-03-29)
# → storage_events: 34,200
# → config_versions: 890
# → db size: 8.2 MB
# → last sync: 2026-03-29T07:00:12Z
```

### Relationship with `sync pull`

These are **complementary, not competing** commands:

| | `sync pull` | `observe sync` |
|---|---|---|
| **Purpose** | Edit and push-back configurations | Analyze platform activity |
| **Downloads** | Configs (YAML), storage metadata, 5 recent jobs per config | Full job history, storage events, config version history |
| **Format** | Files on disk (human-readable, git-friendly) | Single DuckDB file (query-friendly) |
| **Queryable** | No (read JSON files manually) | Yes (DuckDB SQL, milliseconds) |
| **Question it answers** | "What does my project look like now?" | "What has been happening in my project?" |

`sync pull` = **state snapshot** (for editing).
`observe sync` = **time series** (for analytics).

---

## `kbagent check` — Assertions on Observe Data

### Three-Phase Design, AI Only Where Needed

**Phase 1: DEFINE** (human + AI, one-time)
User and AI pick from strategy templates, customize thresholds → YAML file in `.kbagent/checks/`. AI is involved here. After this, everything is automated.

**Phase 2: RUN** (automated, NO AI, deterministic)
`kbagent check run --all` queries the local observe.db, evaluates pass/fail, produces briefing on failure. No AI calls. No cloud compute. Pure SQL.

**Phase 3: INVESTIGATE** (AI picks up the briefing, only on failure)
The briefing JSON is self-contained. Any AI agent can pick it up: Claude Code interactively, Claude Code `/schedule` automated, KAI bot, or custom Agent SDK app.

### Strategy Templates

Users don't write SQL — they pick a template and set parameters.

#### `credit-spike` — Component burning more credits than usual

```yaml
apiVersion: kbagent/v1
kind: Check
metadata:
  name: credit-spike
  description: "Alert when any component burns >50% more credits than usual"
  severity: warning
  tags: [credits, cost-monitoring]

spec:
  project: my-project
  template: credit-spike
  params:
    lookback_days: 14          # baseline period
    window_hours: 24           # detection window
    threshold_pct: 50          # alert if >50% above average
    min_daily_credits: 0.5     # ignore negligible usage
    group_by: component_id     # or: [component_id, config_id]
```

#### `job-slowdown` — Jobs running slower than their baseline

```yaml
spec:
  template: job-slowdown
  params:
    lookback_days: 30
    slowdown_factor: 3         # alert if duration > 3x baseline
    baseline_metric: p95       # p50 | p95 | mean
    min_duration_seconds: 60
    status_filter: [success]
```

#### `import-volume-drop` — Data pipeline stopped or degraded

```yaml
spec:
  template: import-volume-drop
  params:
    lookback_weeks: 4
    drop_threshold_pct: 50
    compare_by: weekday        # Mon vs Mon, Tue vs Tue (seasonality)
    timezone: "Europe/Prague"  # for correct weekday comparison
```

#### `error-rate` — Component failing too often

```yaml
spec:
  template: error-rate
  params:
    window_hours: 24
    threshold_pct: 20
    min_jobs: 5                # ignore components with <5 runs (noisy)
```

#### `data-freshness` — Tables not updated on time

```yaml
spec:
  template: data-freshness
  params:
    staleness_factor: 2        # alert if gap > 2x usual interval
    tables:                    # specific tables (optional: all if omitted)
      - in.c-billing.daily_costs
      - in.c-crm.contacts
```

#### `custom` — Raw DuckDB SQL for power users

```yaml
spec:
  template: custom
  query: |
    -- DuckDB SQL against: jobs, storage_events, config_versions, table_stats
    -- Returns rows that VIOLATE the rule (empty result = pass)
    SELECT component_id, daily_credits, avg_credits, pct_change
    FROM ( ... )
    WHERE pct_change > 50
  expect:
    type: empty
```

### The Briefing (failure output for AI)

When a check fails, it produces a self-contained JSON briefing designed for AI consumption:

```json
{
  "briefing_version": "1",
  "generated_at": "2026-03-29T07:00:12Z",
  "project": {
    "alias": "billing-prod",
    "id": 12345,
    "stack": "connection.north-europe.azure.keboola.com"
  },
  "check": {
    "name": "credit-spike",
    "description": "Alert when any component burns >50% more credits than usual",
    "severity": "warning",
    "template": "credit-spike"
  },
  "result": {
    "status": "fail",
    "violations": [
      {
        "component_id": "keboola.ex-google-bigquery-v2",
        "config_id": "894231",
        "actual": 42.5,
        "expected": {
          "mean": 8.3,
          "p50": 7.9,
          "p95": 12.1
        },
        "pct_change": 412.0,
        "z_score": 4.8
      }
    ]
  },
  "context": {
    "metric_history": {
      "data_points": [
        {"date": "2026-03-25", "credits": 8.1},
        {"date": "2026-03-26", "credits": 7.9},
        {"date": "2026-03-27", "credits": 9.2},
        {"date": "2026-03-28", "credits": 42.5}
      ],
      "trend": "spike",
      "volatility": "low"
    },
    "recent_config_changes": [
      {
        "config_id": "894231",
        "config_name": "BQ Marketing Extract",
        "changed_at": "2026-03-27T14:30:00Z",
        "changed_by": "betka@company.com",
        "change_description": "Updated query parameters"
      }
    ]
  },
  "suggested_actions": [
    {
      "command": "kbagent --json config detail --project billing-prod --component-id keboola.ex-google-bigquery-v2 --config-id 894231",
      "description": "Inspect config that changed 1 day before spike",
      "priority": "high",
      "rationale": "Config 894231 was modified by betka@company.com on 2026-03-27, credits spiked on 2026-03-28"
    },
    {
      "command": "kbagent --json job list --project billing-prod --component-id keboola.ex-google-bigquery-v2 --limit 50",
      "description": "List recent jobs for this component",
      "priority": "medium",
      "rationale": "Compare job durations and credits before/after config change"
    },
    {
      "command": "kbagent observe query --sql \"SELECT DATE_TRUNC('day', start_time) as day, COUNT(*), SUM(credits_consumed) FROM jobs WHERE component_id = 'keboola.ex-google-bigquery-v2' AND start_time >= '2026-03-20' GROUP BY 1 ORDER BY 1\"",
      "description": "Daily trend for this component",
      "priority": "medium",
      "rationale": "See exact day the spike started"
    }
  ]
}
```

Key properties:
- **Self-contained**: violation data + statistical context (mean, p50, p95) + config change correlation
- **Executable actions**: not prose hints, but actual commands with priority and rationale — AI can run them directly
- **Human-readable**: a human can scan it and understand the problem without AI

### CLI Commands

```bash
# === Define (with AI assistance) ===
kbagent check templates                                    # list available templates
kbagent check create --template credit-spike --project X   # AI-assisted creation
kbagent check validate .kbagent/checks/credit-spike.yaml   # validate YAML

# === Run (no AI, deterministic) ===
kbagent check run --name credit-spike                       # single check
kbagent check run --all [--project ALIAS]                   # all checks
kbagent --json check run --all                              # JSON output with briefings
kbagent check run --name credit-spike --dry-run             # show SQL + what would run
kbagent check render --name credit-spike                    # show generated DuckDB SQL

# === Sync checks to/from Keboola ===
kbagent check push [--name NAME | --all] --project ALIAS   # → Storage Files (permanent)
kbagent check pull [--project ALIAS]                        # ← Storage Files
kbagent check list [--project ALIAS] [--remote]

# === History ===
kbagent check history [--name NAME] [--limit 10]
```

### Who Runs It?

`kbagent check run` is a plain CLI command. Anything can schedule it:

| Runner | How | AI investigation |
|--------|-----|-----------------|
| Local cron | `0 7 * * * kbagent observe sync && kbagent check run --all` | Manual (read briefing) |
| Claude Code /schedule | Daily agent prompt | Agent auto-investigates on failure |
| Keboola transformation | Python subprocess call | Write briefings to output table |
| KAI bot | Platform-native trigger | Bot investigates + notifies |

---

## Scope: Platform Observability, NOT Data Quality

This is an important distinction. `kbagent check` monitors **platform behavior** (credits, jobs, configs, pipelines), not **data inside tables** (row counts, null rates, column distributions).

```
┌─────────────────────┐
│ Data Quality Tools  │  Soda, Great Expectations, Elementary, Monte Carlo
│ (row counts, nulls, │  → monitor DATA inside tables
│  schema drift, etc) │  → requires warehouse access
└─────────────────────┘

┌─────────────────────┐
│ kbagent observe +   │  ← THIS PROPOSAL
│ kbagent check       │  → monitor PLATFORM behavior
│ (jobs, credits,     │  → zero infrastructure, local DuckDB
│  configs, imports)  │  → AI-ready investigation handoff
└─────────────────────┘
```

If users need data-inside-tables monitoring (e.g., "conversion rate is dropping"), that's a **separate system built on top of kbagent** — using `kbagent workspace query` or MCP tools for data access. The CLI provides the toolkit; the orchestration layer (Claude Code agent, Agent SDK app, KAI) handles the investigation loop.

---

## Implementation Phases

### Phase 1 — MVP: observe + 2 templates + briefing

- `kbagent observe sync` with incremental DuckDB caching
- `kbagent observe query --sql "..."` for ad-hoc queries
- `kbagent observe stats` for database info
- 2 check templates: `credit-spike` + `custom`
- `kbagent check run` / `check list` / `check validate`
- `kbagent check render --name X` (show generated SQL)
- `kbagent check run --dry-run`
- Briefing output with `suggested_actions` and statistical context

### Phase 2 — Full templates + persistence

- Remaining templates: `error-rate`, `job-slowdown`, `import-volume-drop`, `data-freshness`
- Config version history in observe.db (config change correlation in briefings)
- Storage Files push/pull for check definitions (permanent, tagged)
- `check history` command
- `check run --store-results`

### Phase 3 — Assisted creation + plugin

- `kbagent check create` with AI-assisted parameter tuning
- `kbagent check templates` with descriptions and examples
- kbagent plugin playbook for investigation patterns
- Integration guide: Claude Code /schedule + check run + investigation loop

---

## Open Questions

1. **Observe data scope**: How many days of job history to fetch by default? 30? 90? Queue API has pagination for large projects.

2. **Multi-project observe**: Should observe.db contain data from all projects (one DB) or per-project (one DB per alias)?

3. **Multi-project checks**: Should one check span multiple projects? MVP: single project per check.

4. **Notification**: Part of check YAML or runner responsibility? Leaning toward runner-handles-notification.

5. **Config versions fetch**: Extra API calls per component/config for change correlation. Fetch eagerly during observe sync, or lazily only for components in violations?

6. **Noise controls**: `cooldown`, `min_consecutive_failures`, `snooze_until` — should these be in the check YAML or in the runner?

---

## Review Notes

This proposal was reviewed by OpenAI (Codex gpt-5.3), Google (Gemini), and Anthropic (Claude Sonnet). Key consensus points:

- **Templates are the right abstraction** — with `custom` as escape hatch and `check render` for trust/debugging
- **Briefing with executable actions is the core value prop** — not prose hints, but commands AI can run directly
- **Incremental caching is critical for UX** — without it, every check run takes 30-60s (unacceptable)
- **Positioning: "Keboola-native operational guardrails + AI-ready root-cause handoff"** — not competing with data quality tools (Soda, GE, Monte Carlo)
- **Phase 1 must feel magical end-to-end** — one template (credit-spike) that detects, briefs, and enables AI investigation in under 5 seconds

---

*Feedback welcome — especially on the observe/check split, template selection, briefing format, and whether this covers real monitoring use cases.*

DuckDB table	API source	Key columns
`jobs`	Queue API `/search/jobs`	component_id, config_id, config_name, status, duration_seconds, credits_consumed, start_time, end_time, result_message
`storage_events`	Storage API `/events`	type (tableImport, tableExport, ...), table_id, bytes, created, creator_token_description
`config_versions`	Storage API `/components//configs//versions`	component_id, config_id, version, created, creator_token_description, change_description
`table_stats`	Storage API `/tables`	id, bucket_id, name, rows_count, data_size_bytes, last_import_date, last_change_date

	Workspace (Snowflake/BQ)	DuckDB (in-process)
Startup	5-15s + cloud compute	~0ms
Cost	Snowflake credits / BQ slots	Zero
SQL dialect	Depends on project backend	Always PostgreSQL-compatible
Dependency	keboola-mcp-server + workspace	`pip install duckdb` (12MB)
Offline	No	Yes
History	Limited to API retention	Permanent local archive

	`sync pull`	`observe sync`
Purpose	Edit and push-back configurations	Analyze platform activity
Downloads	Configs (YAML), storage metadata, 5 recent jobs per config	Full job history, storage events, config version history
Format	Files on disk (human-readable, git-friendly)	Single DuckDB file (query-friendly)
Queryable	No (read JSON files manually)	Yes (DuckDB SQL, milliseconds)
Question it answers	"What does my project look like now?"	"What has been happening in my project?"

Runner	How	AI investigation
Local cron	`0 7 * * * kbagent observe sync && kbagent check run --all`	Manual (read briefing)
Claude Code /schedule	Daily agent prompt	Agent auto-investigates on failure
Keboola transformation	Python subprocess call	Write briefings to output table
KAI bot	Platform-native trigger	Bot investigates + notifies

kbagent check: Platform observability with AI-powered investigation #74

Description

Problem

Two New Building Blocks

1. kbagent observe — Local Platform Metadata Warehouse

2. kbagent check — Automated Assertions on Observe Data

kbagent observe — The Foundation

What's in observe.db

Why DuckDB

Incremental Sync

CLI Commands

Relationship with sync pull

kbagent check — Assertions on Observe Data

Three-Phase Design, AI Only Where Needed

Strategy Templates

credit-spike — Component burning more credits than usual

job-slowdown — Jobs running slower than their baseline

import-volume-drop — Data pipeline stopped or degraded

error-rate — Component failing too often

data-freshness — Tables not updated on time

custom — Raw DuckDB SQL for power users

The Briefing (failure output for AI)

CLI Commands

Who Runs It?

Scope: Platform Observability, NOT Data Quality

Implementation Phases

Phase 1 — MVP: observe + 2 templates + briefing

Phase 2 — Full templates + persistence

Phase 3 — Assisted creation + plugin

Open Questions

Review Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

1. `kbagent observe` — Local Platform Metadata Warehouse

2. `kbagent check` — Automated Assertions on Observe Data

`kbagent observe` — The Foundation

Relationship with `sync pull`

`kbagent check` — Assertions on Observe Data

`credit-spike` — Component burning more credits than usual

`job-slowdown` — Jobs running slower than their baseline

`import-volume-drop` — Data pipeline stopped or degraded

`error-rate` — Component failing too often

`data-freshness` — Tables not updated on time

`custom` — Raw DuckDB SQL for power users