Problem
Keboola users run dozens of components on schedules. When something goes wrong, they find out too late — and even when they notice, understanding why is the painful part.
Today's workflow:
- Write a SQL transformation to detect anomalies (tedious, per-metric)
- Wait for it to run on schedule
- When it fires: manually dig through GCP Cost Explorer / Keboola UI to figure out what happened
- 30-60 minutes of clicking around to find root cause
- Repeat for every metric you care about
What users actually want:
- "Credits consumption doubled this week — who caused it and what changed?"
- "This extractor is getting slower every day — since when and why?"
- "Data imports dropped 80% on Tuesday — which pipeline broke?"
The gap: detection is possible with SQL, but investigation (root cause analysis) requires iterative exploration that's tedious for humans and perfect for AI.
Two New Building Blocks
1. kbagent observe — Local Platform Metadata Warehouse
A local DuckDB database that mirrors your Keboola project's operational metadata. Syncs incrementally from APIs, queryable in milliseconds, works offline.
2. kbagent check — Automated Assertions on Observe Data
YAML-defined checks (from templates) that run against the local observe database. No AI at runtime. When a check fails, it produces a structured briefing JSON that any AI agent can pick up for investigation.
┌─────────────────────────────────────────────────────────────┐
│ PHASE 1: OBSERVE SYNC (automated, incremental) │
│ │
│ Keboola APIs ──→ .kbagent/observe.db (DuckDB) │
│ (jobs, storage events, config versions, table stats) │
│ │
│ First run: 30-60s │ Next runs: 2-5s (watermark-based) │
└──────────────┬──────────────────────────────────────────────┘
│
┌───────┴────────────┐
│ │
┌──────▼──────────┐ ┌──────▼──────────────────────────────┐
│ kbagent check │ │ Your local AI │
│ run --all │ │ │
│ │ │ Claude Code / Agent SDK / KAI │
│ Deterministic │ │ │
│ SQL assertions │ │ Ad-hoc: "Why did credits spike?" │
│ from YAML │ │ → kbagent observe query --sql "..." │
│ templates │ │ → iterative drill-down │
│ │ │ → milliseconds, offline, free │
│ → briefing JSON │ │ │
└─────────────────┘ └─────────────────────────────────────┘
kbagent observe — The Foundation
What's in observe.db
All data comes from existing Keboola APIs. Volume is small (thousands of records), perfect for DuckDB.
| DuckDB table |
API source |
Key columns |
jobs |
Queue API /search/jobs |
component_id, config_id, config_name, status, duration_seconds, credits_consumed, start_time, end_time, result_message |
storage_events |
Storage API /events |
type (tableImport, tableExport, ...), table_id, bytes, created, creator_token_description |
config_versions |
Storage API /components/*/configs/*/versions |
component_id, config_id, version, created, creator_token_description, change_description |
table_stats |
Storage API /tables |
id, bucket_id, name, rows_count, data_size_bytes, last_import_date, last_change_date |
Why DuckDB
|
Workspace (Snowflake/BQ) |
DuckDB (in-process) |
| Startup |
5-15s + cloud compute |
~0ms |
| Cost |
Snowflake credits / BQ slots |
Zero |
| SQL dialect |
Depends on project backend |
Always PostgreSQL-compatible |
| Dependency |
keboola-mcp-server + workspace |
pip install duckdb (12MB) |
| Offline |
No |
Yes |
| History |
Limited to API retention |
Permanent local archive |
Incremental Sync
# First run: fetch all jobs for lookback period (30-60s)
# Subsequent runs: fetch only since last watermark (2-5s)
# .kbagent/observe.db grows ~5-20MB per project
Keboola storage events have ~30-day retention. The local observe.db keeps data forever, becoming a permanent archive of platform activity that outlives API retention.
CLI Commands
# Sync metadata (incremental by default)
kbagent observe sync [--project ALIAS] # delta since last sync, 2-5s
kbagent observe sync --full [--project ALIAS] # full refresh, 30-60s
# Query the local database directly
kbagent observe query --sql "
SELECT component_id, DATE_TRUNC('day', start_time) as day,
SUM(credits_consumed) as credits, COUNT(*) as runs
FROM jobs
WHERE start_time >= '2026-03-01'
GROUP BY 1, 2
ORDER BY credits DESC
LIMIT 20
"
# Or connect directly from Python / AI agent
# duckdb.connect('.kbagent/observe.db').sql('SELECT ...')
# Stats about the local database
kbagent observe stats [--project ALIAS]
# → jobs: 12,450 (2025-12-01 to 2026-03-29)
# → storage_events: 34,200
# → config_versions: 890
# → db size: 8.2 MB
# → last sync: 2026-03-29T07:00:12Z
Relationship with sync pull
These are complementary, not competing commands:
|
sync pull |
observe sync |
| Purpose |
Edit and push-back configurations |
Analyze platform activity |
| Downloads |
Configs (YAML), storage metadata, 5 recent jobs per config |
Full job history, storage events, config version history |
| Format |
Files on disk (human-readable, git-friendly) |
Single DuckDB file (query-friendly) |
| Queryable |
No (read JSON files manually) |
Yes (DuckDB SQL, milliseconds) |
| Question it answers |
"What does my project look like now?" |
"What has been happening in my project?" |
sync pull = state snapshot (for editing).
observe sync = time series (for analytics).
kbagent check — Assertions on Observe Data
Three-Phase Design, AI Only Where Needed
Phase 1: DEFINE (human + AI, one-time)
User and AI pick from strategy templates, customize thresholds → YAML file in .kbagent/checks/. AI is involved here. After this, everything is automated.
Phase 2: RUN (automated, NO AI, deterministic)
kbagent check run --all queries the local observe.db, evaluates pass/fail, produces briefing on failure. No AI calls. No cloud compute. Pure SQL.
Phase 3: INVESTIGATE (AI picks up the briefing, only on failure)
The briefing JSON is self-contained. Any AI agent can pick it up: Claude Code interactively, Claude Code /schedule automated, KAI bot, or custom Agent SDK app.
Strategy Templates
Users don't write SQL — they pick a template and set parameters.
credit-spike — Component burning more credits than usual
apiVersion: kbagent/v1
kind: Check
metadata:
name: credit-spike
description: "Alert when any component burns >50% more credits than usual"
severity: warning
tags: [credits, cost-monitoring]
spec:
project: my-project
template: credit-spike
params:
lookback_days: 14 # baseline period
window_hours: 24 # detection window
threshold_pct: 50 # alert if >50% above average
min_daily_credits: 0.5 # ignore negligible usage
group_by: component_id # or: [component_id, config_id]
job-slowdown — Jobs running slower than their baseline
spec:
template: job-slowdown
params:
lookback_days: 30
slowdown_factor: 3 # alert if duration > 3x baseline
baseline_metric: p95 # p50 | p95 | mean
min_duration_seconds: 60
status_filter: [success]
import-volume-drop — Data pipeline stopped or degraded
spec:
template: import-volume-drop
params:
lookback_weeks: 4
drop_threshold_pct: 50
compare_by: weekday # Mon vs Mon, Tue vs Tue (seasonality)
timezone: "Europe/Prague" # for correct weekday comparison
error-rate — Component failing too often
spec:
template: error-rate
params:
window_hours: 24
threshold_pct: 20
min_jobs: 5 # ignore components with <5 runs (noisy)
data-freshness — Tables not updated on time
spec:
template: data-freshness
params:
staleness_factor: 2 # alert if gap > 2x usual interval
tables: # specific tables (optional: all if omitted)
- in.c-billing.daily_costs
- in.c-crm.contacts
custom — Raw DuckDB SQL for power users
spec:
template: custom
query: |
-- DuckDB SQL against: jobs, storage_events, config_versions, table_stats
-- Returns rows that VIOLATE the rule (empty result = pass)
SELECT component_id, daily_credits, avg_credits, pct_change
FROM ( ... )
WHERE pct_change > 50
expect:
type: empty
The Briefing (failure output for AI)
When a check fails, it produces a self-contained JSON briefing designed for AI consumption:
{
"briefing_version": "1",
"generated_at": "2026-03-29T07:00:12Z",
"project": {
"alias": "billing-prod",
"id": 12345,
"stack": "connection.north-europe.azure.keboola.com"
},
"check": {
"name": "credit-spike",
"description": "Alert when any component burns >50% more credits than usual",
"severity": "warning",
"template": "credit-spike"
},
"result": {
"status": "fail",
"violations": [
{
"component_id": "keboola.ex-google-bigquery-v2",
"config_id": "894231",
"actual": 42.5,
"expected": {
"mean": 8.3,
"p50": 7.9,
"p95": 12.1
},
"pct_change": 412.0,
"z_score": 4.8
}
]
},
"context": {
"metric_history": {
"data_points": [
{"date": "2026-03-25", "credits": 8.1},
{"date": "2026-03-26", "credits": 7.9},
{"date": "2026-03-27", "credits": 9.2},
{"date": "2026-03-28", "credits": 42.5}
],
"trend": "spike",
"volatility": "low"
},
"recent_config_changes": [
{
"config_id": "894231",
"config_name": "BQ Marketing Extract",
"changed_at": "2026-03-27T14:30:00Z",
"changed_by": "betka@company.com",
"change_description": "Updated query parameters"
}
]
},
"suggested_actions": [
{
"command": "kbagent --json config detail --project billing-prod --component-id keboola.ex-google-bigquery-v2 --config-id 894231",
"description": "Inspect config that changed 1 day before spike",
"priority": "high",
"rationale": "Config 894231 was modified by betka@company.com on 2026-03-27, credits spiked on 2026-03-28"
},
{
"command": "kbagent --json job list --project billing-prod --component-id keboola.ex-google-bigquery-v2 --limit 50",
"description": "List recent jobs for this component",
"priority": "medium",
"rationale": "Compare job durations and credits before/after config change"
},
{
"command": "kbagent observe query --sql \"SELECT DATE_TRUNC('day', start_time) as day, COUNT(*), SUM(credits_consumed) FROM jobs WHERE component_id = 'keboola.ex-google-bigquery-v2' AND start_time >= '2026-03-20' GROUP BY 1 ORDER BY 1\"",
"description": "Daily trend for this component",
"priority": "medium",
"rationale": "See exact day the spike started"
}
]
}
Key properties:
- Self-contained: violation data + statistical context (mean, p50, p95) + config change correlation
- Executable actions: not prose hints, but actual commands with priority and rationale — AI can run them directly
- Human-readable: a human can scan it and understand the problem without AI
CLI Commands
# === Define (with AI assistance) ===
kbagent check templates # list available templates
kbagent check create --template credit-spike --project X # AI-assisted creation
kbagent check validate .kbagent/checks/credit-spike.yaml # validate YAML
# === Run (no AI, deterministic) ===
kbagent check run --name credit-spike # single check
kbagent check run --all [--project ALIAS] # all checks
kbagent --json check run --all # JSON output with briefings
kbagent check run --name credit-spike --dry-run # show SQL + what would run
kbagent check render --name credit-spike # show generated DuckDB SQL
# === Sync checks to/from Keboola ===
kbagent check push [--name NAME | --all] --project ALIAS # → Storage Files (permanent)
kbagent check pull [--project ALIAS] # ← Storage Files
kbagent check list [--project ALIAS] [--remote]
# === History ===
kbagent check history [--name NAME] [--limit 10]
Who Runs It?
kbagent check run is a plain CLI command. Anything can schedule it:
| Runner |
How |
AI investigation |
| Local cron |
0 7 * * * kbagent observe sync && kbagent check run --all |
Manual (read briefing) |
| Claude Code /schedule |
Daily agent prompt |
Agent auto-investigates on failure |
| Keboola transformation |
Python subprocess call |
Write briefings to output table |
| KAI bot |
Platform-native trigger |
Bot investigates + notifies |
Scope: Platform Observability, NOT Data Quality
This is an important distinction. kbagent check monitors platform behavior (credits, jobs, configs, pipelines), not data inside tables (row counts, null rates, column distributions).
┌─────────────────────┐
│ Data Quality Tools │ Soda, Great Expectations, Elementary, Monte Carlo
│ (row counts, nulls, │ → monitor DATA inside tables
│ schema drift, etc) │ → requires warehouse access
└─────────────────────┘
┌─────────────────────┐
│ kbagent observe + │ ← THIS PROPOSAL
│ kbagent check │ → monitor PLATFORM behavior
│ (jobs, credits, │ → zero infrastructure, local DuckDB
│ configs, imports) │ → AI-ready investigation handoff
└─────────────────────┘
If users need data-inside-tables monitoring (e.g., "conversion rate is dropping"), that's a separate system built on top of kbagent — using kbagent workspace query or MCP tools for data access. The CLI provides the toolkit; the orchestration layer (Claude Code agent, Agent SDK app, KAI) handles the investigation loop.
Implementation Phases
Phase 1 — MVP: observe + 2 templates + briefing
kbagent observe sync with incremental DuckDB caching
kbagent observe query --sql "..." for ad-hoc queries
kbagent observe stats for database info
- 2 check templates:
credit-spike + custom
kbagent check run / check list / check validate
kbagent check render --name X (show generated SQL)
kbagent check run --dry-run
- Briefing output with
suggested_actions and statistical context
Phase 2 — Full templates + persistence
- Remaining templates:
error-rate, job-slowdown, import-volume-drop, data-freshness
- Config version history in observe.db (config change correlation in briefings)
- Storage Files push/pull for check definitions (permanent, tagged)
check history command
check run --store-results
Phase 3 — Assisted creation + plugin
kbagent check create with AI-assisted parameter tuning
kbagent check templates with descriptions and examples
- kbagent plugin playbook for investigation patterns
- Integration guide: Claude Code /schedule + check run + investigation loop
Open Questions
-
Observe data scope: How many days of job history to fetch by default? 30? 90? Queue API has pagination for large projects.
-
Multi-project observe: Should observe.db contain data from all projects (one DB) or per-project (one DB per alias)?
-
Multi-project checks: Should one check span multiple projects? MVP: single project per check.
-
Notification: Part of check YAML or runner responsibility? Leaning toward runner-handles-notification.
-
Config versions fetch: Extra API calls per component/config for change correlation. Fetch eagerly during observe sync, or lazily only for components in violations?
-
Noise controls: cooldown, min_consecutive_failures, snooze_until — should these be in the check YAML or in the runner?
Review Notes
This proposal was reviewed by OpenAI (Codex gpt-5.3), Google (Gemini), and Anthropic (Claude Sonnet). Key consensus points:
- Templates are the right abstraction — with
custom as escape hatch and check render for trust/debugging
- Briefing with executable actions is the core value prop — not prose hints, but commands AI can run directly
- Incremental caching is critical for UX — without it, every check run takes 30-60s (unacceptable)
- Positioning: "Keboola-native operational guardrails + AI-ready root-cause handoff" — not competing with data quality tools (Soda, GE, Monte Carlo)
- Phase 1 must feel magical end-to-end — one template (credit-spike) that detects, briefs, and enables AI investigation in under 5 seconds
Feedback welcome — especially on the observe/check split, template selection, briefing format, and whether this covers real monitoring use cases.
Problem
Keboola users run dozens of components on schedules. When something goes wrong, they find out too late — and even when they notice, understanding why is the painful part.
Today's workflow:
What users actually want:
The gap: detection is possible with SQL, but investigation (root cause analysis) requires iterative exploration that's tedious for humans and perfect for AI.
Two New Building Blocks
1.
kbagent observe— Local Platform Metadata WarehouseA local DuckDB database that mirrors your Keboola project's operational metadata. Syncs incrementally from APIs, queryable in milliseconds, works offline.
2.
kbagent check— Automated Assertions on Observe DataYAML-defined checks (from templates) that run against the local observe database. No AI at runtime. When a check fails, it produces a structured briefing JSON that any AI agent can pick up for investigation.
kbagent observe— The FoundationWhat's in observe.db
All data comes from existing Keboola APIs. Volume is small (thousands of records), perfect for DuckDB.
jobs/search/jobsstorage_events/eventsconfig_versions/components/*/configs/*/versionstable_stats/tablesWhy DuckDB
pip install duckdb(12MB)Incremental Sync
Keboola storage events have ~30-day retention. The local observe.db keeps data forever, becoming a permanent archive of platform activity that outlives API retention.
CLI Commands
Relationship with
sync pullThese are complementary, not competing commands:
sync pullobserve syncsync pull= state snapshot (for editing).observe sync= time series (for analytics).kbagent check— Assertions on Observe DataThree-Phase Design, AI Only Where Needed
Phase 1: DEFINE (human + AI, one-time)
User and AI pick from strategy templates, customize thresholds → YAML file in
.kbagent/checks/. AI is involved here. After this, everything is automated.Phase 2: RUN (automated, NO AI, deterministic)
kbagent check run --allqueries the local observe.db, evaluates pass/fail, produces briefing on failure. No AI calls. No cloud compute. Pure SQL.Phase 3: INVESTIGATE (AI picks up the briefing, only on failure)
The briefing JSON is self-contained. Any AI agent can pick it up: Claude Code interactively, Claude Code
/scheduleautomated, KAI bot, or custom Agent SDK app.Strategy Templates
Users don't write SQL — they pick a template and set parameters.
credit-spike— Component burning more credits than usualjob-slowdown— Jobs running slower than their baselineimport-volume-drop— Data pipeline stopped or degradederror-rate— Component failing too oftendata-freshness— Tables not updated on timecustom— Raw DuckDB SQL for power usersThe Briefing (failure output for AI)
When a check fails, it produces a self-contained JSON briefing designed for AI consumption:
{ "briefing_version": "1", "generated_at": "2026-03-29T07:00:12Z", "project": { "alias": "billing-prod", "id": 12345, "stack": "connection.north-europe.azure.keboola.com" }, "check": { "name": "credit-spike", "description": "Alert when any component burns >50% more credits than usual", "severity": "warning", "template": "credit-spike" }, "result": { "status": "fail", "violations": [ { "component_id": "keboola.ex-google-bigquery-v2", "config_id": "894231", "actual": 42.5, "expected": { "mean": 8.3, "p50": 7.9, "p95": 12.1 }, "pct_change": 412.0, "z_score": 4.8 } ] }, "context": { "metric_history": { "data_points": [ {"date": "2026-03-25", "credits": 8.1}, {"date": "2026-03-26", "credits": 7.9}, {"date": "2026-03-27", "credits": 9.2}, {"date": "2026-03-28", "credits": 42.5} ], "trend": "spike", "volatility": "low" }, "recent_config_changes": [ { "config_id": "894231", "config_name": "BQ Marketing Extract", "changed_at": "2026-03-27T14:30:00Z", "changed_by": "betka@company.com", "change_description": "Updated query parameters" } ] }, "suggested_actions": [ { "command": "kbagent --json config detail --project billing-prod --component-id keboola.ex-google-bigquery-v2 --config-id 894231", "description": "Inspect config that changed 1 day before spike", "priority": "high", "rationale": "Config 894231 was modified by betka@company.com on 2026-03-27, credits spiked on 2026-03-28" }, { "command": "kbagent --json job list --project billing-prod --component-id keboola.ex-google-bigquery-v2 --limit 50", "description": "List recent jobs for this component", "priority": "medium", "rationale": "Compare job durations and credits before/after config change" }, { "command": "kbagent observe query --sql \"SELECT DATE_TRUNC('day', start_time) as day, COUNT(*), SUM(credits_consumed) FROM jobs WHERE component_id = 'keboola.ex-google-bigquery-v2' AND start_time >= '2026-03-20' GROUP BY 1 ORDER BY 1\"", "description": "Daily trend for this component", "priority": "medium", "rationale": "See exact day the spike started" } ] }Key properties:
CLI Commands
Who Runs It?
kbagent check runis a plain CLI command. Anything can schedule it:0 7 * * * kbagent observe sync && kbagent check run --allScope: Platform Observability, NOT Data Quality
This is an important distinction.
kbagent checkmonitors platform behavior (credits, jobs, configs, pipelines), not data inside tables (row counts, null rates, column distributions).If users need data-inside-tables monitoring (e.g., "conversion rate is dropping"), that's a separate system built on top of kbagent — using
kbagent workspace queryor MCP tools for data access. The CLI provides the toolkit; the orchestration layer (Claude Code agent, Agent SDK app, KAI) handles the investigation loop.Implementation Phases
Phase 1 — MVP: observe + 2 templates + briefing
kbagent observe syncwith incremental DuckDB cachingkbagent observe query --sql "..."for ad-hoc querieskbagent observe statsfor database infocredit-spike+customkbagent check run/check list/check validatekbagent check render --name X(show generated SQL)kbagent check run --dry-runsuggested_actionsand statistical contextPhase 2 — Full templates + persistence
error-rate,job-slowdown,import-volume-drop,data-freshnesscheck historycommandcheck run --store-resultsPhase 3 — Assisted creation + plugin
kbagent check createwith AI-assisted parameter tuningkbagent check templateswith descriptions and examplesOpen Questions
Observe data scope: How many days of job history to fetch by default? 30? 90? Queue API has pagination for large projects.
Multi-project observe: Should observe.db contain data from all projects (one DB) or per-project (one DB per alias)?
Multi-project checks: Should one check span multiple projects? MVP: single project per check.
Notification: Part of check YAML or runner responsibility? Leaning toward runner-handles-notification.
Config versions fetch: Extra API calls per component/config for change correlation. Fetch eagerly during observe sync, or lazily only for components in violations?
Noise controls:
cooldown,min_consecutive_failures,snooze_until— should these be in the check YAML or in the runner?Review Notes
This proposal was reviewed by OpenAI (Codex gpt-5.3), Google (Gemini), and Anthropic (Claude Sonnet). Key consensus points:
customas escape hatch andcheck renderfor trust/debuggingFeedback welcome — especially on the observe/check split, template selection, briefing format, and whether this covers real monitoring use cases.