Skip to content

feat: managed database demo flow — explicit flags, catalog resolution, BM25 search#127

Merged
eddietejeda merged 9 commits into
mainfrom
feat/managed-db-demo-flow
Jun 4, 2026
Merged

feat: managed database demo flow — explicit flags, catalog resolution, BM25 search#127
eddietejeda merged 9 commits into
mainfrom
feat/managed-db-demo-flow

Conversation

@eddietejeda
Copy link
Copy Markdown
Contributor

Summary

  • Explicit flags for databases load: replaced positional target with --catalog, --schema, --table (plus --file/--url/--upload-id)
  • databases unset command: clears the active database from config
  • Star marker in databases list: highlights the active database (mirrors active workspace UX)
  • Auto-declare table on load: when a managed table was not declared at create time, the CLI deletes + recreates the database with the table declared and retries the load transparently
  • Catalog-first resolution in try_resolve_database: catalog alias matches are preferred over display name matches to avoid stale name collisions
  • Active-database preference in resolve_connection_id: checks the active database first (before scanning the connections list) so stale managed connections with the same catalog name are bypassed — fixes indexes create and search targeting the wrong connection
  • infer_for_search uses resolve_connection_id: catalog fallback + active-database preference flow through consistently
  • Fix duplicate score column in BM25 search: when --select already includes score, it is not appended again
  • Explicit flags for indexes create: replaced dot-notation with --catalog, --schema, --table, --column

Demo flow (now works end-to-end)

hotdata databases create --catalog airbnb

hotdata databases load   --catalog airbnb   --schema public   --table listings   --url https://hotdata.dev/data/sf-airbnb-listings.parquet

hotdata indexes create   --catalog airbnb   --schema public   --table listings   --column description   --type bm25

hotdata search "cozy apartment with a view"   --table airbnb.listings   --column description   --select name,description

Test plan

  • All 180 unit tests pass (cargo test)
  • databases create --catalog airbnb → creates DB, sets active
  • databases load --catalog airbnb --table listings --url ... → auto-declares + loads 7,535 rows with correct full_name: airbnb.public.listings
  • databases list → active DB marked with *
  • indexes create --catalog airbnb ... → creates index on the correct (active) connection, not a stale one
  • search ... --table airbnb.listings → returns ranked BM25 results with scores

…list

- Replace dot-notation positional <TARGET> on `databases load` with
  explicit --catalog, --schema, --table flags
- Add `databases unset` to clear the active database from config
- Show * marker on the active database in `databases list`
- Remove parse_db_target and its tests (no longer needed)
…e on load

- try_resolve_database now falls through to match by default_catalog when
  neither id nor name match, so --catalog works as a lookup key everywhere
- databases load auto-recovers from "not declared": deletes the empty
  database, recreates it with the table declared, then retries the load
- Add default_catalog to DatabaseSummary so the list response can be
  matched without a per-row fetch
…x duplicate score column

- resolve_connection_id falls back to managed database catalog lookup so
  `airbnb4.listings[description]` works in indexes create and search
- BM25 search no longer appends 'score' when --select already includes it
…og/--schema/--table/--column flags

Removes the positional `connection.table[col1,col2]` target argument and
parse_index_target helper. All index creation now uses named flags,
consistent with databases load and search.
@sentry
Copy link
Copy Markdown

sentry Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 5.62500% with 151 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/databases.rs 8.82% 93 Missing ⚠️
src/main.rs 0.00% 36 Missing ⚠️
src/connections.rs 0.00% 18 Missing ⚠️
src/api.rs 0.00% 3 Missing ⚠️
src/indexes.rs 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment thread src/databases.rs
Comment thread src/api.rs Outdated
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Blocking Issues

  • src/databases.rs (auto-declare-on-load, ~823-857): the not declared recovery deletes the entire managed database and recreates it declaring only the single table being loaded. This silently destroys any other declared tables and their loaded data, orphans indexes tied to the old default_connection_id, and discards expires_at. See inline comment for a suggested additive-declare alternative.

Action Required

  • Replace delete+recreate with an additive table declaration (e.g. PATCH /databases/{id}), or — if recreate is unavoidable — preserve the existing declared tables and expires_at and warn/confirm before deleting.

Non-blocking

  • src/api.rs:184 — workspace_id() splits test_new's doc comment (see inline comment).
  • src/databases.rs:656 — the databases create success hint still prints the old dot-notation databases load --file <path.parquet> {}.<table_name>, which no longer works after this PR's switch to --catalog/--schema/--table.

- databases load: explicit --catalog/--schema/--table flags (no more dot-notation)
- databases list: note * marker on active database
- databases set/unset: documented
- indexes create: --catalog option for managed databases (in addition to --connection-id)
- search: --type and --column are now optional (inferred from indexes)
- workflows: updated examples throughout
Comment thread src/databases.rs Outdated
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Blocking Issues

  • src/databases.rs:826-857 — The auto-declare path still deletes the entire managed database and recreates it with only the single target table declared and expires_at: None. This drops any other declared tables (and their loaded data), orphans indexes bound to the old default_connection_id, discards the original expiry, and does so transparently with no warning. This was flagged in cycle 1 and remains unaddressed in the code.

Action Required

  • Make table declaration additive (e.g. PATCH /databases/{id} via the existing ApiClient::patch) instead of delete+recreate. If delete+recreate must stay, preserve the existing declared tables and expires_at in the recreate request, and warn/confirm before deleting.

- auto-declare: collect existing tables before delete+recreate so they
  are preserved in the new database; also pass expires_at through
- databases create hint: update to new --catalog/--table flag syntax
- api.rs: fix workspace_id() doc comment placement
Comment thread src/databases.rs
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Blocking Issues

  • src/databases.rs (load auto-declare path, ~L826-874) — The undeclared-table load path still deletes the entire managed database and recreates it. This cycle correctly preserves the table declarations, name, default_catalog, and expires_at, but re-declaring a table does not restore its loaded data: any table already populated is left empty after the delete+recreate. It also still runs with no warning/confirmation, and mints a new default_connection_id (orphaning prior indexes). See inline comment.

Action Required

Either declare the new table additively via PATCH /databases/{id} (no data destroyed), or — if delete+recreate must stay — detect when any existing table is synced and warn/confirm before deleting, since that data will be lost.

When a 'not declared' error triggers delete+recreate, check if any
existing tables are synced. If so, show a yellow warning listing the
tables whose data will be lost and prompt for y/N confirmation.

In non-interactive mode (CI, piped stdin, --no-input) the command
errors out with a clear message instead of silently destroying data.
claude[bot]
claude Bot previously approved these changes Jun 4, 2026
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycle 4 — all prior blocking feedback resolved.

The auto-declare path now implements the accepted fallback from cycle 3: it detects synced tables, warns explicitly which tables' data will be lost, prompts for [y/N] confirmation in interactive mode, and aborts in non-interactive mode rather than destroying data silently (databases.rs:842-880). The api.rs doc-comment nit is also addressed. No new blocking issues.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cycle 4 — all prior blocking issues resolved.

The data-loss concern is now addressed: the auto-declare path preserves existing table declarations, name, catalog, and expires_at, detects tables with synced data, warns listing exactly which tables would lose data, confirms before deleting in interactive mode, and refuses (aborts) in non-interactive mode rather than destroying data silently. This matches the previously-stated minimum acceptance criteria for the delete+recreate path.

The api.rs doc-comment nit is also resolved. Approving.

@eddietejeda eddietejeda merged commit 1f9f5eb into main Jun 4, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant