feat: dot notation for indexes create and databases load#92
Conversation
… notation for databases load - `hotdata indexes create airbnb.listings[col1,col2] --type bm25` replaces verbose --connection-id/--schema/--table/--columns flags - `hotdata databases load airbnb.listings --url ...` replaces `hotdata databases tables load airbnb listings --url ...` - `--name` on indexes create is now optional (auto-derived from table+cols+type) - Add `connections::resolve_connection_id` to look up connection by name or ID - Remove load hint from `databases create` success output Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
| ); | ||
| std::process::exit(1); | ||
| }); | ||
| let auto = format!("dataset_{type}", type = r#type); |
There was a problem hiding this comment.
nit: auto-name for the dataset branch is dataset_{type}, which collides for every additional index on the same dataset (e.g. creating BM25 indexes over two different columns both auto-name to dataset_bm25 and the second create will fail). The columns are already captured in cols here — mirroring the connection branch with something like format!("dataset_{cols}_{type}", cols = cols.replace(',', "_")) would make the default useful instead of guaranteed-collision. (not blocking)
| let table_part = &target[..bracket_pos]; | ||
| let cols_raw = target[bracket_pos + 1..].trim_end_matches(']'); |
There was a problem hiding this comment.
nit: this silently accepts unclosed/garbled brackets. airbnb.listings[col (missing ]) parses successfully as cols_raw = "col" because trim_end_matches(']') does nothing when there's no ]. Likewise airbnb.listings[col]extra parses as a single column "col]extra" and is sent to the server. Worth requiring target.ends_with(']') (or checking that bracket_pos is followed by a matching ] at the end) and rejecting otherwise, so users get a clear local error rather than a confusing API error. (not blocking)
| fn parse_db_target(target: &str) -> (String, String, String) { | ||
| let parts: Vec<&str> = target.splitn(4, '.').collect(); | ||
| match parts.as_slice() { | ||
| [db, tbl] => (db.to_string(), "public".to_string(), tbl.to_string()), | ||
| [db, schema, tbl] => (db.to_string(), schema.to_string(), tbl.to_string()), | ||
| _ => { | ||
| eprintln!( | ||
| "error: target must be 'database.table' or 'database.schema.table'" | ||
| ); | ||
| std::process::exit(1); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| /// Parse an index target like `airbnb.listings[col1,col2]` or | ||
| /// `airbnb.public.listings[col1,col2]` into `(conn_name, schema, table, columns)`. | ||
| /// Schema defaults to `public` when only two dot-parts are given. | ||
| fn parse_index_target(target: &str) -> (String, String, String, Vec<String>) { | ||
| let Some(bracket_pos) = target.find('[') else { | ||
| eprintln!( | ||
| "error: target must include columns in brackets, e.g. airbnb.listings[col1,col2]" | ||
| ); | ||
| std::process::exit(1); | ||
| }; | ||
| let table_part = &target[..bracket_pos]; | ||
| let cols_raw = target[bracket_pos + 1..].trim_end_matches(']'); | ||
|
|
||
| let parts: Vec<&str> = table_part.splitn(4, '.').collect(); | ||
| let (conn, schema, table) = match parts.as_slice() { | ||
| [c, t] => (c.to_string(), "public".to_string(), t.to_string()), | ||
| [c, s, t] => (c.to_string(), s.to_string(), t.to_string()), | ||
| _ => { | ||
| eprintln!( | ||
| "error: target must be 'connection.table[cols]' or 'connection.schema.table[cols]'" | ||
| ); | ||
| std::process::exit(1); | ||
| } | ||
| }; | ||
|
|
||
| let columns: Vec<String> = cols_raw | ||
| .split(',') | ||
| .map(|s| s.trim().to_string()) | ||
| .filter(|s| !s.is_empty()) | ||
| .collect(); | ||
|
|
||
| if columns.is_empty() { | ||
| eprintln!("error: no columns specified in brackets"); | ||
| std::process::exit(1); | ||
| } | ||
|
|
||
| (conn, schema, table, columns) |
There was a problem hiding this comment.
nit: these two parsers (parse_db_target, parse_index_target) are pure, have several edge cases (2- vs 3-segment, default-public, bracket placement, empty cols, malformed dot counts), and are easy to regress. A small #[cfg(test)] mod tests covering happy paths + each error branch would lock the behavior down — especially since the existing databases_cli.rs tests don't exercise the new shorthand. (not blocking)
| /// Table and columns to index: `connection.table[col1,col2]` | ||
| /// or `connection.schema.table[col1,col2]`. Schema defaults to `public`. | ||
| #[arg(conflicts_with = "dataset_id")] | ||
| target: Option<String>, |
There was a problem hiding this comment.
super nit: [ and ] are glob metacharacters in zsh (and in bash with extglob/failglob), so hotdata indexes create airbnb.listings[description] --type bm25 will fail with no matches found for many users out of the box. Worth a line in the doc comment telling users to quote the target ('airbnb.listings[description]'). Same note would help on DatabasesCommands::Load even though it has no brackets, since the README example uses brackets in nearby commands. (not blocking)
| pub fn resolve_connection_id(api: &ApiClient, name_or_id: &str) -> String { | ||
| let body: ListResponse = api.get("/connections"); | ||
| match body | ||
| .connections | ||
| .iter() | ||
| .find(|c| c.id == name_or_id || c.name == name_or_id) | ||
| { | ||
| Some(conn) => conn.id.clone(), | ||
| None => { | ||
| use crossterm::style::Stylize; | ||
| eprintln!( | ||
| "{}", | ||
| format!("error: no connection named or with id '{name_or_id}'").red() | ||
| ); | ||
| std::process::exit(1); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
super nit: this lists every connection in the workspace even when the caller already has a raw ID. For workspaces with many connections (or once /connections paginates) this adds a needless round trip and could miss a valid ID past the first page. A cheap optimization: if name_or_id matches a known ID shape (e.g. starts with a known prefix or looks like a UUID), try GET /connections/{id} first and fall back to listing on 404. Fine to defer — the existing connections::list has the same limitation. (not blocking)
- Dataset auto-name now includes columns (dataset_{cols}_{type}) to avoid collision
- Validate bracket closure in parse_index_target; reject unclosed/garbled brackets
- Add quoting note to indexes create doc comment for shell glob safety
- resolve_connection_id tries GET /connections/{id} directly for conn* prefixed IDs
- Add unit tests for parse_db_target and parse_index_target (6 tests)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| /// Quote the argument to prevent shell glob expansion: | ||
| /// `hotdata indexes create 'airbnb.listings[description]' --type bm25` | ||
| #[arg(conflicts_with = "dataset_id")] | ||
| target: Option<String>, |
There was a problem hiding this comment.
nit: README.md is now out of sync with this command — lines 252–261 still document hotdata indexes create --connection-id <id> --schema <schema> --table <table> --name <name> --columns <cols> ..., but this PR removes those flags from Create. A user copying that example will hit a clap error. Worth updating the README block to show the new connection.table[cols] form (and the dataset-scope path that now requires --columns), plus a one-line mention of the hotdata databases load shorthand alongside tables load. (not blocking)
Summary
hotdata indexes create airbnb.listings[description] --type bm25— replaces--connection-id / --schema / --table / --columnsflags with a single positional arg using bracket notation for columnshotdata databases load airbnb.listings --url <url>— new top-level shorthand replacinghotdata databases tables load airbnb listings --url ...--nameonindexes createis now optional; auto-derived as{table}_{cols}_{type}when omittedpublicin both commands when onlydb.tableis given (no schema segment)connections::resolve_connection_id(api, name_or_id)to look up a connection by name or raw IDload: ...hint fromdatabases createsuccess outputExamples
Test plan
hotdata indexes create airbnb.listings[description] --type bm25resolves connection by name and creates indexhotdata indexes create airbnb.public.listings[name,description] --type bm25 --name my_idxuses explicit name and schemahotdata indexes create --dataset-id ds123 --columns col --type bm25still works for dataset scopehotdata databases load airbnb.listings --url <url>loads the tablehotdata databases load airbnb.public.listings --file ./data.parquetuses explicit schemahotdata databases createno longer prints theload:hint🤖 Generated with Claude Code