Merged
26 changes: 19 additions & 7 deletions README.md
@@ -138,11 +138,15 @@ hotdata datasets create --url "https://example.com/data.parquet" --label "My Dat
## Query

```sh
hotdata query "<sql>" [--workspace-id <id>] [--connection <connection_id>] [--format table|json|csv]
hotdata query "<sql>" [-w <id>] [--connection <connection_id>] [-o table|json|csv]
hotdata query status <query_run_id> [-o table|json|csv]
```

- Default format is `table`, which prints results with row count and execution time.
- Default output is `table`, which prints results with row count and execution time.
- Use `--connection` to scope the query to a specific connection.
- Long-running queries automatically fall back to async execution and return a `query_run_id`.
- Use `hotdata query status <query_run_id>` to poll for results.
- Exit codes for `query status`: `0` = succeeded, `1` = failed, `2` = still running (poll again).
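The exit codes listed above lend themselves to a scripted polling loop. The following is a minimal sketch, not part of the CLI: `PollOutcome`, `classify_exit_code`, `poll_until_done`, and `run_status_once` are hypothetical helper names, and the attempt limit and delay are arbitrary choices. It assumes only the documented mapping (`0` = succeeded, `1` = failed, `2` = still running) and that `hotdata` is on `PATH`.

```rust
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

/// Outcome of one `hotdata query status` call, derived from the documented exit codes.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, PartialEq)]
pub enum PollOutcome {
    Succeeded,
    Failed,
    StillRunning,
    Unexpected(i32),
}

/// Maps a process exit code to an outcome: 0 = succeeded, 1 = failed, 2 = still running.
pub fn classify_exit_code(code: i32) -> PollOutcome {
    match code {
        0 => PollOutcome::Succeeded,
        1 => PollOutcome::Failed,
        2 => PollOutcome::StillRunning,
        other => PollOutcome::Unexpected(other),
    }
}

/// Calls `check` until it reports anything other than "still running",
/// or until `max_attempts` calls have been made. Sleeps `delay` between calls.
pub fn poll_until_done<F>(mut check: F, max_attempts: u32, delay: Duration) -> PollOutcome
where
    F: FnMut() -> i32,
{
    let mut outcome = PollOutcome::StillRunning;
    for attempt in 0..max_attempts {
        outcome = classify_exit_code(check());
        if outcome != PollOutcome::StillRunning {
            break;
        }
        // Do not sleep after the final attempt.
        if attempt + 1 < max_attempts {
            sleep(delay);
        }
    }
    outcome
}

/// Runs `hotdata query status <id>` once and returns its exit code.
/// Returns -1 if the process could not be started or was killed by a signal.
pub fn run_status_once(query_run_id: &str) -> i32 {
    Command::new("hotdata")
        .args(["query", "status", query_run_id])
        .status()
        .ok()
        .and_then(|status| status.code())
        .unwrap_or(-1)
}
```

A caller could combine the pieces as `poll_until_done(|| run_status_once(id), 30, Duration::from_secs(2))`. Keeping the exit-code check behind a closure lets the loop be exercised without the real binary.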

## Saved Queries

@@ -163,13 +167,21 @@ hotdata queries run <query_id> [--format table|json|csv]
## Search

```sh
hotdata search "<query>" --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [--format table|json|csv]
# BM25 full-text search
hotdata search "query text" --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]

# Vector search with --model (calls OpenAI to embed the query)
hotdata search "query text" --table <table> --column <vector_column> --model text-embedding-3-small [--limit <n>]

# Vector search with piped embedding
echo '[0.1, -0.2, ...]' | hotdata search --table <table> --column <vector_column> [--limit <n>]
```

- Full-text search using BM25 across a table column.
- Requires a BM25 index on the target column (see `indexes create`).
- Results are ordered by relevance score (descending).
- `--select` specifies which columns to return (comma-separated, defaults to all). The `score` column is automatically appended when `--select` is used.
- Without `--model` and with query text: BM25 full-text search. Requires a BM25 index on the target column.
- With `--model`: generates an embedding via OpenAI and performs vector search using `l2_distance`. Requires `OPENAI_API_KEY` env var.
- Without query text and with piped stdin: reads a vector (raw JSON array or OpenAI embedding response) and performs vector search.
- BM25 results are ordered by relevance score (descending). Vector results are ordered by distance (ascending).
- `--select` specifies which columns to return (comma-separated, defaults to all).
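The three search modes described above can be read as a small decision table. The sketch below is an illustration of those rules only, not the CLI's actual code: `SearchMode` and `select_search_mode` are hypothetical names, and the error cases (for example `--model` without query text) are assumptions, since the text does not say how the CLI handles them.

```rust
/// Which kind of search the arguments call for. (Hypothetical type for illustration.)
#[derive(Debug, PartialEq)]
pub enum SearchMode {
    /// Query text without `--model`: BM25 full-text search.
    Bm25 { query: String },
    /// Query text with `--model`: embed the text, then vector search.
    VectorFromModel { query: String, model: String },
    /// No query text, vector piped on stdin: vector search with that vector.
    VectorFromStdin,
}

/// Picks a search mode from the query text, the `--model` value, and whether stdin is piped.
pub fn select_search_mode(
    query: Option<&str>,
    model: Option<&str>,
    stdin_is_piped: bool,
) -> Result<SearchMode, String> {
    match (query, model) {
        (Some(q), Some(m)) => Ok(SearchMode::VectorFromModel {
            query: q.to_string(),
            model: m.to_string(),
        }),
        (Some(q), None) => Ok(SearchMode::Bm25 { query: q.to_string() }),
        (None, None) if stdin_is_piped => Ok(SearchMode::VectorFromStdin),
        // Assumption: a model name alone gives nothing to embed.
        (None, Some(_)) => Err("--model requires query text to embed".to_string()),
        (None, None) => Err("provide query text or pipe a vector on stdin".to_string()),
    }
}
```

For example, `select_search_mode(Some("error logs"), None, false)` yields `SearchMode::Bm25`, while passing no text with piped stdin yields `SearchMode::VectorFromStdin`.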

## Indexes

35 changes: 24 additions & 11 deletions skills/hotdata-cli/SKILL.md
@@ -163,16 +163,21 @@ Use `hotdata datasets <dataset_id>` to look up the `table_name` before writing q

### Execute SQL Query
```
hotdata query "<sql>" [--workspace-id <workspace_id>] [--connection <connection_id>] [--format table|json|csv]
hotdata query "<sql>" [-w <workspace_id>] [--connection <connection_id>] [-o table|json|csv]
hotdata query status <query_run_id> [-o table|json|csv]
```
- Default format is `table`, which prints results with row count and execution time.
- Default output is `table`, which prints results with row count and execution time.
- Use `--connection` to scope the query to a specific connection.
- Use `hotdata tables list` to discover tables and columns — do not query `information_schema` directly.
- **Always use PostgreSQL dialect SQL.**
- Long-running queries automatically fall back to async execution and return a `query_run_id`.
- Use `hotdata query status <query_run_id>` to poll for results.
- Exit codes for `query status`: `0` = succeeded, `1` = failed, `2` = still running (poll again).
- **When a query returns a `query_run_id`, use `query status` to poll rather than re-running the query.**

### Get Query Result
```
hotdata results <result_id> [--workspace-id <workspace_id>] [--format table|json|csv]
hotdata results <result_id> [-w <workspace_id>] [-o table|json|csv]
```
- Retrieves a previously executed query result by its result ID.
- Query results include a `result-id` in the footer (e.g. `[result-id: rslt...]`).
@@ -195,23 +200,31 @@ hotdata queries run <query_id> [--format table|json|csv]

### Search
```
hotdata search "<query>" --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [--format table|json|csv]
# BM25 full-text search
hotdata search "query text" --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]

# Vector search with --model (calls OpenAI to embed the query)
hotdata search "query text" --table <table> --column <vector_column> --model text-embedding-3-small [--limit <n>]

# Vector search with piped embedding
echo '[0.1, -0.2, ...]' | hotdata search --table <table> --column <vector_column> [--limit <n>]
```
- Full-text search using BM25 across a table column.
- Requires a BM25 index on the target column (see `indexes create`).
- Results are ordered by relevance score (descending).
- `--select` specifies which columns to return (comma-separated, defaults to all). The `score` column is automatically appended when `--select` is used.
- Without `--model` and with query text: BM25 full-text search. Requires a BM25 index on the target column.
- With `--model`: generates an embedding via OpenAI and performs vector search using `l2_distance`. Requires `OPENAI_API_KEY` env var. Supported models: `text-embedding-3-small`, `text-embedding-3-large`.
- Without query text and with piped stdin: reads a vector (raw JSON array or OpenAI embedding response) and performs vector search.
- BM25 results are ordered by relevance score (descending). Vector results are ordered by distance (ascending).
- `--select` specifies which columns to return (comma-separated, defaults to all).
- Default limit is 10.
- **For BM25 search, create a BM25 index on the target column first. For vector search, create a vector index.**
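To make "ordered by distance (ascending)" concrete, here is a small in-memory sketch of ranking rows by L2 distance. It is not how Hotdata executes vector search, which presumably relies on the vector index. `l2_distance` is assumed here to mean plain Euclidean distance, and `nearest` is a hypothetical helper.

```rust
/// Euclidean (L2) distance between two vectors.
/// Returns `None` when the dimensions differ.
pub fn l2_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let sum_sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
    Some(sum_sq.sqrt())
}

/// Ranks `(id, vector)` rows by distance to `query`, closest first,
/// and keeps at most `limit` rows. Rows with a mismatched dimension are skipped.
pub fn nearest(query: &[f32], rows: &[(u32, Vec<f32>)], limit: usize) -> Vec<(u32, f32)> {
    let mut scored: Vec<(u32, f32)> = rows
        .iter()
        .filter_map(|(id, v)| l2_distance(query, v).map(|d| (*id, d)))
        .collect();
    // Ascending distance: a smaller distance means a closer match.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.truncate(limit);
    scored
}
```

With a limit of 10 this mirrors the default described above. BM25 results go the other way: a higher relevance score ranks first.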

### Indexes
```
hotdata indexes list --connection-id <id> --schema <schema> --table <table> [--workspace-id <workspace_id>] [--format table|json|yaml]
hotdata indexes create --connection-id <id> --schema <schema> --table <table> --name <name> --columns <cols> [--type sorted|bm25|vector] [--metric l2|cosine|dot] [--async]
hotdata indexes list -c <connection_id> --schema <schema> --table <table> [-w <workspace_id>] [-o table|json|yaml]
hotdata indexes create -c <connection_id> --schema <schema> --table <table> --name <name> --columns <cols> [--type sorted|bm25|vector] [--metric l2|cosine|dot] [--async]
```
- `list` shows indexes on a table with name, type, columns, status, and creation date.
- `create` creates an index. Use `--type bm25` for full-text search, `--type vector` for vector search (requires `--metric`).
- `--async` submits index creation as a background job. Use `hotdata jobs <job_id>` to check status.
- **Before using `hotdata search`, create a BM25 index on the target column.**

### Jobs
```
19 changes: 16 additions & 3 deletions src/command.rs
@@ -25,10 +25,10 @@ pub enum Commands {
command: Option<DatasetsCommands>,
},

/// Execute a SQL query
/// Execute a SQL query, or check status of a running query
Query {
/// SQL query string
sql: String,
/// SQL query string (omit when using a subcommand)
sql: Option<String>,

/// Workspace ID (defaults to first workspace from login)
#[arg(long, short = 'w')]
@@ -41,6 +41,9 @@ pub enum Commands {
/// Output format
#[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "csv"])]
output: String,

#[command(subcommand)]
command: Option<QueryCommands>,
},

/// Manage workspaces
@@ -187,6 +190,16 @@ impl From<ShellChoice> for clap_complete::Shell {
}
}

#[derive(Subcommand)]
pub enum QueryCommands {
/// Check the status of a running query and retrieve results.
/// Exit codes: 0 = succeeded, 1 = failed, 2 = still running (poll again)
Status {
/// Query run ID
id: String,
},
}

#[derive(Subcommand)]
pub enum AuthCommands {
/// Remove authentication for a profile
21 changes: 18 additions & 3 deletions src/main.rs
@@ -19,7 +19,7 @@ mod workspace;

use anstyle::AnsiColor;
use clap::{Parser, builder::Styles};
use command::{AuthCommands, Commands, ConnectionsCommands, ConnectionsCreateCommands, DatasetsCommands, IndexesCommands, JobsCommands, QueriesCommands, ResultsCommands, SkillCommands, TablesCommands, WorkspaceCommands};
use command::{AuthCommands, Commands, ConnectionsCommands, ConnectionsCreateCommands, DatasetsCommands, IndexesCommands, JobsCommands, QueriesCommands, QueryCommands, ResultsCommands, SkillCommands, TablesCommands, WorkspaceCommands};

#[derive(Parser)]
#[command(name = "hotdata", version, about = concat!("Hotdata CLI - Command line interface for Hotdata (v", env!("CARGO_PKG_VERSION"), ")"), long_about = None, disable_version_flag = true)]
@@ -109,9 +109,24 @@ fn main() {
}
}
}
Commands::Query { sql, workspace_id, connection, output } => {
Commands::Query { sql, workspace_id, connection, output, command } => {
let workspace_id = resolve_workspace(workspace_id);
query::execute(&sql, &workspace_id, connection.as_deref(), &output)
match command {
Some(QueryCommands::Status { id }) => {
query::poll(&id, &workspace_id, &output)
}
None => {
match sql {
Some(sql) => query::execute(&sql, &workspace_id, connection.as_deref(), &output),
None => {
use clap::CommandFactory;
let mut cmd = Cli::command();
cmd.build();
cmd.find_subcommand_mut("query").unwrap().print_help().unwrap();
}
}
}
}
}
Commands::Workspaces { command } => match command {
WorkspaceCommands::List { output } => workspace::list(&output),
80 changes: 79 additions & 1 deletion src/query.rs
@@ -13,6 +13,21 @@ pub struct QueryResponse {
pub warning: Option<String>,
}

#[derive(Deserialize)]
struct AsyncResponse {
query_run_id: String,
status: String,
}

#[derive(Deserialize)]
struct QueryRunResponse {
id: String,
status: String,
result_id: Option<String>,
#[serde(default)]
error: Option<String>,
}

fn value_to_string(v: &Value) -> String {
match v {
Value::Null => "NULL".to_string(),
@@ -33,12 +48,40 @@ fn value_to_string(v: &Value) -> String {
pub fn execute(sql: &str, workspace_id: &str, connection: Option<&str>, format: &str) {
let api = ApiClient::new(Some(workspace_id));

let mut body = serde_json::json!({ "sql": sql });
let mut body = serde_json::json!({
"sql": sql,
"async": true,
"async_after_ms": 1000,
});
if let Some(conn) = connection {
body["connection_id"] = Value::String(conn.to_string());
}

let spinner = indicatif::ProgressBar::new_spinner();
spinner.set_style(
indicatif::ProgressStyle::with_template("{spinner:.cyan} {msg}")
.unwrap(),
);
spinner.set_message("running query...");
spinner.enable_steady_tick(std::time::Duration::from_millis(80));

let (status, resp_body) = api.post_raw("/query", &body);
spinner.finish_and_clear();

if status.as_u16() == 202 {
let async_resp: AsyncResponse = match serde_json::from_str(&resp_body) {
Ok(r) => r,
Err(e) => {
eprintln!("error parsing async response: {e}");
std::process::exit(1);
}
};
use crossterm::style::Stylize;
eprintln!("{}", format!("query still running (status: {})", async_resp.status).yellow());
eprintln!("query_run_id: {}", async_resp.query_run_id);
eprintln!("{}", format!("Poll with: hotdata query status {}", async_resp.query_run_id).dark_grey());
std::process::exit(2);
}

if !status.is_success() {
let message = serde_json::from_str::<Value>(&resp_body)
@@ -61,6 +104,41 @@ pub fn execute(sql: &str, workspace_id: &str, connection: Option<&str>, format:
print_result(&result, format);
}

/// Poll a query run by ID. If succeeded and has a result_id, fetch and display the result.
pub fn poll(query_run_id: &str, workspace_id: &str, format: &str) {
let api = ApiClient::new(Some(workspace_id));

let run: QueryRunResponse = api.get(&format!("/query-runs/{query_run_id}"));

match run.status.as_str() {
"succeeded" => {
match run.result_id {
Some(ref result_id) => {
let result: QueryResponse = api.get(&format!("/results/{result_id}"));
print_result(&result, format);
}
None => {
use crossterm::style::Stylize;
println!("{}", "Query succeeded but no result available.".yellow());
}
}
}
"failed" => {
use crossterm::style::Stylize;
let err = run.error.as_deref().unwrap_or("unknown error");
eprintln!("{}", format!("query failed: {err}").red());
std::process::exit(1);
}
status => {
use crossterm::style::Stylize;
eprintln!("{}", format!("query status: {status}").yellow());
eprintln!("query_run_id: {}", run.id);
eprintln!("{}", format!("Poll again with: hotdata query status {}", run.id).dark_grey());
std::process::exit(2);
}
}
}

pub fn print_result(result: &QueryResponse, format: &str) {
if let Some(ref warning) = result.warning {
eprintln!("warning: {warning}");