diff --git a/README.md b/README.md
index ae5ac99..82ee6d8 100644
--- a/README.md
+++ b/README.md
@@ -71,7 +71,8 @@ API key priority (lowest to highest): config file → `HOTDATA_API_KEY` env var
 | `query` | | Execute a SQL query |
 | `queries` | `list` | Inspect query run history |
 | `search` | | Full-text search across a table column |
-| `indexes` | `list`, `create` | Manage indexes on a table |
+| `indexes` | `list`, `create`, `delete` | Manage indexes on a table or dataset |
+| `embedding-providers` | `list`, `get`, `create`, `update`, `delete` | Manage embedding providers used by vector indexes |
 | `results` | `list` | Retrieve stored query results |
 | `jobs` | `list` | Manage background jobs |
 | `sandbox` | `list`, `new`, `set`, `read`, `update`, `run` | Manage sandboxes |
@@ -101,13 +102,16 @@ hotdata workspaces set [<workspace_id>]
 ```sh
 hotdata connections list [-w <workspace_id>] [-o table|json|yaml]
 hotdata connections <connection_id> [-w <workspace_id>] [-o table|json|yaml]
-hotdata connections refresh <connection_id> [-w <workspace_id>]
+hotdata connections refresh <connection_id> [-w <workspace_id>] [--data] [--schema <schema> --table <table>] [--async] [--include-uncached]
 hotdata connections new [-w <workspace_id>]
 ```
 
 - `list` returns `id`, `name`, `source_type` for each connection.
 - Pass a connection ID to view details (id, name, source type, table counts).
-- `refresh` triggers a schema refresh for a connection.
+- `refresh` triggers a schema refresh by default. Pass `--data` to refresh cached row data instead.
+- `--schema` and `--table` narrow a data refresh to a single table (must be supplied together).
+- `--async` submits a data refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`. Only valid with `--data` — schema refresh is always synchronous.
+- `--include-uncached` includes tables that haven't been cached yet in a connection-wide data refresh. Only valid with `--data` and no `--table`.
 - `new` launches an interactive connection creation wizard.
 
 ### Create a connection
@@ -143,6 +147,7 @@ hotdata datasets create --file data.csv [--label "My Dataset"] [--table-name my_
 hotdata datasets create --sql "SELECT ..." --label "My Dataset"
 hotdata datasets create --url "https://example.com/data.parquet" --label "My Dataset"
 hotdata datasets update <dataset_id> [--label "New Label"] [--table-name new_table]
+hotdata datasets refresh <dataset_id> [--workspace-id <workspace_id>] [--async]
 ```
 
 - Datasets are queryable as `datasets.main.<table_name>`.
@@ -150,6 +155,8 @@ hotdata datasets update <dataset_id> [--label "New Label"] [--table-name new_tab
 - `--url` imports data directly from a URL (supports csv, json, parquet).
 - Format is auto-detected from file extension or content.
 - Piped stdin is supported: `cat data.csv | hotdata datasets create --label "My Dataset"`
+- `refresh` re-runs the dataset's source (URL fetch or saved query) and creates a new version. Not supported for upload-source datasets.
+- `--async` submits the refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`.
 
 ## Workspace context
 
@@ -194,33 +201,62 @@ hotdata queries <query_id> [-o table|json|yaml]
 
 ## Search
 
-```sh
-# BM25 full-text search
-hotdata search "query text" --table <table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]
+`--type` is **required** — no default. Pass either `vector` (similarity search via the index's embedding provider) or `bm25` (full-text search). Both run entirely server-side.
-# Vector search with --model (calls OpenAI to embed the query)
-hotdata search "query text" --table <table> --column <column> --model text-embedding-3-small [--limit <n>]
+```sh
+# BM25 full-text search (requires a BM25 index on the column)
+hotdata search "<query>" --type bm25 --table <table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]
 
-# Vector search with piped embedding
-echo '[0.1, -0.2, ...]' | hotdata search --table <table> --column <column> [--limit <n>]
+# Vector search (requires a vector index with auto-embedding on the column)
+hotdata search "<query>" --type vector --table <table> --column <column> [--limit <n>]
 ```
 
-- Without `--model` and with query text: BM25 full-text search. Requires a BM25 index on the target column.
-- With `--model`: generates an embedding via OpenAI and performs vector search using `l2_distance`. Requires `OPENAI_API_KEY` env var.
-- Without query text and with piped stdin: reads a vector (raw JSON array or OpenAI embedding response) and performs vector search.
-- BM25 results are ordered by relevance score (descending). Vector results are ordered by distance (ascending).
+- **`--type vector`** — pass your query as **plain text** and name the **source text column** (e.g. `title`). The server embeds the query at query time, using the same provider that auto-embedded the column when the index was built — so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `{column}_embedding` column. Generated SQL: `vector_distance(col, 'query')`, evaluated server-side.
+- **`--type bm25`** runs `bm25_search(table, col, 'query')` — requires a BM25 index on the column.
+- **No vector index, or want to use a different model than the index?** Skip `hotdata search` and use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vector>]) FROM ...`). The SQL reference covers the available distance functions and table UDFs.
+- BM25 results sort by score (descending). Vector results sort by distance (ascending).
 - `--select` specifies which columns to return (comma-separated, defaults to all).
+- The previous `--model` flag and stdin-piped-vector path are **removed** — both hardcoded `l2_distance` regardless of the index's actual metric, which silently produced wrong rankings on cosine indexes. For client-side embedding or precomputed-vector workflows, use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vector>]) ...`).
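+
+For example, assuming a connection table `shop.public.products` whose `title` column already carries the relevant index (names here are illustrative, not part of the CLI):
+
+```sh
+# BM25: top 5 rows by relevance
+hotdata search "wireless mouse" --type bm25 --table shop.public.products --column title --limit 5
+
+# Vector: top 5 rows by distance, query embedded server-side
+hotdata search "wireless mouse" --type vector --table shop.public.products --column title --limit 5
+```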
 
 ## Indexes
 
+Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`). The two scopes are mutually exclusive.
+
+```sh
+# Connection-table scope
+hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [-o table|json|yaml]
+hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> \
+    --name <name> --columns <columns> --type sorted|bm25|vector \
+    [--metric l2|cosine|dot] [--async] \
+    [--embedding-provider-id <provider_id>] [--dimensions <n>] [--output-column <column>] [--description <text>]
+hotdata indexes delete --connection-id <connection_id> --schema <schema> --table <table> --name <name>
+
+# Dataset scope
+hotdata indexes list --dataset-id <dataset_id> [-o table|json|yaml]
+hotdata indexes create --dataset-id <dataset_id> --name <name> --columns <columns> --type sorted|bm25|vector ...
+hotdata indexes delete --dataset-id <dataset_id> --name <name>
+```
+
+- `--type` is **required** — choose `sorted` (B-tree-like), `bm25` (full-text), or `vector` (similarity).
+- `--type vector` requires exactly one column.
+- `--async` submits index creation as a background job and returns a job ID; poll with `hotdata jobs <job_id>`.
+- **Auto-embedding (text → vector):** when `--type vector` is used on a text column, embeddings are generated automatically. The embedding provider can be specified with `--embedding-provider-id`; if omitted, the first system provider is used. The generated column defaults to `{column}_embedding` and can be overridden with `--output-column`.
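+
+A typical auto-embedding flow (connection ID and names are illustrative): create a vector index over a text column, poll the job, and drop the index when it is no longer needed.
+
+```sh
+hotdata indexes create --connection-id conn_abc123 --schema public --table products \
+    --name idx_title_vec --columns title --type vector --metric cosine --async
+hotdata jobs <job_id>   # poll until succeeded
+hotdata indexes delete --connection-id conn_abc123 --schema public --table products --name idx_title_vec
+```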
+
+## Embedding providers
+
 ```sh
-hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [--workspace-id <workspace_id>] [--format table|json|yaml]
-hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> --name <name> --columns <columns> [--type sorted|bm25|vector] [--metric l2|cosine|dot] [--async]
+hotdata embedding-providers list [-o table|json|yaml]
+hotdata embedding-providers get <provider_id> [-o table|json|yaml]
+hotdata embedding-providers create --name <name> --provider-type service|local \
+    [--config '<json>'] [--provider-api-key <key> | --secret-name <secret_name>]
+hotdata embedding-providers update <provider_id> [--name <name>] [--config '<json>'] \
+    [--provider-api-key <key> | --secret-name <secret_name>]
+hotdata embedding-providers delete <provider_id>
 ```
 
-- `list` shows indexes on a table with name, type, columns, status, and creation date.
-- `create` creates an index. Use `--type bm25` for full-text search, `--type vector` for vector search (requires `--metric`).
-- `--async` submits index creation as a background job.
+- `list`/`get` show registered providers (system providers like `sys_emb_openai` come pre-configured).
+- `--provider-api-key` auto-creates a managed secret for the provider; `--secret-name` references an existing secret. They are mutually exclusive.
+- `--provider-api-key` is named to pair with `--provider-type` and to avoid colliding with the global `--api-key` flag (Hotdata auth).
 
 ## Results
 
@@ -239,7 +275,7 @@ hotdata jobs <job_id> [--workspace-id <workspace_id>] [--format table|json|yaml]
 ```
 
 - `list` shows only active jobs (`pending` and `running`) by default. Use `--all` to see all jobs.
-- `--job-type` accepts: `data_refresh_table`, `data_refresh_connection`, `create_index`.
+- `--job-type` accepts: `data_refresh_table`, `data_refresh_connection`, `dataset_refresh`, `create_index`, `create_dataset_index`.
 - `--status` accepts: `pending`, `running`, `succeeded`, `partially_succeeded`, `failed`.
 
 ## Sandboxes
diff --git a/skills/hotdata/SKILL.md b/skills/hotdata/SKILL.md
index 576a748..626868a 100644
--- a/skills/hotdata/SKILL.md
+++ b/skills/hotdata/SKILL.md
@@ -94,12 +94,15 @@ hotdata connections <connection_id> [--workspace-id <workspace_id>] [--output ta
 - `list` returns `id`, `name`, `source_type` for each connection.
 - Pass a connection ID to view details (id, name, source type, table counts).
 
-### Refresh connection schema
+### Refresh connection schema or data
 ```
-hotdata connections refresh <connection_id> [--workspace-id <workspace_id>]
+hotdata connections refresh <connection_id> [--workspace-id <workspace_id>] [--data] [--schema <schema> --table <table>] [--async] [--include-uncached]
 ```
-- Refreshes the connection’s catalog so new or changed tables and columns appear in `hotdata tables list` and queries.
-- Use after DDL or other changes in the source database when the workspace view is stale.
+- Default (no flags) refreshes the connection’s catalog so new or changed tables and columns appear in `hotdata tables list` and queries. Use after DDL or other changes in the source database when the workspace view is stale.
+- `--data` re-syncs cached row data from the source instead of refreshing the catalog.
+- `--schema` and `--table` narrow a data refresh to a single table (must be supplied together).
+- `--async` submits a data refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`. Only valid with `--data` — schema refresh is always synchronous.
+- `--include-uncached` includes tables that haven't been cached yet in a connection-wide data refresh. Only valid with `--data` and no `--table`.
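+
+For example, to re-sync one table's cached rows in the background (connection ID, schema, and table are illustrative):
+
+```
+hotdata connections refresh conn_abc123 --data --schema public --table orders --async
+hotdata jobs <job_id>   # poll the returned job until it reports succeeded
+```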
 
 ### Create a Connection
 
@@ -212,6 +215,14 @@ hotdata datasets create --label "My Dataset" --upload-id <upload_id> [--format c
 - `--table-name` is optional — derived from the label if omitted.
 - After **`datasets create`**, the CLI prints a **`full_name`** line (for example `datasets.main.my_table` or `datasets.s_ufmblmvq.tac_csat`). **Always use that `full_name` in SQL**—do not assume `datasets.main`.
+#### Refresh a dataset
+```
+hotdata datasets refresh <dataset_id> [--workspace-id <workspace_id>] [--async]
+```
+- Re-runs the dataset's source (URL fetch or saved query) and creates a **new version**. Use after the upstream source has changed.
+- **Not supported for upload-source datasets** — those have no remote source to re-pull from. The CLI surfaces the server's `400` directly.
+- `--async` submits the refresh as a background job and returns a `job_id`; poll with **`hotdata jobs <job_id>`**.
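+
+Example (dataset ID illustrative) — refresh a URL-sourced dataset in the background, then poll the returned job:
+
+```
+hotdata datasets refresh ds_abc123 --async
+hotdata jobs <job_id>
+```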
+
 #### Querying datasets
 
 Qualified dataset tables are **`datasets.<schema>.<table>`**: **`main`** for workspace-scoped datasets (created outside a sandbox), or the **sandbox id** for sandbox-created data (e.g. `datasets.s_ufmblmvq.tac_csat`). The create output’s **`full_name`** is authoritative—copy it into `FROM` / `JOIN` clauses instead of guessing `datasets.main.…`.
 
@@ -286,32 +297,59 @@ These commands use the **active workspace only** (the `queries` command has no `
 
 To create a dataset from a **saved query** still registered for the workspace, use **`hotdata datasets create --query-id <query_id>`** (this CLI does not expose separate saved-query create/run subcommands).
 
 ### Search
-```
-# BM25 full-text search
-hotdata search "query text" --table <table> --column <column> [--select <columns>] [--limit <n>] [--output table|json|csv]
 
-# Vector search with --model (calls OpenAI to embed the query)
-hotdata search "query text" --table <table> --column <column> --model text-embedding-3-small [--limit <n>]
+`--type` is **required**. Pass `vector` or `bm25`. Both run entirely server-side.
+
+```
+# BM25 full-text search (requires BM25 index on the column)
+hotdata search "<query>" --type bm25 --table <table> --column <column> [--select <columns>] [--limit <n>] [--output table|json|csv]
 
-# Vector search with piped embedding
-echo '[0.1, -0.2, ...]' | hotdata search --table <table> --column <column> [--limit <n>]
+# Vector similarity search via server-side auto-embed (requires a vector index on the column)
+hotdata search "<query>" --type vector --table <table> --column <column> [--limit <n>]
 ```
 
-- Without `--model` and with query text: BM25 full-text search. Requires a BM25 index on the target column.
-- With `--model`: generates an embedding via OpenAI and performs vector search using `l2_distance`. Requires `OPENAI_API_KEY` env var. Supported models: `text-embedding-3-small`, `text-embedding-3-large`.
-- Without query text and with piped stdin: reads a vector (raw JSON array or OpenAI embedding response) and performs vector search.
-- BM25 results are ordered by relevance score (descending). Vector results are ordered by distance (ascending).
+- **`--type vector`** — pass the query as **plain text** and name the **source text column** (e.g. `title`). The server embeds the query at query time, using the same provider that auto-embedded the column when the index was built — distance metric, model, and dimensions match automatically. No client-side embedding, no `OPENAI_API_KEY` required. Generated SQL: `vector_distance(col, 'text')`.
+- **`--type bm25`** generates `bm25_search(table, col, 'text')` server-side; requires a BM25 index on the column.
+- **No vector index on the column, or want a different embedding model?** `hotdata search` won't help — drop down to raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vector>]) FROM ...`; see the sketch after this list). The SQL reference covers available distance functions and table UDFs.
+- BM25 results sort by score (descending). Vector results sort by distance (ascending).
 - `--select` specifies which columns to return (comma-separated, defaults to all).
 - Default limit is 10.
-- **For BM25 search, create a BM25 index on the target column first. For vector search, create a vector index.**
+- **For BM25 search, create a BM25 index on the target column first (`hotdata indexes create ... --type bm25`). For vector search, create a vector index, optionally with auto-embedding on a text column.**
+- The earlier `--model` flag and stdin-piped-vector path have both been removed. They hardcoded `l2_distance` regardless of the index's metric (silently wrong on cosine indexes). For client-side embedding or precomputed-vector workflows, use raw SQL via `hotdata query`.
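+
+A sketch of that raw-SQL fallback (table name, embedding column, and vector literal are illustrative — in practice the vector comes from your own embedding call):
+
+```
+hotdata query "SELECT id, title, cosine_distance(title_embedding, [0.12, -0.03, 0.98]) AS dist FROM conn.public.products ORDER BY dist LIMIT 5"
+```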
 
 ### Indexes
+
+Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`) — the two scopes are mutually exclusive. `--type` is required (no default).
+
+```
+# Connection-table scope
+hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [--workspace-id <workspace_id>] [--output table|json|yaml]
+hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> \
+    --name <name> --columns <columns> --type sorted|bm25|vector \
+    [--metric l2|cosine|dot] [--async] \
+    [--embedding-provider-id <provider_id>] [--dimensions <n>] [--output-column <column>] [--description <text>]
+hotdata indexes delete --connection-id <connection_id> --schema <schema> --table <table> --name <name>
+
+# Dataset scope (positional dataset_id replaced by --dataset-id flag)
+hotdata indexes list --dataset-id <dataset_id> [--workspace-id <workspace_id>] [--output table|json|yaml]
+hotdata indexes create --dataset-id <dataset_id> --name <name> --columns <columns> --type sorted|bm25|vector ...
+hotdata indexes delete --dataset-id <dataset_id> --name <name>
+```
+- `--type` accepts `sorted` (B-tree-like; range/exact lookups), `bm25` (full-text), or `vector` (similarity). It is **required**.
+- `--type vector` requires exactly one column.
+- `--async` submits index creation as a background job; poll with `hotdata jobs <job_id>`.
+- **Auto-embedding:** with `--type vector` on a **text** column, the server generates embeddings automatically. Pass `--embedding-provider-id` to pick a specific provider; if omitted, the first system provider is used. The generated column defaults to `{column}_embedding` (override with `--output-column`).
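+
+End-to-end sketch (IDs and names are illustrative): create an auto-embedding vector index on a dataset's text column, wait for the job, then search it.
+
+```
+hotdata indexes create --dataset-id ds_abc123 --name idx_title_vec --columns title --type vector --async
+hotdata jobs <job_id>   # wait until the job reports succeeded
+hotdata search "running shoes" --type vector --table datasets.main.products --column title
+```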
+
+### Embedding providers
+```
-hotdata indexes list --connection-id <connection_id> --schema <schema> --table <table> [--workspace-id <workspace_id>] [--output table|json|yaml]
-hotdata indexes create --connection-id <connection_id> --schema <schema> --table <table> --name <name> --columns <columns> [--workspace-id <workspace_id>] [--type sorted|bm25|vector] [--metric l2|cosine|dot] [--async]
+hotdata embedding-providers list [--workspace-id <workspace_id>] [--output table|json|yaml]
+hotdata embedding-providers get <provider_id> [--workspace-id <workspace_id>] [--output table|json|yaml]
+hotdata embedding-providers create --name <name> --provider-type service|local \
+    [--config '<json>'] [--provider-api-key <key> | --secret-name <secret_name>] [--workspace-id <workspace_id>]
+hotdata embedding-providers update <provider_id> [--name <name>] [--config '<json>'] [--provider-api-key <key> | --secret-name <secret_name>]
+hotdata embedding-providers delete <provider_id> [--workspace-id <workspace_id>]
 ```
-- `list` shows indexes on a table with name, type, columns, status, and creation date.
-- `create` creates an index. Use `--type bm25` for full-text search, `--type vector` for vector search (requires `--metric`).
-- `--async` submits index creation as a background job. Use `hotdata jobs <job_id>` to check status.
+- System providers (e.g. `sys_emb_openai`) come pre-configured. `list` shows IDs to pass to `--embedding-provider-id`.
+- `--provider-api-key` takes the embedding service's own key (e.g. an OpenAI `sk-...` key) and auto-creates a managed secret; the flag is named to avoid colliding with the global `--api-key` (Hotdata auth). `--secret-name` references an existing secret instead. The two flags are mutually exclusive.
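+
+For example, registering a service provider that brings its own API key (name and config values are illustrative):
+
+```
+hotdata embedding-providers create --name my-openai --provider-type service \
+    --config '{"model": "text-embedding-3-large", "base_url": "https://api.openai.com/v1"}' \
+    --provider-api-key sk-...
+```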
 
 ### Jobs
 ```
@@ -319,7 +357,7 @@ hotdata jobs list [--workspace-id <workspace_id>] [--job-type <type>] [--status
 hotdata jobs <job_id> [--workspace-id <workspace_id>] [--output table|json|yaml]
 ```
 - `list` shows only active jobs (`pending`, `running`) by default. Use `--all` to see all jobs.
-- `--job-type`: `data_refresh_table`, `data_refresh_connection`, `create_index`.
+- `--job-type`: `data_refresh_table`, `data_refresh_connection`, `dataset_refresh`, `create_index`, `create_dataset_index`.
 - `--status`: `pending`, `running`, `succeeded`, `partially_succeeded`, `failed`.
 - Use `hotdata jobs <job_id>` to inspect a specific job's status, error, and result.
 
diff --git a/src/api.rs b/src/api.rs
index 9e0e81c..4824af9 100644
--- a/src/api.rs
+++ b/src/api.rs
@@ -214,6 +214,13 @@ impl ApiClient {
         self.send(req, Some(body))
     }
 
+    /// DELETE request, exits on connection error, returns raw (status, body).
+    pub fn delete_raw(&self, path: &str) -> (reqwest::StatusCode, String) {
+        let url = format!("{}{path}", self.api_url);
+        let req = self.build_request(reqwest::Method::DELETE, &url);
+        self.send(req, None)
+    }
+
     /// PATCH request with JSON body, returns parsed response.
     pub fn patch<T: DeserializeOwned>(&self, path: &str, body: &serde_json::Value) -> T {
         let url = format!("{}{path}", self.api_url);
@@ -301,6 +308,39 @@ mod tests {
         mock.assert();
     }
 
+    #[test]
+    fn delete_raw_returns_status_and_body() {
+        let mut server = mockito::Server::new();
+        let mock = server
+            .mock("DELETE", "/widgets/abc")
+            .match_header("Authorization", "Bearer test-key")
+            .with_status(204)
+            .with_body("")
+            .create();
+
+        let api = ApiClient::test_new(&server.url(), "test-key", None);
+        let (status, body) = api.delete_raw("/widgets/abc");
+        assert_eq!(status.as_u16(), 204);
+        assert!(body.is_empty());
+        mock.assert();
+    }
+
+    #[test]
+    fn delete_raw_surfaces_error_body_on_4xx() {
+        let mut server = mockito::Server::new();
+        let mock = server
+            .mock("DELETE", "/widgets/missing")
+            .with_status(404)
+            .with_body(r#"{"error":{"message":"not found"}}"#)
+            .create();
+
+        let api = ApiClient::test_new(&server.url(), "test-key", None);
+        let (status, body) = api.delete_raw("/widgets/missing");
+        assert_eq!(status.as_u16(), 404);
+        assert!(body.contains("not found"));
+        mock.assert();
+    }
+
     #[test]
     fn get_none_if_not_found_returns_some_on_200() {
         let mut server = mockito::Server::new();
diff --git a/src/command.rs b/src/command.rs
index fbefeaf..3013c70 100644
--- a/src/command.rs
+++ b/src/command.rs
@@ -125,16 +125,38 @@ pub enum Commands {
         command: IndexesCommands,
     },
 
+    /// Manage embedding providers (OpenAI, local, etc.) used by vector indexes
+    #[command(name = "embedding-providers")]
+    EmbeddingProviders {
+        /// Workspace ID (defaults to first workspace from login)
+        #[arg(long, short = 'w', global = true)]
+        workspace_id: Option<String>,
+
+        #[command(subcommand)]
+        command: EmbeddingProvidersCommands,
+    },
+
     /// Full-text or vector search across a table column
     Search {
-        /// Search query text (omit to read a vector from stdin for vector search)
-        query: Option<String>,
+        /// Search query text — required for both --type bm25 and --type vector
+        query: String,
+
+        /// Search type — required (no default; choose deliberately)
+        ///
+        /// `vector` runs server-side `vector_distance(col, 'text')` — the server resolves the
+        /// embedding column, model, and metric from the index metadata.
+        ///
+        /// `bm25` runs server-side `bm25_search(table, col, 'text')` and requires a BM25 index
+        /// on the column.
+        #[arg(long, value_parser = ["vector", "bm25"])]
+        r#type: String,
 
         /// Table to search (connection.schema.table)
         #[arg(long)]
         table: String,
 
-        /// Column to search
+        /// Column to search. For `--type vector`, name the source text column — the server
+        /// resolves the embedding column from the index metadata.
         #[arg(long)]
         column: String,
 
@@ -146,10 +168,6 @@ pub enum Commands {
         #[arg(long, default_value = "10")]
         limit: u32,
 
-        /// Embedding model to generate a vector from the query text (e.g. text-embedding-3-small)
-        #[arg(long, value_parser = ["text-embedding-3-small", "text-embedding-3-large"])]
-        model: Option<String>,
-
         /// Workspace ID (defaults to first workspace from login)
         #[arg(long, short = 'w')]
         workspace_id: Option<String>,
@@ -248,49 +266,60 @@ pub enum AuthCommands {
 
 #[derive(Subcommand)]
 pub enum IndexesCommands {
-    /// List indexes (defaults to the whole workspace; narrow with filters)
+    /// List indexes (defaults to the whole workspace; narrow with filters or pass --dataset-id)
     List {
         /// Filter by connection ID
-        #[arg(long, short = 'c')]
+        #[arg(long, short = 'c', conflicts_with = "dataset_id")]
         connection_id: Option<String>,
 
         /// Filter by schema name
-        #[arg(long)]
+        #[arg(long, conflicts_with = "dataset_id")]
        schema: Option<String>,
 
         /// Filter by table name
-        #[arg(long)]
+        #[arg(long, conflicts_with = "dataset_id")]
         table: Option<String>,
 
+        /// List indexes for a specific dataset (alternative scope to --connection-id)
+        #[arg(long)]
+        dataset_id: Option<String>,
+
         /// Output format
         #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
         output: String,
     },
 
-    /// Create an index on a table
+    /// Create an index on a table or dataset
+    ///
+    /// Pass either connection scope (--connection-id + --schema + --table) OR
+    /// dataset scope (--dataset-id), not both.
     Create {
-        /// Connection ID
-        #[arg(long, short = 'c')]
-        connection_id: String,
+        /// Connection ID (use with --schema and --table)
+        #[arg(long, short = 'c', conflicts_with = "dataset_id", requires_all = ["schema", "table"])]
+        connection_id: Option<String>,
 
-        /// Schema name
-        #[arg(long)]
-        schema: String,
+        /// Schema name (requires --connection-id)
+        #[arg(long, requires = "connection_id")]
+        schema: Option<String>,
 
-        /// Table name
-        #[arg(long)]
-        table: String,
+        /// Table name (requires --connection-id)
+        #[arg(long, requires = "connection_id")]
+        table: Option<String>,
+
+        /// Dataset ID (alternative scope to --connection-id)
+        #[arg(long, conflicts_with_all = ["connection_id", "schema", "table"])]
+        dataset_id: Option<String>,
 
         /// Index name
         #[arg(long)]
         name: String,
 
-        /// Columns to index (comma-separated)
+        /// Columns to index (comma-separated). Vector indexes accept exactly one column.
         #[arg(long)]
         columns: String,
 
-        /// Index type
-        #[arg(long, default_value = "sorted", value_parser = ["sorted", "bm25", "vector"])]
+        /// Index type — required (no default; choose deliberately)
+        #[arg(long, value_parser = ["sorted", "bm25", "vector"])]
         r#type: String,
 
         /// Distance metric for vector indexes
@@ -300,6 +329,49 @@ pub enum IndexesCommands {
         /// Create as a background job
         #[arg(long)]
         r#async: bool,
+
+        /// Embedding provider ID — when set on a vector index over a text column,
+        /// embeddings are generated automatically. Defaults to first system provider if omitted.
+        #[arg(long = "embedding-provider-id")]
+        embedding_provider_id: Option<String>,
+
+        /// Override embedding output dimensions (vector indexes with auto-embedding only)
+        #[arg(long)]
+        dimensions: Option<u32>,
+
+        /// Custom name for the generated embedding column (defaults to `{column}_embedding`)
+        #[arg(long = "output-column")]
+        output_column: Option<String>,
+
+        /// Human-readable description of the embedding (e.g. "product titles")
+        #[arg(long)]
+        description: Option<String>,
+    },
+
+    /// Delete an index from a table or dataset
+    ///
+    /// Pass either connection scope (--connection-id + --schema + --table) OR
+    /// dataset scope (--dataset-id), not both.
+    Delete {
+        /// Connection ID (use with --schema and --table)
+        #[arg(long, short = 'c', conflicts_with = "dataset_id", requires_all = ["schema", "table"])]
+        connection_id: Option<String>,
+
+        /// Schema name (requires --connection-id)
+        #[arg(long, requires = "connection_id")]
+        schema: Option<String>,
+
+        /// Table name (requires --connection-id)
+        #[arg(long, requires = "connection_id")]
+        table: Option<String>,
+
+        /// Dataset ID (alternative scope to --connection-id)
+        #[arg(long, conflicts_with_all = ["connection_id", "schema", "table"])]
+        dataset_id: Option<String>,
+
+        /// Index name
+        #[arg(long)]
+        name: String,
+    },
 }
 
@@ -308,7 +380,7 @@ pub enum JobsCommands {
     /// List background jobs (shows active jobs by default)
     List {
         /// Filter by job type
-        #[arg(long, value_parser = ["data_refresh_table", "data_refresh_connection", "create_index"])]
+        #[arg(long, value_parser = ["data_refresh_table", "data_refresh_connection", "dataset_refresh", "create_index", "create_dataset_index"])]
         job_type: Option<String>,
 
         /// Filter by status
@@ -402,6 +474,16 @@ pub enum DatasetsCommands {
         #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
         output: String,
     },
+
+    /// Refresh a dataset by re-running its source (URL fetch or saved query) and creating a new version
+    Refresh {
+        /// Dataset ID
+        id: String,
+
+        /// Submit as a background job
+        #[arg(long)]
+        r#async: bool,
+    },
 }
 
 #[derive(Subcommand)]
@@ -467,10 +549,30 @@ pub enum ConnectionsCommands {
         output: String,
     },
 
-    /// Refresh a connection's schema
+    /// Refresh a connection's schema or data
     Refresh {
         /// Connection ID
         connection_id: String,
+
+        /// Refresh data instead of schema metadata
+        #[arg(long)]
+        data: bool,
+
+        /// Narrow refresh to a specific schema (requires --table for data refresh)
+        #[arg(long)]
+        schema: Option<String>,
+
+        /// Narrow refresh to a specific table (requires --schema)
+        #[arg(long)]
+        table: Option<String>,
+
+        /// Submit as a background job (only valid with --data)
+        #[arg(long)]
+        r#async: bool,
+
+        /// Include uncached tables in connection-wide data refresh (only with --data, no --table)
+        #[arg(long = "include-uncached")]
+        include_uncached: bool,
     },
 }
 
@@ -664,3 +766,86 @@ pub enum TablesCommands {
         output: String,
     },
 }
+
+#[derive(Subcommand)]
+pub enum EmbeddingProvidersCommands {
+    /// List embedding providers
+    List {
+        /// Output format
+        #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
+        output: String,
+    },
+
+    /// Show details for a specific embedding provider
+    Get {
+        /// Provider ID
+        id: String,
+
+        /// Output format
+        #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
+        output: String,
+    },
+
+    /// Create a new embedding provider
+    Create {
+        /// Provider name (must be unique within the workspace)
+        #[arg(long)]
+        name: String,
+
+        /// Provider type ("local" or "service")
+        #[arg(long, value_parser = ["local", "service"])]
+        provider_type: String,
+
+        /// Provider-specific config as a JSON string (model, base_url, dimensions, etc.)
+        #[arg(long)]
+        config: Option<String>,
+
+        /// The provider's own API key (e.g. an OpenAI sk-... key). Auto-creates a
+        /// managed secret. Mutually exclusive with --secret-name. Named
+        /// `--provider-api-key` to pair with `--provider-type` and to avoid colliding
+        /// with the global `--api-key` (Hotdata auth) flag.
+        #[arg(long = "provider-api-key", conflicts_with = "secret_name")]
+        provider_api_key: Option<String>,
+
+        /// Reference an existing secret by name (for service providers)
+        #[arg(long)]
+        secret_name: Option<String>,
+
+        /// Output format
+        #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
+        output: String,
+    },
+
+    /// Update an embedding provider's name, config, or secret
+    Update {
+        /// Provider ID
+        id: String,
+
+        /// New name
+        #[arg(long)]
+        name: Option<String>,
+
+        /// New config as a JSON string
+        #[arg(long)]
+        config: Option<String>,
+
+        /// New provider API key (replaces or creates the managed secret).
+        /// See `embedding-providers create --provider-api-key` for naming rationale.
+        #[arg(long = "provider-api-key", conflicts_with = "secret_name")]
+        provider_api_key: Option<String>,
+
+        /// New secret name to reference
+        #[arg(long)]
+        secret_name: Option<String>,
+
+        /// Output format
+        #[arg(long = "output", short = 'o', default_value = "table", value_parser = ["table", "json", "yaml"])]
+        output: String,
+    },
+
+    /// Delete an embedding provider
+    Delete {
+        /// Provider ID
+        id: String,
+    },
+}
diff --git a/src/connections.rs b/src/connections.rs
index 9909cf0..fcf011e 100644
--- a/src/connections.rs
+++ b/src/connections.rs
@@ -344,21 +344,122 @@ pub fn list(workspace_id: &str, format: &str) {
     }
 }
 
-pub fn refresh(workspace_id: &str, connection_id: &str) {
-    let body = serde_json::json!({
+pub fn refresh(
+    workspace_id: &str,
+    connection_id: &str,
+    data: bool,
+    schema: Option<&str>,
+    table: Option<&str>,
+    async_mode: bool,
+    include_uncached: bool,
+) {
+    use crossterm::style::Stylize;
+
+    if async_mode && !data {
+        eprintln!(
+            "{}",
+            "--async only valid with --data (schema refresh is always synchronous)".red()
+        );
+        std::process::exit(1);
+    }
+    if include_uncached && !data {
+        eprintln!("{}", "--include-uncached only valid with --data".red());
+        std::process::exit(1);
+    }
+    if include_uncached && table.is_some() {
+        eprintln!(
+            "{}",
+            "--include-uncached cannot be combined with --table (it only applies to connection-wide refresh)".red()
+        );
+        std::process::exit(1);
+    }
+    if table.is_some() && schema.is_none() {
+        eprintln!("{}", "--table requires --schema".red());
+        std::process::exit(1);
+    }
+    if data && schema.is_some() && table.is_none() {
+        eprintln!(
+            "{}",
+            "--schema requires --table for data refresh (no schema-scoped data refresh)".red()
+        );
+        std::process::exit(1);
+    }
+
+    let mut body = serde_json::json!({
         "connection_id": connection_id,
-        "data": false,
+        "data": data,
     });
+    if let Some(s) = schema {
+        body["schema_name"] = serde_json::Value::String(s.to_string());
+    }
+    if let Some(t) = table {
+        body["table_name"] = serde_json::Value::String(t.to_string());
+    }
+    if async_mode {
+        body["async"] = serde_json::Value::Bool(true);
+    }
+    if include_uncached {
+        body["include_uncached"] = serde_json::Value::Bool(true);
+    }
 
     let api = ApiClient::new(Some(workspace_id));
     let (status, resp_body) = api.post_raw("/refresh", &body);
 
     if !status.is_success() {
-        use crossterm::style::Stylize;
         eprintln!("{}", crate::util::api_error(resp_body).red());
         std::process::exit(1);
     }
 
-    use crossterm::style::Stylize;
-    println!("{}", "Schema refresh completed.".green());
+    let parsed: serde_json::Value = serde_json::from_str(&resp_body).unwrap_or_default();
+
+    if async_mode {
+        let job_id = parsed["id"].as_str().unwrap_or("unknown");
+        println!("{}", "Data refresh submitted.".green());
+        println!("job_id: {}", job_id);
+        println!(
+            "{}",
+            format!("Use 
'hotdata jobs {}' to check status.", job_id).dark_grey() + ); + return; + } + + if !data { + let discovered = parsed["tables_discovered"].as_u64().unwrap_or(0); + let added = parsed["tables_added"].as_u64().unwrap_or(0); + let modified = parsed["tables_modified"].as_u64().unwrap_or(0); + println!("{}", "Schema refresh completed.".green()); + println!( + "{}", + format!(" tables: {discovered} discovered, {added} added, {modified} modified") + .dark_grey() + ); + return; + } + + if let Some(rows) = parsed["rows_synced"].as_u64() { + let dur = parsed["duration_ms"].as_u64().unwrap_or(0); + println!("{}", "Data refresh completed.".green()); + println!("{}", format!(" {rows} rows synced ({dur} ms)").dark_grey()); + } else { + let refreshed = parsed["tables_refreshed"].as_u64().unwrap_or(0); + let failed = parsed["tables_failed"].as_u64().unwrap_or(0); + let total = parsed["total_rows"].as_u64().unwrap_or(0); + let dur = parsed["duration_ms"].as_u64().unwrap_or(0); + println!("{}", "Data refresh completed.".green()); + println!( + "{}", + format!( + " {refreshed} tables refreshed, {failed} failed, {total} total rows ({dur} ms)" + ) + .dark_grey() + ); + if let Some(errors) = parsed["errors"].as_array() { + if !errors.is_empty() { + eprintln!("{}", format!(" {} error(s):", errors.len()).yellow()); + for err in errors { + eprintln!(" {}", err); + } + } + } + } } diff --git a/src/datasets.rs b/src/datasets.rs index 0dff510..bab4604 100644 --- a/src/datasets.rs +++ b/src/datasets.rs @@ -553,6 +553,47 @@ pub fn update( } } +pub fn refresh(workspace_id: &str, dataset_id: &str, async_mode: bool) { + use crossterm::style::Stylize; + + let mut body = json!({ + "dataset_id": dataset_id, + }); + if async_mode { + body["async"] = json!(true); + } + + let api = ApiClient::new(Some(workspace_id)); + let (status, resp_body) = api.post_raw("/refresh", &body); + + if !status.is_success() { + eprintln!("{}", crate::util::api_error(resp_body).red()); + std::process::exit(1); + } + + let parsed: serde_json::Value = serde_json::from_str(&resp_body).unwrap_or_default(); + + if async_mode { + let job_id = parsed["id"].as_str().unwrap_or("unknown"); + println!("{}", "Dataset refresh submitted.".green()); + println!("job_id: {}", job_id); + println!( + "{}", + format!("Use 'hotdata jobs {}' to check status.", job_id).dark_grey() + ); + return; + } + + let id = parsed["id"].as_str().unwrap_or("unknown"); + let version = parsed["version"].as_i64().unwrap_or(0); + let dataset_status = parsed["status"].as_str().unwrap_or(""); + println!("{}", "Dataset refresh completed.".green()); + println!( + "{}", + format!(" id: {id}, version: {version}, status: {dataset_status}").dark_grey() + ); +} + #[cfg(test)] mod tests { use super::*; diff --git a/src/embedding.rs b/src/embedding.rs deleted file mode 100644 index 29b362e..0000000 --- a/src/embedding.rs +++ /dev/null @@ -1,123 +0,0 @@ -use serde_json::Value; - -/// Try to parse a vector from stdin. Accepts either: -/// - A raw JSON array of numbers: [0.1, -0.2, ...] 
-/// - An OpenAI-compatible response: {"data": [{"embedding": [...]}]}
-pub fn read_vector_from_stdin() -> Vec<f64> {
-    use std::io::Read;
-    let mut input = String::new();
-    std::io::stdin()
-        .read_to_string(&mut input)
-        .unwrap_or_else(|e| {
-            eprintln!("error reading stdin: {e}");
-            std::process::exit(1);
-        });
-
-    let input = input.trim();
-    if input.is_empty() {
-        eprintln!("error: no vector provided on stdin");
-        std::process::exit(1);
-    }
-
-    let parsed: Value = match serde_json::from_str(input) {
-        Ok(v) => v,
-        Err(e) => {
-            eprintln!("error parsing vector from stdin: {e}");
-            std::process::exit(1);
-        }
-    };
-
-    extract_vector(&parsed)
-}
-
-/// Extract a float vector from either a raw JSON array or an OpenAI embedding response.
-fn extract_vector(value: &Value) -> Vec<f64> {
-    // Raw array: [0.1, -0.2, ...]
-    if let Some(arr) = value.as_array() {
-        return parse_float_array(arr);
-    }
-
-    // OpenAI response: {"data": [{"embedding": [...]}]}
-    if let Some(embedding) = value
-        .get("data")
-        .and_then(|d| d.get(0))
-        .and_then(|d| d.get("embedding"))
-        .and_then(|e| e.as_array())
-    {
-        return parse_float_array(embedding);
-    }
-
-    eprintln!("error: stdin must be a JSON array of numbers or an OpenAI embedding response");
-    std::process::exit(1);
-}
-
-fn parse_float_array(arr: &[Value]) -> Vec<f64> {
-    arr.iter()
-        .enumerate()
-        .map(|(i, v)| {
-            v.as_f64().unwrap_or_else(|| {
-                eprintln!("error: vector element {i} is not a number: {v}");
-                std::process::exit(1);
-            })
-        })
-        .collect()
-}
-
-/// Call the OpenAI embeddings API to generate a vector from text.
-pub fn openai_embed(text: &str, model: &str) -> Vec<f64> {
-    let api_key = match std::env::var("OPENAI_API_KEY") {
-        Ok(k) if !k.is_empty() => k,
-        _ => {
-            eprintln!("error: OPENAI_API_KEY environment variable is not set");
-            std::process::exit(1);
-        }
-    };
-
-    let body = serde_json::json!({
-        "input": text,
-        "model": model,
-    });
-
-    let client = reqwest::blocking::Client::new();
-    let req = client
-        .post("https://api.openai.com/v1/embeddings")
-        .header("Authorization", format!("Bearer {api_key}"))
-        .json(&body);
-    let (status, resp_body) = match crate::util::send_debug(&client, req, Some(&body)) {
-        Ok(pair) => pair,
-        Err(e) => {
-            eprintln!("error connecting to OpenAI API: {e}");
-            std::process::exit(1);
-        }
-    };
-
-    if !status.is_success() {
-        let message = serde_json::from_str::<Value>(&resp_body)
-            .ok()
-            .and_then(|v| v["error"]["message"].as_str().map(str::to_string))
-            .unwrap_or(resp_body);
-        eprintln!("error from OpenAI API ({status}): {message}");
-        std::process::exit(1);
-    }
-
-    let parsed: Value = match serde_json::from_str(&resp_body) {
-        Ok(v) => v,
-        Err(e) => {
-            eprintln!("error parsing OpenAI response: {e}");
-            std::process::exit(1);
-        }
-    };
-
-    extract_vector(&parsed)
-}
-
-/// Format a vector as a SQL ARRAY literal: ARRAY[0.1,-0.2,...]
-pub fn vector_to_sql(vec: &[f64]) -> String {
-    format!(
-        "ARRAY[{}]",
-        vec.iter()
-            .map(|v| v.to_string())
-            .collect::<Vec<String>>()
-            .join(",")
-    )
-}
diff --git a/src/embedding_providers.rs b/src/embedding_providers.rs
new file mode 100644
index 0000000..335a741
--- /dev/null
+++ b/src/embedding_providers.rs
@@ -0,0 +1,264 @@
+use crate::api::ApiClient;
+use serde::{Deserialize, Serialize};
+
+#[derive(Deserialize, Serialize)]
+struct Provider {
+    id: String,
+    name: String,
+    provider_type: String,
+    config: serde_json::Value,
+    has_secret: bool,
+    source: String,
+    created_at: String,
+    updated_at: String,
+}
+
+#[derive(Deserialize)]
+struct ListResponse {
+    embedding_providers: Vec<Provider>,
+}
+
+fn parse_config(raw: Option<&str>) -> Option<serde_json::Value> {
+    use crossterm::style::Stylize;
+    raw.map(|s| match serde_json::from_str::<serde_json::Value>(s) {
+        Ok(v) => v,
+        Err(e) => {
+            eprintln!("{}", format!("--config is not valid JSON: {e}").red());
+            std::process::exit(1);
+        }
+    })
+}
+
+pub fn list(workspace_id: &str, format: &str) {
+    let api = ApiClient::new(Some(workspace_id));
+    let body: ListResponse = api.get("/embedding-providers");
+
+    use crossterm::style::Stylize;
+    match format {
+        "json" => println!(
+            "{}",
+            serde_json::to_string_pretty(&body.embedding_providers).unwrap()
+        ),
+        "yaml" => print!(
+            "{}",
+            serde_yaml::to_string(&body.embedding_providers).unwrap()
+        ),
+        "table" => {
+            if body.embedding_providers.is_empty() {
+                eprintln!("{}", "No embedding providers found.".dark_grey());
+                return;
+            }
+            let rows: Vec<Vec<String>> = body
+                .embedding_providers
+                .iter()
+                .map(|p| {
+                    vec![
+                        p.id.clone(),
+                        p.name.clone(),
+                        p.provider_type.clone(),
+                        p.source.clone(),
+                        if p.has_secret { "yes" } else { "no" }.to_string(),
+                    ]
+                })
+                .collect();
+            crate::table::print(&["ID", "NAME", "TYPE", "SOURCE", "SECRET"], &rows);
+        }
+        _ => unreachable!(),
+    }
+}
+
+pub fn get(workspace_id: &str, id: &str, format: &str) {
+    let api = ApiClient::new(Some(workspace_id));
+    let p: Provider = api.get(&format!("/embedding-providers/{id}"));
+
+    match format {
+        "json" => println!("{}", serde_json::to_string_pretty(&p).unwrap()),
+        "yaml" => print!("{}", serde_yaml::to_string(&p).unwrap()),
+        "table" => {
+            println!("id: {}", p.id);
+            println!("name: {}", p.name);
+            println!("type: {}", p.provider_type);
+            println!("source: {}", p.source);
+            println!("has_secret: {}", p.has_secret);
+            println!("created_at: {}", crate::util::format_date(&p.created_at));
+            println!("updated_at: {}", crate::util::format_date(&p.updated_at));
+            println!(
+                "config: {}",
+                serde_json::to_string_pretty(&p.config).unwrap_or_default()
+            );
+        }
+        _ => unreachable!(),
+    }
+}
+
+#[allow(clippy::too_many_arguments)]
+pub fn create(
+    workspace_id: &str,
+    name: &str,
+    provider_type: &str,
+    config: Option<&str>,
+    api_key: Option<&str>,
+    secret_name: Option<&str>,
+    format: &str,
+) {
+    use crossterm::style::Stylize;
+
+    let api = ApiClient::new(Some(workspace_id));
+    let mut body = serde_json::json!({
+        "name": name,
+        "provider_type": provider_type,
+    });
+    if let Some(cfg) = parse_config(config) {
+        body["config"] = cfg;
+    }
+    if let Some(k) = api_key {
+        body["api_key"] = serde_json::json!(k);
+    }
+    if let Some(s) = secret_name {
+        body["secret_name"] = serde_json::json!(s);
+    }
+
+    let (status, resp_body) = api.post_raw("/embedding-providers", &body);
+    if !status.is_success() {
+        eprintln!("{}", crate::util::api_error(resp_body).red());
+        std::process::exit(1);
+    }
+
+    let parsed: serde_json::Value = serde_json::from_str(&resp_body).unwrap_or_default();
+    
eprintln!("{}", "Embedding provider created.".green()); + match format { + "json" => println!("{}", serde_json::to_string_pretty(&parsed).unwrap()), + "yaml" => print!("{}", serde_yaml::to_string(&parsed).unwrap()), + "table" => { + println!("id: {}", parsed["id"].as_str().unwrap_or("")); + println!("name: {}", parsed["name"].as_str().unwrap_or("")); + println!( + "type: {}", + parsed["provider_type"].as_str().unwrap_or("") + ); + } + _ => unreachable!(), + } +} + +pub fn update( + workspace_id: &str, + id: &str, + name: Option<&str>, + config: Option<&str>, + api_key: Option<&str>, + secret_name: Option<&str>, + format: &str, +) { + use crossterm::style::Stylize; + + if name.is_none() && config.is_none() && api_key.is_none() && secret_name.is_none() { + eprintln!( + "{}", + "error: provide at least one of --name, --config, --provider-api-key, --secret-name.".red() + ); + std::process::exit(1); + } + + let api = ApiClient::new(Some(workspace_id)); + let mut body = serde_json::json!({}); + if let Some(n) = name { + body["name"] = serde_json::json!(n); + } + if let Some(cfg) = parse_config(config) { + body["config"] = cfg; + } + if let Some(k) = api_key { + body["api_key"] = serde_json::json!(k); + } + if let Some(s) = secret_name { + body["secret_name"] = serde_json::json!(s); + } + + let resp: serde_json::Value = api.put(&format!("/embedding-providers/{id}"), &body); + eprintln!("{}", "Embedding provider updated.".green()); + match format { + "json" => println!("{}", serde_json::to_string_pretty(&resp).unwrap()), + "yaml" => print!("{}", serde_yaml::to_string(&resp).unwrap()), + "table" => { + println!("id: {}", resp["id"].as_str().unwrap_or("")); + println!("name: {}", resp["name"].as_str().unwrap_or("")); + if let Some(updated_at) = resp["updated_at"].as_str() { + println!("updated_at: {}", crate::util::format_date(updated_at)); + } + } + _ => unreachable!(), + } +} + +pub fn delete(workspace_id: &str, id: &str) { + use crossterm::style::Stylize; + let api = ApiClient::new(Some(workspace_id)); + let (status, resp_body) = api.delete_raw(&format!("/embedding-providers/{id}")); + if !status.is_success() { + eprintln!("{}", crate::util::api_error(resp_body).red()); + std::process::exit(1); + } + println!("{}", format!("Embedding provider '{id}' deleted.").green()); +} + +#[cfg(test)] +mod tests { + use super::*; + + /// Mirrors runtimedb's `EmbeddingProviderResponse` (see `runtimedb/openapi.yaml`). + /// If the server response shape changes, update this fixture in lockstep. + #[test] + fn provider_deserializes_runtimedb_payload() { + let body = serde_json::json!({ + "id": "sys_emb_openai", + "name": "openai", + "provider_type": "service", + "config": { + "base_url": "https://api.openai.com/v1", + "metric": "cosine", + "model": "text-embedding-3-small" + }, + "has_secret": true, + "source": "system", + "created_at": "2026-04-29T08:19:57.083658085Z", + "updated_at": "2026-04-29T08:19:57.083658085Z" + }); + let p: Provider = serde_json::from_value(body).unwrap(); + assert_eq!(p.id, "sys_emb_openai"); + assert_eq!(p.provider_type, "service"); + assert_eq!(p.source, "system"); + assert!(p.has_secret); + assert_eq!(p.config["model"], "text-embedding-3-small"); + } + + /// Mirrors runtimedb's `ListEmbeddingProvidersResponse`. 
+    #[test]
+    fn list_response_deserializes_runtimedb_payload() {
+        let body = serde_json::json!({
+            "embedding_providers": [
+                {
+                    "id": "sys_emb_openai",
+                    "name": "openai",
+                    "provider_type": "service",
+                    "config": {},
+                    "has_secret": true,
+                    "source": "system",
+                    "created_at": "2026-04-29T08:19:57Z",
+                    "updated_at": "2026-04-29T08:19:57Z"
+                }
+            ]
+        });
+        let resp: ListResponse = serde_json::from_value(body).unwrap();
+        assert_eq!(resp.embedding_providers.len(), 1);
+        assert_eq!(resp.embedding_providers[0].name, "openai");
+    }
+
+    #[test]
+    fn parse_config_accepts_valid_json() {
+        // parse_config exits the process on invalid JSON, so only the success path
+        // is verified here.
+        let parsed = parse_config(Some(r#"{"key":"value"}"#));
+        assert_eq!(parsed.unwrap()["key"], "value");
+        assert!(parse_config(None).is_none());
+    }
+}
diff --git a/src/indexes.rs b/src/indexes.rs
index 8629311..b9473fa 100644
--- a/src/indexes.rs
+++ b/src/indexes.rs
@@ -128,6 +128,12 @@ fn list_one_table(api: &ApiClient, connection_id: &str, schema: &str, table: &st
     body.indexes
 }
 
+fn list_one_dataset(api: &ApiClient, dataset_id: &str) -> Vec<Index> {
+    let path = format!("/datasets/{dataset_id}/indexes");
+    let body: ListResponse = api.get(&path);
+    body.indexes
+}
+
 fn list_one_table_scan(
     api: &ApiClient,
     connection_id: &str,
@@ -146,12 +152,24 @@ pub fn list(
     connection_id: Option<&str>,
     schema: Option<&str>,
     table: Option<&str>,
+    dataset_id: Option<&str>,
     format: &str,
 ) {
     let api = ApiClient::new(Some(workspace_id));
 
-    let (rows, multi_table) = match (connection_id, schema, table) {
-        (Some(cid), Some(sch), Some(tbl)) => {
+    let (rows, multi_table) = match (dataset_id, connection_id, schema, table) {
+        (Some(did), _, _, _) => {
+            let indexes = list_one_dataset(&api, did);
+            let rows: Vec<IndexRow> = indexes
+                .into_iter()
+                .map(|i| IndexRow {
+                    inner: i,
+                    table: None,
+                })
+                .collect();
+            (rows, false)
+        }
+        (None, Some(cid), Some(sch), Some(tbl)) => {
             let indexes = list_one_table(&api, cid, sch, tbl);
             let rows: Vec<IndexRow> = indexes
                 .into_iter()
                 .map(|i| IndexRow {
@@ -243,21 +261,85 @@ pub fn list(
     }
 }
 
+/// Where an index is being created or deleted.
+pub enum IndexScope<'a> {
+    Connection {
+        connection_id: &'a str,
+        schema: &'a str,
+        table: &'a str,
+    },
+    Dataset {
+        dataset_id: &'a str,
+    },
+}
+
+impl IndexScope<'_> {
+    fn create_path(&self) -> String {
+        match self {
+            IndexScope::Connection {
+                connection_id,
+                schema,
+                table,
+            } => format!("/connections/{connection_id}/tables/{schema}/{table}/indexes"),
+            IndexScope::Dataset { dataset_id } => format!("/datasets/{dataset_id}/indexes"),
+        }
+    }
+
+    fn delete_path(&self, index_name: &str) -> String {
+        match self {
+            IndexScope::Connection {
+                connection_id,
+                schema,
+                table,
+            } => format!(
+                "/connections/{connection_id}/tables/{schema}/{table}/indexes/{index_name}"
+            ),
+            IndexScope::Dataset { dataset_id } => {
+                format!("/datasets/{dataset_id}/indexes/{index_name}")
+            }
+        }
+    }
+}
+
 #[allow(clippy::too_many_arguments)]
 pub fn create(
     workspace_id: &str,
-    connection_id: &str,
-    schema: &str,
-    table: &str,
+    scope: IndexScope<'_>,
     name: &str,
     columns: &str,
     index_type: &str,
     metric: Option<&str>,
     async_mode: bool,
+    embedding_provider_id: Option<&str>,
+    dimensions: Option<u32>,
+    output_column: Option<&str>,
+    description: Option<&str>,
 ) {
-    let api = ApiClient::new(Some(workspace_id));
+    use crossterm::style::Stylize;
 
     let cols: Vec<&str> = columns.split(',').map(str::trim).collect();
+
+    let auto_embed_set = embedding_provider_id.is_some()
+        || dimensions.is_some()
+        || output_column.is_some()
+        || description.is_some();
+    if auto_embed_set && index_type != "vector" {
+        eprintln!(
+            "{}",
+            "--embedding-provider-id, --dimensions, --output-column, and --description are only valid with --type vector".red()
+        );
+        std::process::exit(1);
+    }
+    if index_type == "vector" && cols.len() != 1 {
+        eprintln!(
+            "{}",
+            "--type vector requires exactly one column in --columns".red()
+        );
+        std::process::exit(1);
+    }
+
+    let api = ApiClient::new(Some(workspace_id));
+
     let mut body = serde_json::json!({
         "index_name": name,
         "columns": cols,
@@ -267,31 +349,54 @@ pub fn create(
     if let Some(m) = metric {
         body["metric"] = serde_json::json!(m);
     }
+    if let Some(p) = embedding_provider_id {
+        body["embedding_provider_id"] = serde_json::json!(p);
+    }
+    if let Some(d) = dimensions {
+        body["dimensions"] = serde_json::json!(d);
+    }
+    if let Some(o) = output_column {
+        body["output_column"] = serde_json::json!(o);
+    }
+    if let Some(d) = description {
+        body["description"] = serde_json::json!(d);
+    }
 
-    let path = format!("/connections/{connection_id}/tables/{schema}/{table}/indexes");
-    let (status, resp_body) = api.post_raw(&path, &body);
+    let (status, resp_body) = api.post_raw(&scope.create_path(), &body);
 
     if !status.is_success() {
-        use crossterm::style::Stylize;
         eprintln!("{}", crate::util::api_error(resp_body).red());
         std::process::exit(1);
     }
 
-    use crossterm::style::Stylize;
     if async_mode {
         let parsed: serde_json::Value = serde_json::from_str(&resp_body).unwrap_or_default();
-        let job_id = parsed["job_id"].as_str().unwrap_or("unknown");
+        let job_id = parsed["id"].as_str().unwrap_or("unknown");
         println!("{}", "Index creation submitted.".green());
         println!("job_id: {}", job_id);
         println!(
             "{}",
-            "Use 'hotdata jobs <job_id>' to check status.".dark_grey()
+            format!("Use 'hotdata jobs {}' to check status.", job_id).dark_grey()
         );
     } else {
         println!("{}", "Index created.".green());
     }
 }
 
+pub fn delete(workspace_id: &str, scope: IndexScope<'_>, index_name: &str) {
+    use crossterm::style::Stylize;
+
+    let api = ApiClient::new(Some(workspace_id));
+    let (status, resp_body) = 
api.delete_raw(&scope.delete_path(index_name)); + + if !status.is_success() { + eprintln!("{}", crate::util::api_error(resp_body).red()); + std::process::exit(1); + } + + println!("{}", format!("Index '{}' deleted.", index_name).green()); +} + #[cfg(test)] mod tests { use super::*; @@ -304,6 +409,35 @@ mod tests { )); } + #[test] + fn index_scope_connection_paths() { + let scope = IndexScope::Connection { + connection_id: "conn1", + schema: "public", + table: "users", + }; + assert_eq!( + scope.create_path(), + "/connections/conn1/tables/public/users/indexes" + ); + assert_eq!( + scope.delete_path("idx_email"), + "/connections/conn1/tables/public/users/indexes/idx_email" + ); + } + + #[test] + fn index_scope_dataset_paths() { + let scope = IndexScope::Dataset { + dataset_id: "data_xyz", + }; + assert_eq!(scope.create_path(), "/datasets/data_xyz/indexes"); + assert_eq!( + scope.delete_path("idx_title"), + "/datasets/data_xyz/indexes/idx_title" + ); + } + #[test] fn information_schema_followup_breaks_when_more_but_no_cursor() { assert!(matches!( diff --git a/src/main.rs b/src/main.rs index d213fdc..f78d329 100644 --- a/src/main.rs +++ b/src/main.rs @@ -6,7 +6,7 @@ mod connections; mod connections_new; mod context; mod datasets; -mod embedding; +mod embedding_providers; mod indexes; mod jobs; mod jwt; @@ -24,8 +24,9 @@ use anstyle::AnsiColor; use clap::{Parser, builder::Styles}; use command::{ AuthCommands, Commands, ConnectionsCommands, ConnectionsCreateCommands, ContextCommands, - DatasetsCommands, IndexesCommands, JobsCommands, QueriesCommands, QueryCommands, - ResultsCommands, SandboxCommands, SkillCommands, TablesCommands, WorkspaceCommands, + DatasetsCommands, EmbeddingProvidersCommands, IndexesCommands, JobsCommands, QueriesCommands, + QueryCommands, ResultsCommands, SandboxCommands, SkillCommands, TablesCommands, + WorkspaceCommands, }; #[derive(Parser)] @@ -173,6 +174,9 @@ fn main() { table_name.as_deref(), &output, ), + Some(DatasetsCommands::Refresh { id, r#async }) => { + datasets::refresh(&workspace_id, &id, r#async) + } None => { use clap::CommandFactory; let mut cmd = Cli::command(); @@ -270,9 +274,22 @@ fn main() { ) } }, - Some(ConnectionsCommands::Refresh { connection_id }) => { - connections::refresh(&workspace_id, &connection_id) - } + Some(ConnectionsCommands::Refresh { + connection_id, + data, + schema, + table, + r#async, + include_uncached, + }) => connections::refresh( + &workspace_id, + &connection_id, + data, + schema.as_deref(), + table.as_deref(), + r#async, + include_uncached, + ), None => { use clap::CommandFactory; let mut cmd = Cli::command(); @@ -393,92 +410,186 @@ fn main() { connection_id, schema, table, + dataset_id, output, } => indexes::list( &workspace_id, connection_id.as_deref(), schema.as_deref(), table.as_deref(), + dataset_id.as_deref(), &output, ), IndexesCommands::Create { connection_id, schema, table, + dataset_id, name, columns, r#type, metric, r#async, - } => indexes::create( + embedding_provider_id, + dimensions, + output_column, + description, + } => { + let scope = match ( + dataset_id.as_deref(), + connection_id.as_deref(), + schema.as_deref(), + table.as_deref(), + ) { + (Some(did), _, _, _) => indexes::IndexScope::Dataset { dataset_id: did }, + (None, Some(cid), Some(sch), Some(tbl)) => { + indexes::IndexScope::Connection { + connection_id: cid, + schema: sch, + table: tbl, + } + } + _ => { + eprintln!( + "error: provide either --dataset-id or all three of --connection-id, --schema, --table" + ); + std::process::exit(1); + } + }; + 
indexes::create( + &workspace_id, + scope, + &name, + &columns, + &r#type, + metric.as_deref(), + r#async, + embedding_provider_id.as_deref(), + dimensions, + output_column.as_deref(), + description.as_deref(), + ) + } + IndexesCommands::Delete { + connection_id, + schema, + table, + dataset_id, + name, + } => { + let scope = match ( + dataset_id.as_deref(), + connection_id.as_deref(), + schema.as_deref(), + table.as_deref(), + ) { + (Some(did), _, _, _) => indexes::IndexScope::Dataset { dataset_id: did }, + (None, Some(cid), Some(sch), Some(tbl)) => { + indexes::IndexScope::Connection { + connection_id: cid, + schema: sch, + table: tbl, + } + } + _ => { + eprintln!( + "error: provide either --dataset-id or all three of --connection-id, --schema, --table" + ); + std::process::exit(1); + } + }; + indexes::delete(&workspace_id, scope, &name); + } + } + } + Commands::EmbeddingProviders { + workspace_id, + command, + } => { + let workspace_id = resolve_workspace(workspace_id); + match command { + EmbeddingProvidersCommands::List { output } => { + embedding_providers::list(&workspace_id, &output) + } + EmbeddingProvidersCommands::Get { id, output } => { + embedding_providers::get(&workspace_id, &id, &output) + } + EmbeddingProvidersCommands::Create { + name, + provider_type, + config, + provider_api_key, + secret_name, + output, + } => embedding_providers::create( &workspace_id, - &connection_id, - &schema, - &table, &name, - &columns, - &r#type, - metric.as_deref(), - r#async, + &provider_type, + config.as_deref(), + provider_api_key.as_deref(), + secret_name.as_deref(), + &output, + ), + EmbeddingProvidersCommands::Update { + id, + name, + config, + provider_api_key, + secret_name, + output, + } => embedding_providers::update( + &workspace_id, + &id, + name.as_deref(), + config.as_deref(), + provider_api_key.as_deref(), + secret_name.as_deref(), + &output, ), + EmbeddingProvidersCommands::Delete { id } => { + embedding_providers::delete(&workspace_id, &id) + } } } Commands::Search { query, + r#type, table, column, select, limit, - model, workspace_id, output, } => { let workspace_id = resolve_workspace(workspace_id); let select_cols = select.as_deref().unwrap_or("*"); - // Determine search mode: - // 1. --model flag: embed the query text via the model provider - // 2. No query + piped stdin: read vector from stdin - // 3. 
Query text without --model: BM25 text search - let sql = if let Some(ref model_name) = model { - let query_text = match query { - Some(ref q) => q.as_str(), - None => { - eprintln!("error: --model requires a search query text"); - std::process::exit(1); - } - }; - let vec = embedding::openai_embed(query_text, model_name); - let vec_str = embedding::vector_to_sql(&vec); - format!( - "SELECT {}, l2_distance({}, {}) as dist FROM {} ORDER BY dist LIMIT {}", - select_cols, column, vec_str, table, limit, - ) - } else if let Some(q) = query.as_ref() { - let bm25_columns = match select.as_deref() { - Some(cols) => format!("{}, score", cols), - None => "*".to_string(), - }; - format!( - "SELECT {} FROM bm25_search('{}', '{}', '{}') ORDER BY score DESC LIMIT {}", - bm25_columns, - table.replace('\'', "''"), - column.replace('\'', "''"), - q.replace('\'', "''"), - limit, - ) - } else { - use std::io::IsTerminal; - if std::io::stdin().is_terminal() { - eprintln!("error: provide a search query or pipe a vector via stdin"); - std::process::exit(1); + let sql = match r#type.as_str() { + "bm25" => { + let bm25_columns = match select.as_deref() { + Some(cols) => format!("{}, score", cols), + None => "*".to_string(), + }; + format!( + "SELECT {} FROM bm25_search('{}', '{}', '{}') ORDER BY score DESC LIMIT {}", + bm25_columns, + table.replace('\'', "''"), + column.replace('\'', "''"), + query.replace('\'', "''"), + limit, + ) } - let vec = embedding::read_vector_from_stdin(); - let vec_str = embedding::vector_to_sql(&vec); - format!( - "SELECT {}, l2_distance({}, {}) as dist FROM {} ORDER BY dist LIMIT {}", - select_cols, column, vec_str, table, limit, - ) + // Server-side vector_distance: resolves the embedding column, model, + // and metric from the index metadata. The user names the source text column. + "vector" => format!( + "SELECT {}, vector_distance({}, '{}') AS dist FROM {} ORDER BY dist LIMIT {}", + select_cols, + column, + query.replace('\'', "''"), + table, + limit, + ), + _ => unreachable!(), }; query::execute(&sql, &workspace_id, None, &output) }