feat(embedded): Database::new_databricks() + dialect plumbing (Phase 2.2b) #331
Merged
Conversation
Wires the Phase 2.1 Databricks executor into the `clickgraph-embedded`
API so callers can construct a Databricks-backed database and run
queries through the existing `Connection::query_remote` path. End-to-end,
this is the first time Cypher → Spark SQL → Databricks Statement
Execution API → Spark JSON → `Value` works through the public API.
Changes:
- `clickgraph-embedded/Cargo.toml`: new `databricks` feature that
forwards to `clickgraph/databricks`. `wiremock` added as a normal
dev-dep (gated test stays inside `#[cfg(feature = "databricks")]`).
- `Database` gains a `dialect: SqlDialect` field. All existing
constructors default to `ClickHouse`; the new `new_databricks()`
constructor (feature-gated) sets it to `Databricks`.
- `Connection::query_to_sql` and `query_with_executor_async` now stamp
`set_current_dialect(self.db.dialect)` after entering the
`with_query_context` scope. This is the load-bearing line that
routes all the Phase 1.x FunctionMapper sites to Spark spellings
when the database is Databricks-backed.
- Three test-helper `Database { .. }` literals in `connection.rs` get
the new field initialized.
- `clickgraph_embedded` re-exports `DatabricksConfig` and
`DatabricksSqlExecutor` under the feature gate.
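The `dialect` field and feature-gated constructor described above can be sketched as follows. The names `SqlDialect`, `Database`, and `new_databricks()` come from this PR, but the struct contents here are illustrative stand-ins, not the real crate's fields:

```rust
// Minimal sketch of the dialect plumbing: every existing constructor keeps
// defaulting to ClickHouse, and only the Databricks constructor flips the field.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SqlDialect {
    ClickHouse,
    Databricks,
}

struct Database {
    // schema, runtime, executor_kind, etc. elided for the sketch
    dialect: SqlDialect,
}

impl Database {
    /// Existing constructors stay on the ClickHouse default.
    fn new() -> Self {
        Database { dialect: SqlDialect::ClickHouse }
    }

    /// In the real crate this sits behind `#[cfg(feature = "databricks")]`.
    fn new_databricks() -> Self {
        Database { dialect: SqlDialect::Databricks }
    }
}

fn main() {
    assert_eq!(Database::new().dialect, SqlDialect::ClickHouse);
    assert_eq!(Database::new_databricks().dialect, SqlDialect::Databricks);
    println!("dialect defaults hold");
}
```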
Two end-to-end wiremock tests in `tests/databricks_e2e.rs`:
- `query_remote_against_databricks_mock_returns_rows_and_uses_spark_dialect`:
full POST/parse cycle with PAT auth, asserting columns and row count.
- `databricks_database_emits_spark_sql_for_collect`: inspects the SQL
that wiremock captured to verify `collect()` emits `collect_list(...)`
(Spark) not `groupArray(...)` (CH) — pins the dialect plumbing so a
future regression that drops the `set_current_dialect` call fails
loudly.
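The spelling contract those tests pin can be illustrated with a toy renderer. `collect_list` and `groupArray` are the real Spark and ClickHouse aggregate names; the `render_collect` helper below is a hypothetical stand-in for the FunctionMapper decision, not the crate's actual API:

```rust
#[derive(Clone, Copy)]
enum SqlDialect {
    ClickHouse,
    Databricks,
}

// Toy stand-in for the dialect-sensitive mapping the e2e test asserts on:
// Cypher's collect() must be spelled differently per backend.
fn render_collect(dialect: SqlDialect, arg: &str) -> String {
    match dialect {
        SqlDialect::ClickHouse => format!("groupArray({arg})"),
        SqlDialect::Databricks => format!("collect_list({arg})"),
    }
}

fn main() {
    let spark = render_collect(SqlDialect::Databricks, "u.name");
    let ch = render_collect(SqlDialect::ClickHouse, "u.name");
    // The same checks the wiremock test runs against the captured body.
    assert!(spark.contains("collect_list("));
    assert!(!spark.contains("groupArray("));
    assert!(ch.contains("groupArray("));
    println!("{spark} / {ch}");
}
```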
Test summary: clickgraph 1369/1369 default and 1384/1384 with
`--features databricks` (unchanged from PR #330). clickgraph-embedded
gains 2 e2e tests under `--features databricks`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pull request overview
Adds Databricks support to the embedded API by exposing a feature-gated constructor and plumbing SQL dialect selection into the main tabular query paths.
Changes:
- Adds `clickgraph-embedded/databricks` feature and public Databricks re-exports.
- Adds `Database::new_databricks()` with Databricks executor and dialect selection.
- Stamps the embedded query context with the database dialect for `query_to_sql()` and `query_remote()` paths, plus new wiremock E2E tests.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| clickgraph-embedded/tests/databricks_e2e.rs | Adds Databricks mock E2E tests for remote queries and Spark SQL emission. |
| clickgraph-embedded/src/lib.rs | Re-exports Databricks config/executor behind the feature flag. |
| clickgraph-embedded/src/database.rs | Adds dialect field and Databricks constructor. |
| clickgraph-embedded/src/connection.rs | Applies database dialect in SQL generation/execution paths and updates test fixtures. |
| clickgraph-embedded/Cargo.toml | Adds the Databricks feature and wiremock dev-dependency. |
| Cargo.lock | Records the new wiremock dependency edge for clickgraph-embedded. |
```rust
schema: Arc::new(graph_schema),
runtime,
executor_kind: clickgraph::query_planner::write_guard::ExecutorKind::Remote,
dialect: SqlDialect::Databricks,
```
Comment on lines +65 to +74
```rust
// Match a POST that has the Spark-style backtick alias quoting in
// its `statement` field. This is the load-bearing assertion that
// the dialect plumbing actually flipped — under the CH dialect the
// emitted SQL would use double-quoted aliases (`AS "..."`), which
// would NOT match here.
Mock::given(method("POST"))
    .and(path("/api/2.0/sql/statements"))
    .and(bearer_token("dapi-token"))
    .and(body_partial_json(json!({ "warehouse_id": "wh-test" })))
    .respond_with(ResponseTemplate::new(200).set_body_json(json!({
```
```rust
let (cols, rows) = result;
assert_eq!(cols, vec!["u.user_id", "u.name"]);
assert_eq!(rows.len(), 2);
```
```rust
pub use database::StorageCredentials;
pub use database::{Database, RemoteConfig, SystemConfig};
#[cfg(feature = "databricks")]
pub use database::{DatabricksConfig, DatabricksSqlExecutor};
```
Addresses 4 Copilot comments on PR #331:

1. Security: `DatabricksConfig` now redacts the PAT in its manual `Debug` impl (matches `RemoteConfig`'s password redaction). Logging the config via `{:?}` prints `********` for the token, not the raw value. New unit test `debug_redacts_token` pins the contract.
2. `query_graph_async` was missing `set_current_dialect()`, so `Connection::query_remote_graph()` against a Databricks-backed database silently fell back to ClickHouse SQL generation. Now stamps the database dialect, same as `query_with_executor_async` and `query_to_sql`.
3. `query_remote_against_databricks_mock` now asserts cell values, not just column names and row counts. Renamed away from "uses Spark dialect" since the body inspection lives in the next test — the docstring now reflects what's actually verified here (response wiring + JSON→Value conversion).
4. New `query_remote_graph_under_databricks_uses_spark_dialect` test covers the graph path by inspecting the captured request body for backtick alias quoting (Spark) vs double quotes (CH). If a future regression drops the new `set_current_dialect` call in `query_graph_async` this assertion fails before any wire-format check does.

Test summary: clickgraph 1385/1385 with `--features databricks` (was 1384 — added `debug_redacts_token`). clickgraph-embedded: 3/3 e2e tests under `--features databricks` (was 2 — added the `query_remote_graph` variant).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
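The token-redacting `Debug` pattern from point 1 can be sketched like this. The field names (`host`, `warehouse_id`, `token`) are illustrative guesses, not the real `DatabricksConfig` layout:

```rust
use std::fmt;

// Sketch of a manual Debug impl that redacts a secret field, in the style
// the commit describes for DatabricksConfig (and RemoteConfig's password).
struct DatabricksConfig {
    host: String,
    warehouse_id: String,
    token: String, // personal access token; must never reach logs
}

impl fmt::Debug for DatabricksConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DatabricksConfig")
            .field("host", &self.host)
            .field("warehouse_id", &self.warehouse_id)
            .field("token", &"********") // redacted regardless of contents
            .finish()
    }
}

fn main() {
    let cfg = DatabricksConfig {
        host: "https://example.cloud.databricks.com".into(),
        warehouse_id: "wh-test".into(),
        token: "dapi-secret".into(),
    };
    let rendered = format!("{cfg:?}");
    // The same contract debug_redacts_token pins: stars in, secret out.
    assert!(rendered.contains("********"));
    assert!(!rendered.contains("dapi-secret"));
    println!("{rendered}");
}
```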
Summary
Wires the Phase 2.1 executor into the `clickgraph-embedded` API. End-to-end, this is the first time Cypher → Spark SQL → Databricks Statement Execution API → Spark JSON → `Value` works through the public API. A user enabling the `databricks` feature can now write:

…and the SQL crossing the wire is correct Spark SQL — `collect_list`, backtick aliases, the works.

What ships

- `clickgraph-embedded/Cargo.toml` gains a `databricks` feature that forwards to `clickgraph/databricks`. `wiremock` is added as a normal dev-dep (the test itself stays inside `#[cfg(feature = "databricks")]`).
- `Database` gains a `dialect: SqlDialect` field. All existing constructors default to `ClickHouse`; the new `new_databricks()` constructor (feature-gated) sets it to `Databricks`. The field is what the renderer reads at SQL-emission time via the task-local `QueryContext`.
- `Connection::query_to_sql` and `query_with_executor_async` now stamp `set_current_dialect(self.db.dialect)` after entering the `with_query_context` scope. This is what flips the Phase 1.x routing to Spark spellings end-to-end.
- Three test-helper `Database { .. }` literals in `connection.rs` get the new field initialized to `ClickHouse` so existing unit tests stay green.

Test plan

- `cargo fmt --all --check` clean
- `cargo clippy -p clickgraph -p clickgraph-embedded -p clickgraph-client -p clickgraph-tool --all-targets -- -D warnings` clean (default)
- `--features clickgraph-embedded/databricks,clickgraph/databricks` clean
- `cargo test -p clickgraph --lib` — 1369/1369 (unchanged)
- `cargo test -p clickgraph --features databricks --lib` — 1384/1384 (unchanged)
- `cargo test -p clickgraph-embedded --tests` — 12/12 (10 existing + 2 new e2e, latter only with `--features databricks`)

End-to-end tests

Two `wiremock`-backed tests in `clickgraph-embedded/tests/databricks_e2e.rs`:

- `query_remote_against_databricks_mock_returns_rows_and_uses_spark_dialect`: full POST/parse cycle with PAT auth, asserting column names and row count.
- `databricks_database_emits_spark_sql_for_collect`: pulls the captured request body out of wiremock and asserts the submitted SQL contains `collect_list(` (Spark) and NOT `groupArray(` (CH). If a future regression drops the `set_current_dialect` call, this test fails loudly before any wire-format assertion does.

Doctest failures in `clickgraph-embedded` are pre-existing on main (they call `Database::new` without `--features embedded`) — verified by stashing this PR's changes and re-running.

🤖 Generated with Claude Code