
feat(embedded): Database::new_databricks() + dialect plumbing (Phase 2.2b)#331

Merged: genezhang merged 2 commits into main from feat/embedded-databricks-database on May 16, 2026

@genezhang
Owner

Summary

Wires the Phase 2.1 executor into the clickgraph-embedded API. End-to-end, this is the first time Cypher → Spark SQL → Databricks Statement Execution API → Spark JSON → Value works through the public API. A user enabling the databricks feature can now write:

let db = Database::new_databricks("schema.yaml", DatabricksConfig::new(host, wh, token))?;
let conn = Connection::new(&db)?;
let result = conn.query_remote("MATCH (u:User) RETURN u.name LIMIT 10")?;

…and the SQL crossing the wire is correct Spark SQL — collect_list, backtick aliases, the works.
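
To make the "backtick aliases" point concrete, here is a minimal sketch of dialect-dependent alias quoting. The `SqlDialect` enum below is a two-variant stand-in for the crate's type, and `quote_alias` is a hypothetical helper, not the renderer's actual function. The escaping rule (doubling the quote character) is a simplifying assumption for illustration.

```rust
/// Stand-in for the crate's `SqlDialect`; only the two variants
/// discussed in this PR are modeled here.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SqlDialect {
    ClickHouse,
    Databricks,
}

/// Hypothetical helper: quote a column alias for the given dialect.
/// ClickHouse aliases are emitted as `AS "..."`, Spark aliases with backticks.
/// Assumption: an embedded quote character is escaped by doubling it.
pub fn quote_alias(dialect: SqlDialect, alias: &str) -> String {
    match dialect {
        SqlDialect::ClickHouse => format!("\"{}\"", alias.replace('"', "\"\"")),
        SqlDialect::Databricks => format!("`{}`", alias.replace('`', "``")),
    }
}
```

Under these assumptions, an alias such as `u.name` is wrapped in backticks for Databricks and in double quotes for ClickHouse, which is the difference the e2e tests look for in the submitted statement.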

What ships

  • clickgraph-embedded/Cargo.toml gains a databricks feature that forwards to clickgraph/databricks. wiremock is added as a normal dev-dep (the test itself stays inside #[cfg(feature = "databricks")]).
  • Database gains a dialect: SqlDialect field. All existing constructors default to ClickHouse; the new new_databricks() constructor (feature-gated) sets it to Databricks. The field is what the renderer reads at SQL-emission time via the task-local QueryContext.
  • The load-bearing line: Connection::query_to_sql and query_with_executor_async now stamp set_current_dialect(self.db.dialect) after entering the with_query_context scope. This is what flips the Phase 1.x routing to Spark spellings end-to-end.
  • Test-helper struct literals in connection.rs get the new field initialized to ClickHouse so existing unit tests stay green.
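
The stamping pattern described above can be sketched roughly as follows. This is not the crate's code: it swaps the task-local `QueryContext` for a plain `thread_local!` cell so the example stays std-only, and the signatures of `with_query_context`, `set_current_dialect`, `render_collect`, and the one-field `Database` are all assumptions made for illustration.

```rust
use std::cell::Cell;

/// Stand-in for the crate's `SqlDialect`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SqlDialect {
    ClickHouse,
    Databricks,
}

thread_local! {
    // Stand-in for the task-local QueryContext; a fresh context is ClickHouse.
    static CURRENT_DIALECT: Cell<SqlDialect> = Cell::new(SqlDialect::ClickHouse);
}

pub fn set_current_dialect(dialect: SqlDialect) {
    CURRENT_DIALECT.with(|cell| cell.set(dialect));
}

pub fn current_dialect() -> SqlDialect {
    CURRENT_DIALECT.with(|cell| cell.get())
}

/// Run `f` in a fresh context and restore the previous dialect afterwards.
/// (Not panic-safe; a real implementation would use a guard or a task-local scope.)
pub fn with_query_context<T>(f: impl FnOnce() -> T) -> T {
    let previous = current_dialect();
    set_current_dialect(SqlDialect::ClickHouse);
    let out = f();
    set_current_dialect(previous);
    out
}

/// Renderer-side lookup: the spelling of `collect()` depends on the context.
pub fn render_collect(arg: &str) -> String {
    match current_dialect() {
        SqlDialect::ClickHouse => format!("groupArray({})", arg),
        SqlDialect::Databricks => format!("collect_list({})", arg),
    }
}

/// Hypothetical, heavily reduced `Database` carrying only the dialect field.
pub struct Database {
    pub dialect: SqlDialect,
}

/// The stamp happens after entering the scope, before any SQL is rendered.
pub fn query_to_sql(db: &Database, collect_arg: &str) -> String {
    with_query_context(|| {
        set_current_dialect(db.dialect);
        render_collect(collect_arg)
    })
}
```

In this sketch, dropping the `set_current_dialect(db.dialect)` call makes every database render `groupArray(...)`, which mirrors the silent fallback the PR guards against.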

Test plan

  • cargo fmt --all --check clean
  • cargo clippy -p clickgraph -p clickgraph-embedded -p clickgraph-client -p clickgraph-tool --all-targets -- -D warnings clean (default)
  • same with --features clickgraph-embedded/databricks,clickgraph/databricks clean
  • cargo test -p clickgraph --lib — 1369/1369 (unchanged)
  • cargo test -p clickgraph --features databricks --lib — 1384/1384 (unchanged)
  • cargo test -p clickgraph-embedded --tests — 12/12 (10 existing + 2 new e2e, latter only with --features databricks)

End-to-end tests

Two wiremock-backed tests in clickgraph-embedded/tests/databricks_e2e.rs:

  • query_remote_against_databricks_mock_returns_rows_and_uses_spark_dialect: full POST/parse cycle with PAT auth, asserting column names and row count.
  • databricks_database_emits_spark_sql_for_collect: pulls the captured request body out of wiremock and asserts the submitted SQL contains collect_list( (Spark) and NOT groupArray( (CH). If a future regression drops the set_current_dialect call, this test fails loudly before any wire-format assertion does.
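
The check in the second test can be approximated by a small predicate over the captured statement text. `check_spark_collect` is a hypothetical helper, not the function used in `databricks_e2e.rs`, and it assumes the SQL string has already been extracted from the request body that wiremock recorded.

```rust
/// Hypothetical helper: verify that a submitted SQL statement uses the Spark
/// spelling of `collect()` and not the ClickHouse one.
/// Returns an explanatory error so a failing test says which marker was wrong.
pub fn check_spark_collect(sql: &str) -> Result<(), String> {
    // The ClickHouse spelling must be absent; checked first so a dialect
    // regression is reported as such.
    if sql.contains("groupArray(") {
        return Err("found ClickHouse `groupArray(`; dialect was not applied".to_string());
    }
    // The Spark spelling must be present.
    if !sql.contains("collect_list(") {
        return Err("expected Spark `collect_list(` in the submitted SQL".to_string());
    }
    Ok(())
}
```

A substring check like this is deliberately coarse: it pins the dialect routing without depending on the exact shape of the generated query.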

Doctest failures in clickgraph-embedded are pre-existing on main (they call Database::new without --features embedded) — verified by stashing this PR's changes and re-running.

🤖 Generated with Claude Code

…2.2b)

Wires the Phase 2.1 Databricks executor into the `clickgraph-embedded`
API so callers can construct a Databricks-backed database and run
queries through the existing `Connection::query_remote` path. End-to-end,
this is the first time Cypher → Spark SQL → Databricks Statement
Execution API → Spark JSON → `Value` works through the public API.

Changes:
- `clickgraph-embedded/Cargo.toml`: new `databricks` feature that
  forwards to `clickgraph/databricks`. `wiremock` added as a normal
  dev-dep (gated test stays inside `#[cfg(feature = "databricks")]`).
- `Database` gains a `dialect: SqlDialect` field. All existing
  constructors default to `ClickHouse`; the new `new_databricks()`
  constructor (feature-gated) sets it to `Databricks`.
- `Connection::query_to_sql` and `query_with_executor_async` now stamp
  `set_current_dialect(self.db.dialect)` after entering the
  `with_query_context` scope. This is the load-bearing line that
  routes all the Phase 1.x FunctionMapper sites to Spark spellings
  when the database is Databricks-backed.
- Three test-helper `Database { .. }` literals in `connection.rs` get
  the new field initialized.
- `clickgraph_embedded` re-exports `DatabricksConfig` and
  `DatabricksSqlExecutor` under the feature gate.

Two end-to-end wiremock tests in `tests/databricks_e2e.rs`:
- `query_remote_against_databricks_mock_returns_rows_and_uses_spark_dialect`:
  full POST/parse cycle with PAT auth, asserting columns and row count.
- `databricks_database_emits_spark_sql_for_collect`: inspects the SQL
  that wiremock captured to verify `collect()` emits `collect_list(...)`
  (Spark) not `groupArray(...)` (CH) — pins the dialect plumbing so a
  future regression that drops the `set_current_dialect` call fails
  loudly.

Test summary: clickgraph 1369/1369 default and 1384/1384 with
`--features databricks` (unchanged from PR #330). clickgraph-embedded
gains 2 e2e tests under `--features databricks`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 16, 2026 06:39

Copilot AI left a comment


Pull request overview

Adds Databricks support to the embedded API by exposing a feature-gated constructor and plumbing SQL dialect selection into the main tabular query paths.

Changes:

  • Adds clickgraph-embedded/databricks feature and public Databricks re-exports.
  • Adds Database::new_databricks() with Databricks executor and dialect selection.
  • Stamps the embedded query context with the database dialect for query_to_sql() and query_remote() paths, plus new wiremock E2E tests.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| clickgraph-embedded/tests/databricks_e2e.rs | Adds Databricks mock E2E tests for remote queries and Spark SQL emission. |
| clickgraph-embedded/src/lib.rs | Re-exports Databricks config/executor behind the feature flag. |
| clickgraph-embedded/src/database.rs | Adds dialect field and Databricks constructor. |
| clickgraph-embedded/src/connection.rs | Applies database dialect in SQL generation/execution paths and updates test fixtures. |
| clickgraph-embedded/Cargo.toml | Adds the Databricks feature and wiremock dev-dependency. |
| Cargo.lock | Records the new wiremock dependency edge for clickgraph-embedded. |

schema: Arc::new(graph_schema),
runtime,
executor_kind: clickgraph::query_planner::write_guard::ExecutorKind::Remote,
dialect: SqlDialect::Databricks,
Comment on lines +65 to +74
// Match a POST that has the Spark-style backtick alias quoting in
// its `statement` field. This is the load-bearing assertion that
// the dialect plumbing actually flipped — under the CH dialect the
// emitted SQL would use double-quoted aliases (`AS "..."`), which
// would NOT match here.
Mock::given(method("POST"))
.and(path("/api/2.0/sql/statements"))
.and(bearer_token("dapi-token"))
.and(body_partial_json(json!({ "warehouse_id": "wh-test" })))
.respond_with(ResponseTemplate::new(200).set_body_json(json!({

let (cols, rows) = result;
assert_eq!(cols, vec!["u.user_id", "u.name"]);
assert_eq!(rows.len(), 2);
pub use database::StorageCredentials;
pub use database::{Database, RemoteConfig, SystemConfig};
#[cfg(feature = "databricks")]
pub use database::{DatabricksConfig, DatabricksSqlExecutor};
Addresses 4 Copilot comments on PR #331:

1. Security: `DatabricksConfig` now redacts the PAT in its manual
   `Debug` impl (matches `RemoteConfig`'s password redaction). Logging
   the config via `{:?}` prints `********` for the token, not the raw
   value. New unit test `debug_redacts_token` pins the contract.

2. `query_graph_async` was missing `set_current_dialect()`, so
   `Connection::query_remote_graph()` against a Databricks-backed
   database silently fell back to ClickHouse SQL generation. Now
   stamps the database dialect, same as `query_with_executor_async`
   and `query_to_sql`.

3. `query_remote_against_databricks_mock` now asserts cell values, not
   just column names and row counts. Renamed away from "uses Spark
   dialect" since the body inspection lives in the next test — the
   docstring now reflects what's actually verified here (response
   wiring + JSON→Value conversion).

4. New `query_remote_graph_under_databricks_uses_spark_dialect` test
   covers the graph path by inspecting the captured request body for
   backtick alias quoting (Spark) vs double quotes (CH). If a future
   regression drops the new `set_current_dialect` call in
   `query_graph_async` this assertion fails before any wire-format
   check does.
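
For item 1, a manual `Debug` impl that redacts the token could look roughly like this. The struct is a hypothetical mirror of `DatabricksConfig`: the field names `host`, `warehouse_id`, and `token` are assumptions, and only the `********` placeholder is taken from the commit message.

```rust
use std::fmt;

/// Hypothetical mirror of `DatabricksConfig`; field names are assumptions.
pub struct DatabricksConfig {
    pub host: String,
    pub warehouse_id: String,
    pub token: String,
}

// Manual impl instead of `#[derive(Debug)]` so the PAT never reaches logs.
impl fmt::Debug for DatabricksConfig {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("DatabricksConfig")
            .field("host", &self.host)
            .field("warehouse_id", &self.warehouse_id)
            // Print a fixed placeholder rather than the real token.
            .field("token", &"********")
            .finish()
    }
}
```

A test in the spirit of `debug_redacts_token` would format the config with `{:?}` and assert that the placeholder appears while the raw token does not.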

clickgraph: 1385/1385 with `--features databricks` (was 1384 — added
debug_redacts_token). clickgraph-embedded: 3/3 e2e tests under
`--features databricks` (was 2 — added query_remote_graph variant).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@genezhang genezhang merged commit b9847eb into main May 16, 2026
4 checks passed
@genezhang genezhang deleted the feat/embedded-databricks-database branch May 16, 2026 06:58