
Phase 7e: JSON column type + path queries #54

Merged: joaoh82 merged 1 commit into main from feat/json-column-type on Apr 29, 2026

joaoh82 commented Apr 28, 2026

Summary

  • New JSON (and JSONB alias) column type — stored as canonical UTF-8 text, validated via serde_json::from_str at INSERT/UPDATE time. Same shape as SQLite's JSON1 extension.
  • Four path-aware functions: json_extract / json_type / json_array_length / json_object_keys, with a small JSONPath subset ($, .key, [N], chained).
  • 11 new tests; 245/245 engine + 9 connection + 8 FFI + 4 doctests pass.

Why text-storage instead of bincoded serde_json::Value?

Phase 7 plan Q3 originally proposed bincode, but bincode was removed from the engine in Phase 3c when cell-based encoding replaced it. Rather than re-add bincode for one column type, JSON-as-canonical-text matches SQLite and reuses the existing Text storage path. The scope correction is recorded inline in docs/phase-7-plan.md.

What landed

| Layer | Change |
| --- | --- |
| Type system | `DataType::Json` variant; `JSONB` parses as an alias |
| Validation | INSERT/UPDATE call `serde_json::from_str::<Value>`; malformed JSON rejected with a typed error before any write |
| Functions | `json_extract`, `json_type`, `json_array_length`, `json_object_keys` (~370 LOC in `src/sql/executor.rs`) |
| Path parser | `walk_json_path`: `$`, `.key`, `[N]`, chained. Well-formed but unresolved → NULL; malformed → typed error |
| Persistence | Round-trips through save/open via the existing Text path. New test: `round_trip_preserves_json_column`. |

Notable divergence from SQLite

json_object_keys is a table-valued function in real SQLite (one row per key). SQLRite has no set-returning functions yet, so we return the keys as a JSON-array text (e.g. '["a","b","c"]'). Documented in supported-sql.md.
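
To make the array-text shape concrete, here is a minimal std-only sketch of rendering an ordered key list as JSON-array text. The function name `object_keys_text` is hypothetical and is not taken from `executor.rs`; the escaping only covers backslashes and double quotes, which is an assumption made to keep the example short.

```rust
/// Hypothetical helper: render object keys, in the order given, as a
/// JSON-array text such as ["a","b","c"].
/// Only backslash and double quote are escaped (a simplification).
pub fn object_keys_text<'a>(keys: impl IntoIterator<Item = &'a str>) -> String {
    let quoted: Vec<String> = keys
        .into_iter()
        .map(|k| {
            let escaped = k.replace('\\', "\\\\").replace('"', "\\\"");
            format!("\"{}\"", escaped)
        })
        .collect();
    format!("[{}]", quoted.join(","))
}
```

Because the result is a single TEXT value rather than a row set, a caller that needs one key per row would have to split it on the client side.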

Test plan

  • cargo fmt --all -- --check — clean
  • cargo check --workspace --exclude sqlrite-desktop --all-targets — clean
  • cargo test --workspace --exclude sqlrite-desktop --exclude sqlrite-python --exclude sqlrite-nodejs — 245 + 9 + 8 + 4 pass
  • cargo clippy --workspace --exclude sqlrite-desktop --all-targets — 0 errors, 38 pre-existing warnings (none new)
  • CI: rust matrix (linux/macos/windows), python-sdk, nodejs-sdk, go-sdk, wasm-build, desktop-build all green
  • Smoke: cargo run --example quickstart still builds
  • Manual REPL check: CREATE TABLE events (id INTEGER PRIMARY KEY, payload JSON), INSERT both well-formed and malformed JSON, run json_extract/json_type/json_array_length/json_object_keys against the well-formed row, save + reopen + repeat

Docs touched

  • docs/roadmap.md — 7e flipped to ✅ with corrected storage description
  • docs/supported-sql.md — JSON column type added to types table; four new function rows; new "JSON path syntax" subsection with runnable example
  • docs/phase-7-plan.md — Q3 scope correction recorded inline

Cleanup

Throwaway examples/rust/sqlparser_probe.rs (used to inspect sqlparser's DataType::JSON / DataType::Custom AST shapes during design) deleted along with its [[example]] entry in Cargo.toml.

Next up after merge

Cut v0.1.16 Release PR via release-pr.yml dispatch, then pick from:

  • 7g (ask() API across the product surface — natural-language → SQL via Anthropic)
  • 7h (MCP server adapter — new sqlrite-mcp binary)
  • 7b.1 (deferred operator forms <-> <=> <#>)

🤖 Generated with Claude Code

Adds the JSON storage class and four path-aware query functions, closing
the second of Phase 7's two storage primitives (the first was VECTOR(N)
in 7a). Shape mirrors SQLite's JSON1 extension — JSON values store as
canonical UTF-8 text, validated via `serde_json::from_str` at INSERT and
UPDATE time. Phase 7 plan Q3 originally proposed bincoded
`serde_json::Value`, but bincode was removed from the engine in Phase 3c
(cell-based encoding replaced it); rather than re-add bincode for one column type,
JSON-as-text matches SQLite's choice and reuses the existing Text storage
path. Q3 in `docs/phase-7-plan.md` records the scope correction inline.

Engine surface:

- `DataType::Json` variant alongside `Vector(N)`. `JSONB` parses as an
  alias (Postgres convention; both store as text in our case).
- INSERT/UPDATE on a JSON column runs `serde_json::from_str::<Value>`;
  malformed JSON is rejected with
  `Type mismatch: expected JSON for column 'foo': <serde error>`.
  NULLs pass through untouched.
- UNIQUE on a JSON column treats the value as raw text (string equality
  on the canonical form).
- `table_to_create_sql` round-trips JSON columns; `build_empty_table`,
  `Row::Text(BTreeMap::new())` storage, and the `clone_datatype` helpers
  in `executor.rs` and `pager/mod.rs` all gained the new arm.

Functions (executor.rs, ~370 LOC):

- `json_extract(json[, path])` — walks the path, returns the resolved
  node coerced to the closest SQL type. Strings → TEXT, numbers →
  INTEGER/REAL, booleans → BOOLEAN, `null` → NULL, composites
  (object/array) → canonical JSON text.
- `json_type(json[, path])` — returns one of `'object'`, `'array'`,
  `'string'`, `'integer'`, `'real'`, `'true'`, `'false'`, `'null'`.
- `json_array_length(json[, path])` — element count; errors if the
  resolved node isn't an array.
- `json_object_keys(json[, path])` — keys as a JSON-array text in
  insertion order (e.g. `'["a","b","c"]'`). Diverges from SQLite's
  table-valued shape — we have no set-returning functions yet, so the
  array-text form is the cleanest analog. Documented in supported-sql.md.
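
The coercion and type-name rules above can be sketched as follows. This is an illustration under stated assumptions, not the code in `executor.rs`: the `Json` and `SqlValue` enums are hypothetical std-only stand-ins for `serde_json::Value` and the engine's value type, and the string escaping handles only backslashes and double quotes.

```rust
// Hypothetical stand-in for serde_json::Value; objects keep insertion order.
#[derive(Debug, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Real(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// Hypothetical stand-in for the engine's SQL value type.
#[derive(Debug, PartialEq)]
pub enum SqlValue {
    Null,
    Boolean(bool),
    Integer(i64),
    Real(f64),
    Text(String),
}

/// The eight strings listed for `json_type`.
pub fn json_type_name(node: &Json) -> &'static str {
    match node {
        Json::Object(_) => "object",
        Json::Array(_) => "array",
        Json::Str(_) => "string",
        Json::Int(_) => "integer",
        Json::Real(_) => "real",
        Json::Bool(true) => "true",
        Json::Bool(false) => "false",
        Json::Null => "null",
    }
}

/// Coerce a resolved node to the closest SQL type, as described for
/// `json_extract`: scalars map directly, composites become JSON text.
pub fn to_sql(node: &Json) -> SqlValue {
    match node {
        Json::Null => SqlValue::Null,
        Json::Bool(b) => SqlValue::Boolean(*b),
        Json::Int(i) => SqlValue::Integer(*i),
        Json::Real(r) => SqlValue::Real(*r),
        Json::Str(s) => SqlValue::Text(s.clone()),
        composite => SqlValue::Text(to_json_text(composite)),
    }
}

// Simplified serializer: escapes only backslash and double quote.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

fn to_json_text(node: &Json) -> String {
    match node {
        Json::Null => "null".to_string(),
        Json::Bool(b) => b.to_string(),
        Json::Int(i) => i.to_string(),
        Json::Real(r) => r.to_string(),
        Json::Str(s) => quote(s),
        Json::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_json_text).collect();
            format!("[{}]", parts.join(","))
        }
        Json::Object(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("{}:{}", quote(k), to_json_text(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}
```

Note that a JSON string is returned as plain TEXT without its quotes, while a string nested inside an extracted object or array keeps them because the whole composite is re-serialized.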

Path subset implemented in `walk_json_path`: `$` (root, default), `.key`
(bare-key object access), `[N]` (0-based array index), chained. A
malformed path (unbalanced brackets, missing `$`) is a typed runtime
error; a well-formed path that doesn't resolve returns NULL. The path
parser is a single string-walking pass: small enough to keep inline,
with not enough surface to deserve its own module yet.
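
A rough sketch of that path subset, assuming a two-step design (parse, then walk) that may differ from how `walk_json_path` is actually written. The `Json` enum, `Step`, `parse_path`, and `walk` are hypothetical names for illustration; errors are plain `String`s here rather than the engine's typed error, and `None` stands in for SQL NULL.

```rust
// Hypothetical stand-in for serde_json::Value; objects keep insertion order.
#[derive(Debug, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

#[derive(Debug, PartialEq)]
pub enum Step {
    Key(String),
    Index(usize),
}

/// Parse `$`, `.key`, `[N]`, and chains of them. Malformed input is an error.
pub fn parse_path(path: &str) -> Result<Vec<Step>, String> {
    let rest = path
        .strip_prefix('$')
        .ok_or_else(|| format!("path must start with '$': {path}"))?;
    let mut steps = Vec::new();
    let mut chars = rest.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '.' => {
                // Bare key: runs until the next '.', '[' or stray ']'.
                let mut key = String::new();
                while let Some(&n) = chars.peek() {
                    if n == '.' || n == '[' || n == ']' {
                        break;
                    }
                    key.push(n);
                    chars.next();
                }
                if key.is_empty() {
                    return Err(format!("empty key in path: {path}"));
                }
                steps.push(Step::Key(key));
            }
            '[' => {
                // 0-based array index terminated by ']'.
                let mut digits = String::new();
                let mut closed = false;
                for n in chars.by_ref() {
                    if n == ']' {
                        closed = true;
                        break;
                    }
                    digits.push(n);
                }
                if !closed {
                    return Err(format!("unbalanced '[' in path: {path}"));
                }
                let idx = digits
                    .parse::<usize>()
                    .map_err(|_| format!("bad array index in path: {path}"))?;
                steps.push(Step::Index(idx));
            }
            other => return Err(format!("unexpected '{other}' in path: {path}")),
        }
    }
    Ok(steps)
}

/// Walk a parsed path. `None` models "well-formed but unresolved" (NULL).
pub fn walk<'a>(root: &'a Json, steps: &[Step]) -> Option<&'a Json> {
    let mut node = root;
    for step in steps {
        node = match (node, step) {
            (Json::Object(fields), Step::Key(k)) => {
                fields.iter().find(|(name, _)| name == k).map(|(_, v)| v)?
            }
            (Json::Array(items), Step::Index(i)) => items.get(*i)?,
            _ => return None, // kind mismatch, e.g. `.key` on an array
        };
    }
    Some(node)
}
```

Separating parsing from walking keeps the two outcomes distinct: `parse_path` failing corresponds to the runtime error case, and `walk` returning `None` corresponds to the NULL case.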

Object-key insertion order is preserved end-to-end via `serde_json`'s
`preserve_order` feature, so `json_object_keys` output is stable across
re-parses.

Tests: 11 new (10 in `src/sql/mod.rs`, 1 round-trip-after-save in
`src/sql/pager/mod.rs`):

- json_column_round_trip_primitive_values
- json_insert_rejects_invalid_json
- json_extract_object_field
- json_extract_array_index_and_nested
- json_extract_missing_path_returns_null
- json_extract_malformed_path_errors
- json_array_length_on_array
- json_array_length_on_non_array_errors
- json_type_recognizes_each_kind (covers all 7 types)
- update_on_json_column_revalidates
- round_trip_preserves_json_column

245 engine tests + 9 connection + 8 FFI + 4 doctests pass.
`cargo clippy` clean across the new code (existing pre-Phase-7e
warnings unchanged, same as Phase 6b retrospective).

Docs:

- `docs/roadmap.md` — 7e bullet flipped to ✅ with the corrected
  storage description (canonical text, not bincoded).
- `docs/supported-sql.md` — JSON / JSONB row added to the column-types
  table; four `json_*` functions documented in the built-in functions
  table; new "JSON path syntax" subsection covers the supported tokens
  with a runnable example.
- `docs/phase-7-plan.md` — Q3 scope correction recorded inline; 7e
  section header marked ✅.

Throwaway `examples/rust/sqlparser_probe.rs` deleted along with its
`[[example]]` entry in `Cargo.toml` (the file was used to inspect
sqlparser's AST shapes for `DataType::JSON` / `DataType::Custom` during
the design pass; not needed in main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joaoh82 merged commit a2ad2ad into main on Apr 29, 2026
15 of 16 checks passed