Merged
Adds the JSON storage class and four path-aware query functions, closing the second of Phase 7's two storage primitives (the first was VECTOR(N) in 7a).

Shape mirrors SQLite's JSON1 extension: JSON values store as canonical UTF-8 text, validated via `serde_json::from_str` at INSERT and UPDATE time. Phase 7 plan Q3 originally proposed bincoded `serde_json::Value`, but bincode was removed from the engine in Phase 3c (cell-based encoding replaced it); rather than re-add bincode for one column type, JSON-as-text matches SQLite's choice and reuses the existing Text storage path. Q3 in `docs/phase-7-plan.md` records the scope correction inline.

Engine surface:

- `DataType::Json` variant alongside `Vector(N)`. `JSONB` parses as an alias (Postgres convention; both store as text in our case).
- INSERT/UPDATE on a JSON column runs `serde_json::from_str::<Value>`; malformed JSON is rejected with `Type mismatch: expected JSON for column 'foo': <serde error>`. NULLs pass through untouched.
- UNIQUE on a JSON column treats the value as raw text (string equality on the canonical form).
- `table_to_create_sql` round-trips JSON columns; `build_empty_table`, `Row::Text(BTreeMap::new())` storage, and the `clone_datatype` helpers in `executor.rs` and `pager/mod.rs` all gained the new arm.

Functions (executor.rs, ~370 LOC):

- `json_extract(json[, path])`: walks the path, returns the resolved node coerced to the closest SQL type. Strings → TEXT, numbers → INTEGER/REAL, booleans → BOOLEAN, `null` → NULL, composites (object/array) → canonical JSON text.
- `json_type(json[, path])`: returns one of `'object'`, `'array'`, `'string'`, `'integer'`, `'real'`, `'true'`, `'false'`, `'null'`.
- `json_array_length(json[, path])`: element count; errors if the resolved node isn't an array.
- `json_object_keys(json[, path])`: keys as a JSON-array text in insertion order (e.g. `'["a","b","c"]'`).
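The coercion rules listed for `json_extract`, the names returned by `json_type`, and the array-text form of `json_object_keys` can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `executor.rs`: `Json` is a hypothetical stand-in for `serde_json::Value`, `SqlValue` is a hypothetical stand-in for the engine's value type, and every function name here is invented for the example. The compact serializer only approximates the canonical text form.

```rust
// Hypothetical stand-in for serde_json::Value so the sketch needs no crates.
// Objects keep insertion order by storing a Vec of (key, value) pairs.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Int(i64),
    Real(f64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// Hypothetical SQL value type; the engine's real enum is not shown in the PR.
#[derive(Debug, Clone, PartialEq)]
enum SqlValue {
    Null,
    Integer(i64),
    Real(f64),
    Boolean(bool),
    Text(String),
}

/// Type name reported for a resolved node, following the list above.
fn json_type_name(v: &Json) -> &'static str {
    match v {
        Json::Null => "null",
        Json::Bool(true) => "true",
        Json::Bool(false) => "false",
        Json::Int(_) => "integer",
        Json::Real(_) => "real",
        Json::Str(_) => "string",
        Json::Array(_) => "array",
        Json::Object(_) => "object",
    }
}

/// Quote a string as a JSON string literal (minimal escaping only).
fn quote(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}

/// Serialize a node as compact JSON text.
fn to_text(v: &Json) -> String {
    match v {
        Json::Null => "null".to_string(),
        Json::Bool(b) => b.to_string(),
        Json::Int(i) => i.to_string(),
        Json::Real(r) => format!("{r:?}"),
        Json::Str(s) => quote(s),
        Json::Array(items) => {
            let parts: Vec<String> = items.iter().map(to_text).collect();
            format!("[{}]", parts.join(","))
        }
        Json::Object(fields) => {
            let parts: Vec<String> = fields
                .iter()
                .map(|(k, v)| format!("{}:{}", quote(k), to_text(v)))
                .collect();
            format!("{{{}}}", parts.join(","))
        }
    }
}

/// Scalars map to the closest SQL type; composites come back as JSON text.
fn coerce(v: &Json) -> SqlValue {
    match v {
        Json::Null => SqlValue::Null,
        Json::Bool(b) => SqlValue::Boolean(*b),
        Json::Int(i) => SqlValue::Integer(*i),
        Json::Real(r) => SqlValue::Real(*r),
        Json::Str(s) => SqlValue::Text(s.clone()),
        composite => SqlValue::Text(to_text(composite)),
    }
}

/// Keys as JSON-array text in insertion order. Returning None for a
/// non-object is an assumption; the PR does not state that behaviour.
fn object_keys_text(v: &Json) -> Option<String> {
    if let Json::Object(fields) = v {
        let keys: Vec<String> = fields.iter().map(|(k, _)| quote(k)).collect();
        Some(format!("[{}]", keys.join(",")))
    } else {
        None
    }
}
```

In this sketch, `coerce(&Json::Int(7))` yields `SqlValue::Integer(7)`, while an array node such as `[true, null]` comes back as the text `[true,null]`.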
`json_object_keys` diverges from SQLite's table-valued shape: we have no set-returning functions yet, so the array-text form is the cleanest analog. Documented in supported-sql.md.

Path subset implemented in `walk_json_path`: `$` (root, default), `.key` (bare-key object access), `[N]` (0-based array index), chained. A malformed path (unbalanced brackets, missing `$`) is a typed runtime error; a well-formed path that doesn't resolve returns NULL. The path parser shares a single string-walking pass; it is small enough to inline and does not have enough surface to deserve its own module yet.

Object-key insertion order is preserved end-to-end via `serde_json`'s `preserve_order` feature, so `json_object_keys` output is stable across re-parses.

Tests: 11 new (10 in `src/sql/mod.rs`, 1 round-trip-after-save in `src/sql/pager/mod.rs`):

- json_column_round_trip_primitive_values
- json_insert_rejects_invalid_json
- json_extract_object_field
- json_extract_array_index_and_nested
- json_extract_missing_path_returns_null
- json_extract_malformed_path_errors
- json_array_length_on_array
- json_array_length_on_non_array_errors
- json_type_recognizes_each_kind (covers all 7 types)
- update_on_json_column_revalidates
- round_trip_preserves_json_column

245 engine tests + 9 connection + 8 FFI + 4 doctests pass. `cargo clippy` clean across the new code (existing pre-Phase-7e warnings unchanged, same as Phase 6b retrospective).

Docs:

- `docs/roadmap.md`: 7e bullet flipped to ✅ with the corrected storage description (canonical text, not bincoded).
- `docs/supported-sql.md`: JSON / JSONB row added to the column-types table; four `json_*` functions documented in the built-in functions table; new "JSON path syntax" subsection covers the supported tokens with a runnable example.
- `docs/phase-7-plan.md`: Q3 scope correction recorded inline; 7e section header marked ✅.
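The path subset described for `walk_json_path` (`$`, `.key`, `[N]`, chained) can be sketched as a single pass over the path string. This is an illustration under stated assumptions rather than the engine's code: `Json` is a hypothetical stand-in for `serde_json::Value`, `Segment`, `parse_path`, and `walk` are invented names, and errors are plain strings instead of the engine's typed error. `Ok(None)` models the SQL NULL returned for a well-formed path that does not resolve.

```rust
// Hypothetical stand-in for serde_json::Value (only what the walk needs).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Num(i64),
    Str(String),
    Array(Vec<Json>),
    Object(Vec<(String, Json)>),
}

// One step of a parsed path: `.key` or `[N]`.
#[derive(Debug, PartialEq)]
enum Segment {
    Key(String),
    Index(usize),
}

/// Parse `$`, `.key`, `[N]` (chained) into segments; Err on a malformed path.
fn parse_path(path: &str) -> Result<Vec<Segment>, String> {
    let rest = path
        .strip_prefix('$')
        .ok_or_else(|| format!("path must start with '$': {path}"))?;
    let chars: Vec<char> = rest.chars().collect();
    let mut segments = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '.' => {
                // A bare key runs until the next '.', '[' or stray ']'.
                let start = i + 1;
                let mut end = start;
                while end < chars.len() && !matches!(chars[end], '.' | '[' | ']') {
                    end += 1;
                }
                if end == start {
                    return Err(format!("empty key in path: {path}"));
                }
                segments.push(Segment::Key(chars[start..end].iter().collect()));
                i = end;
            }
            '[' => {
                // An index needs a closing ']' and a non-negative integer.
                let start = i + 1;
                let close = (start..chars.len())
                    .find(|&j| chars[j] == ']')
                    .ok_or_else(|| format!("unbalanced '[' in path: {path}"))?;
                let digits: String = chars[start..close].iter().collect();
                let n = digits
                    .parse::<usize>()
                    .map_err(|_| format!("bad array index in path: {path}"))?;
                segments.push(Segment::Index(n));
                i = close + 1;
            }
            other => return Err(format!("unexpected '{other}' in path: {path}")),
        }
    }
    Ok(segments)
}

/// Resolve `path` against `root`; Ok(None) means "well-formed but unresolved".
fn walk<'a>(root: &'a Json, path: &str) -> Result<Option<&'a Json>, String> {
    let mut node = root;
    for seg in parse_path(path)? {
        let next = match (node, &seg) {
            (Json::Object(fields), Segment::Key(k)) => {
                fields.iter().find(|(name, _)| name == k).map(|(_, v)| v)
            }
            (Json::Array(items), Segment::Index(n)) => items.get(*n),
            // Key on a non-object or index on a non-array: nothing to resolve.
            _ => None,
        };
        match next {
            Some(v) => node = v,
            None => return Ok(None),
        }
    }
    Ok(Some(node))
}
```

Keeping parsing separate from walking makes the two failure modes distinct: `parse_path` is the only place that can return an error, so any path that parses can at worst resolve to nothing.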
Throwaway `examples/rust/sqlparser_probe.rs` deleted along with its `[[example]]` entry in `Cargo.toml` (the file was used to inspect sqlparser's AST shapes for `DataType::JSON` / `DataType::Custom` during the design pass; not needed in main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

- `JSON` (and `JSONB` alias) column type, stored as canonical UTF-8 text, validated via `serde_json::from_str` at INSERT/UPDATE time. Same shape as SQLite's JSON1 extension.
- `json_extract` / `json_type` / `json_array_length` / `json_object_keys`, with a small JSONPath subset (`$`, `.key`, `[N]`, chained).

## Why text-storage instead of bincoded `serde_json::Value`?

Phase 7 plan Q3 originally proposed bincode, but bincode was removed from the engine in Phase 3c when cell-based encoding replaced it. Rather than re-add bincode for one column type, JSON-as-canonical-text matches SQLite and reuses the existing Text storage path. The scope correction is recorded inline in `docs/phase-7-plan.md`.

## What landed

- `DataType::Json` variant; `JSONB` parses as an alias
- `serde_json::from_str::<Value>`; malformed JSON rejected with a typed error before any write
- `json_extract`, `json_type`, `json_array_length`, `json_object_keys` (~370 LOC in `src/sql/executor.rs`)
- `walk_json_path`: `$`, `.key`, `[N]`, chained. Well-formed but unresolved → `NULL`; malformed → typed error
- `round_trip_preserves_json_column`.

## Notable divergence from SQLite

`json_object_keys` is a table-valued function in real SQLite (one row per key). SQLRite has no set-returning functions yet, so we return the keys as a JSON-array text (e.g. `'["a","b","c"]'`). Documented in `supported-sql.md`.

## Test plan

- `cargo fmt --all -- --check`: clean
- `cargo check --workspace --exclude sqlrite-desktop --all-targets`: clean
- `cargo test --workspace --exclude sqlrite-desktop --exclude sqlrite-python --exclude sqlrite-nodejs`: 245 + 9 + 8 + 4 pass
- `cargo clippy --workspace --exclude sqlrite-desktop --all-targets`: 0 errors, 38 pre-existing warnings (none new)
- `cargo run --example quickstart` still builds
- `CREATE TABLE events (id INTEGER PRIMARY KEY, payload JSON)`, INSERT both well-formed and malformed JSON, run `json_extract` / `json_type` / `json_array_length` / `json_object_keys` against the well-formed row, save + reopen + repeat

## Docs touched

- `docs/roadmap.md`: 7e flipped to ✅ with corrected storage description
- `docs/supported-sql.md`: JSON column type added to types table; four new function rows; new "JSON path syntax" subsection with runnable example
- `docs/phase-7-plan.md`: Q3 scope correction recorded inline

## Cleanup

Throwaway `examples/rust/sqlparser_probe.rs` (used to inspect sqlparser's `DataType::JSON` / `DataType::Custom` AST shapes during design) deleted along with its `[[example]]` entry in `Cargo.toml`.

## Next up after merge

Cut `v0.1.16` Release PR via `release-pr.yml` dispatch, then pick from:

- `ask()` API across the product surface: natural-language → SQL via Anthropic)
- `sqlrite-mcp` binary)
- `<->` `<=>` `<#>`)

🤖 Generated with Claude Code