feat(oracledb): native JSON, VECTOR ergonomics, smart LOB coercion #430

Merged
cofin merged 18 commits into main from feat/oracle-type-coercion-overhaul
Apr 27, 2026

cofin commented Apr 26, 2026

Summary

Overhauls Oracle's type coercion path so the common cases just work and the
uncommon cases have an explicit escape hatch.

Native JSON

  • 21c+ binds Python dict / list directly via DB_TYPE_JSON (binary OSON);
    19c-20c falls back to BLOB CHECK (... IS JSON); pre-19c uses
    CLOB CHECK (... IS JSON). The right path is picked from the server's
    major version, cached on the connection.
  • The default JSON serializer strategy is now "driver" so the binary path
    isn't skipped by an upstream string serialization.
  • Output side: DB_TYPE_JSON columns return Python objects as-is, and BLOB /
    CLOB columns whose type_name includes JSON are auto-parsed.
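
The version-based routing described above can be sketched as a small pure function. This is only an illustration: `pick_json_strategy` and the three string labels are hypothetical names, not the adapter's API, and the version cut-offs are taken from this description (unset versions are treated as 21c+, as the handler commit states).

```python
from __future__ import annotations

# Hypothetical labels for the three storage strategies named in the summary.
NATIVE_JSON = "DB_TYPE_JSON"
BLOB_JSON = "BLOB CHECK (... IS JSON)"
CLOB_JSON = "CLOB CHECK (... IS JSON)"


def pick_json_strategy(server_major: int | None) -> str:
    """Map an Oracle major version to a JSON bind strategy (illustrative only)."""
    # An unknown version is treated like 21c+, matching the stated default.
    if server_major is None or server_major >= 21:
        return NATIVE_JSON
    if server_major >= 19:
        return BLOB_JSON
    return CLOB_JSON
```

Because the major version is cached on the connection, a lookup like this would presumably run per bind without touching the server.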

VECTOR ergonomics (Oracle 23ai)

  • list[float], list[int], tuple[...], array.array, and np.ndarray
    all bind to DB_TYPE_VECTOR with no flag toggle. Integer sequences in the
    int8 range pack as int8; everything else falls back to float32.
  • New vector_return_format driver feature ("numpy" / "list" / "array")
    controls how VECTOR reads materialize. Defaults to "numpy" when NumPy is
    installed, "list" otherwise. Raises RuntimeError if "numpy" is requested
    without NumPy installed, and ValueError for an unrecognized format.
  • Module renamed from _numpy_handlers to _vector_handlers to reflect
    the broader payload coverage. Public API (numpy_converter_in, etc.)
    is unchanged.
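
A minimal sketch of the int8 / float32 packing rule, using only the standard library. `pack_vector` is a hypothetical helper, not the adapter's handler; returning `None` for empty, boolean, or non-numeric sequences is an assumption standing in for "let another handler claim this value".

```python
from __future__ import annotations

import array
from typing import Any

INT8_MIN, INT8_MAX = -128, 127


def pack_vector(values: Any) -> array.array | None:
    """Pack a numeric list/tuple for a VECTOR bind, or return None if not claimed."""
    if not isinstance(values, (list, tuple)) or not values:
        return None  # assumption: empty or non-sequence input is not claimed
    # bool is a subclass of int, so it has to be rejected explicitly.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in values):
        return None
    if all(isinstance(v, int) and INT8_MIN <= v <= INT8_MAX for v in values):
        return array.array("b", values)  # signed 8-bit integers
    return array.array("f", [float(v) for v in values])  # 32-bit floats
```

Under these assumptions `pack_vector([1, 2, 3])` yields typecode `"b"`, while a single out-of-range or fractional element moves the whole vector to typecode `"f"`.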

Smart LOB coercion

  • New typed wrappers — OracleClob, OracleBlob, OracleJson — let users
    bypass the size heuristics when they want explicit control. OracleClob(bytes)
    decodes utf-8 before binding; OracleBlob(str) encodes utf-8;
    OracleJson(...) defers to the JSON handler chain so the value never gets
    coerced into a CLOB intermediary.
  • Wrappers work for both named ({"col": OracleClob(...)}) and positional
    ((1, OracleClob(...))) bind shapes.
  • The 4000 / 2000 byte thresholds are now driver_features settings —
    oracle_varchar2_byte_limit and oracle_raw_byte_limit — so users on
    databases with MAX_STRING_SIZE=EXTENDED can opt into 32767-byte VARCHAR2
    without auto-coercion to CLOB.
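
The wrapper behavior can be illustrated with stand-in classes. The classes below reuse the names from this description but are not the library's implementation: they are simplified containers, and `unwrap` is a hypothetical helper that only shows the utf-8 normalization and the JSON pass-through, not the actual LOB binding.

```python
from __future__ import annotations

from typing import Any


class _Wrapper:
    """Pure container: holds the value and performs no validation."""

    __slots__ = ("value",)

    def __init__(self, value: Any) -> None:
        self.value = value


class OracleClob(_Wrapper):  # stand-in, not the real class
    __slots__ = ()


class OracleBlob(_Wrapper):  # stand-in, not the real class
    __slots__ = ()


class OracleJson(_Wrapper):  # stand-in, not the real class
    __slots__ = ()


def unwrap(param: Any) -> Any:
    """Resolve a wrapper to the Python value that would be handed to the bind."""
    if isinstance(param, OracleClob):
        v = param.value
        return v.decode("utf-8") if isinstance(v, (bytes, bytearray)) else v
    if isinstance(param, OracleBlob):
        v = param.value
        return v.encode("utf-8") if isinstance(v, str) else v
    if isinstance(param, OracleJson):
        return param.value  # left intact so a JSON handler can claim it
    return param


named = {"body": OracleClob(b"hello"), "raw": OracleBlob("bytes please")}
resolved = {key: unwrap(value) for key, value in named.items()}
```

Unwrapped values pass through untouched, which is what lets the size heuristics keep handling plain `str` and `bytes` parameters.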

cofin added 18 commits April 26, 2026 15:13
Adds sqlspec/adapters/oracledb/_json_handlers.py implementing:

- json_converter_in_clob / json_converter_in_blob — serialize Python
  dict/list/tuple to JSON string / UTF-8 bytes for CLOB / BLOB binding.
- json_converter_out_clob / json_converter_out_blob — parse JSON read
  back into native Python values.
- json_input_type_handler — column-aware routing across server versions:
  21c+ → DB_TYPE_JSON (binary OSON), 19c-20c → DB_TYPE_BLOB+OSON,
  12c-18c → DB_TYPE_CLOB+JSON-string. Reads major from cursor connection
  attribute _sqlspec_oracle_major; defaults to 21c+ when unset.
- json_output_type_handler — passthrough for native DB_TYPE_JSON; claims
  BLOB / CLOB columns whose type_name carries JSON for round-trip parse.
- register_json_handlers — chaining-aware install (preserves any existing
  inputtypehandler / outputtypehandler via wrapper classes).

The input handler explicitly does NOT claim list[float] / tuple[float, ...]
so the vector handler retains ownership of embedding payloads.

Re-exports the public surface in sqlspec/adapters/oracledb/__init__.py.

Tests: 36 new unit tests in tests/unit/adapters/test_oracledb/
test_json_handlers.py covering converters, input dispatch table across
server majors, output passthrough/claim/ignore matrix, chaining, and
round-trip integrity.

Beads: closes C1.T1 (sqlspec-ffm), C1.T5 (sqlspec-3la), C1.T6 (sqlspec-ci4).
Part of sqlspec-aa9 (C1: Native JSON binding pipeline) under sqlspec-i6j.
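
The chaining-aware install mentioned in this commit could look roughly like the wrapper below. It is a sketch under assumptions: `ChainedInputHandler` is a hypothetical name, and trying the new handler before the previously installed one is a guess at the ordering. The `(cursor, value, arraysize)` signature is the one python-oracledb uses for input type handlers.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

Handler = Callable[[Any, Any, int], Any]


class ChainedInputHandler:
    """Try `primary` first; fall back to `previous` when it returns None."""

    def __init__(self, primary: Handler, previous: Optional[Handler]) -> None:
        self._primary = primary
        self._previous = previous

    def __call__(self, cursor: Any, value: Any, arraysize: int) -> Any:
        result = self._primary(cursor, value, arraysize)
        if result is None and self._previous is not None:
            return self._previous(cursor, value, arraysize)
        return result


# Fake handlers that return marker strings instead of real bind variables.
def json_handler(cursor: Any, value: Any, arraysize: int) -> Any:
    return "json-var" if isinstance(value, (dict, list)) else None


def existing_handler(cursor: Any, value: Any, arraysize: int) -> Any:
    return "uuid-var" if isinstance(value, bytes) else None


chained = ChainedInputHandler(json_handler, existing_handler)
```

Returning `None` from every link leaves the value to the driver's default binding, so installing the wrapper does not change behavior for types no handler claims.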
Activates the JSON handlers from the prior commit by installing them at
session-callback time and flipping the parameter profile so dict / list
parameters are no longer pre-serialised at parameter prep.

config.py:
- Adds an _extract_oracle_major(connection) helper that parses the leading
  numeric component (the major version) of connection.version into an int
  (None when unavailable).
- Both sync and async _init_connection now register JSON handlers
  unconditionally (no driver-feature flag — native DB_TYPE_JSON is the
  correct path on every supported Oracle), and stash the major version
  on the connection as _sqlspec_oracle_major so the handler avoids
  per-bind metadata queries.
- Imports register_json_handlers from sqlspec.adapters.oracledb._json_handlers.

core.py:
- DriverParameterProfile.json_serializer_strategy flips from "helper" to
  "driver". The "helper" strategy installed to_json into type_coercion_map
  for dict/list/tuple, which pre-serialised every JSON parameter to a
  string before binding. "driver" mode passes the encoder/decoder onto
  ParameterStyleConfig but leaves dict/list payloads intact, so they
  reach cursor.execute and the JSON inputtypehandler claims them.
- requires_session_callback now returns True unconditionally (the JSON
  handler registration must always fire) and ignores driver_features.

Bundles unrelated dep drift the worktree was already carrying:
- ruff pre-commit pin v0.15.11 → v0.15.12.
- click-extra 7.13.0 → 7.14.0 (uv.lock).

Verification: 1212 unit tests + 4 skipped, ruff + mypy clean.

Beads: closes C1.T2 (sqlspec-dzp), C1.T3 (sqlspec-b5f).
Part of sqlspec-aa9 (C1: Native JSON binding pipeline) under sqlspec-i6j.
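
A minimal sketch of the major-version parsing described for config.py, assuming `connection.version` is a dotted string such as "23.4.0.24.05". `extract_major` and `FakeConnection` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import Any


def extract_major(connection: Any) -> int | None:
    """Return the leading numeric component of connection.version, or None."""
    version = getattr(connection, "version", None)
    if not isinstance(version, str):
        return None
    head = version.split(".", 1)[0].strip()
    if head.isascii() and head.isdigit():
        return int(head)
    return None


class FakeConnection:
    """Hypothetical stand-in exposing only a version attribute."""

    def __init__(self, version: Any) -> None:
        self.version = version


major = extract_major(FakeConnection("23.4.0.24.05"))
```

Stashing the result on the connection once, as the commit describes, keeps this parse off the per-bind path.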
Verifies the C1 contract end-to-end against an Oracle 23ai container:

- dict payloads round-trip bit-identical through native JSON columns —
  no createlob workaround needed regardless of payload size.
- list[dict] payloads round-trip.
- Large dicts (>4000 bytes serialised, ~8000-byte string + 500-elem int
  list) bind via DB_TYPE_JSON, proving the helper-strategy CLOB-coercion
  path is no longer triggered for native JSON columns.
- bool / None / int / nested list / nested dict survive round-trip.
- executemany over multiple dicts.
- Sync driver parity.

Float values in native JSON columns come back as decimal.Decimal per
python-oracledb's default OSON-numeric coercion; tracked as a separate
follow-up concern (sqlspec-ohv).

Verification: 6 new integration tests pass; 32 adjacent integration
tests (test_msgspec_clob, test_numpy_vectors, test_uuid_binary) still
pass — no regressions in CLOB / vector / UUID handler chains.

Beads: closes C1.T7 (sqlspec-5lo).
Part of sqlspec-aa9 (C1: Native JSON binding pipeline) under sqlspec-i6j.

The handlers cover all DB_TYPE_VECTOR-bound Python sequences (ndarray,
array.array, list, tuple), not just NumPy arrays. Renaming the module
reflects the broader scope ahead of C3.T2/T3, which extend the input
handler to claim list/tuple sequences and add vector_return_format
routing. Public symbols keep their numpy_* prefix; the user-facing
rename is a separate follow-up.

DTYPE_TO_ARRAY_CODE gains int16 ('h') and int32 ('i') unconditionally
and float16 ('e') on Python 3.13+ (where array.array first accepted
the typecode). Per-typecode names are hoisted to module constants so
PLR2004 magic-value checks do not fire when downstream tasks use them
in the input handler's dispatch.

Internal imports in __init__.py, config.py, and type_converter.py move
to the new path; the package re-exports the module under the new
``vector_handlers`` alias and drops the old ``numpy_handlers`` alias.

Adds vector_return_format to OracleDriverFeatures with the documented
"numpy" (NumPy installed) / "list" (otherwise) default. Wired through
apply_driver_features so downstream session-callback code can consume
the value without re-deriving the policy. Unblocks C3.T2/T3 which
read this field at handler-registration time.

- core.py: setdefault("vector_return_format", "numpy" if NUMPY_INSTALLED else "list")
- config.py: NotRequired[str] field + docstring entry
- tests/unit/adapters/test_oracledb/test_core_driver_features.py: 5 new tests

…ispatch

Closes the C3 vector ergonomics gap: list[float] / tuple[float] / list[int]
embeddings from LLM clients now bind directly to DB_TYPE_VECTOR with no
manual conversion, and the read path dispatches to numpy / list / array via
a per-connection vector_return_format setting.

C3.T2 — _input_type_handler: claims numpy.ndarray (existing path),
array.array (passthrough), and list/tuple of int|float (auto-pack as int8
when entirely in [-128, 127] else float32). Bool sequences and list[dict]
fall through so the JSON handler can claim them.

C3.T3 — _output_type_handler: reads connection._sqlspec_vector_return_format
and dispatches to numpy_converter_out / list / passthrough. RuntimeError
when "numpy" requested without numpy installed; ValueError on invalid format.

C3.T4 — config._init_connection (sync + async): stashes
vector_return_format on the connection alongside the C1 oracle_major
cache. register_numpy_handlers now runs unconditionally so pure-Python
list[float] binds work without enable_numpy_vectors=True.

Tests: new tests/unit/adapters/test_oracledb/test_vector_handlers.py
(18 tests covering the dispatch matrix). test_numpy_handlers.py tests
that asserted "skip register when NUMPY_INSTALLED=False" updated to the
always-register policy. Full unit suite: 140 passed.
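
The read-side dispatch from C3.T3 can be sketched as follows, assuming the fetched VECTOR value arrives as an `array.array`. `materialize_vector` is a hypothetical name; the error types mirror the ones named in this commit message.

```python
from __future__ import annotations

import array
from typing import Any

try:  # NumPy is optional in this sketch
    import numpy as np
except ImportError:
    np = None

VALID_FORMATS = ("numpy", "list", "array")


def materialize_vector(value: array.array, return_format: str) -> Any:
    """Convert a fetched vector according to the configured return format."""
    if return_format not in VALID_FORMATS:
        raise ValueError(f"invalid vector_return_format: {return_format!r}")
    if return_format == "array":
        return value  # passthrough
    if return_format == "list":
        return value.tolist()
    if np is None:
        raise RuntimeError('vector_return_format="numpy" requires NumPy')
    return np.array(value)


vec = array.array("f", [1.0, 2.0])
as_list = materialize_vector(vec, "list")
```

Validating the format before checking for NumPy means a typo in the setting is reported as a `ValueError` even on machines without NumPy.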
…nding (C2.T1)

New module `sqlspec.adapters.oracledb._param_types` adds three slot-based
wrapper classes — OracleClob, OracleBlob, OracleJson — that let users
override the size-based heuristics in coerce_large_parameters_*. The
wrappers are pure containers (no validation in __init__); type discipline
moves to the T2 routing site where errors can carry database context.

Per chapter-2/spec.md §3 T1. Sets the import surface for T2's
wrapper-aware coercion and T6's __init__.py re-export.

coerce_large_parameters_sync / _async now route Oracle{Clob,Blob,Json}
wrappers ahead of the size-based fallback so power users can express
explicit type intent. OracleClob(bytes) decodes utf-8; OracleBlob(str)
encodes utf-8; OracleJson unwraps so the C1 input handler claims the
value.

Adds OracleBlob/OracleClob/OracleJson to the package re-export surface
and 12 unit tests (6 sync + 6 async) covering the wrapper paths plus
the user-configurable threshold-override path that T3 will wire up.

The new _param_types.py (C2.T1) defines three slot-based wrapper
classes that sit on the parameter-binding hot path for every Oracle
execute. Pure-Python with no conditional imports or metaclass tricks —
clean mypyc target.

Audit of the existing C1/C3 handler modules (_json_handlers,
_vector_handlers, _uuid_handlers) for similar inclusion is tracked as
sqlspec-llu (needs a wheel-build verification pass).

…/T4/T5)

The 4000/2000-byte thresholds previously baked in as module constants
in driver.py now live in driver_features. dispatch_execute (sync +
async) reads oracle_varchar2_byte_limit / oracle_raw_byte_limit from
self.driver_features.get(...) so MAX_STRING_SIZE=EXTENDED databases
can opt into 32767-byte VARCHAR2.

apply_driver_features fills the defaults so the dispatch fallback
path is a one-time bootstrap concern, not a per-call default. The
OracleDriverFeatures TypedDict advertises both fields and the
docstring documents the EXTENDED scenario.

6 new unit tests cover the defaults, user-override preservation, and
TypedDict surface.
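
How the configurable thresholds might interact with the size heuristic, as a rough sketch. The two feature keys and their 4000 / 2000 defaults come from this PR; `with_default_limits` and `needs_lob` are hypothetical helpers, and measuring strings by their utf-8 byte length with a strict `>` comparison is an assumption.

```python
from __future__ import annotations

from typing import Any

DEFAULT_VARCHAR2_BYTE_LIMIT = 4000
DEFAULT_RAW_BYTE_LIMIT = 2000


def with_default_limits(driver_features: dict[str, Any] | None) -> dict[str, Any]:
    """Fill in the byte limits once, preserving any user override."""
    features = dict(driver_features or {})
    features.setdefault("oracle_varchar2_byte_limit", DEFAULT_VARCHAR2_BYTE_LIMIT)
    features.setdefault("oracle_raw_byte_limit", DEFAULT_RAW_BYTE_LIMIT)
    return features


def needs_lob(value: Any, features: dict[str, Any]) -> bool:
    """Return True when a plain str/bytes value exceeds its configured limit."""
    if isinstance(value, str):
        return len(value.encode("utf-8")) > features["oracle_varchar2_byte_limit"]
    if isinstance(value, (bytes, bytearray)):
        return len(value) > features["oracle_raw_byte_limit"]
    return False


standard = with_default_limits(None)
extended = with_default_limits({"oracle_varchar2_byte_limit": 32767})
```

With the `extended` settings a 5000-byte string would stay a VARCHAR2 bind, whereas under `standard` it would be routed to a CLOB.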
…2.T8)

Seven cases against the Oracle 23ai container exercise the wrapper-aware
routing landed in C2.T2: OracleClob/OracleBlob/OracleJson round-trip
(async + sync), the C1 native-JSON handler claiming OracleJson without
a CLOB intermediary, the demo's bytes-payload workaround replacement,
and the threshold-override path skipped on non-EXTENDED containers.

Discovered during testing that the wrappers only fire for dict-style
parameters; positional binds bypass coerce_large_parameters_* entirely
and reach python-oracledb raw, raising DPY-3002. Tests use named binds
to match the documented contract; the positional path is filed as a
follow-up.

6 pass on Oracle 23ai (STANDARD), 1 EXTENDED-only test skipped.

…205)

coerce_large_parameters_sync / _async previously short-circuited on
anything that wasn't a dict, leaving Oracle{Clob,Blob,Json} wrappers
inside positional tuples / lists to reach python-oracledb raw and
raise DPY-3002. Routing now extends to tuple and list parameters via
the new _coerce_value_sync / _coerce_value_async helpers shared with
the dict path.

Tuples are returned as new lists when iterated — the driver's existing
cast(..., 'list[Any] | tuple[Any, ...] | dict[Any, Any] | None') keeps
this contract. The pre-existing identity-based passthrough assertion
on lists is updated to value-equality (the new path always returns a
fresh list when iterating).

13 unit tests cover OracleClob/Blob/Json + plain str threshold +
empty-tuple short-circuit across sync and async; 2 integration tests
verify the path end-to-end against Oracle 23ai.
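
The container routing described in this commit can be sketched with one shared per-value callback. `coerce_parameters` is a hypothetical name and the callback stands in for the `_coerce_value_*` helpers; the sketch only shows the dict / tuple / list traversal, including the fresh-list return and the empty-sequence short-circuit.

```python
from __future__ import annotations

from typing import Any, Callable


def coerce_parameters(params: Any, coerce_value: Callable[[Any], Any]) -> Any:
    """Apply `coerce_value` to every parameter in a dict, list, or tuple."""
    if isinstance(params, dict):
        return {key: coerce_value(value) for key, value in params.items()}
    if isinstance(params, (list, tuple)):
        if not params:
            return params  # empty sequence: nothing to coerce
        # Tuples and lists both come back as a fresh list.
        return [coerce_value(value) for value in params]
    return params  # None or any other shape is passed through


def double(value: Any) -> Any:
    """Toy per-value coercion used only for the demonstration."""
    return value * 2


positional = coerce_parameters((1, "ab"), double)
named = coerce_parameters({"n": 3}, double)
```

Returning a new list is why an identity check on list parameters no longer holds and value equality is the appropriate assertion.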
…llu)

Centralizes the runtime oracledb constants (DB_TYPE_BLOB / CLOB / JSON /
RAW) in _typing.py — already excluded from mypyc — and rewires the four
handler modules to import from there instead of doing lazy import oracledb
inside every function. Removes a per-call import lookup on the input /
output handler hot path and makes the handler modules pure mypyc targets.

Adds the three handler modules to the mypyc include glob alongside the
already-compiled _param_types.py. Wheel build with HATCH_BUILD_HOOKS_ENABLE=1
produces a cp310-linux_x86_64 wheel cleanly; 190 unit tests pass under both
interpreted and compiled imports.

Other adapters (asyncpg / psycopg / psqlpy) handle vectors via pgvector's
register_vector inside type_converter.py, which is already in the
sqlspec/adapters/**/type_converter.py glob, so there is no parallel gap there.

Extract rows[0] to a local variable so isinstance narrowing applies to
the dict/sequence branch — pyright otherwise re-evaluates rows[0] and
loses the narrowing.
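
The narrowing pattern looks roughly like this. The function and the `Row` alias are hypothetical, since the affected code is not shown here; the point is only that the `isinstance` check and the later use refer to the same local name.

```python
from __future__ import annotations

from typing import Any, Sequence, Union

Row = Union[dict, Sequence[Any]]


def first_column(rows: Sequence[Row]) -> Any:
    """Return the first column of the first row for dict or sequence rows."""
    first = rows[0]  # bind once so the isinstance check narrows `first`
    if isinstance(first, dict):
        return next(iter(first.values()))
    return first[0]
```
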
PYTHONWARNINGS scoped to google.adk.features._feature_decorator so the
[EXPERIMENTAL] PLUGGABLE_AUTH notice doesn't leak into our lint output.
The authlib.jose deprecation can't be suppressed via env vars — authlib
calls warnings.simplefilter("always", AuthlibDeprecationWarning) in its
deprecate module which resets the filter list at import time. Will
disappear when authlib ships 2.0.

Upstream regression: mysql-connector-python 9.7.0 dropped cp312/cp313/
cp314 wheels (9.6.0 had all five ABIs). CI on cp312 fails with "doesn't
have a source distribution or wheel for the current platform".

- Add per-Python-version markers on the mysql-connector extra so 3.10/
  3.11 stay unconstrained while 3.12+ caps below 9.7.
- Add a [tool.uv] override-dependencies entry so the same cap applies
  to the transitive pull through pytest-databases[mysql].

cofin merged commit 258cb64 into main on Apr 27, 2026
16 checks passed
cofin deleted the feat/oracle-type-coercion-overhaul branch on April 27, 2026 15:35