
feat(gfql): add schema Arrow boundary validation #1635

Merged
lmeyerov merged 1 commit into master from feature/issue-1339-schema-arrow-boundary
May 25, 2026

lmeyerov commented May 25, 2026

Closes #1339.
Refs #1058.

Summary

  • Add experimental public schema↔Arrow import/export APIs:
    • NodeType.from_arrow(...)
    • EdgeType.from_arrow(...)
    • EdgeTopology.from_metadata(...)
    • GraphSchema.node_arrow(), edge_arrow(), to_arrow(), from_arrow(...)
  • Add opt-in bound-schema Arrow boundary validation/coercion:
    • g.to_arrow(..., schema_validate='strict'|'autofix', schema_table='edges'|'nodes')
    • g.plot(..., schema_validate='strict'|'autofix')
    • g.upload(..., schema_validate='strict'|'autofix')
    • g.validate_arrow_schema('edges'|'nodes', validate='strict'|'autofix')
  • Preserve default behavior: schema_validate=False by default, so existing plot()/upload()/to_arrow() calls are unchanged.
  • Document the Arrow boundary recipe and update the changelog.

Design Notes

  • This is declaration import/export, not inference. GraphSchema.from_arrow(...) requires named node/edge declarations and edge topology, either directly or from the payload emitted by GraphSchema.to_arrow().
  • This does not implement gfql_remote() schema transport; #1465 (GFQL remote: send bound typed GraphSchema with gfql_remote requests) remains the remote follow-on.
  • Boundary enforcement runs after the existing pandas/cuDF-to-Arrow conversion path, so it composes with validate='autofix'.
  • schema_validate='strict' rejects missing declared columns, incompatible Arrow types, and non-null violations.
  • schema_validate='autofix' casts compatible Arrow columns to declared Arrow types after normal conversion.
  • Active node/edge type fragments are selected by true label__... markers so inactive relationship types do not force their properties.
  • Multi-type table validation treats non-nullability as type-local, avoiding false failures on rows for other active labels/types.
  • String/large-string and binary/large-binary are treated as strict-compatible logical families; autofix still casts to the declared physical Arrow type.
  • Experimental marking from #1457 (feat(gfql): add public declarative schema model) is preserved.

Coordination

LOC Buckets

  • Production: +683 / -27, net +656 LOC
  • Tests: +180 / -2, net +178 LOC
  • Docs / changelog: +52 / -4, net +48 LOC

Validation

  • Rebased over origin/master@79a61f9ab7d3efa4984007c0ae30660adbd6df4e; PR head 76065d0b4f20034ced71fd49485d94346323f662
  • python3 -m pytest -q graphistry/tests/test_plotter.py::TestPlotterArrowConversions graphistry/tests/compute/gfql/test_public_schema.py -> 50 passed, 6 skipped, 1 xfailed
  • python3 -m pytest -q docs/test_doc_examples.py -k schema -> 1 passed, 29 deselected
  • python3 -m pytest -q graphistry/tests/compute/gfql/test_public_schema.py -> 21 passed
  • ./bin/ruff.sh graphistry/schema.py graphistry/PlotterBase.py graphistry/Plottable.py graphistry/models/types.py graphistry/exceptions.py graphistry/tests/compute/gfql/test_public_schema.py -> pass
  • ./bin/typecheck.sh graphistry/schema.py graphistry/PlotterBase.py graphistry/Plottable.py graphistry/models/types.py graphistry/exceptions.py -> pass
  • CI typing follow-up: uvx --from mypy==1.14.1 mypy graphistry/PlotterBase.py -> pass after py3.8 mypy-compatible Arrow column variable/cast fix
  • git diff --check origin/master...HEAD -> pass
  • DGX RAPIDS 26.02: RAPIDS_VERSION=26.02 PROFILE=basic WITH_GPU=1 WITH_IMAGE_BUILD=0 TEST_FILES="graphistry/tests/test_plotter.py::TestPlotterArrowConversions graphistry/tests/compute/gfql/test_public_schema.py" docker/test-rapids-official-local.sh -> cudf 26.02.01, cuml 26.02.00, Python 3.13.12, 56 passed, 1 xfailed
  • DGX RAPIDS 25.02: RAPIDS_VERSION=25.02 PROFILE=basic WITH_GPU=1 WITH_IMAGE_BUILD=0 TEST_FILES="graphistry/tests/test_plotter.py::TestPlotterArrowConversions graphistry/tests/compute/gfql/test_public_schema.py" docker/test-rapids-official-local.sh -> cudf 25.02.02, cuml 25.02.01, Python 3.12.9, 56 passed, 1 xfailed
  • Review skill converged:
    • Wave 1 fixed type-local nullability for heterogeneous tables
    • Wave 2 fixed merged Arrow schema metadata preservation
    • Waves 3 and 4 clean

Pending / CI

  • Full PR CI pending.

lmeyerov force-pushed the feature/issue-1339-schema-arrow-boundary branch 3 times, most recently from 18d4ec4 to a8c9f09, on May 25, 2026 06:27
Development

Successfully merging this pull request may close these issues.

GFQL type system follow-on C: public schema-Arrow APIs + plottable boundary enforcement