feat(#1133): add Polars DataFrame and LazyFrame support #1152
Merged
pl.DataFrame and pl.LazyFrame now work in plot(), materialize_nodes(), get_degrees(), get_indegrees(), get_outdegrees(), and hypergraph() without crashing. Polars is an optional dep: no behavior change when not installed.

Changes:
- PlotterBase: maybe_polars() lazy import; _polars_hash_to_arrow memoization cache; _table_to_arrow() Polars branch (collect + to_arrow, metadata stripped); _table_to_pandas() Polars branch; _plot_dispatch() type guard
- Engine.py: resolve_engine() Polars → Engine.PANDAS branch
- ComputeMixin: _coerce_to_pandas() Polars arm (DataFrame + LazyFrame)
- hyper_dask: inline coerce block Polars arm
- test_polars.py: 17 tests covering internals (_table_to_arrow/_table_to_pandas, memoization, metadata stripping), compute (DataFrame + LazyFrame, mixed pandas/polars), and hypergraph (DataFrame + LazyFrame, entity_types filter, parity with pandas)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
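The optional-dependency handling described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: `maybe_polars_sketch` and `is_polars_table` are hypothetical names, and the real `maybe_polars()` in PlotterBase may differ in detail.

```python
import importlib
from typing import Any, Optional


def maybe_polars_sketch() -> Optional[Any]:
    """Return the polars module if it can be imported, else None.

    Hypothetical helper: the import happens only when called, so merely
    loading this file never requires polars to be installed.
    """
    try:
        return importlib.import_module("polars")
    except ImportError:
        return None


def is_polars_table(obj: Any) -> bool:
    """Type guard: True only for polars DataFrame/LazyFrame instances."""
    pl = maybe_polars_sketch()
    if pl is None:
        # polars is absent, so no object can be a polars table
        return False
    return isinstance(obj, (pl.DataFrame, pl.LazyFrame))
```

With a guard of this shape, callers that never pass Polars objects take the same code path whether or not Polars is installed.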
… cleanly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- hyper_dask: replace inline coercion block with df_to_engine (DRY: was duplicating Arrow/Spark/Polars handling that df_to_engine already owns)
- resolve_engine: Polars check now uses module-string gating consistent with df_to_engine
- test_engine_coercion: add df_to_engine Polars DataFrame/LazyFrame tests, _coerce_to_pandas Polars tests
- test_polars: add hop/chain end-to-end tests with Polars and LazyFrame edges; remove unused pd_mod import

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
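"Module-string gating" here means deciding whether an object is a Polars frame by inspecting the module name of its type, so that Polars is never imported just to run an `isinstance` check. A minimal sketch, assuming a hypothetical helper `looks_like_polars` and a stand-in class whose module path is set by hand (neither is taken from the PR):

```python
from typing import Any


def looks_like_polars(obj: Any) -> bool:
    """Hypothetical gate: inspect the type's module name, import nothing."""
    module = type(obj).__module__ or ""
    return module == "polars" or module.startswith("polars.")


class FakeFrame:
    """Stand-in for a polars frame; used only to exercise the gate."""


# Assumed module path, set manually for illustration.
FakeFrame.__module__ = "polars.dataframe.frame"

print(looks_like_polars(FakeFrame()))   # gate matches on the module string
print(looks_like_polars({"a": [1]}))    # builtins.dict does not match
```

Only after the gate matches would a caller import Polars and do an exact type check.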
…ain tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mismatch on dgx-spark

hop() tests already cover the coerce-at-boundary path; _coerce_to_pandas unit tests in test_engine_coercion.py cover the chain entry point directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gfql coercion tests

- to_pandas() raised ValueError for non-cuDF types (polars, spark, arrow, dask); delegate to df_to_engine so all input formats are handled uniformly
- rename _is_cudf → _is_gpu_engine with doc comment explaining dask_cudf is intentionally caught by the 'cudf' substring check
- add end-to-end chain()/gfql() tests with Arrow and Polars inputs to test_engine_coercion.py (TestChainCoercion) and test_polars.py
- add to_pandas() round-trip tests for Polars DataFrame and LazyFrame

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d combine_steps edge-case tests

- chain.py: remove redundant `resolve_engine as _resolve_engine` alias (resolve_engine already imported at module top)
- test_engine_coercion.py: add TestCombineStepsEdgeCases covering two previously untested paths:
  - output_max_hops + has_na (isin([]) accumulation in combine_steps)
  - named node + e_undirected (df_concat(engine) undirected allowed_ids path)

Both paths tested for pandas (CPU) and cuDF (GPU, skipUnless HAS_CUDF).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Plottable.hop, normalize engine str in chain()

- Duplicate class left by rebase conflict resolution caused F811 ruff error
- Plottable.hop() missing engine kwarg caused mypy call-arg error from ast.py
- chain() calling resolve_engine(engine) with Union[EngineAbstract|str] before normalizing caused mypy arg-type error; normalize str→EngineAbstract first

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xclude config

Duplicate INI section caused mypy to ignore the [mypy] exclude pattern, making it check all 367 test files instead of 223 source files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dule-level classes

- PlotterBase.py: except ImportError: 1 -> pass (no-op typo)
- test_polars.py: replace if/else block with module-level classes + @skipUnless; pytest/unittest now discover tests correctly when polars is absent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
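The module-level class plus `@skipUnless` pattern mentioned in this commit might look like the sketch below. `HAS_POLARS` and `TestPolarsSketch` are hypothetical names, and the test body is a placeholder rather than one of the PR's tests:

```python
import importlib.util
import unittest

# Detect polars without importing it; the name HAS_POLARS is an assumption.
HAS_POLARS = importlib.util.find_spec("polars") is not None


@unittest.skipUnless(HAS_POLARS, "polars not installed")
class TestPolarsSketch(unittest.TestCase):
    """Defined at module level so test runners always discover the class."""

    def test_edge_frame_has_two_rows(self):
        # Import inside the test so collecting this module never needs polars.
        import polars as pl

        edges = pl.DataFrame({"s": ["a", "b"], "d": ["b", "c"]})
        self.assertEqual(edges.height, 2)
```

Because the class always exists, a runner reports it as skipped when Polars is absent instead of silently collecting nothing, which is the failure mode of wrapping class definitions in an if/else block.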
- TestDbgDf: None/pd/arrow/no-len
- TestSNa: PANDAS returns pd.NA, DASK raises
- TestDfToEngineDask: pandas/arrow→dask, identity

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing updates

- CI: add test-polars job (py3.9–3.14) with lockfile profile and bin/test-polars.sh; add 'polars' extra to setup.py so polars tests actually run in CI (I-1)
- df_executor.py: move _otel_attrs() collection after engine fallback so gfql.engine span attribute reflects actual execution engine, not requested (S-2)
- Engine.py resolve_engine: wrap polars import in try/except ImportError matching the pyspark pattern above it (S-3)
- PlotterBase.py maybe_polars: use '1' instead of 'pass' for ImportError handler, consistent with all other maybe_*() functions (S-6)
- PlotterBase.py _table_to_arrow: add comment that validate_mode is not applied for polars (strictly typed, mixed-type columns impossible) (S-5)
- PlotterBase.py docstrings: add polars/spark to _table_to_pandas and _table_to_arrow type summaries (S-1)
- CHANGELOG.md: add CI/Polars infrastructure entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s to compute/

Mirrors source layout: all three test files cover graphistry/compute/ code and belong under graphistry/tests/compute/ per the existing convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds Polars `DataFrame` and `LazyFrame` support across all graphistry paths (closes #1133).

Stacked on PR #1148 (`refactor/coerce-unification`); merge that first.

Phase 2 changes (Polars feature):
- `Engine.py`: `df_to_engine` + `resolve_engine` handle `polars.DataFrame` / `polars.LazyFrame` via module-string gating (no speculative imports)
- `PlotterBase.py`: `maybe_polars()`, `_table_to_pandas`, `_table_to_arrow` (with memoization + metadata stripping), `_make_dataset` guard
- `hyper_dask.py`: inline coercion block replaced with `df_to_engine` call (DRY)
- `mypy.ini`: `[mypy-polars.*] ignore_missing_imports = True`

Phase 3 changes (architecture fixes, committed to this branch):
- `to_pandas()` was broken for all non-cuDF types (polars, spark, arrow, dask) and raised `ValueError`; it now delegates to `df_to_engine` uniformly
- `_coerce_to_pandas`: renamed inner `_is_cudf` → `_is_gpu_engine` with doc comment explaining the `dask_cudf` substring match is intentional
- `compute/ast.py`: synced `execute()` method present in local branch but missing from dgx-spark repo

Tests added:
- `graphistry/tests/test_polars.py`: `TestPolarsInternals`, `TestPolarsCompute`, `TestPolarsHopChain` (hop + chain + gfql + to_pandas), `TestPolarsHypergraph`
- `graphistry/tests/test_engine_coercion.py`: `TestToPandas` (all input types), `TestChainCoercion` (arrow + polars + dask end-to-end)

Test plan
🤖 Generated with Claude Code