feat(#1133): add Polars DataFrame and LazyFrame support #1152

Merged
lmeyerov merged 16 commits into master from
feat/issue-1133-polars
Apr 24, 2026

@lmeyerov lmeyerov commented Apr 19, 2026

Summary

Adds Polars DataFrame and LazyFrame support across all graphistry paths (closes #1133).
Stacked on PR #1148 (refactor/coerce-unification) — merge that first.

Phase 2 changes (Polars feature):

  • Engine.py: df_to_engine + resolve_engine handle polars.DataFrame / polars.LazyFrame via module-string gating (no speculative imports)
  • PlotterBase.py: maybe_polars(), _table_to_pandas, _table_to_arrow (with memoization + metadata stripping), _make_dataset guard
  • hyper_dask.py: inline coercion block replaced with df_to_engine call (DRY)
  • mypy.ini: [mypy-polars.*] ignore_missing_imports = True
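
The module-string gating mentioned for Engine.py can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `is_polars_frame` and the stand-in `FakeFrame` class are hypothetical names, and the idea is simply to inspect the type's `__module__` string so that polars is never imported just to run an `isinstance` check.

```python
def is_polars_frame(obj) -> bool:
    """Guess whether obj is a polars DataFrame/LazyFrame without importing polars.

    Hypothetical helper: it only looks at the type's module path and class name.
    """
    cls = type(obj)
    module = getattr(cls, "__module__", "") or ""
    top_level = module.split(".")[0]
    return top_level == "polars" and cls.__name__ in ("DataFrame", "LazyFrame")


# Stand-in class so the sketch runs even when polars is not installed.
FakeFrame = type("DataFrame", (), {"__module__": "polars.dataframe.frame"})

print(is_polars_frame(FakeFrame()))  # True
print(is_polars_frame([1, 2, 3]))    # False
```

Only after such a check passes would the real import of polars be attempted, which keeps the optional dependency off the import path for pandas-only users.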

Phase 3 changes (architecture fixes, committed to this branch):

  • to_pandas() was broken for all non-cuDF types (polars, spark, arrow, dask) — raised ValueError; now delegates to df_to_engine uniformly
  • _coerce_to_pandas: renamed inner _is_cudf → _is_gpu_engine, with a doc comment explaining that the dask_cudf substring match is intentional
  • compute/ast.py: synced execute() method present in local branch but missing from dgx-spark repo
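
The intentional substring match on the GPU check can be illustrated with a small sketch. The function name and the stand-in classes below are assumptions for illustration only; the point is that a single `'cudf'` substring test on the module path also matches `dask_cudf`.

```python
def is_gpu_engine_frame(obj) -> bool:
    """Hypothetical check: True for frames whose module path mentions cudf.

    'dask_cudf' contains 'cudf', so one substring test covers both libraries.
    """
    return "cudf" in (type(obj).__module__ or "")


# Stand-in classes so the sketch runs without any GPU libraries installed.
FakeCudf = type("DataFrame", (), {"__module__": "cudf.core.dataframe"})
FakeDaskCudf = type("DataFrame", (), {"__module__": "dask_cudf.core"})
FakePandas = type("DataFrame", (), {"__module__": "pandas.core.frame"})

print(is_gpu_engine_frame(FakeCudf()))      # True
print(is_gpu_engine_frame(FakeDaskCudf()))  # True
print(is_gpu_engine_frame(FakePandas()))    # False
```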

Tests added:

  • graphistry/tests/test_polars.py: TestPolarsInternals, TestPolarsCompute, TestPolarsHopChain (hop + chain + gfql + to_pandas), TestPolarsHypergraph
  • graphistry/tests/test_engine_coercion.py: TestToPandas (all input types), TestChainCoercion (arrow + polars + dask end-to-end)

Test plan

  • dgx-spark RAPIDS 25.02: 55 passed, 1 skipped
  • dgx-spark RAPIDS 26.02: 55 passed, 1 skipped
  • Local CPU: 15 passed, 18 skipped (optional deps absent)
  • CI green

🤖 Generated with Claude Code

@lmeyerov lmeyerov force-pushed the feat/issue-1133-polars branch from 51c6c8e to 34362a6 Compare April 19, 2026 20:22
@lmeyerov lmeyerov force-pushed the refactor/coerce-unification branch from 56ae458 to 421f898 Compare April 21, 2026 13:51
@lmeyerov lmeyerov force-pushed the feat/issue-1133-polars branch 10 times, most recently from 92c1048 to 9531f65 Compare April 23, 2026 02:09
@lmeyerov lmeyerov force-pushed the refactor/coerce-unification branch from 63d245a to 4dbb2fb Compare April 24, 2026 04:01
@lmeyerov lmeyerov force-pushed the feat/issue-1133-polars branch from eab8a78 to c94e7d0 Compare April 24, 2026 04:02
Base automatically changed from refactor/coerce-unification to master April 24, 2026 06:22
lmeyerov and others added 14 commits April 23, 2026 23:24
pl.DataFrame and pl.LazyFrame now work in plot(), materialize_nodes(),
get_degrees(), get_indegrees(), get_outdegrees(), and hypergraph() without
crashing. Polars is an optional dep — no behavior change when not installed.

Changes:
- PlotterBase: maybe_polars() lazy import; _polars_hash_to_arrow memoization
  cache; _table_to_arrow() Polars branch (collect + to_arrow, metadata stripped);
  _table_to_pandas() Polars branch; _plot_dispatch() type guard
- Engine.py: resolve_engine() Polars → Engine.PANDAS branch
- ComputeMixin: _coerce_to_pandas() Polars arm (DataFrame + LazyFrame)
- hyper_dask: inline coerce block Polars arm
- test_polars.py: 17 tests — internals (_table_to_arrow/_table_to_pandas,
  memoization, metadata stripping), compute (DataFrame + LazyFrame, mixed
  pandas/polars), hypergraph (DataFrame + LazyFrame, entity_types filter,
  parity with pandas)
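
The optional-dependency behaviour described above (no behavior change when Polars is not installed) usually comes down to a lazy import that returns nothing on failure. A minimal sketch, assuming a generic helper named `maybe_import` (hypothetical, not the `maybe_polars()` body from this PR):

```python
import importlib


def maybe_import(name: str):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        # Optional dependency is absent; callers fall back to other code paths.
        return None


pl = maybe_import("polars")
if pl is None:
    print("polars not installed; polars branches are skipped")
else:
    print("polars available")
```

Callers can then guard every Polars branch on the returned value being non-None, so pandas-only installs never touch the import.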

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… cleanly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- hyper_dask: replace inline coercion block with df_to_engine (DRY — was duplicating Arrow/Spark/Polars handling that df_to_engine already owns)
- resolve_engine: Polars check now uses module-string gating consistent with df_to_engine
- test_engine_coercion: add df_to_engine Polars DataFrame/LazyFrame tests, _coerce_to_pandas Polars tests
- test_polars: add hop/chain end-to-end tests with Polars and LazyFrame edges; remove unused pd_mod import

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ain tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mismatch on dgx-spark

hop() tests already cover the coerce-at-boundary path; _coerce_to_pandas
unit tests in test_engine_coercion.py cover the chain entry point directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…gfql coercion tests

- to_pandas() raised ValueError for non-cuDF types (polars, spark, arrow, dask);
  delegate to df_to_engine so all input formats are handled uniformly
- rename _is_cudf → _is_gpu_engine with doc comment explaining dask_cudf is
  intentionally caught by the 'cudf' substring check
- add end-to-end chain()/gfql() tests with Arrow and Polars inputs to
  test_engine_coercion.py (TestChainCoercion) and test_polars.py
- add to_pandas() round-trip tests for Polars DataFrame and LazyFrame

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d combine_steps edge-case tests

- chain.py: remove redundant `resolve_engine as _resolve_engine` alias (resolve_engine already
  imported at module top)
- test_engine_coercion.py: add TestCombineStepsEdgeCases covering two previously untested paths:
  - output_max_hops + has_na (isin([]) accumulation in combine_steps)
  - named node + e_undirected (df_concat(engine) undirected allowed_ids path)
  Both paths tested for pandas (CPU) and cuDF (GPU, skipUnless HAS_CUDF)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…Plottable.hop, normalize engine str in chain()

- Duplicate class left by rebase conflict resolution caused F811 ruff error
- Plottable.hop() missing engine kwarg caused mypy call-arg error from ast.py
- chain() calling resolve_engine(engine) with Union[EngineAbstract|str] before
  normalizing caused mypy arg-type error; normalize str→EngineAbstract first
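
The str→EngineAbstract normalization in the last item can be sketched as below. The enum here is a stand-in defined only for this example (its members are assumed, not copied from Engine.py), and `normalize_engine` is a hypothetical name; the sketch shows why normalizing first gives the downstream call a single, well-typed argument.

```python
from enum import Enum
from typing import Union


class EngineAbstract(Enum):
    """Stand-in enum for illustration; members are assumptions."""
    PANDAS = "pandas"
    CUDF = "cudf"
    AUTO = "auto"


def normalize_engine(engine: Union[EngineAbstract, str]) -> EngineAbstract:
    """Convert a string engine name to the enum before further resolution."""
    if isinstance(engine, str):
        # Enum lookup by value; raises ValueError for unknown names.
        return EngineAbstract(engine)
    return engine


print(normalize_engine("pandas"))             # EngineAbstract.PANDAS
print(normalize_engine(EngineAbstract.AUTO))  # EngineAbstract.AUTO
```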

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…xclude config

Duplicate INI section caused mypy to ignore the [mypy] exclude pattern, making
it check all 367 test files instead of 223 source files.
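
A duplicate section of this kind can be caught mechanically. The sketch below uses Python's standard `configparser` in strict mode, which raises on a repeated section header; the sample INI text is illustrative and is not the repository's actual mypy.ini.

```python
import configparser


def find_duplicate_section(ini_text: str):
    """Return the name of the first duplicated section, or None if there is none."""
    # interpolation=None avoids '%' handling in values such as regex patterns.
    parser = configparser.ConfigParser(strict=True, interpolation=None)
    try:
        parser.read_string(ini_text)
    except configparser.DuplicateSectionError as err:
        return err.section
    return None


# Illustrative config only: the [mypy] header appears twice.
sample = """
[mypy]
exclude = tests/

[mypy-polars.*]
ignore_missing_imports = True

[mypy]
warn_unused_ignores = True
"""

print(find_duplicate_section(sample))  # mypy
```

Running a check like this in CI would flag the problem before it silently changes which files get type-checked.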

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dule-level classes

- PlotterBase.py: except ImportError: 1 -> pass (no-op typo)
- test_polars.py: replace if/else block with module-level classes + @skipUnless;
  pytest/unittest now discover tests correctly when polars is absent
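
The module-level class plus `@skipUnless` pattern looks roughly like this. It is a sketch with assumed names (`HAS_POLARS`, `TestPolarsSketch`), not the contents of test_polars.py: the class always exists at module level, so test runners discover it, and it is reported as skipped when polars is absent.

```python
import importlib.util
import unittest

# Detect the optional dependency without importing it.
HAS_POLARS = importlib.util.find_spec("polars") is not None


@unittest.skipUnless(HAS_POLARS, "polars not installed")
class TestPolarsSketch(unittest.TestCase):
    def test_shape(self):
        import polars as pl  # imported only when the test actually runs

        df = pl.DataFrame({"s": [0, 1], "d": [1, 2]})
        self.assertEqual(df.shape, (2, 2))


# Run the class directly; it is collected either way, and skipped if polars is missing.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPolarsSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Compared with wrapping class definitions in an `if`/`else` block, this keeps discovery independent of which optional packages are installed.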

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
TestDbgDf — None/pd/arrow/no-len; TestSNa — PANDAS returns pd.NA, DASK raises;
TestDfToEngineDask — pandas/arrow→dask, identity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ing updates

- CI: add test-polars job (py3.9–3.14) with lockfile profile and bin/test-polars.sh;
  add 'polars' extra to setup.py so polars tests actually run in CI (I-1)
- df_executor.py: move _otel_attrs() collection after engine fallback so
  gfql.engine span attribute reflects actual execution engine, not requested (S-2)
- Engine.py resolve_engine: wrap polars import in try/except ImportError matching
  the pyspark pattern above it (S-3)
- PlotterBase.py maybe_polars: use '1' instead of 'pass' for ImportError handler,
  consistent with all other maybe_*() functions (S-6)
- PlotterBase.py _table_to_arrow: add comment that validate_mode is not applied
  for polars (strictly typed, mixed-type columns impossible) (S-5)
- PlotterBase.py docstrings: add polars/spark to _table_to_pandas and
  _table_to_arrow type summaries (S-1)
- CHANGELOG.md: add CI/Polars infrastructure entry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lmeyerov lmeyerov force-pushed the feat/issue-1133-polars branch from 6af8360 to 0c00a9f Compare April 24, 2026 06:24
lmeyerov and others added 2 commits April 23, 2026 23:25
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…s to compute/

Mirrors source layout: all three test files cover graphistry/compute/ code
and belong under graphistry/tests/compute/ per the existing convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@lmeyerov lmeyerov merged commit 9c89124 into master Apr 24, 2026
135 checks passed
@lmeyerov lmeyerov deleted the feat/issue-1133-polars branch April 24, 2026 06:46
Development

Successfully merging this pull request may close these issues.

Feature: Add Polars DataFrame support
