Merged

16 commits
e0c217f
feat(#1133): add Polars DataFrame and LazyFrame support
lmeyerov Apr 18, 2026
94bec35
fix(ci): add polars to mypy ignore_missing_imports
lmeyerov Apr 18, 2026
afdb1ee
fix(tests): initialize pl=None before try/except so test_polars skips…
lmeyerov Apr 19, 2026
b9ca79a
docs(#1133): add Polars support to CHANGELOG
lmeyerov Apr 19, 2026
6773646
fix(#1133): audit fixes — DRY, module-string, test coverage
lmeyerov Apr 19, 2026
96e10da
fix(tests): use gfql() instead of deprecated chain() in polars hop/ch…
lmeyerov Apr 19, 2026
a4e5bd0
fix(tests): drop gfql hop/chain tests — ASTNode.execute pre-existing …
lmeyerov Apr 19, 2026
42c1705
fix(#1133): correct to_pandas(), clarify gpu-engine guard, add chain/…
lmeyerov Apr 20, 2026
5acc35d
fix(#1133): review cleanup — drop redundant _resolve_engine alias, ad…
lmeyerov Apr 21, 2026
7634c68
fix(lint): remove duplicate TestCombineStepsEdgeCases, add engine to …
lmeyerov Apr 22, 2026
9b2d87d
fix(lint): remove duplicate [mypy-polars.*] section that broke mypy e…
lmeyerov Apr 22, 2026
7ca4f3f
fix(review): pass in maybe_polars(); restructure test_polars.py to mo…
lmeyerov Apr 23, 2026
889343b
test(#1133): add unit tests for dbg_df, s_na, df_to_engine(DASK)
lmeyerov Apr 23, 2026
0c00a9f
fix(review): polars CI job, OTel attrs fix, ImportError guard, docstr…
lmeyerov Apr 23, 2026
d92da79
chore: rebase onto master after #1148 merge
lmeyerov Apr 24, 2026
49341e3
refactor(tests): move test_polars, test_engine_coercion, test_df_type…
lmeyerov Apr 24, 2026
44 changes: 43 additions & 1 deletion .github/workflows/ci.yml
@@ -891,6 +891,48 @@ jobs:
source pygraphistry/bin/activate
./bin/test-graphviz.sh

test-polars:
needs: [ test-minimal-python, test-gfql-core, generate-lockfiles ]
if: ${{ success() }}
runs-on: ubuntu-latest
timeout-minutes: 10

strategy:
matrix:
        python-version: ['3.9', '3.10', '3.11', '3.12', '3.13', '3.14']

steps:

- name: Checkout repo
uses: actions/checkout@v4
with:
lfs: true
persist-credentials: false

- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}

- name: Download lockfiles
uses: actions/download-artifact@v4
with:
name: lockfiles
path: requirements

- name: Install Python dependencies
run: |
python -m venv pygraphistry
source pygraphistry/bin/activate
python -m pip install --upgrade pip uv
uv pip install --require-hashes -r requirements/test-polars-py${{ matrix.python-version }}.lock
uv pip install -e . --no-deps

- name: Polars tests
run: |
source pygraphistry/bin/activate
./bin/test-polars.sh

test-core-umap:
needs: [ test-minimal-python, test-gfql-core, generate-lockfiles ]
# Inherit condition from test-minimal-python
@@ -1220,7 +1262,7 @@ jobs:
- name: Run Spark tests
run: |
source pygraphistry/bin/activate
-            python -B -m pytest graphistry/tests/test_df_types.py -v -k spark
python -B -m pytest graphistry/tests/compute/test_df_types.py -v -k spark


test-neo4j:
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -19,6 +19,10 @@ This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.htm
- **Release docs / metadata**: Updated publish instructions to pull `master` in fast-forward-only mode (`git pull --ff-only origin master`), require a clean working tree before tagging (`git status --short` should be empty), push only the intended tag ref (`git push origin refs/tags/X.Y.Z`) instead of `--tags`/ambiguous ref pushes, clarified manual publish dispatch as maintainer-only recovery on `master`, added guidance to avoid rerunning already-published versions, and normalized legacy `pypi.python.org` links in `README.md` to `pypi.org`.
- **CI / OIDC context tightening**: `publish-pypi.yml` now verifies repository/workflow identity via `GITHUB_REPOSITORY` + `GITHUB_WORKFLOW_REF` and enforces release-tag format checks before publish. `DEVELOP.md` now documents the required PyPI Trusted Publisher binding (`repository`, `workflow`, `environment`, and trusted refs) so external OIDC policy stays aligned with workflow constraints.

### Added
- **CI / Polars**: Added `test-polars` CI job (Python 3.9–3.14) with a dedicated `test-polars` lockfile profile; `polars` is now a named `setup.py` extra so the test matrix installs and exercises `test_polars.py` on every PR (#1133).
- **Polars support**: `polars.DataFrame` and `polars.LazyFrame` now work in `plot()`, `materialize_nodes()`, `get_degrees()`, `get_indegrees()`, `get_outdegrees()`, and `hypergraph()`. Polars is an optional dependency — no behavior change when not installed. Upload path uses efficient Arrow conversion (`to_arrow()` with schema-metadata stripping and memoization); compute/hypergraph paths coerce to pandas at entry. `LazyFrame` is materialized via `.collect()` at each boundary. Adds `test_polars.py` with 17 tests; skips gracefully when polars is absent (#1133).

### Fixed
- **GFQL / Cypher binder**: Replaced fragile regex-based WHERE label narrowing fallback in `_apply_where_label_narrowing` with AST-derived narrowing. `generic_where_clause` now lifts AND-joined bare label predicates (`WHERE n:Admin AND n:Active`) to structured `WhereClause.predicates` using the existing quote/bracket/paren/backtick-aware `_split_top_level_and_terms` helper; string-literal false-matches (e.g. `WHERE n.name = 'n:Admin'` incorrectly narrowing alias `n`) are closed by `fullmatch` anchoring. Removes `_WHERE_LABEL_RE` and `_WHERE_NON_CONJUNCTIVE_RE` from `binder.py`. Adds 10 targeted tests covering single/double/triple AND, multi-alias, multi-label-per-alias, lowercase `and`, XOR/OR/NOT conservative non-narrowing, mixed label+property all-or-nothing, and string-literal false-positive guards (#1125, #1193).
- **DataFrame engine coercion**: Unified all DataFrame-to-engine conversion behind `df_to_engine()` with explicit dispatch for Arrow, Spark, dask, dask_cudf, cuDF, Polars, and pandas; unknown types now raise `ValueError` instead of silently calling `.to_pandas()`. `_coerce_input_formats(g, engine)` replaces `_coerce_to_pandas(g)` as the engine-aware coercion entry point in `chain()`, `hop()`, and `materialize_nodes()`, preserving GPU (cuDF) output when input is cuDF. `to_pandas()` now handles all input types via the same dispatch. Adds `test_engine_coercion.py` with 50+ tests (#1148).
@@ -2508,3 +2512,4 @@ Code that looks like `g.edges(some_fn, None, None, some_arg)` should now be like
### Changed
- Removed deprecated docker test harness in favor of `docker/` - [#172](https://github.com/graphistry/pygraphistry/pull/172)


1 change: 1 addition & 0 deletions bin/generate-lockfiles.sh
@@ -34,6 +34,7 @@ PROFILE_DEFS=(
"test-compat-latest:test,bolt,nodexl:3.14:3.14:--constraint /tmp/pandas-latest.txt"
"test-compat-gfql-legacy:test:3.9:3.9:--constraint /tmp/pandas-legacy.txt"
"test-compat-gfql-latest:test:3.14:3.14:--constraint /tmp/pandas-latest.txt"
"test-polars:test,polars:3.9::"
"test-graphviz:test,pygraphviz:3.8::"
"test-umap:test,testai,umap-learn:3.9::--no-emit-package torch"
"test-ai:test,testai,ai:3.9::--no-emit-package torch --constraint /tmp/sentence-transformers-compat.txt"
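The lockfile profile entries above pack several fields into one colon-delimited string. A minimal parsing sketch, assuming the fields are name, extras, min/max Python version, and extra flags (field meanings inferred from the entries, not confirmed by the script):

```python
from typing import Dict, List, Union


def parse_profile(spec: str) -> Dict[str, Union[str, List[str]]]:
    # Split into exactly 5 fields; the trailing flags field may contain
    # spaces, so cap the split rather than splitting on every colon.
    name, extras, min_py, max_py, flags = spec.split(':', 4)
    return {
        'name': name,
        'extras': extras.split(',') if extras else [],
        'min_python': min_py,
        'max_python': max_py,  # empty appears to mean "use the default"
        'flags': flags,
    }
```

For example, `parse_profile("test-polars:test,polars:3.9::")` yields extras `['test', 'polars']` with empty max-Python and flags fields.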
13 changes: 13 additions & 0 deletions bin/test-polars.sh
@@ -0,0 +1,13 @@
#!/bin/bash
set -ex

# Run from project root
# - Args get passed to pytest phase
# Non-zero exit code on fail

# Assume [polars,test] installed

python -m pytest --version

python -B -m pytest -vv \
graphistry/tests/compute/test_polars.py
8 changes: 8 additions & 0 deletions graphistry/Engine.py
@@ -82,6 +82,14 @@ def resolve_engine(
except ImportError:
pass

if 'polars' in str(type(g_or_df).__module__):
try:
import polars as pl
if isinstance(g_or_df, (pl.DataFrame, pl.LazyFrame)):
return Engine.PANDAS
except ImportError:
pass

if 'cudf.core.dataframe' in str(getmodule(g_or_df)):
has_cudf_dependancy_, _, _ = lazy_cudf_import()
if has_cudf_dependancy_:
50 changes: 46 additions & 4 deletions graphistry/PlotterBase.py
@@ -142,6 +142,17 @@ def maybe_spark():
logger.warning('Runtime error import pyspark: Available but failed to initialize', exc_info=True)
return None

@lru_cache(maxsize=1)
def maybe_polars():
    try:
        import polars
        return polars
    except ImportError:
        pass
    except RuntimeError:
        logger.warning('Runtime error importing polars', exc_info=True)
    return None

# #####################################


Expand All @@ -165,13 +176,15 @@ class PlotterBase(Plottable):

_pd_hash_to_arrow : WeakValueDictionary = WeakValueDictionary()
_cudf_hash_to_arrow : WeakValueDictionary = WeakValueDictionary()
_polars_hash_to_arrow : WeakValueDictionary = WeakValueDictionary()
_umap_param_to_g : WeakValueDictionary = WeakValueDictionary()
_feat_param_to_g : WeakValueDictionary = WeakValueDictionary()

    def reset_caches(self):
"""Reset memoization caches"""
self._pd_hash_to_arrow.clear()
self._cudf_hash_to_arrow.clear()
self._polars_hash_to_arrow.clear()
self._umap_param_to_g.clear()
self._feat_param_to_g.clear()
cache_coercion_helper.cache_clear()
@@ -2753,7 +2766,8 @@ def _plot_dispatch(self, graph, nodes, name, description, mode='json', metadata=
or ( not (maybe_cudf() is None) and isinstance(graph, maybe_cudf().DataFrame) ) \
or ( not (maybe_dask_cudf() is None) and isinstance(graph, maybe_dask_cudf().DataFrame) ) \
or ( not (maybe_dask_dataframe() is None) and isinstance(graph, maybe_dask_dataframe().DataFrame) ) \
-            or ( not (maybe_spark() is None) and isinstance(graph, maybe_spark().sql.dataframe.DataFrame) ):
or ( not (maybe_spark() is None) and isinstance(graph, maybe_spark().sql.dataframe.DataFrame) ) \
or ( not (maybe_polars() is None) and isinstance(graph, (maybe_polars().DataFrame, maybe_polars().LazyFrame)) ):
return g._make_dataset(graph, nodes, name, description, mode, metadata, memoize, validate_mode, emit_warnings)

try:
@@ -2861,7 +2875,7 @@ def bind(df, pbname, attrib, default=None):

def _table_to_pandas(self, table) -> Optional[pd.DataFrame]:
"""
-        pandas | arrow | dask | cudf | dask_cudf => pandas
pandas | arrow | dask | cudf | dask_cudf | polars | spark => pandas
"""

if table is None:
@@ -2882,6 +2896,11 @@ def _table_to_pandas(self, table) -> Optional[pd.DataFrame]:
if not (maybe_dask_dataframe() is None) and isinstance(table, maybe_dask_dataframe().DataFrame):
return self._table_to_pandas(table.compute())

if not (maybe_polars() is None) and isinstance(table, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
if isinstance(table, maybe_polars().LazyFrame):
table = table.collect()
return table.to_pandas()

raise Exception('Unknown type %s: Could not convert data to Pandas dataframe' % str(type(table)))

def _find_bad_arrow_columns(self, df: Any, is_cudf: bool = False) -> List[str]:
@@ -2923,7 +2942,7 @@ def _coerce_mixed_type_columns(self, df: Any, is_cudf: bool = False, emit_warnin

def _table_to_arrow(self, table: Any, memoize: bool = True, validate_mode: ValidationMode = 'autofix', emit_warnings: bool = True) -> Optional[pa.Table]: # noqa: C901
"""
-        pandas | arrow | dask | cudf | dask_cudf => arrow
pandas | arrow | dask | cudf | dask_cudf | polars | spark => arrow

dask/dask_cudf convert to pandas/cudf

@@ -3035,6 +3054,29 @@ def _table_to_arrow(self, table: Any, memoize: bool = True, validate_mode: Valid
#TODO push the hash check to Spark
return self._table_to_arrow(df, memoize, validate_mode, emit_warnings)

if not (maybe_polars() is None) and isinstance(table, (maybe_polars().DataFrame, maybe_polars().LazyFrame)):
# validate_mode and emit_warnings are not applied for polars input: polars frames are
# strictly typed so mixed-type columns cannot exist, making validation a no-op here.
if isinstance(table, maybe_polars().LazyFrame):
table = table.collect()
hashed = None
if memoize:
try:
hashed = (
hashlib.sha256(table.hash_rows().to_numpy().tobytes()).hexdigest()
+ hashlib.sha256(str(table.columns).encode('utf-8')).hexdigest()
)
if hashed in PlotterBase._polars_hash_to_arrow:
return PlotterBase._polars_hash_to_arrow[hashed].v
except Exception:
logger.debug('Failed to hash polars frame', exc_info=True)
out = table.to_arrow().replace_schema_metadata({})
if memoize and hashed is not None:
w = WeakValueWrapper(out)
cache_coercion(hashed, w)
PlotterBase._polars_hash_to_arrow[hashed] = w
return out

raise Exception('Unknown type %s: Could not convert data to Arrow' % str(type(table)))

def to_arrow(
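The memoization key in `_table_to_arrow` combines a digest of polars' per-row hashes (`hash_rows()`) with a digest of the column names. The same keying idea, sketched over plain integer row hashes so it runs without polars (function name hypothetical):

```python
import hashlib
from typing import Iterable, Sequence


def frame_cache_key(row_hashes: Iterable[int], columns: Sequence[str]) -> str:
    # One digest over the row-level hashes, one over the column names:
    # equal data under a different schema must not share a cache entry.
    row_bytes = b''.join(h.to_bytes(8, 'little') for h in row_hashes)
    return (hashlib.sha256(row_bytes).hexdigest()
            + hashlib.sha256(str(list(columns)).encode('utf-8')).hexdigest())
```

Because both halves are hex SHA-256 digests, the key has a fixed 128-character length regardless of frame size, which suits its use as a `WeakValueDictionary` key.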
27 changes: 14 additions & 13 deletions graphistry/compute/gfql/df_executor.py
@@ -83,21 +83,22 @@ def edges_df_for_step(self, edge_idx: int, state: Optional[PathState] = None) ->
return state.pruned_edges[edge_idx] if state is not None and edge_idx in state.pruned_edges else self.forward_steps[edge_idx]._edges

def run(self) -> Plottable:
-        mode = os.environ.get(_CUDF_MODE_ENV, "auto").lower()
-        if self.inputs.engine == Engine.CUDF:
-            cudf_available = True
-            try:
-                import cudf  # type: ignore # noqa: F401
-            except Exception:
-                cudf_available = False
-            if not cudf_available:
-                if mode == "strict":
-                    raise RuntimeError(
-                        "cuDF engine requested with strict mode but cudf is unavailable")
-                # auto mode: fall back to pandas transparently
-                self.inputs = dataclass_replace(self.inputs, engine=Engine.PANDAS)
# Collect OTel attrs after engine fallback so gfql.engine reflects actual execution engine
attrs = self._otel_attrs() if otel_enabled() else None
with otel_span("gfql.df_executor.run", attrs=attrs):
mode = os.environ.get(_CUDF_MODE_ENV, "auto").lower()
if self.inputs.engine == Engine.CUDF:
cudf_available = True
try:
import cudf # type: ignore # noqa: F401
except Exception:
cudf_available = False
if not cudf_available:
if mode == "strict":
raise RuntimeError(
"cuDF engine requested with strict mode but cudf is unavailable")
# auto mode: fall back to pandas transparently
self.inputs = dataclass_replace(self.inputs, engine=Engine.PANDAS)
self._forward()
if mode == "oracle":
return self._unsafe_run_test_only_oracle()
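The strict/auto fallback in `run()` reduces to a small policy function. A sketch under assumed names (the env var and function name here are hypothetical; the real code reads `_CUDF_MODE_ENV`):

```python
import os


def pick_engine(requested: str, cudf_available: bool) -> str:
    # 'strict' turns a missing cudf into a hard error; 'auto' (the
    # default) falls back to pandas transparently so callers still run.
    mode = os.environ.get('GRAPHISTRY_CUDF_MODE', 'auto').lower()
    if requested == 'cudf' and not cudf_available:
        if mode == 'strict':
            raise RuntimeError(
                'cuDF engine requested with strict mode but cudf is unavailable')
        return 'pandas'
    return requested
```

Note the ordering constraint the diff comment calls out: the engine decision must happen before OTel attributes are collected, so `gfql.engine` reflects the engine actually used.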
14 changes: 3 additions & 11 deletions graphistry/hyper_dask.py
@@ -4,7 +4,7 @@

from typing import TYPE_CHECKING, Any, Dict, List, Optional, Union
from typing_extensions import Literal
-from .Engine import Engine, EngineAbstractType, DataframeLike, DataframeLocalLike, resolve_engine
from .Engine import Engine, EngineAbstractType, DataframeLike, DataframeLocalLike, resolve_engine, df_to_engine
import numpy as np, pandas as pd, pyarrow as pa, sys
from .util import setup_logger
logger = setup_logger(__name__)
@@ -817,17 +817,9 @@ def hypergraph(
engine_resolved = resolve_engine(engine, raw_events)
else:
engine_resolved = engine
-    # Coerce input-format types (Arrow, Spark) to the resolved engine's native type
# Coerce input-format types (Arrow, Spark, Polars, dask) to the resolved engine's native type
if raw_events is not None and engine_resolved == Engine.PANDAS and not isinstance(raw_events, pd.DataFrame):
-        if isinstance(raw_events, pa.Table):
-            raw_events = raw_events.to_pandas()
-        else:
-            try:
-                from pyspark.sql import DataFrame as SparkDataFrame
-                if isinstance(raw_events, SparkDataFrame):
-                    raw_events = raw_events.toPandas()
-            except ImportError:
-                pass
raw_events = df_to_engine(raw_events, Engine.PANDAS)

defs = HyperBindings(**opts)
entity_types = [i for i in screen_entities(raw_events, entity_types, defs) if i != defs.event_id]
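`df_to_engine()` centralizes the per-backend branches the old inline code spelled out. A toy version of the dispatch idea (names and mapping hypothetical; the real function performs the conversion rather than labeling it):

```python
def coercion_strategy(obj) -> str:
    # Route on the defining module's name so optional backends (polars,
    # spark, cudf, ...) need not be importable just to reject an input.
    mod = str(type(obj).__module__)
    for prefix, strategy in (
        ('pandas', 'identity'),
        ('pyarrow', 'Table.to_pandas()'),
        ('polars', 'collect() then to_pandas()'),
        ('pyspark', 'toPandas()'),
    ):
        if mod.startswith(prefix):
            return strategy
    # Mirrors df_to_engine(): unknown types raise instead of guessing.
    raise ValueError('Unknown dataframe type: %s' % type(obj))
```

The explicit `ValueError` on unknown types matches the CHANGELOG note above: silent `.to_pandas()` calls on unrecognized inputs are replaced by a loud failure.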