Summary
`pa.Table` (PyArrow) and `pyspark.DataFrame` work when passed to `plot()`, but crash with confusing errors in `materialize_nodes()`, `get_degrees()`, and `hypergraph()`. The library already has the conversion infrastructure — it just isn't called in those paths.
Support matrix
| Type | `plot()` | `materialize_nodes()` / `get_degrees()` | `hypergraph()` |
|---|---|---|---|
| `pd.DataFrame` | ✅ | ✅ | ✅ |
| `pa.Table` | ✅ | ❌ | ❌ |
| `cudf.DataFrame` | ✅ | ✅ | ✅ |
| `dask.DataFrame` | ✅ | ⚠️ partial | ⚠️ partial |
| `pyspark.DataFrame` | ✅ | ❌ | ❌ |
This issue covers the ❌ cells. Dask gaps are out of scope here.
Reproduction
```python
import pyarrow as pa
import graphistry

edges = pa.table({'src': ['a', 'b', 'c'], 'dst': ['b', 'c', 'a']})
g = graphistry.edges(edges, 'src', 'dst')

g.plot()               # ✅ works

g.materialize_nodes()  # ❌ ValueError: Could not determine engine for edges,
                       #    expected pandas or cudf dataframe, got: pyarrow.Table

g.get_degrees()        # ❌ same ValueError

events = pa.table({'user': ['alice', 'bob'], 'action': ['click', 'view']})
g.hypergraph(events)   # ❌ AttributeError: 'pyarrow.lib.ChunkedArray' object
                       #    has no attribute 'dropna'
```

Same failures occur with `pyspark.DataFrame`.
Workaround
```python
g = graphistry.edges(arrow_table.to_pandas(), 'src', 'dst')
g.hypergraph(spark_df.toPandas())
```
Root cause and fix
pygraphistry has two conversion patterns:
1. **Upload path** (`plot()`): `graphistry/PlotterBase.py` — `_table_to_arrow()` and `_table_to_pandas()` each have an explicit branch per supported type. Arrow and Spark are handled here and work correctly.
2. **Compute/hypergraph paths**: the intended pattern is `resolve_engine(df)` → coerce df to match the resolved engine → run engine-specific code. This was applied for pandas and cuDF but not completed for Arrow and Spark.
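As a toy illustration of that intended resolve-then-coerce pattern (all names here are simplified stand-ins, not the library's real `Engine` or `resolve_engine`):

```python
from enum import Enum

import pandas as pd


class Engine(Enum):
    PANDAS = "pandas"


class FakeArrowTable:
    """Stand-in for pa.Table: convertible to pandas, but not a pd.DataFrame."""
    def __init__(self, data):
        self._data = data

    def to_pandas(self):
        return pd.DataFrame(self._data)


def resolve_engine(df):
    # Step 1: decide which engine the computation should run on.
    # In this sketch, Arrow-like inputs resolve to pandas.
    return Engine.PANDAS


def coerce(df, engine):
    # Step 2: convert the input to match the resolved engine.
    if engine is Engine.PANDAS and not isinstance(df, pd.DataFrame):
        return df.to_pandas()
    return df


edges = FakeArrowTable({"src": ["a", "b"], "dst": ["b", "a"]})
edges_pd = coerce(edges, resolve_engine(edges))
# Step 3: engine-specific code now always sees a pd.DataFrame.
```

The bug is that the compute/hypergraph paths skip step 2 for Arrow and Spark, so step 3 receives an unexpected type.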
Three localized fixes, all using existing infrastructure:
Fix 1 — graphistry/Engine.py: resolve_engine()
Currently `resolve_engine()` returns `Engine.PANDAS` for unrecognized types via silent fallthrough (line ~84). Add explicit branches before the fallthrough:

```python
if isinstance(g_or_df, pa.Table):
    return Engine.PANDAS
if maybe_spark() is not None and isinstance(g_or_df, maybe_spark().sql.dataframe.DataFrame):
    return Engine.PANDAS
```
Fix 2 — graphistry/compute/ComputeMixin.py: materialize_nodes()
Currently (line ~191) it checks `isinstance(g._edges, pd.DataFrame)`, then cudf, then raises. After engine detection resolves to `Engine.PANDAS`, add a coerce step before the engine-specific code runs:

```python
if engine_concrete == Engine.PANDAS and not isinstance(g._edges, pd.DataFrame):
    g = g.edges(self._table_to_pandas(g._edges)).nodes(self._table_to_pandas(g._nodes))
```
Fix 3 — graphistry/hyper_dask.py: hypergraph()
After `resolve_engine()` (line ~817), `raw_events` is still in its original type when passed to engine-specific ops. Add one coerce-at-entry block before `screen_entities()` is called:

```python
if engine_resolved == Engine.PANDAS and not isinstance(raw_events, pd.DataFrame):
    raw_events = _table_to_pandas(raw_events)
```
`_table_to_pandas()` already handles Arrow and Spark — no new conversion logic needed. This same fix will cover Polars once #1124 adds it to `_table_to_pandas()`.
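The conversion helper can be pictured with a small duck-typed sketch (hypothetical code, not the library's actual `_table_to_pandas()` implementation, which branches on concrete types):

```python
import pandas as pd


def table_to_pandas(df):
    """Illustrative stand-in for the library's _table_to_pandas() helper.

    This sketch duck-types on the conversion methods that PyArrow
    (to_pandas) and Spark (toPandas) expose, which is also why a Polars
    branch (#1124) slots in naturally: polars also exposes to_pandas().
    """
    if isinstance(df, pd.DataFrame):
        return df  # already pandas: pass through untouched
    if hasattr(df, "to_pandas"):   # pyarrow.Table (and polars.DataFrame)
        return df.to_pandas()
    if hasattr(df, "toPandas"):    # pyspark.sql.DataFrame
        return df.toPandas()
    raise TypeError("Unsupported table type: %s" % type(df))


class _ArrowLike:
    """Minimal stand-in for a pa.Table so the sketch runs without pyarrow."""
    def to_pandas(self):
        return pd.DataFrame({"x": [1, 2]})


demo = table_to_pandas(_ArrowLike())  # behaves like coercing an Arrow table
```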
Testing
Add to `tests/test_compute.py` and `tests/test_hypergraph.py` (or a new `tests/test_df_types.py`):

- `pa.table(...)` in `materialize_nodes()` → returns pandas-backed result, no error
- `pa.table(...)` in `get_degrees()` → same
- `pa.table(...)` in `hypergraph()` → returns valid `Hypergraph` result
- Repeat for Spark if available; skip gracefully if not
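A hypothetical sketch of such a test file (file and test names are assumed, not from the repo; the assertions express the post-fix behavior listed above, and the real suite would use `pytest.importorskip` for the dependency guards):

```python
import pandas as pd

try:
    import pyarrow as pa
    import graphistry
    HAVE_DEPS = True
except ImportError:
    HAVE_DEPS = False


def _arrow_edges():
    return pa.table({'src': ['a', 'b', 'c'], 'dst': ['b', 'c', 'a']})


def test_materialize_nodes_arrow():
    if not HAVE_DEPS:
        return  # skip when pyarrow/graphistry are unavailable
    g = graphistry.edges(_arrow_edges(), 'src', 'dst').materialize_nodes()
    # After Fix 2, Arrow input should come back pandas-backed
    assert isinstance(g._nodes, pd.DataFrame)


def test_get_degrees_arrow():
    if not HAVE_DEPS:
        return
    g = graphistry.edges(_arrow_edges(), 'src', 'dst').get_degrees()
    assert isinstance(g._nodes, pd.DataFrame)


def test_hypergraph_arrow():
    if not HAVE_DEPS:
        return
    events = pa.table({'user': ['alice', 'bob'], 'action': ['click', 'view']})
    # After Fix 3, hypergraph() should accept Arrow input directly
    assert graphistry.hypergraph(events) is not None
```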
Relationship to #1124
Once `_table_to_pandas()` gains a Polars branch (per #1124), Fix 2 and Fix 3 above automatically cover Polars in `materialize_nodes()` and `hypergraph()` — no additional Polars-specific code needed in those paths.