# Bug: Non-pandas DataFrames accepted by plot() crash in compute methods (#1132)


## Summary

`pa.Table` (PyArrow) and `pyspark.DataFrame` work when passed to `plot()`, but crash with confusing errors in `materialize_nodes()`, `get_degrees()`, and `hypergraph()`. The library already has the conversion infrastructure — it just isn't called in those paths.

## Support matrix

| Type | `plot()` | `materialize_nodes()` / `get_degrees()` | `hypergraph()` |
|---|:---:|:---:|:---:|
| `pd.DataFrame` | ✅ | ✅ | ✅ |
| `pa.Table` | ✅ | ❌ | ❌ |
| `cudf.DataFrame` | ✅ | ✅ | ✅ |
| `dask.DataFrame` | ✅ | ⚠️ partial | ⚠️ partial |
| `pyspark.DataFrame` | ✅ | ❌ | ❌ |

This issue covers the ❌ cells. Dask gaps are out of scope here.

## Reproduction

```python
import pyarrow as pa
import graphistry

edges = pa.table({'src': ['a', 'b', 'c'], 'dst': ['b', 'c', 'a']})
g = graphistry.edges(edges, 'src', 'dst')

g.plot()               # ✅ works
g.materialize_nodes()  # ❌ ValueError: Could not determine engine for edges,
                       #    expected pandas or cudf dataframe, got: pyarrow.Table
g.get_degrees()        # ❌ same

events = pa.table({'user': ['alice', 'bob'], 'action': ['click', 'view']})
g.hypergraph(events)   # ❌ AttributeError: 'pyarrow.lib.ChunkedArray' object
                       #    has no attribute 'dropna'
```

The same failures occur with `pyspark.DataFrame`.

## Workaround

```python
g = graphistry.edges(arrow_table.to_pandas(), 'src', 'dst')
g.hypergraph(spark_df.toPandas())
```
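
When the input type is not known ahead of time, the workaround can be wrapped in a small helper. The following is a minimal sketch and not part of pygraphistry: `coerce_to_pandas` is a hypothetical name, and it recognizes Arrow and Spark objects by the package their class is defined in, so neither library has to be imported. It assumes Arrow tables expose `to_pandas()` and Spark DataFrames expose `toPandas()`, as in the workaround above. Everything else (pandas, cuDF, Dask) passes through unchanged.

```python
def coerce_to_pandas(table):
    """Convert Arrow/Spark tables to pandas; return any other input unchanged.

    Hypothetical user-side helper, not a pygraphistry API.
    """
    # Top-level package that defines the object's class, e.g. "pyarrow" for
    # pyarrow.Table (whose class lives in the pyarrow.lib module).
    root = type(table).__module__.split(".")[0]
    if root == "pyarrow" and hasattr(table, "to_pandas"):
        return table.to_pandas()
    if root == "pyspark" and hasattr(table, "toPandas"):
        return table.toPandas()
    return table


# Usage (commented out because it needs graphistry, pyarrow, and pyspark):
# g = graphistry.edges(coerce_to_pandas(arrow_table), 'src', 'dst')
# g.hypergraph(coerce_to_pandas(spark_df))
```

Checking the defining package rather than only the method name is deliberate: `cudf.DataFrame` also has a `to_pandas()` method, and converting it would silently move GPU data to the CPU.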

## Root cause and fix

pygraphistry has two conversion patterns:

**Upload path** (`plot()`): in `graphistry/PlotterBase.py`, `_table_to_arrow()` and `_table_to_pandas()` each have an explicit branch per supported type. Arrow and Spark are handled here and work correctly.

**Compute/hypergraph paths**: The intended pattern is `resolve_engine(df)` → coerce df to match the resolved engine → run engine-specific code. This was applied for pandas and cuDF but not completed for Arrow and Spark.

Three localized fixes, all using existing infrastructure:

**Fix 1 — `graphistry/Engine.py`: `resolve_engine()`**

Currently `resolve_engine()` returns `Engine.PANDAS` for unrecognized types via silent fallthrough (line ~84). Add explicit branches before the fallthrough:

```python
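# Arrow tables are coerced to pandas downstream, so resolve to the pandas engine
if isinstance(g_or_df, pa.Table):
    return Engine.PANDAS

# Spark is optional: only test against its DataFrame type when maybe_spark() finds it
spark = maybe_spark()
if spark is not None and isinstance(g_or_df, spark.sql.dataframe.DataFrame):
    return Engine.PANDAS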
```

**Fix 2 — `graphistry/compute/ComputeMixin.py`: `materialize_nodes()`**

Currently, `materialize_nodes()` (line ~191) checks `isinstance(g._edges, pd.DataFrame)`, then cuDF, and raises otherwise. Once engine detection resolves to `Engine.PANDAS`, add a coercion step before the engine-specific code runs:

```python
# Engine resolved to pandas but the edges are Arrow/Spark: coerce both tables first
if engine_concrete == Engine.PANDAS and not isinstance(g._edges, pd.DataFrame):
    g = g.edges(self._table_to_pandas(g._edges)).nodes(self._table_to_pandas(g._nodes))
```

**Fix 3 — `graphistry/hyper_dask.py`: `hypergraph()`**

After `resolve_engine()` (line ~817), `raw_events` still has its original type when it is passed to engine-specific ops. Add one coerce-at-entry block before `screen_entities()` is called:

```python
# Coerce once at entry so the pandas-specific ops below receive a pd.DataFrame
if engine_resolved == Engine.PANDAS and not isinstance(raw_events, pd.DataFrame):
    raw_events = _table_to_pandas(raw_events)
```

`_table_to_pandas()` already handles Arrow and Spark, so no new conversion logic is needed. The same fix will cover Polars once #1124 adds a Polars branch to `_table_to_pandas()`.

## Testing

Add to `tests/test_compute.py` and `tests/test_hypergraph.py` (or a new `tests/test_df_types.py`):
- `pa.table(...)` in `materialize_nodes()` → returns pandas-backed result, no error
- `pa.table(...)` in `get_degrees()` → same
- `pa.table(...)` in `hypergraph()` → returns valid `Hypergraph` result
- Repeat for Spark if available; skip gracefully if not
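
The cases above could be sketched as follows. This is a hedged outline, not the project's actual test code: `has_modules` and `TestArrowCompute` are hypothetical names, the assertions describe the expected behavior after the fixes (they fail against the current code), and `g._nodes` is assumed to hold the materialized node table. The whole class is skipped when an optional dependency is missing.

```python
import importlib.util
import unittest


def has_modules(*names):
    """True if every named top-level package is importable."""
    return all(importlib.util.find_spec(name) is not None for name in names)


@unittest.skipUnless(has_modules("graphistry", "pandas", "pyarrow"),
                     "requires graphistry, pandas, and pyarrow")
class TestArrowCompute(unittest.TestCase):
    """Expected post-fix behavior for pa.Table inputs (hypothetical sketch)."""

    def setUp(self):
        # Import lazily so that defining the class never needs the libraries.
        import graphistry
        import pyarrow as pa
        self.graphistry = graphistry
        self.edges = pa.table({'src': ['a', 'b', 'c'], 'dst': ['b', 'c', 'a']})

    def test_materialize_nodes(self):
        import pandas as pd
        g = self.graphistry.edges(self.edges, 'src', 'dst').materialize_nodes()
        self.assertIsInstance(g._nodes, pd.DataFrame)
        self.assertEqual(len(g._nodes), 3)  # distinct nodes: a, b, c

    def test_get_degrees(self):
        import pandas as pd
        g = self.graphistry.edges(self.edges, 'src', 'dst').get_degrees()
        self.assertIsInstance(g._nodes, pd.DataFrame)
```

pytest collects `unittest.TestCase` subclasses, so the class can live in any of the files listed above. The `hypergraph()` and Spark cases would follow the same shape, with the Spark variant guarded by `has_modules("pyspark")`.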

## Relationship to #1124

Once `_table_to_pandas()` gains a Polars branch (per #1124), Fix 2 and Fix 3 above automatically cover Polars in `materialize_nodes()` and `hypergraph()` — no additional Polars-specific code needed in those paths.
