Add Ibis Hotdata backend examples and Arrow result path by eddietejeda · Pull Request #2 · hotdata-dev/hotdata-ibis

eddietejeda · 2026-05-08T19:48:11Z

Summary

Add runnable Ibis Hotdata examples and README guidance for TPC-H-backed workspaces.
Refactor the HTTP layer to use the official Hotdata Python SDK.
Switch query execution to async-only result polling with Arrow IPC materialization.

Test plan

uv run ruff check src tests examples
uv run ruff format --check src tests examples
uv run pytest tests -q

- backend: information_schema page size constant, dedupe HTTP→Ibis errors, parse_qsl import, version from importlib.metadata
- http: poll sleep respects deadline, guard missing columns, typed _safe_call
- types: trim trailing blank lines
- ruff format/import order

Always submit Hotdata queries asynchronously and materialize successful results from the Arrow IPC result endpoint so the backend has one typed execution path.

claude · 2026-05-08T19:51:37Z

+    def _arrow_payload_from_table(
+        self,
+        table: pa.Table,
+        *,
+        result_id: str,
+    ) -> dict[str, Any]:
+        sch = table.schema
+        columns = sch.names
+        nullable = [sch.field(i).nullable for i in range(len(columns))]
        return {
+            "format": "arrow",
+            "pa_table": table,
            "columns": columns,
            "nullable": nullable,
-            "rows": list(data["rows"]) if data.get("rows") is not None else [],
-            "row_count": data.get("row_count"),
-            "execution_time_ms": data.get("execution_time_ms"),
-            "query_run_id": data.get("query_run_id"),
-            "result_id": data.get("result_id"),
-            "warning": data.get("warning"),
+            "rows": [],
+            "result_id": result_id,
+            "row_count": table.num_rows,
+            "execution_time_ms": None,
+            "query_run_id": None,
+            "warning": None,
        }


nit: (not blocking) the backend now consumes only pa_table from this payload — _get_schema_using_query reads data["pa_table"].schema and _safe_raw_sql yields payload["pa_table"]. The columns, nullable, rows, result_id, row_count, execution_time_ms, query_run_id, and warning fields are dead now that the JSON-row code path is gone. Worth shrinking to just {"pa_table": table} (or whatever subset is actually read elsewhere) to avoid the impression that they carry meaningful data.

claude · 2026-05-08T19:51:42Z

+            if status == 200 and ctype == APPLICATION_ARROW_STREAM.lower():
+                table = _ipc_stream_bytes_to_table(body)
+                return self._arrow_payload_from_table(table, result_id=result_id)
+
+            if status == 202:
+                _sleep_until(deadline, poll_interval_s)
+                continue
+
+            if status == 409:
+                d = _json_utf8(body) if body else {}
+                raise HotdataAPIError(
+                    d.get("error_message") or "Result failed",
+                    status_code=409,
+                    body=d,
+                )
+
+            if status == 404:
+                d = _json_utf8(body) if body else {}
+                raise HotdataAPIError(
+                    d.get("detail") or f"Result {result_id!r} not found",
+                    status_code=404,
+                    body=d,
+                )

+            raise HotdataAPIError(
+                f"Unexpected GET /v1/results/{result_id} status {status}",
+                status_code=status,
+                body=body,
+            )


nit: (not blocking) if the server returns 200 with a non-Arrow content-type (e.g. accidental JSON), the first branch is skipped and execution falls all the way through to Unexpected GET /v1/results/{id} status 200. The "status 200" wording is misleading because the real problem is the content-type mismatch. Consider adding an explicit branch like:

    if status == 200:
        raise HotdataAPIError(
            f"Unexpected Content-Type {ctype!r} for /v1/results/{result_id} (expected {APPLICATION_ARROW_STREAM})",
            status_code=200,
            body=body,
        )

so the failure mode is diagnosable.
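
A self-contained sketch of that dispatch order may make the suggestion easier to evaluate. It is illustrative only: `classify_result_response` is a hypothetical helper, `HotdataAPIError` here is a stand-in whose constructor shape is copied from the diff, the value of `APPLICATION_ARROW_STREAM` is assumed to be the registered Arrow IPC stream media type, and `ctype` is assumed to arrive already lowercased with parameters stripped. The 409 and 404 branches are left out for brevity.

```python
from typing import Any

# Assumed value: the registered media type for Arrow IPC streams.
APPLICATION_ARROW_STREAM = "application/vnd.apache.arrow.stream"


class HotdataAPIError(Exception):
    """Stand-in for the backend's error type (keyword names taken from the diff)."""

    def __init__(self, message: str, *, status_code: int, body: Any = None) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.body = body


def classify_result_response(status: int, ctype: str, result_id: str) -> str:
    """Hypothetical helper: return 'ready' or 'pending', or raise on anything else."""
    if status == 200:
        if ctype == APPLICATION_ARROW_STREAM.lower():
            return "ready"
        # A 200 with the wrong media type gets its own message, so the
        # error names the content-type mismatch instead of "status 200".
        raise HotdataAPIError(
            f"Unexpected Content-Type {ctype!r} for /v1/results/{result_id} "
            f"(expected {APPLICATION_ARROW_STREAM})",
            status_code=200,
        )
    if status == 202:
        return "pending"
    raise HotdataAPIError(
        f"Unexpected GET /v1/results/{result_id} status {status}",
        status_code=status,
    )
```

With this ordering, `classify_result_response(200, "application/json", "r1")` raises an error whose message mentions the content type, while a 500 still falls through to the generic "Unexpected GET" message.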

eddietejeda added 2 commits May 7, 2026 15:03

refactor: make query results Arrow-only

cdd6733


claude Bot reviewed May 8, 2026


claude Bot approved these changes May 8, 2026


eddietejeda merged commit 11f733b into main May 8, 2026
1 check passed

eddietejeda merged 2 commits into main from docs/readme-ibis-examples
