Add Ibis Hotdata backend examples and Arrow result path #2
- backend: information_schema page size constant, dedupe HTTP→Ibis errors, parse_qsl import, version from importlib.metadata
- http: poll sleep respects deadline, guard missing columns, typed _safe_call
- types: trim trailing blank lines
- ruff format/import order
Always submit Hotdata queries asynchronously and materialize successful results from the Arrow IPC result endpoint so the backend has one typed execution path.
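The "poll sleep respects deadline" item above can be illustrated with a small sketch. This is not the PR's `_sleep_until`; the name `sleep_until_sketch`, the injectable `now`/`sleep` parameters, and the choice to raise `TimeoutError` once the deadline has passed are all assumptions made for illustration.

```python
import time
from typing import Callable


def sleep_until_sketch(
    deadline: float,
    interval_s: float,
    *,
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Sleep for one poll interval, but never past `deadline`.

    `deadline` is a value on the same clock as `now()` (monotonic by default).
    Raising TimeoutError on expiry is an assumption of this sketch.
    """
    remaining = deadline - now()
    if remaining <= 0:
        raise TimeoutError("deadline exceeded while polling")
    # Cap the sleep so the final poll happens at, not after, the deadline.
    sleep(min(interval_s, remaining))
```

Passing `now` and `sleep` as parameters keeps the helper testable without real waiting; production callers would rely on the defaults.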
def _arrow_payload_from_table(
    self,
    table: pa.Table,
    *,
    result_id: str,
) -> dict[str, Any]:
    sch = table.schema
    columns = sch.names
    nullable = [sch.field(i).nullable for i in range(len(columns))]
    return {
        "format": "arrow",
        "pa_table": table,
        "columns": columns,
        "nullable": nullable,
        "rows": [],
        "result_id": result_id,
        "row_count": table.num_rows,
        "execution_time_ms": None,
        "query_run_id": None,
        "warning": None,
    }
nit: (not blocking) the backend now consumes only pa_table from this payload — _get_schema_using_query reads data["pa_table"].schema and _safe_raw_sql yields payload["pa_table"]. The columns, nullable, rows, result_id, row_count, execution_time_ms, query_run_id, and warning fields are dead now that the JSON-row code path is gone. Worth shrinking to just {"pa_table": table} (or whatever subset is actually read elsewhere) to avoid the impression that they carry meaningful data.
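One way to confirm which payload fields are actually read before shrinking the dict is to wrap it in a mapping that records key access during a test run. The sketch below is a hypothetical helper, not part of the PR; `AccessTrackingDict` and `unused_keys` are illustrative names, and a string stands in for the Arrow table.

```python
from typing import Any


class AccessTrackingDict(dict):
    """dict subclass that records which keys are read via [] or .get()."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.read_keys: set = set()

    def __getitem__(self, key: Any) -> Any:
        self.read_keys.add(key)
        return super().__getitem__(key)

    def get(self, key: Any, default: Any = None) -> Any:
        self.read_keys.add(key)
        return super().get(key, default)


def unused_keys(payload: AccessTrackingDict) -> list:
    """Return the keys that were never read, sorted for stable output."""
    # Iterating a dict yields keys without going through __getitem__,
    # so this check does not mark anything as read.
    return sorted(set(payload) - payload.read_keys)


# Stand-in payload; the real one would hold a pyarrow Table.
payload = AccessTrackingDict(
    {"pa_table": "stand-in table", "columns": [], "rows": [], "warning": None}
)
_ = payload["pa_table"]  # the only field the consumer is assumed to read
print(unused_keys(payload))
```

Keys reported as unused across the whole test suite are candidates for removal; keys read only through iteration or `items()` would need separate handling.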
if status == 200 and ctype == APPLICATION_ARROW_STREAM.lower():
    table = _ipc_stream_bytes_to_table(body)
    return self._arrow_payload_from_table(table, result_id=result_id)

if status == 202:
    _sleep_until(deadline, poll_interval_s)
    continue

if status == 409:
    d = _json_utf8(body) if body else {}
    raise HotdataAPIError(
        d.get("error_message") or "Result failed",
        status_code=409,
        body=d,
    )

if status == 404:
    d = _json_utf8(body) if body else {}
    raise HotdataAPIError(
        d.get("detail") or f"Result {result_id!r} not found",
        status_code=404,
        body=d,
    )

raise HotdataAPIError(
    f"Unexpected GET /v1/results/{result_id} status {status}",
    status_code=status,
    body=body,
)
nit: (not blocking) if the server returns 200 with a non-Arrow content-type (e.g. accidental JSON), the first branch is skipped and execution falls all the way through to Unexpected GET /v1/results/{id} status 200. The "status 200" wording is misleading because the real problem is the content-type mismatch. Consider adding an explicit branch like:
    if status == 200:
        raise HotdataAPIError(
            f"Unexpected Content-Type {ctype!r} for /v1/results/{result_id} (expected {APPLICATION_ARROW_STREAM})",
            status_code=200,
            body=body,
        )

so the failure mode is diagnosable.
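To show how the explicit 200 branch changes the reported error, here is a self-contained sketch of the status dispatch. It is not the PR's code: `ResultError` stands in for `HotdataAPIError`, `classify_result_response` is a hypothetical name, and the media type assigned to `APPLICATION_ARROW_STREAM` is assumed to be the registered Arrow IPC stream type, since the constant's value is not shown in the diff.

```python
from typing import Any

# Assumed value; the PR's constant is not visible in the reviewed hunk.
APPLICATION_ARROW_STREAM = "application/vnd.apache.arrow.stream"


class ResultError(Exception):
    """Stand-in for HotdataAPIError, carrying the status code and body."""

    def __init__(self, message: str, *, status_code: int, body: Any = None) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.body = body


def classify_result_response(
    status: int, ctype: str, body: bytes, result_id: str
) -> str:
    """Return "ready" or "pending", or raise ResultError for anything else."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = ctype.split(";")[0].strip().lower()

    if status == 200:
        if media_type == APPLICATION_ARROW_STREAM:
            return "ready"
        # Explicit branch: the status is fine, the content type is not.
        raise ResultError(
            f"Unexpected Content-Type {media_type!r} for /v1/results/{result_id} "
            f"(expected {APPLICATION_ARROW_STREAM})",
            status_code=200,
            body=body,
        )

    if status == 202:
        return "pending"

    raise ResultError(
        f"Unexpected GET /v1/results/{result_id} status {status}",
        status_code=status,
        body=body,
    )
```

With this shape, a 200 response carrying `application/json` fails with a message that names the content type rather than the status, which is the diagnosability improvement the comment asks for.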
Summary
Test plan