feat: batch query string array params to avoid URL length limits by skylarmb · Pull Request #274 · honeyhiveai/python-sdk

skylarmb · 2026-02-26T00:21:47Z

Summary

When list parameters (datapoint_ids, run_ids) exceed 100 items, the client now transparently splits requests into batches and merges results
Prevents hitting URL length limits (8-16 KB) and the backend's qs arrayLimit of 1000
Batching is invisible to callers — the API surface is unchanged

Changes

src/honeyhive/api/client.py — added QUERY_BATCH_SIZE, _chunk_list() helper, and batching logic in DatapointsAPI.list()/list_async() and ExperimentsAPI.list_runs()/list_runs_async()
tests/unit/test_api_query_batching.py — 14 unit tests covering chunk logic, passthrough for small lists, batching + merge for large lists, metric deduplication, and async variants

Test plan

pytest tests/unit/test_api_query_batching.py — 14/14 pass
Existing unit tests unaffected (test_api_events_fixes.py — 20/20 pass)

🔗 Related backend PR: https://github.com/honeyhiveai/hive-kube/pull/2430

✨ Created with Claude Code

When list parameters (datapoint_ids, run_ids) exceed 100 items, the client now transparently splits requests into batches and merges the results. This avoids hitting URL length limits (8-16 KB) and the backend's qs arrayLimit of 1000. Affected methods: - DatapointsAPI.list() / list_async() - ExperimentsAPI.list_runs() / list_runs_async() ✨ Created with Claude Code

claude · 2026-02-26T00:22:03Z

Claude finished @skylarmb's task in 2m 10s —— View job

Review complete. Posted 6 inline comments and a top-level summary.

Read changed files and gather context
Review src/honeyhive/api/client.py changes
Review tests/unit/test_api_query_batching.py
Post inline feedback on specific issues
Post top-level summary comment

Key findings:

Bug: page/limit kwargs are forwarded to each batch in _batched_list_runs, silently producing wrong results when callers pass pagination params
Bug: total_unfiltered in synthetic Pagination sums filtered totals instead of unfiltered totals
Perf: Async batching is sequential — asyncio.gather() would parallelize independent batch requests
Design: No error handling for partial batch failures (relevant to SDK's "never crash host" principle)
Nits: _chunk_list type hint too narrow, inline Pagination import should be at module level

snyk-io · 2026-02-26T00:22:08Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

github-actions · 2026-02-26T00:24:05Z

📚 Documentation preview built — Download artifact

Review instructions & validation status

How to Review

Download the artifact from the link above
Extract the files
Open index.html in your browser

Validation Status

API validation: ✅ Passed
Build process: ✅ Successful
Import tests: ✅ All imports working

claude · 2026-02-26T00:24:14Z

Review Summary

Good approach — transparently batching large query-string arrays is the right call for avoiding URL length limits. The implementation is clean, well-tested, and the API surface is unchanged. A few things worth addressing before merge:

Should fix:

page/limit forwarded to batched requests — When batching is active, the caller's page and limit params are passed to every batch call via **kwargs. This will silently produce wrong results (e.g., limit=10 caps each batch at 10 items). These should be stripped or overridden in the batched path.
total_unfiltered in synthetic Pagination — Currently set to the sum of pagination.total, which is the filtered count. Should sum pagination.total_unfiltered instead.

Worth considering:

Async batching is sequential — _batched_list_runs_async and list_async use a sequential for loop with await. Since batches are independent, asyncio.gather() would parallelize the network calls for a meaningful speedup on large lists.
No error handling for partial batch failures — If batch 3 of 5 fails, the exception propagates and all successfully-fetched data is lost. Given the SDK's "never crash host application" principle, consider at minimum adding context to the exception.

Nits:

_chunk_list type hint is List[str] but works generically — consider List[Any] or a TypeVar.
Pagination import appears inline in two method bodies — could be a top-level import.

Tests look solid — good coverage of boundary conditions, passthrough, batching + merge, metric deduplication, and async variants.

- Parallelize async batch requests with asyncio.gather() - Strip page/limit params from batched run requests - Fix total_unfiltered aggregation to use per-batch values - Add error context logging before re-raising batch failures - Use generic TypeVar for _chunk_list type hint - Move Pagination import to module level ✨ Created with Claude Code

github-actions · 2026-02-26T00:33:45Z

📚 Documentation preview built — Download artifact

Review instructions & validation status

How to Review

Download the artifact from the link above
Extract the files
Open index.html in your browser

Validation Status

API validation: ✅ Passed
Build process: ✅ Successful
Import tests: ✅ All imports working

✨ Created with Claude Code

skylarmb · 2026-02-26T01:01:29Z

@claude review if feedback has been adequately addressed

claude · 2026-02-26T01:01:45Z

Claude finished @skylarmb's task in 1m 43s —— View job

All previous review feedback has been adequately addressed in 4068ce6.

Read current state of changed files
Check if page/limit kwargs bug was fixed
Check if total_unfiltered bug was fixed
Check if async batching was parallelized with asyncio.gather()
Check if error handling for partial batch failures was added
Check nits: _chunk_list type hint, Pagination import placement
Post summary of findings

Feedback status

Previous feedback	Status	Details
Bug: `page`/`limit` forwarded to batched requests	✅ Fixed	Both `_batched_list_runs` and `_batched_list_runs_async` now `kwargs.pop("page", None)` / `kwargs.pop("limit", None)`. New test `test_page_and_limit_stripped_from_batches` covers this.
Bug: `total_unfiltered` summing filtered totals	✅ Fixed	Both sync and async paths now track and sum `resp.pagination.total_unfiltered` separately. Test assertion added.
Perf: Async batching sequential	✅ Fixed	`list_async` and `_batched_list_runs_async` both use `asyncio.gather()` for parallel batch requests.
Design: No error handling for partial batch failures	✅ Addressed	Sync methods wrap each batch call in try/except with `logger.warning` context before re-raising. Async paths using `asyncio.gather()` propagate naturally. Reasonable approach.
Nit: `_chunk_list` type hint too narrow	✅ Fixed	Now uses `TypeVar("T")` with `List[T]` signature.
Nit: `Pagination` import inline	✅ Fixed	Moved to module-level import block (line 57). No remaining inline imports.

One minor remaining nit: there's only one blank line between T = TypeVar("T") and def _chunk_list(...) (line 117-118) — Black expects two blank lines before top-level function definitions. Not a blocker.

LGTM — the fix commit cleanly addresses every point raised in the initial review.

github-actions · 2026-02-26T01:02:40Z

📚 Documentation preview built — Download artifact

Review instructions & validation status

How to Review

Download the artifact from the link above
Extract the files
Open index.html in your browser

Validation Status

API validation: ✅ Passed
Build process: ✅ Successful
Import tests: ✅ All imports working

✨ Created with Claude Code

github-actions · 2026-02-26T01:05:28Z

📚 Documentation preview built — Download artifact

Review instructions & validation status

How to Review

Download the artifact from the link above
Extract the files
Open index.html in your browser

Validation Status

API validation: ✅ Passed
Build process: ✅ Successful
Import tests: ✅ All imports working