SEG-52: add v2 async helpers (submit_async, run_async, AsyncJob) #1
Adds support for the v2 submit-then-poll inference path:
client.submit_async(slug, **params) -> AsyncJob
client.run_async(slug, **params, timeout=600, interval=1.0) -> dict
AsyncJob.wait(timeout, interval) -> dict
AsyncJob.status() / AsyncJob.result()
Plus module-level shortcuts segmind.submit_async / segmind.run_async
that resolve through the lazy default SegmindClient.
Design choices:
* 1.0s poll interval, 600s timeout defaults. No consumption of
server-side polling hints (SEG-243 was cancelled in favour of a
plain client). Callers tune timeout/interval per call for slow
models, or use webhooks (SEG-93).
* Two new exceptions, InferenceFailed + InferenceTimeout, both
subclassing the existing SegmindError so callers can broad-catch.
Names deliberately omit the 'Error' suffix for natural reading
(per-file ruff noqa).
* _v2_base() derives the v2 prefix from client.base_url so callers
who override for staging (api-latest.segmind.com/v1) keep working
without a separate v2 base_url.
* If a 2xx submit response lacks request_id/status_url/response_url
we raise SegmindError immediately rather than polling forever on a
missing URL.
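Two of the design choices above, deriving the v2 prefix from `base_url` and failing fast on an incomplete submit response, might look roughly like this. It is a sketch under assumptions: `v2_base` and `validate_submit_body` are illustrative names, `SegmindError` is a local stand-in, and swapping a trailing `/v1` segment is only one plausible way to do the derivation.

```python
from urllib.parse import urlsplit, urlunsplit


class SegmindError(Exception):
    """Local stand-in for the SDK's existing base exception."""


REQUIRED_SUBMIT_KEYS = ("request_id", "status_url", "response_url")


def v2_base(base_url):
    """Swap a trailing /v1 path segment for /v2, keeping scheme and host."""
    parts = urlsplit(base_url.rstrip("/"))
    path = parts.path
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, path + "/v2", "", ""))


def validate_submit_body(body):
    """Fail fast when a 2xx submit response lacks a field polling needs."""
    missing = [key for key in REQUIRED_SUBMIT_KEYS if not body.get(key)]
    if missing:
        raise SegmindError("submit response missing: " + ", ".join(missing))
    return body
```

With this approach a staging override such as `https://api-latest.segmind.com/v1` maps to the matching `/v2` prefix on the same host, so no second base URL setting is needed.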
Tests (11/11 pass, respx-mocked, no network):
* submit returns AsyncJob with the right URLs
* submit propagates 4xx via raise_for_status
* submit raises on missing request_id in a 2xx body
* wait returns result on COMPLETED
* wait polls through QUEUED -> PROCESSING -> COMPLETED
* wait raises InferenceFailed on FAILED with server error string
* wait raises InferenceTimeout when deadline elapses
* run_async one-shot end-to-end
* v2 URL derives correctly from a staging base_url override
* module-level run_async uses the lazy default client
Live smoke against api-latest.segmind.com /v2/mock-inference:
* run_async(sleep=1) -> status=COMPLETED, inference_time=1.013s
* run_async(sleep=5, timeout=1) -> InferenceTimeout raised cleanly
Linear: SEG-52 (parent, Phase 1 - Async core).
Code Review
This pull request introduces v2 async inference capabilities to the Segmind Python SDK, adding submit_async and run_async methods alongside an AsyncJob class for polling and retrieving results. The feedback focuses on optimizing the polling loop in AsyncJob.wait to avoid sleeping past the timeout deadline, removing the unused _TERMINAL_STATES constant, and exposing SegmindError at the package level to simplify exception handling and unit tests.
    deadline = time.monotonic() + timeout
    while True:
        status_body = self.status()
        state = status_body.get("status")

        if state == "COMPLETED":
            return self.result()

        if state == "FAILED":
            # /status carries the error for FAILED; pull the full body
            # so the exception caller has metrics + request_id alongside.
            final = self.result()
            err = final.get("error") or status_body.get("error")
            raise InferenceFailed(detail=err, response_body=final)

        if time.monotonic() >= deadline:
            raise InferenceTimeout(
                request_id=self.request_id,
                elapsed_s=timeout,
            )

        time.sleep(interval)
The current polling loop can sleep past the deadline and make an unnecessary HTTP request after the timeout has already expired. Additionally, if the timeout is reached during the sleep, the loop will still perform another status check before raising InferenceTimeout. We can optimize this by checking the deadline at the start of the loop, checking if the remaining time is exceeded before sleeping, and capping the sleep interval to the remaining time.
    deadline = time.monotonic() + timeout
    while True:
        if time.monotonic() >= deadline:
            raise InferenceTimeout(
                request_id=self.request_id,
                elapsed_s=timeout,
            )

        status_body = self.status()
        state = status_body.get("status")

        if state == "COMPLETED":
            return self.result()

        if state == "FAILED":
            # /status carries the error for FAILED; pull the full body
            # so the exception caller has metrics + request_id alongside.
            final = self.result()
            err = final.get("error") or status_body.get("error")
            raise InferenceFailed(detail=err, response_body=final)

        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise InferenceTimeout(
                request_id=self.request_id,
                elapsed_s=timeout,
            )
        time.sleep(min(interval, remaining))

    # Status strings reported by the v2 status endpoint. Anything outside
    # this set is treated as in-progress (forward-compat with future states).
    _TERMINAL_STATES = ("COMPLETED", "FAILED")
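To check the deadline-capped polling idea from the review comment above without real waiting, the loop can be run against an injected clock. This is a sketch, not the SDK's code: `wait_for_completion`, `FakeClock`, and the `clock`/`sleep` parameters are assumptions added for testability, and `InferenceTimeout` is a local stand-in.

```python
import time


class InferenceTimeout(Exception):
    """Local stand-in for the SDK exception of the same name."""


def wait_for_completion(poll, timeout, interval,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll until COMPLETED, never sleeping or polling past the deadline."""
    deadline = clock() + timeout
    while True:
        if clock() >= deadline:
            raise InferenceTimeout("deadline reached before polling")
        state = poll()
        if state == "COMPLETED":
            return state
        remaining = deadline - clock()
        if remaining <= 0:
            raise InferenceTimeout("deadline reached after polling")
        sleep(min(interval, remaining))  # cap the sleep at the time left


class FakeClock:
    """Deterministic clock: sleeping only advances a counter."""

    def __init__(self):
        self.now = 0.0
        self.sleeps = []

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.sleeps.append(seconds)
        self.now += seconds
```

With `timeout=2.5` and `interval=1.0`, the fake clock makes the behaviour visible: the final sleep is shortened to the time left, and no status request is issued once the deadline has passed.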
    from segmind.v2 import (
        DEFAULT_POLL_INTERVAL_S,
        DEFAULT_POLL_TIMEOUT_S,
        AsyncJob,
        InferenceFailed,
        InferenceTimeout,
    )
Exposing SegmindError at the package level (segmind.SegmindError) is highly recommended so that users of the SDK can easily import and catch the base exception class without needing to import from internal modules. This also avoids workarounds in the test suite.
    from segmind.exceptions import SegmindError
    from segmind.v2 import (
        DEFAULT_POLL_INTERVAL_S,
        DEFAULT_POLL_TIMEOUT_S,
        AsyncJob,
        InferenceFailed,
        InferenceTimeout,
    )
    "DEFAULT_POLL_INTERVAL_S",
    "DEFAULT_POLL_TIMEOUT_S",
    "AsyncJob",
    "InferenceFailed",
    "InferenceTimeout",
Add SegmindError to `__all__` to explicitly export it as part of the public API.
    "DEFAULT_POLL_INTERVAL_S",
    "DEFAULT_POLL_TIMEOUT_S",
    "AsyncJob",
    "InferenceFailed",
    "InferenceTimeout",
    "SegmindError",
            return_value=httpx.Response(401, json={"error": "Invalid API key"})
        )

        with pytest.raises(segmind.SegmindError if hasattr(segmind, "SegmindError") else Exception):
Self-review pass before the tester session reports back.
Simplicity:
* Remove _TERMINAL_STATES constant — defined but never referenced.
* Drop the SEG-243 self-reference from the module docstring;
replace with product-facing 'use larger timeout/interval for
slow models, or webhooks for fire-and-forget'.
Optimization:
* FAILED path no longer makes a second HTTP round-trip to the
response URL. heimdall's /v2/requests/{id}/status already carries
the error string on FAILED (SEG-97), so we can build the
InferenceFailed exception from the status body alone.
* Rename InferenceFailed.response_body -> .status_body to reflect
that it now holds the status payload, not the full result.
Callers who want server metrics on failure can still call
AsyncJob.result() themselves after catching.
Test:
* test_wait_raises_inference_failed_on_failed asserts via
result_route.called == False that the optimization holds —
any regression that re-introduces the extra GET fails this test.
All 11 tests still pass; ruff + black clean.
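The self-review change to the FAILED path could be sketched as follows. The constructor shape is an assumption based on the `detail=` and `status_body` names mentioned above; `raise_if_failed` is a hypothetical helper, and `SegmindError` is a local stand-in for the SDK's base exception.

```python
class SegmindError(Exception):
    """Local stand-in for the SDK's existing base exception."""


class InferenceFailed(SegmindError):
    """Raised when the status endpoint reports FAILED."""

    def __init__(self, detail, status_body):
        super().__init__(detail or "inference failed")
        self.detail = detail
        # Holds the /status payload, not the full result body.
        self.status_body = status_body


def raise_if_failed(status_body):
    """Build the exception from the status payload alone (no second GET)."""
    if status_body.get("status") == "FAILED":
        raise InferenceFailed(
            detail=status_body.get("error"),
            status_body=status_body,
        )
```

Because `InferenceFailed` subclasses `SegmindError` here, a caller can catch either the specific failure or the broad base class.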
Self-review pass — pushed
…ED + nits
Tester-session findings on SEG-52 (scenarios 5 and #4 nit). Reproduced
end-to-end against api-latest.
Headline bug — InferenceFailed was unreachable for worker-side failures.
* heimdall returns HTTP 422 on /v2/requests/{id}/status when the task
is in terminal FAILED state, while still carrying the
{status: 'FAILED', error: '...'} body. The SDK's
_request -> raise_for_status raised SegmindError(422) before
wait() could inspect the body, so the FAILED branch never fired.
* Fix: add AsyncJob._fetch_terminal_tolerant(url) which uses the
underlying httpx client directly (no raise_for_status). If the body
announces status COMPLETED or FAILED, return it as a valid payload
regardless of HTTP code; otherwise fall through to the existing
raise_for_status so genuine 401/404/5xx still surface as
SegmindError. status() and result() both route through the helper.
* _TERMINAL_STATES constant re-added (now used by the helper).
* Two new tests:
- 4xx-with-FAILED-body -> InferenceFailed
- genuine 401 with no terminal body -> SegmindError(401),
NOT a wrapped InferenceFailed
* Live verified end-to-end:
- sleep=99999 -> InferenceFailed('Validation error...sleep must be between 0 and 900.0 seconds')
- bad slug (submit-time) -> SegmindError(404, 'Model information not found')
- happy path unchanged -> COMPLETED, inference_time=1.013s
Nits from the tester comment:
* InferenceTimeout.elapsed_s was stamped as the *configured* timeout
arg, not real wall time. Now computed via
time.monotonic() - start (live: timeout=2.0 -> elapsed_s=2.264,
including last poll's sleep).
* wait() docstring now notes that the result dict shape is
model-dependent (status/output/metrics are reliable; the rest is
model-specific).
Tests: 13/13 pass. ruff + black clean.
Tester verdict: 'requires one change before ship' — this commit
addresses that change. Host URL leak (staging returns prod URLs) is
heimdall-side, not SDK; will note as a separate follow-up.
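The decision rule behind the terminal-tolerant fetch described above can be isolated from the HTTP layer. The sketch below is an assumption-laden illustration: `FakeResponse` and `parse_terminal_tolerant` are hypothetical names, the `SegmindError(status_code, message)` signature is guessed from how the message quotes it, and the real helper works on the underlying httpx client rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Any

_TERMINAL_STATES = ("COMPLETED", "FAILED")


class SegmindError(Exception):
    """Local stand-in; the (status_code, message) shape is an assumption."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response (status code + JSON body)."""

    status_code: int
    body: Any


def parse_terminal_tolerant(response):
    """Return a terminal payload regardless of HTTP code; otherwise raise on 4xx/5xx."""
    body = response.body
    if isinstance(body, dict) and body.get("status") in _TERMINAL_STATES:
        return body  # a terminal body wins over the HTTP status code
    if response.status_code >= 400:
        message = body.get("error", "") if isinstance(body, dict) else str(body)
        raise SegmindError(response.status_code, str(message))
    return body
```

A 422 carrying `{"status": "FAILED", ...}` is returned as data so the FAILED branch of `wait()` can fire, while a 401 without a terminal body still surfaces as `SegmindError`.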
The Documentation workflow auto-failed on this PR because
actions/upload-pages-artifact@v2 transitively pulls in
actions/upload-artifact@v3, which GitHub now auto-fails as deprecated
(deprecation notice dated 2024-04-16). The if: refs/heads/main gates
don't help: the deprecation check runs at workflow parse time, before
any step runs.

Bump to the current major versions, all of which use
actions/upload-artifact@v4 internally:

  actions/setup-python           @v4 -> @v5
  actions/configure-pages        @v3 -> @v5
  actions/upload-pages-artifact  @v2 -> @v3
  actions/deploy-pages           @v2 -> @v4

No behaviour change in the steps themselves; the docs build job still
runs on PR (build smoke) and the deploy steps still gate on
github.ref == 'refs/heads/main'.
Back-merged origin/main (PR #1 v2-async + the docs.yml
deprecated-actions CI fix) into this branch so build-and-deploy passes;
it was red only because the branch predated the Pages-actions bump now
on main.

Bump __version__ 1.0.0 -> 1.1.0: 1.0.0 is already on PyPI, and main
gained the v2 async feature (submit_async / run_async / AsyncJob) since
1.0.0, which makes this a minor bump. This branch also carries the
SEG-319 X-Initiator: SDK-PY change, so 1.1.0 ships both.

Full suite green after the merge.
* feat(client): send X-Initiator: SDK-PY so SDK traffic is attributable (SEG-319)

  The SDK sent X-Initiator: segmind-python-sdk/0.1.0, which
  spot-backend's SQS worker rejects (not in InitiatorType) and coerces
  to OTHERS, so SDK calls are indistinguishable from raw requests/curl
  in the DB.

  Send the stable token X-Initiator: SDK-PY instead. Heimdall passes it
  through verbatim on the sync path (-> SDK-PY) and suffixes -V2 on the
  v2-async path (-> SDK-PY-V2). Both are added to InitiatorType in the
  paired spot-backend PR.

  Version detail stays in the User-Agent header
  (segmind-python-sdk/0.1.0), which heimdall logs, so we don't lose
  version telemetry.

  Updated test_http_client_headers assertions accordingly.
  Full suite: 256 passed, 7 skipped.

* chore(release): back-merge main + bump version to 1.1.0

  Back-merged origin/main (PR #1 v2-async + the docs.yml
  deprecated-actions CI fix) into this branch so build-and-deploy
  passes; it was red only because the branch predated the Pages-actions
  bump now on main.

  Bump __version__ 1.0.0 -> 1.1.0: 1.0.0 is already on PyPI, and main
  gained the v2 async feature (submit_async / run_async / AsyncJob)
  since 1.0.0, which makes this a minor bump. This branch also carries
  the SEG-319 X-Initiator: SDK-PY change, so 1.1.0 ships both.

  Full suite green after the merge.

* chore(release): use patch bump 1.0.1 (not 1.1.0)

---------

Co-authored-by: Shrey Kant Rajvanshi <shrey@segmind.com>
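The header split described in the SEG-319 commit message above amounts to a stable attribution token plus a versioned User-Agent. A minimal sketch, assuming a hypothetical `telemetry_headers` helper; the SDK's actual header-building code is not shown in this PR text:

```python
SDK_VERSION = "0.1.0"  # version string as quoted in the commit message


def telemetry_headers(version=SDK_VERSION):
    """Stable initiator token for attribution; version stays in User-Agent."""
    return {
        "X-Initiator": "SDK-PY",
        "User-Agent": f"segmind-python-sdk/{version}",
    }
```

Keeping the initiator value constant means the backend's allow-list does not need updating on every SDK release.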
Summary
Adds v2 async support to the Python SDK — submit-and-poll for any heimdall `/v2/{slug}` model — so callers can stop wrapping their own request/poll loops around `requests.post`.
```python
import segmind

# One-shot: submit + poll until COMPLETED
result = segmind.run_async("seedance-1-pro", prompt="A sunset", timeout=300)

# Or split for finer control (parallelism, request_id tracking)
job = segmind.submit_async("seedance-1-pro", prompt="A sunset")
print(job.request_id)
result = job.wait(timeout=300)
```
Defaults: `interval=1.0s`, `timeout=600s`. Override per call for very slow models (Veo, Seedance video) or use webhooks (SEG-93) instead.
What's in
What's deliberately NOT in
Design notes
Smoke
Tickets
Parent: SEG-52 "SDK async" (Phase 1 — Async core).
Cancelled sibling: SEG-243 (server-side polling hints — decided we don't need them; client picks defaults).
Related: SEG-93 (webhooks, for slow models).