
feat(sdk): add BenchmarkRun and AsyncBenchmarkRun classes #712

Merged
sid-rl merged 5 commits into next from siddarth/benchmark-sdk
Dec 17, 2025


sid-rl commented Dec 17, 2025

Add new SDK classes for managing benchmark runs:

  • BenchmarkRun: Synchronous class for benchmark run operations

    • get_info(): Retrieve run status and metadata
    • cancel(): Cancel the benchmark run
    • complete(): Mark the run as completed
    • list_scenario_runs(): List scenario runs with filtering
  • AsyncBenchmarkRun: Async version with the same interface

  • SDKBenchmarkRunListScenarioRunsParams: TypedDict for the list_scenario_runs() parameters

  • Unit tests for both sync and async classes

  • E2E smoketests validating against the real API
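The classes listed above are thin wrappers around a run ID. A minimal sketch of the sync side follows, assuming each method simply delegates to a generated API client; it is not the runloop_api_client implementation. `FakeRunsAPI`, `View`, and `ListScenarioRunsParams` (with its `limit`/`state` fields) are hypothetical stand-ins for the generated client, the view types, and `SDKBenchmarkRunListScenarioRunsParams`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Dict, List, TypedDict

if TYPE_CHECKING:
    # Needed only by type checkers; keeps the sketch importable without typing_extensions.
    from typing_extensions import Unpack


class ListScenarioRunsParams(TypedDict, total=False):
    """Hypothetical filter params; the real SDK TypedDict's fields may differ."""

    limit: int
    state: str


@dataclass
class View:
    """Hypothetical stand-in for BenchmarkRunView / ScenarioRunView."""

    id: str
    state: str


@dataclass
class FakeRunsAPI:
    """Hypothetical in-memory stub standing in for the generated API client."""

    runs: Dict[str, View] = field(default_factory=dict)
    scenario_runs: Dict[str, List[View]] = field(default_factory=dict)

    def retrieve(self, run_id: str) -> View:
        return self.runs[run_id]

    def set_state(self, run_id: str, state: str) -> View:
        self.runs[run_id].state = state
        return self.runs[run_id]

    def list_scenario_runs(self, run_id: str, **params: Unpack[ListScenarioRunsParams]) -> List[View]:
        items = self.scenario_runs.get(run_id, [])
        if "state" in params:
            items = [s for s in items if s.state == params["state"]]
        return items[: params.get("limit", len(items))]


class BenchmarkRun:
    """Thin wrapper: holds the IDs and delegates each call to the client."""

    def __init__(self, client: FakeRunsAPI, run_id: str, benchmark_id: str) -> None:
        self._client = client
        self._id = run_id
        self._benchmark_id = benchmark_id

    def __repr__(self) -> str:
        return f"<BenchmarkRun id={self._id!r}>"

    @property
    def id(self) -> str:
        return self._id

    def get_info(self) -> View:
        return self._client.retrieve(self._id)

    def cancel(self) -> View:
        return self._client.set_state(self._id, "canceled")

    def complete(self) -> View:
        return self._client.set_state(self._id, "completed")

    def list_scenario_runs(self, **params: Unpack[ListScenarioRunsParams]) -> List[View]:
        # Forward the typed keyword arguments untouched, keyed by this run's ID.
        return list(self._client.list_scenario_runs(self._id, **params))


api = FakeRunsAPI(
    runs={"run_1": View("run_1", "running")},
    scenario_runs={"run_1": [View("s1", "completed"), View("s2", "running")]},
)
run = BenchmarkRun(api, "run_1", "bmd_1")
print(run)  # <BenchmarkRun id='run_1'>
print([s.id for s in run.list_scenario_runs(state="completed")])  # ['s1']
print(run.complete().state)  # completed
```

Keeping `Unpack` under `TYPE_CHECKING` together with postponed annotations lets pyright or mypy check the typed `**params` signature while the module still imports at runtime without `typing_extensions`.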

sid-rl requested a review from james-rl December 17, 2025 01:38

james-rl left a comment


There are a few areas of potential inconsistency, or where the naming looks a bit weird, but this generally looks OK.

# Get info
info = benchmark_run.get_info()
assert info.id == run_data.id
assert info.state in ["queued", "running", "completed", "canceled"]

queued doesn't exist as a state?

Comment on lines 112 to 126
def list_scenario_runs(
    self,
    **params: Unpack[SDKBenchmarkRunListScenarioRunsParams],
) -> List[ScenarioRunView]:
    """List all scenario runs for this benchmark run.

    :param params: See :typeddict:`~runloop_api_client.sdk._types.SDKBenchmarkRunListScenarioRunsParams` for available parameters
    :return: List of scenario run views
    :rtype: List[ScenarioRunView]
    """
    page = self._client.benchmarks.runs.list_scenario_runs(
        self._id,
        **params,
    )
    return list(page)

Consider changing the interaction pattern to provide something like a paged iterator, or adding a separate iterator method for this instead.
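A paged iterator along the lines suggested here could look like the sketch below. It assumes cursor-style pagination in which each request returns one page of items plus a `has_more` flag; `iter_scenario_runs`, `fake_fetch`, the `FetchPage` signature, and the `starting_after` cursor are hypothetical and not taken from the runloop API.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class ScenarioRunView:
    """Hypothetical stand-in for the generated view type."""

    id: str


# Hypothetical fetcher: (starting_after cursor, page size) -> (items, has_more).
FetchPage = Callable[[Optional[str], int], Tuple[List[ScenarioRunView], bool]]


def iter_scenario_runs(fetch_page: FetchPage, page_size: int = 2) -> Iterator[ScenarioRunView]:
    """Yield items lazily, requesting the next page only when the caller needs it."""
    cursor: Optional[str] = None
    while True:
        items, has_more = fetch_page(cursor, page_size)
        yield from items
        if not has_more or not items:
            return
        cursor = items[-1].id  # resume after the last item already yielded


# In-memory fake backend so the sketch runs standalone; records each request's cursor.
_ALL = [ScenarioRunView(f"s{i}") for i in range(5)]
calls: List[Optional[str]] = []


def fake_fetch(starting_after: Optional[str], limit: int) -> Tuple[List[ScenarioRunView], bool]:
    calls.append(starting_after)
    start = 0
    if starting_after is not None:
        start = next(i for i, s in enumerate(_ALL) if s.id == starting_after) + 1
    return _ALL[start : start + limit], start + limit < len(_ALL)


first_three: List[str] = []
for item in iter_scenario_runs(fake_fetch):
    first_three.append(item.id)
    if len(first_three) == 3:
        break

print(first_three)  # ['s0', 's1', 's2']
print(calls)  # [None, 's1']: the third page was never requested
```

Because the generator only fetches a page when the caller asks for more items, breaking out early skips the remaining requests, whereas a method that returns a list must materialize every item before the caller sees the first one.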

Comment on lines +63 to +66
async def get_info(
    self,
    **options: Unpack[BaseRequestOptions],
) -> BenchmarkRunView:

Can we call this something else instead? get_state or refresh?
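For the async class, the same delegation works with coroutines. A minimal sketch, assuming a hypothetical `FakeAsyncRunsAPI` in place of the real async client and a simplified `RunView`; it also shows that independent runs can be queried concurrently with `asyncio.gather`.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RunView:
    """Hypothetical stand-in for BenchmarkRunView."""

    id: str
    state: str


class FakeAsyncRunsAPI:
    """Hypothetical async stub; a real client would await an HTTP request here."""

    def __init__(self, runs: Dict[str, RunView]) -> None:
        self._runs = runs

    async def retrieve(self, run_id: str) -> RunView:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self._runs[run_id]

    async def cancel(self, run_id: str) -> RunView:
        await asyncio.sleep(0)
        self._runs[run_id].state = "canceled"
        return self._runs[run_id]


class AsyncBenchmarkRun:
    """Same surface as the sync wrapper, but each method is a coroutine."""

    def __init__(self, client: FakeAsyncRunsAPI, run_id: str) -> None:
        self._client = client
        self._id = run_id

    async def get_info(self) -> RunView:
        return await self._client.retrieve(self._id)

    async def cancel(self) -> RunView:
        return await self._client.cancel(self._id)


async def main() -> List[Tuple[str, str]]:
    api = FakeAsyncRunsAPI({"r1": RunView("r1", "running"), "r2": RunView("r2", "running")})
    runs = [AsyncBenchmarkRun(api, run_id) for run_id in ("r1", "r2")]
    await runs[0].cancel()
    # Independent runs can be queried concurrently; gather keeps the input order.
    infos = await asyncio.gather(*(r.get_info() for r in runs))
    return [(info.id, info.state) for info in infos]


result = asyncio.run(main())
print(result)  # [('r1', 'canceled'), ('r2', 'running')]
```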

sid-rl force-pushed the siddarth/benchmark-sdk branch from 04b9a9c to 4650641 December 17, 2025 18:17
sid-rl requested a review from james-rl December 17, 2025 20:22
james-rl left a comment


It's a shame that get_info is now a widely supported convention, but I agree that it's worse to change it in only one place.

A minor nit for you in the comments, but this generally looks good.

    self._id = run_id
    self._benchmark_id = benchmark_id

@override

I don't think you want @override without a base class

Comment on lines +42 to +44
@override
def __repr__(self) -> str:
    return f"<BenchmarkRun id={self._id!r}>"

same thing here
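For reference, `@override` (PEP 698; `typing.override` in Python 3.12+, `typing_extensions.override` for earlier versions) marks a method as overriding one defined on a base class, and a static type checker reports an error when no base class defines that method. At runtime it only sets `__override__ = True` on the function. A minimal sketch with a hypothetical `BaseResource` base class; to stay dependency-free it falls back to a small local stand-in on Pythons older than 3.12.

```python
try:
    from typing import override  # Python 3.12+
except ImportError:
    from typing import Callable, TypeVar

    F = TypeVar("F", bound=Callable[..., object])

    def override(method: F) -> F:
        """Local stand-in mirroring PEP 698's runtime behavior (not typing_extensions)."""
        try:
            method.__override__ = True  # type: ignore[attr-defined]
        except (AttributeError, TypeError):
            pass
        return method


class BaseResource:
    """Hypothetical shared base class for SDK resource wrappers."""

    def __init__(self, resource_id: str) -> None:
        self._id = resource_id

    def describe(self) -> str:
        return f"resource {self._id}"


class BenchmarkRun(BaseResource):
    @override  # a type checker verifies that a base class defines describe()
    def describe(self) -> str:
        return f"benchmark run {self._id}"

    # No base class defines cancel(), so decorating it with @override
    # would make a type checker report an error.
    def cancel(self) -> str:
        return f"canceled {self._id}"


run = BenchmarkRun("run_1")
print(run.describe())  # benchmark run run_1
print(getattr(BenchmarkRun.describe, "__override__", False))  # True
```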

sid-rl merged commit 1021c5a into next Dec 17, 2025
6 checks passed
sid-rl deleted the siddarth/benchmark-sdk branch December 17, 2025 20:37
stainless-app bot mentioned this pull request Dec 17, 2025
stainless-app bot pushed a commit that referenced this pull request Dec 18, 2025
* update requirements-dev

* pyproject formatting nit

* feat(sdk): add BenchmarkRun and AsyncBenchmarkRun classes

* fixed smoketests

* `list_scenario_runs()` now returns a list of ScenarioRun/AsyncScenarioRun objects
dines-rl added a commit that referenced this pull request Jan 20, 2026
* fix(types): allow pyright to infer TypedDict types within SequenceNotStr

* chore: add missing docstrings

* feat(devbox): added stdin streaming endpoint

* chore(internal): add missing files argument to base client

* feat(benchmarks): add `update_scenarios` method to benchmarks resource

* fix(benchmarks): `update()` for benchmarks and scenarios replaces all provided fields and does not modify unspecified fields (#6702)

* feat(sdk): add BenchmarkRun and AsyncBenchmarkRun classes (#712)

* update requirements-dev

* pyproject formatting nit

* feat(sdk): add BenchmarkRun and AsyncBenchmarkRun classes

* fixed smoketests

* `list_scenario_runs()` now returns a list of ScenarioRun/AsyncScenarioRun objects

* cleanup(agents): unified version parameter across agent sources (#713)

* cleanup(agents): unified version parameter across agent sources

* increase snapshot test timeout

* reinsert version parameter into example code

* fix: use async_to_httpx_files in patch method

* codegen metadata

* feat(sdk): add Benchmark and AsyncBenchmark classes (#714)

* feat(sdk): add Benchmark and AsyncBenchmark classes (with some import and test id cleanup)

* raise exceptions instead of skipping, more defensively run scenario

* rename benchmark `run()` to `start_run()`

* more helpful example docstrings

* comments about params type splitting for developer clarity

* remove low value unit tests

* add smoketest TODOs

* skip list_runs() smoketest when no available benchmark runs

* create/update custom benchmark and scenarios for smoketest, remove benchmark retrieval smoketest

* feat(sdk): add BenchmarkOps and AsyncBenchmarkOps to SDK (#716)

* chore(internal): add `--fix` argument to lint script

* chore(internal): codegen related update

* feat(client): add support for binary request streaming

* feat(devbox): remove this one

* feat(network-policy): add network policies to api

* chore(internal): update `actions/checkout` version

* feat(blueprint): Set cilium network policy on blueprint build (#7006)

* chore(devbox): Remove network policy from devbox view; use launch params instead (#7025)

* refactor(benchmark):  Deprecate /benchmark/{id}/runs in favor of /benchmark_runs (#7019)

* release: 1.3.0-alpha

* cp dines

---------

Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com>
Co-authored-by: sid-rl <siddarth@runloop.ai>
Co-authored-by: Alexander Dines <alex@runloop.ai>