fix(eval_dataset): simplify test case conversion logic#349

Merged
viveknair merged 19 commits into next from vivek/gt-1841-accept-test-cases-for-eval-dataset-in-interaction
Jul 23, 2025

Conversation

@viveknair viveknair commented Jul 22, 2025

async def process_ai_request(test_case: TestCase) -> Optional[str]:

Look at these two scripts to learn about the interface structure.

The main idea is that local test cases (defined with TestInput) and remote test cases are now both passed to the interaction() function as the same Stainless-generated TestCase class. This keeps the typing straightforward and avoids a messy Union[TestInput, TestCase] signature.

I also changed TestInput from a TypedDict to a Pydantic class.
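
A minimal sketch of the idea, not the SDK's actual code. The TestInput and TestCase classes below are hypothetical stdlib dataclass stand-ins (the real ones are a Pydantic class and a Stainless-generated class), and to_test_case(), run_all(), and the "local-" id scheme are assumptions made for illustration. Only the process_ai_request signature comes from the PR.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union
from uuid import uuid4


@dataclass
class TestInput:
    """Hypothetical stand-in for a locally defined test case."""
    name: str
    inputs: Dict[str, Any]


@dataclass
class TestCase:
    """Hypothetical stand-in for the generated remote test case class."""
    id: str
    name: str
    inputs: Dict[str, Any]


def to_test_case(item: Union[TestInput, TestCase]) -> TestCase:
    """Normalize at the boundary so the interaction only ever sees TestCase."""
    if isinstance(item, TestCase):
        return item
    # Assumption: local cases get a synthetic id; the real SDK may differ.
    return TestCase(id=f"local-{uuid4()}", name=item.name, inputs=item.inputs)


async def process_ai_request(test_case: TestCase) -> Optional[str]:
    """Interaction function: one parameter type for local and remote cases."""
    prompt = test_case.inputs.get("prompt")
    if prompt is None:
        return None
    return f"echo: {prompt}"


async def run_all(items: List[Union[TestInput, TestCase]]) -> List[Optional[str]]:
    """Convert every item once, then run the interaction on each TestCase."""
    cases = [to_test_case(item) for item in items]
    return list(await asyncio.gather(*(process_ai_request(c) for c in cases)))


if __name__ == "__main__":
    mixed = [
        TestInput(name="local", inputs={"prompt": "hi"}),
        TestCase(id="remote-1", name="remote", inputs={}),
    ]
    print(asyncio.run(run_all(mixed)))
```

The Union appears only in the conversion helper, so the interaction function itself never has to branch on the input type.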

linear bot commented Jul 22, 2025

@viveknair viveknair marked this pull request as ready for review July 22, 2025 00:41
@viveknair viveknair requested a review from dsafreno July 22, 2025 00:59
Comment thread examples/eval_dataset_simple.py Outdated
# Run the experiment
result = asyncio.run(dataset_evaluation())
print(f"Experiment URL: {result.url}")
The code should check if result is not None before accessing the .url property, similar to the pattern used in eval_dataset_local_cases.py. Consider adding a conditional check:

if result:
    print(f"Experiment URL: {result.url}")

This would prevent a potential NoneType has no attribute 'url' error if the experiment execution fails.

Suggested change
- print(f"Experiment URL: {result.url}")
+ if result:
+     print(f"Experiment URL: {result.url}")

Spotted by Diamond

@viveknair viveknair merged commit 7b7a451 into next Jul 23, 2025
9 checks passed
@stainless-app stainless-app bot mentioned this pull request Jul 23, 2025
viveknair pushed a commit that referenced this pull request Jul 23, 2025
* chore(internal): bump pinned h11 dep

* chore(package): mark python 3.13 as supported

* fix(parsing): correctly handle nested discriminated unions

* chore(readme): fix version rendering on pypi

* fix(client): don't send Content-Type header on GET requests

* feat: clean up environment call outs

* codegen metadata

* feat(api): api update

* codegen metadata

* feat(warnings): centralize and refactor warnings (#341)

* feat(warnings): centralize and refactor warnings

* fix(utils): fix import order and console usage

* chore: reorder imports for consistency

* chore(tests): run tests in parallel

* fix(client): correctly parse binary response | stream

* chore(tests): add tests for httpx client instantiation & proxies

* chore(internal): update conftest.py

* chore(ci): enable for pull requests

* chore(readme): update badges

* fix(tests): fix: tests which call HTTP endpoints directly with the example parameters

* chore(internal): version bump

* docs(client): fix httpx.Timeout documentation reference

* feat(client): add support for aiohttp

* feat(client): add support for aiohttp

* chore(tests): skip some failing tests on the latest python versions

* fix(ci): release-doctor — report correct token name

* chore(ci): only run for pushes and fork pull requests

* fix(ci): correct conditional

* chore(ci): change upload type

* chore: fix version

* chore: fix deps

* feat(api): create organization methods

* feat(api): correct the organization structure

* feat: add organizations resource access (#343)

* feat(examples): add genai semantic conventions example (#345)

* feat(experiment): return experiment URL in result (#344)

* fix(parsing): ignore empty metadata

* feat(init): warn on config changes in multiple init() (#347)

* fix(pipeline): improve async validation scheduling (#348)

* fix(parsing): parse extra field types

* fix(eval_dataset): simplify test case conversion logic (#349)

* feat(eval_dataset): pass full test case to interaction

* feat(eval_dataset): clean API, always pass TestCase obj

* fix(eval): update test case fields and typing

* fix(eval): remove local dataset eval and clean up code

* feat(eval): remove dict support for test cases

* fix(eval_dataset): simplify test case conversion logic

* fix: remove unnecessary type ignore comment

* feat(eval): unify TestCase input and model usage

* fix(eval_dataset): update default dataset and pipeline ids

* fix(examples): remove arg, add type ignore comment

* fix(eval_dataset): inline test case conversion logic

* feat(eval): add generic type to TestInput inputs

* chore: Clean up docstring in eval example

* fix(eval): improve typing for TestInput inputs

* fix(eval_dataset): enforce TestCase type for input

* chore: remove unused decorator import and usage

* feat(eval): Support TypedDict for input schema validation

* fix: Print URL only if result exists

* feat(eval): Add pydantic schema validation support

* Revert "fix(pipeline): improve async validation scheduling (#348)"

This reverts commit a9abf88.

* chore: get tests to work with hotfix (#351)

* chore: set asyncio loop scope to function

* chore(tests): change fixture scope to function

* fix(pipeline): improve async validation scheduling (#348) (#352)

* release: 1.0.1

---------

Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com>
Co-authored-by: Vivek Nair <vivek@gentrace.ai>
Co-authored-by: meorphis <eric@stainless.com>