fix(eval_dataset): simplify test case conversion logic by viveknair · Pull Request #349 · gentrace/gentrace-python

viveknair · 2025-07-22T00:35:51Z

gentrace-python/examples/eval_dataset_local_cases.py

Line 32 in 057bae9

async def process_ai_request(test_case: TestCase) -> Optional[str]:

gentrace-python/examples/eval_dataset_simple.py

Line 26 in 057bae9

async def process_ai_request(test_case: TestCase) -> Optional[str]:

Look at these two scripts to learn about the interface structure.

The main idea is that the interaction() function now receives both local test cases (defined with TestInput) and remote test cases as the same Stainless-generated TestCase class. This keeps the typing straightforward and avoids a messy Union[TestInput, TestCase] type.

I also changed TestInput to be a Pydantic model rather than a TypedDict.
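A minimal sketch of the unification described above: local inputs are converted into the same TestCase type that remote cases already use, so the interaction function takes a single parameter type. To stay dependency-free, this sketch uses dataclasses as hypothetical stand-ins for the Pydantic TestInput model and the Stainless-generated TestCase class; to_test_case, the field names, and the naming fallback are illustrative assumptions, not the SDK's actual API.

```python
import asyncio
import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical stand-in for the Pydantic TestInput model described in the PR.
@dataclass
class TestInput:
    inputs: Dict[str, Any]
    name: Optional[str] = None


# Hypothetical stand-in for the Stainless-generated TestCase class; the real
# class carries more fields than shown here.
@dataclass
class TestCase:
    id: str
    name: str
    inputs: Dict[str, Any]


def to_test_case(test_input: TestInput, index: int) -> TestCase:
    """Wrap a local TestInput so the interaction only ever sees TestCase."""
    return TestCase(
        id=str(uuid.uuid4()),  # local cases have no server-side id, so mint one
        name=test_input.name or f"Test case {index}",
        inputs=test_input.inputs,
    )


async def process_ai_request(test_case: TestCase) -> Optional[str]:
    # One parameter type: no Union[TestInput, TestCase] branching needed.
    query = test_case.inputs.get("query")
    return f"Answer to: {query}" if query else None


async def run(cases: List[TestCase]) -> List[Optional[str]]:
    # gather preserves the order of the input awaitables in its result list.
    return await asyncio.gather(*(process_ai_request(c) for c in cases))


local = [TestInput(inputs={"query": "What is 2+2?"}, name="math"), TestInput(inputs={})]
remote = [TestCase(id="remote-1", name="from dataset", inputs={"query": "Hi"})]
cases = [to_test_case(t, i) for i, t in enumerate(local)] + remote
results = asyncio.run(run(cases))
```

Because the conversion happens before the interaction runs, the interaction body never needs isinstance checks on its argument.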

linear · 2025-07-22T00:35:54Z

GT-1841 Accept test cases for eval dataset in interaction()

graphite-app · 2025-07-22T22:41:13Z

-    # Run the experiment
    result = asyncio.run(dataset_evaluation())
-    print(f"Experiment URL: {result.url}")
+    print(f"Experiment URL: {result.url}")


The code should check if result is not None before accessing the .url property, similar to the pattern used in eval_dataset_local_cases.py. Consider adding a conditional check:

if result: print(f"Experiment URL: {result.url}")

This would prevent a potential AttributeError ('NoneType' object has no attribute 'url') if the experiment execution fails.

Suggested change

-print(f"Experiment URL: {result.url}")
+if result:
+    print(f"Experiment URL: {result.url}")

Spotted by Diamond
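A minimal sketch of the suggested guard, assuming dataset_evaluation() can return None when the experiment fails. ExperimentResult, the fail parameter, the example URL, and the report helper are hypothetical stand-ins; the PR only shows that the result exposes a .url attribute.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentResult:
    # Hypothetical stand-in: the PR only shows that the result has a .url.
    url: str


async def dataset_evaluation(fail: bool = False) -> Optional[ExperimentResult]:
    # Hypothetical: returns None when the experiment could not run.
    if fail:
        return None
    return ExperimentResult(url="https://example.com/experiments/123")


def report(result: Optional[ExperimentResult]) -> Optional[str]:
    # The guard suggested in review: only touch .url when a result exists.
    if result:
        message = f"Experiment URL: {result.url}"
        print(message)
        return message
    return None


ok = report(asyncio.run(dataset_evaluation()))
missing = report(asyncio.run(dataset_evaluation(fail=True)))
```

Here "if result:" and "if result is not None:" behave the same because the dataclass defines neither __bool__ nor __len__; the explicit "is not None" form is the safer choice if the result type ever gains either method.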


* chore(internal): bump pinned h11 dep
* chore(package): mark python 3.13 as supported
* fix(parsing): correctly handle nested discriminated unions
* chore(readme): fix version rendering on pypi
* fix(client): don't send Content-Type header on GET requests
* feat: clean up environment call outs
* codegen metadata
* feat(api): api update
* codegen metadata
* feat(warnings): centralize and refactor warnings (#341)
* feat(warnings): centralize and refactor warnings
* fix(utils): fix import order and console usage
* chore: reorder imports for consistency
* chore(tests): run tests in parallel
* fix(client): correctly parse binary response | stream
* chore(tests): add tests for httpx client instantiation & proxies
* chore(internal): update conftest.py
* chore(ci): enable for pull requests
* chore(readme): update badges
* fix(tests): fix: tests which call HTTP endpoints directly with the example parameters
* chore(internal): version bump
* docs(client): fix httpx.Timeout documentation reference
* feat(client): add support for aiohttp
* feat(client): add support for aiohttp
* chore(tests): skip some failing tests on the latest python versions
* fix(ci): release-doctor — report correct token name
* chore(ci): only run for pushes and fork pull requests
* fix(ci): correct conditional
* chore(ci): change upload type
* chore: fix version
* chore: fix deps
* feat(api): create organization methods
* feat(api): correct the organization structure
* feat: add organizations resource access (#343)
* feat(examples): add genai semantic conventions example (#345)
* feat(experiment): return experiment URL in result (#344)
* fix(parsing): ignore empty metadata
* feat(init): warn on config changes in multiple init() (#347)
* fix(pipeline): improve async validation scheduling (#348)
* fix(parsing): parse extra field types
* fix(eval_dataset): simplify test case conversion logic (#349)
* feat(eval_dataset): pass full test case to interaction
* feat(eval_dataset): clean API, always pass TestCase obj
* fix(eval): update test case fields and typing
* fix(eval): remove local dataset eval and clean up code
* feat(eval): remove dict support for test cases
* fix(eval_dataset): simplify test case conversion logic
* fix: remove unnecessary type ignore comment
* feat(eval): unify TestCase input and model usage
* fix(eval_dataset): update default dataset and pipeline ids
* fix(examples): remove arg, add type ignore comment
* fix(eval_dataset): inline test case conversion logic
* feat(eval): add generic type to TestInput inputs
* chore: Clean up docstring in eval example
* fix(eval): improve typing for TestInput inputs
* fix(eval_dataset): enforce TestCase type for input
* chore: remove unused decorator import and usage
* feat(eval): Support TypedDict for input schema validation
* fix: Print URL only if result exists
* feat(eval): Add pydantic schema validation support
* Revert "fix(pipeline): improve async validation scheduling (#348)" This reverts commit a9abf88.
* chore: get tests to work with hotfix (#351)
* chore: set asyncio loop scope to function
* chore(tests): change fixture scope to function
* fix(pipeline): improve async validation scheduling (#348) (#352)
* release: 1.0.1

---------

Co-authored-by: stainless-app[bot] <142633134+stainless-app[bot]@users.noreply.github.com>
Co-authored-by: Vivek Nair <vivek@gentrace.ai>
Co-authored-by: meorphis <eric@stainless.com>

Vivek Nair added 6 commits July 21, 2025 15:10

feat(eval_dataset): pass full test case to interaction

413a796

feat(eval_dataset): clean API, always pass TestCase obj

ce49e25

fix(eval): update test case fields and typing

b335fd4

fix(eval): remove local dataset eval and clean up code

71e4419

feat(eval): remove dict support for test cases

d90e344

fix(eval_dataset): simplify test case conversion logic

c991f0a

fix: remove unnecessary type ignore comment

a9d144a

viveknair marked this pull request as ready for review July 22, 2025 00:41

Vivek Nair added 2 commits July 21, 2025 20:50

feat(eval): unify TestCase input and model usage

3cdbd2a

fix(eval_dataset): update default dataset and pipeline ids

057bae9

viveknair requested a review from dsafreno July 22, 2025 00:59

Vivek Nair added 8 commits July 22, 2025 10:00

fix(examples): remove arg, add type ignore comment

5f006d7

fix(eval_dataset): inline test case conversion logic

e8b48bc

feat(eval): add generic type to TestInput inputs

caf4657

chore: Clean up docstring in eval example

4f16e62

fix(eval): improve typing for TestInput inputs

0d7b049

fix(eval_dataset): enforce TestCase type for input

66f1ed2

chore: remove unused decorator import and usage

fcd467d

feat(eval): Support TypedDict for input schema validation

6d6200d

graphite-app bot reviewed Jul 22, 2025


dsafreno approved these changes Jul 22, 2025


Vivek Nair added 2 commits July 23, 2025 09:41

fix: Print URL only if result exists

e7df82d

feat(eval): Add pydantic schema validation support

0121b47

viveknair merged commit 7b7a451 into next Jul 23, 2025
9 checks passed

stainless-app bot mentioned this pull request Jul 23, 2025

release: 1.0.1 #342

Merged

viveknair merged 19 commits into next from vivek/gt-1841-accept-test-cases-for-eval-dataset-in-interaction