Conversation
Add SDK support for managed datasets, which allow building and maintaining collections of typed test cases for evaluating AI systems.

- DatasetsClient with full CRUD for datasets and cases
- Typed schema support via Pydantic models / JSON Schema
- Export to pydantic-evals Dataset objects for running evaluations
- Documentation for UI and SDK workflows
SDK changes:

- Add `tags: list[str] | None = None` param to 6 methods on both sync and async clients: create_case, update_case, list_cases, add_case, add_cases, import_cases
- list_cases passes tags as query params for server-side filtering
- add_cases/import_cases merge tags into each serialized case

Docs restructure:

- Split single datasets.md into 5 subpages: overview, web UI guide, SDK guide, evaluations, and SDK reference
- Add experimental feature flag warning admonition to all pages
- Add SDK code snippets in the web UI guide via collapsible examples
- Cross-link between pages for discoverability
- Add redirect from old URL to new index page
- Document new tags parameter across all relevant SDK examples
All datasets code examples make API calls to an external Logfire server that isn't available during testing. Mark self-contained examples with skip-run and non-self-contained fragments with skip to prevent pytest-examples failures.
Deploying logfire-docs with

| | |
|---|---|
| Latest commit: | 9aa7a0d |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://aa5d8d03.logfire-docs.pages.dev |
| Branch Preview URL: | https://dmontague-managed-datasets.logfire-docs.pages.dev |
- Fix CaseNotFoundError never being raised (all 404s were mapped to DatasetNotFoundError). Case endpoints now pass is_case_endpoint=True to _handle_response, which checks the response text.
- Fix dead code: `if arguments and len(arguments) == 0` was always False. Changed to `if arguments is not None and len(arguments) == 0`.
- Use _UNSET sentinel for nullable fields in update_dataset and update_case so users can explicitly pass None to clear fields.
- URL-encode path segments to prevent issues with special characters in dataset/case names.
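The `_UNSET` sentinel pattern mentioned above can be sketched like this (the helper names here are illustrative, not the SDK's actual internals):

```python
from typing import Any


class _Unset:
    """Sentinel distinguishing 'argument not provided' from an explicit None."""

    def __repr__(self) -> str:
        return '<UNSET>'


_UNSET: Any = _Unset()


def build_update_payload(*, name: Any = _UNSET, expected_output: Any = _UNSET) -> dict[str, Any]:
    """Include only fields the caller passed; an explicit None clears the field server-side."""
    payload: dict[str, Any] = {}
    if name is not _UNSET:
        payload['name'] = name
    if expected_output is not _UNSET:
        payload['expected_output'] = expected_output
    return payload
```

The point of the sentinel is that `update_case(name=None)` and `update_case()` produce different payloads, which a plain `None` default cannot express.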
Fix import_cases to shallow-copy caller's dicts before adding tags, preventing unexpected mutation. Add comprehensive test suite achieving 100% statement and branch coverage for the datasets SDK client.
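A minimal sketch of that mutation fix, assuming cases arrive as plain dicts and tags are merged in (the helper name and de-duplication order are assumptions):

```python
def merge_tags_into_cases(cases: 'list[dict]', tags: 'list[str] | None') -> 'list[dict]':
    """Serialize cases with extra tags without mutating the caller's dicts."""
    serialized = []
    for case in cases:
        case = dict(case)  # shallow copy so the caller's dict is never mutated
        if tags:
            # merge, de-duplicating against any tags already on the case
            case['tags'] = sorted(set(case.get('tags', [])) | set(tags))
        serialized.append(case)
    return serialized
```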
Use pytest.importorskip to skip the test module on older pydantic versions where pydantic_evals cannot be imported.
Pin pydantic-evals>=1.0.0 with a python_version>='3.10' marker in both the datasets extra and dev dependencies. On Python 3.9, pydantic-evals is not installed and the test module is skipped via pytest.importorskip. Add a note to the datasets SDK docs about the Python 3.10+ requirement.
pytest.importorskip only catches ImportError, but pydantic_evals raises AttributeError on Pydantic 2.4 (missing pydantic.Tag). Use try/except with pytest.skip(allow_module_level=True) to handle both.
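The module-level skip described above can be written generically like this (the helper is a sketch of the pattern, not the repo's exact code):

```python
import importlib

import pytest


def import_or_skip_module(module_name: str):
    """Import a module, skipping the whole test module on any import-time failure.

    pytest.importorskip only catches ImportError, but some packages fail with
    other exceptions at import time (e.g. AttributeError when a dependency is
    missing an expected attribute), so catch those too.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError) as exc:
        pytest.skip(f'{module_name} unavailable: {exc}', allow_module_level=True)
```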
… classification

- Wrap response.json() in try/except to handle non-JSON error responses (e.g. HTML from load balancers)
- Check the parsed detail dict's 'detail' field instead of raw response.text to avoid misclassifying 404s when dataset names contain 'case'
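A hedged sketch of that classification logic (the exception names match the PR; the function body is an assumption about the approach, not the actual implementation):

```python
import json


class DatasetNotFoundError(Exception):
    pass


class CaseNotFoundError(Exception):
    pass


def classify_not_found(body_text: str, *, is_case_endpoint: bool) -> Exception:
    """Map a 404 body to the right error, tolerating non-JSON responses."""
    try:
        detail = json.loads(body_text).get('detail', '')
    except (ValueError, AttributeError):
        # e.g. an HTML error page from a load balancer
        detail = ''
    if is_case_endpoint and 'case' in str(detail).lower():
        return CaseNotFoundError(detail or body_text)
    return DatasetNotFoundError(detail or body_text)
```

Checking the parsed `detail` field rather than the raw body means a dataset named `my-cases` can no longer trip the case-specific branch.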
- `_BaseLogfireDatasetsClient.__init__` now accepts a `client: T` instance instead of a client type + kwargs
- `LogfireDatasetsClient` and `AsyncLogfireDatasetsClient` accept an optional `client=` keyword arg, removing the need for `**client_kwargs: Any`
- Fix test helpers to pass mock-transport clients directly (no resource leak)
- Fix `await` in non-async docstring examples
- Add `httpx>=0.27.2` lower bound to datasets extra
Probably should have a method like Maybe it should be @alexmojaki we should discuss on Monday
Seems like this can always be added later; the way it currently works is nicely tucked away in
```python
def _quote(value: str) -> str:
    """URL-encode a path segment to prevent path traversal with special characters."""
```
this seems bad, like we should either be using a URL param or restricting characters allowed in dataset names
Fixed — went with restricting characters. Removed _quote() and added client-side name validation via regex (^[a-zA-Z0-9][a-zA-Z0-9._-]*$). Also opened a platform PR (pydantic/platform#17528) to add the same regex as a Pydantic Field(pattern=...) constraint on the backend. Leaving this unresolved so you can comment if you don't like this solution.
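The client-side validation described above could look roughly like this (the regex comes from the comment; the helper name and error message are illustrative):

```python
import re

# Names must start with an alphanumeric character, then allow alphanumerics,
# dots, underscores, and hyphens — no slashes, so no path traversal.
_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9._-]*$')


def validate_name(name: str) -> str:
    """Reject dataset/case names containing path separators or other special characters."""
    if not _NAME_PATTERN.match(name):
        raise ValueError(f'Invalid name {name!r}: must match {_NAME_PATTERN.pattern}')
    return name
```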
```python
def make_mock_transport(responses: dict[tuple[str, str], httpx.Response | None] | None = None) -> httpx.MockTransport:
```
Noted — deferred to a follow-up. Documented in plans/datasets-future-work.md under "VCR Tests".
| Method | Description |
|---|---|
| `create_case(dataset_id_or_name, inputs, *, name, expected_output, metadata, evaluators, source_trace_id, source_span_id, tags)` | Create a case from raw values. |
| `update_case(dataset_id_or_name, case_id, *, name, inputs, expected_output, metadata, evaluators, tags)` | Update an existing case. |
| `delete_case(dataset_id_or_name, case_id)` | Delete a case. |
| `export_dataset(id_or_name, input_type, output_type, metadata_type)` | Export as a typed `pydantic_evals.Dataset`. |
Addressed in pydantic/platform#17528 — export now supports a format query param (json vs pydantic-evals), and import handles the pydantic-evals {name, cases, evaluators} format.
- Rename LogfireDatasetsClient → LogfireAPIClient (and async variant)
- Remove add_case, create_case, import_cases; consolidate into add_cases
- add_cases now accepts both Case objects and plain dicts
- Fix update_dataset name type (str, not str | None)
- Add pydantic>=2 to datasets extra
- Replace _quote() with client-side name validation
- Update all docs to reflect API changes
- Add plans/datasets-future-work.md for deferred items
- Update add_cases (sync + async) to use /import/ endpoint with on_conflict='update' by default for upsert behavior
- Add on_conflict parameter ('update' or 'error') to add_cases
- Fix "Managed datasets is" → "Managed datasets are" in 5 docs files
- Fix await-in-non-async docstring examples
- Update SDK reference docs for new on_conflict parameter
- Update test mock transport for new import endpoint
Resolve uv.lock conflict by regenerating lockfile.
Line 90 (the ValueError raise) was uncovered.
```python
id_or_name: str,
*,
name: str = _UNSET,
input_type: type[Any] | None = None,
```
I find it weird that all the schemas are optional, especially the input schema, considering inputs themselves are required
No schema is equivalent to {} (accept anything), so it doesn't make a practical difference — inputs are still required, they just aren't validated against a schema. We could require it, but I think it's fine as-is.
```python
*,
name: str = _UNSET,
input_type: type[Any] | None = None,
output_type: type[Any] | None = None,
```
I can update a schema to something that no longer matches the cases
Agreed this should be resolved at some point, but I don't want it to block the initial release. We can add backend validation that checks existing cases against the new schema as a follow-up.
Resolve mkdocs.yml conflict: adopt new nav structure from main, add datasets under Evaluate section. Move dataset docs from guides/web-ui/datasets/ to evaluate/datasets/ to align with planned file restructure. Update internal cross-references.
…client

The client is intended to grow beyond datasets (e.g. variables APIs), so placing it under datasets felt wrong. Re-exports from the datasets subpackage are kept for convenience.
Changed `if case.evaluators:` to `if case.evaluators is not None:` so that an explicitly empty evaluators list is preserved as `[]` in the serialized output, ensuring evaluators are properly cleared during upsert operations.
```python
if case.evaluators is not None:  # pyright: ignore[reportUnnecessaryComparison]
    data['evaluators'] = _serialize_evaluators(case.evaluators)
```
🚩 _serialize_case always includes evaluators: [] but dicts may omit the key
When add_cases receives pydantic-evals Case objects, _serialize_case at logfire/experimental/api_client.py:196 always includes evaluators in the serialized dict (because Case.evaluators defaults to [], not None). However, when add_cases receives plain dicts, the evaluators key is only present if the caller included it. This means the API receives different payloads depending on whether you pass Case objects vs equivalent dicts — one always has "evaluators": [], the other may omit it entirely. This is unlikely to cause issues if the API treats missing and empty evaluators identically, but it's an asymmetry worth being aware of.
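One way to remove that asymmetry (a hypothetical normalization step, not what the PR currently does) is to default the key when serializing plain dicts:

```python
from typing import Any


def normalize_case_dict(case: 'dict[str, Any]') -> 'dict[str, Any]':
    """Match Case-object serialization, which always emits an evaluators list."""
    data = dict(case)  # don't mutate the caller's dict
    data.setdefault('evaluators', [])
    return data
```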
- Add test for None evaluators branch in _serialize_case - Add test verifying datasets module re-exports from api_client

Summary

- Add `tags: list[str] | None = None` parameter to all 12 SDK client methods (6 sync + 6 async): `create_case`, `update_case`, `list_cases`, `add_case`, `add_cases`, and `import_cases`
- Split the single `datasets.md` page into 5 subpages: overview, web UI guide, SDK guide, running evaluations, and SDK reference

Test plan

- `python -c "from logfire.experimental.datasets import LogfireDatasetsClient"`
- `mkdocs build` with no warnings