Conversation
Add SDK support for managed datasets, which allow building and maintaining collections of typed test cases for evaluating AI systems.

- DatasetsClient with full CRUD for datasets and cases
- Typed schema support via Pydantic models / JSON Schema
- Export to pydantic-evals Dataset objects for running evaluations
- Documentation for UI and SDK workflows
SDK changes:

- Add `tags: list[str] | None = None` param to 6 methods on both sync and async clients: create_case, update_case, list_cases, add_case, add_cases, import_cases
- list_cases passes tags as query params for server-side filtering
- add_cases/import_cases merge tags into each serialized case

Docs restructure:

- Split single datasets.md into 5 subpages: overview, web UI guide, SDK guide, evaluations, and SDK reference
- Add experimental feature flag warning admonition to all pages
- Add SDK code snippets in the web UI guide via collapsible examples
- Cross-link between pages for discoverability
- Add redirect from old URL to new index page
- Document new tags parameter across all relevant SDK examples
All datasets code examples make API calls to an external Logfire server that isn't available during testing. Mark self-contained examples with skip-run and non-self-contained fragments with skip to prevent pytest-examples failures.
Deploying logfire-docs with

| | |
|---|---|
| Latest commit: | 9aa7a0d |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://aa5d8d03.logfire-docs.pages.dev |
| Branch Preview URL: | https://dmontague-managed-datasets.logfire-docs.pages.dev |
- Fix CaseNotFoundError never being raised (all 404s were mapped to DatasetNotFoundError). Case endpoints now pass is_case_endpoint=True to _handle_response, which checks the response text.
- Fix dead code: `if arguments and len(arguments) == 0` was always False. Changed to `if arguments is not None and len(arguments) == 0`.
- Use _UNSET sentinel for nullable fields in update_dataset and update_case so users can explicitly pass None to clear fields.
- URL-encode path segments to prevent issues with special characters in dataset/case names.
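The `_UNSET` sentinel pattern mentioned above can be sketched like this (the helper names here are illustrative, not the SDK's actual internals):

```python
from typing import Any


class _Unset:
    """Sentinel distinguishing 'argument not provided' from an explicit None."""

    def __repr__(self) -> str:
        return '<UNSET>'


_UNSET: Any = _Unset()


def build_update_payload(*, name: Any = _UNSET, expected_output: Any = _UNSET) -> dict[str, Any]:
    """Include only fields the caller passed; an explicit None clears the field server-side."""
    payload: dict[str, Any] = {}
    if name is not _UNSET:
        payload['name'] = name
    if expected_output is not _UNSET:
        payload['expected_output'] = expected_output
    return payload
```

The point of the sentinel is that `update_case(name=None)` and `update_case()` produce different payloads, which a plain `None` default cannot express.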
Fix import_cases to shallow-copy caller's dicts before adding tags, preventing unexpected mutation. Add comprehensive test suite achieving 100% statement and branch coverage for the datasets SDK client.
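A minimal sketch of that mutation fix, assuming cases arrive as plain dicts and tags are merged in (the helper name and de-duplication order are assumptions):

```python
def merge_tags_into_cases(cases: 'list[dict]', tags: 'list[str] | None') -> 'list[dict]':
    """Serialize cases with extra tags without mutating the caller's dicts."""
    serialized = []
    for case in cases:
        case = dict(case)  # shallow copy so the caller's dict is never mutated
        if tags:
            # merge, de-duplicating against any tags already on the case
            case['tags'] = sorted(set(case.get('tags', [])) | set(tags))
        serialized.append(case)
    return serialized
```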
Use pytest.importorskip to skip the test module on older pydantic versions where pydantic_evals cannot be imported.
Pin pydantic-evals>=1.0.0 with a python_version>='3.10' marker in both the datasets extra and dev dependencies. On Python 3.9, pydantic-evals is not installed and the test module is skipped via pytest.importorskip. Add a note to the datasets SDK docs about the Python 3.10+ requirement.
pytest.importorskip only catches ImportError, but pydantic_evals raises AttributeError on Pydantic 2.4 (missing pydantic.Tag). Use try/except with pytest.skip(allow_module_level=True) to handle both.
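The module-level skip described above can be written generically like this (the helper is a sketch of the pattern, not the repo's exact code):

```python
import importlib

import pytest


def import_or_skip_module(module_name: str):
    """Import a module, skipping the whole test module on any import-time failure.

    pytest.importorskip only catches ImportError, but some packages fail with
    other exceptions at import time (e.g. AttributeError when a dependency is
    missing an expected attribute), so catch those too.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError) as exc:
        pytest.skip(f'{module_name} unavailable: {exc}', allow_module_level=True)
```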
… classification

- Wrap response.json() in try/except to handle non-JSON error responses (e.g. HTML from load balancers)
- Check the parsed detail dict's 'detail' field instead of raw response.text to avoid misclassifying 404s when dataset names contain 'case'
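A hedged sketch of that classification logic (the exception names match the PR; the function body is an assumption about the approach, not the actual implementation):

```python
import json


class DatasetNotFoundError(Exception):
    pass


class CaseNotFoundError(Exception):
    pass


def classify_not_found(body_text: str, *, is_case_endpoint: bool) -> Exception:
    """Map a 404 body to the right error, tolerating non-JSON responses."""
    try:
        detail = json.loads(body_text).get('detail', '')
    except (ValueError, AttributeError):
        # e.g. an HTML error page from a load balancer
        detail = ''
    if is_case_endpoint and 'case' in str(detail).lower():
        return CaseNotFoundError(detail or body_text)
    return DatasetNotFoundError(detail or body_text)
```

Checking the parsed `detail` field rather than the raw body means a dataset named `my-cases` can no longer trip the case-specific branch.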
- `_BaseLogfireDatasetsClient.__init__` now accepts a `client: T` instance instead of a client type + kwargs
- `LogfireDatasetsClient` and `AsyncLogfireDatasetsClient` accept an optional `client=` keyword arg, removing the need for `**client_kwargs: Any`
- Fix test helpers to pass mock-transport clients directly (no resource leak)
- Fix `await` in non-async docstring examples
- Add `httpx>=0.27.2` lower bound to datasets extra
Probably should have a method like Maybe it should be @alexmojaki we should discuss on Monday
Seems like this can always be added later; the way it currently works is nicely tucked away in
```python
def _quote(value: str) -> str:
    """URL-encode a path segment to prevent path traversal with special characters."""
```
this seems bad, like we should either be using a URL param or restricting characters allowed in dataset names
Fixed — went with restricting characters. Removed _quote() and added client-side name validation via regex (^[a-zA-Z0-9][a-zA-Z0-9._-]*$). Also opened a platform PR (pydantic/platform#17528) to add the same regex as a Pydantic Field(pattern=...) constraint on the backend. Leaving this unresolved so you can comment if you don't like this solution.
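The client-side validation described above could look roughly like this (the regex comes from the comment; the helper name and error message are illustrative):

```python
import re

# Names must start with an alphanumeric character, then allow alphanumerics,
# dots, underscores, and hyphens — no slashes, so no path traversal.
_NAME_PATTERN = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9._-]*$')


def validate_name(name: str) -> str:
    """Reject dataset/case names containing path separators or other special characters."""
    if not _NAME_PATTERN.match(name):
        raise ValueError(f'Invalid name {name!r}: must match {_NAME_PATTERN.pattern}')
    return name
```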
```python
def make_mock_transport(responses: dict[tuple[str, str], httpx.Response | None] | None = None) -> httpx.MockTransport:
```
Noted — deferred to a follow-up. Documented in plans/datasets-future-work.md under "VCR Tests".
| Method | Description |
|---|---|
| `create_case(dataset_id_or_name, inputs, *, name, expected_output, metadata, evaluators, source_trace_id, source_span_id, tags)` | Create a case from raw values. |
| `update_case(dataset_id_or_name, case_id, *, name, inputs, expected_output, metadata, evaluators, tags)` | Update an existing case. |
| `delete_case(dataset_id_or_name, case_id)` | Delete a case. |
| `export_dataset(id_or_name, input_type, output_type, metadata_type)` | Export as a typed `pydantic_evals.Dataset`. |
Addressed in pydantic/platform#17528 — export now supports a format query param (json vs pydantic-evals), and import handles the pydantic-evals {name, cases, evaluators} format.
- Rename LogfireDatasetsClient → LogfireAPIClient (and async variant)
- Remove add_case, create_case, import_cases; consolidate into add_cases
- add_cases now accepts both Case objects and plain dicts
- Fix update_dataset name type (str, not str | None)
- Add pydantic>=2 to datasets extra
- Replace _quote() with client-side name validation
- Update all docs to reflect API changes
- Add plans/datasets-future-work.md for deferred items
- Update add_cases (sync + async) to use /import/ endpoint with on_conflict='update' by default for upsert behavior
- Add on_conflict parameter ('update' or 'error') to add_cases
- Fix "Managed datasets is" → "Managed datasets are" in 5 docs files
- Fix await-in-non-async docstring examples
- Update SDK reference docs for new on_conflict parameter
- Update test mock transport for new import endpoint
Resolve uv.lock conflict by regenerating lockfile.
Line 90 (the ValueError raise) was uncovered.
```python
id_or_name: str,
*,
name: str = _UNSET,
input_type: type[Any] | None = None,
```
I find it weird that all the schemas are optional, especially the input schema, considering inputs themselves are required
No schema is equivalent to {} (accept anything), so it doesn't make a practical difference — inputs are still required, they just aren't validated against a schema. We could require it, but I think it's fine as-is.
```python
*,
name: str = _UNSET,
input_type: type[Any] | None = None,
output_type: type[Any] | None = None,
```
I can update a schema to something that no longer matches the cases
Agreed this should be resolved at some point, but I don't want it to block the initial release. We can add backend validation that checks existing cases against the new schema as a follow-up.
Resolve mkdocs.yml conflict: adopt new nav structure from main, add datasets under Evaluate section. Move dataset docs from guides/web-ui/datasets/ to evaluate/datasets/ to align with planned file restructure. Update internal cross-references.
…client

The client is intended to grow beyond datasets (e.g. variables APIs), so placing it under datasets felt wrong. Re-exports from the datasets subpackage are kept for convenience.
Changed `if case.evaluators:` to `if case.evaluators is not None:` so that an explicitly empty evaluators list is preserved as `[]` in the serialized output, ensuring evaluators are properly cleared during upsert operations.
```python
if case.evaluators is not None:  # pyright: ignore[reportUnnecessaryComparison]
    data['evaluators'] = _serialize_evaluators(case.evaluators)
```
🚩 _serialize_case always includes evaluators: [] but dicts may omit the key
When add_cases receives pydantic-evals Case objects, _serialize_case at logfire/experimental/api_client.py:196 always includes evaluators in the serialized dict (because Case.evaluators defaults to [], not None). However, when add_cases receives plain dicts, the evaluators key is only present if the caller included it. This means the API receives different payloads depending on whether you pass Case objects vs equivalent dicts — one always has "evaluators": [], the other may omit it entirely. This is unlikely to cause issues if the API treats missing and empty evaluators identically, but it's an asymmetry worth being aware of.
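One way to remove that asymmetry (a hypothetical normalization step, not what the PR currently does) is to default the key when serializing plain dicts:

```python
from typing import Any


def normalize_case_dict(case: 'dict[str, Any]') -> 'dict[str, Any]':
    """Match Case-object serialization, which always emits an evaluators list."""
    data = dict(case)  # don't mutate the caller's dict
    data.setdefault('evaluators', [])
    return data
```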
- Add test for None evaluators branch in _serialize_case - Add test verifying datasets module re-exports from api_client

Summary

- Add `tags: list[str] | None = None` parameter to all 12 SDK client methods (6 sync + 6 async): `create_case`, `update_case`, `list_cases`, `add_case`, `add_cases`, and `import_cases`
- Split the single `datasets.md` page into 5 subpages: overview, web UI guide, SDK guide, running evaluations, and SDK reference

Test plan

- `python -c "from logfire.experimental.datasets import LogfireDatasetsClient"`
- `mkdocs build` with no warnings