
Add experimental datasets package #1711

Merged
dmontagu merged 22 commits into main from dmontague/managed-datasets
Feb 19, 2026
Conversation

@dmontagu (Contributor)

Summary

  • Add tags: list[str] | None = None parameter to all 12 SDK client methods (6 sync + 6 async): create_case, update_case, list_cases, add_case, add_cases, and import_cases
  • Restructure datasets documentation from a single datasets.md page into 5 subpages: overview, web UI guide, SDK guide, running evaluations, and SDK reference
  • Add experimental feature flag admonition to each docs page
  • Add collapsible SDK equivalent examples in the web UI docs

Test plan

  • Verify SDK import: python -c "from logfire.experimental.datasets import LogfireDatasetsClient"
  • Verify docs build: mkdocs build with no warnings
  • Test tags parameter works end-to-end against a running backend

Add SDK support for managed datasets, which allow building and
maintaining collections of typed test cases for evaluating AI systems.

- DatasetsClient with full CRUD for datasets and cases
- Typed schema support via Pydantic models / JSON Schema
- Export to pydantic-evals Dataset objects for running evaluations
- Documentation for UI and SDK workflows

SDK changes:
- Add `tags: list[str] | None = None` param to 6 methods on both
  sync and async clients: create_case, update_case, list_cases,
  add_case, add_cases, import_cases
- list_cases passes tags as query params for server-side filtering
- add_cases/import_cases merge tags into each serialized case
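The `list_cases` filtering described above amounts to repeating a `tags` key in the query string. A minimal sketch using only the stdlib; `build_list_cases_query` is a hypothetical helper for illustration, not the actual client code, and the real wire format may differ:

```python
from __future__ import annotations

from urllib.parse import urlencode

def build_list_cases_query(tags: list[str] | None = None) -> str:
    # Repeat the `tags` key once per value so the server can filter
    # on all of them at once.
    return urlencode([('tags', t) for t in (tags or [])])

print(build_list_cases_query(['regression', 'smoke']))
# tags=regression&tags=smoke
```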

Docs restructure:
- Split single datasets.md into 5 subpages: overview, web UI guide,
  SDK guide, evaluations, and SDK reference
- Add experimental feature flag warning admonition to all pages
- Add SDK code snippets in the web UI guide via collapsible examples
- Cross-link between pages for discoverability
- Add redirect from old URL to new index page
- Document new tags parameter across all relevant SDK examples
@dmontagu dmontagu self-assigned this Feb 14, 2026
@dmontagu dmontagu requested a review from alexmojaki February 14, 2026 03:48
devin-ai-integration[bot] (comment marked as resolved)

All datasets code examples make API calls to an external Logfire
server that isn't available during testing. Mark self-contained
examples with skip-run and non-self-contained fragments with skip
to prevent pytest-examples failures.
cloudflare-workers-and-pages bot commented Feb 14, 2026

Deploying logfire-docs with Cloudflare Pages

Latest commit: 9aa7a0d
Status: ✅ Deploy successful!
Preview URL: https://aa5d8d03.logfire-docs.pages.dev
Branch Preview URL: https://dmontague-managed-datasets.logfire-docs.pages.dev

devin-ai-integration[bot] (comment marked as resolved)

- Fix CaseNotFoundError never being raised (all 404s were mapped to
  DatasetNotFoundError). Case endpoints now pass is_case_endpoint=True
  to _handle_response which checks the response text.
- Fix dead code: `if arguments and len(arguments) == 0` was always
  False. Changed to `if arguments is not None and len(arguments) == 0`.
- Use _UNSET sentinel for nullable fields in update_dataset and
  update_case so users can explicitly pass None to clear fields.
- URL-encode path segments to prevent issues with special characters
  in dataset/case names.
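The `_UNSET` sentinel mentioned above is a standard pattern for distinguishing "argument omitted" from "explicitly passed None". A minimal sketch, assuming a hypothetical `build_case_patch` helper (the real client's signatures differ):

```python
from __future__ import annotations

from typing import Any

_UNSET: Any = object()  # sentinel: "caller did not pass this argument"

def build_case_patch(*, name: Any = _UNSET, expected_output: Any = _UNSET) -> dict[str, Any]:
    """Build a partial-update body, treating an explicit None as "clear the field"."""
    body: dict[str, Any] = {}
    if name is not _UNSET:
        body['name'] = name
    if expected_output is not _UNSET:
        body['expected_output'] = expected_output
    return body

build_case_patch(name=None)  # {'name': None}: clears the name
build_case_patch()           # {}: nothing changes
```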
devin-ai-integration[bot] (comment marked as resolved)

Fix import_cases to shallow-copy caller's dicts before adding tags,
preventing unexpected mutation. Add comprehensive test suite achieving
100% statement and branch coverage for the datasets SDK client.
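The mutation fix can be illustrated standalone. `merge_tags` below is a hypothetical helper showing the shallow-copy idea, not the client's actual serialization code:

```python
from __future__ import annotations

from typing import Any

def merge_tags(cases: list[dict[str, Any]], tags: list[str] | None) -> list[dict[str, Any]]:
    """Return new case dicts with `tags` appended, leaving the caller's dicts untouched."""
    if not tags:
        return cases
    merged: list[dict[str, Any]] = []
    for case in cases:
        copied = {**case}  # shallow copy: enough to add a top-level key safely
        copied['tags'] = [*copied.get('tags', []), *tags]
        merged.append(copied)
    return merged

original = [{'inputs': {'q': 'hello'}}]
merge_tags(original, ['smoke'])
assert 'tags' not in original[0]  # caller's dict was not mutated
```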
devin-ai-integration[bot] (comment marked as resolved)

Use pytest.importorskip to skip the test module on older pydantic
versions where pydantic_evals cannot be imported.
devin-ai-integration[bot] (comment marked as resolved)

Pin pydantic-evals>=1.0.0 with a python_version>='3.10' marker in
both the datasets extra and dev dependencies. On Python 3.9, pydantic-evals
is not installed and the test module is skipped via pytest.importorskip.

Add a note to the datasets SDK docs about the Python 3.10+ requirement.
pytest.importorskip only catches ImportError, but pydantic_evals
raises AttributeError on Pydantic 2.4 (missing pydantic.Tag). Use
try/except with pytest.skip(allow_module_level=True) to handle both.
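The skip logic above comes down to catching more than `ImportError` at import time. A hedged sketch using a generic helper rather than the test module's actual code; in the real test module the except branch calls `pytest.skip(..., allow_module_level=True)` instead of returning None:

```python
from __future__ import annotations

import importlib
from types import ModuleType

def import_or_none(module_name: str) -> ModuleType | None:
    """Import a module, treating any import-time failure as "unavailable".

    pydantic_evals can raise AttributeError (missing pydantic.Tag) on old
    pydantic versions, which pytest.importorskip would not catch.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError):
        return None

# In a test module one would instead write:
#   try:
#       import pydantic_evals
#   except (ImportError, AttributeError):
#       pytest.skip('pydantic_evals unavailable', allow_module_level=True)
```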
devin-ai-integration[bot] (comment marked as resolved)

… classification

- Wrap response.json() in try/except to handle non-JSON error responses (e.g. HTML from load balancers)
- Check parsed detail dict's 'detail' field instead of raw response.text to avoid misclassifying 404s when dataset names contain 'case'
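The classification fix can be sketched as follows. The error names mirror the ones discussed above, but this is an illustrative standalone function; the client's actual response handling may differ:

```python
from __future__ import annotations

import json

def classify_404(response_text: str, *, is_case_endpoint: bool) -> str:
    """Pick the right not-found error for a 404 response.

    Parses the body defensively: a load balancer may return HTML instead of
    the API's usual {"detail": "..."} JSON.
    """
    try:
        payload = json.loads(response_text)
        detail = payload.get('detail', '') if isinstance(payload, dict) else ''
    except ValueError:  # includes json.JSONDecodeError
        detail = ''
    if is_case_endpoint and 'case' in str(detail).lower():
        return 'CaseNotFoundError'
    return 'DatasetNotFoundError'
```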
@alexmojaki alexmojaki changed the title Add tags to datasets SDK client + restructure docs Add experimental datasets package Feb 14, 2026
@alexmojaki alexmojaki requested a review from Copilot February 14, 2026 09:21

This comment was marked as resolved.

- `_BaseLogfireDatasetsClient.__init__` now accepts a `client: T` instance
  instead of a client type + kwargs
- `LogfireDatasetsClient` and `AsyncLogfireDatasetsClient` accept an optional
  `client=` keyword arg, removing the need for `**client_kwargs: Any`
- Fix test helpers to pass mock-transport clients directly (no resource leak)
- Fix `await` in non-async docstring examples
- Add `httpx>=0.27.2` lower bound to datasets extra
@dmontagu (Contributor, Author)

Probably should have a method like logfire.datasets_client() to get the LogfireDatasetsClient.

Maybe it should be logfire.api_client() or logfire.client().datasets to be more general...

@alexmojaki we should discuss on Monday

@alexmojaki (Contributor)

> Probably should have a method like logfire.datasets_client() to get the LogfireDatasetsClient.
>
> Maybe it should be logfire.api_client() or logfire.client().datasets to be more general...
>
> @alexmojaki we should discuss on Monday

Seems like this can always be added later, the way it currently works is nicely tucked away in experimental.



def _quote(value: str) -> str:
    """URL-encode a path segment to prevent path traversal with special characters."""
Contributor:

this seems bad, like we should either be using a URL param or restricting characters allowed in dataset names

@dmontagu (Contributor, Author) commented Feb 17, 2026:

Fixed — went with restricting characters. Removed _quote() and added client-side name validation via regex (^[a-zA-Z0-9][a-zA-Z0-9._-]*$). Also opened a platform PR (pydantic/platform#17528) to add the same regex as a Pydantic Field(pattern=...) constraint on the backend. Leaving this unresolved so you can comment if you don't like this solution.
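The validation regex quoted above can be exercised directly. `validate_name` is a hypothetical wrapper around the stated pattern, not the client's actual function:

```python
import re

_NAME_RE = re.compile(r'^[a-zA-Z0-9][a-zA-Z0-9._-]*$')

def validate_name(name: str) -> str:
    """Reject names containing path-meaningful characters such as '/' or a leading '.'."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f'invalid dataset/case name: {name!r}')
    return name

validate_name('eval-set.v2')    # accepted
# validate_name('../secrets')   # raises ValueError: '/' is disallowed and names must start alphanumeric
```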

}


def make_mock_transport(responses: dict[tuple[str, str], httpx.Response | None] | None = None) -> httpx.MockTransport:
Contributor:

prefer vcr if possible

Contributor (Author):

Noted — deferred to a follow-up. Documented in plans/datasets-future-work.md under "VCR Tests".

| `create_case(dataset_id_or_name, inputs, *, name, expected_output, metadata, evaluators, source_trace_id, source_span_id, tags)` | Create a case from raw values. |
| `update_case(dataset_id_or_name, case_id, *, name, inputs, expected_output, metadata, evaluators, tags)` | Update an existing case. |
| `delete_case(dataset_id_or_name, case_id)` | Delete a case. |
| `export_dataset(id_or_name, input_type, output_type, metadata_type)` | Export as a typed `pydantic_evals.Dataset`. |
Contributor:

These files are identical


@dmontagu (Contributor, Author) commented Feb 17, 2026:

Addressed in pydantic/platform#17528 — export now supports a format query param (json vs pydantic-evals), and import handles the pydantic-evals {name, cases, evaluators} format.

- Rename LogfireDatasetsClient → LogfireAPIClient (and async variant)
- Remove add_case, create_case, import_cases; consolidate into add_cases
- add_cases now accepts both Case objects and plain dicts
- Fix update_dataset name type (str, not str | None)
- Add pydantic>=2 to datasets extra
- Replace _quote() with client-side name validation
- Update all docs to reflect API changes
- Add plans/datasets-future-work.md for deferred items
- Update add_cases (sync + async) to use /import/ endpoint with
  on_conflict='update' by default for upsert behavior
- Add on_conflict parameter ('update' or 'error') to add_cases
- Fix "Managed datasets is" → "Managed datasets are" in 5 docs files
- Fix await-in-non-async docstring examples
- Update SDK reference docs for new on_conflict parameter
- Update test mock transport for new import endpoint
devin-ai-integration[bot] (comment marked as resolved)

Resolve uv.lock conflict by regenerating lockfile.
Line 90 (the ValueError raise) was uncovered.
id_or_name: str,
*,
name: str = _UNSET,
input_type: type[Any] | None = None,
Contributor:

I find it weird that all the schemas are optional, especially the input schema, considering inputs themselves are required

Contributor (Author):

No schema is equivalent to {} (accept anything), so it doesn't make a practical difference — inputs are still required, they just aren't validated against a schema. We could require it, but I think it's fine as-is.

*,
name: str = _UNSET,
input_type: type[Any] | None = None,
output_type: type[Any] | None = None,
Contributor:

I can update a schema to something that no longer matches the cases.

Contributor (Author):

Agreed this should be resolved at some point, but I don't want it to block the initial release. We can add backend validation that checks existing cases against the new schema as a follow-up.

Resolve mkdocs.yml conflict: adopt new nav structure from main,
add datasets under Evaluate section. Move dataset docs from
guides/web-ui/datasets/ to evaluate/datasets/ to align with
planned file restructure. Update internal cross-references.
…client

The client is intended to grow beyond datasets (e.g. variables APIs),
so placing it under datasets felt wrong. Re-exports from the datasets
subpackage are kept for convenience.
devin-ai-integration[bot] (comment marked as resolved)

Changed `if case.evaluators:` to `if case.evaluators is not None:` so
that an explicitly empty evaluators list is preserved as `[]` in the
serialized output, ensuring evaluators are properly cleared during
upsert operations.
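The truthiness bug fixed above is easy to reproduce with a toy model. `Case` and `serialize` here are illustrative stand-ins, not the real types:

```python
from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Case:
    evaluators: list[str] | None = None  # None: leave unchanged; []: clear

def serialize(case: Case) -> dict:
    data: dict = {}
    # `if case.evaluators:` would skip an explicit empty list, so an upsert
    # could never clear evaluators; `is not None` preserves [] in the payload.
    if case.evaluators is not None:
        data['evaluators'] = case.evaluators
    return data

serialize(Case(evaluators=[]))  # {'evaluators': []}: clears on upsert
serialize(Case())               # {}: evaluators untouched
```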
@devin-ai-integration bot left a comment:

Devin Review found 1 new potential issue.

View 16 additional findings in Devin Review.


Comment on lines +196 to +197
if case.evaluators is not None: # pyright: ignore[reportUnnecessaryComparison]
data['evaluators'] = _serialize_evaluators(case.evaluators)

🚩 _serialize_case always includes evaluators: [] but dicts may omit the key

When add_cases receives pydantic-evals Case objects, _serialize_case at logfire/experimental/api_client.py:196 always includes evaluators in the serialized dict (because Case.evaluators defaults to [], not None). However, when add_cases receives plain dicts, the evaluators key is only present if the caller included it. This means the API receives different payloads depending on whether you pass Case objects vs equivalent dicts — one always has "evaluators": [], the other may omit it entirely. This is unlikely to cause issues if the API treats missing and empty evaluators identically, but it's an asymmetry worth being aware of.


- Add test for None evaluators branch in _serialize_case
- Add test verifying datasets module re-exports from api_client
@dmontagu dmontagu merged commit 48a6757 into main Feb 19, 2026
15 checks passed
@dmontagu dmontagu deleted the dmontague/managed-datasets branch February 19, 2026 07:39