Skip to content

fix: backport #254 — auto-infer is_evaluation when dataset_id, datapoint_id, and run_id are all defined#267

Merged
skylarmb merged 3 commits into
legacy-sdkfrom
devin/1771992997-backport-pr-254
Feb 25, 2026
Merged

fix: backport #254 — auto-infer is_evaluation when dataset_id, datapoint_id, and run_id are all defined#267
skylarmb merged 3 commits into
legacy-sdkfrom
devin/1771992997-backport-pr-254

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented Feb 25, 2026

Summary

Backport of #254 from federated-sdk-release-candidate to legacy-sdk.

When users pass dataset_id, datapoint_id, and run_id to the tracer but forget is_evaluation=True, evaluation context (baggage propagation, session metadata, source defaulting to "evaluation") is silently skipped. This causes session lifecycle issues where sessions are overwritten between experiment and tracer instances.

This adds inference logic in _initialize_core_attributes() that auto-sets is_evaluation=True when all three evaluation identifiers are present. Both self.is_evaluation and config["is_evaluation"] are updated together so that _setup_evaluation_context_dynamically(config) receives the correct value downstream.

Review & Testing Checklist for Human

  • Explicit is_evaluation=False override: The check uses not self.is_evaluation, which cannot distinguish between the Pydantic default False and an explicit is_evaluation=False from the user. If a user passes all three IDs alongside explicit is_evaluation=False, inference will override it to True. Confirm this edge case is acceptable on legacy-sdk.
  • Legacy-sdk code path parity: Verify that _setup_evaluation_context_dynamically(config) on legacy-sdk reads config["is_evaluation"] the same way as on federated-sdk-release-candidate, so the config sync fix is actually needed and correct here.
  • No tests included: Consider adding a unit test that initializes the tracer with all three IDs (without is_evaluation=True) and asserts evaluation mode is inferred.

Notes


Open with Devin

…id are all defined

When all three evaluation identifiers (run_id, dataset_id, datapoint_id) are
present, automatically set is_evaluation=True so users don't need to pass it
explicitly. This prevents session lifecycle issues where evaluation context
was not propagated because the flag was missing.

Co-Authored-By: unknown <>
Co-Authored-By: unknown <>
Ensures _setup_evaluation_context_dynamically reads the correct
is_evaluation value from config after auto-inference.

Co-Authored-By: unknown <>
@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@snyk-io
Copy link
Copy Markdown

snyk-io Bot commented Feb 25, 2026

Snyk checks have passed. No issues have been found so far.

Status Scanner Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Copy link
Copy Markdown
Contributor Author

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

Open in Devin Review

@skylarmb
Copy link
Copy Markdown
Contributor

integration tests are already in broken state on base branch, ignoring the failure here...

@skylarmb skylarmb merged commit 4213ae9 into legacy-sdk Feb 25, 2026
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant