Skip to content

fix(experiments): flatten propagated metadata#1641

Merged
hassiebp merged 1 commit intomainfrom
codex/flatten-experiment-metadata-propagation
Apr 24, 2026
Merged

fix(experiments): flatten propagated metadata#1641
hassiebp merged 1 commit intomainfrom
codex/flatten-experiment-metadata-propagation

Conversation

@hassiebp
Copy link
Copy Markdown
Contributor

@hassiebp hassiebp commented Apr 24, 2026

Summary

  • Flatten experiment metadata and experiment item metadata before propagating it to child spans.
  • Validate each flattened metadata value independently against the existing propagated-attribute length limit.
  • Update propagation tests and add an e2e guard for server-side handling of flattened experiment metadata on non-SDK child spans.

Root Cause

run_experiment() serialized the entire experiment metadata dict into one langfuse.experiment.metadata string before propagation. The propagation layer enforces a 200-character limit per propagated string, so moderately sized metadata dicts were dropped even when each individual metadata value was short enough.

Impact

Experiment metadata now propagates as attributes like langfuse.experiment.metadata.job_name, matching the per-key behavior used for trace metadata. This avoids dropping valid metadata just because the combined serialized object is larger than 200 characters.

Validation

  • uv run --frozen ruff check langfuse/_client/attributes.py langfuse/_client/client.py langfuse/_client/propagation.py tests/unit/test_propagate_attributes.py tests/e2e/test_experiments.py
  • uv run --frozen mypy langfuse --no-error-summary
  • uv run --frozen pytest tests/unit/test_propagate_attributes.py -q
  • Commit hook also ran ruff-check, ruff-format, and mypy successfully.

Notes

The new e2e test is intentionally a server-ingestion guard. In this local environment I could not run it end-to-end because no Langfuse server or LANGFUSE_* e2e credentials were available. The adjacent server checkout currently appears to parse only the legacy JSON langfuse.experiment.metadata attribute, so this SDK PR should stay draft until matching server ingestion support for langfuse.experiment.metadata.<key> is available in the CI server image.

Disclaimer: Experimental PR review

Greptile Summary

Replaces the single-blob serialization of experiment_metadata / experiment_item_metadata with per-key flattening (langfuse.experiment.metadata.<key>), matching the pattern already used for trace metadata and allowing each value to be validated independently against the 200-char propagation limit. The propagation, context-read, and baggage-decode paths are all updated consistently, and the _get_propagated_span_key map now correctly includes "metadata" so it is no longer misresolved through the fallback.

Confidence Score: 4/5

Safe to merge; only P2 findings in tests (redundant import, stale assertion pattern).

Core logic changes across attributes.py, propagation.py, and client.py are coherent and well-tested. The two P2 issues are confined to the test file and do not affect production behavior.

tests/unit/test_propagate_attributes.py — redundant import and stale warning-text assertion.

Important Files Changed

Filename Overview
langfuse/_client/attributes.py Adds _flatten_and_serialize_metadata_values to recursively flatten nested dicts into dot-notation string pairs; logic is correct and handles None/empty inputs safely.
langfuse/_client/client.py Switches experiment_metadata and experiment_item_metadata from a single serialized JSON string to a flattened dict; item_metadata correctly guarded with isinstance(..., dict) check.
langfuse/_client/propagation.py Correctly routes experiment metadata fields into the dict-based validation loop, fixes the hardcoded TRACE_METADATA span key in _get_propagated_attributes_from_context, and adds "metadata" to _get_propagated_span_key; baggage decoding handles multiple metadata namespaces without ambiguity.
tests/unit/test_propagate_attributes.py Tests updated to verify per-key propagation; contains a redundant local import (rule violation) and a warning-text assertion that targets the old format rather than the new per-key format.
tests/e2e/test_experiments.py New e2e server-ingestion guard for flattened experiment metadata; intentionally draft (no live server) per PR notes, and the server-side ingestion contract is described as still pending.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: tests/unit/test_propagate_attributes.py
Line: 2791

Comment:
**Redundant local import — `_serialize` is already at module level**

`_serialize` is already imported at the top of this file (line 15: `from langfuse._client.attributes import LangfuseOtelSpanAttributes, _serialize`). This local re-import is unnecessary and violates the team's rule to keep imports at the module level.

```suggestion
```

**Rule Used:** Move imports to the top of the module instead of p... ([source](https://app.greptile.com/review/custom-context?memory=c960fc07-9928-409f-a18b-a780cbdded12))

**Learned From**
[langfuse/langfuse-python#1387](https://github.com/langfuse/langfuse-python/pull/1387)

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: tests/unit/test_propagate_attributes.py
Line: 2904

Comment:
**Warning-text assertion matches the old format, not the new one**

With the new per-key validation, the warning key is `experiment_metadata.<subkey>` (e.g., `experiment_metadata.job_name`), so the warning text would read `"Propagated attribute 'experiment_metadata.job_name' value is over 200 characters"`. The substring being checked — `"experiment_metadata' value is over 200 characters"` — can never appear under the new code path, making the assertion effectively a no-op guard. In this specific test the individual values are ≤200 chars so no warning fires, but if a test is later added with an oversized individual value the assertion wouldn't catch a spurious warning.

Consider updating the assertion to match the new format, e.g.:
```python
assert "experiment_metadata." not in caplog.text or "value is over 200 characters" not in caplog.text
```
or checking that none of the new-format patterns appear in `caplog.text`.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(experiments): flatten propagated met..." | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

Context used:

  • Rule used - Move imports to the top of the module instead of p... (source)

Learned From
langfuse/langfuse-python#1387

@hassiebp hassiebp changed the title [codex] fix(experiments): flatten propagated metadata fix(experiments): flatten propagated metadata Apr 24, 2026
@hassiebp hassiebp marked this pull request as ready for review April 24, 2026 14:36
@github-actions
Copy link
Copy Markdown

@claude review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c94c5a69fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread langfuse/_client/client.py
Comment thread langfuse/_client/attributes.py
Comment thread tests/unit/test_propagate_attributes.py
@hassiebp hassiebp force-pushed the codex/flatten-experiment-metadata-propagation branch from c94c5a6 to 70892a3 Compare April 24, 2026 14:47
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70892a36f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/e2e/test_experiments.py
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferring to a human — per your own PR notes this should stay draft until the server accepts flattened langfuse.experiment.metadata.<key> attributes, and the rollout/legacy-compat angle raised by another reviewer is a cross-system call worth a human sign-off.

Extended reasoning...

Overview

The PR switches experiment metadata (experiment_metadata / experiment_item_metadata) from a single JSON-serialized attribute to per-key flattened attributes (langfuse.experiment.metadata.<key>), so each value is independently validated against the 200-char propagation limit. The change spans langfuse/_client/attributes.py (new _flatten_and_serialize_metadata_values helper), client.py (call-site swap in _process_experiment_item), and propagation.py (dict routing in _propagate_attributes, fixed span-key lookup in _get_propagated_attributes_from_context, and multi-namespace baggage decoding in _get_span_key_from_baggage_key). Tests are updated and a new e2e guard is added.

Security risks

No meaningful new security surface. Metadata keys/values are still length-validated per entry; baggage propagation (as_baggage=True) continues to be opt-in, and the new baggage-decode loop only adds additional known metadata namespaces — no user-controlled prefix expansion.

Level of scrutiny

Higher than a typical bug fix. This is a wire-format change in trace/experiment propagation that interacts with server-side ingestion. You explicitly noted the PR should stay draft until the server parses the flattened attribute, and a peer reviewer raised a P1 about mixed-version deployments dropping legacy langfuse.experiment.metadata. Those are exactly the kinds of cross-system decisions (dual-write vs. gate on server rollout) that warrant a human call rather than a shadow approval.

Other factors

The per-file logic looks internally consistent: _get_propagated_span_key now includes "metadata" (fixing the previous fallback path), and _set_propagated_attribute uses the resolved span_key instead of hardcoded TRACE_METADATA. The one nit worth flagging separately from the ingestion question: in tests/unit/test_propagate_attributes.py the substring "experiment_metadata' value is over 200 characters" can no longer appear under the new per-key warning format (experiment_metadata.<subkey>' value is over 200 characters), so that assertion is effectively a no-op — greptile already flagged this. Outstanding P1/P2 comments from other reviewers have not yet been addressed, which is another reason to hold off on auto-approval.

@hassiebp hassiebp merged commit 9244e14 into main Apr 24, 2026
18 checks passed
@hassiebp hassiebp deleted the codex/flatten-experiment-metadata-propagation branch April 24, 2026 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant