fix: preserve histogram/exp_histogram temporality across msgpack round-trip#279

Open: truongnht wants to merge 1 commit into fluent:master from truongnht:fix/histogram-temporality-msgpack

@truongnht truongnht commented Jul 2, 2026

Resolves #278

Summary

pack_header() in cmt_encode_msgpack.c only writes the aggregation_type meta field for CMT_COUNTER. As a result, histograms and exp_histograms that are encoded to msgpack and decoded back always end up with CMT_AGGREGATION_TYPE_UNSPECIFIED, regardless of the temporality (delta/cumulative) they were created with. The decode side has the matching gap: unpack_basic_type_meta() in cmt_decode_msgpack.c never applies the decoded aggregation_type to those two types either.

This fix mirrors the existing CMT_COUNTER handling for CMT_HISTOGRAM and CMT_EXP_HISTOGRAM, on both the encode and decode side.
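
To make the failure mode concrete, here is a minimal, self-contained model of the round-trip. It is not the cmetrics code: the types, the three-byte "wire" struct, and the `fixed` flag are all hypothetical stand-ins, and the real encoder writes msgpack maps rather than a struct. It only illustrates why gating the field on the counter type alone drops temporality for the two histogram types.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration; the real cmetrics definitions differ. */
enum metric_type { METRIC_COUNTER, METRIC_GAUGE, METRIC_HISTOGRAM, METRIC_EXP_HISTOGRAM };
enum aggregation { AGG_UNSPECIFIED = 0, AGG_DELTA = 1, AGG_CUMULATIVE = 2 };

struct metric {
    enum metric_type type;
    enum aggregation aggregation_type;
};

/* Toy wire format: the aggregation value is an optional field. */
struct wire {
    unsigned char type;
    unsigned char has_aggregation;
    unsigned char aggregation;
};

/* Which types carry the field. fixed == 0 models the old counter-only gate,
 * fixed != 0 models the gate extended to both histogram types. */
static int carries_aggregation(enum metric_type type, int fixed)
{
    if (type == METRIC_COUNTER) {
        return 1;
    }
    return fixed && (type == METRIC_HISTOGRAM || type == METRIC_EXP_HISTOGRAM);
}

/* Encode side: write the field only for types that pass the gate. */
static struct wire encode(const struct metric *m, int fixed)
{
    struct wire w = { (unsigned char) m->type, 0, 0 };

    if (carries_aggregation(m->type, fixed)) {
        w.has_aggregation = 1;
        w.aggregation = (unsigned char) m->aggregation_type;
    }
    return w;
}

/* Decode side: apply the field only for types that pass the same gate. */
static struct metric decode(const struct wire *w, int fixed)
{
    struct metric m = { (enum metric_type) w->type, AGG_UNSPECIFIED };

    if (w->has_aggregation && carries_aggregation(m.type, fixed)) {
        m.aggregation_type = (enum aggregation) w->aggregation;
    }
    return m;
}

/* Encode then decode, returning the aggregation type that survives. */
static enum aggregation round_trip(enum metric_type type, enum aggregation agg, int fixed)
{
    struct metric in = { type, agg };
    struct wire w = encode(&in, fixed);

    return decode(&w, fixed).aggregation_type;
}

/* Self-check: counters always survive; histograms survive only with the extended gate. */
void check_round_trip(void)
{
    assert(round_trip(METRIC_COUNTER, AGG_CUMULATIVE, 0) == AGG_CUMULATIVE);
    assert(round_trip(METRIC_HISTOGRAM, AGG_DELTA, 0) == AGG_UNSPECIFIED);
    assert(round_trip(METRIC_HISTOGRAM, AGG_DELTA, 1) == AGG_DELTA);
    assert(round_trip(METRIC_EXP_HISTOGRAM, AGG_CUMULATIVE, 1) == AGG_CUMULATIVE);
}
```

In this model the encode and decode gates share one predicate, so the two sides cannot drift apart. Whether the actual patch factors it that way is not something this sketch claims.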

Real-world impact

This is the exact code path Fluent Bit's OTLP pipeline uses internally:

OTLP client → in_opentelemetry (decodes OTLP → cmetrics)
            → [internal msgpack buffer]
            → out_opentelemetry (decodes msgpack → re-encodes OTLP)
            → OTLP receiver (e.g. Prometheus)

Any OTLP histogram with delta or cumulative temporality passing through Fluent Bit loses that metadata in the msgpack round-trip. Downstream OTLP receivers that validate temporality (e.g. Prometheus with otlp-deltatocumulative enabled) reject the metric:

invalid temporality and type combination for metric "k6_http_req_duration"
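
As a rough illustration of why an unspecified temporality gets rejected, the sketch below models a receiver-side check. The function name and the acceptance rule are assumptions made for this example, not Prometheus's actual implementation. Only the three enum values follow the OTLP AggregationTemporality definition (unspecified = 0, delta = 1, cumulative = 2).

```c
#include <assert.h>

/* Values follow the OTLP AggregationTemporality enum. */
enum otlp_temporality {
    TEMPORALITY_UNSPECIFIED = 0,
    TEMPORALITY_DELTA       = 1,
    TEMPORALITY_CUMULATIVE  = 2
};

/* Hypothetical data point kinds, for illustration only. */
enum point_kind { KIND_GAUGE, KIND_SUM, KIND_HISTOGRAM, KIND_EXP_HISTOGRAM };

/* Assumed, simplified rule: gauges carry no temporality, every other kind
 * needs an explicit delta or cumulative value (with delta handling enabled). */
static int receiver_accepts(enum point_kind kind, enum otlp_temporality t)
{
    if (kind == KIND_GAUGE) {
        return 1;
    }
    return t == TEMPORALITY_DELTA || t == TEMPORALITY_CUMULATIVE;
}
```

Under this assumed rule, a histogram that left the client as delta but reaches the receiver as unspecified fails the check, which matches the rejection shown above.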

Root cause, traced with evidence

  • cmt_decode_opentelemetry.c / cmt_encode_opentelemetry.c (the direct OTLP codec) correctly round-trip aggregation_type for histograms — confirmed by reading the source.
  • cmt_encode_msgpack.c / cmt_decode_msgpack.c (the internal msgpack codec Fluent Bit uses for its input/output plugin boundary) do not — confirmed by reading the source and reproducing locally.
  • Reproduced against a live cluster: built a patched Fluent Bit v5.0.4 with this fix applied to the vendored lib/cmetrics copy, deployed it in place of the stock image, ran a k6 OTLP load test through the same Fluent Bit pipeline, and confirmed via direct Prometheus API query that k6_http_req_duration_milliseconds_bucket/_count/_sum (previously rejected) now land correctly with real data.

Test plan

Added tests/msgpack_temporality.c: round-trips a histogram, an exp_histogram, and a counter (as a control) through cmt_encode_msgpack_create and cmt_decode_msgpack_create, and asserts that aggregation_type survives.

  • ctest passes: 18/18 (17 pre-existing + this new test), no regressions
  • Verified the new test fails on histogram and exp_histogram (counter still passes, as expected) when reverting src/cmt_encode_msgpack.c / src/cmt_decode_msgpack.c only — confirms the test actually exercises the bug
  • Verified the fix against a real Fluent Bit deployment with real OTLP traffic (see above), not just unit tests

@truongnht truongnht requested a review from edsiper as a code owner July 2, 2026 13:52
…d-trip

pack_header() in cmt_encode_msgpack.c only wrote the "aggregation_type"
meta field for CMT_COUNTER. Histograms and exp_histograms encoded to
msgpack and decoded back always ended up with
CMT_AGGREGATION_TYPE_UNSPECIFIED regardless of the temporality they were
created with, since cmt_decode_msgpack.c's unpack_basic_type_meta() never
applied the decoded aggregation_type to those two types either.

This is the exact path Fluent Bit's OTLP pipeline uses internally
(in_opentelemetry decodes OTLP into cmetrics, buffers as msgpack, then
out_opentelemetry re-encodes to OTLP). Any OTLP histogram with delta or
cumulative temporality passing through Fluent Bit lost that metadata and
downstream OTLP receivers (e.g. Prometheus with otlp-deltatocumulative)
rejected it with "invalid temporality and type combination".

Mirrors the existing CMT_COUNTER handling for CMT_HISTOGRAM and
CMT_EXP_HISTOGRAM on both the encode and decode side.

Adds tests/msgpack_temporality.c: a msgpack round-trip regression test for
histogram, exp_histogram and counter aggregation_type. Verified this test
fails on histogram/exp_histogram (counter passes) without the fix, and
passes for all three with it.

Signed-off-by: Truong Nguyen <truong.nguyen_1@philips.com>

Development

Successfully merging this pull request may close these issues.

aggregation_type not serialized for histogram/exp_histogram in msgpack encoder, causing temporality loss on decode
