[HDX-4383] Skip string inference when body parses as JSON with a level field by dhable · Pull Request #2363 · hyperdxio/hyperdx

dhable · 2026-05-28T18:27:21Z

Summary

Fixes a customer-reported bug where structured JSON logs are tagged with the
wrong severity because the OTel collector's string-keyword inference picks up
incidental severity words inside the body. For example, a Grafana sidecar log
with body {"level":"INFO", "msg":"... mimir-alertmanager-dashboard ..."}
gets SeverityText="fatal", SeverityNumber=21 today because the inference
regex \b(alert|crit|emerg|fatal|error|err|warn|notice|debug|dbug|trace)
matches alert at the start of alertmanager. The JSON-derived level: INFO
field is ignored entirely.

This PR adds a new OTTL log_statements block between the existing JSON-parse
block and the string-inference block in docker/otel-collector/config.yaml.
The new block promotes a JSON-derived level/severity field to
log.severity_text, which causes the string-inference block to be skipped via
its existing severity_number == 0 and severity_text == "" guard.

What it does

Reads only from log.attributes, which is where Block 1 (JSON parse +
flatten(), or OTel map body merge) puts the parsed fields. So the
promotion fires only when a JSON body was actually parsed.
Case-insensitive across keys by enumerating common casings of common
field names used by mainstream logging frameworks: level / Level /
LEVEL (pino, winston, zerolog, zap, logrus, slog, Serilog, NLog),
severity / Severity / SEVERITY (Datadog, GCP Cloud Logging), and
log.level (Elastic ECS, flattened from nested JSON).
Priority-ordered cascade: each set self-guards on
severity_text == "", so the first match wins (level > severity >
log.level). The block as a whole is gated on no producer-set severity,
so producer-supplied values are always preserved.
severity_number mapping uses the same case-insensitive (?i) regex
keyword set as the existing string-inference block, so values like
"Information", "WARN", "Critical" resolve correctly. Unrecognized
values (e.g. "verbose") fall through to INFO — same default as Block 2.
severity_text is normalized by the unchanged Block 3
(ConvertCase(severity_text, "lower")), so "INFO" becomes "info".

Behavior matrix

Scenario	Result
`{"level":"warn", ...}`	`("warn", 13)`
`{"Level":"Information", ...}` (Serilog)	`("information", 9)`
`{"SEVERITY":"ERROR", ...}` (GCP)	`("error", 17)`
`{"log":{"level":"fatal"}, ...}` (ECS, flattened)	`("fatal", 21)`
`{"level":"verbose", ...}` (unknown text)	`("verbose", 9)` — INFO default
`{"msg":"something error happened"}` (no level field)	`("error", 17)` — string inference still runs
Non-JSON body	Unchanged — string inference still runs
Producer pre-set `severity_text` or `severity_number`	Producer values preserved

How to test on Vercel preview

N/A — non-UI change. This modifies the OTel collector's OTTL config; no
frontend code is touched.

Testing

New make ci-lint and make ci-unit both green.
New bats smoke tests (8 cases) under
smoke-tests/otel-collector/data/severity-inference/from-json-*/:
- from-json-level/ — lowercase level
- from-json-level-pascalcase/ — Serilog Level: "Information"
- from-json-severity-uppercase/ — GCP SEVERITY: "ERROR"
- from-json-ecs-log-level/ — Elastic ECS nested log.level
- from-json-level-unknown/ — unrecognized value falls back to INFO
- from-json-no-level/ — JSON body without level → string inference runs
- from-json-level-producer-wins/ — producer-set severity preserved
- from-json-level-body-keyword-conflict/ — the exact customer payload
  from the linked support escalation (body has alertmanager-dashboard
  plus level: INFO); confirms the customer scenario no longer
  reproduces.
All 21 bats tests in severity-inference.bats, auto-parse-json.bats,
and normalize-severity.bats pass against a freshly-rebuilt collector
image.

References

Linear Issue: HDX-4383
Support escalation: ClickHouse/support-escalation#7812

…s level/severity field (HDX-4383) When the log body parses as JSON and contains a level-like field, promote that value to log.severity_text so the downstream string-keyword inference block is short-circuited via its existing 'severity_number == 0 and severity_text == ""' guard. The previous behavior scanned the raw body string for severity keywords with a \b-prefixed regex, which produced false positives whenever a body incidentally contained a word starting with a keyword (e.g. 'alertmanager' matching 'alert' -> FATAL). With this fix, structured logs with a level field get their severity from the field, not from substrings of the body. Implementation: - Insert a new OTTL log_statements block between the existing JSON-parse block and the string-inference block. - The block is gated on no producer-set severity, then runs a priority cascade over common level/severity field names from mainstream logging frameworks (pino, winston, zerolog, zap, logrus, slog, Serilog, GCP Cloud Logging, Elastic ECS), in three case variants each: level / Level / LEVEL, severity / Severity / SEVERITY, log.level. - Severity_number is mapped via case-insensitive (?i) regex matches that mirror the existing string-inference keyword set. Unrecognized level values get severity_number defaulted to INFO (mirrors block 2's else). - Final lowercase normalization block (ConvertCase) is unchanged. Tests: - Eight new bats smoke tests under smoke-tests/otel-collector covering lowercase level, PascalCase Level (Serilog), uppercase SEVERITY (GCP), flattened log.level (ECS), unknown level fallthrough, JSON without level (string inference still runs), producer-set severity precedence, and the exact Grafana/mimir-alertmanager customer payload from the linked support escalation. - All 17 severity-inference + 3 auto-parse + 1 normalize-severity bats tests pass against a freshly-built collector image.

changeset-bot · 2026-05-28T18:27:25Z

🦋 Changeset detected

Latest commit: 73889e0

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@hyperdx/otel-collector	Patch
@hyperdx/api	Patch
@hyperdx/app	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

vercel · 2026-05-28T18:27:27Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	May 28, 2026 8:57pm
hyperdx-storybook	Error		May 28, 2026 8:57pm

github-actions · 2026-05-28T18:31:10Z

E2E Test Results

✅ All tests passed • 193 passed • 3 skipped • 1305s

Status	Count
✅ Passed	193
❌ Failed	0
⚠️ Flaky	2
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

github-actions · 2026-05-28T18:32:45Z

🔴 Tier 4 — Critical

Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.

Why this tier:

Critical-path files (1):
- docker/otel-collector/config.yaml

Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
SLA: Schedule synchronous review within 2 business days.

Stats

Production files changed: 26
Production lines changed: 452
Branch: dhable/HDX-4383-skip-string-inference-on-json-level
Author: dhable

To override this classification, remove the review/tier-4 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

github-actions · 2026-05-28T18:45:13Z

Deep Review

🟡 P2 -- recommended

docker/otel-collector/config.yaml:58-69 -- the new severity-number IsMatch rules drop both the \b word boundary used by the existing inference block and the per-rule guard, so a JSON level value that contains a keyword as a non-word-boundary substring (e.g. "deferred", "interrupted", "stacktrace") matches the wrong bin, and any compound value (e.g. "warn_info", "trace_error") is reclassified by whichever rule fires last.
- Fix: anchor each pattern with \b...\b and add and log.severity_number == 0 to every set so the first match wins, mirroring the priority pattern used by the string-inference block below.
- _{correctness, testing, maintainability, reliability, adversarial}
smoke-tests/otel-collector/severity-inference.bats:106 -- no fixture covers the cascade-priority contract (multiple level-like keys present simultaneously) or the non-string level fallback (numeric level from bunyan/pino), so a future reorder or accidental drop of a self-guard would silently change behavior with green tests.
- Fix: add fixtures for {"level":"warn","severity":"error"} (expect warn) and {"level":30,"msg":"connection error"} (expect fallback to body-string inference, error/17).
- _{testing, correctness, reliability, adversarial}

🔵 P3 nitpicks (5)

docker/otel-collector/config.yaml:30-56 -- promoted severity_text is kept verbatim, so logs end up with non-canonical values like "information", "warning", "verbose", "notice" (confirmed by the new fixtures) instead of the canonical six OTel severity strings the rest of the pipeline emits.
- Fix: after each successful severity-number classification, normalize severity_text to the canonical short form (fatal/error/warn/info/debug/trace) like the string-inference block does.
- _{correctness, reliability}
docker/otel-collector/config.yaml:30-56 -- all seven promotion rules guard on IsString(...), so numeric levels emitted by bunyan/pino ({"level":30}) and other native OTLP-style numeric severities are skipped without any documentation or fallback handling.
- Fix: either add a numeric-level mapping (e.g., 10/20/30/40/50/60 → trace/debug/info/warn/error/fatal) or call out the limitation in the changeset and a comment.
- _{correctness, adversarial}
docker/otel-collector/config.yaml:18-20 -- block-level guard only inspects the log record's severity_number, so a JSON body shaped like {"severity_number":17,"level":"info"} silently drops the producer-intended severity_number=17 and resolves to INFO via the promoted level.
- Fix: if surfacing severity_number from JSON attributes is in scope for this fix, promote it too and gate text promotion on severity_number == 0; otherwise leave a comment naming the limitation.
docker/otel-collector/config.yaml:30-56 -- promotion accepts any non-empty string, so whitespace-only or numeric-string levels (e.g., " ", "30") propagate to severity_text unchanged and end up in storage / UI filter dropdowns as malformed values.
- Fix: at the fallthrough on line 75, additionally set severity_text to "info" (or trim and re-check) so the column never carries garbage.
- _adversarial
smoke-tests/otel-collector/severity-inference.bats:106 -- three of the seven enumerated key variants (LEVEL, lowercase severity, PascalCase Severity) and the DEBUG/TRACE numeric mapping from JSON-promoted text have no direct fixtures; a typo in one of the unverified set statements would not surface.
- Fix: add one combined fixture exercising each untested variant and assert each row's severity_text/SeverityNumber against the expected mapping.

Reviewers (6): correctness, testing, maintainability, project-standards, reliability, adversarial.

Testing gaps:

Cascade priority when both level and severity keys are present.
Non-string level value (numeric bunyan/pino) and empty-string level value.
DEBUG/TRACE classification from JSON-promoted severity_text.
JSON body delivered as an OTLP map (not stringified JSON), to exercise the merge_maps(log.attributes, log.body, ...) branch with the new block.
Untested key variants LEVEL, severity, Severity.

…n-level

wrn14897

thanks for the fix

…n-level

dhable added the ai-generated AI-generated content; review carefully before merging. label May 28, 2026

chore: add changeset for HDX-4383

91cb49e

vercel Bot had a problem deploying to Preview – hyperdx-storybook May 28, 2026 18:30 Failure

dhable marked this pull request as ready for review May 28, 2026 18:32

github-actions Bot added the review/tier-4 Critical — deep review + domain expert sign-off label May 28, 2026

dhable added the automerge label May 28, 2026

Merge branch 'main' into dhable/HDX-4383-skip-string-inference-on-jso…

af0d4a4

…n-level

vercel Bot had a problem deploying to Preview – hyperdx-storybook May 28, 2026 18:47 Failure

vercel Bot deployed to Preview – hyperdx-oss May 28, 2026 18:50 View deployment

dhable requested a review from wrn14897 May 28, 2026 20:53

wrn14897 approved these changes May 28, 2026

View reviewed changes

Merge branch 'main' into dhable/HDX-4383-skip-string-inference-on-jso…

73889e0

…n-level

vercel Bot had a problem deploying to Preview – hyperdx-storybook May 28, 2026 20:54 Failure

vercel Bot deployed to Preview – hyperdx-oss May 28, 2026 20:57 View deployment

kodiakhq Bot merged commit ad3f1c9 into main May 28, 2026
18 of 19 checks passed

kodiakhq Bot deleted the dhable/HDX-4383-skip-string-inference-on-json-level branch May 28, 2026 21:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[HDX-4383] Skip string inference when body parses as JSON with a level field#2363

[HDX-4383] Skip string inference when body parses as JSON with a level field#2363
kodiakhq[bot] merged 4 commits into
mainfrom
dhable/HDX-4383-skip-string-inference-on-json-level

dhable commented May 28, 2026

Uh oh!

changeset-bot Bot commented May 28, 2026 •

edited

Loading

Uh oh!

vercel Bot commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 28, 2026 •

edited

Loading

Uh oh!

wrn14897 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dhable commented May 28, 2026

Summary

What it does

Behavior matrix

How to test on Vercel preview

Testing

References

Uh oh!

changeset-bot Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

vercel Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔴 Tier 4 — Critical

Uh oh!

github-actions Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Deep Review

🟡 P2 -- recommended

Uh oh!

wrn14897 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

changeset-bot Bot commented May 28, 2026 •

edited

Loading

vercel Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading

github-actions Bot commented May 28, 2026 •

edited

Loading