[HDX-4383] Skip string inference when body parses as JSON with a level field #2363
…s level/severity field (HDX-4383)

When the log body parses as JSON and contains a level-like field, promote that value to log.severity_text so the downstream string-keyword inference block is short-circuited via its existing 'severity_number == 0 and severity_text == ""' guard.

The previous behavior scanned the raw body string for severity keywords with a \b-prefixed regex, which produced false positives whenever a body incidentally contained a word starting with a keyword (e.g. 'alertmanager' matching 'alert' -> FATAL). With this fix, structured logs with a level field get their severity from the field, not from substrings of the body.

Implementation:
- Insert a new OTTL log_statements block between the existing JSON-parse block and the string-inference block.
- The block is gated on no producer-set severity, then runs a priority cascade over common level/severity field names from mainstream logging frameworks (pino, winston, zerolog, zap, logrus, slog, Serilog, GCP Cloud Logging, Elastic ECS): level / Level / LEVEL, then severity / Severity / SEVERITY, then log.level.
- severity_number is mapped via case-insensitive (?i) regex matches that mirror the existing string-inference keyword set. Unrecognized level values get severity_number defaulted to INFO (mirrors block 2's else).
- Final lowercase normalization block (ConvertCase) is unchanged.

Tests:
- Eight new bats smoke tests under smoke-tests/otel-collector covering lowercase level, PascalCase Level (Serilog), uppercase SEVERITY (GCP), flattened log.level (ECS), unknown level fallthrough, JSON without level (string inference still runs), producer-set severity precedence, and the exact Grafana/mimir-alertmanager customer payload from the linked support escalation.
- All 17 severity-inference + 3 auto-parse + 1 normalize-severity bats tests pass against a freshly-built collector image.
🦋 Changeset detected. Latest commit: 73889e0. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
E2E Test Results: ✅ All tests passed • 193 passed • 3 skipped • 1305s
Tests ran across 4 shards in parallel.
🔴 Tier 4: Critical. Touches auth, data models, config, tasks, OTel pipeline, ClickHouse, or CI/CD.
Review process: Deep review from a domain expert. Synchronous walkthrough may be required.
Deep Review
🟡 P2 -- recommended
🔵 P3 nitpicks (5)
Reviewers (6): correctness, testing, maintainability, project-standards, reliability, adversarial.
Summary
Fixes a customer-reported bug where structured JSON logs are tagged with the
wrong severity because the OTel collector's string-keyword inference picks up
incidental severity words inside the body. For example, a Grafana sidecar log
with body `{"level":"INFO", "msg":"... mimir-alertmanager-dashboard ..."}` gets
`SeverityText="fatal"`, `SeverityNumber=21` today because the inference regex
`\b(alert|crit|emerg|fatal|error|err|warn|notice|debug|dbug|trace)` matches
`alert` at the start of `alertmanager`. The JSON-derived `level: INFO` field is ignored entirely.
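
The false positive can be reproduced outside the collector. The following is a minimal Python sketch, not the collector's OTTL: it reuses the keyword set quoted above, assumes case-insensitive matching, and uses a hypothetical body (the `msg` text is invented for illustration, since the real payload is elided).

```python
import re

# Keyword set quoted in the summary. The leading \b anchors only the START
# of a word, so any word that merely begins with a keyword still matches.
# re.IGNORECASE is an assumption about how the collector applies the pattern.
KEYWORDS = re.compile(
    r"\b(alert|crit|emerg|fatal|error|err|warn|notice|debug|dbug|trace)",
    re.IGNORECASE,
)


def first_keyword(body: str):
    """Return the first severity keyword found in the raw body, or None."""
    match = KEYWORDS.search(body)
    return match.group(1).lower() if match else None


# Hypothetical body: only the level field and the dashboard name come from the PR.
body = '{"level":"INFO", "msg":"sidecar loaded mimir-alertmanager-dashboard"}'
print(first_keyword(body))  # alert
```

Because there is no trailing boundary, `alertmanager` is indistinguishable from a standalone `alert`, while a keyword in the middle of a word (for example `realert`) does not match.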
This PR adds a new OTTL `log_statements` block between the existing JSON-parse block and the string-inference block in `docker/otel-collector/config.yaml`. The new block promotes a JSON-derived level/severity field to `log.severity_text`, which causes the string-inference block to be skipped via its existing `severity_number == 0 and severity_text == ""` guard.

What it does
- The block reads from `log.attributes`, which is where Block 1 (JSON parse + `flatten()`, or OTel map body merge) puts the parsed fields. So the promotion fires only when a JSON body was actually parsed.
- It runs a priority cascade over the field names used by mainstream logging frameworks: `level` / `Level` / `LEVEL` (pino, winston, zerolog, zap, logrus, slog, Serilog, NLog), `severity` / `Severity` / `SEVERITY` (Datadog, GCP Cloud Logging), and `log.level` (Elastic ECS, flattened from nested JSON).
- Each `set` self-guards on `severity_text == ""`, so the first match wins (`level` > `severity` > `log.level`). The block as a whole is gated on no producer-set severity, so producer-supplied values are always preserved.
- The `severity_number` mapping uses the same case-insensitive `(?i)` regex keyword set as the existing string-inference block, so values like `"Information"`, `"WARN"`, `"Critical"` resolve correctly. Unrecognized values (e.g. `"verbose"`) fall through to INFO, the same default as Block 2.
- `severity_text` is normalized by the unchanged Block 3 (`ConvertCase(severity_text, "lower")`), so `"INFO"` becomes `"info"`.

Behavior matrix
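| Input | Result (`severity_text`, `severity_number`) |
| --- | --- |
| `{"level":"warn", ...}` | `("warn", 13)` |
| `{"Level":"Information", ...}` (Serilog) | `("information", 9)` |
| `{"SEVERITY":"ERROR", ...}` (GCP) | `("error", 17)` |
| `{"log":{"level":"fatal"}, ...}` (ECS, flattened) | `("fatal", 21)` |
| `{"level":"verbose", ...}` (unknown text) | `("verbose", 9)`, INFO default |
| `{"msg":"something error happened"}` (no level field) | `("error", 17)`, string inference still runs |
| Producer already set `severity_text` or `severity_number` | producer value preserved |

How to test on Vercel preview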
N/A — non-UI change. This modifies the OTel collector's OTTL config; no
frontend code is touched.
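
For readers who want to trace the promotion logic described under "What it does", here is a rough Python model of it. It is a sketch under stated assumptions, not the OTTL in `docker/otel-collector/config.yaml`: the function name `promote`, the skipping of non-string values, and the DEBUG/TRACE numbers are assumptions; only the field order, the INFO default, and the numbers 9, 13, 17, and 21 come from this PR.

```python
import re

# Field names in priority order, as listed in the PR description.
LEVEL_FIELDS = [
    "level", "Level", "LEVEL",
    "severity", "Severity", "SEVERITY",
    "log.level",
]

# Assumed grouping of the keyword set into severity numbers. The PR confirms
# 13 (warn), 17 (error), 21 (fatal) and 9 (INFO default); 5 and 1 follow the
# OpenTelemetry DEBUG and TRACE ranges and are an assumption here.
SEVERITY_PATTERNS = [
    (re.compile(r"(?i)(alert|crit|emerg|fatal)"), 21),  # FATAL
    (re.compile(r"(?i)(error|err)"), 17),               # ERROR
    (re.compile(r"(?i)(warn|notice)"), 13),             # WARN
    (re.compile(r"(?i)(debug|dbug)"), 5),               # DEBUG
    (re.compile(r"(?i)trace"), 1),                      # TRACE
]
INFO = 9


def promote(attributes, severity_text="", severity_number=0):
    """Return (severity_text, severity_number) after the promotion step."""
    # Gate: a producer-set severity is returned untouched by this step.
    if severity_number != 0 or severity_text != "":
        return severity_text, severity_number
    # First field present wins: level > severity > log.level.
    for name in LEVEL_FIELDS:
        value = attributes.get(name)
        if isinstance(value, str) and value != "":  # assumption: strings only
            severity_text = value
            break
    if severity_text == "":
        # No level-like field: leave both unset so string inference still runs.
        return "", 0
    for pattern, number in SEVERITY_PATTERNS:
        if pattern.search(severity_text):
            severity_number = number
            break
    else:
        severity_number = INFO  # unrecognized text defaults to INFO
    # Stand-in for the final ConvertCase(severity_text, "lower") block.
    return severity_text.lower(), severity_number


print(promote({"level": "warn"}))         # ('warn', 13)
print(promote({"Level": "Information"}))  # ('information', 9)
print(promote({"SEVERITY": "ERROR"}))     # ('error', 17)
print(promote({"log.level": "fatal"}))    # ('fatal', 21)
print(promote({"level": "verbose"}))      # ('verbose', 9)
```

A body with no level-like field, such as `{"msg": "something error happened"}`, comes back as `('', 0)` from this step, which is the state the downstream string-inference guard checks for.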
Testing
- `make ci-lint` and `make ci-unit` both green.
- Eight new bats smoke tests, with fixtures under `smoke-tests/otel-collector/data/severity-inference/from-json-*/`:
  - `from-json-level/`: lowercase `level`
  - `from-json-level-pascalcase/`: Serilog `Level: "Information"`
  - `from-json-severity-uppercase/`: GCP `SEVERITY: "ERROR"`
  - `from-json-ecs-log-level/`: Elastic ECS nested `log.level`
  - `from-json-level-unknown/`: unrecognized value falls back to INFO
  - `from-json-no-level/`: JSON body without level → string inference runs
  - `from-json-level-producer-wins/`: producer-set severity preserved
  - `from-json-level-body-keyword-conflict/`: the exact customer payload from the linked support escalation (body has `alertmanager-dashboard` plus `level: INFO`); confirms the customer scenario no longer reproduces.
- `severity-inference.bats`, `auto-parse-json.bats`, and `normalize-severity.bats` pass against a freshly-rebuilt collector image.
References