-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Fix log level detection #12651
fix: Fix log level detection #12651
Conversation
pkg/distributor/distributor.go
Outdated
@@ -904,42 +904,46 @@ func extractLogLevelFromLogLine(log string) string { | |||
} | |||
|
|||
if firstNonSpaceChar == '{' && lastNonSpaceChar == '}' { | |||
if strings.Contains(log, `:"err"`) || strings.Contains(log, `:"ERR"`) || | |||
strings.Contains(log, `:"error"`) || strings.Contains(log, `:"ERROR"`) { | |||
levelIndex := strings.Index(log, `"level":`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
problem is that it could be Level, LEVEL, serverity, Serverity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i've mentioned it in the caveats in notes.. can we impose this in our documentation? or make this string configurable (case sensetive)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we pass a hint to this function? for example, I think we could change this code to something like the following
firstEntry := stream.Entries[0]
firtsStructuredMetadata := logproto.FromLabelAdaptersToLabels(firstEntry.StructuredMetadata)
prevTs := firstEntry.Timestamp
addLogLevel := validationContext.allowStructuredMetadata &&
validationContext.discoverLogLevels &&
!lbs.Has(labelLevel)
logLevelField := detectLogLevelFieldFromLogEntry(firstEntry, firtsStructuredMetadata)
this would allow detectLogLEvelFieldFromLogEntry
to do a case insensitive search for a bunch of different options (ie. level
, severity
, etc) since it was only doing it once for the whole stream per push. We could then pass that as a hint to detectLogLevelFromLogEntry
, allowing it to short-circuit when passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea, I think we should may be support different field name ?
almost wonder if we should use some fast json and logfmt parser really to grab the key !
I found an old issue, a comment on which claims that 3 letter abbreviations for levels (dbg, wrn) are common as well (i haven't seen them yet). I've added support for them too in this PR. Also, supports "level" or "LEVEL" and updated the documentation. @cyriltovena If everything else looks okay, we can merge this fix first? To support different labels such as severity including any other cases of ``level`, I think it should be a part of configuration. We can add support for that in a separate PR. |
this is a good idea, i'll try and benchmark this. My hunch is that it's going to be much slower than the current bruteforce way... |
don't know about that I think it can be quite good. I'm specially worried for structured log. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about trying to detect the level field once per stream per push rather than on every line?
pkg/distributor/distributor.go
Outdated
@@ -904,42 +904,46 @@ func extractLogLevelFromLogLine(log string) string { | |||
} | |||
|
|||
if firstNonSpaceChar == '{' && lastNonSpaceChar == '}' { | |||
if strings.Contains(log, `:"err"`) || strings.Contains(log, `:"ERR"`) || | |||
strings.Contains(log, `:"error"`) || strings.Contains(log, `:"ERROR"`) { | |||
levelIndex := strings.Index(log, `"level":`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if we pass a hint to this function? for example, I think we could change this code to something like the following
firstEntry := stream.Entries[0]
firtsStructuredMetadata := logproto.FromLabelAdaptersToLabels(firstEntry.StructuredMetadata)
prevTs := firstEntry.Timestamp
addLogLevel := validationContext.allowStructuredMetadata &&
validationContext.discoverLogLevels &&
!lbs.Has(labelLevel)
logLevelField := detectLogLevelFieldFromLogEntry(firstEntry, firtsStructuredMetadata)
this would allow detectLogLEvelFieldFromLogEntry
to do a case insensitive search for a bunch of different options (ie. level
, severity
, etc) since it was only doing it once for the whole stream per push. We could then pass that as a hint to detectLogLevelFromLogEntry
, allowing it to short-circuit when passed.
Actually think it's fine for doing this on every line, I suggest though we make a custom image that runs the detection without applying the change, to test in prod and see the impact. |
I'm spending some time today using json-parser and logfmt parser and benchmark. |
is it entirely safe to assume that the label for "LEVEL" will be exactly the same in every line within a stream? |
Turns out parsing is ~3x slower than the brute force way. (using our inbuilt parser implementations from logql_log) 1st run is using no parsers (original implementation with bugfix). 2nd run makes an attempt to parse using json and logfmt |
What this PR does / why we need it:
Currently detection of log levels is not truly case insensitive. For eg. "warn" or "WARN" is detected, however "Warn" is not. The implementation uses string.Contains over the entire log line to check for these levels.
However, checking ignoring case would need us to toLower the entire log line which is inefficient. This PR detects the
level
first ->"level":
in case of JSON andlevel=
in case of logfmt and then compares the next 12 characters after that ignoring case.The benchmarks show similar result on my computer with the fixed version at times few 100 ns faster.
implementation from main branch:
implementation with Fix (This PR):
The PR also supports log level TRACE and FATAL and even 3 letter abbreviations that might be common.
Edit:
Added more improvements after some discussions-
Few caveats:
- "level" should still be lowercase or block case ("LEVEL"). Documentation for configuration is updated. This does not support labels like Severity yet.Which issue(s) this PR fixes:
Ref: #12645
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)docs/sources/setup/upgrade/_index.md
production/helm/loki/Chart.yaml
and updateproduction/helm/loki/CHANGELOG.md
andproduction/helm/loki/README.md
. Example PRdeprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR