New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix in-memory logger #24616
Fix in-memory logger #24616
Conversation
Our Admin > Troubleshooting > Logs page broke, just showing a spinner and never showing the logs. Don't quite understand why this fixes it. Javadocs are https://logging.apache.org/log4j/2.x/log4j-api/apidocs/org/apache/logging/log4j/LogManager.html#getContext-boolean- ```clojure logger=> (log/warn "test") nil logger=> (count @messages*) 0 ;; no in-memory logs so page is empty ;; change `(LogManager/getContext true)` in the momoized ns-logger fn ;; and then observe: logger=> (log/warn "test") nil logger=> (count @messages*) 4 ``` Some explorations that might shine some light: ```clojure logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext false) (str *ns*)))) {} logger=> (into {} (.getAppenders (.getLogger (LogManager/getContext true) (str *ns*)))) {"metabase-appender" #object[metabase.logger.proxy$org.apache.logging.log4j.core.appender.AbstractAppender$ff19274a "0x4d680247" "metabase-appender"]} ``` So something is not hooked up quite right.
Codecov Report
@@ Coverage Diff @@
## master #24616 +/- ##
==========================================
- Coverage 65.29% 65.25% -0.04%
==========================================
Files 2788 2803 +15
Lines 85301 85326 +25
Branches 10530 10653 +123
==========================================
- Hits 55696 55682 -14
- Misses 25258 25266 +8
- Partials 4347 4378 +31
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably figure out how to write a test for this e.g. log a message and see if it comes back in the logs API endpoint. But since we didn't have one yet I think it's ok to do that in a follow-on PR.
Probably also worth opening an issue about the FE spinning forever. I'm guessing the API response comes back empty, but since there are no results the FE never hides the loading spinner? Seems like a bug. We should probably show the "No results" image or "there was a problem" image or something like that. IRL hopefully the endpoint is always going to be returning things, but if this is how it handles empty results I'm worried about its error handling in general. If the request times out or something does it show the spinner forever as well? How about a 5xx error response? We need to make that page more robust |
Tests added and FE issue created for error handling / empty logs cases. #24621 |
When in a namespace with a record, `Foo resolves to ns.Foo. But outside it resolves to ns/Foo. When running tests from the command line *ns* is user so it gets more complicated.
History: 39.2 we set `Multi-Release: true` in our manifest file and query speed drops like a stone. Jeff was able to track this to our logger calls in tight loops. We revert the multi-release and keep seeing the warning on startup > WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance. Benchmarking on 39.2 (bench (log/trace "hi")) -> 15383 ns So we freaked out and set up a memoizing logger factory (bench (log/trace "hi")) -> 141 ns What a win. But we never noticed that the default *logger-factory* being picked up was slf4j ( `(log.impl/name log/*logger-factory*)` -> "org.slf4j" ). On 39.2 if you set the factory to the log4j2 version you get back to a great number: `(bench (log/trace "hi"))` -> 25 ns And thus ensuring that our logger is the default log4j2 version is even faster than our memoizing logger-factory. Memoizing factory: 141 ns slf4j factory: 2269 ns log4j2 factory: 31 ns What does `(LogManager/getContext false)` mean versus using `true`? We only need and want a single context. But log4j2 by default uses a context selector called `ClassLoaderContextSelector`. We could put all of this behind us if we used a context selector type `BasicContextSelector` but this is surprisingly hard to do: you have to set a system property. And since all of this stuff gets initialized in static initializers, we don't have an opportunity to do this programmatically. The way around this is to require people to pass this system property on startup which is not acceptable. So getContext true checks for a threadlocal context in a specific static variable and falls back to a Default context. getContext false looks at classloaders and ends up at a different context. BUT: the log.impl version uses a closure over getContext false instead of getting it each time. And I suspect that when it does this there is only one so it is the default and continues to use this one. In our LoggerFactory implementation we were looking up the context each time. This still seems to work and everything is playing nice in our application classloader but its totally possible that our drivers are not hitting this. I'll have to investigate this. Verification: - build the uberjar locally (`bin/build`) - copy to some temp directory and also copy criterium.jar ```shell MB_JETTY_PORT=4000 java "$(socket-repl 4001)" -cp locally-built.jar:criterium.jar metabase.core ``` ```clojure /tmp/locally-built via ☕ v17.30 on ☁️ metabase-query ❯ nc localhost 4001 user=> (doto 'metabase.logger require in-ns) metabase.logger metabase.logger=> (require '[criterium.core :refer [bench]]) nil metabase.logger=> (bench (log/trace "hi")) Evaluation count : 1686535500 in 60 samples of 28108925 calls. Execution time mean : 22.487972 ns Execution time std-deviation : 0.101004 ns Execution time lower quantile : 22.326806 ns ( 2.5%) Execution time upper quantile : 22.648368 ns (97.5%) Overhead used : 6.924761 ns nil metabase.logger=> (count (messages)) 358 metabase.logger=> ``` Verifies that we are on the order of 22 nanoseconds and the in-memory logger has messages in it.
New entrypoint for the application: metabase.bootstrap sets two properties for logging (context selector, log4j2 factory) and ensures those properties are set before any logging code is loaded.
deps.edn
Outdated
amalloy/ring-gzip-middleware {:mvn/version "0.1.4"} ; Ring middleware to GZIP responses if client can handle it | ||
bigml/histogram {:mvn/version "4.1.4"} ; Histogram data structure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pretty big indentation diff here -- I'm worried this will cause merge conflicts in other PRs that bump deps or whatever. Maybe worth reverting it
We do have a comment-column
declaration here
Line 1 in 4f09fa3
;; -*- comment-column: 80; -*- |
so not sure why it did this to you 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woah. inadvertent. good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM but I humbly request you revert those comment indentation changes in deps.edn
to reduce unnecessary code churn
This reverts commit 4f09fa3.
Our Admin > Troubleshooting > Logs page broke, just showing a spinner
and never showing the logs.
Remove our memoizing logger
For 39.2 we made our uberjar multi-release but we had to revert due to degraded query performance. Jeff tracked this down to a huge decrease in speed of logging, even noops like
(log/trace "info")
.To re-institute the multi-release property, we've attempted to increase the speed of this logging by memoizing the lookup from namespace to logger.
Summary from
But we didn't notice one critical issue when doing this: it was using the
(log.impl/slf4j-factory)
which would delegate to log4j2.But if we just set the logger factory as the log4j2 factor we're back to native speeds:
One issue still remained: the difference between
(LogManager/getContext true)
vs when called withfalse
. Log4j2 by default uses aClassLoaderContextSelector
which can handle multiple logging contexts, usually for web apps in the same jvm. (see logging separation docs). We can change this with a system property to aBasicContextSelector
where there is only ever one context. But to do this we need the system property set before the logging static initializers are run. Thusmetabase.bootstrap
.EDIT: no need to backport this since release-x.44.x doesn't have the new logger (related to multi-release and upgrading graal/js dep)