New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable ingest-attachment logging #93878
Disable ingest-attachment logging #93878
Conversation
Since we're not using that library anymore.
https://github.com/qos-ch/slf4j/tree/master/jcl-over-slf4j is where the implementation lives, there's a LICENSE there but no notice, nor do I see a notice in the directory above that.
https://github.com/qos-ch/slf4j/tree/master/slf4j-nop is where the implementation lives, there's a LICENSE there but no notice, nor do I see a notice in the directory above that.
Otherwise the LLRC brings it in and we end up in jar-hell.
It won't work anyway because we're in a limited context where log4j and other logging libraries don't have the permissions necessary to log successfully.
Pinging @elastic/es-data-management (Team:Data Management) |
Hi @joegallo, I've created a changelog YAML for you. |
I did a bit of |
Note to self that we need to use the edit: Success! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@masseyke and I had a brief discussion of this PR offline. I'd had two questions he answered.
Q: Why have modules/ingest-attachment/src/main/config/log4j2.properties
set all these classes to off
if you're just using a no-op logger?
A: Because that config file is for log4j, and the no-op logger is just for slf4j. We aren't using a bridge to route slf4j to log4j or vice-versa.
Q: Why not bridge all log APIs to a single no-op log impl?
A: Eventually we might bridge SLF4j apis to the Log4j impl, but not until after the TikaImpl permissions are fixed. But even then, we'll likely still leave these specific classes on "off" logging, since their output is next to useless, and just adds noise. This change would be valuable even without the performance implications.
I thought I'd throw in my notes from talking with @seanstory because I probably won't remember this in 6 months: Before this PR:
With this PR:
In the future after we have fixed the #93714 and changed slf4j to use the log4j bridge instead of the no-op implementation:
Note that even in the future state, we plan to have most tika logging turned off in log4j2.properties because it has not been very useful in the context of the ingest attachment processor. But a user could turn it back on using the cluster settings API (or editing the file directly if they have access). Also note that this problem is exacerbated by log rollover (since it requires log4j to have additional permissions, and it slows things down a little, and it causes more stack traces). We were able to make things fail spectacularly before this PR by setting the rollover log size to something much lower than the default. |
Note for future github archaeologists: diff --git a/distribution/src/config/log4j2.properties b/distribution/src/config/log4j2.properties
index 36b5b03d9a1..bfdd7e215b0 100644
--- a/distribution/src/config/log4j2.properties
+++ b/distribution/src/config/log4j2.properties
@@ -18,7 +18,7 @@ appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.time.modulate = true
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
-appender.rolling.policies.size.size = 128MB
+appender.rolling.policies.size.size = 1KB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.fileIndex = nomax
appender.rolling.strategy.action.type = Delete ^ this did the trick nicely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Keith Massey <keith.massey@elastic.co>
Co-authored-by: Keith Massey <keith.massey@elastic.co>
currently when Elasticsearch starts up there are warnings on the console complaining about SLF4J providers not found. this is emitted twice 1 - when repository-azure module is loaded 2 - when xpack-security (depending on transport-netty) module is loaded This commits adds a slf4j-nop to fix that warning related to #93714 and #93878
Closely related to #91964 and #93608
We run the actual text extraction part of the ingest-attachment module inside a highly restricted environment, in order to protect ourselves from security bugs in dependencies:
elasticsearch/modules/ingest-attachment/src/main/java/org/elasticsearch/ingest/attachment/TikaImpl.java
Lines 96 to 99 in e82571f
That environment is so locked down that logging from inside of it won't work reliably and correctly. Additionally, our logging there is an enormous mess. Some dependencies are using log4j directly (like Elasticsearch actually already does), some are using slf4j or commons-logging (both entirely unconfigured).
Additionally, because of the way logging fails, it can end up repeatedly throwing exceptions internally while accomplishing very little (at great cost of time). Fixing or working around the logging issues will significantly improve the performance of text extraction.
This PR does not solve the problem of the limited permissions (that's #93714), rather it silences all logging during text extraction. It arranges our dependencies better, though, so that once we fix log4j for ourselves, we can use that here, too, and reroute slf4j logging to use log4j as well.
As a matter of inside baseball, we'd like an ordinary code review from within the @elastic/es-data-management team, but we're also going to get a 10,000 ft level review from @seanstory as well as a "no objections" review from @rjernst.