Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable turning on IndexWriter's infoStream #5891

Closed
mikemccand opened this issue Apr 21, 2014 · 8 comments

Comments

Projects
None yet
4 participants
@mikemccand
Copy link
Contributor

commented Apr 21, 2014

For diagnosing low level IndexWriter issues, it's sometimes useful to enable IndexWriter's infoStream output, but one cannot do this from Elasticsearch today.

@kimchy

This comment has been minimized.

Copy link
Member

commented Apr 21, 2014

++, I suggest this setting will be real time setting (changed using index update settings API). I think the interesting question would be into which file will it be logged? I think it should be a separate file, that is placed in the same logging dir location.

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Apr 21, 2014

I like the idea of having this be a real-time setting, so you can go into an existing index that's doing something strange and turn on IW.infoStream. The challenge is, this is not a real-time setting in IndexWriter today (you must close/reopen the writer), which is annoying.

I think we should just make our own delegating InfoStream impl that we pass to IW on init, so IW always logs messages to it, but then we tell it to turn on/off whether it sends those messages to a log file (I agree, it should be a separate file in the normal logging dir).

I'll give this a shot.

@kimchy

This comment has been minimized.

Copy link
Member

commented Apr 21, 2014

Wondering if a delegate InfoStream will not hurt perf, since the string construction to log it will happen now all the time within Lucene. Potentially, this setting can be realtime, but we will close and open the IndexWriter with it within our Engine (we can do that with some settings that require a fresh IW).

@rmuir

This comment has been minimized.

Copy link
Contributor

commented Apr 21, 2014

This can safely be a realtime setting.

There is actually always a infoStream set, it is just a NullInfoStream that just always returns false for isEnabled? All checks in IndexWriter look like this:

  if (infoStream.isEnabled("DWPT")) {
    infoStream.message("DWPT", "done abort");
  }

So its sorta like logger apis in that sense. You can safely have one that flips on and off at will:

@rmuir

This comment has been minimized.

Copy link
Contributor

commented Apr 21, 2014

About linking InfoStream to a logger, it should be pretty simple. We added this to solr with this class: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/LoggingInfoStream.java

However, there was more going on there (e.g. legacy settings for infostream and so on).

Maybe for ES, there should be no option for infostream at all? it should just be set, and it should check if trace or debug level logging is enabled?

Basically: the user would be unaware of infostream. They just crank up logging and see more stuff...

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Apr 21, 2014

I like that solution; this way the user doesn't need to learn another setting. This is just increased verbosity.

@kimchy

This comment has been minimized.

Copy link
Member

commented Apr 21, 2014

we could have it under a specific logging name, yea, and then its just a matter of enabling trace logging on it specifically. I missed the enhancement to InfoStream, nice!

mikemccand added a commit to mikemccand/elasticsearch that referenced this issue Apr 22, 2014

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2014

OK, I pushed an initial commit .... I found a LoggerInfoStream which
didn't seem to be used and whose intent seemed to be the same as this
issue (just for an older time when Lucene took PrintStream), so I
re-purposed it into a Lucene InfoStream.

I'm not sure about passing the indexSettings/shardId down, but I
want to make sure the shardId is in the log so we can separate out
the logs from multiple IndexWriters. Is there a cleaner way?

I also hit & "fixed" an unrelated NPE when I set logging to trace but
I'm unsure if it's a problem that newSearcher is null when topReader
was true? I put a nocommit.

I turned on trace logging in the tests (-Des.logger.level=TRACE) and
confirmed the IndexWriter's infoStream messages are appearing. I
still need to figure out how to make a test case that checks
it... maybe I need to set up a MockAppender...

mikemccand added a commit that referenced this issue Apr 24, 2014

Send Lucene's IndexWriter infoStream messages to Logger lucene.iw, le…
…vel=TRACE

Lucene's IndexWriter logs many low-level details to its infoStream
which can be helpful for diagnosing; with this change, if you enable
TRACE logging for the "lucene.iw" logger name then IndexWriter's
infoStream output will be captured and can later be scrutinized
e.g. using https://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/src/python/iwLogToGraphs.py
to generate graphs like http://people.apache.org/~mikemccand/lucenebench/iw.html

Closes #5891
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.