Scroll SearchContextMissingException and NPE in ES logs #25820
Comments
This error looks significant. It's saying there was an attempt to retrieve the JSON of a matching doc but the index being hit has specifically disabled the storage of the
No we definitely do not have
Having a hard time recreating this. |
There might be a related issue in https://discuss.elastic.co/t/nullpointerexception-in-scroll/98717 |
This has occurred again a couple of times in the last few days. I noticed a different NPE in the ES logs:
This seems to be a different issue. In the first set of stacktraces, it looked like the search context was disappearing. In this stack trace we are dealing with nulls in the SearchHit array. I can come up with some reasonable explanation for how a search context could disappear (some sort of race condition between context cleanup and use). However, the SearchHits array here is a local variable and I don't really see any code paths that could lead to having nulls in it.
@qwerty4030 do you still experience this issue? It looks like we are still not able to reproduce. |
@cbuescher Unfortunately I don't have access to this cluster anymore. I was hoping the stacktraces would be enough to figure it out. Hopefully it's fixed in a later version!
@cbuescher I am the lead dev on this cluster now and we are still seeing this error. Over the last 30 days we have 81 occurrences of this exception. /cc @qwerty4030 |
The two main reasons for this exception seem to be using the original scroll id for all scroll requests, and/or timing out of the search context. |
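To illustrate the first cause, here is a minimal sketch of a scroll loop that always passes the scroll id from the most recent response rather than the one from the initial search. It relies only on the `_scroll_id` and `hits.hits` fields of the response. The `scroll_all` helper and the `FakeScroll` class are hypothetical names invented for this example. `FakeScroll` is an in-memory stand-in, not a real client, and it is deliberately stricter than a real cluster: it rejects every id except the latest one. Real `search` and `scroll` callables would also pass the keep-alive (for example 10 minutes) on every request, which is not modelled here.

```python
from typing import Callable, Dict, Iterator, List


def scroll_all(search: Callable[[], Dict],
               scroll: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every hit, always continuing from the latest scroll id."""
    page = search()
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Take the id from the response just received, not from the first search.
        page = scroll(page["_scroll_id"])


class FakeScroll:
    """Hypothetical in-memory stand-in for a cluster (assumption for this sketch).

    It issues a new scroll id with every page and raises if a stale id is used.
    """

    def __init__(self, docs: List[Dict], size: int) -> None:
        self.docs = docs
        self.size = size
        self.pos = 0
        self.current_id = None

    def _page(self) -> Dict:
        hits = self.docs[self.pos:self.pos + self.size]
        self.pos += len(hits)
        self.current_id = f"scroll-{self.pos}"
        return {"_scroll_id": self.current_id, "hits": {"hits": hits}}

    def search(self) -> Dict:
        return self._page()

    def scroll(self, scroll_id: str) -> Dict:
        if scroll_id != self.current_id:
            raise LookupError(f"No search context found for id [{scroll_id}]")
        return self._page()


if __name__ == "__main__":
    fake = FakeScroll([{"_id": str(i)} for i in range(5)], size=2)
    for hit in scroll_all(fake.search, fake.scroll):
        print(hit["_id"])
```

Reusing the id from the first response in this stand-in raises `LookupError` on the second scroll call. The loop above avoids that by reading `_scroll_id` from each page before requesting the next one.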
@cbuescher I'll start putting that information together. One thing that I noticed while researching this error this morning is that at the time the error occurred we have a gap in the monitoring graphs. I've attached two screenshots that show this. I've only checked 2 of the occurrences so far, but both exhibit the same gap in the graphs. The gap appears on the primary master node as well as on at least one of the data nodes. These images are for the primary master node.
No further feedback received. @talanb / @qwerty4030 if you have the requested |
Elasticsearch version: 5.3.3
Plugins installed: [discovery-ec2, repository-s3, x-pack]
JVM version : 1.8.0_131 Java HotSpot(TM) 64-Bit Server VM
OS version : 3.13.0-121-generic AWS Ubuntu 14.04 LTS
Description of the problem including expected versus actual behavior:
A scroll request in a daily scheduled job sometimes fails due to cause [SearchContextMissingException[No search context found for id [162106278]]. I'm pretty confident that our code is correct: we use the scroll id from the previous request and the scroll timeout is set to 10 minutes. This error only occurs sometimes, within a few minutes of starting the scroll (size is 1000), when about 20k docs have been processed. Around the time of the error I noticed an NPE, IAE, and GC warnings in the ES logs. The index has 10 shards and 1 replica with 10 data nodes. Also around this time we experience a 60% increase in search latency (looking at Kibana monitoring). Unable to reproduce, but this error has occurred 3 times in the last 7 days (the job runs once per day).
Provide logs (if relevant):
Exception from our code:
Exceptions in ES logs (about 15s before, same logs for many/most of the other nodes):