Slow recovery during rolling cluster restarts #21884
Comments
For the record, this is tracked on our side at https://phabricator.wikimedia.org/T145065
We have exactly the same issue.
Installed plugins:
@gehel do you have the option to repeat the experiment and debug it? Most specifically, I'm interested to see whether the synced flush in step 2 actually worked. It's a best-effort API and it reports its success in the response - do you check for that? Also, we can validate that it worked by looking at shard-level stats, but any indexing might invalidate it.
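For reference, a minimal sketch of that check (assuming a node reachable on localhost:9200 and the 2.x `POST /_flush/synced` response shape, which carries a cluster-wide `_shards` summary plus per-index results): trigger the synced flush and surface per-index failures instead of trusting the best-effort call.

```python
import requests

resp = requests.post("http://localhost:9200/_flush/synced").json()

shards = resp.get("_shards", {})
print(f"total={shards.get('total')} successful={shards.get('successful')} "
      f"failed={shards.get('failed')}")

for index, result in resp.items():
    if index == "_shards":
        continue  # skip the cluster-wide summary
    if result.get("failed", 0) > 0:
        # "failures" carries per-shard reasons such as "pending operations"
        print(index, result.get("failures"))
```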
@bleskes Yes, I can repeat the experiment. I did check the results of the synced flush and did not see any issue there. I can check again. I have a full cluster restart coming up this week; if you have a list of things to check during that restart, I'll be more than happy to collect any helpful data!
Here are a few things to try:
I'm digging into this and finding a few things. First, I do see failures to flush some shards because of "pending operations". I see 31 failures out of a total of ~3K indices. Unsurprisingly, some of the failures are on our high-traffic / large indices, but some are also on smaller indices. If I understand correctly, that probably means that, contrary to my belief, we still have some writes going on. I'll continue digging in this direction.
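A hedged way to confirm whether writes are still flowing (assuming the standard `_stats/indexing` API; the 30-second window is arbitrary): sample the per-index `index_total` counters twice and report any delta.

```python
import time
import requests

def index_totals():
    stats = requests.get("http://localhost:9200/_stats/indexing").json()
    return {name: data["total"]["indexing"]["index_total"]
            for name, data in stats["indices"].items()}

before = index_totals()
time.sleep(30)  # sampling window
after = index_totals()

for name in sorted(after):
    delta = after[name] - before.get(name, 0)
    if delta:
        print(f"{name}: {delta} indexing ops during the window")
```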
Digging further, I also see the shard `mgwiki_content_first[0]` being recovered from another node. That shard was flushed without failure, and the sync id was recorded at that point. By the time that shard was being recovered, the sync id had changed. I'm not really sure what all this means...
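A sketch of the sync id comparison (assuming shard-level commit stats expose `commit.user_data.sync_id`, as the synced-flush docs describe): list any shard whose copies disagree.

```python
import requests

stats = requests.get("http://localhost:9200/_stats/commit",
                     params={"level": "shards"}).json()

for index, data in stats["indices"].items():
    for shard, copies in data["shards"].items():
        # each entry in "copies" is one copy of the shard (primary or replica)
        sync_ids = {copy.get("commit", {}).get("user_data", {}).get("sync_id")
                    for copy in copies}
        if len(sync_ids) > 1:
            print(f"{index}[{shard}] has mismatched sync ids: {sync_ids}")
```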
Shards still go through recovery even when they were synced-flushed (we need to compare the sync id), but it's very quick.
This means that there was another synced flush and the shards got a new sync id. I'm going to close this for now. Please do reopen if things look wrong again.
We actually still had writes going on. We found the source. I'm going to do another try tomorrow and see if I get the same result.
With writes properly disabled, it works much better :) We are still having issues with one of our indices, but this needs some more investigation on our side. @bleskes thanks a lot for the help!
YW! Thanks for coming back to report.
Manual synced flush is worthless in some cases; see https://discuss.elastic.co/t/shard-sync-id-keeps-changing-in-read-only-cluster-es-5-5/110835
@larschri This is indeed a shortcoming of the current logic. The synced flush triggered by inactivity forces a new sync id while the node is offline, and thus when it comes back online its marker is different. This is not an issue with version 6.0, since the new ops-based recovery will kick in and recovery will be just as quick. It is, however, something that we should fix for 5.6. Can you please open a new issue?
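A hedged way to observe that shortcoming on 5.x (reusing the shard-level commit stats from the sketch above; the pause for the restart is manual): snapshot the sync ids before the restart and compare once the node rejoins.

```python
import requests

ES = "http://localhost:9200"

def sync_ids():
    stats = requests.get(f"{ES}/_stats/commit",
                         params={"level": "shards"}).json()
    # map (index, shard) -> sorted sync ids of all copies of that shard
    return {(index, shard): sorted(str(copy.get("commit", {})
                                           .get("user_data", {})
                                           .get("sync_id"))
                                   for copy in copies)
            for index, data in stats["indices"].items()
            for shard, copies in data["shards"].items()}

before = sync_ids()
input("Restart the node, wait for it to rejoin, then press Enter... ")
after = sync_ids()

changed = [key for key in before if after.get(key) != before[key]]
print(f"{len(changed)} shard(s) got a new sync id while the node was away")
```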
Thanks @larschri!
…On Fri, Dec 15, 2017, Lars Christian Jensen wrote:
@bleskes #27838
Elasticsearch version: 2.3.5
Plugins installed:
JVM version:
java version "1.7.0_121"
OpenJDK Runtime Environment (IcedTea 2.6.8) (7u121-2.6.8-1ubuntu0.14.04.1)
OpenJDK 64-Bit Server VM (build 24.121-b00, mixed mode)
OS version: Ubuntu Trusty (14.04)
Description of the problem including expected versus actual behavior:
When doing a rolling restart of our Elasticsearch cluster, it takes ~1h to get back to a green cluster state after restarting each node. We tried a few strategies, including:
We can see during recovery that all shards that were on the restarted node are restored from an active replica, causing a large amount of network traffic, which is throttled and takes time. With our two main Elasticsearch clusters at 24 and 31 nodes respectively, a full cluster restart takes ~24h.
With writes disabled and a synced flush performed, I would expect the shards to be recovered from local disk on the restarted node.
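For context, a minimal sketch of the per-node procedure we follow (the settings key and endpoints match the 2.x rolling-restart docs; the actual service restart is our own out-of-band tooling, shown only as a comment):

```python
import requests

ES = "http://localhost:9200"

def set_allocation(mode):
    # "none" pauses shard reallocation during the restart; "all" restores it
    requests.put(f"{ES}/_cluster/settings", json={
        "transient": {"cluster.routing.allocation.enable": mode}
    })

set_allocation("none")
requests.post(f"{ES}/_flush/synced")  # best effort: inspect the response!

# ... restart the Elasticsearch service on the node, out of band ...

set_allocation("all")
requests.get(f"{ES}/_cluster/health",
             params={"wait_for_status": "green", "timeout": "30m"})
```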
Steps to reproduce:
I'm at a loss as to which info might make sense to include here, but I'm more than happy to send anything that might be useful.