Add checksum option for index.shard.check_on_startup
#9183
The current "checkindex" on startup is very, very expensive. It is like running one of the old-school hard drive diagnostic checkers and is usually not a good idea. But we can do a CRC32 verification of files instead. We don't even need to open an IndexReader to do this, so it's much more lightweight. This patch also sets it as the default. I think that's a really good idea, but we don't have to do that here. No tests yet.
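A minimal sketch of the lightweight check described above, assuming each file's expected CRC32 is already known (for example, from stored per-file metadata). The class `FileChecksums` and its methods are hypothetical, not Elasticsearch or Lucene APIs. The sketch uses only `java.util.zip.CRC32` and reads raw bytes, without opening any index reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

/**
 * Hypothetical helper: streams a file through CRC32 and compares the result
 * to a checksum recorded elsewhere. No index reader is opened; only raw bytes
 * are read.
 */
final class FileChecksums {
    private FileChecksums() {}

    /** Computes the CRC32 of the whole file, reading it in fixed-size chunks. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Returns true if the file's CRC32 matches the expected (assumed known) value. */
    static boolean verify(Path file, long expected) throws IOException {
        return crc32(file) == expected;
    }
}
```

Because the file is streamed in fixed-size chunks, memory use stays constant regardless of file size; the cost is dominated by reading every byte once.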
Note: the other idea I had was to only do it by default when old segment files are present. This is more complicated, but maybe a better default.
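The alternative default could be expressed as a simple predicate. This is a hypothetical sketch: it assumes the major version that wrote each segment is available as an integer, which simplifies how segment versions are actually tracked. `StartupCheckPolicy` and `shouldChecksumByDefault` are illustrative names.

```java
import java.util.List;

/** Hypothetical sketch of "checksum by default only when old segments are present". */
final class StartupCheckPolicy {
    private StartupCheckPolicy() {}

    /**
     * Returns true if any segment was written by an older major version than
     * the one currently running, i.e. the upgrade case worth checking eagerly.
     */
    static boolean shouldChecksumByDefault(List<Integer> segmentMajorVersions, int currentMajor) {
        for (int major : segmentMajorVersions) {
            if (major < currentMajor) {
                return true;
            }
        }
        return false;
    }
}
```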
I like the idea a lot. We might even be able to keep it on by default?
LGTM. +1 to having it as the default.
Mainly what I want to catch here is the situation where there are problems on upgrade. Otherwise, problems may not be found until later, when they are harder to deal with. Note that I discovered some issues for the 1.x branch here (assuming the code is the same). We need to revisit the logic in #9142 and either also handle it in these Store verification methods, which are already used elsewhere in the code, or refactor it to handle it in a different place in the Store code (e.g. when we know the old checksum is useless, null it out in the metadata so nothing goes wrong).
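One way to read the "null it out in the metadata" suggestion, as a hypothetical sketch: if an unusable legacy checksum is replaced by null, every verification path can treat that file as "skip" instead of reporting a false mismatch. `FileMeta` and its `Result` values are illustrative names, not the real Store metadata classes.

```java
/**
 * Hypothetical per-file metadata: a null checksum means "unknown, do not verify".
 * Not Elasticsearch's real Store metadata class.
 */
final class FileMeta {
    enum Result { VERIFIED, MISMATCH, SKIPPED }

    final String name;
    final long length;
    final Long checksum; // null once the legacy checksum is known to be useless

    FileMeta(String name, long length, Long checksum) {
        this.name = name;
        this.length = length;
        this.checksum = checksum;
    }

    /** Returns a copy with the checksum nulled out, for files whose old checksum cannot be trusted. */
    FileMeta withoutChecksum() {
        return new FileMeta(name, length, null);
    }

    /** Compares a freshly computed CRC32 with the recorded value; files without one are skipped. */
    Result verify(long actualCrc32) {
        if (checksum == null) {
            return Result.SKIPPED;
        }
        return checksum == actualCrc32 ? Result.VERIFIED : Result.MISMATCH;
    }
}
```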
I think I found another existing bug? If you use this option today, it will leak a lock because it does not close CheckIndex.
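The usual fix for this kind of leak is to make sure the checker is always closed, even when the check throws. The following is a hypothetical sketch using try-with-resources. `LockingChecker` stands in for CheckIndex and holds an OS-level lock on an assumed `write.lock` file via `java.nio.channels.FileChannel`. It is not Lucene's implementation.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical checker that holds an exclusive lock on a lock file while it is open. */
final class LockingChecker implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    LockingChecker(Path lockFile) throws IOException {
        channel = FileChannel.open(lockFile, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock acquired = channel.tryLock();
        if (acquired == null) {
            // Another process holds the lock: release our channel before failing.
            channel.close();
            throw new IOException("lock already held: " + lockFile);
        }
        lock = acquired;
    }

    /** True while this checker still holds the lock. */
    boolean holdsLock() {
        return lock.isValid();
    }

    /** Releases the lock and the channel; try-with-resources guarantees this runs. */
    @Override
    public void close() throws IOException {
        try {
            lock.release();
        } finally {
            channel.close();
        }
    }
}
```

Without the try-with-resources block, opening a second checker on the same file in the same JVM would typically fail with `OverlappingFileLockException`, the in-process analogue of the leaked lock.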
Good catch! I fixed this in other places too :)
Regarding setting it by default, I would love to run a quick test on, let's say, a 200 GB index (on AWS, for example) and see how long the checksum checks take. Historically, ES used to do this (though it was more computationally expensive), and it slowed down full cluster restarts by hours :(. I didn't know back then whether the time was spent on computation or just on reading a lot of data; the cluster in question was quite big, almost a petabyte of data. So it would be good to have an idea of the cost.
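To frame that question, a rough lower bound on the check time is the data size divided by the slower of disk read bandwidth and CRC32 throughput. The bandwidth figure below is an assumed, illustrative value, not a measurement from this thread, and it also assumes CRC32 runs well above disk speed:

```latex
t \approx \frac{S}{\min(B_{\text{read}},\, B_{\text{crc}})},
\qquad
S = 200\,\text{GB},\;
B_{\text{read}} = 200\,\text{MB/s (assumed)},\;
B_{\text{crc}} \gg B_{\text{read}}
\;\Rightarrow\;
t \approx \frac{200\,000\,\text{MB}}{200\,\text{MB/s}} = 1000\,\text{s} \approx 17\,\text{min}
```

If a measured run lands close to the read-bound figure, the cost is I/O rather than computation, which is the distinction the earlier restart experience could not make.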
I don't want to argue about defaults; let's just turn it off. I at least need this option to improve testing.
@rmuir do me a favor and go ahead with
Yeah, I just want to turn it on in tests, especially the static backwards-compatibility index ones. Honestly, we don't even need to publicize or document the option yet; we can defer all of that.
Conflicts: src/main/java/org/elasticsearch/index/shard/IndexShard.java
I turned this off by default, but enabled it in tests. I think it's important that we take this step because, again, the existing code leaks write.lock, etc.
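A sketch of what "off by default, on in tests" could look like. It assumes the setting takes the string values `false` (off), `true` (the full CheckIndex pass), and `checksum` (the new CRC32 mode); the `StartupCheck` enum and its parsing are hypothetical and not Elasticsearch's actual settings code.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical model of the index.shard.check_on_startup setting. */
enum StartupCheck {
    NONE,      // "false": no check (the default after this change)
    CHECKSUM,  // "checksum": CRC32 verification of files, no IndexReader
    FULL;      // "true": the expensive CheckIndex pass

    static final String KEY = "index.shard.check_on_startup";

    /** Parses the setting, defaulting to "false" when it is absent. */
    static StartupCheck fromSettings(Map<String, String> settings) {
        String raw = settings.getOrDefault(KEY, "false").trim().toLowerCase(Locale.ROOT);
        switch (raw) {
            case "false":
                return NONE;
            case "checksum":
                return CHECKSUM;
            case "true":
                return FULL;
            default:
                throw new IllegalArgumentException("unknown value for " + KEY + ": " + raw);
        }
    }
}
```

A test harness would then copy its base settings and put `index.shard.check_on_startup` = `checksum`, leaving the production default untouched.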
Latest changes LGTM.