Recovery detects false corruption if legacy checksums are present for a new written segment #8587

Closed
s1monw opened this Issue Nov 21, 2014 · 1 comment

Contributor

s1monw commented Nov 21, 2014

The BWC tests ran into a failure this morning, caused by the verification of the old Adler32 checksums that we added recently.

http://build-us-00.elasticsearch.org/job/es_bwc_1x/5047/CHECK_BRANCH=tags%2Fv1.2.4,jdk=JDK7,label=bwc/

The problem here is the following:

  • The index was created with ES 1.2.4, which still records Adler32 checksums in the legacy checksum file.
  • We flush, but apparently the last segment _h didn't make it into the commit even though it was recorded in the checksums file.
  • We upgrade the node to 1.4.1-SNAPSHOT; the index opens just fine.
  • We apply the transaction log and IndexWriter starts writing a segment _h.
    • Note: we now have an Adler32 checksum for _h in the checksum file, but the files on disk are not the ones that were checksummed.
  • Since we have a replica we initiate a recovery, and in our Store.java code (line #638) we prefer the Adler32 checksum even though we could get the original checksum from Lucene.
  • On recovery we now compare the checksums and they obviously don't match, which in turn fails the primary 👎

I think the fix here is to prefer the new checksums whenever we know we have them, since they are taken from the file itself.
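The failure mode in the steps above can be reproduced in miniature with the JDK's own `Adler32` and `CRC32` classes. This is only a sketch under stated assumptions: the class name, the file name `_h.cfs`, the byte contents, and the base-36 string encoding are all hypothetical and are not taken from Store.java.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical illustration; none of these names come from Elasticsearch's Store.java.
class StaleLegacyChecksumDemo {

    // Renders a checksum as a base-36 string (an encoding chosen only for this sketch).
    static String digest(Checksum algo, byte[] content) {
        algo.reset();
        algo.update(content, 0, content.length);
        return Long.toString(algo.getValue(), Character.MAX_RADIX);
    }

    // Recovery-style check: does the advertised checksum describe the bytes actually on disk?
    static boolean verifies(String advertised, Checksum algo, byte[] onDisk) {
        return advertised.equals(digest(algo, onDisk));
    }

    public static void main(String[] args) {
        // Assumed contents: the old node wrote _h, then the upgraded node wrote a different _h.
        byte[] oldSegment = "segment _h as written by 1.2.4"
                .getBytes(StandardCharsets.UTF_8);
        byte[] newSegment = "segment _h as written by 1.2.4, then overwritten by 1.4.1-SNAPSHOT"
                .getBytes(StandardCharsets.UTF_8);

        // The legacy checksum file still holds the Adler32 of the old bytes.
        Map<String, String> legacyChecksums = new HashMap<>();
        legacyChecksums.put("_h.cfs", digest(new Adler32(), oldSegment));

        // A checksum derived from the file itself always matches the current bytes.
        String fromFile = digest(new CRC32(), newSegment);

        System.out.println("legacy entry verifies: "
                + verifies(legacyChecksums.get("_h.cfs"), new Adler32(), newSegment));
        System.out.println("file checksum verifies: "
                + verifies(fromFile, new CRC32(), newSegment));
    }
}
```

The stale legacy entry fails verification against the rewritten file while the checksum computed from the file passes, which is why preferring the legacy entry reports corruption that is not there.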

Contributor

s1monw commented Nov 21, 2014

Just as a side note: the code that has the problem was never released, so this does not affect 1.4.0.

s1monw added a commit to s1monw/elasticsearch that referenced this issue Nov 21, 2014

[STORE] Use Lucene checksums if segment version is >= 4.9.0
We started to use the Lucene CRC32 checksums instead of the legacy Adler32
in `v1.3.0`, which was the first version using Lucene `4.9.0`. We can safely
assume that if the segment was written with this version, the checksums
from Lucene can be used even if the legacy checksum file claims to have an Adler32
checksum for a given file / segment.

Closes #8587
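The rule described in this commit message can be sketched as a small version gate. This is a minimal illustration, not the code from the commit: the class, enum, and method names are hypothetical, and segment versions are assumed to be plain dotted strings such as `4.9.0`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the version gate; names are illustrative, not the actual Store.java code.
class ChecksumSourceChooser {

    enum Source { LUCENE_CRC32, LEGACY_ADLER32, NONE }

    // Compares dotted versions numerically, so "4.10.2" counts as newer than "4.9.0".
    static boolean onOrAfter(String version, String minimum) {
        String[] a = version.split("\\.");
        String[] b = minimum.split("\\.");
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return x > y;
            }
        }
        return true; // equal versions
    }

    // Segments written by Lucene >= 4.9.0 use the Lucene checksum,
    // even when the legacy file also lists an Adler32 entry for the same name.
    static Source choose(String segmentVersion, String fileName, Map<String, String> legacyChecksums) {
        if (segmentVersion != null && onOrAfter(segmentVersion, "4.9.0")) {
            return Source.LUCENE_CRC32;
        }
        return legacyChecksums.containsKey(fileName) ? Source.LEGACY_ADLER32 : Source.NONE;
    }

    public static void main(String[] args) {
        Map<String, String> legacy = new HashMap<>();
        legacy.put("_h.cfs", "stale-adler32"); // placeholder value for a leftover legacy entry
        System.out.println(choose("4.10.2", "_h.cfs", legacy));
        System.out.println(choose("4.8.1", "_h.cfs", legacy));
    }
}
```

Gating on the segment's version rather than on the presence of a legacy entry means a leftover Adler32 record can no longer shadow the checksum stored with the file.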

@s1monw s1monw closed this in b6b3382 Nov 21, 2014

@s1monw s1monw removed the v1.4.1 label Mar 17, 2015

@s1monw s1monw added the v1.4.5 label Mar 17, 2015

s1monw added a commit that referenced this issue Mar 17, 2015

[STORE] Use Lucene checksums if segment version is >= 4.9.0
We started to use the Lucene CRC32 checksums instead of the legacy Adler32
in `v1.3.0`, which was the first version using Lucene `4.9.0`. We can safely
assume that if the segment was written with this version, the checksums
from Lucene can be used even if the legacy checksum file claims to have an Adler32
checksum for a given file / segment.

Closes #8587

Conflicts:
	src/main/java/org/elasticsearch/index/store/Store.java

@s1monw s1monw added the resiliency label Mar 19, 2015

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

[STORE] Use Lucene checksums if segment version is >= 4.9.0