Fix a very rare case of corruption in compression used for internal cluster communication. #7210
Conversation
*/
public class CorruptedCompressorTests extends ElasticsearchTestCase {

    public void testCorruption() throws IOException {
missing @Test annotation
@Test doesn't do anything :)
I know if the method starts with test we are good anyway, but why do we use the annotation all over the place then? :) Either we remove it everywhere or we stick to it, I'd say...
I don't use @Test unless someone makes me, for the exact reason Robert pointed out. It is just extra characters with no benefit.
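A minimal sketch of the convention under discussion (class and method names here are illustrative, not from the PR): under a runner that honors JUnit 3-style naming, any public void method whose name starts with test is picked up, so the annotation adds nothing.

```java
import org.junit.Test;

public class NamingConventionTests extends ElasticsearchTestCase {

    // Discovered as a test because the method name starts with "test".
    public void testWithoutAnnotation() {
        assertEquals(4, 2 + 2);
    }

    // Also discovered; the annotation is harmless but redundant here.
    @Test
    public void testWithAnnotation() {
        assertEquals(9, 3 * 3);
    }
}
```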
Please disable unsafe encode/decode completely.
* This is a fork of {@link com.ning.compress.lzf.impl.VanillaChunkEncoder} to quickly fix
* an extremely rare bug. See CorruptedCompressorTests for details on reproducing the bug.
*/
public class ElasticsearchChunkEncoder extends VanillaChunkEncoder {
Historically, we were using an "X" prefix to designate temporary implementations like this one. So, a more traditional name would be XVanillaChunkEncoder. For example, 2edde35.
Ok, changed to XVanillaChunkEncoder.
Ok, I think I addressed all the comments. The only unchanged thing is the license file, because I don't know which license to put in there (the original file had no license header).
The PR to the compress-lzf project was merged, and a 1.0.2 release was made. I removed the X encoder and made the upgrade to 1.0.2.
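For reference, the dependency bump in pom.xml would presumably look like the following (com.ning:compress-lzf are the library's actual Maven coordinates; the surrounding POM context is assumed):

```xml
<dependency>
    <groupId>com.ning</groupId>
    <artifactId>compress-lzf</artifactId>
    <version>1.0.2</version>
</dependency>
```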
@@ -1381,6 +1381,7 @@
<!-- t-digest -->
<exclude>src/main/java/org/elasticsearch/search/aggregations/metrics/percentiles/tdigest/TDigestState.java</exclude>
<exclude>src/test/java/org/elasticsearch/search/aggregations/metrics/GroupTree.java</exclude>
<exclude>src/test/java/org/elasticsearch/common/compress/lzf/XVanillaChunkEncoder.java</exclude>
do we still need this with 1.0.2?
Good catch. Removed.
Fix a very rare case of corruption in compression used for internal cluster communication. See CorruptedCompressorTests for details on how this bug can be hit. This change also removes the ability to use the unsafe variant of ChunkedEncoder, removing support for the compress.lzf.decoder setting.
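As a rough sketch of what the unsafe variant means here: compress-lzf can hand back either a pure-Java decoder or one backed by sun.misc.Unsafe, and after this change Elasticsearch no longer consults compress.lzf.decoder to pick between them. The snippet below only illustrates the library's two factory methods; it is not code from this PR.

```java
import com.ning.compress.lzf.ChunkDecoder;
import com.ning.compress.lzf.LZFEncoder;
import com.ning.compress.lzf.util.ChunkDecoderFactory;

public class DecoderVariants {
    public static void main(String[] args) throws Exception {
        byte[] compressed = LZFEncoder.encode("hello lzf".getBytes("UTF-8"));

        // May return an Unsafe-backed decoder where the platform supports it.
        ChunkDecoder optimal = ChunkDecoderFactory.optimalInstance();

        // Always the pure-Java implementation.
        ChunkDecoder safe = ChunkDecoderFactory.safeInstance();

        System.out.println(new String(safe.decode(compressed), "UTF-8"));
        System.out.println(new String(optimal.decode(compressed), "UTF-8"));
    }
}
```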
looks good, thanks Ryan.
+1 as well
Thanks. Pushed.
The compression bug fixed in elastic#7210 can still strike us since we are running BWC tests against these versions. This commit disables compression forcefully if the compatibility version is < 1.3.2 to prevent debugging already-known issues.
This commit forces a full recovery if the source node is < 1.4.0 and prevents any recoveries from pre-1.3.2 nodes if compression is enabled, to work around elastic#7210. Closes elastic#9922
This commit forces a full recovery if the source node is < 1.4.0 and prevents any recoveries from pre-1.3.2 nodes if compression is enabled, to work around elastic#7210. Closes elastic#9922 Conflicts: src/main/java/org/elasticsearch/indices/recovery/RecoveryTarget.java
Upgrading from 1.1.1 to 1.6.0 and noticing this output from our cluster
@taf2 Turn off compression before upgrading.
@rjernst thanks! which kind of compression do we disable... is it this option in /etc/elasticsearch/elasticsearch.yml, or another option?
okay sorry, it looks like we need to disable indices.recovery.compress - but is this something that needs to be disabled on all nodes in the cluster, or just the new 1.6.0 node we're starting up now?
All nodes in the cluster, before starting the upgrade. The problem is that old nodes with this setting enabled would use the old buggy code, which can then cause data copied between an old and a new node to become corrupted.
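For anyone landing here later, a hedged example of what disabling the setting cluster-wide might look like via the update settings API on 1.x (the host and the use of a transient setting are assumptions, not taken from this thread):

```
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "indices.recovery.compress": false
  }
}'
```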
excellent, thank you - we have run the following on the existing cluster:
Thank you, that did the trick!
See CorruptedCompressorTests for details on how this bug can be hit.
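As an illustration of the kind of check CorruptedCompressorTests performs, here is a minimal round-trip sketch against compress-lzf; the input pattern and sizes are invented for the example, while the real test constructs a specific byte sequence that drives the encoder into the buggy code path.

```java
import java.io.IOException;
import java.util.Arrays;

import com.ning.compress.lzf.LZFDecoder;
import com.ning.compress.lzf.LZFEncoder;

public class RoundTripSketch {
    public static void main(String[] args) throws IOException {
        // Repetitive input: LZF back-references get heavy use on data like
        // this, which is where an encoder bug would surface as corruption.
        byte[] original = new byte[1 << 17];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i % 16);
        }

        byte[] compressed = LZFEncoder.encode(original);
        byte[] restored = LZFDecoder.decode(compressed);

        // With a correct encoder the round trip must be lossless.
        if (!Arrays.equals(original, restored)) {
            throw new AssertionError("LZF round trip corrupted the data");
        }
        System.out.println("round trip OK, " + compressed.length + " compressed bytes");
    }
}
```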