New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resiliency: Throw exception if the JVM will corrupt data. #7580

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
8 participants
@rmuir
Contributor

rmuir commented Sep 4, 2014

Detect the worst-offender, known hotspot versions that can
cause index corruption, and fail on startup.

[Bootstrap] Throw exception if the JVM will corrupt data.
Detect the worst-offender, known hotspot versions that can
cause index corruption, and fail on startup.

@rmuir rmuir added v1.4.0 labels Sep 4, 2014

@@ -195,11 +198,19 @@ public static void main(String[] args) {
}
String stage = "Initialization";
try {
try {

This comment has been minimized.

@dadoonet

dadoonet Sep 4, 2014

Member

Not needed change :)

This comment has been minimized.

@rmuir

rmuir Sep 4, 2014

Contributor

I think these changes are unavoidable and auto-created by the eclipse configuration in elasticsearch :(

));
private static void checkJVM() {

This comment has been minimized.

@s1monw

s1monw Sep 4, 2014

Contributor

can we also call this from a test baseclass - I don't want our tests to run on a broken JVM?!

if (!foreground) {
Loggers.disableConsoleLogging();
System.out.close();
}
// fail if using broken version
if (Boolean.parseBoolean(System.getProperty("elasticsearch.bypass.vm.check"))) {

This comment has been minimized.

@s1monw

s1monw Sep 4, 2014

Contributor

I think we should make this a constant. For consistentcy we should use the es. prefix rather than elasticsearch.

@s1monw

This comment has been minimized.

Contributor

s1monw commented Sep 4, 2014

I left two comments - looks good in general

@spinscale

This comment has been minimized.

Member

spinscale commented Sep 4, 2014

"Please upgrade" is the last message the user gets in the exception, without having any clue where to upgrade to and which other versions are affected (maybe even the latest one is).

Can we point the user somewhere, where he gets help instead of being unspecific here? A documentation page which contains unsafe versions we could link to or something?

@dadoonet

This comment has been minimized.

Member

dadoonet commented Sep 4, 2014

+1 for @spinscale comment. I really like when projects log errors with links to documentation.

@rmuir

This comment has been minimized.

Contributor

rmuir commented Sep 4, 2014

Who will make and maintain such a page and ensure it doesnt go 404?

@clintongormley

This comment has been minimized.

Member

clintongormley commented Sep 4, 2014

@rmuir the docs build process checks all links within the docs before pushing any changes live. it also has hooks for custom checks, eg we also check the links in the config.yml file to ensure that they are still correct. We could add a custom hook for this page too.

@rmuir

This comment has been minimized.

Contributor

rmuir commented Sep 4, 2014

I'm not really a webpage designer. I think this is blowing up the scope of the issue with unnecessary requirements.

@rmuir rmuir closed this Sep 4, 2014

@dadoonet

This comment has been minimized.

Member

dadoonet commented Sep 4, 2014

I think we should think of something somehow generic. Such as what Twitter4J does.
For each error you get, you have an associated Error code which is unique.

The log says:

Relevant discussions can be found on the Internet at:
    http://www.google.co.jp/search?q=93f2523c or
    http://www.google.co.jp/search?q=8b74a4e3
TwitterException{exceptionCode=[93f2523c-8b74a4e3], statusCode=401, message=Invalid or expired token, code=89, retryAfter=-1, rateLimitStatus=null, version=3.0.3}

I think we should think about something similar but in another issue as we somehow hijacked this one :)

@clintongormley

This comment has been minimized.

Member

clintongormley commented Sep 4, 2014

Well, we need to document good versions anyway. It doesn't require any design, just a page of docs. This is versioned along with the code. You tell me what info needs to be on the page and I'll write it.

@rmuir

This comment has been minimized.

Contributor

rmuir commented Sep 4, 2014

Crashing with an empty SIGSEGV is better than today. It prevents data corruption. We don't even need an error message at all, a webpage is over the top. Its stupid to block this issue on such things.

@clintongormley

This comment has been minimized.

Member

clintongormley commented Sep 4, 2014

Rather than adding a link to the error message, which may well change in the future, let's just add "Please see the documentation" to the error message. Searching for "java version" or "jvm version" will bring up the docs section added in #7591

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Sep 4, 2014

@clintongormley watch out for how personalized google can be - that probably would work for you or me but it might not work for someone who's searched for elasticsearch fewer times. Besides, if you move the page it should get a redirect. Or you could go all out and point to a version on the internet archive or something....

@clintongormley

This comment has been minimized.

Member

clintongormley commented Sep 4, 2014

@nik9000 i meant the search in the ES docs :)

won't work now, but will when #7591 is merged

@nik9000

This comment has been minimized.

Contributor

nik9000 commented Sep 4, 2014

@nik9000 i meant the search in the ES docs :)

Ah! Cool. Can you link to a page that searches the Elasticsearch docs? Something like [this](http://www.elasticsearch.org/guide/?s=java version)?

@s1monw s1monw removed the review label Sep 5, 2014

@s1monw

This comment has been minimized.

Contributor

s1monw commented Sep 5, 2014

@rmuir I left some comments - I think it's close though!

@clintongormley clintongormley changed the title from [Bootstrap] Throw exception if the JVM will corrupt data. to Resiliency: Throw exception if the JVM will corrupt data. Sep 8, 2014

@s1monw

This comment has been minimized.

Contributor

s1monw commented Sep 10, 2014

@rmuir can we move forward here it is marked as a blocker an I'd really like to move forward here

@rmuir

This comment has been minimized.

Contributor

rmuir commented Sep 10, 2014

I don't think the change will make it in. Please understand my point of view:

I didn't just take a few minutes and whip up a quick patch here, i tested extensively with all impacted hotspot versions (both Oracle JDK and openjdk/IcedTea), including testing the workaround bypass, and testing that "good versions" are ok, etc. This took me a good bit of time.

This is an exceptionally dangerous piece of functionality. With all the scope creep of this issue ("extra features" and "nice to haves" etc), comes the need to retest. There is no way I would touch this logic in a functional way, such as renaming parameters, without retesting.

So we should decide what is really a priority, and what can be a followup issue.

Currently I am fighting other fires, and I just dont have enough time to do it right, sorry.

@s1monw s1monw added v1.5.0 and removed v1.4.0.Beta1 labels Sep 11, 2014

@s1monw

This comment has been minimized.

Contributor

s1monw commented Sep 11, 2014

moved out to 1.5

@clintongormley clintongormley removed the blocker label Sep 26, 2014

@s1monw s1monw added v1.6.0 and removed v1.5.0 labels Mar 17, 2015

@s1monw

This comment has been minimized.

Contributor

s1monw commented Mar 20, 2015

@rmuir @rjernst should we revisit this or should I just close it for now?

@s1monw

This comment has been minimized.

Contributor

s1monw commented Mar 20, 2015

I really think we should use what we have and expand in other issues. This PR is a good improvement and it's been sitting here for way too long. @rmuir can you rebase this branch and add maybe missing JVMs? all the other requests can be added later.

@rjernst

This comment has been minimized.

Member

rjernst commented Mar 20, 2015

+1 to get this in as is.

@rmuir

This comment has been minimized.

Contributor

rmuir commented Mar 21, 2015

I updated the check since i had to re-test all JVMs anyway.

I refactored this out of bootstrap, made messaging more verbose, added support for compiler workaround flags, and added logic for IBM (because users have hit this on the mailing list).

Example output with different versions:

1.7.0:

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck

Exception in thread "main" java.lang.RuntimeException: Java version: 1.7.0 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-7070134 which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.
If you absolutely cannot upgrade, please add -XX:-UseLoopPredicate to the JVM_OPTS environment variable.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -XX:-UseLoopPredicate org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:48:58 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: Workaround flag -XX:-UseLoopPredicate for bug https://bugs.openjdk.java.net/browse/JDK-7070134 found.
This will result in degraded performance!
Upgrading is preferred, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:59:57 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: bypassing jvm version check for version [1.7.0], this can result in data corruption!

1.7.0_40:

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck

Exception in thread "main" java.lang.RuntimeException: Java version: 1.7.0_40 suffers from critical bug https://bugs.openjdk.java.net/browse/JDK-8024830 which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.
If you absolutely cannot upgrade, please add -XX:-UseSuperWord to the JVM_OPTS environment variable.
Upgrading is preferred, this workaround will result in degraded performance.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -XX:-UseSuperWord org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:51:03 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: Workaround flag -XX:-UseSuperWord for bug https://bugs.openjdk.java.net/browse/JDK-8024830 found.
This will result in degraded performance!
Upgrading is preferred, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:58:52 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: bypassing jvm version check for version [1.7.0_40], this can result in data corruption!

IBM:

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar org.elasticsearch.bootstrap.JVMCheck

Exception in thread "main" java.lang.RuntimeException: IBM runtimes suffer from several bugs which can cause data corruption.
Please upgrade the JVM, see http://www.elastic.co/guide/en/elasticsearch/reference/current/_installation.html for current recommendations.

$ java -cp elasticsearch-2.0.0-SNAPSHOT.jar:lib/lucene-core-5.1.0-snapshot-1662607.jar -Des.bypass.vm.check=true org.elasticsearch.bootstrap.JVMCheck

Mar 21, 2015 1:57:16 AM org.elasticsearch.bootstrap.JVMCheck check
WARNING: bypassing jvm version check for version [1.7.0], this can result in data corruption!

@kimchy

This comment has been minimized.

Member

kimchy commented Mar 21, 2015

LGTM

@rmuir rmuir closed this in 6da99b3 Mar 21, 2015

rmuir added a commit that referenced this pull request Mar 21, 2015

[Bootstrap] Throw exception if the JVM will corrupt data.
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes #7580

@rmuir rmuir added v1.5.0 v1.4.5 and removed v1.6.0 labels Mar 21, 2015

rmuir added a commit that referenced this pull request Mar 21, 2015

[Bootstrap] Throw exception if the JVM will corrupt data.
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes #7580

rmuir added a commit that referenced this pull request Mar 21, 2015

[Bootstrap] Throw exception if the JVM will corrupt data.
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes #7580
@clintongormley

This comment has been minimized.

Member

clintongormley commented Apr 4, 2015

w00t!

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

[Bootstrap] Throw exception if the JVM will corrupt data.
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes elastic#7580

mute pushed a commit to mute/elasticsearch that referenced this pull request Jul 29, 2015

[Bootstrap] Throw exception if the JVM will corrupt data.
Detect the worst-offenders, all IBM versions and several known hotspot
versions that can cause index corruption, and fail on startup.

Provide/detect compiler workarounds when they exist, but warn about
performance degradation.

In all cases the check can be bypassed completely with a safety
switch via undocumented system property (es.bypass.vm.check=true)

Closes elastic#7580
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment