non fatal npe in warmer #8140

jillesvangurp · 2014-10-17T14:03:24Z

I've seen the stacktrace below in our integration tests a couple of time. We're starting elasticsearch as an embedded node. The error appears to be non fatal since our integration tests pass anyway. I've seen this stacktrace twice in the past week but can't reproduce it reliably.

We are running our maven tests concurrently and in randomized order, so there are a lot of integration tests hitting our elasticsearch node all at once right after it starts and reports a green status.

Using elasticsearch 1.4.0 Beta1

17-10-2014T15:56:40+0200 W warmer - [test-node-gstJI] [inbot_users_v27][2] failed to load random access for [_type:usercontact]
org.elasticsearch.common.util.concurrent.UncheckedExecutionException: java.lang.NullPointerException
at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2203) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache.get(LocalCache.java:3937) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4739) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache.getAndLoadIfNotPresent(FixedBitSetFilterCache.java:132) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache.access$100(FixedBitSetFilterCache.java:75) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$FixedBitSetFilterWarmer$1.run(FixedBitSetFilterCache.java:284) ~[elasticsearch-1.4.0.Beta1.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0]
at java.lang.Thread.run(Thread.java:744) [na:1.8.0]
Caused by: java.lang.NullPointerException: null
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$2.call(FixedBitSetFilterCache.java:157) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.index.cache.fixedbitset.FixedBitSetFilterCache$2.call(FixedBitSetFilterCache.java:132) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4742) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3527) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2319) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2282) ~[elasticsearch-1.4.0.Beta1.jar:na]
at org.elasticsearch.common.cache.LocalCache$Segment.get(LocalCache.java:2197) ~[elasticsearch-1.4.0.Beta1.jar:na]
... 8 common frames omitted

clintongormley · 2014-10-17T14:17:48Z

thanks for reporting @jillesvangurp - we'll take a look

martijnvg · 2014-10-17T15:24:26Z

@jillesvangurp I think I found a small issue where there is a small window of time that a field is unset, but the warmer that needs is running. In order to confirm this can you share how you start the embedded node and run your test? (for example you wait for green status before running?) Sharing code snippets how the node is brought up before the tests run would even be more helpful.

jillesvangurp · 2014-10-17T15:28:37Z

Sure no problem:

            String defaultIndexDirectory = "target/data-"+UUID.randomUUID().toString();
            String indexDir = config.getString("estestserver.indexdir",defaultIndexDirectory);
            String logDir = config.getString("estestserver.logdir",defaultIndexDirectory + "/logs");
            String esPort = config.getString("estestserver.port","9299");
            File file = new File(defaultIndexDirectory);
            file.mkdirs();
            LOG.info("using " + file.getAbsolutePath() + " for es data and logs");
            Settings settings = ImmutableSettings.settingsBuilder()
                    .put("name", "test-node-"+RandomStringUtils.randomAlphabetic(5))
                    .put("cluster.name", "linko-dev-cluster-"+RandomStringUtils.randomAlphabetic(5))
                    .put("index.gateway.type", "none")
                    .put("gateway.type", "none")
                    .put("discovery.zen.ping.multicast.ping.enabled", "false")
                    .put("discovery.zen.ping.multicast.enabled", "false")
                    .put("path.data", indexDir)
                    .put("path.logs", logDir)
                    .put("foreground", "true")
                    .put("http.port", esPort)
                    .put("http.cors.enabled", "true")
                    .put("http.cors.allow-origin","/https?:\\/\\/(localhost|kibana.*\\.linko\\.io)(:[0-9]+)?/")
                    .build();

            LOG.info(settings.toDelimitedString(';'));

            NodeBuilder nodeBuilder = NodeBuilder.nodeBuilder()
                    .settings(settings)
                    .loadConfigSettings(false);
            node = nodeBuilder
                    .build();


            // register a shutdown hook
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    node.close();
                }
            });
            node.start();

            // wait until the shards are ready
            node.client().admin().cluster().prepareHealth().setWaitForGreenStatus().execute().actionGet();

jillesvangurp · 2014-10-18T10:39:42Z

One additional bit of information that I just realized may be relevant here is that we have a parent child relation between user and usercontact. So, the exception is happening when it is doing something with the child type. For reference, here's a gist with the full mapping for the index: https://gist.github.com/jillesvangurp/d0cd29573b876f9cc4d3

jillesvangurp · 2014-10-18T10:50:34Z

Also, we're using testNg and surefire. The elasticsearch node is started in a @BeforeSuite. We use a very large threadcount and randomized order to surface any issues related to inter test dependencies and stability of our system. This pretty much means all our integration test classes are starting at the same time. We generally use randomized test data and there are a lot of calls to /_refresh to ensure indices are committed in each tests.

<plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <configuration>
                    <parallel>classes</parallel>
                    <threadCount>50</threadCount>
                    <runOrder>random</runOrder>
                    <argLine>-Xms1024m -Xmx2048m</argLine>
                    <properties>
                        <property>
                            <name>listener</name>
                            <value>io.linko.ng.testutil.TestProgressLogger</value>
                        </property>
                    </properties>
                </configuration>
            </plugin>

martijnvg · 2014-10-20T14:59:15Z

Thanks for the provided information @jillesvangurp

I opened: #8168 for this issue. Are you able to verify if the non fatal NPE doesn't occur any more with this fix in your test infrastructure?

jillesvangurp · 2014-10-20T15:11:16Z

No problem. Unfortunately, most of our builds don't trigger this exception; so it is a bit hard for me to confirm. I've only spotted it twice out of dozens of test runs over the past week. If you issue another beta, I'll be able to depend on that at least.

… prevent possible NPE. The IndicesWarmer gets set before the InternalIndexService gets set, which can lead to a small time window were InternalIndexService isn't set Closes elastic#8140 Closes elastic#8168

… prevent possible NPE. The IndicesWarmer gets set before the InternalIndexService gets set, which can lead to a small time window were InternalIndexService isn't set Closes #8140 Closes #8168

… prevent possible NPE. The IndicesWarmer gets set before the InternalIndexService gets set, which can lead to a small time window were InternalIndexService isn't set Closes elastic#8140 Closes elastic#8168

clintongormley added >bug v1.4.0 labels Oct 17, 2014

clintongormley assigned martijnvg Oct 17, 2014

martijnvg closed this as completed in 7594ccb Oct 22, 2014

martijnvg mentioned this issue Nov 3, 2014

In fixed bitset service fix order where the warmer listener is added. #8168

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

non fatal npe in warmer #8140

non fatal npe in warmer #8140

jillesvangurp commented Oct 17, 2014

clintongormley commented Oct 17, 2014

martijnvg commented Oct 17, 2014

jillesvangurp commented Oct 17, 2014

jillesvangurp commented Oct 18, 2014

jillesvangurp commented Oct 18, 2014

martijnvg commented Oct 20, 2014

jillesvangurp commented Oct 20, 2014

non fatal npe in warmer #8140

non fatal npe in warmer #8140

Comments

jillesvangurp commented Oct 17, 2014

clintongormley commented Oct 17, 2014

martijnvg commented Oct 17, 2014

jillesvangurp commented Oct 17, 2014

jillesvangurp commented Oct 18, 2014

jillesvangurp commented Oct 18, 2014

martijnvg commented Oct 20, 2014

jillesvangurp commented Oct 20, 2014