After upgrading from ES 1.7 to 2.0, getting lots of IllegalIndexShardStateException when bulk creating indices #14927

Closed
bjorn-ali-goransson opened this issue Nov 23, 2015 · 6 comments

@bjorn-ali-goransson
Contributor

commented Nov 23, 2015

Hello,

We have a tool that creates a language-contenttype matrix of indices on first run (it checks which indices are missing from the matrix). It worked well with ES 1.7, but now we're getting lots of IllegalIndexShardStateException exceptions.

We tried issuing a _cluster/health?wait_for_status=yellow request before each index creation HTTP request, but still get the same issue.

We also wait 1000ms before the cluster health request and 1000ms after each index creation request, but still get the same issue.

Could it be that the yellow status is reached too fast? We always run with an empty ES instance when testing (i.e. we remove the data and logs dirs before starting). Sometimes even the first index to be created seems to cause the error.

Tips for further investigation are appreciated.

[2015-11-23 07:37:58,716][INFO ][cluster.metadata         ] [my-name] [content__1fzc2hfheucqdv2_q5nmpa__en] creating index, cause [api], templates [], shards [5]/[1], mappings [content]
[2015-11-23 07:38:00,983][INFO ][cluster.metadata         ] [my-name] [content__p9p2lcwchuwe4atddvvd2w__en] creating index, cause [api], templates [], shards [5]/[1], mappings [content]
[2015-11-23 07:38:03,281][INFO ][cluster.metadata         ] [my-name] [content__tuslngocl06jrzeywjpdzw__en] creating index, cause [api], templates [], shards [5]/[1], mappings [content]
[2015-11-23 07:38:05,576][INFO ][cluster.metadata         ] [my-name] [content__hkwufjcos0urbpkg6t7dnq__en] creating index, cause [api], templates [], shards [5]/[1], mappings [content]
[2015-11-23 07:38:07,968][INFO ][cluster.metadata         ] [my-name] [content__w6mzamir00cz2uk6ot7ijw__en] creating index, cause [api], templates [], shards [5]/[1], mappings [content]
[2015-11-23 07:38:08,299][DEBUG][action.admin.indices.stats] [my-name] [indices:monitor/stats] failed to execute operation for shard [[content__w6mzamir00cz2uk6ot7ijw__en][3], node[NSXOyiZxTT63LKBFnm3ESA], [P], v[1], s[INITIALIZING], a[id=OckfAhmIRiSxOjXG8J4Mcg], unassigned_info[[reason=INDEX_CREATED], at[2015-11-23T06:38:07.969Z]]]
[content__w6mzamir00cz2uk6ot7ijw__en][[content__w6mzamir00cz2uk6ot7ijw__en][3]] BroadcastShardOperationFailedException[operation indices:monitor/stats failed]; nested: IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]];
  at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:399)
  at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:376)
  at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.messageReceived(TransportBroadcastByNodeAction.java:365)
  at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
  at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: [content__w6mzamir00cz2uk6ot7ijw__en][[content__w6mzamir00cz2uk6ot7ijw__en][3]] IllegalIndexShardStateException[CurrentState[RECOVERING] operations only allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]
  at org.elasticsearch.index.shard.IndexShard.readAllowed(IndexShard.java:957)
  at org.elasticsearch.index.shard.IndexShard.acquireSearcher(IndexShard.java:791)
  at org.elasticsearch.index.shard.IndexShard.docStats(IndexShard.java:612)
  at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:131)
  at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:165)
  at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:47)
  at org.elasticsearch.action.support.broadcast.node.TransportBroadcastByNodeAction$BroadcastByNodeTransportRequestHandler.onShardOperation(TransportBroadcastByNodeAction.java:395)
  ... 7 more
@bleskes

Member

commented Nov 23, 2015

@jasontedor do you mind taking a look at this?

@jasontedor

Member

commented Nov 23, 2015

The IllegalIndexShardStateExceptions that you are seeing in the log messages here are not from the index creation, but from stats requests that are being issued. These stats requests appear to be issued while the physical shards are in the state of IndexShardState.RECOVERING which is a state that physical shards pass through, for instance, upon index creation.
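To check this against the log itself, here is a minimal Python sketch that pulls the failing action and the shard states out of the log lines quoted above. The regular expressions and the `summarize` helper are illustrative assumptions fitted to this one log excerpt, not a general Elasticsearch log parser.

```python
import re

# The DEBUG line and the nested exception from the log in this issue,
# cut down to the parts that matter here.
DEBUG_LINE = (
    "[2015-11-23 07:38:08,299][DEBUG][action.admin.indices.stats] [my-name] "
    "[indices:monitor/stats] failed to execute operation for shard "
    "[[content__w6mzamir00cz2uk6ot7ijw__en][3], node[NSXOyiZxTT63LKBFnm3ESA], "
    "[P], v[1], s[INITIALIZING]"
)
EXCEPTION_LINE = (
    "IllegalIndexShardStateException[CurrentState[RECOVERING] operations only "
    "allowed when shard state is one of [POST_RECOVERY, STARTED, RELOCATED]]"
)


def summarize(debug_line, exception_line):
    """Extract which action failed and which states were involved (hypothetical helper)."""
    # The transport action name that failed, e.g. indices:monitor/stats.
    action = re.search(r"\[(indices:[\w/]+)\] failed", debug_line).group(1)
    # The routing state of the shard as recorded in the cluster state.
    routing_state = re.search(r", s\[([A-Z_]+)\]", debug_line).group(1)
    # The state of the physical shard when the operation arrived.
    shard_state = re.search(r"CurrentState\[([A-Z_]+)\]", exception_line).group(1)
    # The states in which the operation would have been allowed.
    allowed = re.search(r"one of \[([A-Z_, ]+)\]", exception_line).group(1).split(", ")
    return {
        "action": action,
        "routing_state": routing_state,
        "shard_state": shard_state,
        "allowed_states": allowed,
    }


summary = summarize(DEBUG_LINE, EXCEPTION_LINE)
print(summary)
```

The extracted action is the stats action rather than anything related to index creation, and the shard state at the time is not among the allowed ones.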

I think that the reason that you're seeing this behavior on 2.0 but were not seeing it on 1.7 is because of a change to the shards that are considered during an indices stats request.

This shouldn't be negatively impacting your cluster other than spamming your logs. If you want to prevent it, you'll need to track down the source of those stats requests and, if needed, make them wait for the index health to be green before issuing a stats request against the index.
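A minimal sketch of that last suggestion, in Python: wait on the index-level health endpoint, then request stats only if the wait did not time out. The function names, the `fetch` parameter, and the base URL are assumptions made for illustration; the original tool's code is not shown in this thread. The endpoints used are `_cluster/health/<index>` with `wait_for_status` and `timeout`, and `<index>/_stats`.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def health_url(base_url, index, status="green", timeout="30s"):
    """Build an index-level health URL that blocks until `status` or `timeout`."""
    query = urllib.parse.urlencode({"wait_for_status": status, "timeout": timeout})
    return "{}/_cluster/health/{}?{}".format(
        base_url.rstrip("/"), urllib.parse.quote(index), query
    )


def http_get_json(url):
    """GET a URL and decode the JSON body."""
    try:
        with urllib.request.urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # Assumption: some Elasticsearch versions answer a timed-out health
        # wait with HTTP 408 while still sending the usual JSON body.
        if error.code == 408:
            return json.loads(error.read().decode("utf-8"))
        raise


def stats_when_ready(base_url, index, fetch=http_get_json, status="green", timeout="30s"):
    """Return the index stats, or None if the health wait timed out.

    `fetch` is injectable so the flow can be exercised without a live cluster.
    """
    health = fetch(health_url(base_url, index, status, timeout))
    if health.get("timed_out"):
        return None
    return fetch("{}/{}/_stats".format(base_url.rstrip("/"), urllib.parse.quote(index)))
```

A call might look like `stats_when_ready("http://localhost:9200", "content__w6mzamir00cz2uk6ot7ijw__en")`, where the host and port are assumed. If the test instance is a single node (an assumption, though the log shows only one node name), indices created with one replica cannot reach green there, so passing `status="yellow"` is probably the practical choice in that setup.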

@jasontedor closed this Nov 23, 2015

@bleskes

Member

commented Nov 23, 2015

@jasontedor correct me if I'm wrong, but I don't see the change to IndexShard.docStats that enforces readAllowed(). Also, I think it's wrong to block the stats report if the engine is already open and we are recovering. I'm thinking, for example, of reporting the translog length, the memory signature of Lucene, the number of segments, etc. If the engine is not yet open, we can ignore the shard, as it is not relevant for the stats. I feel a trace message is more appropriate here.

@jasontedor

Member

commented Nov 23, 2015

> correct me if I'm wrong, but I don't see the change to IndexShard.docStats that enforces readAllowed()

@bleskes I wasn't suggesting it was a change to IndexShard#docStats; the change is to TransportIndicesStatsAction#shards.

> If the engine is not yet open, we can ignore the shard as it is not relevant for the stats.

The shard doesn't get counted as "failed" in the stats response.

> I feel a trace message is more appropriate here.

Okay.

@bleskes

Member

commented Nov 23, 2015

Oh, I see. OK :) Before, we didn't even try...

As I said, I think we can be less strict than readAllowed(), but that's not a regression then. Re the trace: note how we ignore this error completely in the response:

if (!TransportActions.isShardNotAvailableException(throwable)) {
    exceptions.add(new DefaultShardOperationFailedException(throwable.getIndex(), throwable.getShardId().getId(), throwable));
}
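As a rough model of what that filter does, here is a small Python sketch: shard errors that count as "shard not available" are dropped, and everything else is reported. The names `SHARD_NOT_AVAILABLE` and `reported_failures` and the dictionary layout are hypothetical stand-ins; the real check is the Java method shown above and covers more exception types than the one listed here.

```python
# Hypothetical stand-in for TransportActions.isShardNotAvailableException.
# Only the exception type discussed in this thread is listed; the real
# Java check recognises more types.
SHARD_NOT_AVAILABLE = {"IllegalIndexShardStateException"}


def reported_failures(shard_errors):
    """Keep only the errors that would be listed as failures in the response."""
    return [error for error in shard_errors if error["type"] not in SHARD_NOT_AVAILABLE]


errors = [
    {"shard": 3, "type": "IllegalIndexShardStateException"},
    {"shard": 1, "type": "SomeOtherException"},  # made-up type for illustration
]
print(reported_failures(errors))
```

Under this model, a shard that is still recovering produces a log entry but never shows up as a failure in the stats response, which matches the behaviour described in this thread.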
@jasontedor

Member

commented Nov 23, 2015

Yes, see #14950.
