Make refresh a replicated action #13068
Conversation
logger.trace("{} flush request executed on replica", indexShard.shardId());
}

@Override
protected boolean resolveIndex() {
I know it's unrelated to this change, but why don't we have defaults for these booleans? They look the same in 90% of the cases.
I don't know. Change that in this PR, or better in another one?
we can do another one
opened #13218
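The default being proposed could look roughly like this; a minimal sketch with illustrative class names, not the actual Elasticsearch transport action hierarchy:

```java
// Sketch with hypothetical names: a default in the base transport action
// would remove the identical resolveIndex() overrides from most subclasses.
abstract class BaseTransportAction {
    // Shared default; subclasses override only when they actually differ.
    protected boolean resolveIndex() {
        return true;
    }
}

class RefreshTransportAction extends BaseTransportAction {
    // Inherits the default; no boilerplate override needed.
}

class ClusterLevelTransportAction extends BaseTransportAction {
    @Override
    protected boolean resolveIndex() {
        return false; // the minority case that really differs
    }
}
```

This is the usual template-method cleanup: the 90% case lives in the base class, and only the exceptions carry an override.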
I love this change, looks very good. I left some comments around unit testing.
Force-pushed from 3e47662 to 23bdc5a
@Override
protected ShardIterator shards(ClusterState clusterState, InternalRequest request) {
    return clusterService.operationRouting().shards(clusterService.state(), request.concreteIndex(), request.request().getShardId().id()).shardsIt();
}
can we use the incoming state here?
also, why not use indexShards() now that we see this as a write op - then we don't need to change the visibility of the shards method on OperationRouting
can we use the incoming state here?

yes

also, why not use indexShards() now that we see this as a write op

you mean clusterService.operationRouting().indexShards()? that one needs a type and id and we don't have that here. or is there another one that does not?
indeed, the indexShards suggestion is bad - it is not the right construct here as it is tied to a single doc. Since we are after shard ids here (not grouping), I think we should simplify the API to return a list of shardIds (which will solve this too). See comments here: https://github.com/elastic/elasticsearch/pull/13068/files#r38203633
This is now a method that returns List: https://github.com/elastic/elasticsearch/pull/13068/files#diff-8ec8c1c769c4acb6f880e4e15d2b96f6R120 Is that what you meant?
scratch that, different shards() method...
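The simplification under discussion, sketched with illustrative stand-in types rather than the real OperationRouting API: when callers only need shard ids, a plain list is simpler than a grouping iterator that is tied to routing a single doc.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the real ShardId type.
class ShardId {
    final String index;
    final int id;

    ShardId(String index, int id) {
        this.index = index;
        this.id = id;
    }

    @Override
    public String toString() {
        return "[" + index + "][" + id + "]";
    }
}

class SimplifiedRouting {
    // Hypothetical simplified API: enumerate the shard ids of an index
    // directly, instead of building a grouped shard iterator.
    static List<ShardId> shardIds(String index, int numberOfShards) {
        List<ShardId> ids = new ArrayList<>(numberOfShards);
        for (int i = 0; i < numberOfShards; i++) {
            ids.add(new ShardId(index, i));
        }
        return ids;
    }
}
```

A broadcast-style caller can then iterate the list and fan out one replicated request per shard id, which is all the refresh/flush path needs.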
I agree with @s1monw that this looks great. I left minor comments here and there.
Force-pushed from 4400824 to 5c19e15
LGTM
public BroadcastResponse executeAndAssertImmediateResponse(TransportBroadcastReplicationAction refreshAction, BroadcastRequest request) throws InterruptedException, ExecutionException {
    Date beginDate = new Date();
    BroadcastResponse response = (BroadcastResponse) refreshAction.execute(request).get();
you can use
BroadcastResponse response = (BroadcastResponse) refreshAction.execute(request).actionGet("30s");
which will throw an exception if things take longer than 30s.
Btw - if you need to measure duration, please use System.nanoTime(), which doesn't suffer from NTP corrections.
much nicer!
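The System.nanoTime() suggestion in a minimal, self-contained form (the helper name is illustrative): nanoTime is monotonic, so elapsed-time measurements are not skewed by NTP clock adjustments the way differences between two new Date() values can be.

```java
import java.util.concurrent.TimeUnit;

class Elapsed {
    // Measure the duration of a task using the monotonic clock.
    // Unlike Date/currentTimeMillis, nanoTime never jumps backwards
    // when the wall clock is corrected.
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

In the test above, the Date-based measurement could then be replaced by wrapping the execute-and-get call in such a helper.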
Looks awesome. I only miss a test for the aggregation of results from multiple shard-level responses.
@bleskes addressed all comments and added test here: https://github.com/elastic/elasticsearch/pull/13068/files#diff-6030559b5ed4d55d9a754523f5c6ce6dR137
LGTM! (minor comments, no need for another review)
prerequisite to elastic#9421, see also elastic#12600
Force-pushed from c29cabb to d81f426
Make refresh a replicated action

Conflicts:
	core/src/main/java/org/elasticsearch/action/admin/indices/flush/TransportFlushAction.java
	core/src/main/java/org/elasticsearch/action/admin/indices/refresh/TransportRefreshAction.java
	core/src/test/java/org/elasticsearch/action/IndicesRequestIT.java
Since #13068, refresh and flush requests go to the primary first and are then replicated. One difference from before, though, is that if a shard is not available (INITIALIZING, for example) we wait a little for an indexing request, but for refresh we don't and just give up immediately. Before, refresh requests were just sent to the shards regardless of their state. In tests we sometimes create an index, issue an indexing request, refresh, and then get the document. But we do not wait until all nodes know that all primaries have been assigned. Now potentially one node can be one cluster state behind and not yet know that the shards have been started. If the refresh is executed through this node, then the refresh request will silently fail on shards that are already started, because from that node's perspective they are still initializing. As a consequence, documents that were expected to be available in the test now are not. Example test failures are here: http://build-us-00.elastic.co/job/elasticsearch-20-oracle-jdk7/395/ This commit changes the timeout to 1m (default) to make sure we don't miss shards when we refresh. This will trigger the same retry mechanism as for indexing requests. We still have to make a decision if this change of behavior is acceptable. see #13238
Currently, we do not allow reads on shards which are in POST_RECOVERY, which unfortunately can cause search failures on shards which just recovered if there are no replicas (#9421). The reason why we did not allow reads on shards that are in POST_RECOVERY is that after relocating, a shard might miss a refresh if the node that executed the refresh is behind with cluster state processing. If that happens, a user might execute index/refresh/search but still not find the document that was indexed. We changed how refresh works in #13068 to make sure that shards cannot miss a refresh this way, by sending refresh requests the same way that we send write requests. This commit changes IndexShard to allow reads in POST_RECOVERY now. In addition it adds two tests:
- test for issue #9421 (after relocation, shards might temporarily not be searchable if still in POST_RECOVERY)
- test for the visibility issue with relocation and refresh if reads are allowed when the shard is in POST_RECOVERY

closes #9421
…n action Before #13068, refresh and flush ignored all exceptions that matched TransportActions.isShardNotAvailableException(e), and this should not change. In addition, refresh and flush, which are based on broadcast replication, might now get UnavailableShardsException from TransportReplicationAction if a shard is unavailable, and this is not caught by TransportActions.isShardNotAvailableException(e). This must be ignored as well.
prerequisite to #9421
see also #12600