
[Close Index API] Add unique UUID to ClusterBlock #36775

Merged
merged 17 commits into elastic:close-index-api-refactoring from add-cluster-block-uuid on Jan 7, 2019

Conversation

@tlrx tlrx (Member) commented Dec 18, 2018

Note: this pull request will be merged in the close-index-api-refactoring branch

This pull request adds a unique id to cluster blocks so that they can be uniquely identified when needed. This is important for the Close Index API, where multiple closing requests can be executed concurrently. By adding a UUID to the cluster block, we can generate unique "index closed" blocks that can be verified on shards and then checked again in the cluster state before closing the index.
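For illustration, here is a minimal, self-contained sketch of the idea; the UuidBlock class, the block id, and the description string are illustrative stand-ins, not the actual org.elasticsearch.cluster.block.ClusterBlock implementation. The point is that a block now carries a generated UUID, and that UUID is part of its identity.

import java.util.Objects;
import java.util.UUID;

// Illustrative sketch only -- not the actual org.elasticsearch.cluster.block.ClusterBlock class.
final class UuidBlock {
    private final int id;          // block id (e.g. the "index closed" block id)
    private final String uuid;     // new in this PR: a unique id per block instance
    private final String description;

    UuidBlock(int id, String uuid, String description) {
        this.id = id;
        this.uuid = uuid;
        this.description = description;
    }

    // Mirrors the idea behind createIndexClosingBlock(): same block id,
    // but a freshly generated UUID for every close request.
    static UuidBlock newClosingBlock(int indexClosedBlockId) {
        return new UuidBlock(indexClosedBlockId, UUID.randomUUID().toString(), "index preparing to close");
    }

    String uuid() {
        return uuid;
    }

    // The UUID is part of the block's identity, so a closing block added by one close
    // attempt is not confused with a different closing block added after the index
    // was reopened and closed again.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        UuidBlock other = (UuidBlock) o;
        return id == other.id && Objects.equals(uuid, other.uuid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, uuid);
    }
}

Because equality includes the UUID, the close logic can verify at the end that the very block it installed is still present in the cluster state.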

@tlrx tlrx added the >enhancement, v7.0.0 and :Distributed/Distributed labels Dec 18, 2018
@tlrx tlrx requested a review from ywelsch December 18, 2018 14:59
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

if (currentState.blocks().hasIndexBlock(index.getName(), INDEX_CLOSED_BLOCK) == false) {
    blocks.addIndexBlock(index.getName(), INDEX_CLOSED_BLOCK);
}
blocks.addIndexBlock(index.getName(), closingBlock);
ywelsch (Contributor) commented:

What if the index already has a close block? Do we keep adding these blocks for every close request? Is there a risk of the cluster state becoming very large in case of many close requests?
Should we replace the current close block with a new one, or perhaps limit the number of close blocks?

tlrx (Member Author) commented:

Hmm. I think we should keep only the most recent "index closed" block and replace the current close block with a new one.

ywelsch (Contributor) commented:

I think we have different options here, with different tradeoffs.

  • when starting a close on an index that is not yet fully closed, replace the current closed block with a new one. That means that the possibly ongoing close action cannot successfully complete, and the new one will be the one to complete. This is problematic because it makes concurrent calls to close fail.
  • add a new closed block next to the already existing one. Opening will erase all closed blocks. That allows concurrent closes to successfully complete, but possibly adds many blocks to the cluster state.
  • reuse the existing closed block if there is already one. This combines the advantages of both solutions above (see the sketch below).
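A minimal sketch of the third option, under assumed names (Block, reuseOrCreateClosingBlock and the block id are illustrative, not the actual MetaDataIndexStateService code): before installing a new closing block, look for an existing one on the index and reuse it, so concurrent close requests share a single block.

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative sketch of "reuse the existing closed block if there is already one".
final class ClosingBlockReuse {
    record Block(int id, String uuid) {}

    static final int INDEX_CLOSED_BLOCK_ID = 4; // illustrative id value

    // Reuse an existing closing block (same id, has a UUID) if one is present,
    // otherwise create a fresh one with a new UUID.
    static Block reuseOrCreateClosingBlock(Set<Block> existingIndexBlocks) {
        return existingIndexBlocks.stream()
                .filter(block -> block.id() == INDEX_CLOSED_BLOCK_ID && block.uuid() != null)
                .findFirst()
                .orElseGet(() -> new Block(INDEX_CLOSED_BLOCK_ID, UUID.randomUUID().toString()));
    }

    public static void main(String[] args) {
        Set<Block> indexBlocks = new LinkedHashSet<>();

        Block first = reuseOrCreateClosingBlock(indexBlocks);   // first close request creates the block
        indexBlocks.add(first);

        Block second = reuseOrCreateClosingBlock(indexBlocks);  // a concurrent close reuses it
        System.out.println(first.uuid().equals(second.uuid())); // true: one block, one UUID
    }
}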

tlrx (Member Author) commented:

* reuse the existing closed block if there is already one. This combines the advantages of both solutions above.

It is indeed more elegant, but if multiple close actions share the same unique block, don't the actions become somewhat dependent on each other? If the first response received is a failure (let's say a shard is relocating and in INITIALIZING state), the index block should be removed, but responses carrying the same block could arrive later and they won't be able to succeed.

ywelsch (Contributor) commented:

The reason why the block carries a unique id is to make sure that you can distinguish the "add close block -> moved index state to closed" situation from the "add close block -> open index (removes close block) -> add close block (from another call to close) -> moved index state to closed (from the original close block call)" situation.

If the first response received is a failure (let's say a shard is relocating and in INITIALIZING state), the index block should be removed, but responses carrying the same block could arrive later and they won't be able to succeed.

Things might be simpler if we never automatically cleaned up these blocks. Once the block is put in place, the system should do everything it can to see the closing through to completion, and not go back and reopen the index.

try {
    final IndexMetaData indexMetaData = metadata.getSafe(index);
    if (indexMetaData.getState() != IndexMetaData.State.CLOSE) {
        if (result.getValue().isAcknowledged()) {
            assert currentState.blocks().hasIndexBlock(index.getName(), closingBlock);
ywelsch (Contributor) commented:

How do we know that the close block has not been removed? An open command could have come in in the meantime, removing the close block.

tlrx (Member Author) commented:

I changed the logic here to accommodate keeping only the last index closed block.

An open command could have come in in the meantime, removing the close block.

Opening indices only removes blocks and changes the state of indices that are not OPEN, so I think we're good: the block should not be removed by an open command.

@tlrx tlrx requested a review from ywelsch December 19, 2018 15:00
@tlrx tlrx (Member Author) commented Dec 19, 2018

@ywelsch I've updated the code. Can you have another look please?

if (indexMetaData.getState() != IndexMetaData.State.CLOSE) {
if (result.getValue().isAcknowledged()) {
logger.debug("closing index {} succeed, removing index routing table", index);
if (acknowledged) {
tlrx (Member Author) commented:

I tried to make this logic easy to understand, but we can maybe do better; any suggestions welcome.

I also think that if we go with always keeping a single block in cluster state we should compute the AcknowledgedResponse by diffing the current state and the updated state.

@tlrx tlrx (Member Author) commented Dec 21, 2018

@ywelsch I updated the code according to our last discussion (see the sketch after this list):

  • when concurrent close actions are executed, a unique per-index closing block is reused if it already exists
  • the closing block remains in place if the close action failed
  • the unique closing block is replaced by the existing INDEX_CLOSED_BLOCK once the close action succeeds
  • open actions remove all closing blocks
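A minimal sketch of that lifecycle, with assumed names (Block, INDEX_CLOSED_BLOCK_ID and the UUID-less INDEX_CLOSED_BLOCK below are illustrative, not the actual cluster-state code):

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative walk-through of the closing-block lifecycle described in the list above.
final class CloseLifecycleSketch {
    record Block(int id, String uuid) {}

    static final int INDEX_CLOSED_BLOCK_ID = 4;                                     // illustrative id
    static final Block INDEX_CLOSED_BLOCK = new Block(INDEX_CLOSED_BLOCK_ID, null); // permanent, UUID-less block

    public static void main(String[] args) {
        Set<Block> indexBlocks = new LinkedHashSet<>();

        // 1. Concurrent close actions share one per-index closing block carrying a UUID.
        Block closingBlock = new Block(INDEX_CLOSED_BLOCK_ID, UUID.randomUUID().toString());
        indexBlocks.add(closingBlock);
        indexBlocks.add(closingBlock); // a second, concurrent close reuses it: still a single block

        // 2. If the close action fails, the closing block simply remains in place (nothing to do here).

        // 3. If the close action succeeds, the closing block is replaced by the existing INDEX_CLOSED_BLOCK.
        indexBlocks.remove(closingBlock);
        indexBlocks.add(INDEX_CLOSED_BLOCK);

        // 4. An open action removes all closed/closing blocks for the index.
        indexBlocks.removeIf(block -> block.id() == INDEX_CLOSED_BLOCK_ID);

        System.out.println(indexBlocks.isEmpty()); // true
    }
}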

I also added a few more tests. Let me know what you think! Thanks


@ywelsch ywelsch left a comment

I've left a few more comments. The logic looks better.

final ClusterBlock closingBlock = blockedIndices.get(index);
if (currentState.blocks().hasIndexBlock(index.getName(), closingBlock) == false) {
    logger.debug("closing index {} succeed but block has been removed in the mean time", index);
    continue;
ywelsch (Contributor) commented:

Why has it succeeded here? The index is not closed and the block has disappeared, so it must have been reopened in the meantime. This should be counted as a failure, not a success.

tlrx (Member Author) commented:

Well, the message in the log is confusing. What succeeded here is the verification of the shards using the block that has now disappeared. I've changed the log message.

This should be counted as a failure, not a success.

I agree. I pushed a change that combines the results of the TransportVerifyShardBeforeCloseAction and the cluster state changes in order to determine if the current close action has indeed closed the index.

assertAcked(client().admin().indices().prepareOpen(reopenedIndices.toArray(Strings.EMPTY_ARRAY)));

releaseBlock.close();
closeIndexResponse.get();
ywelsch (Contributor) commented:

Should this throw an exception / convey a failed closing?

tlrx (Member Author) commented:

The close action should not be acknowledged, so we can check that.

When we improve the Close Index API response we can decide whether to throw an exception or report an error message. For now I think we should not change the API and just not ack the response.

ywelsch (Contributor) commented:

ok, can we assert that acked == false here then?

tlrx (Member Author) commented:

It has been changed to assertFalse(closeIndexResponse.get().isAcknowledged()); I think we're good.

@tlrx tlrx requested a review from ywelsch December 21, 2018 14:53
@tlrx tlrx (Member Author) commented Dec 21, 2018

@ywelsch Thanks a lot! I've updated the code again.

  // Create a new index closed block
- indexBlock = createIndexClosedBlock();
+ indexBlock = createIndexClosingBlock();
  assert Strings.hasLength(indexBlock.uuid()) : "Closing block should have a UUID";
ywelsch (Contributor) commented:

this assertion should also cover the case where indexBlock was not previously null, i.e., rewrite this as:

} else {
  if (indexBlock == null) {
    indexBlock = createIndexClosingBlock();
  }
  assert Strings.hasLength(indexBlock.uuid()) : "Closing block should have a UUID";
}

tlrx (Member Author) commented:

Sure

if (result.getValue().isAcknowledged()) {
    if (updatedMetaData.getState() == IndexMetaData.State.CLOSE) {
        IndexMetaData previousMetaData = currentState.metaData().index(result.getKey());
        if (previousMetaData != null) {
ywelsch (Contributor) commented:

how could this ever be null?

tlrx (Member Author) commented:

It can't; I tend to be too defensive about NPEs.

if (updatedMetaData.getState() == IndexMetaData.State.CLOSE) {
    IndexMetaData previousMetaData = currentState.metaData().index(result.getKey());
    if (previousMetaData != null) {
        acknowledged = (previousMetaData.getState() == IndexMetaData.State.OPEN);
ywelsch (Contributor) commented:

I'm not sure I follow the logic here. Wouldn't it be much simpler to just say: if the index is closed at this point, all is good; if not, something must have gone wrong, and therefore acked == false.

This amounts to the following simple logic:

for (Map.Entry<Index, AcknowledgedResponse> result : results.entrySet()) {
    IndexMetaData updatedMetaData = updatedState.metaData().index(result.getKey());
    if (updatedMetaData != null && updatedMetaData.getState() != IndexMetaData.State.CLOSE) {
        acknowledged = false;
        break;
    }
}

tlrx (Member Author) commented:

Agreed, let's do that.

tlrx added a commit that referenced this pull request Jan 7, 2019
As suggested in #36775, this pull request renames the following methods:

ClusterBlocks.hasGlobalBlock(int)
ClusterBlocks.hasGlobalBlock(RestStatus)
ClusterBlocks.hasGlobalBlock(ClusterBlockLevel)

to something that better reflects the property of the ClusterBlock that is searched for:

ClusterBlocks.hasGlobalBlockWithId(int)
ClusterBlocks.hasGlobalBlockWithStatus(RestStatus)
ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
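A hedged usage sketch of the renamed methods (the surrounding class, the block id value 4, and the chosen status and level are examples only; the method names come from the rename above):

import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.block.ClusterBlockLevel;
import org.elasticsearch.rest.RestStatus;

// Example-only helper showing how callers read after the rename.
final class RenamedBlockChecks {
    static void describeGlobalBlocks(ClusterState state) {
        boolean byId = state.blocks().hasGlobalBlockWithId(4);                                      // matches on the block id
        boolean byStatus = state.blocks().hasGlobalBlockWithStatus(RestStatus.SERVICE_UNAVAILABLE); // matches on the REST status
        boolean byLevel = state.blocks().hasGlobalBlockWithLevel(ClusterBlockLevel.METADATA_WRITE); // matches on the block level
        System.out.println("id=" + byId + ", status=" + byStatus + ", level=" + byLevel);
    }
}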
@tlrx tlrx (Member Author) commented Jan 7, 2019

Thanks @ywelsch for the review. I fixed conflicts and updated the code. Let me know if you have more comments or concerns. Thanks

@tlrx tlrx (Member Author) commented Jan 7, 2019

I pushed a fix for a test bug in ReopenWhileClosingIT (the test tried to execute an open index request by passing an empty list of indices). The other test failures are caused by the target feature branch not being up to date with master.


@ywelsch ywelsch left a comment

LGTM

@tlrx tlrx merged commit e149b08 into elastic:close-index-api-refactoring Jan 7, 2019
@tlrx tlrx deleted the add-cluster-block-uuid branch January 7, 2019 15:45
@tlrx tlrx (Member Author) commented Jan 7, 2019

Thanks @ywelsch, and sorry for the numerous comments you had to make.

tlrx added a commit that referenced this pull request Jan 14, 2019
This commit backports the Close Index API refactoring to 6.x.

It cherry-picks the following commits from master:
3ca885e [Close Index API] Add TransportShardCloseAction for pre-closing verifications (#36249)
8e5dd20 [Close Index API] Refactor MetaDataIndexStateService (#36354)
7372529 [Tests] Reduce randomization in CloseWhileRelocatingShardsIT (#36694)
103c4d4 [Close Index API] Mark unavailable shard copy as stale during verification (#36755)
1959388 [Close Index API] Propagate tasks ids between Freeze, Close and Verify(#36630)
e149b08 [Close Index API] Add unique UUID to ClusterBlock (#36775)
dc371ef [Tests] Fix ReopenWhileClosingIT with correct min num shards

The following two commits were needed to adapt the change to 6.x:
ef6ae69 [Close Index API] Adapt MetaDataIndexStateServiceTests after merge
21b7653 [Tests] Adapt CloseIndexIT tests for 6.x

Related to #33888
Labels
:Distributed/Distributed, >enhancement, v7.0.0-beta1