
[Close Index API] Add unique UUID to ClusterBlock #36775

Merged
merged 17 commits into elastic:close-index-api-refactoring from add-cluster-block-uuid on Jan 7, 2019

Conversation

@tlrx tlrx (Member) commented Dec 18, 2018

Note: this pull request will be merged in the close-index-api-refactoring branch

This pull request adds a unique id to cluster blocks so that they can be uniquely identified when needed. This is important for the Close Index API, where multiple closing requests can be executed concurrently. By adding a UUID to the cluster block, we can generate unique "index closed" blocks that can be verified on shards and then checked again in the cluster state before closing the index.
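For illustration, here is a minimal, self-contained sketch of the idea; the UuidBlock class, the block id, and the description string are illustrative stand-ins, not the actual org.elasticsearch.cluster.block.ClusterBlock implementation. The point is that a block now carries a generated UUID, and that UUID is part of its identity.

import java.util.Objects;
import java.util.UUID;

// Illustrative sketch only -- not the actual org.elasticsearch.cluster.block.ClusterBlock class.
final class UuidBlock {
    private final int id;          // block id (e.g. the "index closed" block id)
    private final String uuid;     // new in this PR: a unique id per block instance
    private final String description;

    UuidBlock(int id, String uuid, String description) {
        this.id = id;
        this.uuid = uuid;
        this.description = description;
    }

    // Mirrors the idea behind createIndexClosingBlock(): same block id,
    // but a freshly generated UUID for every close request.
    static UuidBlock newClosingBlock(int indexClosedBlockId) {
        return new UuidBlock(indexClosedBlockId, UUID.randomUUID().toString(), "index preparing to close");
    }

    String uuid() {
        return uuid;
    }

    // The UUID is part of the block's identity, so a closing block added by one close
    // attempt is not confused with a different closing block added after the index
    // was reopened and closed again.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        UuidBlock other = (UuidBlock) o;
        return id == other.id && Objects.equals(uuid, other.uuid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, uuid);
    }
}

Because equality includes the UUID, the close logic can verify at the end that the very block it installed is still present in the cluster state.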

@tlrx tlrx added the >enhancement, v7.0.0 and :Distributed/Distributed labels Dec 18, 2018
@tlrx tlrx requested a review from ywelsch December 18, 2018 14:59
@elasticmachine (Collaborator) commented:

Pinging @elastic/es-distributed

if (currentState.blocks().hasIndexBlock(index.getName(), INDEX_CLOSED_BLOCK) == false) {
    blocks.addIndexBlock(index.getName(), INDEX_CLOSED_BLOCK);
}
blocks.addIndexBlock(index.getName(), closingBlock);
ywelsch (Contributor) commented:

What if the index already has a close block? Do we keep adding these blocks for every close request? Is there a risk of the cluster state becoming very large in case of many close requests?
Should we replace the current close block with a new one, or perhaps limit the number of close blocks?

tlrx (Member Author) commented:

Hmm. I think we should keep only the most recent "index closed" block and replace the current close block with a new one.

ywelsch (Contributor) commented:

I think we have different options here, with different tradeoffs.

  • when starting a close on an index that is not yet fully closed, replace the current closed block with a new one. That means that the possibly ongoing close action cannot successfully complete, and the new one will be the one to complete. This is problematic because it makes concurrent calls to close fail.
  • add a new closed block next to the already existing one. Opening will erase all closed blocks. That allows concurrent closes to successfully complete, but possibly adds many blocks to the cluster state.
  • reuse the existing closed block if there is already one. This combines the advantages of both solutions above (see the sketch below).
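A minimal sketch of the third option, under assumed names (Block, reuseOrCreateClosingBlock and the block id are illustrative, not the actual MetaDataIndexStateService code): before installing a new closing block, look for an existing one on the index and reuse it, so concurrent close requests share a single block.

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative sketch of "reuse the existing closed block if there is already one".
final class ClosingBlockReuse {
    record Block(int id, String uuid) {}

    static final int INDEX_CLOSED_BLOCK_ID = 4; // illustrative id value

    // Reuse an existing closing block (same id, has a UUID) if one is present,
    // otherwise create a fresh one with a new UUID.
    static Block reuseOrCreateClosingBlock(Set<Block> existingIndexBlocks) {
        return existingIndexBlocks.stream()
                .filter(block -> block.id() == INDEX_CLOSED_BLOCK_ID && block.uuid() != null)
                .findFirst()
                .orElseGet(() -> new Block(INDEX_CLOSED_BLOCK_ID, UUID.randomUUID().toString()));
    }

    public static void main(String[] args) {
        Set<Block> indexBlocks = new LinkedHashSet<>();

        Block first = reuseOrCreateClosingBlock(indexBlocks);   // first close request creates the block
        indexBlocks.add(first);

        Block second = reuseOrCreateClosingBlock(indexBlocks);  // a concurrent close reuses it
        System.out.println(first.uuid().equals(second.uuid())); // true: one block, one UUID
    }
}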

tlrx (Member Author) commented:

* reuse the existing closed block if there is already one. This combines the advantages of both solutions above.

It is indeed more elegant, but if multiple close actions share the same unique block, don't the actions become somewhat dependent on each other? If the first response received is a failure (let's say a shard is relocating and in INITIALIZING state), the index block should be removed, but responses carrying the same block could arrive later and they won't be able to succeed.

ywelsch (Contributor) commented:

The reason why the block carries a unique id is to make sure that you can distinguish the "add close block -> moved index state to closed" situation from the "add close block -> open index (removes close block) -> add close block (from another call to close) -> moved index state to closed (from the original close block call)" situation.

If the first response received is a failure (let's say a shard is relocating and in INITIALIZING state), the index block should be removed, but responses carrying the same block could arrive later and they won't be able to succeed.

Things might be simpler if we never automatically cleaned up these blocks. Once the block is put in place, the system should do everything it can to see the closing through to completion, and not go back and reopen the index.

try {
    final IndexMetaData indexMetaData = metadata.getSafe(index);
    if (indexMetaData.getState() != IndexMetaData.State.CLOSE) {
        if (result.getValue().isAcknowledged()) {
            assert currentState.blocks().hasIndexBlock(index.getName(), closingBlock);
ywelsch (Contributor) commented:

How do we know that the close block has not been removed? An open command could have come in in the meantime, removing the close block.

tlrx (Member Author) commented:

I changed the logic here to accommodate keeping only the last index closed block.

An open command could have come in in the meantime, removing the close block.

Opening indices only removes blocks and changes the state of indices that are not OPEN, so I think we're good: the block should not be removed by an open command.

@tlrx tlrx requested a review from ywelsch December 19, 2018 15:00
@tlrx tlrx (Member Author) commented Dec 19, 2018

@ywelsch I've updated the code. Can you have another look please?

if (indexMetaData.getState() != IndexMetaData.State.CLOSE) {
if (result.getValue().isAcknowledged()) {
logger.debug("closing index {} succeed, removing index routing table", index);
if (acknowledged) {
tlrx (Member Author) commented:

I tried to make this logic easy to understand, but we can maybe do better; any suggestions welcome.

I also think that if we go with always keeping a single block in cluster state we should compute the AcknowledgedResponse by diffing the current state and the updated state.

@tlrx tlrx (Member Author) commented Dec 21, 2018

@ywelsch I updated the code according to our last discussion (see the sketch after this list):

  • when concurrent close actions are executed, a unique per-index closing block is reused if it already exists
  • the closing block remains in place if the close action failed
  • the unique closing block is replaced by the existing INDEX_CLOSED_BLOCK once the close action succeeds
  • open actions remove all closing blocks
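A minimal sketch of that lifecycle, with assumed names (Block, INDEX_CLOSED_BLOCK_ID and the UUID-less INDEX_CLOSED_BLOCK below are illustrative, not the actual cluster-state code):

import java.util.LinkedHashSet;
import java.util.Set;
import java.util.UUID;

// Illustrative walk-through of the closing-block lifecycle described in the list above.
final class CloseLifecycleSketch {
    record Block(int id, String uuid) {}

    static final int INDEX_CLOSED_BLOCK_ID = 4;                                     // illustrative id
    static final Block INDEX_CLOSED_BLOCK = new Block(INDEX_CLOSED_BLOCK_ID, null); // permanent, UUID-less block

    public static void main(String[] args) {
        Set<Block> indexBlocks = new LinkedHashSet<>();

        // 1. Concurrent close actions share one per-index closing block carrying a UUID.
        Block closingBlock = new Block(INDEX_CLOSED_BLOCK_ID, UUID.randomUUID().toString());
        indexBlocks.add(closingBlock);
        indexBlocks.add(closingBlock); // a second, concurrent close reuses it: still a single block

        // 2. If the close action fails, the closing block simply remains in place (nothing to do here).

        // 3. If the close action succeeds, the closing block is replaced by the existing INDEX_CLOSED_BLOCK.
        indexBlocks.remove(closingBlock);
        indexBlocks.add(INDEX_CLOSED_BLOCK);

        // 4. An open action removes all closed/closing blocks for the index.
        indexBlocks.removeIf(block -> block.id() == INDEX_CLOSED_BLOCK_ID);

        System.out.println(indexBlocks.isEmpty()); // true
    }
}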

I also added a few more tests. Let me know what you think! Thanks


@ywelsch ywelsch left a comment

I've left a few more comments. The logic looks better.

final ClusterBlock closingBlock = blockedIndices.get(index);
if (currentState.blocks().hasIndexBlock(index.getName(), closingBlock) == false) {
    logger.debug("closing index {} succeed but block has been removed in the mean time", index);
    continue;
ywelsch (Contributor) commented:

Why has it succeeded here? The index is not closed and the block has disappeared, so it must have been reopened in the meantime. This should be counted as a failure, not a success.

tlrx (Member Author) commented:

Well, the message in the log is confusing. What succeeded here is the verification of the shards using the block that has now disappeared. I've changed the log message.

This should be counted as a failure, not a success.

I agree. I pushed a change that combines the results of the TransportVerifyShardBeforeCloseAction and the cluster state changes in order to determine if the current close action has indeed closed the index.

assertAcked(client().admin().indices().prepareOpen(reopenedIndices.toArray(Strings.EMPTY_ARRAY)));

releaseBlock.close();
closeIndexResponse.get();
ywelsch (Contributor) commented:

Should this throw an exception / convey a failed closing?

tlrx (Member Author) commented:

The close action should not be acknowledged, so we can check that.

When we improve the Close Index API response we can decide whether to throw an exception or report an error message. For now I think we should not change the API and just not ack the response.

ywelsch (Contributor) commented:

ok, can we assert that acked == false here then?

tlrx (Member Author) commented:

It has been changed to assertFalse(closeIndexResponse.get().isAcknowledged()); I think we're good.

@tlrx tlrx requested a review from ywelsch December 21, 2018 14:53
@tlrx tlrx (Member Author) commented Dec 21, 2018

@ywelsch Thanks a lot! I've updated the code again.

  // Create a new index closed block
- indexBlock = createIndexClosedBlock();
+ indexBlock = createIndexClosingBlock();
  assert Strings.hasLength(indexBlock.uuid()) : "Closing block should have a UUID";
ywelsch (Contributor) commented:

this assertion should also cover the case where indexBlock was not previously null, i.e., rewrite this as:

} else {
  if (indexBlock == null) {
    indexBlock = createIndexClosingBlock();
  }
  assert Strings.hasLength(indexBlock.uuid()) : "Closing block should have a UUID";
}

tlrx (Member Author) commented:

Sure

if (result.getValue().isAcknowledged()) {
    if (updatedMetaData.getState() == IndexMetaData.State.CLOSE) {
        IndexMetaData previousMetaData = currentState.metaData().index(result.getKey());
        if (previousMetaData != null) {
ywelsch (Contributor) commented:

how could this ever be null?

tlrx (Member Author) commented:

It can't; I tend to be too defensive about NPEs.

if (updatedMetaData.getState() == IndexMetaData.State.CLOSE) {
    IndexMetaData previousMetaData = currentState.metaData().index(result.getKey());
    if (previousMetaData != null) {
        acknowledged = (previousMetaData.getState() == IndexMetaData.State.OPEN);
ywelsch (Contributor) commented:

I'm not sure I follow the logic here. Wouldn't it be much simpler to just say: if the index is closed at this point, all is good; if not, something must have gone wrong, and therefore acked == false.

This amounts to the following simple logic:

for (Map.Entry<Index, AcknowledgedResponse> result : results.entrySet()) {
    IndexMetaData updatedMetaData = updatedState.metaData().index(result.getKey());
    if (updatedMetaData != null && updatedMetaData.getState() != IndexMetaData.State.CLOSE) {
        acknowledged = false;
        break;
    }
}

tlrx (Member Author) commented:

Agreed, let's do that.

tlrx added a commit that referenced this pull request Jan 7, 2019
As suggested in #36775, this pull request renames the following methods:

ClusterBlocks.hasGlobalBlock(int)
ClusterBlocks.hasGlobalBlock(RestStatus)
ClusterBlocks.hasGlobalBlock(ClusterBlockLevel)

to something that better reflects the property of the ClusterBlock that is searched for:

ClusterBlocks.hasGlobalBlockWithId(int)
ClusterBlocks.hasGlobalBlockWithStatus(RestStatus)
ClusterBlocks.hasGlobalBlockWithLevel(ClusterBlockLevel)
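A hedged usage sketch of the renamed methods (the surrounding class, the block id value 4, and the chosen status and level are examples only; the method names come from the rename above):

import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.block.ClusterBlockLevel;
import org.elasticsearch.rest.RestStatus;

// Example-only helper showing how callers read after the rename.
final class RenamedBlockChecks {
    static void describeGlobalBlocks(ClusterState state) {
        boolean byId = state.blocks().hasGlobalBlockWithId(4);                                      // matches on the block id
        boolean byStatus = state.blocks().hasGlobalBlockWithStatus(RestStatus.SERVICE_UNAVAILABLE); // matches on the REST status
        boolean byLevel = state.blocks().hasGlobalBlockWithLevel(ClusterBlockLevel.METADATA_WRITE); // matches on the block level
        System.out.println("id=" + byId + ", status=" + byStatus + ", level=" + byLevel);
    }
}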
@tlrx tlrx (Member Author) commented Jan 7, 2019

Thanks @ywelsch for the review. I fixed conflicts and updated the code. Let me know if you have more comments or concerns. Thanks

@tlrx tlrx (Member Author) commented Jan 7, 2019

I pushed a fix for a test bug in ReopenWhileClosingIT (the test tried to execute an open index request by passing an empty list of indices). The other test failures are caused by the target feature branch not being up to date with master.


@ywelsch ywelsch left a comment

LGTM

@tlrx tlrx merged commit e149b08 into elastic:close-index-api-refactoring Jan 7, 2019
@tlrx tlrx deleted the add-cluster-block-uuid branch January 7, 2019 15:45
@tlrx tlrx (Member Author) commented Jan 7, 2019

Thanks @ywelsch, and sorry for the numerous comments you had to make.

tlrx added a commit that referenced this pull request Jan 14, 2019
This commit backports the Close Index API refactoring to 6.x.

It cherry-picks the following commits from master:
3ca885e [Close Index API] Add TransportShardCloseAction for pre-closing verifications (#36249)
8e5dd20 [Close Index API] Refactor MetaDataIndexStateService (#36354)
7372529 [Tests] Reduce randomization in CloseWhileRelocatingShardsIT (#36694)
103c4d4 [Close Index API] Mark unavailable shard copy as stale during verification (#36755)
1959388 [Close Index API] Propagate tasks ids between Freeze, Close and Verify(#36630)
e149b08 [Close Index API] Add unique UUID to ClusterBlock (#36775)
dc371ef [Tests] Fix ReopenWhileClosingIT with correct min num shards

The following two commits were needed to adapt the change to 6.x:
ef6ae69 [Close Index API] Adapt MetaDataIndexStateServiceTests after merge
21b7653 [Tests] Adapt CloseIndexIT tests for 6.x

Related to #33888
Labels
:Distributed/Distributed, >enhancement, v7.0.0-beta1