Add support for replicating closed indices #39499

tlrx · 2019-02-28T09:21:46Z

This pull request adds the support for replicating closed indices.

Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again.

This pull request changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed.

This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health.

Some APIs have been adapted to make them work with indices whatever the state they are in:

Cluster Health API
Cluster Reroute API
Cluster Allocation Explain API
Recovery API
Cat Indices
Cat Shards
Cat Health
Cat Recovery

Some of these APIs support the expand_wilcards parameter that allows to retrieve information only for closed and/or open indices, like:

GET /_cluster/health?expand_wildcards=closed&level=shards
{
  "cluster_name" : "distribution_run",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1,
  "active_shards" : 1,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0,
  "indices" : {
    "my-closed-index" : {
      "status" : "green",
      "number_of_shards" : 1,
      "number_of_replicas" : 0,
      "active_primary_shards" : 1,
      "active_shards" : 1,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "shards" : {
        "0" : {
          "status" : "green",
          "primary_active" : true,
          "active_shards" : 1,
          "relocating_shards" : 0,
          "initializing_shards" : 0,
          "unassigned_shards" : 0
        }
      }
    }
  }
}

This pull request contains all the following changes that have been uniquely reviewed (most recent first):

└─ $ ▶ git log --oneline --no-merges --graph master..replicated-closed-indices

c6c42a1 Adapt NoOpEngineTests after Do not wait for advancement of checkpoint in recovery #39006
3f9993d Wait for shards to be active after closing indices (Wait for shards to be active after closing indices #38854)
5e7a428 Adapt the Cluster Health API to closed indices (Adapt the Cluster Health and Cat Indices APIs to closed indices #39364)
3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (Adapt CloseFollowerIndexIT for replicated closed indices #38767)
71f5c34 Recover closed indices after a full cluster restart (Recover closed indices after a full cluster restart #39249)
4db7fd9 Adapt the Recovery API for closed indices (Adapt the Recovery API for closed indices #38421)
4fd1bb2 Adapt more tests suites to closed indices (Adapt more tests suites to closed indices #39186)
0519016 Add replica to primary promotion test for closed indices (Add replica to primary promotion test for closed indices #39110)
b756f6c Test the Cluster Shard Allocation Explain API with closed indices (Test the Cluster Shard Allocation Explain API with closed indices #38631)
c484c66 Remove index routing table of closed indices in mixed versions clusters (Remove index routing table of closed indices in mixed versions clusters #38955)
00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()
e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices #38329)
cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (Adapt testIndexCanChangeCustomDataPath for replicated closed indices #38327)
b9becdd Adapt testPendingTasks() for replicated closed indices (Adapt testPendingTasks() for replicated closed indices #38326)
02cc730 Allow shards of closed indices to be replicated as regular shards (Allow shards of closed indices to be replicated as regular shards #38024)
e53a9be Fix compilation error in IndexShardIT after merge with master
cae4155 Relax NoOpEngine constraints (Relax NoOpEngine constraints on translog #37413)
54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes
c63fd69 [RCI] Add NoOpEngine for closed indices ([RCI] Add NoOpEngine for closed indices #33903)

This pull request adds support for 8.0 and should be backported to 7.x.

Relates to #33888

This commit adds a new NoOpEngine implementation based on the current ReadOnlyEngine. This new implementation uses an empty DirectoryReader with no segments readers and will always returns 0 docs. The NoOpEngine is the default Engine created for IndexShards of closed indices. It expects an empty translog when it is instantiated. Relates to #33888

Changes were made in #34357 and #36467

When a NoOpEngine is instanciated, the current implementation verifies that the translog contains no operations and that it contains the same UUID as the last Lucene commit data.We can relax those two constraints because the Close Index API now ensure that all translog operations are flushed before closing a shard. The detection of coherence between translog UUID / Lucene commit data is not specific to NoOpEngine, and is already done by IndexShard.innerOpenEngineAndTranslog(). Related to #33888

…8024) This commit allows shards of indices in CLOSE state to be replicated as normal shards. It changes the MetaDataIndexStateService so that index routing tables of closed indices are kept in cluster state when the index is closed. Index routing tables are modified so that shard routings are reinitialized with the INDEX_CLOSED unassigned information. The IndicesClusterStateService is modified to remove IndexService instances of closed or reopened indices. In combination with the ShardRouting being in INITIALIZING state the shards are recreated on the data nodes to reflect the new state. If the index state is closed, the IndexShard instances will be created using the NoOpEngine as the engine implementation. This commit also mutes two tests that rely on the fact that shard locks are released when an index is closed, which is not the case anymore with replicated closed indices (actually the locks are released but reacquired once the shard is reinitialized after being closed). These tests will be adapted in follow up PRs. Finally, many things will require to be adapted or improved in follow up PRs (see #33888) but this is the first big step towards replicated closed indices. Relates to #33888

Relates to #33888

…38327) Relates to #33888 and #38024

…dices (#38329) Replicated closed indices do not need to be refreshed, neither they need their translogs or global checkpoint to be fsync. This pull request changes how `BaseAsyncTask` tasks are rescheduled in `IndexService` instances so that the tasks are rescheduled only when the index is opened. Relates to #33888

…rs (#38955) This pull request removes the legacy way of closing indices (aka "direct close") in mixed versions clusters, since this backward compatibility logic is not required anymore on master/8.0.0. It also changes the closing logic so that routing tables of closed indices are removed when the cluster contains a node in version < 8.0. Relates #33888

…8631) This pull request modifies the `ClusterAllocationExplainIT` test suite so that it always runs the tests with opened and closed indices. The only test that was not adapted for closed indices is `testAllocationFilteringOnIndexCreation` because we don't allow to directly create indices in the closed state. Relates to #33888

This commit adds a simple test which verifies that a replica can be promoted as a primary when the index is closed. Relates to #33888

* Adapt more tests suites to closed indices Similarly to #38631, this pull request modifies multiple test suites so that they runs the tests with opened or closed indices. The suites are testing: - shard allocation filtering - shard allocation awereness - Reroute API Relates to #33888

This commit adapts the Recovery API to make it work with shards of replicated closed indices. Relates #33888

Closing an index is a process that can be broken down into several steps: 1. first, the state of the cluster is updated to add a write block on the index to be closed 2. then, a transport replication action is executed on all shards of the index. This action checks that the maximum sequence number and the global checkpoint have identical values, indicating that all in flight writing operations have been completed on the shard. 3. finally, and if the previous steps were successful, the cluster state is updated again to change the state of the index from `OPEN`to `CLOSE`. During the last step, the master node retrieves the minimum node version among all the nodes that compose the cluster: * If a node is in pre 8.0 version, the index is closed and the index routing table is removed from the cluster state. This is the "old" way of closing indices and closed indices with no routing table are not replicated. * If all nodes are in version 8.0 or higher, the index is closed and its routing table is reinitialized in cluster state. This is the new way of closing indices and such closed indices will be replicated in the cluster. But routing tables are not persisted in the cluster state, so after a full cluster restart there is no way to make the distinction between an index closed in 7.x and an index closed and replicated on 8.0. This commit introduces a new private index settings named `index.verified_before_close` that is added to closed indices that are replicated at closing time. This setting serves as a marker to indicate that the index has been closed using the new Close Index API on a cluster that supports replication of closed indices. This way, after a full cluster restart, the Gateway service can automatically recovers those closed indices as if they were opened indices. Closed indices that don't have this setting (because they were closed on a pre-8.0 cluster, or a cluster in mixed version) won't be recovered and will need to be reopened and closed again on a 8.0 cluster. Note that reopening the index removes the private setting. Relates to #33888

Now the test `CloseFollowerIndexIT` has been added in #38702, it needs to be adapted for replicated closed indices. The test closes the follower index which is lagging behind the leader index. When it's closed, no sanity checks are executed because it's a follower index (this is a consequence of #38702). But with replicated closed indices, the index is reinitialized as a closed index with a `NoOpEngine` and such engines make strong assertions on the values of the maximum sequence number and the global checkpoint. Since the values do not match, the shards cannot be created and fail and the cluster health turns RED. This commit adapts the `CloseFollowerIndexIT` test so that it wraps the default `UncaughtExceptionHandler` with a handler that tolerates any exception thrown by `ReadOnlyEngine.assertMaxSeqNoEqualsToGlobalCheckpoint()`. Replacing the default uncaught exception handler requires specific permissions, and instead of creating another gradle project it duplicates the `internalClusterTest` task to make it work without security manager for this specific test only. Relates to #33888

This commit adapts the Cluster Health API to support replicated closed indices. In order to do that, it removes the hard coded indices options from the `ClusterHealthRequest` and replaces it with a new `IndicesOptions.lenientExpand()` option. This option will be used by the master node (once it is upgraded to 8.0) to compute the global cluster health using both opened and closed indices information by default. The `expand_wildcards` REST parameter is also documented and tests where added to ensure that a specific expansion type can be used to monitoring the health of a only opened or only closed indices. Since the Cat Indices relies on the Cluster Health API, it has been adapted to report information about closed indices too. Note that the health and number of shards/replicas is only printed out for closed indices that have an index routing table. Closed indices without routing table have the same output as before. Related to #33888

This commit changes the Close Index API to add a `wait_for_active_shards` parameter that allows to wait for shards of closed indices to be active before returning a response. Relates #33888

elasticmachine · 2019-02-28T09:21:48Z

Pinging @elastic/es-distributed

tlrx · 2019-02-28T11:30:01Z

@ywelsch CI is happy and this is ready for your approval.

Note that some commits were not reviewed as part of pull requests (mostly merge conflicts fixes):
c6c42a1, 00f1828 and e53a9be.

ywelsch

LGTM. Thank you

server/src/main/java/org/elasticsearch/action/admin/indices/close/CloseIndexRequest.java

tlrx · 2019-02-28T17:35:41Z

Thanks @ywelsch

Backport support for replicating closed indices (#39499) Before this change, closed indexes were simply not replicated. It was therefore possible to close an index and then decommission a data node without knowing that this data node contained shards of the closed index, potentially leading to data loss. Shards of closed indices were not completely taken into account when balancing the shards within the cluster, or automatically replicated through shard copies, and they were not easily movable from node A to node B using APIs like Cluster Reroute without being fully reopened and closed again. This commit changes the logic executed when closing an index, so that its shards are not just removed and forgotten but are instead reinitialized and reallocated on data nodes using an engine implementation which does not allow searching or indexing, which has a low memory overhead (compared with searchable/indexable opened shards) and which allows shards to be recovered from peer or promoted as primaries when needed. This new closing logic is built on top of the new Close Index API introduced in 6.7.0 (#37359). Some pre-closing sanity checks are executed on the shards before closing them, and closing an index on a 8.0 cluster will reinitialize the index shards and therefore impact the cluster health. Some APIs have been adapted to make them work with closed indices: - Cluster Health API - Cluster Reroute API - Cluster Allocation Explain API - Recovery API - Cat Indices - Cat Shards - Cat Health - Cat Recovery This commit contains all the following changes (most recent first): * c6c42a1 Adapt NoOpEngineTests after #39006 * 3f9993d Wait for shards to be active after closing indices (#38854) * 5e7a428 Adapt the Cluster Health API to closed indices (#39364) * 3e61939 Adapt CloseFollowerIndexIT for replicated closed indices (#38767) * 71f5c34 Recover closed indices after a full cluster restart (#39249) * 4db7fd9 Adapt the Recovery API for closed indices (#38421) * 4fd1bb2 Adapt more tests suites to closed indices (#39186) * 0519016 Add replica to primary promotion test for closed indices (#39110) * b756f6c Test the Cluster Shard Allocation Explain API with closed indices (#38631) * c484c66 Remove index routing table of closed indices in mixed versions clusters (#38955) * 00f1828 Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex() * e845b0a Do not schedule Refresh/Translog/GlobalCheckpoint tasks for closed indices (#38329) * cf9a015 Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#38327) * b9becdd Adapt testPendingTasks() for replicated closed indices (#38326) * 02cc730 Allow shards of closed indices to be replicated as regular shards (#38024) * e53a9be Fix compilation error in IndexShardIT after merge with master * cae4155 Relax NoOpEngine constraints (#37413) * 54d110b [RCI] Adapt NoOpEngine to latest FrozenEngine changes * c63fd69 [RCI] Add NoOpEngine for closed indices (#33903) Relates to #33888

tlrx added 30 commits January 30, 2019 11:35

[RCI] Adapt NoOpEngine to latest FrozenEngine changes

54d110b

Changes were made in #34357 and #36467

Fix compilation error in IndexShardIT after merge with master

e53a9be

Merge branch 'master' into replicated-closed-indices

46a6fba

Merge branch 'master' into replicated-closed-indices

fd1046c

Merge branch 'master' into replicated-closed-indices

82b4ba6

Merge branch 'master' into replicated-closed-indices

404d065

Merge branch 'master' into replicated-closed-indices

5c61cac

Merge branch 'master' into replicated-closed-indices

d746ef5

Merge branch 'master' into replicated-closed-indices

f53cd65

Merge branch 'master' into replicated-closed-indices

a663613

Adapt testPendingTasks() for replicated closed indices (#38326)

b9becdd

Relates to #33888

Adapt testIndexCanChangeCustomDataPath for replicated closed indices (#…

cf9a015

…38327) Relates to #33888 and #38024

Merge branch 'master' into replicated-closed-indices

0cd3636

Merge branch 'master' into replicated-closed-indices

b55aca7

Merge branch 'master' into replicated-closed-indices

03335b2

Merge branch 'master' into replicated-closed-indices

040476a

Mute CloseFollowerIndexIT.testCloseAndReopenFollowerIndex()

00f1828

Merge branch 'master' into replicated-closed-indices

43f555b

Merge branch 'master' into replicated-closed-indices

538cdcd

Add replica to primary promotion test for closed indices (#39110)

0519016

This commit adds a simple test which verifies that a replica can be promoted as a primary when the index is closed. Relates to #33888

Merge branch 'master' into replicated-closed-indices

05debeb

Adapt the Recovery API for closed indices (#38421)

4db7fd9

This commit adapts the Recovery API to make it work with shards of replicated closed indices. Relates #33888

Merge branch 'master' into replicated-closed-indices

d0cf376

tlrx added 4 commits February 26, 2019 09:29

Wait for shards to be active after closing indices (#38854)

3f9993d

This commit changes the Close Index API to add a `wait_for_active_shards` parameter that allows to wait for shards of closed indices to be active before returning a response. Relates #33888

tlrx added >enhancement :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v8.0.0 labels Feb 28, 2019

tlrx added 2 commits February 28, 2019 10:38

Merge branch 'master' into replicated-closed-indices

1b7eb90

Adapt NoOpEngineTests after #39006

c6c42a1

tlrx requested a review from ywelsch February 28, 2019 11:26

tlrx mentioned this pull request Feb 28, 2019

Backport support for replicating closed indices to 7.x #39506

Merged

ywelsch approved these changes Feb 28, 2019

View reviewed changes

server/src/main/java/org/elasticsearch/action/admin/indices/close/CloseIndexRequest.java Outdated Show resolved Hide resolved

Amend comment

79b9a5a

tlrx merged commit 309a3e4 into master Feb 28, 2019

dakrone mentioned this pull request Feb 28, 2019

Add support for closed indices to ILM #39524

Closed

tlrx added backport pending v7.2.0 and removed backport pending labels Mar 1, 2019

jakelandis mentioned this pull request Apr 16, 2019

Backport FullClusterRestart tests for 7.0 #41269

Merged

xiahongju mentioned this pull request Oct 28, 2019

improve reroute with many shards #48579

Closed

colings86 deleted the replicated-closed-indices branch May 27, 2020 07:40

tlrx mentioned this pull request Dec 17, 2020

Undocumented(?) change in default for wait_for_active_shards on close index requests #66419

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for replicating closed indices #39499

Add support for replicating closed indices #39499

tlrx commented Feb 28, 2019 •

edited

Loading

elasticmachine commented Feb 28, 2019

tlrx commented Feb 28, 2019

ywelsch left a comment

tlrx commented Feb 28, 2019

Add support for replicating closed indices #39499

Add support for replicating closed indices #39499

Conversation

tlrx commented Feb 28, 2019 • edited Loading

elasticmachine commented Feb 28, 2019

tlrx commented Feb 28, 2019

ywelsch left a comment

Choose a reason for hiding this comment

tlrx commented Feb 28, 2019

tlrx commented Feb 28, 2019 •

edited

Loading