
Ensure Node Shutdown Waits for Running Restores to Complete #76070

Merged · 6 commits · Aug 4, 2021

Conversation

original-brownbear (Member):

We must wait for ongoing restores to complete before shutting down the repositories
service. Otherwise we may leak file descriptors, because tasks for releasing the store
are submitted to the `SNAPSHOT` or searchable-snapshot thread pools, which quietly
accept but never reject/fail tasks after shutdown.

same as #46178 where we had the same bug in recoveries

closes #75686

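The shutdown-ordering fix described above can be pictured as a small tracker: restores register a shard when they start, deregister it when they finish, and node shutdown blocks until the set drains, refusing new restores once stopping has begun. This is an illustrative sketch only, not the actual RestoreService code; the names (`RestoreTracker`, `startRestore`, `awaitIdle`) are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the pattern this PR applies; names are illustrative.
class RestoreTracker {
    private final Set<String> ongoingRestores = new HashSet<>();
    private boolean stopping = false;

    // Register a restore; refuse new restores once shutdown has begun
    // (analogous to the lifecycle.started() check in the diff below).
    synchronized boolean startRestore(String shardId) {
        if (stopping) {
            return false;
        }
        boolean added = ongoingRestores.add(shardId);
        assert added : "restore already tracked for " + shardId;
        return true;
    }

    // Deregister a restore and wake up a waiting shutdown thread.
    synchronized void endRestore(String shardId) {
        boolean removed = ongoingRestores.remove(shardId);
        assert removed : "restore not tracked for " + shardId;
        notifyAll();
    }

    // Called on node shutdown: block until all running restores complete.
    synchronized void awaitIdle() {
        stopping = true;
        while (ongoingRestores.isEmpty() == false) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

The key property is that `awaitIdle()` both waits for in-flight restores and flips the flag that makes `startRestore` reject new work, closing the race the issue describes.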
@original-brownbear added labels: >test (Issues or PRs that are addressing/adding tests), :Distributed/Snapshot/Restore (Anything directly related to the `_snapshot/*` APIs), v8.0.0, v7.14.1, v7.15.0 — Aug 4, 2021
@elasticmachine added label: Team:Distributed (Meta label for distributed team) — Aug 4, 2021
@elasticmachine (Collaborator):

Pinging @elastic/es-distributed (Team:Distributed)

@@ -2981,6 +3030,9 @@ void ensureNotClosing(final Store store) throws AlreadyClosedException {
if (store.isClosing()) {
throw new AlreadyClosedException("store is closing");
}
if (lifecycle.started() == false) {
original-brownbear (Member, Author):

Added this here as well since we close the repositories service before the indices service and would otherwise have to wait for the restores to actually complete in some cases I think.
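The hunk above only shows the unchanged context lines; the added lines are elided in this page. A plausible reading of the new check, with stub types standing in for Elasticsearch's `Store` and `Lifecycle` (an assumption, not the verbatim PR code), is that a restore now also fails fast once the service's lifecycle has left the started state:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Self-contained sketch of the guard discussed above. The stub types are
// illustrative assumptions; the real code uses Elasticsearch's own classes.
class EnsureNotClosingSketch {
    static class AlreadyClosedException extends RuntimeException {
        AlreadyClosedException(String msg) { super(msg); }
    }

    static class StoreStub {
        final AtomicBoolean closing = new AtomicBoolean(false);
        boolean isClosing() { return closing.get(); }
    }

    static class LifecycleStub {
        final AtomicBoolean started = new AtomicBoolean(true);
        boolean started() { return started.get(); }
    }

    final LifecycleStub lifecycle = new LifecycleStub();

    // Mirrors the hunk above: refuse to keep restoring into a store once it
    // is closing, and also once the service itself is no longer started.
    void ensureNotClosing(StoreStub store) {
        if (store.isClosing()) {
            throw new AlreadyClosedException("store is closing");
        }
        if (lifecycle.started() == false) {
            throw new AlreadyClosedException("service is stopping");
        }
    }
}
```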

Reviewer (Member):

I think so.

@tlrx (Member) left a comment:
LGTM

return;
}
final boolean added = ongoingRestores.add(shardId);
assert added;
Reviewer (Member):
Maybe add some context?

original-brownbear (Member, Author):

++
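"Add some context" here most likely means attaching a detail message to the bare `assert added;`, so a failing assertion names the shard that was already being tracked. A minimal illustration (the class and field names are assumed, not the actual PR code):

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: Set.add returns false on a duplicate, so the
// assertion message can say exactly which shard was double-registered.
class AssertContextExample {
    private final Set<String> ongoingRestores = new HashSet<>();

    void track(String shardId) {
        final boolean added = ongoingRestores.add(shardId);
        assert added : "restore already recorded for shard " + shardId;
    }
}
```

Note that Java assertions only run with `-ea`, so the message costs nothing in production but pinpoints the offending shard in tests.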


@original-brownbear (Member, Author):

Thanks Tanguy!

@original-brownbear original-brownbear merged commit f62618c into elastic:master Aug 4, 2021
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 4, 2021
…76070)

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 4, 2021
…76070)

original-brownbear added a commit that referenced this pull request Aug 4, 2021
…76095)

original-brownbear added a commit that referenced this pull request Aug 4, 2021
…76092)

@original-brownbear original-brownbear restored the 75686 branch April 18, 2023 20:40
Labels: :Distributed/Snapshot/Restore, Team:Distributed, >test, v7.14.1, v7.15.0, v8.0.0-alpha2
Successfully merging this pull request may close: [CI] FrozenSearchableSnapshotsIntegTests classMethod failing
5 participants