pkg/ingester: prevent shutdowns from processing during joining handoff #1114

rfratto · 2019-10-03T16:03:09Z

This commit fixes a race condition where an ingester that is shut down during the joining handoff (i.e., while receiving chunks from a leaving ingester, before claiming its tokens) hangs and can never shut down cleanly. The fix adds a shutdown mutex that is acquired at the start of the handoff process and released after the handoff process completes. The race condition also prevented the leaving ingester from completing its shutdown, since it waits for the joining ingester to claim the tokens.

This means that by the time a brand new ingester is shut down, it will always have finished receiving chunks from the leaving ingester and claiming that ingester's tokens.

Checklist

Tests updated


pstibrany · 2019-10-04T07:27:17Z

Wouldn't it be better in this case to cancel the transfer instead when shutdown is triggered? Then the ingester has less to deal with.

(The TransferChunks method could be passed a context so that it knows when it has been canceled.)

rfratto · 2019-10-04T10:41:08Z

I'm not sure about that. If we cancel the transfer, then we're forced to do a flush. The transfers are fairly quick and this shouldn't hold back the shutdown process too much.

I'm also not sure how easy it would be for TransferChunks to be cancelled since it's not invoked from the lifecycler. It's invoked as a remote method from the original leaving ingester.

slim-bean

LGTM!

rfratto requested a review from cyriltovena October 3, 2019 16:03

rfratto added component/loki kind/bug labels Oct 3, 2019

rfratto requested a review from slim-bean October 7, 2019 16:30

slim-bean approved these changes Oct 9, 2019


slim-bean merged commit 2cb3c82 into grafana:master Oct 9, 2019

rfratto deleted the fix-shutdown-during-transfer branch November 19, 2019 14:32

chaudum added the type/bug Something is not working as expected label Jun 14, 2023
