Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loki: query scheduler should send shutdown to frontends when ReplicationSet changes #4614

Merged
merged 2 commits into from
Nov 2, 2021

Conversation

slim-bean
Copy link
Collaborator

@slim-bean slim-bean commented Nov 2, 2021

What this PR does / why we need it:

This PR attempts to improve a race when adding/removing instances into the scheduler ring where messages could end up stuck in a schedulers queue with no queriers coming back to process them.

Currently when an elected scheduler is no longer "elected" everyone finds out about this through the ring, but the schedulers themselves do nothing and technically stay active, the frontend and queriers poll the ring and when they find out about the scheduler change, disconnect from the old and connect to the new.

The distributed nature of this creates the possibility that all the queriers find out about a scheduler change and disconnect from the scheduler while the frontend still stays connected. The frontend could then be stuck waiting for inflight queries to process (which will never happen because all the queriers have moved on) ultimately leading to a timeout or failure of the query.

To mitigate this we make sure that as soon as the scheduler knows it's not in the ReplicationSet of who should act as a scheduler it should send a shutdown message to connected frontends so they will cancel requests and retry them in other schedulers.

Special notes for your reviewer:

Checklist

  • Documentation added
  • Tests updated
  • Add an entry in the CHANGELOG.md about the changes.

…when the ReplicationSet changes so that inflight queries are canceled and retried in the frontends.
@slim-bean slim-bean requested a review from a team as a code owner November 2, 2021 16:44
Copy link
Member

@owen-d owen-d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a suggestion, but I think we can alternatively just reuse the NewRingWatcher in the scheduler rather than implementing similar functionality.

pkg/util/ring.go Outdated Show resolved Hide resolved
@owen-d owen-d merged commit fe8bc91 into main Nov 2, 2021
@owen-d owen-d deleted the schedulers-notify-frontends branch November 2, 2021 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants