
Add max query-scheduler instances support #3005

Merged
10 commits merged on Sep 22, 2022

Conversation

pracucci
Collaborator

@pracucci pracucci commented Sep 21, 2022

What this PR does

This PR is a follow-up to #2957. In #2957 I introduced the ring for the query-scheduler. In this PR I'm adding support for configuring the max number of query-scheduler instances effectively used. This setting is 0 by default, which means all available query-schedulers are used. However, in the read-write deployment mode we'll set it to 2, in order to use only 2 query-schedulers regardless of how many backend replicas you're running.

How it works:

  • Added the -query-scheduler.max-used-instances config option (experimental). When > 0, queries are only enqueued on the configured max number of query-scheduler instances. For now it's only supported when the query-scheduler ring-based service discovery is used (there's a config validation check).
  • The query-frontend only connects to the in-use query-schedulers. The querier workers connect to both in-use and not-in-use query-schedulers. The querier max concurrency is used to determine the number of connections towards the in-use query-schedulers; the querier also opens 1 connection to each not-in-use query-scheduler, to make sure their queues are always drained (e.g. queries enqueued on a query-scheduler right before the set of in-use query-schedulers changes).
  • To determine the set of in-use query-schedulers I've not used the ring's replication factor (e.g. looking for a hardcoded token, like Loki does), but I sort the scheduler addresses and take the first N (see the sketch right after this list). The reason is that using the replication factor has some drawbacks due to how it's implemented in the ring (e.g. when there are multiple unhealthy instances, the ring lookup may fail).
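
As a rough illustration of the last bullet, here is a minimal Go sketch of the selection logic; the function and parameter names are made up for this example and don't match the actual code:

import "sort"

// markInUse sorts the discovered query-scheduler addresses (in place) and
// marks the first maxUsedInstances of them as in use. maxUsedInstances <= 0
// means that all available query-schedulers are used.
func markInUse(addresses []string, maxUsedInstances int) map[string]bool {
	sort.Strings(addresses)

	inUse := make(map[string]bool, len(addresses))
	for i, addr := range addresses {
		inUse[addr] = maxUsedInstances <= 0 || i < maxUsedInstances
	}
	return inUse
}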

I manually tested this PR in microservices, read-write and monolithic modes.

Which issue(s) this PR fixes or relates to

Part of #2749

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

m.stop()

delete(w.managers, address)
Collaborator Author

Note to reviewers: we were not deleting the entry from the managers map before. It's not strictly required, since we never restart a stopped service, but to avoid future bugs I prefer to clean up the map of managers when stopping.

@pracucci pracucci force-pushed the add-shard-size-to-query-scheduler branch from 2c0ce89 to 0027952 on September 21, 2022 16:45

// Re-balance the connections between the available query-frontends / query-schedulers.
w.mu.Lock()
w.resetConcurrency()
Collaborator Author

Note to reviewers: not calling resetConcurrency() was a bug, because it means we don't rebalance the connections when an instance leaves. I've added a dedicated CHANGELOG entry to mention it.

Member

Nice find. In typical usage when one scheduler stops, another one also starts and we call resetConcurrency then. But we should not rely on that.
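
Putting the two review hunks above together, a minimal, self-contained sketch of the fixed removal path could look like the following. The type and method names follow the snippets quoted in this thread but are stubbed out here, so this is an illustration rather than the actual implementation:

import "sync"

// Stub stand-ins so the sketch is self-contained; the real types live in the
// querier worker package and look different.
type manager struct{}

func (m *manager) stop() {}

type querierWorker struct {
	mu       sync.Mutex
	managers map[string]*manager
}

func (w *querierWorker) resetConcurrency() {}

// InstanceRemoved stops the manager of the instance that left, removes it
// from the map (so no stale entries are kept around), and re-balances the
// connections across the remaining query-frontends / query-schedulers. The
// missing resetConcurrency() call was the bug mentioned above.
func (w *querierWorker) InstanceRemoved(address string) {
	w.mu.Lock()
	defer w.mu.Unlock()

	m := w.managers[address]
	if m == nil {
		return
	}

	m.stop()
	delete(w.managers, address)

	// Re-balance the connections between the available query-frontends / query-schedulers.
	w.resetConcurrency()
}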

@pracucci pracucci marked this pull request as ready for review September 22, 2022 07:32
@pracucci pracucci requested review from osg-grafana and a team as code owners September 22, 2022 07:32
@@ -168,6 +197,67 @@ func TestResetConcurrency(t *testing.T) {
}
}

func TestQuerierWorker_getDesiredConcurrency(t *testing.T) {
Contributor

I would like to see a test case where maxConcurrent < numInUse.
In that case concurrency := w.cfg.MaxConcurrentRequests / numInUse will be 0 and there's no guarantee that some querier will connect to every scheduler.

In particular, I'd just try to ensure that every instance gets at least 1 connection.

Was this also the previous behavior?

Collaborator Author

I think it's already covered by "should create 1 connection for each instance if max concurrency is set to 0", but I've added an additional explicit one in 9107113.

In particular, I'd just try to ensure that every instance gets at least 1 connection.
Was this also the previous behavior?

Yes.

Collaborator Author

In particular, I'd just try to ensure that every instance gets at least 1 connection.
Was this also the previous behavior?
Yes.

See:

// If concurrency is 0 then MaxConcurrentRequests is less than the total number of
// frontends/schedulers. In order to prevent accidentally starving a frontend or scheduler we are just going to
// always connect once to every target. This is dangerous b/c we may start exceeding PromQL
// max concurrency.
if concurrency == 0 {
	concurrency = 1
}

Collaborator Author

The comment "This is dangerous b/c we may start exceeding PromQL max concurrency" has been removed because it's no longer true (we don't limit PromQL engine concurrency anymore).

Contributor

I think it's already covered by "should create 1 connection for each instance if max concurrency is set to 0"

Oh, right, sorry. I thought that was somehow a special case. Thank you for adding the test case anyway.


Given there's already one connection ensured to each instance, we don't really need to ensure it at two levels (in getDesiredConcurrency and in resetConcurrency). But I'm not sure it's worth the change.
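
For reference, a minimal sketch of the per-scheduler concurrency rule discussed in this thread (hypothetical function name and signature, not the actual Mimir code): the querier's max concurrency is split across the in-use schedulers with a floor of 1 connection each, and each not-in-use scheduler gets exactly 1 connection so its queue keeps being drained.

// desiredConcurrency returns how many connections a querier should open to a
// single scheduler, given the querier's total max concurrency, the number of
// in-use schedulers, and whether this scheduler is in use.
func desiredConcurrency(maxConcurrentRequests, numInUse int, inUse bool) int {
	if !inUse {
		// Not-in-use schedulers get a single connection, just to keep
		// draining their queues.
		return 1
	}

	concurrency := maxConcurrentRequests / numInUse
	if concurrency < 1 {
		// maxConcurrentRequests is lower than the number of in-use
		// schedulers: still connect at least once to each of them.
		concurrency = 1
	}
	return concurrency
}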

@pracucci pracucci force-pushed the add-shard-size-to-query-scheduler branch from 0027952 to 9107113 on September 22, 2022 08:30
Member

@pstibrany pstibrany left a comment

Looks good, nice job.

pkg/util/servicediscovery/ring.go
pkg/util/servicediscovery/ring.go (outdated)
pkg/util/servicediscovery/ring_test.go (outdated)
pkg/util/servicediscovery/dns.go (outdated)
pkg/util/servicediscovery/ring.go

Comment on lines 272 to 282
// Skip if there's no manager for it.
if m := w.managers[instance.Address]; m == nil {
	return
}
Member

How can this happen? That would be a bug. Also, instead of skipping, we could call InstanceAdded.

Collaborator Author

It shouldn't happen. Instead of calling InstanceAdded I would prefer to log it as an error, because it looks like a bug we should fix. Also, I noticed there's a race condition between querierWorker.stopping() and the check on whether the service is stopping in other functions, which could potentially lead to creating new connections after the unlock in stopping(). I addressed it in 7e3f8b7.

WDYT?

Member

@pstibrany pstibrany Sep 22, 2022

Logging seems fine to me.

I don't think doing the ServiceContext check inside the lock prevents a race, as the BasicService implementation will cancel the context before calling the stopping function, and this is in no way related to the locks used inside the worker.

Any extra connection created after the context has been cancelled will be closed by the m.stop() call in the stopping function of the worker.

Collaborator Author

Connections are created by InstanceAdded(), which is a callback of the service discovery. The service discovery is stopped after the calls to m.stop() done in stopping(), so I think you could end up with the following race:

  • goroutine1: calls InstanceAdded(), checks the service context (all good), and pauses right before the call to Lock()
  • goroutine2: cancels the service context, calls stopping(), enters the lock, calls m.stop(), then Unlock()
  • goroutine1: continues execution inside InstanceAdded() and creates a new connection after the managers have already been stopped

WDYT?

Member

OK, I see what you mean now. Sounds good.
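
A compact, self-contained sketch of the pattern agreed on here (hypothetical names, simplified types; not the actual implementation): the stopping check is performed while holding the same mutex that stopping() takes, so an InstanceAdded call that raced with shutdown cannot create a new connection manager after the existing ones have already been stopped.

import (
	"context"
	"sync"
)

// Simplified stand-ins for the worker types discussed above.
type processorManager struct{}

func (m *processorManager) stop() {}

type worker struct {
	mu         sync.Mutex
	serviceCtx context.Context // cancelled by the service before stopping() runs
	managers   map[string]*processorManager
}

// InstanceAdded re-checks the service context while holding the same mutex
// used by stopping(): if the context has already been cancelled, stopping()
// has stopped (or is stopping) all managers under this lock, so no new
// manager must be created.
func (w *worker) InstanceAdded(address string) {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.serviceCtx.Err() != nil {
		return
	}
	if _, ok := w.managers[address]; ok {
		// Already tracked: nothing to do.
		return
	}
	w.managers[address] = &processorManager{}
}

// stopping holds the same mutex while stopping all managers, so it cannot
// interleave with the critical section of InstanceAdded above.
func (w *worker) stopping() {
	w.mu.Lock()
	defer w.mu.Unlock()

	for addr, m := range w.managers {
		m.stop()
		delete(w.managers, addr)
	}
}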

Contributor

@colega colega left a comment

LGTM!

I like how instance.InUse looks in the code.

@pracucci
Collaborator Author

@colega @pstibrany I've added an integration test in 873f725. Could you take a look, please?

@pracucci pracucci force-pushed the add-shard-size-to-query-scheduler branch from 873f725 to 1daf7d1 on September 22, 2022 09:45
@pstibrany
Member

@colega @pstibrany I've added an integration test in 873f725. Could you take a look, please?

Looks fine to me.

@colega
Contributor

colega commented Sep 22, 2022

LGTM, good test 👌
