
Add max query-scheduler instances support #3005

Merged
10 commits merged on Sep 22, 2022

Conversation

pracucci
Collaborator

@pracucci pracucci commented Sep 21, 2022

What this PR does

This PR is a follow-up to #2957. In #2957 I introduced the ring for the query-scheduler. In this PR I'm adding support for configuring the max number of query-scheduler instances effectively used. This setting is 0 by default, which means all available query-schedulers are used. However, in the read-write deployment mode we'll set it to 2, in order to use only 2 query-schedulers regardless of how many backend replicas you're running.

How it works:

  • Added the -query-scheduler.max-used-instances config option (experimental). When > 0, queries are only enqueued on the configured max number of query-scheduler instances. For now it's only supported when the query-scheduler ring-based service discovery is used (there's a config validation check).
  • The query-frontend only connects to the in-use query-schedulers. The querier workers connect to both in-use and not-in-use query-schedulers. The querier max concurrency is used to determine the number of connections towards the in-use query-schedulers; the querier also opens 1 connection to each not-in-use query-scheduler, to make sure their queues are always drained (e.g. queries enqueued on a query-scheduler right before the set of in-use query-schedulers changes).
  • To determine the set of in-use query-schedulers I've not used the ring's replication factor (e.g. looking for a hardcoded token, like Loki does), but I sort the scheduler addresses and take the first N (see the sketch right after this list). The reason is that using the replication factor has some drawbacks due to how it's implemented in the ring (e.g. when there are multiple unhealthy instances, the ring lookup may fail).
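
As a rough illustration of the last bullet, here is a minimal Go sketch of the selection logic; the function and parameter names are made up for this example and don't match the actual code:

import "sort"

// markInUse sorts the discovered query-scheduler addresses (in place) and
// marks the first maxUsedInstances of them as in use. maxUsedInstances <= 0
// means that all available query-schedulers are used.
func markInUse(addresses []string, maxUsedInstances int) map[string]bool {
	sort.Strings(addresses)

	inUse := make(map[string]bool, len(addresses))
	for i, addr := range addresses {
		inUse[addr] = maxUsedInstances <= 0 || i < maxUsedInstances
	}
	return inUse
}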

I manually tested this PR in microservices, read-write and monolithic modes.

Which issue(s) this PR fixes or relates to

Part of #2749

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

m.stop()

delete(w.managers, address)
Collaborator Author

Note to reviewers: we were not deleting the entry from the managers map before. It's not strictly required, since we never restart a stopped service, but to avoid future bugs I prefer to clean up the map of managers when stopping.

@pracucci pracucci force-pushed the add-shard-size-to-query-scheduler branch from 2c0ce89 to 0027952 on September 21, 2022 16:45

// Re-balance the connections between the available query-frontends / query-schedulers.
w.mu.Lock()
w.resetConcurrency()
Collaborator Author

Note to reviewers: not calling resetConcurrency() was a bug, because it means we don't rebalance the connections when an instance leaves. I've added a dedicated CHANGELOG entry to mention it.

Member

Nice find. In typical usage when one scheduler stops, another one also starts and we call resetConcurrency then. But we should not rely on that.
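
Putting the two review hunks above together, a minimal, self-contained sketch of the fixed removal path could look like the following. The type and method names follow the snippets quoted in this thread but are stubbed out here, so this is an illustration rather than the actual implementation:

import "sync"

// Stub stand-ins so the sketch is self-contained; the real types live in the
// querier worker package and look different.
type manager struct{}

func (m *manager) stop() {}

type querierWorker struct {
	mu       sync.Mutex
	managers map[string]*manager
}

func (w *querierWorker) resetConcurrency() {}

// InstanceRemoved stops the manager of the instance that left, removes it
// from the map (so no stale entries are kept around), and re-balances the
// connections across the remaining query-frontends / query-schedulers. The
// missing resetConcurrency() call was the bug mentioned above.
func (w *querierWorker) InstanceRemoved(address string) {
	w.mu.Lock()
	defer w.mu.Unlock()

	m := w.managers[address]
	if m == nil {
		return
	}

	m.stop()
	delete(w.managers, address)

	// Re-balance the connections between the available query-frontends / query-schedulers.
	w.resetConcurrency()
}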

@pracucci pracucci marked this pull request as ready for review September 22, 2022 07:32
@pracucci pracucci requested review from osg-grafana and a team as code owners September 22, 2022 07:32
@@ -168,6 +197,67 @@ func TestResetConcurrency(t *testing.T) {
}
}

func TestQuerierWorker_getDesiredConcurrency(t *testing.T) {
Contributor

I would like to see a test case where maxConcurrent < numInUse.
In that case concurrency := w.cfg.MaxConcurrentRequests / numInUse will be 0 and there's no guarantee that some querier will connect to every scheduler.

In particular, I'd just try to ensure that every instance gets at least 1 connection.

Was this also the previous behavior?

Collaborator Author

I think it's already covered by "should create 1 connection for each instance if max concurrency is set to 0", but I've added an additional explicit one in 9107113.

In particular, I'd just try to ensure that every instance gets at least 1 connection.
Was this also the previous behavior?

Yes.

Collaborator Author

In particular, I'd just try to ensure that every instance gets at least 1 connection.
Was this also the previous behavior?
Yes.

See:

// If concurrency is 0 then MaxConcurrentRequests is less than the total number of
// frontends/schedulers. In order to prevent accidentally starving a frontend or scheduler we are just going to
// always connect once to every target. This is dangerous b/c we may start exceeding PromQL
// max concurrency.
if concurrency == 0 {
	concurrency = 1
}

Collaborator Author

The comment "This is dangerous b/c we may start exceeding PromQL max concurrency" has been removed because it's no longer true (we don't limit PromQL engine concurrency anymore).

Contributor

I think it's already covered by "should create 1 connection for each instance if max concurrency is set to 0"

Oh, right, sorry. I thought that was somehow a special case. Thank you for adding the test case anyway.


Given there's already one connection ensured to each instance, we don't really need to ensure it at two levels (in getDesiredConcurrency and in resetConcurrency). But I'm not sure it's worth the change.
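
For reference, a minimal sketch of the per-scheduler concurrency rule discussed in this thread (hypothetical function name and signature, not the actual Mimir code): the querier's max concurrency is split across the in-use schedulers with a floor of 1 connection each, and each not-in-use scheduler gets exactly 1 connection so its queue keeps being drained.

// desiredConcurrency returns how many connections a querier should open to a
// single scheduler, given the querier's total max concurrency, the number of
// in-use schedulers, and whether this scheduler is in use.
func desiredConcurrency(maxConcurrentRequests, numInUse int, inUse bool) int {
	if !inUse {
		// Not-in-use schedulers get a single connection, just to keep
		// draining their queues.
		return 1
	}

	concurrency := maxConcurrentRequests / numInUse
	if concurrency < 1 {
		// maxConcurrentRequests is lower than the number of in-use
		// schedulers: still connect at least once to each of them.
		concurrency = 1
	}
	return concurrency
}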

@pracucci pracucci force-pushed the add-shard-size-to-query-scheduler branch from 0027952 to 9107113 on September 22, 2022 08:30
Member

@pstibrany pstibrany left a comment

Looks good, nice job.

pkg/util/servicediscovery/ring.go
pkg/util/servicediscovery/ring.go (outdated)
pkg/util/servicediscovery/ring_test.go (outdated)
pkg/util/servicediscovery/dns.go (outdated)
pkg/util/servicediscovery/ring.go

Comment on lines 272 to 282
// Skip if there's no manager for it.
if m := w.managers[instance.Address]; m == nil {
	return
}
Member

How can this happen? That would be a bug. Also, instead of skipping, we could call InstanceAdded.

Collaborator Author

It shouldn't happen. Instead of calling InstanceAdded I would prefer to log it as an error, because it looks like a bug we should fix. Also, I noticed there's a race condition between querierWorker.stopping() and the check on whether the service is stopping in other functions, which could potentially lead to creating new connections after the unlock in stopping(). I addressed it in 7e3f8b7.

WDYT?

Member

@pstibrany pstibrany Sep 22, 2022

Logging seems fine to me.

I don't think doing the ServiceContext check inside the lock prevents a race, as the BasicService implementation will cancel the context before calling the stopping function, and this is in no way related to the locks used inside the worker.

Any extra connection created after the context has been cancelled will be closed by the m.stop() call in the stopping function of the worker.

Collaborator Author

Connections are created by InstanceAdded(), which is a callback of the service discovery. The service discovery is stopped after the calls to m.stop() done in stopping(), so I think you could end up with the following race:

  • goroutine1: calls InstanceAdded(), checks the service context (all good), and pauses right before the call to Lock()
  • goroutine2: cancels the service context, calls stopping(), enters the lock, calls m.stop(), then Unlock()
  • goroutine1: continues execution inside InstanceAdded() and creates a new connection after the managers have already been stopped

WDYT?

Member

OK, I see what you mean now. Sounds good.
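
A compact, self-contained sketch of the pattern agreed on here (hypothetical names, simplified types; not the actual implementation): the stopping check is performed while holding the same mutex that stopping() takes, so an InstanceAdded call that raced with shutdown cannot create a new connection manager after the existing ones have already been stopped.

import (
	"context"
	"sync"
)

// Simplified stand-ins for the worker types discussed above.
type processorManager struct{}

func (m *processorManager) stop() {}

type worker struct {
	mu         sync.Mutex
	serviceCtx context.Context // cancelled by the service before stopping() runs
	managers   map[string]*processorManager
}

// InstanceAdded re-checks the service context while holding the same mutex
// used by stopping(): if the context has already been cancelled, stopping()
// has stopped (or is stopping) all managers under this lock, so no new
// manager must be created.
func (w *worker) InstanceAdded(address string) {
	w.mu.Lock()
	defer w.mu.Unlock()

	if w.serviceCtx.Err() != nil {
		return
	}
	if _, ok := w.managers[address]; ok {
		// Already tracked: nothing to do.
		return
	}
	w.managers[address] = &processorManager{}
}

// stopping holds the same mutex while stopping all managers, so it cannot
// interleave with the critical section of InstanceAdded above.
func (w *worker) stopping() {
	w.mu.Lock()
	defer w.mu.Unlock()

	for addr, m := range w.managers {
		m.stop()
		delete(w.managers, addr)
	}
}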

Contributor

@colega colega left a comment

LGTM!

I like how instance.InUse looks in the code.

@pracucci
Collaborator Author

@colega @pstibrany I've added an integration test in 873f725. Could you take a look, please?

@pracucci pracucci force-pushed the add-shard-size-to-query-scheduler branch from 873f725 to 1daf7d1 on September 22, 2022 09:45
@pstibrany
Member

@colega @pstibrany I've added an integration test in 873f725. Could you take a look, please?

Looks fine to me.

@colega
Contributor

colega commented Sep 22, 2022

LGTM, good test 👌
