config: adds frontend.max-query-capacity to tune per-tenant query capacity #11284

Merged
ashwanthgoli merged 14 commits into main from ashwath/add-query-capacity-config on Dec 1, 2023

Conversation


@ashwanthgoli ashwanthgoli commented Nov 22, 2023

What this PR does / why we need it:
Adds a new config frontend.max-query-capacity that allows users to configure what portion of the available querier replicas can be used by a tenant. max_query_capacity is the corresponding YAML option that can be configured in limits or runtime overrides.

For example, setting this to 0.5 would allow a tenant to use half of the available queriers.

This complements the existing frontend.max-queriers-per-tenant. When both are configured, the smaller value of the resulting querier replica count is considered:

min(frontend.max-queriers-per-tenant, ceil(querier_replicas * frontend.max-query-capacity))

All queriers will handle requests for a tenant if neither limit is applied.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
noticed that we don't pass down the shuffle sharding limits for frontend (only using it with schedulers)

disabledShuffleShardingLimits{},

but the docs mention that frontend.max-queriers-per-tenant applies to frontend as well.

This option only works with queriers connecting to the query-frontend / query-scheduler, not when using downstream URL.

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR


github-actions bot commented Nov 22, 2023

Trivy scan found the following vulnerabilities:

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Nov 22, 2023
@ashwanthgoli ashwanthgoli marked this pull request as ready for review November 22, 2023 07:21
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner November 22, 2023 07:21

@JStickler JStickler left a comment


[docs team] Docs LGTM


@dannykopping dannykopping left a comment


Looks pretty good, but I think there's some refactoring that needs to be done for safety and clarity.

pkg/queue/tenant_queues.go Outdated Show resolved Hide resolved
pkg/queue/tenant_queues_test.go Outdated Show resolved Hide resolved
pkg/queue/queue.go Outdated Show resolved Hide resolved

pendingRequests: map[requestKey]*schedulerRequest{},
connectedFrontends: map[string]*connectedFrontend{},
queueMetrics: queueMetrics,
ringManager: ringManager,
- requestQueue: queue.NewRequestQueue(cfg.MaxOutstandingPerTenant, cfg.QuerierForgetDelay, queueMetrics),
+ requestQueue: queue.NewRequestQueue(cfg.MaxOutstandingPerTenant, cfg.QuerierForgetDelay, limits.QueueLimits(schedulerLimits), queueMetrics),


this would look neater if we move it to the queue pkg queue.Limits(frontendLimits)
but I left it in scheduler pkg since it is an adapter between scheduler and queue limits.


@dannykopping dannykopping left a comment


Thanks for the refactor! It reads much more clearly now.
There were a couple troubling things that caught my eye though with the changes, but they shouldn't take long to resolve

pkg/queue/tenant_queues.go Outdated Show resolved Hide resolved
pkg/queue/tenant_queues.go Outdated Show resolved Hide resolved
pkg/queue/tenant_queues.go Outdated Show resolved Hide resolved
pkg/scheduler/limits/definitions.go Outdated Show resolved Hide resolved
pkg/scheduler/limits/definitions.go Outdated Show resolved Hide resolved
pkg/scheduler/limits/definitions_test.go Outdated Show resolved Hide resolved

@dannykopping dannykopping left a comment


Looking much cleaner now, thanks @ashwanthgoli!
Just minor nits now

docs/sources/configure/_index.md Outdated Show resolved Hide resolved
docs/sources/configure/_index.md Outdated Show resolved Hide resolved
pkg/scheduler/limits/definitions.go Show resolved Hide resolved
pkg/validation/limits.go Show resolved Hide resolved

@dannykopping dannykopping left a comment


LGTM!

@ashwanthgoli ashwanthgoli merged commit d9f3bf3 into main Dec 1, 2023
8 checks passed
@ashwanthgoli ashwanthgoli deleted the ashwath/add-query-capacity-config branch December 1, 2023 14:22
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
…apacity (grafana#11284)

**What this PR does / why we need it**:
Adds a new config `frontend.max-query-capacity` that allows users to
configure what portion of the available querier replicas can be used
by a tenant. `max_query_capacity` is the corresponding YAML option that
can be configured in limits or runtime overrides.

For example, setting this to 0.5 would allow a tenant to use half of the
available queriers.

This complements the existing `frontend.max-queriers-per-tenant`. When
both are configured, the smaller value of the resulting querier replica
count is considered:
```
min(frontend.max-queriers-per-tenant, ceil(querier_replicas * frontend.max-query-capacity))
```
*All* queriers will handle requests for a tenant if neither limit is
applied.

**Which issue(s) this PR fixes**:
Fixes #<issue number>

**Special notes for your reviewer**:
noticed that we don't pass down the shuffle sharding limits for frontend
(only using it with schedulers)

https://github.com/grafana/loki/blob/26f097162a856db48ecbd16bef2f0b750029855b/pkg/loki/modules.go#L895
but the
[docs](https://github.com/grafana/loki/blob/26f097162a856db48ecbd16bef2f0b750029855b/pkg/validation/limits.go#L276)
mention that `frontend.max-queriers-per-tenant` applies to frontend as
well.
```
This option only works with queriers connecting to the query-frontend / query-scheduler, not when using downstream URL.
```

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [x] Documentation added
- [x] Tests updated
- [x] `CHANGELOG.md` updated
- [ ] If the change is worth mentioning in the release notes, add
`add-to-release-notes` label
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/setup/upgrade/_index.md`
- [ ] For Helm chart changes bump the Helm chart version in
`production/helm/loki/Chart.yaml` and update
`production/helm/loki/CHANGELOG.md` and
`production/helm/loki/README.md`. [Example
PR](grafana@d10549e)
- [ ] If the change is deprecating or removing a configuration option,
update the `deprecated-config.yaml` and `deleted-config.yaml` files
respectively in the `tools/deprecated-config-checker` directory.
[Example
PR](grafana@0d4416a)

---------

Co-authored-by: J Stickler <julie.stickler@grafana.com>
Co-authored-by: Danny Kopping <danny.kopping@grafana.com>
Labels
size/XL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories

3 participants