Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(helm): Adding KEDA autoscaling support #7282

Merged
merged 62 commits into from
Feb 13, 2024

Conversation

beatkind
Copy link
Contributor

@beatkind beatkind commented Feb 3, 2024

What this PR does

This PR takes #4687 as base layer and tries to implement all added code comments. Please be aware that this PR does not cover the changes in the monitoring dashboards that may be needed.

Which issue(s) this PR fixes or relates to

Fixes #3430

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@beatkind beatkind requested a review from a team as a code owner February 3, 2024 15:40
@CLAassistant
Copy link

CLAassistant commented Feb 3, 2024

CLA assistant check
All committers have signed the CLA.

@beatkind
Copy link
Contributor Author

beatkind commented Feb 3, 2024

@dimitarvdimitrov I have opened this PR and took #4687 as a base. I have ported the changes from your PR #6971 into here, as you have mentioned in this comment.

I know that there is still documentation missing for this PR. Could you, nevertheless, have a look over the changes?

@dimitarvdimitrov
Copy link
Contributor

many thanks for picking this up. I will set some time aside this week to review it

@dimitarvdimitrov
Copy link
Contributor

Have you had the chance to test this out on a mimir cell? It would be nice to do that before we merge the PR

@beatkind
Copy link
Contributor Author

beatkind commented Feb 7, 2024

I did tried it in a minikube setup, but not yet in a real world scenario.

Do you feel confident in merging it as experimental feature? This would give me the option to test it in our production cluster.

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, great work @beatkind! Thank you for picking this up.

Review

The PR is in a really good shape. I left a few minor comments and I think they'll be quick to address. Can you take a look? I'm happy to merge this as an experimental feature after addressing them.

I apologise for the delayed review, but I couldn't find a large-enough block of time to review this (it's a hefty PR). I'll try to be more responsive for the next iterations.

Docs

Perhaps the major reason for writing docs is to cover the migration procedure for enabling autoscaling. At Grafana Labs we had to go through a migration procedure to make sure that there aren't any disruptions. The main problem was that removing the replicas field causes the deployment to scale to 1 replica if the HPA hasn't had the time to take control, which is very likely if you're removing replicas and creating the HPA in one go.

I think a reasonable migration procedure would be to add a config flag to kedaAutoscaling which preserves replicas (e.g. *.kedaAutoscaling.keepReplicas). If you'd like to add that in this PR you're welcome, but this PR is already quite large and is already a beneficial addition. I opened this issue to track the migration docs and configuration #7367

Do you agree with this? Maybe I'm missing something about the migration.

Future work

There some bits which are missing from this PR which I think we should add before calling this work complete. I've added them in another issue #7368. As the one opening this PR, would you like to take ownership of these follow-up actions? It's ok to say no, just decided to offer you the chance.

also cc @krajorama in case you want to review this too

@beatkind
Copy link
Contributor Author

@dimitarvdimitrov many thanks for the review! I am happy to add missing parts that you have mentioned.

I also totally agree with the migration path. You are right, this is already a big PR. I will add the things you have mentioned, let then merge it as experimental. I will also add a comment regarding to be careful regarding the mentioned replica situation.

I am happy to take ownership of the issues you have created and I plan to open another PR once that is merged introducing the migration to autoscaler and adding docs :)

Thanks you very much!

…perimental state of the feature, using the same autoscaling rules & behaviours as jsonnet, only supporting newest version of KEDA
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one comment on the querier SO and a suggestion to satisfy the linter

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for your work on this @beatkind 😄

@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) February 13, 2024 12:37
@dimitarvdimitrov
Copy link
Contributor

because of some drift with main, there are some changes which aren't reflected in the rendered manifests for keda-autoscaling. Can you rebase your branch on top of grafana:main or merge with it, rerun make build-helm-tests locally and push?

@beatkind
Copy link
Contributor Author

@dimitarvdimitrov already on my way

narqo and others added 4 commits February 13, 2024 13:47
…afana#7269)

* helm: align grpc server connection lifetime settings with jsonnet

Co-authored-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>

* helm: rebuild tests

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>

---------

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
…rafana#7290)

* querymiddleware: race condition in shardActiveSeriesMiddleware

s2.Writer is not goroutine-safe to reuse between concurrent
requests.

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>

* querymiddleware: remove flaky ResponseBodyStreamed test from shardActiveSeriesMiddleware

---------

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Update pkg/util/version/info.go
Update pkg/mimirtool/client/client.go

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
Co-authored-by: Andy Asp <90626759+andyasp@users.noreply.github.com>
…onfig (grafana#7298)

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
renovate bot and others added 15 commits February 13, 2024 13:47
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
* Jsonnet / Helm: improve distributors graceful shutdown

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fix Helm distributor termination grace period

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: grafanabot <grafanabot@grafana.com>
…fault to true, and deprecate (grafana#7366)

* Distributor: Change -distributor.enable-otlp-metadata-storage flag's default to true and deprecate

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
…-distributor.limit-inflight-requests-using-grpc-method-limiter as stable and enable it by default (grafana#7360)

* Mark -ingester.limit-inflight-requests-using-grpc-method-limiter and -distributor.limit-inflight-requests-using-grpc-method-limiter as stable and enable it by default

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Improved tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Fixed unit tests

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Mark config as deprecated

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
…rafana#7342)

* Do not consider out-of-order blocks when filtering compactable jobs

Co-authored-by: Marco Pracucci <marco@pracucci.com>
* mimir: inject span profiler into tracer

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>

* Update changelog

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>

---------

Signed-off-by: Vladimir Varankin <vladimir.varankin@grafana.com>
* Add experimental partitions ring lifecycler support

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>

* Use http.Error()

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Skip deprecated linter check for code that will be soon replaced

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Updated dskit

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Add partition id to logs

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added comment

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Added TestIngester_ShouldNotCreatePartitionIfThereIsShutdownMarker

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
auto-merge was automatically disabled February 13, 2024 12:51

Head branch was pushed to by a user without write access

@dimitarvdimitrov
Copy link
Contributor

I believe this still needs one more make build-helm-tests

@dimitarvdimitrov dimitarvdimitrov enabled auto-merge (squash) February 13, 2024 13:15
@dimitarvdimitrov dimitarvdimitrov merged commit 55950ca into grafana:main Feb 13, 2024
30 checks passed
@beatkind beatkind deleted the add-helm-keda branch February 13, 2024 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for HPA in mimir-distributed