adding support for azure managed prometheus #4256
Conversation
Signed-off-by: Raghav Gupta <guptaraghav16@gmail.com>
@tomkerkhove @JorTurFer Created this PR. I am working on the e2e tests, changelog, and documentation in parallel. I understand that this PR will not be merged until those are done, but I am raising it now so that you can help review the changes and we are not blocked in case you suggest any code changes. Please take a look. Thanks!
Looking good in general. I'll wait until everything is ready, but in the meantime I have left some comments inline.
BTW, thanks for this improvement.
For e2e tests, do you need some help?
You have to open a PR to the infra repo using terraform in order to create the needed resources
Terraform does not support it yet, so we did it manually (as I shared offline with you :))
@JorTurFer Thanks for offering the help, I might actually need some! I am a little confused about the flow of the e2e tests. For example, are all tests run in an Azure environment? Is it a requirement to run tests on existing AKS clusters? What if I create a new one for Azure Managed Prometheus; how would nightly and PR runs be affected? Azure Managed Prometheus and native Prometheus might not work on one cluster at a time. Yes, I can think of cleanup, but then, is installing and uninstalling an "addon" on an AKS cluster currently supported by the test framework? And maybe a couple more questions.. I am afraid figuring all that out by myself will take time and I may miss the next release. Is it possible to connect on a quick call? What would be your preferred time? I can ping you on Slack..
We can deploy it using terraform + ARM, but I wouldn't add infra manually if it isn't the only option. I can tackle that PR for adding the managed Prometheus if @raggupta-ms runs into problems
As mentioned, it's not supported yet :)
I'm almost there :)
Thank you @JorTurFer for the automation!
You're welcome. It has been an interesting exercise 😝
Thank you very much @JorTurFer for helping out on this. It was not straightforward, but you made it work :)
@tomkerkhove @JorTurFer The code changes are done and updated in the PR. Please review. By the way, do you know why the e2e tests are not running in this PR? It seems like the run is queued but not getting picked up by any worker. Do I need to do anything special to trigger tests on the PR?
I'm not sure whether we should place the e2e test inside the Prometheus folder or inside the azure folder. @kedacore/keda-core-contributors ?
/run-e2e prometheus*
/run-e2e azure_managed_prometheus*
I added it there since Azure Managed Prometheus doesn't have its own scaler, but is an extension of the existing Prometheus scaler. But I am open to putting it beside the other Azure scalers, since that also makes sense, depending on what people think.
/run-e2e prometheus*
/run-e2e prometheus*
/run-e2e prometheus*
/run-e2e prometheus*
/run-e2e prometheus*
/run-e2e prometheus*
/run-e2e prometheus*
Given that it works but this is a test optimization, we might want to do that in a separate PR then. Does that make sense?
I think that a test which always requires being executed twice is a flaky test. It isn't an optimization IMO
Agreed, but my point was more that I don't think it should block this PR from being merged, and we can do it async?
We should try to avoid merging flaky tests if possible 🙏 This one seems like an easy fix, so it could be easily included in this PR. And the current behaviour also extends the execution time of the e2e test suite, which is not great 😄
@tomkerkhove @JorTurFer @zroubalik Regarding the flakiness of the test, I think it makes sense to increase the timeout. Since this is managed Prometheus running in the cloud, there are delays. Roughly like this:
so the test can be flaky as it waits for scale out/in for 5 min. I will increase that timeout to 7 min to be safe. Let me know if there are any concerns. Small fix, will update the PR shortly. Edit:
@raggupta-ms My proposal is to move this code:

```go
func TestSetupAzureManagedPrometheusComponents(t *testing.T) {
	// this installs the config map in the kube-system namespace, as needed by the Azure Managed Prometheus collector agent
	KubectlApplyWithTemplate(t, helper.EmptyTemplateData{}, "azureManagedPrometheusConfigMapTemplate", helper.AzureManagedPrometheusConfigMapTemplate)
}
```

between
Good callout, made that change. Also increased the timeout by 2 min. It should succeed now. Let's see.. please trigger e2e..
/run-e2e prometheus*
/run-e2e prometheus*
It seems that the change during the setup has been enough
LGTM!
@tomkerkhove @JorTurFer @zroubalik Thank you for all the help and support throughout! I learned a lot during this contribution.
You're welcome! Thanks to you for the contribution! :)
Adding Azure Workload Identity and Azure Pod Identity auth support to the existing Prometheus scaler. This enables the Azure Managed Prometheus use case with KEDA.
Checklist
Relates to #4153