Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Service Discovery resource group filter #10365

Merged
merged 4 commits into from Mar 28, 2022
Merged

Azure Service Discovery resource group filter #10365

merged 4 commits into from Mar 28, 2022

Conversation

David-N-Perkins
Copy link
Contributor

Added an optional resource_group configuration for Azure service discovery, which limits the discovery when specified. See issue 10253.

Also updated a deprecated azure auth call.

David N Perkins added 2 commits February 25, 2022 13:45
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
@David-N-Perkins
Copy link
Contributor Author

David-N-Perkins commented Feb 25, 2022

@roidelapluie I implemented this using a new config value. Let me know if there's another preferred method to get the resource group.

I also don't have access to an Azure account, so not sure how to test these changes. There's not many existing unit tests.

Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
@roidelapluie
Copy link
Member

Thanks for this! It seems like we would need azure users to test this.

@jrauschenbusch
Copy link

@David-N-Perkins Thanks for implementing this. A new optional config value seems to be the best solution for this topic. It's backwards compatible and provide a way to reduce the amount of ARM calls massively if it's a subscription containing a lot of VMs. I'll try out you code changes and give you feedback. But maybe there are other Azure users who can also give feedback. Especially the MSI auth change will be hard to test locally.

@jrauschenbusch
Copy link

@David-N-Perkins Sorry for the late response. I've now tested your branch for ~24h within an AKS cluster.

TL;DR: LGTM! @roidelapluie Please merge 😄

  • MSI authentication works (tested with azure pod identity in AKS)
  • Applying the resource groups parameter also works like a charm (ARM request report comparison (#calls before/#calls after) using Export-AzLogAnalyticRequestRateByInterval)
  • Backwards compatibility is given by omitting the resource_group config parameter (also tested via the report approach mentioned above)

I had previously accidentally built a container image from the main branch and was wondering why the parameter shouldn't exist in azure.plain (see log output below). Then I realized it was the wrong branch I was building the image on. But it was a good test that the parameter is accepted as soon as i was working on the correct branch.

Before using the right branch there was the following log output:

 ts=2022-03-23T09:59:23.929Z caller=main.go:870 level=error msg="Error reloading config" err="couldn't load configuration (--config.file=\"/etc/config/prometheus.yml\"): parsing YAML file /etc/config/prometheus.yml: yaml: unmarshal errors:\n  line XXX: field resource_group not found in type azure.plain" 

After using the right branch everything worked as expected.

Good job! 🚀

@roidelapluie
Copy link
Member

But is this a beaking change for users? Do they need to change their credentials ?

@jrauschenbusch
Copy link

jrauschenbusch commented Mar 23, 2022

No, it's not a breaking change. I've just tested if the MSI authentication still works with the changes made in this PR. Service Principal and OAuth authentication were not touched and should work as before:

oauthConfig, err := adal.NewOAuthConfig(activeDirectoryEndpoint, cfg.TenantID)

spt, err = adal.NewServicePrincipalToken(*oauthConfig, cfg.ClientID, string(cfg.ClientSecret), resourceManagerEndpoint)

Just the MSI authentication was replaced, as the current methods are deprecated:

https://github.com/Azure/go-autorest/blob/b3899c1057425994796c92293e931f334af63b4e/autorest/adal/token.go#L738

Furthermore one can omit the resource_group parameter and then the behavior is the same as before. Hence also not a breaking change.

var err error
if len(resourceGroup) == 0 {
var rtn compute.VirtualMachineScaleSetListWithLinkResultPage
rtn, err = client.vmss.ListAll(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check the error here to avoid deferring a nil pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

result = &rtn
} else {
var rtn compute.VirtualMachineScaleSetListResultPage
rtn, err = client.vmss.List(ctx, resourceGroup)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@roidelapluie
Copy link
Member

thanks. I have spin up azure vm's to test this, it indeed works. Added a comment.

Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>
@roidelapluie roidelapluie merged commit b13aec9 into prometheus:main Mar 28, 2022
@roidelapluie
Copy link
Member

Thanks!

@David-N-Perkins David-N-Perkins deleted the azure-resource-group-filter branch March 29, 2022 12:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants