Azure SD stopped working for VMSS instances after upgrade from 2.41 > 2.48 #13245
Comments
Looks like this is breaking because of how getNetworkInterfaceByID now looks up the interface. VMSS network interfaces have a very different resource ID format from VM network interfaces, which is why there are two functions to return these. This used to work (before #11860) because getNetworkInterfaceByID used to just call GET on the NIC ID path. Doing it that way meant it didn't matter that the ID was formatted differently for each NIC type. |
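To illustrate the difference in ID formats described above, here is a minimal Python sketch (not the Prometheus Go code) that tells the two kinds of NIC ID apart. The function names `parse_resource_id` and `classify_nic` are hypothetical, the sample IDs use placeholder values, and the VM NIC layout assumes the usual `Microsoft.Network/networkInterfaces` form; the VMSS layout follows the path quoted in the issue description.

```python
from typing import Dict


def parse_resource_id(resource_id: str) -> Dict[str, str]:
    """Split an Azure resource ID into {segment name: value} pairs.

    Keys are lowercased because, as far as I know, Azure treats resource
    IDs case-insensitively (assumption for this sketch).
    """
    parts = [p for p in resource_id.split("/") if p]
    if len(parts) % 2 != 0:
        raise ValueError(f"unexpected resource ID shape: {resource_id!r}")
    return {parts[i].lower(): parts[i + 1] for i in range(0, len(parts), 2)}


def classify_nic(resource_id: str) -> str:
    """Return "vmss" for a scale set NIC ID and "vm" for a standalone NIC ID."""
    fields = parse_resource_id(resource_id)
    if "networkinterfaces" not in fields:
        raise ValueError(f"not a network interface ID: {resource_id!r}")
    # VMSS NIC IDs are nested under the scale set and the instance index.
    if "virtualmachinescalesets" in fields:
        return "vmss"
    return "vm"


# Placeholder IDs for illustration only.
VMSS_NIC_ID = (
    "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Compute"
    "/virtualMachineScaleSets/my-vmss/virtualMachines/282"
    "/networkInterfaces/primary.nic"
)
VM_NIC_ID = (
    "/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Network"
    "/networkInterfaces/my-vm-nic"
)

if __name__ == "__main__":
    print(classify_nic(VMSS_NIC_ID))
    print(classify_nic(VM_NIC_ID))
```

A lookup that branches on the ID shape like this can pick the matching API call for each NIC kind, whereas a plain GET on the ID path works for both without needing to know the difference.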
I've done a little bit of further testing on this. The
|
Have you tried the current main of Prometheus with #13241 included? (I'm not an Azure expert, but we recently merged a fix that may be related.) |
@bwplotka that PR is about Public IPs. For VMSS, it is a primary private interface (internal IPs). |
Thanks, help wanted in fixing & testing. Volunteers are also welcome to set up some integration tests against Azure (we need sponsorship for that too, but perhaps some kind soul from Microsoft could help us 🙈 ). |
I'll take a look at this and see if I can get a fix. |
I think I have a fix for this: #13283 |
@daniel-resdiary amazing, the potential fix is merged to main. Before closing this issue, can somebody else confirm this helps, e.g. @roman-vynar? You will need to use the main branch, e.g. the "prom/prometheus:main" docker image (EDIT: its latest "main" tag includes the fix now). Once confirmed we can close this AND we can consider cherry-picking the fix for the 2.49.0 release (rc.0 is out for now) 🤗 |
Hmm, it partially works for me.
Not sure why; either something else changed or it is my specific config with relabeling. |
Nice that the fix is already on its way. Until then I can confirm that a downgrade to 2.47.2 worked for us. |
Any update @roman-vynar? @SimonDreher does prom/prometheus:main work for you now? |
Yes, this works fine. |
@bwplotka looks like I have discovered another bug while testing this, not in v2.48.0 but it is in the Looks like the following commit 6de80d7
if you have 1+ jobs with the same type of SD, e.g. 2 jobs with Is it possible to verify that before a new release of prometheus? |
Nice, this commit is not in 2.49. I will create a separate issue for it. Thanks! Closing this particular issue; I will cherry-pick the fix onto 2.49. |
What did you do?
Azure SD stopped working for VMSS instances after upgrade from 2.41 > 2.48.
It was working fine with 2.41 for both VM and VMSS instances, but after the upgrade it discovers only regular VMs.
Instances within a VM Scale Set are not discovered and an error is returned (one error per VMSS):
Moreover, I checked the resource in error by following the path
/subscriptions/xxx-xxx-xxx-xxx/resourceGroups/xxx/providers/Microsoft.Compute/virtualMachineScaleSets/nomad-main-xxx-vmss/virtualMachines/282/networkInterfaces/primary.nic
and there is no problem to see it.
What did you expect to see?
Azure SD working as previously.
What did you see instead? Under which circumstances?
Missing VMSS instances in Targets. Only VMs are present.
System information
No response
Prometheus version
No response
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response