-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(metrics): Reverts deletion of SpectatorMeterRegistry and enables opt in of breaking change to prometheus metrics/dashboards through the monitoring daemon. #725
Conversation
…pt in of breaking change to prometheus metrics/dashboards through the monitoring daemon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me but I'm not familiar with this area of work so I'll defer to Cam or Chris for the final say. AFAIK, we don't have a path for backporting Kork changes to patch releases so I'm not sure what we'll do to alleviate the actual regression in 1.20.
@plumpy @ezimanyi @maggieneterval - any thoughts here?
@ethanfrogers it seems like PRs that are being merged right now are getting the target 1.20 label. |
I think the fact that PRs are being tagged with 1.20 is a bug in spinnakerbot (likely because we have a Ultimately, I'll leave it to the release manager to decide whether a cherry-pick of this would follow our cherry-pick policy, but if it does the rough steps are:
I know that @jonsie and @link108 recently went through this process for some plugin work (maybe ~2 months ago) so they probably have the most recent experience with how to do the above. |
@ezimanyi does bintray / maven let you do EDIT: Scrap that, I guess in theory you would just bump the patch and make the 1.20 branches use the latest patch on what ever minor release they are on. |
Does it sound reasonable that we just leave it broken in 1.20 and fix it for 1.21 and communicate that the fix is to not use 1.20? @ezimanyi, et al? |
I think in the past we might have picked a kork version like I'll defer to the release manager, but in general the idea of saying that 1.20 is completely broken doesn't sound great to me (unless this really only affects a tiny subset of users). If this was a regression and we can fix it safely, we probably should, even if it's more work than a normal fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to address this in the spinnaker-monitoring daemon? This seems like a heavy handed approach for what is essentially an integration issue with a specific tool. I'd rather keep Kork simple and not introduce another feature flag for all the Spinnaker services, and instead put a change in spinnaker-monitoring that optionally skips appending _value
to metrics with statistic
on them. Would that make sense? I'm not really sure what _value
in spinnaker-monitoring is supposed to accomplish.
+1 to @jonsie - at this point there isn't clear ownership on spinnaker-monitoring and it is basically abandonware. If we feel that is a core offering someone needs to step up and keep it relevant |
@jonsie re: ☝️ You can't really fix this in the monitoring daemon as you don't really know what metrics had the statistic tag before and which ones had it newly added. If this is the approach we wanted, I would say to just update the dashboards as it would probably be easier. I personally am not a fan of the monitoring daemon and the extra layer of indirection it adds and would prefer the plugin approach that I have been working on (using Micrometer's integration with SB). We are pushing the observability plugin which will have its own set of dashboards but eliminates the need for the daemon to run on each pod. If I was going to take time to fix/create Prometheus dashboards, it would probably be for the plugin I have been working on, which has its own Prometheus metrics naming convention and not the daemon. Let me talk internally about a couple options. |
@cfieber I do not want to support it, and have created a plugin that basically replaces it and will be supporting that model. |
Yes, IFF it's updating the dashboards and optionally making changes in the naming convention.
I'll +1 this.
It's the OPs attempt at accomplishing the following: https://prometheus.io/docs/practices/naming/#metric-names Although neither the naming convention or the official micrometer integration have Personally, I think the Prometheus naming convention, or rather the action of renaming metrics, adds unneeded developer friction. The current immediate plan is for @jasonmcintosh to update the dashboards by appending Our longterm plan is to create a new set of dashboards that are compatible with our plugin and declare that those are the only ones we support. |
Just an FYI - also hit some interesting scaling differences - apparently micrometer uses seconds where spectator was using nanoseconds - this is on totalTime vs. total. SO the conversion isn't super "simple". ALSO with the removal of hystrix, a lot of dashboards just need updating period. That's a more interesting challenge. Will try and get updated dashboards out, but I don't guarantee they'll be "great" with out more peer review of them... |
@fieldju Is your plugin open-source or is it Armory-only? |
Open Source :) |
https://github.com/armory-io/spinnaker-monitoring/tree/UpdatedDashboards/spinnaker-monitoring-third-party/third_party/prometheus FYI I've got some updated dashboards here. I ALSO have an updated docker file that fixes some vulnerabilties. JUST in case others find this thread. Note, these are raw json exports, so they don't have the "go replace the datasource" kinda stuff that the scripts added. I figure someone wants that, can do it other ways. |
In PR #592 the default Spectator registry was changed from the default Spectator registry to a Spectator Micrometer registry.
Having the micrometer registry is nice because it enables things such as including Micrometer registries on the classpath and having them just work, or creating plugins to configure micrometer such as the Armory Observability Plugin
However: The Micrometer Registry L50 adds a statistic tag to all Metrics.
The monitoring daemon appends
__value
to all metrics that contain the tagstatistic
These 2 things in combination mean that
1.20
introduced a BREAKING change that broke the existing Prometheus dashboards and alerting when using the monitoring daemon, as it changed the name of almost all metrics being reporting to Prometheus and potentially other metric integrations.This PR partially reverts #592 and makes it opt-in by supplying
spectator.micrometerRegistry.enabled: true
as a Springboot prop.I also tried to get fancy with making spring conditionally configure micrometer vs the default reg when custom Micrometer registries were supplied without any luck and had to add the explicit feature flag
spectator.micrometerRegistry.enabled:
cc: @Aloren