Add Prometheus metrics #694
ead525c to 08b7d62
I've gone through the PR and yeah, this would require a few design changes. Overall the approach looks fine. The API for the internal metrics package needs some more thought; I've suggested a simpler approach, please have a look.
I've not yet gone through the Echo middleware part, shall do so once these changes are finalised.
8fe9250 to 86801f2
Alrighty, I think this might be able to come out of draft now. I believe this should be in a better form than previously and closer to what you had in mind. One thing you'll note is that I did not go with the string concat to declare the metric name and labels inline. Hopefully, this is an acceptable compromise. I understand how the string concat gets it out of the way, line-count wise, but while metric names are technically labels at storage time on most backends, they're never referred to as such in the Prometheus or VictoriaMetrics codebase. I felt that keeping the reference to the metric name separate from the labels was worthwhile for "standards adherence". Either way, if this doesn't pass muster, it's not a big refactor. I'm hoping this is unobtrusive enough to slip by. 😄
Edit: Rebased on master and split into new cleaner commits.
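To make the trade-off in the comment above concrete, here is a minimal sketch of keeping the metric name and the labels as separate arguments and only joining them into the `name{key="value"}` form at the last moment. The helper `seriesName` and the metric and label names are hypothetical, chosen for illustration; this is not the code from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesName is a hypothetical helper (not from the PR). It joins a metric
// name and a label set into the `name{key="value",...}` form, so callers can
// keep the name and the labels as separate arguments.
func seriesName(name string, labels map[string]string) string {
	if len(labels) == 0 {
		return name
	}

	// Sort the keys so the output does not depend on map iteration order.
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q applies Go-style quoting, which is close to, but not identical
		// to, the label escaping rules of the Prometheus text format.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return name + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	// Illustrative metric and label names only.
	fmt.Println(seriesName("http_requests_total", map[string]string{
		"status": "200",
		"path":   "/api/health",
	}))
}
```

The inline string concat variant would pass the already-joined string around instead; the sketch only moves the joining step behind a function boundary.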
b8e8a92 to 2cd9384
2cd9384 to 99ca9b2
@avanier I'll check out these changes locally and review them by the end of this week.
@avanier Firstly, excellent work so far :) Most of the things are in place; I just had to do some minor refactoring of how the config is loaded, tidy up function/variable naming, and remove some redundant code that I found. I wasn't sure whether I could push the changes to your fork (if you give me write access, maybe I can). For now, I've pushed the changes to this repo itself and you can see what changed in bc7acfe. Please let me know if the proposed changes are fine. I've some minor concerns before we land this upstream:
Apart from this, mostly looks LGTM! @knadh can do a final review once we discuss the above points.
Okay, @avanier we need to take that into consideration in this PR. I can help out with frontend changes if you want.
We're not excluding 404s from the metric. Since we are embedding the HTTP handler path inside metrics, there's a risk of high cardinality (which Prometheus, or any time series DB, doesn't like, as it makes things slower for them). For example, on visiting
There could be millions of requests to random paths that don't exist, and each of them would end up creating a unique metric label (each label is basically a unique time series). This would have downstream effects on the Prometheus/metrics collection server. So the intended behavior should be that we define a hard-coded string (like
Hm, yeah probably we can. This should work I think.
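The cardinality concern above can be sketched roughly as follows: any request that does not match a registered route is recorded under one fixed label value, so random paths cannot create new time series. The constant `unmatchedLabel`, the `knownRoutes` set, and `pathLabel` are assumptions made up for this illustration; the PR's actual middleware and route list may look different.

```go
package main

import "fmt"

// unmatchedLabel is a hypothetical fixed value recorded for every request
// that does not match a registered route.
const unmatchedLabel = "unknown"

// knownRoutes is an illustrative set of registered route patterns. In a real
// app this would come from the router rather than a hard-coded map.
var knownRoutes = map[string]bool{
	"/api/health":   true,
	"/api/settings": true,
}

// pathLabel returns the value to use for the "path" label. Registered route
// patterns pass through unchanged; everything else collapses into a single
// value, so the number of time series stays bounded by the number of routes.
func pathLabel(routePattern string) string {
	if knownRoutes[routePattern] {
		return routePattern
	}
	return unmatchedLabel
}

func main() {
	for _, p := range []string{"/api/health", "/wp-login.php", "/random/abc123"} {
		fmt.Printf("%s -> %s\n", p, pathLabel(p))
	}
}
```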
@mr-karan I gave this a look. Many thanks for the edits! 🙏 I've cherry-picked bc7acfe into this branch. Normally you should have been able to push as I did enable the That said, I do have a few questions:
As for the reflection thing I've amended that to a simple @knadh Regarding the presence of these settings in the UI, I would personally keep it only in the config file. Considering there are no Prometheus solutions out there that can be configured by clicking, I don't think we'd be leaving too many users of that ecosystem stranded. Either way, I would argue this could be implemented separately if needed? (That and my Vue.js is more than rusty ... 😅 )
While this is subjective, as I said, listmonk only has config it requires to boot in the TOML file. That's a simple standard that can be followed or we end up in situations like this where every setting requires deliberation on where it should go. @mr-karan @avanier if the API is behind the BasicAuth, then it can just be on all the time, maybe.
That's correct. I've implemented it in such a way that if the config block isn't present (which it won't be for people upgrading), reasonable defaults are loaded and Prometheus metrics are exposed. I don't think having an option to disable metrics would make a significant difference, since we are collecting metrics inside the function blocks anyway. Disabling them would simply disable exposing the API handler/response. I don't think it's a big deal; it's an internal API anyway.
Yep, it'll be on all the time as discussed ^ .
Hm, seconded. Although I did have a chat with @knadh about exposing metrics like build/version info. Exposing that metric on every single request seemed wasteful. We can reconsider it though.
Fair point, I agree with you. We can remove it from config, make it hardcoded as a constant in the app as
Alrighty, I'll get on making the changes. I started having a proper-er look and the Vue part is more straightforward than I expected. That said, I'm still facing significant challenges understanding how Right now, this is my general understanding of how the settings API and
sequenceDiagram
participant main as main.go
participant init as init.go
participant koanf as koanf
participant config as config file
participant postgres
participant environment
participant request as HTTP request
main->>init: init()
init->>config: initConfigFiles()
config->>koanf: set config values from config file
init->>environment: [...] initConfigFiles()
environment->>koanf: set config values from environment
init->>postgres: initDB()
postgres->>koanf: set config values from database
main->>init: main()
init->>koanf: initConstants() queries koanf
koanf->>init: returns config values<br />this is where the app settings will be maintained for the API
main->>init: initMetrics()
init->>init: metrics module initialized
main->>main: Application is happy and runs
request->>main: request to /api/settings<br />this actually happens in handlers.go ... but this graph is big enough
main->>request: return App constants as JSON
Right now I've pushed c5b92ef to a separate branch. If I query
Once I have the edit: Inserting values manually to postgres in the settings table, I can see this behaves as expected in the API. So I'm really missing the link between
Ah, I thought we weren't doing config or UI settings but planning to keep metrics always on behind the standard BasicAuth API.
Errrr ... You had me under the impression that UI settings were a must for any exposed setting. Right now Prometheus would export the following:
If we make none of those things configurable ... that would solve all those issues >_< . I wouldn't think of the Golang metrics as too expensive to collect and export, but few of them are actually very useful in production unless you're a Listmonk contributor who knows how to do Golang performance optimization. At-large HTTP metrics are usually the bread and butter of Prometheus queries, but they run the risk of being high cardinality depending on the size of your deployment. (Also what constitutes high cardinality depends on which Prometheus backend you use, so YMMV.) Either way, they're generally high-ish value. The bounce metrics and ship-rate metrics are indicative of immediately actionable issues. TL;DR @knadh I don't mind going the extra mile and doing the UI work for toggles, but if you want to avoid this PR becoming a 🐰 🕳️ relegating the toggles to a separate PR could make a lot of sense.
Making some toggle-able and others not isn't consistent. It's okay to have metrics on all the time. I'm guessing that internally it's very cheap to just do
+1 It makes sense for Listmonk to always expose metrics out of the box, as a good default. The current config looks like:
Neither of these really needs to be "toggled"; as a sane default, both of them can always be true.
We took care of the cardinality issue by ensuring the labels aren't dynamic in any metric. Don't think this will be a problem in any listmonk deployment.
+1 @avanier we can probably run HTTP load tests on some API endpoints. All of the API endpoints go through a middleware which does .Increment/.UpdateDuration, so we can compare the results with and without metrics.
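A rough way to get a first number for the comparison suggested above is to time the same handler with and without a metrics middleware in-process. The sketch below uses only the standard library; `requestStats`, `withMetrics`, `noopHandler`, and `timeRequests` are hypothetical stand-ins for the PR's middleware and its .Increment/.UpdateDuration calls, so it only approximates what a real load test against listmonk would measure.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// requestStats is a hypothetical stand-in for the metrics package: one
// request counter and one accumulated duration, both updated atomically.
type requestStats struct {
	count   int64
	totalNs int64
}

// Count returns the number of requests recorded so far.
func (s *requestStats) Count() int64 { return atomic.LoadInt64(&s.count) }

// withMetrics wraps a handler and records a count and a duration for every
// request, roughly what an Increment/UpdateDuration pair would do.
func withMetrics(stats *requestStats, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		atomic.AddInt64(&stats.count, 1)
		atomic.AddInt64(&stats.totalNs, int64(time.Since(start)))
	})
}

// noopHandler does almost no work, so the middleware overhead is not hidden
// behind real handler latency.
var noopHandler http.Handler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
})

// timeRequests calls the handler n times in-process and returns the total
// wall-clock time. No network is involved.
func timeRequests(h http.Handler, n int) time.Duration {
	req := httptest.NewRequest(http.MethodGet, "/api/health", nil)
	start := time.Now()
	for i := 0; i < n; i++ {
		h.ServeHTTP(httptest.NewRecorder(), req)
	}
	return time.Since(start)
}

func main() {
	const n = 100000
	stats := &requestStats{}
	fmt.Println("without metrics:", timeRequests(noopHandler, n))
	fmt.Println("with metrics:   ", timeRequests(withMetrics(stats, noopHandler), n))
	fmt.Println("requests counted:", stats.Count())
}
```

Timings from a loop like this are noisy and machine-dependent; a proper comparison would use `go test -bench` or an external load generator against the running app.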
Any updates on this?
I was under the impression that there was a desire to do performance testing on this. Otherwise, going over what we discussed, would I be correct in assuming the only change left is to remove any configuration toggles? This I could probably get in by end of week.
This looks like an almost-done PR and an interesting feature. Is there any chance of getting it merged at some point?
Yep, this is on my to-do to review and merge in the next release.
Whew, I finally picked this up (for the upcoming release). Was making a few changes, primarily, moving config away from the config file into the settings UI. Was playing around with the
Thinking about it, I think HTTP handler metrics aren't very meaningful in listmonk. There are only a handful of external URIs, and they vary widely with unique params. Not much value in tracking the ~4 public handlers.
@knadh Sorry for the late answer. If I remember correctly I had limited the path depth collected for those metrics to avoid this cardinality explosion. It's also not the most important thing to monitor. It should be fine to drop the metric and revisit this later if needed. (The only legitimate use-case I can think of right now is catching errors on missing assets or the like. But that's something that could just as easily be done at the level of an ingress controller or a CDN.) edit: After double-checking, I indeed did not commit the change about path depth.
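Since the path-depth change mentioned above was never committed, the following is only a guess at what such a limit could look like: the request path is cut to its first few segments before being used as a label. The function `truncatePath` and the example paths are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// truncatePath is a hypothetical helper. It keeps at most `depth` leading
// segments of a URL path, so per-request identifiers further down the path
// do not end up in metric labels.
func truncatePath(p string, depth int) string {
	if depth < 0 {
		depth = 0
	}
	segments := strings.Split(strings.Trim(p, "/"), "/")
	if len(segments) > depth {
		segments = segments[:depth]
	}
	return "/" + strings.Join(segments, "/")
}

func main() {
	// Illustrative paths only.
	fmt.Println(truncatePath("/api/lists/42/subscribers", 2))
	fmt.Println(truncatePath("/", 2))
}
```

A depth limit bounds cardinality only when the leading segments themselves are drawn from a small fixed set; random top-level paths would still need a catch-all label.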
Does this implementation put additional load on your server even if one doesn't use Prometheus at all? Or is it just some sort of REST endpoint that only gets active when called via the API?
@ChrisTG742 apps emit the Prometheus protocol (simple string formatting), which any system that can read the protocol can consume. It doesn't have to be Prometheus. From a load perspective, there wouldn't be an issue. But what we were discussing was whether fine-grained per-URI metrics are actually useful in an app like listmonk, which only has a tiny handful of public handlers.
@avanier I agree. Let's close this PR and revisit metrics support in the future when there's a stronger use case. Thanks!
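As an illustration of the "simple string formatting" point above, here is a minimal sketch that renders a set of counters in the Prometheus text exposition format, with one `# TYPE` line and one sample line per metric. The function `renderCounters` and the metric names are made up for this example and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderCounters is a hypothetical helper. It formats counters as plain
// text: a `# TYPE <name> counter` line followed by `<name> <value>`.
func renderCounters(counters map[string]uint64) string {
	// Sort names so the output is stable across calls.
	names := make([]string, 0, len(counters))
	for name := range counters {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	for _, name := range names {
		fmt.Fprintf(&b, "# TYPE %s counter\n%s %d\n", name, name, counters[name])
	}
	return b.String()
}

func main() {
	// Illustrative metric names only.
	fmt.Print(renderCounters(map[string]uint64{
		"campaign_messages_sent_total": 1200,
		"bounces_total":                7,
	}))
}
```

Anything that can issue an HTTP GET and parse these lines can consume such output, which is why the scraper does not have to be Prometheus itself.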
This pull request implements #627 .
The goal here is to expose some basic metrics over a Prometheus endpoint.
This is very much a draft PR for now. I would expect this to get rebased into something cleaner before it's taken out of draft state. Undrafted, yay! \o/