/stats and /metrics API calls do not return data for functions that have not been called #368

Closed
nigeldeakin opened this issue Sep 27, 2017 · 9 comments

Comments

@nigeldeakin
Contributor

The /stats API call returns a map containing queued/running/completed/failed stats for each function, as well as global totals.

However, this does not include functions that have not yet been called. This is not what the user (or the UI tool) would expect. If a function has been created then /stats should return queued/running/completed/failed stats for it. Obviously the values would all be zero until the function was actually called.

@nigeldeakin
Contributor Author

Required for fnproject/ui#18

@nigeldeakin
Contributor Author

This issue also affects Prometheus metrics.

@treeder
Contributor

treeder commented Oct 6, 2017

I believe this is because async calls go directly on the queue without hitting the database first. @rdallman can confirm. If that's the case, it would probably make sense to hit the database with "queued" state, or at least send a queued event to stats.

@rdallman
Contributor

rdallman commented Oct 6, 2017

yea, related to #281 and #155

just moving the stats.Queued() call to agent.GetCall will probably do the trick (and then our Stats struct thing doesn't have to leak into the front end as much...)

@nigeldeakin nigeldeakin changed the title /stats API call does not return data for functions that have not been called /stats and /metrics API calls does not return data for functions that have not been called Oct 9, 2017
@nigeldeakin
Contributor Author

nigeldeakin commented Oct 9, 2017

This isn't about the stats.Queued() call. The problem is that neither the stats structure, nor the data held in the Prometheus client, know anything about functions/routes until they have been called.

To fix this issue we need a new function that is called, once for every route in the database, when the server is started and subsequently whenever a new route is created. This function would update stats and the Prometheus client with initial metrics for that route, with queued, running, completed and failed all set to zero. Writing that function is straightforward, but where would we call it from? I'm looking for some existing code which is executed at startup which reads all the routes from the database.
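As a rough illustration of the kind of function described above, here is a minimal Go sketch using only the standard library. The names `Registry`, `FuncStats`, `Register`, `Complete` and `Snapshot` are hypothetical and are not taken from the Fn codebase; the hard-coded route list stands in for whatever code would read the routes from the database at startup.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// FuncStats holds the four per-function counts that /stats is said to report.
type FuncStats struct {
	Queued, Running, Completed, Failed uint64
}

// Registry is a hypothetical stand-in for the server's stats structure.
type Registry struct {
	mu    sync.Mutex
	funcs map[string]*FuncStats
}

func NewRegistry() *Registry {
	return &Registry{funcs: make(map[string]*FuncStats)}
}

// Register adds a zeroed entry for path if none exists yet.
// It never resets counts that have already been recorded.
func (r *Registry) Register(path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.funcs[path]; !ok {
		r.funcs[path] = &FuncStats{}
	}
}

// Complete records one completed call, creating the entry on first use.
func (r *Registry) Complete(path string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s, ok := r.funcs[path]
	if !ok {
		s = &FuncStats{}
		r.funcs[path] = s
	}
	s.Completed++
}

// Snapshot returns a copy of the current per-function stats.
func (r *Registry) Snapshot() map[string]FuncStats {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]FuncStats, len(r.funcs))
	for path, s := range r.funcs {
		out[path] = *s
	}
	return out
}

func main() {
	reg := NewRegistry()
	// Assumption: these paths stand in for routes read from the database.
	for _, path := range []string{"/hello", "/resize"} {
		reg.Register(path)
	}
	reg.Complete("/hello")

	snap := reg.Snapshot()
	paths := make([]string, 0, len(snap))
	for p := range snap {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Printf("%s %+v\n", p, snap[p])
	}
}
```

With this shape, a route that has been registered but never called still appears in the snapshot with all four values at zero, and calling `Register` again for a route that already has counts leaves them untouched.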

@nigeldeakin nigeldeakin changed the title /stats and /metrics API calls does not return data for functions that have not been called /stats and /metrics API calls do not return data for functions that have not been called Oct 9, 2017
@rdallman
Contributor

rdallman commented Oct 9, 2017

> To fix this issue we need a new function that is called, once for every route in the database, when the server is started and subsequently whenever a new route is created.

since we are planning to use an external aggregator service, i don't think we need to add too much machinery here (at the cost of slight precision loss around fn server failures, which I don't think matters so much).

if we call stats.Queued() from agent.GetCall then in theory that data gets sent out or pulled from statsd / prometheus (respectively) within some polling interval, so I don't think it's worth checking the db really. also, since we're running distributed, on startup we can't really have every fn server add up every queued call in the db, otherwise if there are e.g. 100 queued calls and 3 fn servers restart, prom would pull that 300 are queued, and i don't think we can mix gauge and counter very easily.
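The double-counting concern in that example can be worked through in a few lines of Go. This is only a sketch of the arithmetic: `seedFromDB` and `aggregate` are hypothetical names, and the model assumes each restarted server seeds its own value from the shared database and that the aggregator adds up one value per server.

```go
package main

import "fmt"

// seedFromDB models a server that, on startup, sets its local "queued"
// value to the number of queued calls it finds in the shared database.
func seedFromDB(queuedInDB int) int {
	return queuedInDB
}

// aggregate sums the per-server values, as an aggregator would when it
// adds up one series per server.
func aggregate(perServer []int) int {
	total := 0
	for _, v := range perServer {
		total += v
	}
	return total
}

func main() {
	const queuedInDB = 100
	servers := make([]int, 3)
	for i := range servers {
		servers[i] = seedFromDB(queuedInDB)
	}
	fmt.Println(aggregate(servers)) // prints 300, although only 100 calls are queued
}
```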

@nigeldeakin
Contributor Author

nigeldeakin commented Oct 10, 2017

This issue isn't about "queued calls". It's whether the Prometheus scraper should receive metrics about routes that exist but which haven't been called (or queued).

It sounds (from discussion here and elsewhere) as though the current behaviour is considered OK: Prometheus should only receive information about things that happened since the server was started. So I can close this issue. Thanks for the feedback.

As for calling stats.Queued() from agent.GetCall: why is that better than calling it just once as now, when the call is enqueued? Currently when a call is enqueued the Prometheus client is notified so it can increase its counter. The Prometheus server can then scrape the value of this counter (by calling /metrics) any time it likes. Or is the suggestion that we change this from a counter to a gauge whose value is maintained within the Fn server itself? In any case that change is not related to this issue.

@rdallman
Contributor

> As for calling stats.Queued() from agent.GetCall: why is that better than calling it just once as now, when the call is enqueued?

well, just the positioning I think. GetCall is called before queueing the call to the MQ, so while it sits on the MQ, prometheus will have a counter incremented for it. whereas right now it's in Submit, so only after the call gets picked off the MQ (could be seconds, minutes, hours after it was actually queued) will the counter get incremented. there is some consideration for calls that may get pulled off the MQ multiple times (for reasons of failing previously/timeouts/etc), this is an issue in the current spot as well as in GetCall without certain care. I think originally this is how I interpreted this issue, though now I understand it's something else. in any event, this is also going on.
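The positioning point can be illustrated with a small Go sketch. It is not Fn code: a buffered channel stands in for the MQ, and `counters`, `enqueue` and `dequeue` are hypothetical names. One counter is incremented before the call goes onto the queue (the placement suggested for GetCall) and the other only after the call is taken off it (the placement described for Submit).

```go
package main

import "fmt"

// counters tracks the same event counted at two different points.
type counters struct {
	atEnqueue int // incremented before the call is placed on the queue
	atDequeue int // incremented only once the call is taken off the queue
}

// enqueue counts the call first, then places it on the queue.
func enqueue(mq chan<- string, c *counters, call string) {
	c.atEnqueue++
	mq <- call
}

// dequeue takes a call off the queue, then counts it.
func dequeue(mq <-chan string, c *counters) string {
	call := <-mq
	c.atDequeue++
	return call
}

func main() {
	mq := make(chan string, 1) // buffered so a call can sit on the queue
	var c counters

	enqueue(mq, &c, "call-1")
	fmt.Println(c.atEnqueue, c.atDequeue) // prints "1 0": the call is waiting on the queue

	dequeue(mq, &c)
	fmt.Println(c.atEnqueue, c.atDequeue) // prints "1 1"
}
```

The sketch does not handle a call being pulled off the queue more than once, which is the extra care mentioned above for either placement.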

@rdallman
Contributor

rdallman commented Feb 6, 2018

closing, don't think we need to have zeroed stats for routes that have not yet been invoked, if i understand correctly

@rdallman rdallman closed this as completed Feb 6, 2018

3 participants