Metric Collection and Monitoring #731

Open
IMAM9AIS opened this issue Sep 9, 2019 · 4 comments

@IMAM9AIS
Contributor

IMAM9AIS commented Sep 9, 2019

Hi,

We have been running JEG in our production systems, and it is becoming increasingly necessary for us to collect and monitor metrics on the kernels being spawned, the users running them, and the kinds of requests made to the JEG servers.

That said, could we get some guidance on how to aggregate these metrics from the JEG servers so that they can be logged for monitoring? To start with, we are thinking of collecting and monitoring them through the StatsD library: https://pypi.org/project/pystatsd/

I am not sure whether this really qualifies as an issue, but it could certainly be a feature addition, with this as the starting point.

We are looking to collect the following general information about the setup:

  • Average number of active kernels per user.
  • Total number of active kernels.
  • Number of active users.
  • Number of active kernels per OS type (Client OS).
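
To make the four numbers above concrete, here is a minimal sketch of how they could be derived from a snapshot of active kernels. The `KernelSession` record and its field names are hypothetical and only for illustration; they are not JEG's actual session schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelSession:
    # Hypothetical record: these field names are assumptions, not JEG's schema.
    kernel_id: str
    user: str
    client_os: str


def summarize(sessions):
    """Compute the four aggregates from a snapshot of active kernels."""
    per_user = Counter(s.user for s in sessions)
    per_os = Counter(s.client_os for s in sessions)
    total = len(sessions)
    users = len(per_user)
    return {
        "active_kernels": total,
        "active_users": users,
        # Guard against division by zero when no kernels are running.
        "avg_kernels_per_user": total / users if users else 0.0,
        "active_kernels_by_os": dict(per_os),
    }


snapshot = [
    KernelSession("k1", "alice", "linux"),
    KernelSession("k2", "alice", "linux"),
    KernelSession("k3", "bob", "macos"),
]
summary = summarize(snapshot)
```

Each value in `summary` would map naturally onto a gauge, since it describes the current state rather than an event count.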

Requests per second (RPS) on JEG:

  • Kernel launch requests
  • Refresh/reconnect requests
  • Get kernel/kernelspec requests
  • Shutdown/restart kernel requests etc.
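
As a rough sketch of the request counting, each request type could increment a StatsD counter, and the StatsD server would then typically derive a per-second rate at each flush. The example below writes the plain StatsD line format (`<name>:<value>|<type>`) over UDP using only the standard library instead of relying on a particular client package. The `jeg.requests.*` metric names, the `RequestMetrics` class, and the host and port are assumptions made for illustration.

```python
import socket


def statsd_line(name, value, kind):
    """Format one StatsD datagram as '<name>:<value>|<type>'."""
    if kind not in ("c", "g", "ms"):  # counter, gauge, timer
        raise ValueError(f"unsupported StatsD type: {kind!r}")
    return f"{name}:{value}|{kind}"


class RequestMetrics:
    """Hypothetical helper that counts requests by type (names are assumptions)."""

    def __init__(self, host="127.0.0.1", port=8125, prefix="jeg"):
        self._addr = (host, port)
        self._prefix = prefix

    def count(self, request_type):
        """Increment the counter for one request type and return the line sent."""
        line = statsd_line(f"{self._prefix}.requests.{request_type}", 1, "c")
        self._send(line)
        return line

    def _send(self, line):
        # Fire-and-forget: a metrics failure should never break request handling.
        try:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
                sock.sendto(line.encode("ascii"), self._addr)
        except OSError:
            pass


metrics = RequestMetrics()
sent = metrics.count("kernel_launch")
```

A handler for each of the request types above would call `metrics.count(...)` with its own label, for example `kernel_launch`, `kernel_reconnect`, `kernelspec_get`, or `kernel_shutdown`.
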
@kevin-bates
Member

@IMAM9AIS - this would be fantastic! This seems to imply that we'd want to have our own handlers in place since some of these probably warrant updates to those locations - although I suppose that could be a discussion point.

With the persistent kernel session stuff, we already track kernels per user and can get total active kernels and users.

I don't know how much overlap there is with pystatsd, but I think it would be good to take a look at the telemetry work (event logging) that is underway in a couple of other Jupyter projects (Hub and Lab) from a synergy perspective. On the surface, that appears to be more about auditing than metrics. That said, there are other metric pieces (via Prometheus) in place in various projects as well. I just want to make sure we're not adding yet another framework to the ecosystem when others exist and are adequate for our needs.

I hope that's helpful.

@esevan
Contributor

esevan commented Sep 10, 2019

> there are other metric pieces (via Prometheus) in place in various projects as well.

I love prometheus, too 👍

@IMAM9AIS
Contributor Author

@kevin-bates @esevan Sounds good.
We actually came across this PR, which added Prometheus metrics to the notebook server:
https://github.com/jupyter/notebook/pull/3490

However, this does not seem to be enabled when running JEG.
We are trying to understand whether we can build on this PR and extend it with more metrics.

@kevin-bates
Member

If you move to the master branch (where we've removed EG's dependency on Kernel Gateway), you should be able to get the /metrics endpoint exposed. I suspect this would follow an approach similar to the one used in https://github.com/jupyter/enterprise_gateway/blob/master/enterprise_gateway/base/handlers.py, where the various mixins are added into the class derivation and the handler then essentially derives from Notebook's PrometheusMetricsHandler, like all the other handlers.
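
To illustrate the derivation pattern described here, the sketch below uses stand-in classes only. `UpstreamMetricsHandler` and `TokenCheckMixin` are hypothetical placeholders, not the real Notebook or Enterprise Gateway classes, and the token check is simplified to a method argument. The point is just the ordering: mixins first, upstream handler last, with the subclass body left empty.

```python
class UpstreamMetricsHandler:
    """Stand-in for an upstream metrics handler (hypothetical, not Notebook's class)."""

    def get(self, token=None):
        return "# metrics payload\n"


class TokenCheckMixin:
    """Stand-in mixin that rejects requests lacking the expected token."""

    expected_token = "secret"  # assumption: a fixed token, for illustration only

    def get(self, token=None):
        if token != self.expected_token:
            return "401 Unauthorized"
        # Defer to the next class in the method resolution order.
        return super().get(token=token)


class MetricsHandler(TokenCheckMixin, UpstreamMetricsHandler):
    """Mixins listed first, upstream handler last; no body is needed."""


handler = MetricsHandler()
allowed = handler.get(token="secret")
denied = handler.get()
```

Because the mixin comes first in the method resolution order, its check runs before the upstream handler produces the payload.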
