-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Showing admins infos on a specific admin port #23
Conversation
I'm not sure what this pull request does. Why should connections from localhost be trusted? Why is binding to localhost better than BTTP basic over TLS? And why do you want to enable profiling in production? |
On a server, with restricted access, admin website on localhost is not so ugly. OK, I'll add password. How can I do load testing with webrtc, without real user and real flappy internet connection ? OK, lets add a flag. I'm focusing on stats and metrics. I can't host a service without metrics.
|
The POC is done. Do you think that proposed features are useful and this PR is welcome ? What metrics should be exposed in a first iteration ? |
I'm sorry, but I still don't understand what problems this PR is aimed at solving. |
Managing galene serverFor hosting an Internet service, you need observability. LoggingLogging exposes events (with timestamp) and error. Application statesRooms with clients and their channels are specific to Galene, and very useful to know the context (connection performance, codec used…) and real usage of th service. MetricsMetrics are emitter for time series databases. It can be used for graphing (with the all mighty Grafana) and monitoring and raising allerts. Prometheus is one of the time series databases, but its export format is the de facto standard, readable by all time series. Prometheus export is a HTTP endpoint, displaying metrics with tag and values. They are two formats, a plain text with lots of comments and a compact binary format. The golang library handles the different kind of counter, the HTTP handler, and the two export formats. PProfpprof is the golang tool for understanding runtime behavior. SecurityMetrics, states and logging are for private eyes. You don't know how its scale and if this information can be used for evil purpose. Using a distinct port from main application port, and listening by default localhost is a first level of protection. Authenticating access with a password (using galene tool for hashing password) is a second level. Optional admin (you don't have to always run admin endpoint) is a third level. This Pull RequestLogging is out of the scope of this PR.
pprof are exposed with its standard handlers from Prometheus endpoint came from Authentication is done with handmade Settings are done with standard |
I have read your pull request, and I understand what it does. I don't
understand why it is useful.
* We all agree that better logging is a good idea. Help with improving
logging is welcome, but there's nothing about logging in your pull request.
* Group stats: this needs to be carefully balanced with the users' right
to privacy. There are good reasons why Galène doesn't export connected
usernames or IP addresses.
* I understand what Prometheus is. I do not understand what particular
data your patch exports in Prometheus format.
* Galène is already exposing pprof data (through a set of command-line
options). We already have a good understanding of where the CPU time
goes, we have a good understanding of where memory allocation happens.
Please explain why you believe that additionally exposing CPU time
statistics over HTTP will help improving Galène.
I am sorry, Mathieu, but I will not apply your patches unless you can
explain what is their purpose. Please do not paraphrase the code, but
explain *why* this is necessary.
|
ok, I'm focusing on prometheus exports. |
pprof http endpoint is removed. |
I suggest some counters :
|
For now, I have that :
|
I have to say that I still don't understand what kind of insights you expect to get from this kind of statistics. Also, I'm not convinced with the approach of maintaining a running count of items, it's error-prone and difficult to maintain. Wouldn't it be simpler to count items of interest on demand, as is done in stats.go? Please be aware that every call to |
This metrics are suggestion, high level one are mandatory (number of websocket, opened rooms, opened channel, dropped packets and everything linked to quality), low level, like cache usage are clearly optionnal. stats API are useful when you are part of the discussion, inside the chat room. Metrics, from the outside, are needed for monitoring the service (the quality is too low : maybe the network is involved, the CPU is saturated?), what happens when user stop using H.264 for A1? How many users, how many rooms can I handle with this server? What server is needed for handling a massive conference? Is there jsut one user flooding my service with its 4k video upstream? What can I do when the server is saturated, just restart the service? How di I know the quality service and if users are happy? |
This reverts commit de55c28.
I'm sorry, I still don't understand. In what way are any of the metrics that you implemented related to service quality? Here are a few examples of useful metrics:
I could be mistaken, but I don't see any code in your submission that relates to any of the above. |
There is different kind of metrics. QualityI'm OK to implement your suggestions Usage and capacity planningCPU and memory are monitored from outside, through cgroups. Network usage can be monitored. Some basic usage number, like user/room/channel for guessing how many CPU/RAM I need for more users (aka capacity planning) What do other SFUs ?
Galene metricsWhich additional metrics seem useful for Galene ? |
A private http server for debugging purpose, performance analysis, or monitoring.
It listens on localhost, and should not be routed on Internet.
It exposes :