Unprocessed video metric #12
I'm afraid BBB's API doesn't expose that information: http://docs.bigbluebutton.org/dev/api.html#getrecordings
You are absolutely right... I added this feature via a netdata module. (I know it's unrelated to this project, but in case someone wonders... and yes, I know the code is not pretty.)
This looks interesting and potentially useful. Could you maybe fix the formatting of the code? In the WYSIWYG editor on GitHub, select "insert code" and copy & paste the code with proper formatting.
Updated the post. The coding and naming should still be cleaned up (move DIR to a config file, count only ".done" files, etc.). With this setting, netdata_bbb_videos_average gives the unprocessed file count. On second thought, maybe a netdata module for all BBB data could be written...
I'll take a look at this during the weekend; maybe we can modify this and bundle it together with the exporter. From a quick look at the code, I suspect netdata is not really required for this to function, since you are just checking how many files there are in a specific directory. We could accomplish this directly in the exporter with a read-only volume mount of /var/bigbluebutton/recording. What do you think?
Sorry GS, I did not notice your reply. You are right: the code's main block is independent of netdata. It simply counts the files under the /var/bigbluebutton/recording/status/sanity directory. Normally this folder contains ".done" files; when processing finishes, the ".done" file is removed.
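As a minimal sketch of the counting logic described above (the path comes from this thread; the function name and the FileNotFoundError fallback are illustrative choices, not part of the exporter):

```python
import os

def count_unprocessed_recordings(status_dir="/var/bigbluebutton/recording/status/sanity"):
    """Count the ".done" marker files still sitting in the sanity queue.

    Each recording awaiting processing leaves a "<record-id>.done" file in
    this directory; the file is removed once processing finishes, so the
    number of remaining ".done" files equals the unprocessed-video backlog.
    """
    try:
        return sum(1 for name in os.listdir(status_dir) if name.endswith(".done"))
    except FileNotFoundError:
        # Directory missing (e.g. running off the BBB host): report zero.
        return 0
```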
Ah, I knew I forgot something. I'll reopen this so I won't forget. I'll take a deep look at this next week when my schedule clears up a bit. Great insight regarding the
In your use case, what is the size of this metric? I'm not too thrilled about adding a metric that bypasses the API and accesses the filesystem of BBB. Although I can see this metric becoming optional down the road: when we (probably) need to optimize the currently slow recordings metrics, we will likely gather this information directly from the filesystem of BBB.
I have 18 BBB servers, which are managed by Scalelite. Some servers' unprocessed queue can go up to ~150 in a day. (When that happens, I temporarily disable that server, then run multiple concurrent video processing jobs.) And I understand the dilemma: right now your BBB-exporter does not have to be on the same server as BBB, but if you handle this via the filesystem, it will be required.
18 BBB servers! That's quite a lot to manage 😅. Are these running on older HW, or do you have that many concurrent users? 150 recordings in the unprocessed queue per server? That's quite a lot; how many recordings do you have per day (per server)? In our use case we didn't even notice the need for this metric (~230 recordings in total). If the need is there for this metric, which you most probably have, I can implement it as an optional metric that requires the exporter to be installed on the same machine (or rather, to have access to the FS of BBB).
Correct, you can run the exporter, for example, on your personal machine (during development) or on the monitoring server if that's what you prefer. But like I mentioned, if the need is there, we can implement it in the coming days as an optional metric. It would be great if you could test it out before making it official.
1 DB, 1 Moodle, and 1 Scalelite: 21 servers total. They are not old HW; they are each 8-core VPSes serving an entire university's online course system via distance learning due to the Covid-19 outbreak. Concurrently I have had more than 250 meetings and over 4000 participants.
For the last 45 days, we had ~18,500 recordings longer than 5 minutes (recordings shorter than 5 minutes are automatically "archived"). According to my calculations, the per-server daily average is around 40 (including the archived ones).
That's right, each server has its own bbb-exporter and netdata. :)
It would be my honor to test it :)
4000 participants with 250 rooms! On our (non-Scalelite) BBB installation we are nearing 10 seconds for the recordings metrics. This is due to BBB's API response, which returns loads of data (the entire object for each recording) just so we can perform a simple count. This led to the implementation of the env var RECORDINGS_METRICS for cases where massive metrics collection delays are unacceptable. I'm sure we could optimize this by counting the appropriate files on disk ourselves, making it a good technical reason for the exporter to be installed on the BBB server, and we could implement the unprocessed recordings metric as a byproduct of this optimization. As I understand it, Scalelite helps mitigate this problem by caching certain things, or am I wrong? Besides your API response times (especially for recordings), how long does it take for the exporter to give you a response (i.e. the duration of the /metrics request)?
First, just so we're on the same page: the record count is 18,500. I will add a server's output at the end of this message. But I can say that I had to tweak Prometheus's scrape interval and timeout values. Currently it takes approx. a minute (sometimes a little less, sometimes a little more) to get the results.
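For reference, the kind of tweak described above is a per-job override in prometheus.yml (job name and target address are placeholders; only scrape_interval and scrape_timeout matter here):

```yaml
scrape_configs:
  - job_name: "bbb-exporter"
    scrape_interval: 2m    # the 1m default is too tight when /metrics takes ~60 s
    scrape_timeout: 100s   # must not exceed scrape_interval
    static_configs:
      - targets: ["bbb01.example.org:9688"]
```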
That explains the latency.
That sounds good and reasonable.
If you are referring to "Unprocessed Video Count", I would say yes. Scalelite only balances meetings according to the ongoing meeting count; meeting participants are not taken into consideration (they will be in upcoming versions, as they say). So sometimes a server's unprocessed queue can get bigger (because of long meetings, etc.). As for API response time, Prometheus reports that the scrape duration is between 40 and 80 seconds.
I just made a preview build with an optimization that computes the number of published and deleted recordings from disk (/var/bigbluebutton) and not via the API. In my case this substantially decreased the metric scrape time; see the graph below. I also added the requested optional metric: bbb_recordings_unprocessed. @penguennoktanet, I would love your feedback. I have updated all the Grafana dashboards and the documentation. All changes are currently on the
If you have installed the exporter using Docker, a minor change is required; see the example docker-compose file.
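A rough idea of the disk-based optimization, not the exporter's actual code: it assumes the standard BBB layout where each recording lives in a directory named after its record ID under /var/bigbluebutton/published/&lt;format&gt; and /var/bigbluebutton/deleted/&lt;format&gt;; the function name is illustrative:

```python
import os

def count_recordings(base="/var/bigbluebutton"):
    """Count published and deleted recordings straight from disk,
    avoiding the slow getRecordings API round trip.

    A record ID may appear under several formats (e.g. "presentation"),
    so IDs are collected into a set before counting.
    """
    counts = {}
    for state in ("published", "deleted"):
        record_ids = set()
        state_dir = os.path.join(base, state)
        if os.path.isdir(state_dir):
            for fmt in os.scandir(state_dir):  # format dirs, e.g. "presentation"
                if fmt.is_dir():
                    record_ids.update(e.name for e in os.scandir(fmt.path) if e.is_dir())
        counts[state] = len(record_ids)
    return counts
```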
Don't think that I missed this; I will try it tomorrow and get back to you. On second thought, I had some spare time and wanted to try... Some good news here: I tested it on my bbb01 server, which has 1500+ recordings published (1596 to be exact), 9 unprocessed, etc. Everything seems to be fine. The other issue of published record count = 0 (link) is also fixed. The response time dropped to 1/10: before it was around ~1 minute (varying between 52 and 59 seconds), now it is 6 seconds total. Huge performance leap. I think I will update all my servers to this version. :) PS: bbb_recordings_unprocessed seems to be fine too.
Great news. I'll let you upgrade all your servers and wait a day or two to see if any issues pop up. If not, I'll merge into master and make a proper release 👍.
6 seconds is still pretty long for a simple count, though. Does your API 95th-percentile latency show any "long" (i.e. 1 second+) requests? I wonder if this is a counting issue (counting the number of files in the directory) or an API response + parsing issue.
Here is more data; the scrape duration is taken from Prometheus's targets page. Meetings are ongoing during this period, so your code probably works fine (1563 recordings / 1.9 s, 1652 / 2.48 s). As for the high response time, I don't know the reason yet.
In Grafana (either the server-instance or the all-servers dashboard), can you check the API 95th-percentile latency panel? It breaks down the time it takes to request and parse each specific API endpoint. I wonder if there is another recordings metric that we are not reading from disk that is taking up so much time. Currently only published, deleted, and unprocessed are calculated from disk; processing, processed, and unpublished are still being requested via the API.
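If that panel is built on a Prometheus histogram, the 95th-percentile query typically looks like this (the metric name bbb_api_latency and the endpoint label are assumptions; substitute whatever the exporter actually exposes):

```promql
histogram_quantile(
  0.95,
  sum by (endpoint, le) (rate(bbb_api_latency_bucket[5m]))
)
```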
Sorry, at first I did not understand the metrics you requested. We are using a customized Grafana output. It seems like the delay is caused by the getRecordings processing and getRecordings processed sections. The raw output of the exporter is below:
Hmm, interesting. I'm going to implement scraping
The latest branch Replacing the version in the I'm currently leaning on removing the
You are right: my processing and processed lists are exactly the same (tested via the BBB API). And good question; I do not know the difference. I asked the bigbluebutton-setup group; if an answer arrives, I'll let you know. Also, I upgraded to the preview2 release. The scrape time of the servers with large processing values almost halved compared to preview1 (probably because of the processed metric). As for removing the processed metric: I do not watch that metric on my systems, so it does not bother me if it's gone. But again, I do not know what it "means"... maybe someone will miss it.
A brief search regarding
Source: bigbluebutton/bigbluebutton#5151 (comment). Which doesn't appear to be the case 🤔
That would be the case... Sometimes I stop BBB meetings on specific servers to "catch up with the queue". During that time, I run parallel bbb-rap-process-worker scripts (one works through timestamps in descending order and the other in ascending order). The videos in processing/processed status seem to be published already.
Just released v0.4.0-preview3. It has the
If there aren't any bug reports, I'll merge into master in a day or two. @penguennoktanet, if you have a custom dashboard, you might want to import the newest one (without overwriting your customized one) and see if there are any new panels to copy into your customized dashboard. The dashboards have changed quite heavily since the first version.
I have been using preview2 for 3 days. Nothing seems to be broken.
I checked them. Thanks for the tip.
Implemented in release v0.4.0 |
Unprocessed video count (the videos in the queue that are waiting to be processed) would be fine.