Unprocessed video metric #12

Closed
penguennoktanet opened this issue Apr 15, 2020 · 29 comments
Labels
enhancement New feature or request

Comments

@penguennoktanet

An unprocessed video count (the videos in the queue that are waiting to be processed) would be nice...

@greenstatic
Owner

I'm afraid BBB's API doesn't expose that information; http://docs.bigbluebutton.org/dev/api.html#getrecordings

@penguennoktanet
Author

penguennoktanet commented Apr 16, 2020

You are absolutely right...

I added this feature via a netdata module. (I know it's unrelated to this project, but in case anyone is curious... and yes, I know the code is not pretty.)

# -*- coding: utf-8 -*-

import os

from bases.FrameworkServices.SimpleService import SimpleService

priority = 90000

ORDER = [
    'bbb_unprocessed',
]

CHARTS = {
    'bbb_unprocessed': {
        'options': [None, 'Unprocessed Video Count', 'videos', 'bbb', 'bbb', 'line'],
        'lines': [
            ['unprocessed']
        ]
    }
}


class Service(SimpleService):
    def __init__(self, configuration=None, name=None):
        SimpleService.__init__(self, configuration=configuration, name=name)
        self.order = ORDER
        self.definitions = CHARTS

    @staticmethod
    def check():
        return True

    def get_data(self):
        # TODO: move the hard-coded path into the module's configuration file.
        DIR = '/var/bigbluebutton/recording/status/sanity'
        # Each file in this directory corresponds to a recording that is still
        # waiting to be processed, so the file count is the queue length.
        count = len([name for name in os.listdir(DIR)
                     if os.path.isfile(os.path.join(DIR, name))])
        return {'unprocessed': count}
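(Deployment note, hedged: on a typical netdata install, a custom python.d module like this goes under /usr/libexec/netdata/python.d/ with a matching .conf file under /etc/netdata/python.d/, followed by a netdata restart. The exact paths and module name depend on the install, so treat them as assumptions rather than instructions.)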

@greenstatic
Owner

This looks interesting and potentially useful. Could you maybe fix the formatting of the code?

In the WYSIWYG editor on GitHub, select "Insert code" and copy & paste the code with proper formatting.

@penguennoktanet
Author

Updated the post.

The code and naming should still be cleaned up (move DIR into a config file, count only the ".done" files, etc.). With this setup, netdata_bbb_videos_average gives the unprocessed file count.

On second thought, maybe a netdata module for all BBB data could be written...

@greenstatic
Owner

I'll take a look at this during the weekend; maybe we can modify this and bundle it together with the exporter. From a quick look at the code, I suspect netdata is not really required for this to function, since you are just checking how many files there are in a specific directory. We could accomplish this directly in the exporter with a read-only volume mount to /var/bigbluebutton/recording. What do you think?

@penguennoktanet
Author

Sorry GS, I did not notice your reply.

You are right. The code's main block is independent from netdata. It simply counts the files under the /var/bigbluebutton/recording/status/sanity directory. Normally this folder contains ".done" files; when processing is done, the ".done" file is removed.
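(To make the exporter-side idea from the previous comment concrete, here is a minimal sketch using prometheus_client that counts the ".done" files described above and publishes them as a gauge. The metric name matches the one the exporter adds later in this thread; the port and the update interval are arbitrary assumptions.)

import os
import time

from prometheus_client import Gauge, start_http_server

SANITY_DIR = '/var/bigbluebutton/recording/status/sanity'

UNPROCESSED = Gauge(
    'bbb_recordings_unprocessed',
    'Total number of BigBlueButton recordings enqueued to be processed',
)


def count_unprocessed(path=SANITY_DIR):
    # Each ".done" file is a recording still waiting in the processing queue;
    # the file disappears once processing finishes.
    return sum(1 for name in os.listdir(path) if name.endswith('.done'))


if __name__ == '__main__':
    start_http_server(9688)  # port chosen for illustration only
    while True:
        UNPROCESSED.set(count_unprocessed())
        time.sleep(30)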

@greenstatic
Owner

Ah, I knew I forgot something. I'll reopen this so I won't forget. I'll take a deep look at this next week when my schedule clears up a bit. Great insight regarding the .done files.

@greenstatic greenstatic reopened this May 1, 2020
@greenstatic greenstatic added the enhancement New feature or request label May 1, 2020
@greenstatic greenstatic changed the title New Metric request Unprocessed video metric May 8, 2020
@greenstatic
Owner

In your use case, what is the size of this metric? I'm not too thrilled about adding a metric that bypasses the API and accesses the filesystem of BBB.

Although I can see this metric becoming optional down the road, when we will (probably) need to optimize the currently slow recordings metrics - at that point we will probably gather this information directly from the filesystem of BBB.

@penguennoktanet
Author

In your use case, what is the size of this metric? I'm not too thrilled about adding a metric that bypasses the API and accesses the filesystem of BBB.

I have 18 BBB servers, which are managed by Scalelite. Some servers' unprocessed queues can go up to ~150 in a day. (If so, I temporarily "disable" that server, then run multiple concurrent video processing jobs.)

And I understand the dilemma. Right now your BBB-exporter does not have to be on the same server as BBB. But if you handle it via the FS, it will be "required".

@greenstatic
Owner

18 BBB servers! That's quite a lot to manage 😅. Are these running on older HW or do you have that many concurrent users?

150 recordings in the unprocessed queue per server? That's quite a lot; how many recordings do you have per day (per server)? In our use case we didn't even notice the need for this metric (~230 recordings in total). If the need is there for this metric, which it most probably is for you, I can implement it as an optional metric which requires that you install the exporter on the same machine (or rather, that it has access to the FS of BBB).

And I understand the dilemma. Right now your BBB-exporter does not have to be on the same server as BBB. But if you handle it via the FS, it will be "required".

Correct, you can run the exporter for example on your personal machine (during development) or on the monitoring server if that's what you prefer. But like I mentioned, if the need is there, we can implement it in the coming days as an optional metric.

Would be great if you could test it out before making it official.

@penguennoktanet
Author

18 BBB servers! That's quite a lot to manage. Are these running on older HW or do you have that many concurrent users?

1 DB, 1 Moodle and 1 Scalelite, so 21 in total. They are not old HW; they are each 8-core VPSes serving an entire university's online course system via distance learning due to the Covid-19 outbreak. Concurrently I have had more than 250 meetings and over 4000 participants.

150 recordings in the unprocessed queue per server? That's quite a lot; how many recordings do you have per day (per server)?

For the last 45 days, we had ~18,500 recordings that are longer than 5 minutes (recordings shorter than 5 minutes are automatically "archived"). According to my calculations, the per-server daily average is around 40 (including the archived ones).

In our use case we didn't even notice the need for this metric (~230 recordings in total). If the need is there for this metric, which it most probably is for you, I can implement it as an optional metric which requires that you install the exporter on the same machine (or rather, that it has access to the FS of BBB).

That's right. Each server has its own bbb-exporter and netdata. :)

And I understand the dilemma. Right now your BBB-exporter does not have to be on the same server as BBB. But if you handle it via the FS, it will be "required".

Correct, you can run the exporter for example on your personal machine (during development) or on the monitoring server if that's what you prefer. But like I mentioned, if the need is there, we can implement it in the coming days as an optional metric.

Would be great if you could test it out before making it official.

It would be my honor to test it :)

@greenstatic
Owner

greenstatic commented May 13, 2020

4000 participants with 250 rooms and 18 500 recordings in total - those are some pretty impressive figures. I'm really curious what your API response times are, especially for recordings metrics.

On our (non-scalelite) BBB installation we are nearing 10 seconds for the recording metrics. This is due to BBB's API response which returns loads of data (entire object for each recording), just so we can perform a simple count.

This led to the implementation of the env var RECORDINGS_METRICS for cases where massive metrics collection delays are unacceptable. I'm sure we could optimize this by counting the appropriate files on disk ourselves, making it a good technical reason for the exporter to be installed on the BBB server, and we could implement the unprocessed recordings metric as a byproduct of this optimization.
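(As a rough illustration of the "count the appropriate files on disk ourselves" idea, not the exporter's actual code: the per-state counting could look roughly like the sketch below. Only the sanity path is confirmed in this thread; the published/deleted subdirectories are assumptions for a default install using the presentation recording format.)

import os

# Assumed directory layout; only the sanity path is mentioned in this thread.
STATE_DIRS = {
    'published': '/var/bigbluebutton/published/presentation',
    'deleted': '/var/bigbluebutton/deleted/presentation',
    'unprocessed': '/var/bigbluebutton/recording/status/sanity',
}


def count_entries(path):
    # Published/deleted recordings show up as directories, while the unprocessed
    # queue is made of ".done" files; counting directory entries covers both.
    try:
        return len(os.listdir(path))
    except FileNotFoundError:
        return 0


def recording_counts():
    return {state: count_entries(path) for state, path in STATE_DIRS.items()}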

As I understand it Scalelite helps mitigate this problem by caching certain things or am I wrong? Besides your API response times (especially for recordings) how long does it take for the exporter to give you a response (so the duration of the /metrics request)?

@penguennoktanet
Author

4000 participants with 250 rooms and 185000 recordings in total - those are some pretty impressive figures. I'm really curious what your API response times are, especially for recordings metrics.

First, just to be on the same page: the record count is 18,500. I will add a server's output at the end of this message. But I can say that I had to tweak the Prometheus scrape interval and timeout values. Currently it takes approx. a minute (sometimes a little less, sometimes a little more) to get the results.

On our (non-scalelite) BBB installation we are nearing 10 seconds for the recording metrics. This is due to BBB's API response which returns loads of data (entire object for each recording), just so we can perform a simple count.

That explains the latency.

This led to the implementation of the env var RECORDINGS_METRICS for cases where massive metrics collection delays are unacceptable. I'm sure we could optimize this by counting the appropriate files on disk ourselves, making it a good technical reason for the exporter to be installed on the BBB server, and we could implement the unprocessed recordings metric as a byproduct of this optimization.

That sounds good and reasonable.

As I understand it Scalelite helps mitigate this problem by caching certain things or am I wrong? Besides your API response times (especially for recordings) how long does it take for the exporter to give you a response (so the duration of the /metrics request)?

If you are referring to the "Unprocessed Video Count", I would say yes. Scalelite only balances meetings according to the "ongoing meeting count"; the meeting participants are not taken into consideration (in upcoming versions they will be, as they say). So sometimes a server's unprocessed queue can get bigger (because of long meetings, etc.).

As for API response time, Prometheus reports that the scrape duration is between 40 and 80 seconds.

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 97.0
python_gc_objects_collected_total{generation="1"} 0.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 53.0
python_gc_collections_total{generation="1"} 4.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="7",patchlevel="7",version="3.7.7"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.4496512e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.3506944e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58935611997e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 27.58
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP bbb_meetings Number of BigBlueButton meetings
# TYPE bbb_meetings gauge
bbb_meetings 0.0
# HELP bbb_meetings_participants Total number of participants in all BigBlueButton meetings
# TYPE bbb_meetings_participants gauge
bbb_meetings_participants 0.0
# HELP bbb_meetings_listeners Total number of listeners in all BigBlueButton meetings
# TYPE bbb_meetings_listeners gauge
bbb_meetings_listeners 0.0
# HELP bbb_meetings_voice_participants Total number of voice participants in all BigBlueButton meetings
# TYPE bbb_meetings_voice_participants gauge
bbb_meetings_voice_participants 0.0
# HELP bbb_meetings_video_participants Total number of video participants in all BigBlueButton meetings
# TYPE bbb_meetings_video_participants gauge
bbb_meetings_video_participants 0.0
# HELP bbb_recordings_processing Total number of BigBlueButton recordings processing
# TYPE bbb_recordings_processing gauge
bbb_recordings_processing 23.0
# HELP bbb_recordings_processed Total number of BigBlueButton recordings processed
# TYPE bbb_recordings_processed gauge
bbb_recordings_processed 23.0
# HELP bbb_recordings_published Total number of BigBlueButton recordings published
# TYPE bbb_recordings_published gauge
bbb_recordings_published 0.0
# HELP bbb_recordings_unpublished Total number of BigBlueButton recordings unpublished
# TYPE bbb_recordings_unpublished gauge
bbb_recordings_unpublished 0.0
# HELP bbb_recordings_deleted Total number of BigBlueButton recordings deleted
# TYPE bbb_recordings_deleted gauge
bbb_recordings_deleted 0.0
# HELP bbb_api_up 1 if BigBlueButton API is responding 0 otherwise
# TYPE bbb_api_up gauge
bbb_api_up 1.0
# HELP bbb_api_latency BigBlueButton API call latency
# TYPE bbb_api_latency histogram
bbb_api_latency_bucket{endpoint="getMeetings",le="0.01",parameters=""} 0.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.025",parameters=""} 0.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.05",parameters=""} 108.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.075",parameters=""} 136.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.1",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.25",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.5",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.75",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.0",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.25",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.5",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.75",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="2.0",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="2.5",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="5.0",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="7.5",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="10.0",parameters=""} 138.0
bbb_api_latency_bucket{endpoint="getMeetings",le="+Inf",parameters=""} 138.0
bbb_api_latency_count{endpoint="getMeetings",parameters=""} 138.0
bbb_api_latency_sum{endpoint="getMeetings",parameters=""} 5.919447660446167
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=processing"} 1.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=processing"} 9.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=processing"} 42.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=processing"} 96.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=processing"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=processing"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=processing"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=processing"} 138.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=processing"} 138.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=processing"} 333.4787197113037
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=processed"} 2.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=processed"} 20.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=processed"} 51.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=processed"} 114.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=processed"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=processed"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=processed"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=processed"} 138.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=processed"} 138.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=processed"} 304.65675616264343
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=published"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=published"} 138.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=published"} 138.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=published"} 8288.475729703903
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=unpublished"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=unpublished"} 1.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=unpublished"} 103.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=unpublished"} 130.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=unpublished"} 137.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=unpublished"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=unpublished"} 138.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=unpublished"} 138.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=unpublished"} 6.3337719440460205
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=deleted"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=deleted"} 2.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=deleted"} 104.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=deleted"} 133.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=deleted"} 136.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=deleted"} 138.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=deleted"} 138.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=deleted"} 138.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=deleted"} 6.202455997467041

@greenstatic
Owner

I just made a preview build with an optimization that computes the number of published and deleted recordings from disk (/var/bigbluebutton) and not via the API. In my case this substantially decreased the metric scrape time, see the graph below:

[Screenshot: graph of the decreased metric scrape time, 2020-05-18]

I also added the requested optional metric: bbb_recordings_unprocessed.

@penguennoktanet would love your feedback.

I have updated all the Grafana dashboards and the documentation. All changes are currently on the v0.4.0 branch. The preview Docker image is tagged v0.4.0-preview.

If you have installed the exporter using Docker, a minor change is required; see the example docker-compose file.
Make sure you have RECORDINGS_METRICS_READ_FROM_DISK set to true.
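(Illustrative sanity check only; the URL and port are assumptions for a local install. The new optional metric should only show up when the exporter can actually read BBB's filesystem, so grepping /metrics for it is a quick way to confirm the setting took effect.)

import requests


def unprocessed_metric_present(metrics_url='http://localhost:9688/metrics'):
    # The exporter's plain-text /metrics output should contain the new optional
    # metric once RECORDINGS_METRICS_READ_FROM_DISK is active.
    body = requests.get(metrics_url, timeout=10).text
    return 'bbb_recordings_unprocessed' in body


if __name__ == '__main__':
    print(unprocessed_metric_present())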

@penguennoktanet
Author

penguennoktanet commented May 19, 2020

I just made a preview build with an optimization that computes the number of published and deleted recordings from disk (/var/bigbluebutton) and not via the API. In my case this substantially decreased the metric scrape time, see the graph below:
I also added the requested optional metric: bbb_recordings_unprocessed.
@penguennoktanet would love your feedback.
Make sure you have RECORDINGS_METRICS_READ_FROM_DISK set to true.

Don't think that I missed this. I will try it tomorrow and get back to you.

On second thought: I had some spare time and wanted to try it... Some good news here.

I tested it on my bbb01 server, which has 1500+ published recordings (1596 to be exact), 9 unprocessed, etc. Everything seems to be fine. The other issue of the published record count being 0 (link) is also fixed.

As for the response times, they dropped to about 1/10. Before, it was around ~1 minute (it varied between 52 and 59 seconds); now it is 6 seconds total. A huge performance leap.

I think I will update all my servers to this version. :)

PS: bbb_recordings_unprocessed seems to be fine too.

@greenstatic
Owner

Great news! I'll let you upgrade all your servers and wait a day or two to see if any issues pop up. If not, I'll merge into master and make a proper release 👍.
Thanks for your input!

@greenstatic
Owner

6 seconds is still pretty long for a simple count, though. Does your API 95th percentile latency show any "long" (i.e. 1 second+) requests? I wonder if this is a counting issue (counting the number of files in the directory) or an API response + parsing issue.
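(One rough way to split the two suspects, sketched below. The published directory path is an assumption based on the default presentation-format layout, and the API call needs a fully signed getRecordings URL including its checksum.)

import os
import time

import requests


def time_disk_count(path='/var/bigbluebutton/published/presentation'):
    # How long does the plain directory count take on its own?
    start = time.monotonic()
    count = sum(1 for _ in os.scandir(path))
    return count, time.monotonic() - start


def time_api_call(url):
    # url must be a complete, signed request, e.g.
    # https://bbb.example.org/bigbluebutton/api/getRecordings?state=processing&checksum=...
    start = time.monotonic()
    response = requests.get(url, timeout=120)
    return len(response.content), time.monotonic() - start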

@penguennoktanet
Author

penguennoktanet commented May 19, 2020

your API 95th percentile latency

Here is more data:
1949 published videos, 17.92s scrape duration
1688 published, 6.53s scrape
1652 published, 2.48s scrape
1602 published, 5.8s scrape
1563 published, 1.9s scrape

The scrape duration is taken from Prometheus's Targets page. Meetings are ongoing during this period.

So your code probably works fine (1563 / 1.9s, 1652 / 2.48s). As for the high response times, I don't know the reason yet.

@greenstatic
Owner

In Grafana (either the server instance or the all servers dashboard), can you check the API 95th percentile latency panel? It breaks down the time it takes to request + parse a specific API endpoint. I wonder if there is another recordings metric that we are not reading from disk that is taking up so much time.

Currently only published, deleted and unprocessed are calculated from disk. Processing, processed and unpublished are still being requested via the API.

@penguennoktanet
Author

Sorry, at first I did not understand which metrics you requested. We are using a customized Grafana dashboard. It seems like the delay is caused by the getRecordings processing and getRecordings processed sections. The raw output of the exporter is below:

# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 97.0
python_gc_objects_collected_total{generation="1"} 0.0
python_gc_objects_collected_total{generation="2"} 255.0
# HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 912.0
python_gc_collections_total{generation="1"} 82.0
python_gc_collections_total{generation="2"} 7.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="7",patchlevel="7",version="3.7.7"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 3.47004928e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.859008e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.58989304445e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 132.08
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 6.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP bbb_meetings Number of BigBlueButton meetings
# TYPE bbb_meetings gauge
bbb_meetings 1.0
# HELP bbb_meetings_participants Total number of participants in all BigBlueButton meetings
# TYPE bbb_meetings_participants gauge
bbb_meetings_participants 29.0
# HELP bbb_meetings_listeners Total number of listeners in all BigBlueButton meetings
# TYPE bbb_meetings_listeners gauge
bbb_meetings_listeners 27.0
# HELP bbb_meetings_voice_participants Total number of voice participants in all BigBlueButton meetings
# TYPE bbb_meetings_voice_participants gauge
bbb_meetings_voice_participants 1.0
# HELP bbb_meetings_video_participants Total number of video participants in all BigBlueButton meetings
# TYPE bbb_meetings_video_participants gauge
bbb_meetings_video_participants 0.0
# HELP bbb_meetings_participant_clients Total number of participants in all BigBlueButton meetings by client
# TYPE bbb_meetings_participant_clients gauge
bbb_meetings_participant_clients{type="html5"} 29.0
bbb_meetings_participant_clients{type="dial-in"} 0.0
bbb_meetings_participant_clients{type="flash"} 0.0
# HELP bbb_recordings_processing Total number of BigBlueButton recordings processing
# TYPE bbb_recordings_processing gauge
bbb_recordings_processing 223.0
# HELP bbb_recordings_processed Total number of BigBlueButton recordings processed
# TYPE bbb_recordings_processed gauge
bbb_recordings_processed 223.0
# HELP bbb_recordings_unpublished Total number of BigBlueButton recordings unpublished
# TYPE bbb_recordings_unpublished gauge
bbb_recordings_unpublished 0.0
# HELP bbb_recordings_published Total number of BigBlueButton recordings published (scraped from disk)
# TYPE bbb_recordings_published gauge
bbb_recordings_published 1969.0
# HELP bbb_recordings_deleted Total number of BigBlueButton recordings deleted (scraped from disk)
# TYPE bbb_recordings_deleted gauge
bbb_recordings_deleted 6.0
# HELP bbb_recordings_unprocessed Total number of BigBlueButton recordings enqueued to be processed (scraped from disk)
# TYPE bbb_recordings_unprocessed gauge
bbb_recordings_unprocessed 1.0
# HELP bbb_api_latency BigBlueButton API call latency
# TYPE bbb_api_latency histogram
bbb_api_latency_bucket{endpoint="getMeetings",le="0.01",parameters=""} 0.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.025",parameters=""} 1.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.05",parameters=""} 432.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.075",parameters=""} 467.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.1",parameters=""} 470.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.25",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.5",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="0.75",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.0",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.25",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.5",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="1.75",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="2.0",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="2.5",parameters=""} 472.0
bbb_api_latency_bucket{endpoint="getMeetings",le="5.0",parameters=""} 473.0
bbb_api_latency_bucket{endpoint="getMeetings",le="7.5",parameters=""} 473.0
bbb_api_latency_bucket{endpoint="getMeetings",le="10.0",parameters=""} 473.0
bbb_api_latency_bucket{endpoint="getMeetings",le="+Inf",parameters=""} 473.0
bbb_api_latency_count{endpoint="getMeetings",parameters=""} 473.0
bbb_api_latency_sum{endpoint="getMeetings",parameters=""} 22.302513122558594
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=processing"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=processing"} 247.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=processing"} 473.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=processing"} 473.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=processing"} 4806.061728715897
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=processed"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=processed"} 222.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=processed"} 473.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=processed"} 473.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=processed"} 4947.400803565979
bbb_api_latency_bucket{endpoint="getRecordings",le="0.01",parameters="state=unpublished"} 0.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.025",parameters="state=unpublished"} 3.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.05",parameters="state=unpublished"} 439.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.075",parameters="state=unpublished"} 465.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.1",parameters="state=unpublished"} 467.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.25",parameters="state=unpublished"} 468.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.5",parameters="state=unpublished"} 468.0
bbb_api_latency_bucket{endpoint="getRecordings",le="0.75",parameters="state=unpublished"} 468.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.0",parameters="state=unpublished"} 469.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.25",parameters="state=unpublished"} 469.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.5",parameters="state=unpublished"} 469.0
bbb_api_latency_bucket{endpoint="getRecordings",le="1.75",parameters="state=unpublished"} 469.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.0",parameters="state=unpublished"} 469.0
bbb_api_latency_bucket{endpoint="getRecordings",le="2.5",parameters="state=unpublished"} 469.0
bbb_api_latency_bucket{endpoint="getRecordings",le="5.0",parameters="state=unpublished"} 471.0
bbb_api_latency_bucket{endpoint="getRecordings",le="7.5",parameters="state=unpublished"} 472.0
bbb_api_latency_bucket{endpoint="getRecordings",le="10.0",parameters="state=unpublished"} 472.0
bbb_api_latency_bucket{endpoint="getRecordings",le="+Inf",parameters="state=unpublished"} 473.0
bbb_api_latency_count{endpoint="getRecordings",parameters="state=unpublished"} 473.0
bbb_api_latency_sum{endpoint="getRecordings",parameters="state=unpublished"} 43.180294036865234
# HELP bbb_api_up 1 if BigBlueButton API is responding 0 otherwise
# TYPE bbb_api_up gauge
bbb_api_up 1.0
# HELP bbb_room_participants BigBlueButton room participants histogram gauge
# TYPE bbb_room_participants histogram
bbb_room_participants_bucket{le="0"} 0.0
bbb_room_participants_bucket{le="1"} 0.0
bbb_room_participants_bucket{le="5"} 0.0
bbb_room_participants_bucket{le="15"} 0.0
bbb_room_participants_bucket{le="30"} 1.0
bbb_room_participants_bucket{le="60"} 1.0
bbb_room_participants_bucket{le="90"} 1.0
bbb_room_participants_bucket{le="120"} 1.0
bbb_room_participants_bucket{le="150"} 1.0
bbb_room_participants_bucket{le="200"} 1.0
bbb_room_participants_bucket{le="250"} 1.0
bbb_room_participants_bucket{le="300"} 1.0
bbb_room_participants_bucket{le="400"} 1.0
bbb_room_participants_bucket{le="500"} 1.0
bbb_room_participants_bucket{le="+Inf"} 1.0
# TYPE bbb_room_participants_gcount gauge
bbb_room_participants_gcount 1.0
# TYPE bbb_room_participants_gsum gauge
bbb_room_participants_gsum 29.0
# HELP bbb_room_listeners BigBlueButton room listeners histogram gauge
# TYPE bbb_room_listeners histogram
bbb_room_listeners_bucket{le="0"} 0.0
bbb_room_listeners_bucket{le="1"} 0.0
bbb_room_listeners_bucket{le="5"} 0.0
bbb_room_listeners_bucket{le="15"} 0.0
bbb_room_listeners_bucket{le="30"} 1.0
bbb_room_listeners_bucket{le="60"} 1.0
bbb_room_listeners_bucket{le="90"} 1.0
bbb_room_listeners_bucket{le="120"} 1.0
bbb_room_listeners_bucket{le="150"} 1.0
bbb_room_listeners_bucket{le="200"} 1.0
bbb_room_listeners_bucket{le="250"} 1.0
bbb_room_listeners_bucket{le="300"} 1.0
bbb_room_listeners_bucket{le="400"} 1.0
bbb_room_listeners_bucket{le="500"} 1.0
bbb_room_listeners_bucket{le="+Inf"} 1.0
# TYPE bbb_room_listeners_gcount gauge
bbb_room_listeners_gcount 1.0
# TYPE bbb_room_listeners_gsum gauge
bbb_room_listeners_gsum 27.0
# HELP bbb_room_voice_participants BigBlueButton room voice participants histogram gauge
# TYPE bbb_room_voice_participants histogram
bbb_room_voice_participants_bucket{le="0"} 0.0
bbb_room_voice_participants_bucket{le="1"} 1.0
bbb_room_voice_participants_bucket{le="5"} 1.0
bbb_room_voice_participants_bucket{le="15"} 1.0
bbb_room_voice_participants_bucket{le="30"} 1.0
bbb_room_voice_participants_bucket{le="60"} 1.0
bbb_room_voice_participants_bucket{le="90"} 1.0
bbb_room_voice_participants_bucket{le="120"} 1.0
bbb_room_voice_participants_bucket{le="+Inf"} 1.0
# TYPE bbb_room_voice_participants_gcount gauge
bbb_room_voice_participants_gcount 1.0
# TYPE bbb_room_voice_participants_gsum gauge
bbb_room_voice_participants_gsum 1.0
# HELP bbb_room_video_participants BigBlueButton room video participants histogram gauge
# TYPE bbb_room_video_participants histogram
bbb_room_video_participants_bucket{le="0"} 1.0
bbb_room_video_participants_bucket{le="1"} 1.0
bbb_room_video_participants_bucket{le="5"} 1.0
bbb_room_video_participants_bucket{le="15"} 1.0
bbb_room_video_participants_bucket{le="30"} 1.0
bbb_room_video_participants_bucket{le="60"} 1.0
bbb_room_video_participants_bucket{le="90"} 1.0
bbb_room_video_participants_bucket{le="120"} 1.0
bbb_room_video_participants_bucket{le="+Inf"} 1.0
# TYPE bbb_room_video_participants_gcount gauge
bbb_room_video_participants_gcount 1.0
# TYPE bbb_room_video_participants_gsum gauge
bbb_room_video_participants_gsum 0.0
# HELP bbb_exporter BigBlueButton Exporter version
# TYPE bbb_exporter gauge
bbb_exporter{version="0.4.0-preview"} 1.0

@greenstatic
Owner

Hmm, interesting. I'm going to implement scraping the processing recordings count from disk to improve this. However, I'm a bit confused regarding processed - it appears to be equal to processing, which is strange. Do you by any chance know the difference? Otherwise I'm going to remove the metric.

@greenstatic
Owner

The latest branch v0.4.0 contains the enhancement of processing recordings being scraped from disk rather than via the API when the option is turned on. The Docker build is tagged as v0.4.0-preview2.

Replacing the version in the docker-compose.yaml file and issuing a sudo docker-compose up -d should be enough to update.

I'm currently leaning towards removing the processed metric; any arguments against this?

@penguennoktanet
Author

Hmm, interesting. I'm going to implement scraping the processing recordings count from disk to improve this. However, I'm a bit confused regarding processed - it appears to be equal to processing, which is strange. Do you by any chance know the difference? Otherwise I'm going to remove the metric.

You are right. My processing and processed lists are exactly the same (tested via the BBB API).

And good question: I do not know the difference. I asked the bigbluebutton-setup group; if an answer arrives, I'll let you know.

Also, I upgraded to the preview2 release. The scrape time of the servers that have big processing values dropped to almost half of preview1's (probably because of the processed metric).

As for removing the processed metric: I do not watch that metric on my systems, so it does not bother me if it's gone. But, again, I do not know what it "means"... Maybe someone will miss it.

@greenstatic
Owner

A brief search regarding processed revealed:

which BTW it is an state that last only for an instant, unless the recording fails in the process.

Source: bigbluebutton/bigbluebutton#5151 (comment)

Which doesn't appear to be the case 🤔

@penguennoktanet
Author

A brief search regarding processed revealed:

which BTW it is an state that last only for an instant, unless the recording fails in the process.

Source: bigbluebutton/bigbluebutton#5151 (comment)

Which doesn't appear to be the case

That would be the case... Sometimes I stop BBB meetings on specific servers to "catch up with the queue". During that time, I run parallel bbb-rap-process-worker scripts (one works in descending timestamp order and the other in ascending order). The videos in processing/processed status seem to be published already.

@greenstatic
Owner

Just released v0.4.0-preview3.

It has the processed recording state metric removed. This should decrease the metric scrape latency a bit.

@greenstatic
Owner

If there aren't any bug reports, I'll merge into master in a day or two.

@penguennoktanet if you have a custom dashboard, you might want to import the newest one (without overwriting your customized one) and see if there are any new panels to copy into your customized dashboard. The dashboards have changed quite heavily since the first version.

@penguennoktanet
Author

If there aren't any bug reports, I'll merge into master in a day or two.

I have been using preview2 for 3 days; nothing seems to be broken.

@penguennoktanet if you have a custom dashboard, you might want to import the newest one (without overwriting your customized one) and see if there are any new panels to copy into your customized dashboard. The dashboards have changed quite heavily since the first version.

I checked them. Thanks for the tip.

@greenstatic
Owner

Implemented in release v0.4.0
