[TRACKING] Enhanced libs stats / metrics #1347

incertum · 2023-09-12T07:52:09Z

Motivation

Gain continuous (24/7) production visibility into how Falco is functioning, so we can assess what may be causing high CPU or memory usage, or possible memory leaks.

Feature

Expose additional existing libsinsp stats and add more relevant stats. Ultimately, the goal is to make them available in Falco metrics.

This will help us better debug CPU and memory usage of Falco or custom libs clients in production, especially because periodic metrics snapshots can be taken 24/7. Running Falco in a special debug mode is harder to do in production for various reasons.

This issue will track possible metrics to add on top of the substantial number of metrics and counters that we already expose to consumers:

libsinsp state:

Here are some existing counters from the results of string searching for #ifdef GATHER_INTERNAL_STATS that we should evaluate:

sinsp_thread_manager

overall server load

In addition to the recently added CPU and memory usage snapshot metrics, we should also expose the following:

Overall server CPU usage
Overall server memory usage
The total number of currently running threads on the server, which serves as the ground truth for assessing the stability of our libsinsp state cache
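As a rough illustration of where these three host-wide values could come from on Linux, here is a minimal Python sketch that parses the text of /proc/stat, /proc/meminfo and /proc/loadavg. It is not how libsinsp or Falco collect them; the function names are hypothetical, and it assumes a kernel that reports MemAvailable.

```python
"""Sketch: host-wide CPU, memory and thread counts from Linux /proc text.

All function names are hypothetical; this is not libsinsp or Falco code.
"""


def parse_cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal guest guest_nice
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # Guest time is already included in user/nice, so sum only the first 8.
            total = sum(values[:8])
            return total - idle, total
    raise ValueError("no aggregate 'cpu' line found")


def cpu_usage_percent(prev, curr):
    """CPU usage between two (busy, total) snapshots, as a percentage."""
    busy_delta = curr[0] - prev[0]
    total_delta = curr[1] - prev[1]
    return 100.0 * busy_delta / total_delta if total_delta > 0 else 0.0


def parse_mem_used_kb(meminfo_text):
    """Return MemTotal - MemAvailable in kB from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values["MemTotal"] - values["MemAvailable"]


def parse_total_threads(loadavg_text):
    """Return the number of existing scheduling entities from /proc/loadavg."""
    # The fourth field looks like "1/80": runnable / total scheduling entities
    # (processes and threads, including kernel threads).
    return int(loadavg_text.split()[3].split("/")[1])
```

On a Linux host, each function would be fed the contents of the corresponding file, for example `parse_total_threads(open("/proc/loadavg").read())`. CPU usage needs two snapshots of /proc/stat taken some interval apart, because the counters are cumulative since boot.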


Andreagit97 · 2023-09-12T11:23:49Z

This seems like a reasonable point! I'm a little bit worried about our planning for Falco 0.37, though. Looking at the libs milestone 0.14.0 and the Falco milestone 0.37, we already have tons of stuff, like:

ia32 support
k8s client
falco-driver-loader refactor
memleak issue
...

Probably we need to discuss what we really want to do in the next release :/

sboschman · 2023-09-28T08:05:32Z

To get insight into all these metrics, it would be great to expose them through a Prometheus-style HTTP metrics endpoint or to integrate an OpenTelemetry client, so we can easily push them to a metrics database and view/graph/analyse them with, e.g., Grafana.
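To make the Prometheus-style idea concrete, here is a minimal Python sketch that renders a set of gauge values in the Prometheus text exposition format (`# HELP`, `# TYPE`, then the sample line). The `falco_example_` prefix, the metric name and the `render_prometheus` helper are assumptions made up for illustration, not existing Falco or libs names.

```python
def render_prometheus(metrics, prefix="falco_example_"):
    """Render {name: (help_text, value)} as Prometheus text exposition format.

    Every metric is emitted as a gauge; the prefix and names are hypothetical.
    """
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        full_name = prefix + name
        lines.append(f"# HELP {full_name} {help_text}")
        lines.append(f"# TYPE {full_name} gauge")
        lines.append(f"{full_name} {value}")
    # The exposition format expects the payload to end with a newline.
    return "\n".join(lines) + "\n"
```

A scrape endpoint would return this string with a `text/plain` content type; how (or whether) Falco itself serves such an endpoint is outside what this thread specifies.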

incertum added the kind/feature New feature or request label Sep 12, 2023

incertum mentioned this issue Sep 12, 2023

OOM on physical servers falcosecurity/falco#2495

Open

incertum added this to the 0.14.0 milestone Sep 12, 2023

This was referenced Oct 20, 2023

cleanup(libsinsp): consolidation and extension of libsinsp stats / metrics sinsp_stats_v2 #1433

Merged

update(userspace/falco): add libsinsp state metrics option falcosecurity/falco#2883

Merged

incertum closed this as completed Nov 19, 2023
