Motivation
Have enhanced 24/7 production visibility into Falco's software functioning to assess what may be causing high CPU or memory usage or possible memory leaks.
Feature
Expose additional existing libsinsp stats and add more relevant stats. Ultimately, the goal is to make them available in Falco metrics.
This will help us better debug CPU and memory usage of Falco or custom libs clients in production, especially because periodic metrics snapshots can be taken 24/7. Running Falco in special debug mode is more difficult in production for various reasons.
This issue will track possible metrics to be added to augment the substantial amount of metrics and counters that we already expose to consumers:
libsinsp state:
Here are some existing counters from the results of string searching for #ifdef GATHER_INTERNAL_STATS that we should evaluate:
uint64_t m_n_preemptions;
uint64_t m_n_noncached_fd_lookups;
uint64_t m_n_cached_fd_lookups;
uint64_t m_n_failed_fd_lookups;
uint64_t m_n_threads;
uint64_t m_n_fds;
uint64_t m_n_added_fds;
uint64_t m_n_removed_fds;
uint64_t m_n_stored_evts;
uint64_t m_n_store_drops;
uint64_t m_n_retrieved_evts;
uint64_t m_n_retrieve_drops;
sinsp_thread_manager
m_failed_lookups
m_cached_lookups
m_non_cached_lookups
m_added_threads
m_removed_threads
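As an illustration of how counters like these could be used once they are exposed, here is a minimal sketch that computes the delta between two periodic snapshots and derives a cache hit ratio. The struct and function names are hypothetical and are not libsinsp types. The sketch also assumes that cached and non-cached lookups are disjoint and that the counters only ever increase.

```cpp
#include <cstdint>

// Hypothetical snapshot of lookup counters. Field names mirror the counters
// listed above, but this struct is an illustration, not a libsinsp type.
struct lookup_stats_snapshot {
    std::uint64_t cached_lookups;
    std::uint64_t non_cached_lookups;
    std::uint64_t failed_lookups;
};

// Difference between two periodic snapshots.
// Assumes monotonically increasing counters (cur >= prev for every field).
inline lookup_stats_snapshot counters_delta(const lookup_stats_snapshot& prev,
                                            const lookup_stats_snapshot& cur) {
    return lookup_stats_snapshot{
        cur.cached_lookups - prev.cached_lookups,
        cur.non_cached_lookups - prev.non_cached_lookups,
        cur.failed_lookups - prev.failed_lookups};
}

// Fraction of lookups served from the cache, in [0, 1].
// Returns 0.0 when no lookups were recorded, to avoid dividing by zero.
inline double cache_hit_ratio(const lookup_stats_snapshot& s) {
    const std::uint64_t total = s.cached_lookups + s.non_cached_lookups;
    if (total == 0) {
        return 0.0;
    }
    return static_cast<double>(s.cached_lookups) / static_cast<double>(total);
}
```

A hit ratio that keeps falling between snapshots, or a failed-lookup delta that keeps growing, would be the kind of signal worth correlating with CPU usage.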
overall server load
In addition to the recently added CPU and memory usage snapshot metrics, we should also expose the following:
Overall server CPU usage
Overall server memory usage
The total number of currently running threads on the server, which serves as the ground truth for assessing the stability of our libsinsp state cache
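On Linux, the host-wide values above could be read from procfs. The following is only a sketch under that assumption, with hypothetical helper names; it is not how Falco or libs collects them. It parses the `/proc/loadavg` format, whose fourth field is `running/total` kernel scheduling entities (processes and threads), and looks up one key from the `/proc/meminfo` format. Load average is only a coarse proxy for overall load, not a CPU percentage.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical container for values parsed from the /proc/loadavg format.
struct host_load {
    double load_1m;                  // 1-minute load average
    std::uint64_t running_entities;  // currently runnable scheduling entities
    std::uint64_t total_entities;    // scheduling entities that currently exist
};

// Parses a line such as "0.20 0.18 0.12 1/80 11206".
// Returns false if the input does not match that shape.
inline bool parse_loadavg(std::istream& in, host_load& out) {
    double load_5m = 0.0;
    double load_15m = 0.0;
    char slash = 0;
    if (!(in >> out.load_1m >> load_5m >> load_15m >> out.running_entities >>
          slash >> out.total_entities)) {
        return false;
    }
    return slash == '/';
}

// Finds "<key>: <value> kB" in /proc/meminfo-style input and stores the value.
// Returns false if the key is missing or its value is not a number.
inline bool parse_meminfo_kb(std::istream& in, const std::string& key,
                             std::uint64_t& out_kb) {
    const std::string prefix = key + ":";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream fields(line.substr(prefix.size()));
            return static_cast<bool>(fields >> out_kb);
        }
    }
    return false;
}
```

In practice the streams would be `std::ifstream("/proc/loadavg")` and `std::ifstream("/proc/meminfo")`. Taking a `std::istream&` keeps the parsers testable without a live procfs. `total_entities` is the host-wide figure that the number of threads tracked by libsinsp could be compared against.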
This seems like a reasonable point! I'm a little bit worried about our planning for Falco 0.37, though. Looking at the libs 0.14.0 milestone and the Falco 0.37 milestone, we already have tons of stuff like:
ia32 support
k8s client
falco-driver-loader refactor
memleak issue
...
Probably we need to discuss what we really want to do in the next release :/
To get insight into all these metrics, it would be great to expose them through a Prometheus-style HTTP metrics endpoint or to integrate an OpenTelemetry client, so we can easily push them to a metrics database and view, graph, and analyse them with, e.g., Grafana.
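To make the Prometheus-style suggestion concrete, here is a minimal sketch of rendering one counter in the Prometheus text exposition format (`# HELP`, `# TYPE`, then the sample line). The function name and the metric name used below are hypothetical and not part of Falco. An actual endpoint would also need an HTTP server, which is out of scope here.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Renders a single counter in the Prometheus text exposition format.
// `name` and `help` are supplied by the caller; this sketch does not escape
// special characters in `help`, so it assumes plain single-line text.
inline std::string render_counter(const std::string& name,
                                  const std::string& help,
                                  std::uint64_t value) {
    std::ostringstream out;
    out << "# HELP " << name << ' ' << help << '\n'
        << "# TYPE " << name << " counter\n"
        << name << ' ' << value << '\n';
    return out.str();
}
```

For example, `render_counter("example_cached_fd_lookups_total", "Cached fd lookups.", 42)` yields three lines that a Prometheus scraper can ingest, and Grafana can then graph the resulting series.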