os.cgroup.cpuacct.usage_nanos is actually in microseconds when Elasticsearch is run inside cgroup v2
#96089
Labels
>bug
:Core/Infra/Core
Team:Core/Infra
Elasticsearch Version
master
Installed Plugins
No response
Java Version
bundled
OS Version
5.15.0-1036-azure
Problem Description
When Elasticsearch is run inside a cgroup v2, the node stats output for
"https://elasticsearch:9200/_nodes/stats?filter_path=nodes.*.os.cgroup.cpuacct.usage_nanos"
is actually in microseconds, whereas under cgroup v1 it is correctly reported in nanoseconds. See elasticsearch/server/src/main/java/org/elasticsearch/monitor/os/OsProbe.java, line 695 in f01f07f.
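To illustrate the unit mismatch, here is a minimal Python sketch (not Elasticsearch's Java code in OsProbe) of the two kernel interfaces involved. cgroup v1's cpuacct.usage reports cumulative CPU time in nanoseconds, while cgroup v2's cpu.stat reports usage_usec in microseconds, so the v2 value needs to be multiplied by 1000 before it can be exposed as usage_nanos. The function names are hypothetical and the sample file contents are made up for illustration.

```python
def parse_cpu_stat(text: str) -> dict[str, int]:
    """Parse a cgroup v2 cpu.stat file ("key value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def v1_usage_nanos(cpuacct_usage: str) -> int:
    """cgroup v1: cpuacct.usage is already cumulative CPU time in nanoseconds."""
    return int(cpuacct_usage.strip())


def v2_usage_nanos(cpu_stat: str) -> int:
    """cgroup v2: cpu.stat's usage_usec is in microseconds, so scale by 1000."""
    return parse_cpu_stat(cpu_stat)["usage_usec"] * 1_000


# Hypothetical file contents for a container that has used 2.5 s of CPU time.
V1_SAMPLE = "2500000000\n"
V2_SAMPLE = "usage_usec 2500000\nuser_usec 2000000\nsystem_usec 500000\n"

if __name__ == "__main__":
    print(v1_usage_nanos(V1_SAMPLE))  # 2500000000
    print(v2_usage_nanos(V2_SAMPLE))  # 2500000000
    # Reporting usage_usec unscaled gives 2500000, i.e. 1000x too small.
    print(parse_cpu_stat(V2_SAMPLE)["usage_usec"])  # 2500000
```

Both samples describe the same 2.5 s of CPU time; only the v2 path needs the explicit conversion.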
We collect these stats in Rally's node-stats telemetry device, and it became clear that the formula we use to derive CPU usage from the available CPU time is off by a factor of 1000 (i.e. the ratio between nanoseconds and microseconds) for any container running inside a cgroup v2.
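Rally's exact formula isn't reproduced here. As an assumption, a typical way to derive CPU utilization from two samples of a cumulative usage_nanos counter is to divide the CPU time consumed by the wall-clock time available across all CPUs. The Python sketch below (function name and parameters are hypothetical) shows why a counter that is silently in microseconds comes out 1000x too low.

```python
NANOS_PER_SECOND = 1_000_000_000


def cpu_usage_percent(
    usage_nanos_start: int,
    usage_nanos_end: int,
    elapsed_seconds: float,
    available_cpus: float,
) -> float:
    """Percentage of the available CPU time consumed between two samples."""
    consumed_ns = usage_nanos_end - usage_nanos_start
    available_ns = elapsed_seconds * NANOS_PER_SECOND * available_cpus
    return 100.0 * consumed_ns / available_ns


if __name__ == "__main__":
    # 10 s window, 2 CPUs, 10 s of CPU time consumed -> 50% utilization.
    print(cpu_usage_percent(0, 10_000_000_000, 10, 2))  # 50.0
    # Same workload, but the counter is really in microseconds (cgroup v2 case).
    print(cpu_usage_percent(0, 10_000_000, 10, 2))  # 0.05
```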
The screenshots below show the difference between cgroup v1 running on Google Kubernetes Engine (GKE) and cgroup v2 running on Azure Kubernetes Service (AKS):
GKE output (cgroup v1):
AKS output (cgroup v2):
Steps to Reproduce
I encountered the bug when running a cluster inside an Azure Kubernetes Service (AKS) cluster, but that's not exactly practical for reproductions.
We can reproduce this using ECK and Minikube with the Docker driver on macOS.
Note that for Linux users, Minikube automatically detects whether cgroup v1 or v2 is in use on your workstation (i.e. where you invoke minikube start from), whereas for Docker Desktop users on macOS (which actually runs a Linux VM in the background) we need to adjust the cgroup version by modifying the engine's settings (more on this below).
Testing with cgroup v2:
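Before running the steps, it can help to confirm which cgroup version the Linux host actually exposes. One common check, shown here as a Python sketch rather than Minikube's own detection logic, is whether the unified hierarchy's cgroup.controllers file exists at the cgroup mount root. The /sys/fs/cgroup default and the function name are assumptions; hybrid setups that mount v2 under a subdirectory are reported as v1 by this check.

```python
from pathlib import Path


def cgroup_version(mount_root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if the mount root is a cgroup v2 unified hierarchy, else 1.

    On a cgroup v2 host the root of the unified hierarchy contains a
    cgroup.controllers file; pure v1 and hybrid layouts do not have it there.
    """
    return 2 if (Path(mount_root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print(f"cgroup v{cgroup_version()}")
```

An equivalent shell check is stat -fc %T /sys/fs/cgroup/, which prints cgroup2fs on a cgroup v2 host.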
I'm on macOS Monterey 12.6 using the docker driver for minikube, which is actually a Linux VM running behind the scenes. Docker Desktop uses cgroup v2 by default; to force it to use cgroup v1 instead, I had to configure "deprecatedCgroupv1": true, in $HOME/Library/Group\ Containers/group.com.docker/settings.json and then restart Docker Desktop before following these steps:
Logs (if relevant)
No response