intel_powerstat not working if deployed within a container #6

Open
divStar opened this issue May 10, 2024 · 0 comments

Hello,

TL;DR:

The intel_powerstat input does not work when Telegraf is deployed in a container.
This relates to ticket influxdata/telegraf#14881.


Relevant telegraf.conf

[[inputs.intel_powerstat]]
  interval = "10s"
  ## The user can choose which package metrics are monitored by the plugin with
  ## the package_metrics setting:
  ## - The default, will collect "current_power_consumption",
  ##   "current_dram_power_consumption" and "thermal_design_power"
  ## - Leaving this setting empty means no package metrics will be collected
  ## - Finally, a user can specify individual metrics to capture from the
  ##   supported options list
  ## Supported options:
  ##   "current_power_consumption", "current_dram_power_consumption",
  ##   "thermal_design_power", "max_turbo_frequency", "uncore_frequency",
  ##   "cpu_base_frequency"
  package_metrics = ["current_power_consumption", "current_dram_power_consumption"]

  ## The user can choose which per-CPU metrics are monitored by the plugin in
  ## cpu_metrics array.
  ## Empty or missing array means no per-CPU specific metrics will be collected
  ## by the plugin.
  ## Supported options:
  ##   "cpu_frequency", "cpu_c0_state_residency", "cpu_c1_state_residency",
  ##   "cpu_c6_state_residency", "cpu_busy_cycles", "cpu_temperature",
  ##   "cpu_busy_frequency"
  ## ATTENTION: cpu_busy_cycles is DEPRECATED - use cpu_c0_state_residency
  cpu_metrics = ["cpu_frequency", "cpu_c0_state_residency", "cpu_c1_state_residency", "cpu_c6_state_residency", "cpu_busy_frequency"]

Logs from Telegraf

2024-02-22T15:53:22Z I! Loading config: /etc/telegraf/telegraf.conf
2024-02-22T15:53:22Z I! Starting Telegraf 1.29.5 brought to you by InfluxData the makers of InfluxDB
2024-02-22T15:53:22Z I! Available plugins: 241 inputs, 9 aggregators, 30 processors, 24 parsers, 60 outputs, 6 secret-stores
2024-02-22T15:53:22Z I! Loaded inputs: intel_powerstat mqtt_consumer
2024-02-22T15:53:22Z I! Loaded aggregators: 
2024-02-22T15:53:22Z I! Loaded processors: 
2024-02-22T15:53:22Z I! Loaded secretstores: 
2024-02-22T15:53:22Z I! Loaded outputs: influxdb_v2
2024-02-22T15:53:22Z I! Tags enabled: host=233d22a54fc0
2024-02-22T15:53:22Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"233d22a54fc0", Flush Interval:10s
2024-02-22T15:53:52Z I! [inputs.mqtt_consumer] Connected [tcp://mqtt.my.family:1883]
2024-02-22T15:53:52Z W! [inputs.intel_powerstat] Plugin started with errors: PowerTelemetry instance initialized with errors: failed to initialize msr: invalid MSR base path "/dev/cpu": file "/dev/cpu" does not exist; failed to initialize rapl: invalid base path of rapl control zone: file "/sys/devices/virtual/powercap/intel-rapl" does not exist
2024-02-22T15:54:00Z E! [inputs.intel_powerstat] Error in plugin: failed to update MSR time-related metrics: module "msr" is not initialized
2024-02-22T15:54:00Z E! [inputs.intel_powerstat] Error in plugin: failed to get "current_power_consumption": module "rapl" is not initialized
2024-02-22T15:54:00Z E! [inputs.intel_powerstat] Error in plugin: failed to get "current_dram_power_consumption": module "rapl" is not initialized

System info

Ubuntu 22.04.03, Telegraf 1.29.5, Docker (Server Version) 25.0.3

Docker

docker-compose.yml
version: '3'

services:
  telegraf:
    image: telegraf:latest
    container_name: telegraf
    restart: unless-stopped
    environment:
      INFLUX_TOKEN: "<token redacted>"
      HOST_ETC: "/hostfs/etc"
      HOST_PROC: "/hostfs/proc"
      HOST_SYS: "/hostfs/sys"
      HOST_VAR: "/hostfs/var"
      HOST_RUN: "/hostfs/run"
      HOST_MOUNT_PREFIX: "/hostfs"
    volumes:
      - '<host-path>/telegraf.conf:/etc/telegraf/telegraf.conf'
      - '/:/hostfs:ro'
    # depends_on:
    #  - influxdb
    networks:
      - services-network

networks:
  services-network:
    external: true

Steps to reproduce

  1. Ensure your system supports Intel MSR and/or RAPL and that the appropriate kernel modules have been loaded (e.g. using lsmod | grep rapl).
  2. Ensure cpuid is installed (sudo apt-get install -y cpuid).
  3. Set up a network (in my example it's called services-network and is a bridge-type network).
  4. Create a docker-compose.yml containing only Telegraf, as shown in the Docker section above.
  5. Configure it to use inputs.intel_powerstat (see the telegraf.conf above).
  6. Start the stack (e.g. docker compose up -d).
  7. Wait about 20 seconds.
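The module check from step 1 can also be done without lsmod by reading /proc/modules. The following is a minimal Python sketch, assuming the relevant modules are named msr and intel_rapl_* (for example intel_rapl_common and intel_rapl_msr). Modules built into the kernel do not appear in /proc/modules, so a missing entry is not conclusive.

```python
"""Report whether the msr and intel_rapl_* kernel modules are listed in /proc/modules.

Assumption: module names "msr" and "intel_rapl*"; built-in modules are not listed.
"""
from pathlib import Path


def loaded_modules(proc_modules_text: str) -> set[str]:
    # Each non-empty line of /proc/modules starts with the module name.
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def powerstat_modules(modules: set[str]) -> dict[str, bool]:
    # Map the two subsystems the plugin initializes to "module listed?".
    return {
        "msr": "msr" in modules,
        "rapl": any(name.startswith("intel_rapl") for name in modules),
    }


if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError as err:
        print(f"cannot read /proc/modules: {err}")
    else:
        for name, present in powerstat_modules(loaded_modules(text)).items():
            print(f"{name}: {'listed' if present else 'not listed'}")
```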

Expected behavior

I expect the plugin to look for the power telemetry interfaces inside /hostfs/sys/... or /hostfs/dev/..., to not throw any errors, and ultimately to collect the corresponding values.
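To illustrate what "collecting the values" from a host mount could look like, here is a Python sketch that derives package power from the Linux powercap RAPL energy counter under a configurable sysfs root such as /hostfs/sys. It assumes package 0 is exposed as intel-rapl:0 with energy_uj and max_energy_range_uj files; it is not how intel_powerstat or its PowerTelemetry library actually compute current_power_consumption.

```python
"""Sketch: package power (W) from the RAPL energy counter under a configurable sysfs root.

Assumptions: zone intel-rapl:0 (package 0) with energy_uj / max_energy_range_uj.
"""
import time
from pathlib import Path


def power_watts(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    # The energy counter wraps around at max_energy_range_uj.
    delta = e1_uj - e0_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1e6 / dt_s  # microjoules -> joules, then J/s = W


def read_package_power(sys_root: str = "/sys", interval_s: float = 1.0) -> float:
    zone = Path(sys_root) / "devices/virtual/powercap/intel-rapl/intel-rapl:0"
    max_range = int((zone / "max_energy_range_uj").read_text())
    e0, t0 = int((zone / "energy_uj").read_text()), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = int((zone / "energy_uj").read_text()), time.monotonic()
    return power_watts(e0, e1, t1 - t0, max_range)
```

Inside the container from this issue, read_package_power("/hostfs/sys") would read the host's counter through the read-only bind mount, provided the file permissions allow it.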

Actual behavior

As the warning 2024-02-22T15:53:52Z W! [inputs.intel_powerstat] Plugin started with errors: PowerTelemetry instance initialized with errors: failed to initialize msr: invalid MSR base path "/dev/cpu": file "/dev/cpu" does not exist; failed to initialize rapl: invalid base path of rapl control zone: file "/sys/devices/virtual/powercap/intel-rapl" does not exist shows, the plugin does not find the corresponding directories.
/hostfs/dev/cpu and /hostfs/sys/devices/virtual/powercap/intel-rapl do exist inside the container, but the plugin looks for them at /dev/cpu and /sys/devices/virtual/powercap/intel-rapl instead.
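A small Python check, run inside the container, makes the mismatch visible by testing each path from the warning both as-is and under the /hostfs bind mount. The /hostfs prefix matches the docker-compose.yml above; the function name compare is only illustrative.

```python
"""Compare the paths from the intel_powerstat warning with their /hostfs counterparts."""
import os

# Paths taken from the intel_powerstat warning in the logs.
PLUGIN_PATHS = ("/dev/cpu", "/sys/devices/virtual/powercap/intel-rapl")


def compare(prefix: str = "/hostfs", paths=PLUGIN_PATHS) -> dict[str, tuple[bool, bool]]:
    # Map each path to (exists where the plugin looks, exists under the prefix).
    return {p: (os.path.exists(p), os.path.exists(prefix + p)) for p in paths}


if __name__ == "__main__":
    for path, (direct, prefixed) in compare().items():
        print(f"{path}: direct={direct} under /hostfs={prefixed}")
```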

Additional info

I've checked out the project and looked around, but I cannot find where (if at all) HOST_MOUNT_PREFIX or any of the other HOST_* environment variables are used. Other plugins seem to use them to some extent, but this one does not.
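For discussion, here is a hypothetical Python sketch of how a plugin could resolve its default paths through the HOST_* variables. The variable-to-directory mapping (including a HOST_DEV variable, which the compose file above does not set) and the use of HOST_MOUNT_PREFIX as a fallback prefix are assumptions, not the behaviour of intel_powerstat or of other Telegraf plugins.

```python
"""Hypothetical: map a default path like /sys/... onto a host mount via HOST_* variables."""
import os
from typing import Mapping

# Assumed mapping from top-level directory to the env var that overrides it.
ENV_FOR_ROOT = {"sys": "HOST_SYS", "proc": "HOST_PROC", "dev": "HOST_DEV"}


def host_path(path: str, env: Mapping[str, str] = os.environ) -> str:
    # Split "/sys/devices/..." into root "sys" and the remainder "devices/...".
    root, _, rest = path.lstrip("/").partition("/")
    var = ENV_FOR_ROOT.get(root)
    if var and env.get(var):
        base = env[var]  # e.g. HOST_SYS=/hostfs/sys
    elif env.get("HOST_MOUNT_PREFIX"):
        base = os.path.join(env["HOST_MOUNT_PREFIX"], root)  # assumed fallback
    else:
        return path  # no override: use the path unchanged
    return os.path.join(base, rest) if rest else base
```

With the environment from the docker-compose.yml above, host_path("/sys/devices/virtual/powercap/intel-rapl") would resolve to /hostfs/sys/devices/virtual/powercap/intel-rapl, and host_path("/dev/cpu") to /hostfs/dev/cpu via the prefix fallback.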

Edit: I also found the following. When installing Telegraf directly on the host, even though MSR and RAPL are available, I had to run these commands before the plugin worked:

sudo chmod -R a+r /sys/devices/virtual/powercap/
sudo setcap cap_sys_rawio=ep /usr/bin/telegraf
sudo systemctl restart telegraf
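The two commands above address two separate permission checks: read access to the RAPL energy files, and opening the MSR device, which also requires CAP_SYS_RAWIO. A Python sketch like the following reports both for the current process; it assumes package zone intel-rapl:0 and CPU 0, and the root argument lets the same check run against the /hostfs mount inside the container.

```python
"""Check whether this process can read the RAPL energy counter and open the MSR device."""
import os


def _can_open(path: str) -> bool:
    # Actually opening the file covers both file permissions and, for
    # /dev/cpu/N/msr, the CAP_SYS_RAWIO check done by the msr driver.
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False


def access_report(root: str = "/") -> dict[str, bool]:
    rapl = os.path.join(root, "sys/devices/virtual/powercap/intel-rapl/intel-rapl:0/energy_uj")
    msr = os.path.join(root, "dev/cpu/0/msr")
    return {"rapl_energy_readable": _can_open(rapl), "msr_openable": _can_open(msr)}
```

On the host, print(access_report()) before and after the chmod/setcap steps should show the difference; in the container, access_report("/hostfs") checks the bind-mounted copies.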

After that, Telegraf started working locally and sending values to my InfluxDB in the container as I'd expect it to.

In the containerized Telegraf instance, however, I could not get it to work, even when mounting the host's /sys and /dev directly at /sys and /dev (instead of /hostfs/sys and /hostfs/dev) and even with privileged: true and user: "0:0".
