Prometheus Node Exporter
huataihuang committed Jul 18, 2023
1 parent 5726202 commit b3b1065
Showing 8 changed files with 109 additions and 3 deletions.
Original file line number Diff line number Diff line change
@@ -13,6 +13,7 @@ The Prometheus community officially provides about a dozen `xxxx_exporters <https://github.c
node_exporter.rst
node_exporter_textfile-collector.rst
node_exporter_ipmitool_text_plugin.rst
node_exporter_smartctl_text_plugin.rst
ipmi_exporter.rst
process-exporter.rst
amd_smi_exporter.rst
@@ -8,7 +8,7 @@ IPMI Exporter

:ref:`hpe_server_monitor`

:ref:`prometheus_exporters` includes an official ``ipmi_exporter`` that exports :ref:`metrics` based on :ref:`ipmi`, and there is a very polished :ref:`grafana` `Dashboard IPMI for Prometheus <https://grafana.com/grafana/dashboards/13177-ipmi-for-prometheus/>`_ for it. Together they can monitor large server clusters and generate alerts.
:ref:`prometheus_exporters` includes an official ``ipmi_exporter`` that exports :ref:`metrics` based on :ref:`ipmi`, and there is a very polished :ref:`grafana` `Grafana Dashboard 15765: IPMI Exporter <https://grafana.com/grafana/dashboards/15765-ipmi-exporter/>`_ for it. Together they can monitor large server clusters and generate alerts.

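``ipmi_exporter`` exposes local IPMI metrics on the standard ``/metrics`` endpoint with no special configuration. For remote metrics, the general configuration is very similar to :ref:`blackbox_exporter` (black-box probing of HTTP, HTTPS, DNS, TCP, ICMP and gRPC): simply use the ``target`` and ``module`` URL parameters to tell the exporter which IPMI device to query. This makes it possible to export metrics for thousands of IPMI devices.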
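
As a rough sketch of the remote-query pattern described above, the helper below only assembles the scrape URL. The ``/ipmi`` path and port ``9290`` are assumed from the exporter's defaults, and the host, the device address and the ``default`` module name are placeholders; adjust them to the actual deployment.

```shell
# Build the URL used to query a remote IPMI device through ipmi_exporter.
# Assumptions: the exporter serves remote targets under /ipmi and listens
# on port 9290; "default" is a placeholder module name.
ipmi_probe_url() {
    exporter="$1"            # host:port of the ipmi_exporter instance
    target="$2"              # address of the BMC / IPMI device to query
    module="${3:-default}"   # module name from the exporter configuration
    printf 'http://%s/ipmi?target=%s&module=%s\n' "$exporter" "$target" "$module"
}

# Example: print the URL for one device (192.0.2.10 is a documentation address).
ipmi_probe_url "localhost:9290" "192.0.2.10"
```

The resulting URL can be fetched with ``curl -s`` to inspect the metrics for a single device before wiring the same parameters into a Prometheus scrape job.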

@@ -129,7 +129,7 @@ debug
Configuring Grafana
===================

:ref:`grafana` `Dashboard IPMI for Prometheus <https://grafana.com/grafana/dashboards/13177-ipmi-for-prometheus/>`_
:ref:`grafana` `Grafana Dashboard 15765: IPMI Exporter <https://grafana.com/grafana/dashboards/15765-ipmi-exporter/>`_

Once this is done, the power-consumption monitoring for my :ref:`hpe_dl360_gen9` ( :ref:`hpe_server_monitor` ) is visible:

@@ -6,4 +6,62 @@ Node Exporter ipmitool text plugin

With :ref:`node_exporter_textfile-collector`, almost any text can be converted into Prometheus metrics, including :ref:`ipmi` data. This approach helps us monitor server hardware:

Preparation
===========

- Create ``/var/lib/node_exporter/textfile_collector/`` to hold the ``*.prom`` files read through ``--collector.textfile.directory`` so that they can be converted into metrics::

     sudo mkdir -p /var/lib/node_exporter/textfile_collector
     sudo chmod 777 /var/lib/node_exporter/textfile_collector

- The Prometheus community provides `node-exporter-textfile-collector-scripts <https://github.com/prometheus-community/node-exporter-textfile-collector-scripts>`_ ; download these scripts to the server:

.. literalinclude:: node_exporter_textfile-collector/git_node-exporter-textfile-collector-scripts
   :caption: Download ``node-exporter-textfile-collector-scripts`` locally ( ``/etc/prometheus`` )

This uses the ``/etc/prometheus/node-exporter-textfile-collector-scripts/ipmitool`` **script** to convert the server's ``ipmitool sensor`` output into a format that :ref:`node_exporter_textfile-collector` can process.
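
The directory preparation above can also be sketched as a small function. Mode ``777`` works, but it lets any local user write metrics. The variant below assumes that the job writing the ``*.prom`` files runs as the directory owner and that ``node_exporter`` only needs read access, so ``755`` is enough. The function name is hypothetical.

```shell
# Create a textfile collector directory that only its owner can write to.
# Assumption: the job that writes *.prom files runs as the directory owner,
# and node_exporter only needs to read the files.
prepare_textfile_dir() {
    dir="$1"
    mkdir -p "$dir" && chmod 755 "$dir"
}
```

Run as root, ``prepare_textfile_dir /var/lib/node_exporter/textfile_collector`` would create the directory used throughout this page.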

Running the script
==================

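The community recommends using ``sponge`` to write the output atomically, so first run the following command to generate a sample file for inspection::

    sudo ipmitool sensor | /etc/prometheus/node-exporter-textfile-collector-scripts/ipmitool | sponge /var/lib/node_exporter/textfile_collector/ipmitool.prom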

Then inspect the output file ``/var/lib/node_exporter/textfile_collector/ipmitool.prom`` ; it should contain something like::

    # HELP node_ipmi_temperature_celsius Temperature sensor reading from ipmitool
    # TYPE node_ipmi_temperature_celsius gauge
    node_ipmi_temperature_celsius{sensor="37-Fuse"} 38.000000
    node_ipmi_temperature_celsius{sensor="15-VR P1 Mem"} 34.000000
    node_ipmi_temperature_celsius{sensor="02-CPU 1"} 40.000000
    node_ipmi_temperature_celsius{sensor="16-VR P1 Mem"} 35.000000
    ...

This file is in the standard format that :ref:`node_exporter_textfile-collector` can process.
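
To make the conversion less of a black box, here is a minimal sketch of the same idea. It is not the community ``ipmitool`` script. It assumes that ``ipmitool sensor`` prints ``|``-separated columns (name, reading, unit, status, ...) and handles temperature sensors only. The function name is hypothetical.

```shell
# Convert temperature rows of "ipmitool sensor" output (read from stdin)
# into Prometheus text exposition format.
# Assumption: columns are "name | reading | unit | status | ...".
ipmi_temps_to_prom() {
    awk -F'|' '
        # Strip leading and trailing blanks from a column.
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

        # Keep only temperature sensors that have a numeric reading.
        trim($3) == "degrees C" && trim($2) != "na" {
            if (!header_done) {
                print "# HELP node_ipmi_temperature_celsius Temperature sensor reading from ipmitool"
                print "# TYPE node_ipmi_temperature_celsius gauge"
                header_done = 1
            }
            printf "node_ipmi_temperature_celsius{sensor=\"%s\"} %f\n", trim($1), trim($2)
        }'
}

# Example with one hand-written sample row:
printf '%s\n' '02-CPU 1         | 40.000     | degrees C  | ok' | ipmi_temps_to_prom
```

Each metric line is just a name, an optional label set in braces and a value, which is why plain text tools are enough to produce it.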

- Once the output looks correct, configure crontab (as the root user here)::

     crontab -e

  Add the following entry::

     * * * * * ipmitool sensor | /etc/prometheus/node-exporter-textfile-collector-scripts/ipmitool | sponge /var/lib/node_exporter/textfile_collector/ipmitool.prom

  Then check the target file; it should normally be refreshed once a minute.
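
The refresh check can be scripted instead of done by eye. The sketch below relies on ``find -mmin``, which GNU and BSD ``find`` both provide; the function name and the two-minute default are illustrative choices.

```shell
# Succeed if the given file exists and was modified less than N minutes ago.
# Assumption: find supports -mmin (GNU and BSD find do).
prom_is_fresh() {
    file="$1"
    max_age_min="${2:-2}"
    [ -f "$file" ] && [ -n "$(find "$file" -mmin "-$max_age_min")" ]
}
```

For example, ``prom_is_fresh /var/lib/node_exporter/textfile_collector/ipmitool.prom 2 || echo "ipmitool.prom is stale"`` reports a cron job that has stopped writing.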

Configuring ``node_exporter``
=============================

Following the :ref:`systemd` service configuration described in :ref:`node_exporter`, edit ``/etc/systemd/system/node_exporter.service`` ::

    ExecStart=/usr/local/bin/node_exporter \
        --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

Restart the ``node_exporter`` service.
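
After the restart, it helps to confirm that the collector actually picked up the file. The sketch below reads ``/metrics`` text on stdin and checks two things: that ``node_textfile_scrape_error`` is ``0`` and that at least one ``node_ipmi_`` sample is present. The ``node_ipmi_`` prefix is taken from the sample output shown earlier, and the function name is hypothetical.

```shell
# Read node_exporter /metrics text on stdin and report textfile collector health.
# Exit status 0 means: no scrape error reported and at least one node_ipmi_ sample.
check_textfile_metrics() {
    awk '
        $1 == "node_textfile_scrape_error" { seen = 1; err = ($2 != 0) }
        /^node_ipmi_/ { n++ }
        END {
            printf "node_ipmi samples: %d\n", n
            exit ((seen && !err && n > 0) ? 0 : 1)
        }'
}
```

Assuming ``node_exporter`` listens on its default port 9100, a typical call is ``curl -s http://localhost:9100/metrics | check_textfile_metrics``.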

In :ref:`grafana`, ``import`` `Grafana Dashboard 13177: IPMI for Prometheus <https://grafana.com/grafana/dashboards/13177-ipmi-for-prometheus/>`_ .

The finished dashboard:

.. figure:: ../../../../_static/kubernetes/monitor/prometheus/prometheus_exporters/node_exporter_with_ipmitool_text_plugin.png
@@ -0,0 +1,20 @@
.. _node_exporter_smartctl_text_plugin:

===================================
Node Exporter smartctl text plugin
===================================

Disk SMART data is monitored on the same principle, using :ref:`node_exporter_textfile-collector` .

- Use the modified `janw / node-exporter-textfile-collector-scripts / smartmon.sh <https://github.com/janw/node-exporter-textfile-collector-scripts/blob/master/smartmon.sh>`_

  - `Grafana Dashboard 10664: SMART disk data <https://grafana.com/grafana/dashboards/10664-smart-disk-data/>`_ is also fairly clear

- Use `olegeech-me / S.M.A.R.T-disk-monitoring-for-Prometheus <https://github.com/olegeech-me/S.M.A.R.T-disk-monitoring-for-Prometheus/>`_ (forked from `micha37-martins / S.M.A.R.T-disk-monitoring-for-Prometheus <https://github.com/micha37-martins/S.M.A.R.T-disk-monitoring-for-Prometheus>`_ ):

  - `Grafana Dashboard 13654: S.M.A.R.T Dashboard <https://grafana.com/grafana/dashboards/13654-s-m-a-r-t-dashboard/>`_ is attractive and clear; I plan to use it as the main dashboard

- Use `micha37-martins / S.M.A.R.T-disk-monitoring-for-Prometheus <https://github.com/micha37-martins/S.M.A.R.T-disk-monitoring-for-Prometheus>`_ for collection:

  - `Grafana Dashboard 10530: S.M.A.R.T disk monitoring for Prometheus Dashboard <https://grafana.com/grafana/dashboards/10530-s-m-a-r-t-disk-monitoring-for-prometheus-dashboard/>`_ gives a good overview; I plan to use it
  - `Grafana Dashboard 10531: S.M.A.R.T disk monitoring for Prometheus Errorboard <https://grafana.com/grafana/dashboards/10531-s-m-a-r-t-disk-monitoring-for-prometheus-errorboard/>`_ mainly adds error details
@@ -4,8 +4,31 @@
Node Exporter Textfile Collector
======================================

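The ``textfile`` collector is a Prometheus extension feature that exposes status written by scheduled jobs, similar to ``Pushgateway`` . ``Pushgateway`` is meant for service-level metrics, while the ``textfile`` module handles machine-level metrics.

When :ref:`node_exporter` is started with the ``--collector.textfile.directory`` option, the collector parses every file in that directory whose name ends in ``.prom`` and whose content follows the `Prometheus Exposition formats <https://prometheus.io/docs/instrumenting/exposition_formats/>`_ text format. (Timestamps are not supported.)

( **I did not fully understand this part and will revisit it later** ) For a cron job to automatically push its completion time, the following approach can be used (assuming the script is named ``count_hosts`` and counts servers, and ``/var/lib/node_exporter/textfile_collector/`` is the directory that holds the ``*.prom`` files and is passed to ``--collector.textfile.directory`` ):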

.. literalinclude:: node_exporter_textfile-collector/count_hosts_textfile-collector
   :caption: Add timestamp output from the script to a ``*.prom`` file
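
One way to read this pattern: the job runs first, its completion time is written to a temporary file in the same directory, and the file is then renamed into place. The rename is atomic on a single filesystem, so ``node_exporter`` never reads a half-written file. Below is a minimal sketch of that idea. The function name and the ``TEXTFILE_DIR`` variable are hypothetical.

```shell
# Run a job and, only if it succeeds, publish its completion time atomically.
# TEXTFILE_DIR is a hypothetical variable standing in for the directory
# passed to --collector.textfile.directory.
publish_completion_time() {
    name="$1"; shift
    dir="${TEXTFILE_DIR:-/var/lib/node_exporter/textfile_collector}"
    tmp="$dir/$name.prom.$$"              # $$ (shell PID) keeps the temp name unique
    "$@" || return 1                      # run the actual job first
    echo "${name}_completion_time $(date +%s)" > "$tmp" || return 1
    mv "$tmp" "$dir/$name.prom"           # rename is atomic within one filesystem
}
```

A cron entry could then call ``publish_completion_time count_hosts <path to the count_hosts script>`` . The published value is an ordinary gauge holding a Unix time, so an alert can fire when ``time() - count_hosts_completion_time`` grows too large.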

Community scripts
=================

The Prometheus community provides `node-exporter-textfile-collector-scripts <https://github.com/prometheus-community/node-exporter-textfile-collector-scripts>`_ ; download these scripts to the server:

.. literalinclude:: node_exporter_textfile-collector/git_node-exporter-textfile-collector-scripts
   :caption: Download ``node-exporter-textfile-collector-scripts`` locally ( ``/etc/prometheus`` )

Practical examples
==================

- :ref:`node_exporter_ipmitool_text_plugin`

References
==========

- `Node Exporter (GitHub)#Textfile Collector <https://github.com/prometheus/node_exporter#textfile-collector>`_
- `Prometheus Textfile Collectors <https://www.nine.ch/en/blog/prometheus-textfile-collectors>`_
- `Prometheus Textfile Collectors <https://www.nine.ch/en/blog/prometheus-textfile-collectors>`_ covers how to turn Nagios monitoring output into Prometheus-compatible metrics
- `Using the textfile collector from a shell script <https://www.robustperception.io/using-the-textfile-collector-from-a-shell-script/>`_ is short and clear; it provides an example script that converts your own output into the Prometheus textfile collector format and is worth borrowing from
@@ -0,0 +1,2 @@
echo count_hosts_completion_time $(date +%s) > /var/lib/node_exporter/textfile_collector/count_hosts.prom.$$
mv /var/lib/node_exporter/textfile_collector/count_hosts.prom.$$ /var/lib/node_exporter/textfile_collector/count_hosts.prom
@@ -0,0 +1,2 @@
git clone git@github.com:prometheus-community/node-exporter-textfile-collector-scripts.git
sudo mv node-exporter-textfile-collector-scripts /etc/prometheus/
