Prometheus Node Exporter
huataihuang committed Jul 18, 2023
1 parent d7149e3 commit 5726202
Showing 37 changed files with 558 additions and 15 deletions.
3 changes: 3 additions & 0 deletions source/_static/performance/pcp/pcp_remote-collector.svg
3 changes: 3 additions & 0 deletions source/_static/performance/pcp/pmns-small.svg
3 changes: 3 additions & 0 deletions source/_static/performance/pcp/retrospective-architecture.svg
1 change: 1 addition & 0 deletions source/devops/docs/kindle/index.rst
@@ -15,6 +15,7 @@ With Kindle, Books Never Grow Old
reset_restart_kindle.rst
save_web_page_as_pdf.rst
read_e-books_after_kindle.rst
kindle_download_helper.rst


.. only:: subproject and html
10 changes: 10 additions & 0 deletions source/devops/docs/kindle/kindle_download_helper.rst
@@ -0,0 +1,10 @@
.. _kindle_download_helper:

================================
Batch Downloading Kindle E-books
================================

References
==========

- `Kindle_download_helper (Github) <https://github.com/yihong0618/Kindle_download_helper>`_
@@ -11,6 +11,8 @@ The Prometheus community officially provides about a dozen `xxxx_exporters <https://github.c

intro_prometheus_exporters.rst
node_exporter.rst
node_exporter_textfile-collector.rst
node_exporter_ipmitool_text_plugin.rst
ipmi_exporter.rst
process-exporter.rst
amd_smi_exporter.rst
@@ -16,6 +16,12 @@ IPMI Exporter

There are two example configuration files: ``ipmi_local.yml`` scrapes metrics from the local host, and ``ipmi_remote.yml`` scrapes a remote IPMI interface.

.. note::

   The ``ipmi_exporter`` from the community :ref:`prometheus_exporters` uses ``freeipmi`` to access IPMI and collect server monitoring data.

   An alternative is the :ref:`node_exporter_ipmitool_text_plugin`, which builds on :ref:`node_exporter`.

Installation
============

@@ -17,24 +17,35 @@ Prometheus Node Exporter provides a range of hardware- and kernel-related metric

Deploying with the Prometheus community helm chart, as described in :ref:`helm3_prometheus_grafana`, automatically installs Node Exporter on every node. This approach is strongly recommended.

- Download and install the executable::
- Download and install the executable:

wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.3.1.linux-amd64.tar.gz
cd node_exporter-1.3.1.linux-amd64/
sudo mv node_exporter /usr/local/bin/
.. literalinclude:: node_exporter/install_node_exporter
   :caption: Install the Node Exporter executable

# Run directly
/usr/local/bin/node_exporter

To keep it running, you can use screen ::
:strike:`To keep it running, you can use screen` :

screen -S node_exporter -dm /usr/local/bin/node_exporter

.. note::

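   ``node_exporter`` does not need root privileges to run, and it listens on port ``9100`` on all network interfaces, so Prometheus can directly scrape the metrics that ``node_exporter`` exposes on a given server.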

- Following the :ref:`systemd` configuration in :ref:`prometheus_startup`, configure a service for ``Node Exporter`` in ``/etc/systemd/system/node_exporter.service``

.. literalinclude:: node_exporter/node_exporter.service
   :caption: Configure the Node Exporter service to run under :ref:`systemd`

- Start the service:

.. literalinclude:: node_exporter/systemd_node_exporter
   :caption: Start the ``node_exporter`` service with :ref:`systemctl`

Checking the status at this point shows that the service is running:

.. literalinclude:: node_exporter/systemd_node_exporter_output
   :caption: Status of the ``node_exporter`` service after starting it with :ref:`systemctl`

Verification
============

@@ -45,14 +56,11 @@ Prometheus Node Exporter provides a range of hardware- and kernel-related metric
Configuring the Prometheus Instance
===================================

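Add the following to the initial configuration installed in :ref:`prometheus_startup` to scrape data from the target server::
Add the following to the initial configuration installed in :ref:`prometheus_startup` to scrape data from the target server: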

scrape_configs:
...
- job_name: "node"
static_configs:
- targets: ['localhost:9100']
.. literalinclude:: node_exporter/prometheus.yml
   :language: yaml
   :caption: Add a node scrape job to ``/etc/prometheus/prometheus.yml``

Then restart Prometheus and try a few example query expressions in the browser, for example:

@@ -92,8 +100,30 @@ Prometheus Node Exporter provides a range of hardware- and kernel-related metric

Hostnames are used here because I have already deployed :ref:`priv_dnsmasq_ics`, so any host can resolve every server on the network.

Configuring Grafana
===================

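`Node Exporter Full <https://grafana.com/grafana/dashboards/1860-node-exporter-full/>`_ provides a comprehensive dashboard. After ``Import`` it shows a striking 29 categories with more than 192 panels. Many of these metrics are ones I had never paid attention to before, and they help compare system behavior when analyzing anomalies: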

.. figure:: ../../../../_static/kubernetes/monitor/prometheus/prometheus_exporters/node_exporter_full.png

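Some ``node_exporter`` collectors are disabled by default (because of their performance cost or collection overhead). To enable a specific collector, add it to the ``node_exporter`` command-line arguments, for example::

   ExecStart=/usr/local/bin/node_exporter --collector.processes --collector.ntp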
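
After changing the collector flags, one way to confirm which collectors are actually active is to look at the ``node_scrape_collector_success`` series that ``node_exporter`` exposes. The following is a minimal sketch, not part of the original setup: the helper name ``list_enabled_collectors`` is made up for illustration, and it assumes the metrics text is piped in on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of collectors whose last scrape
# succeeded, reading Prometheus text exposition format from stdin.
list_enabled_collectors() {
    # Split each line on double quotes so that, for a line such as
    #   node_scrape_collector_success{collector="cpu"} 1
    # field 2 is the collector name and field 3 is the trailing `} 1`.
    awk -F'"' 'index($0, "node_scrape_collector_success{") == 1 && $3 == "} 1" { print $2 }' | sort
}
```

A possible invocation, assuming the exporter listens on the default port on the local host, is ``curl -s http://localhost:9100/metrics | list_enabled_collectors``; after the change above, ``processes`` and ``ntp`` should appear in the list if those collectors started successfully.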

`Complete Node Exporter Mastery with Prometheus <https://devconnected.com/complete-node-exporter-mastery-with-prometheus/>`_ recommends two very interesting talks on Prometheus monitoring practice:

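- `Prometheus Monitoring for Java Developers <https://youtu.be/jb9j_IYv4cU>`_ covers how to add the Prometheus library to Java code for white-box monitoring (metrics), and how to bridge performance data from common Java frameworks into Prometheus (the first half introduces the basic features of Prometheus and works as a primer)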

.. youtube:: jb9j_IYv4cU

- `How to Export Prometheus Metrics from Just About Anything <https://youtu.be/Zk09Mbu0YQk>`_ covers how to implement :ref:`node_exporter_textfile-collector`

.. youtube:: Zk09Mbu0YQk

References
==========

- `MONITORING LINUX HOST METRICS WITH THE NODE EXPORTER <https://prometheus.io/docs/guides/node-exporter/>`_
- `How to Setup Prometheus Node Exporter on Kubernetes <https://devopscube.com/node-exporter-kubernetes/>`_
- `Complete Node Exporter Mastery with Prometheus <https://devconnected.com/complete-node-exporter-mastery-with-prometheus/>`_ This article is fairly comprehensive: it gives a detailed introduction to `Node Exporter Full <https://grafana.com/grafana/dashboards/1860-node-exporter-full/>`_, covers the additional collectors, and links YouTube resources on :ref:`node_exporter_textfile-collector`
@@ -0,0 +1,8 @@
version=1.6.1
wget https://github.com/prometheus/node_exporter/releases/download/v${version}/node_exporter-${version}.linux-amd64.tar.gz
tar xvfz node_exporter-${version}.linux-amd64.tar.gz
cd node_exporter-${version}.linux-amd64/
sudo mv node_exporter /usr/local/bin/

# Run directly
#/usr/local/bin/node_exporter
@@ -0,0 +1,18 @@
[Unit]
Description=node_exporter
Wants=network-online.target
After=network-online.target

StartLimitIntervalSec=500
StartLimitBurst=5

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
@@ -0,0 +1,6 @@
...
scrape_configs:
...
- job_name: "node"
static_configs:
- targets: ["localhost:9100"]
@@ -0,0 +1,3 @@
systemctl daemon-reload
systemctl enable --now node_exporter
systemctl status node_exporter
@@ -0,0 +1,20 @@
● node_exporter.service - node_exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-07-18 15:12:23 CST; 2min 37s ago
Main PID: 484617 (node_exporter)
Tasks: 5 (limit: 464040)
Memory: 3.2M
CPU: 20ms
CGroup: /system.slice/node_exporter.service
└─484617 /usr/local/bin/node_exporter

Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.618Z caller=node_exporter.go:117 level=info collector=thermal_zone
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.618Z caller=node_exporter.go:117 level=info collector=time
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=node_exporter.go:117 level=info collector=timex
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=node_exporter.go:117 level=info collector=udp_queues
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=node_exporter.go:117 level=info collector=uname
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=node_exporter.go:117 level=info collector=vmstat
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=node_exporter.go:117 level=info collector=xfs
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=node_exporter.go:117 level=info collector=zfs
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.619Z caller=tls_config.go:274 level=info msg="Listening on" address=[::]:9100
Jul 18 15:12:23 zcloud.staging.huatai.me node_exporter[484617]: ts=2023-07-18T07:12:23.620Z caller=tls_config.go:277 level=info msg="TLS is disabled." http2=false address=[::]:9100
@@ -0,0 +1,9 @@
.. _node_exporter_ipmitool_text_plugin:

===================================
Node Exporter ipmitool Text Plugin
===================================

With :ref:`node_exporter_textfile-collector`, almost any text can be converted into Prometheus metrics, including :ref:`ipmi` output. This approach helps us monitor server hardware:
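
As a rough illustration of the idea, the sketch below converts pipe-separated sensor lines, in the form printed by ``ipmitool sensor`` (name, value, unit, status, ...), into Prometheus text format. It is not the plugin itself: the function name ``ipmi_sensor_to_prom`` and the metric name ``ipmi_sensor_value`` are hypothetical, and only numeric readings are kept.

```shell
#!/usr/bin/env bash
# Hypothetical converter: read `ipmitool sensor`-style lines from stdin
# and print one gauge sample per numeric sensor reading.
ipmi_sensor_to_prom() {
    awk -F'|' '
    # Strip leading and trailing blanks from a field.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    BEGIN {
        print "# HELP ipmi_sensor_value Sensor reading reported by ipmitool (hypothetical metric name)."
        print "# TYPE ipmi_sensor_value gauge"
    }
    NF >= 3 {
        name = trim($1); value = trim($2); unit = trim($3)
        # Skip discrete or unavailable readings such as "na" or "0x0".
        if (value !~ /^-?[0-9]+(\.[0-9]+)?$/) next
        # Drop quotes and backslashes, which would need escaping in a label value.
        gsub(/["\\]/, "", name)
        printf "ipmi_sensor_value{sensor=\"%s\",unit=\"%s\"} %s\n", name, unit, value
    }'
}
```

A possible use is ``ipmitool sensor | ipmi_sensor_to_prom``, run periodically (for example from cron, with whatever privileges ``ipmitool`` needs on the host), with the output saved as a ``.prom`` file in the directory that ``node_exporter`` reads through ``--collector.textfile.directory``.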


@@ -0,0 +1,11 @@
.. _node_exporter_textfile-collector:

==========================================
Node Exporter Textfile Collector Extension
==========================================
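
The textfile collector reads ``*.prom`` files from the directory passed to ``node_exporter`` with ``--collector.textfile.directory``. A common pattern is to write metrics to a temporary file and rename it into place, so the collector never reads a half-written file. The following is a minimal sketch of that pattern; the function name ``write_prom_file`` and the directory used in the usage note are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: atomically publish metrics read from stdin as
# <dir>/<name>.prom.
#   $1: textfile collector directory
#   $2: file name without the .prom extension
write_prom_file() {
    local dir="$1" name="$2" tmp
    # Create the temporary file in the same directory so that the final
    # mv is a rename on one filesystem. Its name does not end in .prom,
    # so the collector ignores it while it is being written.
    tmp="$(mktemp "${dir}/.${name}.XXXXXX")" || return 1
    if cat > "$tmp" && chmod 0644 "$tmp" && mv "$tmp" "${dir}/${name}.prom"; then
        return 0
    fi
    # Clean up the temporary file if any step failed.
    rm -f "$tmp"
    return 1
}
```

For example, assuming ``node_exporter`` was started with ``--collector.textfile.directory=/var/lib/node_exporter/textfile_collector``, a batch job could run ``echo "my_job_last_success_timestamp_seconds $(date +%s)" | write_prom_file /var/lib/node_exporter/textfile_collector my_job``.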

References
==========

- `Node Exporter (GitHub)#Textfile Collector <https://github.com/prometheus/node_exporter#textfile-collector>`_
- `Prometheus Textfile Collectors <https://www.nine.ch/en/blog/prometheus-textfile-collectors>`_
1 change: 1 addition & 0 deletions source/kubernetes/security/index.rst
@@ -18,3 +18,4 @@ Kubernetes Security
falco/index
cert-manager/index
spiffe/index
vault/index
10 changes: 10 additions & 0 deletions source/kubernetes/security/vault/index.rst
@@ -0,0 +1,10 @@
.. _vault:

============================================
vault: A Tool for Securely Accessing Secrets
============================================

.. toctree::
:maxdepth: 1

intro_vault.rst
24 changes: 24 additions & 0 deletions source/kubernetes/security/vault/intro_vault.rst
@@ -0,0 +1,24 @@
.. _intro_vault:

=====================
Introduction to vault
=====================

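`vault (GitHub) <https://github.com/hashicorp/vault>`_ is a secrets management tool developed by the well-known company HashiCorp. Vault provides a unified interface to secrets, together with tight access control and detailed audit logs.

Modern systems need access to a large number of secrets, such as database credentials, API keys for external services, and credentials for service-oriented architecture communication. Without a secrets management solution, key lifecycle management, secure storage, and detailed auditing are nearly impossible; this is the gap Vault fills.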

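Key features of Vault:

- Secure secret storage: arbitrary key/value secrets can be stored in Vault. Secrets are encrypted before they are written to persistent storage, so gaining access to the raw storage is not enough to read them.
- Dynamic secrets: Vault can generate secrets on demand for some systems, such as AWS or SQL databases (for example, a key pair with a limited validity period that is automatically revoked when its lease expires)
- Data encryption: encryption parameters can be customized, and the encrypted data can be stored in a SQL database, so users do not need to design their own encryption methods
- Leasing and renewal: every secret in Vault has a lease associated with it. When the lease ends, Vault automatically revokes the secret
- Revocation: Vault has built-in support for revoking secrets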

To be studied and practiced...

References
==========

- `vault (GitHub) <https://github.com/hashicorp/vault>`_
