grafana mysql data source

huataihuang · Aug 8, 2023 · d96e0e1 · d96e0e1
1 parent cdfa394
commit d96e0e1

Showing 26 changed files with 301 additions and 12 deletions.
diff --git a/source/_static/kubernetes/monitor/grafana/grafana_mysql_data_source.png b/source/_static/kubernetes/monitor/grafana/grafana_mysql_data_source.png
diff --git a/source/_static/kubernetes/monitor/grafana/grafana_mysql_datasource_query.png b/source/_static/kubernetes/monitor/grafana/grafana_mysql_datasource_query.png
diff --git a/source/devops/ansible/ansible_config_raid.rst b/source/devops/ansible/ansible_config_raid.rst
@@ -0,0 +1,14 @@
+.. _ansible_config_raid:
+
+=====================================
+Configuring RAID storage with Ansible
+=====================================
+
+.. note::
+
+   Ansible can configure local RAID to achieve the same result as :ref:`mdadm_raid10`. To be continued... I plan to implement this with Ansible.
+
+References
+==========
+
+- `Product Documentation > Red Hat Enterprise Linux > 9 > Managing storage devices > 18.7. Configuring a RAID volume by using the storage system role <https://access.redhat.com/documentation/zh-cn/red_hat_enterprise_linux/9/html/managing_storage_devices/configuring-a-raid-volume-using-the-storage-system-role_managing-raid>`_
diff --git a/source/devops/ansible/index.rst b/source/devops/ansible/index.rst
@@ -8,6 +8,7 @@ Ansible
    :maxdepth: 1
 
    introduce_ansible.rst
+   ansible_config_raid.rst
 
 .. only::  subproject and html
 

diff --git a/source/kernel/process/process_vs_thread.rst b/source/kernel/process/process_vs_thread.rst
@@ -74,6 +74,26 @@ Richard Stevens put it this way (paraphrased):
   - Column 4, ``LWP``, is the Light Weight Process, that is, the thread ID (``TID``)
   - Column 6, ``NLWP``, is the number of threads (Number of Light Weight Processes)
 
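+- The ``ps`` command can list the threads of a given process, which makes it a very important command:
+
+.. literalinclude:: process_vs_thread/ps_special_thread
+   :caption: List the threads of a given process (**important command**)
+
+The output looks similar to:
+
+.. literalinclude:: process_vs_thread/ps_special_thread_output
+   :caption: Sample output when listing the threads of a given process
+
+Counting by column 5 (``CMD``, the thread name) shows which command is leaking a large number of threads:
+
+.. literalinclude:: process_vs_thread/ps_special_thread_count
+   :caption: Count which thread of a given process is leaking
+
+The output looks similar to:
+
+.. literalinclude:: process_vs_thread/ps_special_thread_count_output
+   :caption: Sample output when counting which thread of a given process is leaking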
+
 - Using the ``pstree`` command
 
 .. literalinclude:: process_vs_thread/pstree_thread

diff --git a/source/kernel/process/process_vs_thread/ps_special_thread b/source/kernel/process/process_vs_thread/ps_special_thread
@@ -0,0 +1 @@
+ps -T -p <PID>
diff --git a/source/kernel/process/process_vs_thread/ps_special_thread_count b/source/kernel/process/process_vs_thread/ps_special_thread_count
@@ -0,0 +1 @@
+ps -T -p <PID> | awk '{print $5}' | sort | uniq -c | sort -n -k1
diff --git a/source/kernel/process/process_vs_thread/ps_special_thread_count_output b/source/kernel/process/process_vs_thread/ps_special_thread_count_output
@@ -0,0 +1,6 @@
+...
+      3 listener_loop
+      3 reaper
+      3 rund-1f1b78d6
+      4 prealloc-memnum
+  23758 client_handler
diff --git a/source/kernel/process/process_vs_thread/ps_special_thread_output b/source/kernel/process/process_vs_thread/ps_special_thread_output
@@ -0,0 +1,9 @@
+   PID   SPID TTY          TIME CMD
+ 39112  39112 ?        00:00:00 rund-1f1b78d6
+ 39112  39115 ?        00:00:00 tokio-runtime-w
+ 39112  40565 ?        3-11:35:27 vmm_master
+ 39112  41223 ?        00:00:00 blk_iothread_q0
+ 39112  43205 ?        39-17:59:40 fc_vcpu0
+ 39112  43206 ?        34-22:50:52 fc_vcpu1
+ 39112  43207 ?        34-23:18:56 fc_vcpu2
+...
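The counting step done with ``awk | sort | uniq -c`` in ps_special_thread_count can also be sketched in Python. This is a minimal illustration, not part of the commit: the function name ``count_thread_names`` and the ``SAMPLE`` text (a shortened copy of the output above) are assumptions made for the example. Unlike ``awk '{print $5}'``, it keeps the whole ``CMD`` column even if a thread name contains spaces.

```python
from collections import Counter


def count_thread_names(ps_output: str) -> Counter:
    """Count threads per name from the text printed by `ps -T -p <PID>`.

    Hypothetical helper for illustration; it expects the five default
    columns PID, SPID, TTY, TIME and CMD.
    """
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header line
        # Split into at most 5 fields so CMD (field 5) is kept whole.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # ignore truncated lines such as "..."
        counts[fields[4].strip()] += 1
    return counts


# Shortened copy of the sample output shown above (assumption: same layout).
SAMPLE = """\
   PID   SPID TTY          TIME CMD
 39112  39112 ?        00:00:00 rund-1f1b78d6
 39112  39115 ?        00:00:00 tokio-runtime-w
 39112  40565 ?        3-11:35:27 vmm_master
 39112  43205 ?        39-17:59:40 fc_vcpu0
 39112  43206 ?        34-22:50:52 fc_vcpu1
"""

if __name__ == "__main__":
    # Print counts in ascending order, like `sort -n -k1`.
    for name, n in sorted(count_thread_names(SAMPLE).items(), key=lambda kv: kv[1]):
        print(f"{n:7d} {name}")
```

In practice the text would come from running ``ps -T -p <PID>`` (for example through ``subprocess.run``); the parsing stays the same.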
diff --git a/source/kernel/process/thread_count.rst b/source/kernel/process/thread_count.rst
@@ -60,6 +60,37 @@
    :caption: The ``nTH`` field of ``top`` cannot display values with more than 3 digits
    :emphasize-lines: 8
 
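+- (Recommended) The ``ps`` command can list the threads of a given process, which makes it a very important command:
+
+.. literalinclude:: process_vs_thread/ps_special_thread
+   :caption: List the threads of a given process (**important command**)
+
+The output looks similar to:
+
+.. literalinclude:: process_vs_thread/ps_special_thread_output
+   :caption: Sample output when listing the threads of a given process
+
+Counting by column 5 (``CMD``, the thread name) shows which command is leaking a large number of threads:
+
+.. literalinclude:: process_vs_thread/ps_special_thread_count
+   :caption: Count which thread of a given process is leaking
+
+The output looks similar to:
+
+.. literalinclude:: process_vs_thread/ps_special_thread_count_output
+   :caption: Sample output when counting which thread of a given process is leaking
+
+Debugging thread count problems
+-------------------------------
+
+Taking the command suspected of leaking threads, for example ``client_handler`` above, we can check whether the stack of the problem thread shows anything abnormal:
+
+.. literalinclude:: thread_count/threads_stack
+   :caption: Check the stack of the abnormal thread
+
+The stack shows that the thread is blocked inside a syscall: it is waiting in ``futex``.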
+
+
 Maximum number of threads allowed per process
 =============================================
 

diff --git a/source/kernel/process/thread_count/threads_stack b/source/kernel/process/thread_count/threads_stack
@@ -0,0 +1,7 @@
+#cat /proc/303564/stack
+[<0>] futex_wait_queue_me+0xb6/0x110
+[<0>] futex_wait+0xe9/0x240
+[<0>] do_futex+0xa7/0x150
+[<0>] __x64_sys_futex+0x146/0x1c0
+[<0>] do_syscall_64+0x2d/0x40
+[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
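Reading such a stack by eye works for one thread, but with thousands of leaked threads it helps to extract the blocking syscall automatically. The following is a rough sketch under stated assumptions: the helper names ``parse_kernel_stack`` and ``blocking_syscall`` are invented for this example, and the ``__x64_sys_`` prefix is what x86-64 kernels print in the stack above; other architectures use different prefixes.

```python
import re

# One frame looks like: "[<0>] futex_wait+0xe9/0x240"
FRAME_RE = re.compile(
    r"^\[<[0-9a-fA-F]+>\]\s+(?P<symbol>[^+\s]+)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)

# Assumption: x86-64 syscall entry symbols are named "__x64_sys_<name>".
SYSCALL_RE = re.compile(r"^__x64_sys_(?P<name>\w+)$")


def parse_kernel_stack(text: str) -> list:
    """Return the symbol names from /proc/<tid>/stack style text, top frame first."""
    symbols = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # lines that are not stack frames (e.g. the shell prompt) are skipped
            symbols.append(match.group("symbol"))
    return symbols


def blocking_syscall(symbols: list):
    """Return the syscall name (e.g. 'futex') found in the stack, or None."""
    for symbol in symbols:
        match = SYSCALL_RE.match(symbol)
        if match:
            return match.group("name")
    return None


# Copy of the stack shown above.
SAMPLE_STACK = """\
#cat /proc/303564/stack
[<0>] futex_wait_queue_me+0xb6/0x110
[<0>] futex_wait+0xe9/0x240
[<0>] do_futex+0xa7/0x150
[<0>] __x64_sys_futex+0x146/0x1c0
[<0>] do_syscall_64+0x2d/0x40
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
"""

if __name__ == "__main__":
    print(blocking_syscall(parse_kernel_stack(SAMPLE_STACK)))
```

Reading ``/proc/<tid>/stack`` normally requires root, so the sketch works on text that has already been captured.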
diff --git a/source/kernel/process/utils/ps.rst b/source/kernel/process/utils/ps.rst
@@ -2,4 +2,23 @@
 
 =======================
 ps process check tool
-=======================
+=======================
+
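+``ps`` output fields
+====================
+
+Linux/Unix systems usually use a System V style (``ps -elf``) or BSD style (``ps alx``) ``ps``.
+
+.. note::
+
+   On :ref:`ubuntu_linux` both ``ps -elf`` and ``ps alx`` work, but the output fields differ slightly.
+
+   ``ps -elf`` seems the better fit (its output includes the ``C`` field, which shows CPU utilization).
+
+To be studied further later.
+
+
+References
+==========
+
+- `About the output fields of the ps command in Unix <https://kb.iu.edu/d/afnv>`_ : a very clear quick reference for the ``ps -o`` output fields; recommended reading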
diff --git a/source/kubernetes/monitor/grafana/grafana_mysql_data_source.rst b/source/kubernetes/monitor/grafana/grafana_mysql_data_source.rst
@@ -21,6 +21,39 @@ Grafana provides a built-in MySQL data source plugin, which means it can directly query and ...
 
 - A fairly simple approach is to load data into the MySQL database from scripts or a logging system
 
+- Create a read-only query account ``grafanaReader`` and grant it query privileges:
+
+.. literalinclude:: grafana_mysql_data_source/grant_grafana_mysql_account
+   :caption: Create the read-only query account ``grafanaReader`` and grant it query privileges
+
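+Configuring the data source
+===========================
+
+- In the left navigation menu, select ``Connections => Data sources``, then click the ``Add new data source`` button
+
+- Select the ``MySQL`` data source
+
+- Enter the configuration:
+
+.. figure:: ../../../_static/kubernetes/monitor/grafana/grafana_mysql_data_source.png
+
+- Once the test passes, save the data source; you can then build queries in a dashboard
+
+Queries
+========
+
+Grafana provides a Query Builder: after selecting the MySQL data source, you pick the table and then the columns to query. SQL can also be used directly, so you can first work out a standard SQL query in MySQL that returns the expected result and then paste it into Grafana's Query Code field.
+
+- Data query: count the alerts per day (day granularity)
+
+.. literalinclude:: ../../../mysql/query/mysql_query_date_time/count_rows_per_day
+   :caption: Count alerts per day
+
+MySQL queries can be used directly in Grafana: when building a dashboard panel, select the MySQL data source and then pick a suitable graph (note: Grafana graphs are time series, so statistics without a time column need a table or another specific panel type). The result looks similar to: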
+
+.. figure:: ../../../_static/kubernetes/monitor/grafana/grafana_mysql_datasource_query.png
+   :scale: 60
+
 References
 ==========
 

diff --git a/source/kubernetes/monitor/grafana/grafana_mysql_data_source/grant_grafana_mysql_account b/source/kubernetes/monitor/grafana/grafana_mysql_data_source/grant_grafana_mysql_account
@@ -0,0 +1,3 @@
+ grant usage on notifier.* to grafanaReader@'%' identified by 'PASSWORD';
+ grant select on notifier.* to 'grafanaReader';
+ flush privileges;
diff --git a/source/linux/storage/software_raid/index.rst b/source/linux/storage/software_raid/index.rst
@@ -9,7 +9,8 @@ Linux software RAID
 
    linux_software_raid_arch.rst
    mdadm.rst
-   mdadm_raid6.rst
+   mdadm_raid10.rst
+   ../../../devops/ansible/ansible_config_raid.rst
 
 .. only::  subproject and html
 

diff --git a/source/linux/storage/software_raid/linux_software_raid_arch.rst b/source/linux/storage/software_raid/linux_software_raid_arch.rst
@@ -4,6 +4,83 @@
 Linux RAID architecture
 =======================
 
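+RAID overview
+=============
+
+RAID combines multiple storage devices (HDD, SSD or :ref:`nvme` ) into an array to improve performance and add redundancy; to the operating system the RAID device appears as a single storage unit (drive). RAID provides redundancy (RAID1, 4, 5, 6) as well as lower latency, higher bandwidth, and the best possible ability to recover from disk crashes.
+
+RAID configurations:
+
+- RAID0: striping
+- RAID1: mirroring
+- RAID4, 5, 6: disk striping with parity
+
+RAID types
+-------------
+
+- Firmware RAID: also called ATARAID, a type of software RAID that is configured through a firmware-based menu. The firmware used by this kind of RAID hooks into the BIOS, which allows booting from the RAID set. (I have no experience with this kind of RAID; see `Set Up a System with Intel® Matrix RAID Technology <https://www.intel.com/content/www/us/en/support/articles/000005789/technologies.html>`_ )
+- Hardware RAID: a hardware-based RAID subsystem that is managed independently of the host
+
+  - Internal devices are usually dedicated controller cards that handle RAID tasks transparently to the operating system
+  - External devices usually connect to the system through SCSI, Fibre Channel, iSCSI, InfiniBand or another high-speed network, and present volumes (logical units) to the system
+
+- Software RAID: RAID levels implemented in the kernel block device code. It needs no expensive disk controller and can be built from any block device supported by the Linux kernel, such as SATA, SCSI or :ref:`nvme` storage devices. As CPUs keep getting faster, software RAID usually outperforms hardware RAID, except on high-end storage devices.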
+
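+Key features of the Linux software RAID stack:
+
+- Multithreaded design
+- Disk arrays can be moved between Linux machines without rebuilding the data
+- Background array reconstruction using idle system resources
+- Hot-swappable drive support
+- Automatic CPU detection to take advantage of certain CPU features, such as streaming Single Instruction Multiple Data (SIMD) support
+- Automatic correction of bad sectors on the array's disks
+- Regular checks of the RAID data to ensure the health of the array
+- Proactive monitoring of arrays, with email alerts sent to a designated address when important events occur
+- ``Write-intent bitmaps`` let the kernel know exactly which portions of a disk need to be resynchronized, instead of resynchronizing the entire array after a system crash, which greatly speeds up resync events
+- Resync checkpointing: if the computer is rebooted during a resync, the resync resumes at startup from where it left off rather than starting over
+- The ability to change array parameters after installation, known as reshaping: for example, when a new device needs to be added, a 4-disk RAID5 array can be grown into a 5-disk RAID5 array. The grow operation is performed live and does not require reinstallation
+- Reshaping supports changing the number of devices, the RAID algorithm, or the size of the RAID array type, such as RAID4, RAID5, RAID6 or RAID10
+- Takeover supports RAID level conversion, such as RAID0 to RAID6
+- Cluster MD is a storage solution for clusters that provides the redundancy of RAID1 mirroring to the cluster (how to put this into practice?)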
+
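+RAID levels and linear support
+------------------------------
+
+- RAID0: a striped data mapping technique: data written to the array is split into stripes and spread across the member disks. This improves storage I/O performance at low cost, **but provides no redundancy (data safety)** . Note that RAID0 only stripes across the member devices up to the size of the smallest device, so if the disks have different capacities, the array is sized by the smallest disk (the extra space on larger disks is ignored). When the disks differ in size, building RAID0 from disk partitions is recommended
+
+- RAID1: mirroring, which provides redundancy by writing identical data to every disk in the array. It is a simple, widely used RAID technique with high data availability. It offers very good data reliability and improves read performance, but at a higher cost (low space utilization)
+
+- RAID4: protects data with parity stored on a single disk drive. Because RAID4 uses a dedicated parity disk, that disk can become the bottleneck of the array (RAID4 is rarely used without write-back caching). RAID4 reads faster than it writes (a write must compute parity and write to the parity disk, while a read only accesses the data disks)
+
+- RAID5: the most common RAID type. Parity is distributed across all member disks, which removes the parity-disk write bottleneck of RAID4, but the parity calculation is still a heavy burden. Fortunately modern CPUs are powerful enough to handle it well; what needs consideration is that a large number of disks can increase the parity computation load
+
+- RAID6: uses a more complex parity scheme that tolerates the failure of 2 disks, which gives better redundancy and data protection. The drawbacks are a heavier CPU load and lower write performance (the read/write asymmetry is more severe than with RAID4/5)
+
+- RAID10: combines the performance advantage of RAID0 with the redundancy of RAID1, without the CPU cost of RAID5/6 parity calculation. Space utilization is better than RAID1 but worse than RAID5/6
+
+- Linear RAID: no striping; data fills the disks sequentially, moving on to the next disk only when the previous one is completely full. This offers no performance advantage and no redundancy, so it is generally not recommended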
+
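The space trade-offs in the level list above can be made concrete with a small calculator. This is a simplified sketch, not how ``mdadm`` computes sizes: the function name ``usable_capacity`` is invented for the example, every level is assumed to use only the smallest member size (as the RAID0 note describes), RAID10 is assumed to keep two copies of each block, and metadata overhead is ignored.

```python
def _require(n: int, minimum: int, level: str) -> None:
    """Raise ValueError when a level gets fewer disks than this sketch assumes it needs."""
    if n < minimum:
        raise ValueError(f"{level} needs at least {minimum} disks, got {n}")


def usable_capacity(level: str, disk_sizes: list) -> int:
    """Approximate usable capacity (same unit as disk_sizes) for a RAID level.

    Simplified model: striped and mirrored levels use only the smallest
    member size; 'linear' simply concatenates the disks.
    """
    n = len(disk_sizes)
    _require(n, 1, level)
    smallest = min(disk_sizes)
    if level == "linear":
        return sum(disk_sizes)          # disks are filled one after another
    if level == "raid0":
        return n * smallest             # striping, no redundancy
    if level == "raid1":
        _require(n, 2, level)
        return smallest                 # every disk holds the same data
    if level in ("raid4", "raid5"):
        _require(n, 3, level)
        return (n - 1) * smallest       # one disk's worth of parity
    if level == "raid6":
        _require(n, 4, level)
        return (n - 2) * smallest       # two disks' worth of parity
    if level == "raid10":
        _require(n, 4, level)
        return (n * smallest) // 2      # assumption: two copies of each block
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    disks = [1000, 1000, 1000, 1000]    # four 1000 GB disks (hypothetical)
    for level in ("raid0", "raid1", "raid5", "raid6", "raid10", "linear"):
        print(level, usable_capacity(level, disks))
```

With four equal disks the model gives half the raw space for RAID10 and three quarters for RAID5, which matches the ordering described in the list.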
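+Linux RAID subsystem
+====================
+
+The ``mdraid`` subsystem is the Linux software RAID solution. It uses its own metadata format, called native MD metadata. It also supports external metadata (RHEL 9 uses ``mdraid`` with external metadata to access Intel Rapid Storage (ISW) or Intel Matrix Storage Manager (IMSM) sets and the Storage Networking Industry Association (SNIA) Disk Drive Format (DDF)).
+
+Thoughts on RAID
+================
+
+``RAID`` is a very old storage technology. According to `Wikipedia: RAID <https://en.wikipedia.org/wiki/RAID>`_ , RAID 1 had already been invented in the 1970s (more than half a century ago), RAID 5 was invented in 1986, and by 1988 the RAID levels had been clearly defined.
+
+This old technology meant a great deal to early data centers that needed large-capacity, highly available, high-performance storage. When I was young, any mention of servers brought RAID storage to mind; RAID storage was practically an indispensable part of a server.
+
+However, although RAID can deliver excellent performance and reliability, the growth of data centers and the explosion of data have exposed its limitations. It has needed continual tuning, and combination with other technologies, to be "rejuvenated":
+
+- Creating and rebuilding a RAID array is very time-consuming, especially now that storage routinely reaches terabytes. There was a period when SSD technology was still in its infancy and hard disks had reached hundreds of GB, and large RAID rebuilds were an operations nightmare: rebuilds were so slow that another disk could fail before the rebuild finished. SSD technology then iterated quickly, and :ref:`nvme` in particular brought a leap in storage technology, so RAID rebuild speed is no longer the bottleneck. RAID is in favor again and can give a server very large local storage.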
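+- Robust large-scale distributed storage no longer depends on local disk redundancy. :ref:`ceph` , for example, manages data placement itself and keeps multiple distributed replicas without RAID, which avoids the complexity of building local RAID (although :ref:`ceph` itself is considerably more complex). Not every distributed storage system is as comprehensive as :ref:`ceph` , though: being lightweight and simplified by design, :ref:`gluster` is not aware of how data is distributed across multiple disks within one machine. In other words, GlusterFS works best with a single local ``brick`` per server, which forces the data replicas onto different servers. This creates an opening for reintroducing RAID. `Lustre_(file_system) <https://en.wikipedia.org/wiki/Lustre_(file_system)>`_ , which has a long history and is still widely used in HPC, also builds its backend storage on RAID (modern deployments tend to use a :ref:`zfs` backend)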
+
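+In short, backed by :ref:`nvme` hardware, RAID has been rejuvenated: it can provide massive capacity for local storage and a solid backend for distributed storage. ( :ref:`zfs` is a similar technology )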
+
+
 References
 ==========
 

diff --git a/source/linux/storage/software_raid/mdadm.rst b/source/linux/storage/software_raid/mdadm.rst
@@ -3,3 +3,7 @@
 =================================
 Building software RAID with mdadm
 =================================
+
+.. note::
+
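+   An early note, `raid_in_linux.md <https://github.com/huataihuang/cloud-atlas-draft/blob/master/os/linux/storage/device-mapper/raid/raid_in_linux.md>`_ : ``mdadm`` software RAID is a very stable, long-lived open source project that has in fact changed little. Still, I plan to study and practice it systematically again.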
diff --git a/source/linux/storage/software_raid/mdadm_raid10.rst b/source/linux/storage/software_raid/mdadm_raid10.rst
@@ -0,0 +1,6 @@
+.. _mdadm_raid10:
+
+==========================
+Building RAID10 with mdadm
+==========================
+
diff --git a/source/linux/storage/software_raid/mdadm_raid6.rst b/source/linux/storage/software_raid/mdadm_raid6.rst
diff --git a/source/mysql/query/mysql_query_date_time.rst b/source/mysql/query/mysql_query_date_time.rst
@@ -16,10 +16,33 @@ Querying dates and times in MySQL
    :language: sql
    :caption: Find records between two timestamps
 
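-To be continued...
+Statistics
+==========
+
+- Count the alerts per day within a time range; ``date()`` groups the rows by day:
+
+.. literalinclude:: mysql_query_date_time/count_rows_per_day
+   :caption: Count alerts per day
+
+Sample output:
+
+.. literalinclude:: mysql_query_date_time/count_rows_per_day_output
+   :caption: Sample output of counting alerts per day
+
+- Counting per hour is similar. Note that when the hour value is sorted as a string, the default order is ASCII (lexical), so it has to be converted to a number to sort numerically:
+
+.. literalinclude:: mysql_query_date_time/count_rows_per_hour_order_by_number
+   :caption: Count per hour, sorted numerically
+
+Sample output:
+
+.. literalinclude:: mysql_query_date_time/count_rows_per_hour_order_by_number_output
+   :caption: Sample output of counting per hour, sorted numerically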
 
 References
 ==========
 
 - `How to Query Date and Time in MySQL <https://popsql.com/learn-sql/mysql/how-to-query-date-and-time-in-mysql>`_
 - `The Ultimate Guide To MySQL DATE and Date Functions <https://www.mysqltutorial.org/mysql-date/>`_
+- `MySQL - count rows per day <https://dirask.com/posts/MySQL-count-rows-per-day-D6BLnD>`_
+- `SQL order string as number <https://stackoverflow.com/questions/11808573/sql-order-string-as-number>`_ : string fields sort in ASCII order by default, but sometimes we need to sort by numeric value; this question provides a good example
diff --git a/source/mysql/query/mysql_query_date_time/count_rows_per_day b/source/mysql/query/mysql_query_date_time/count_rows_per_day
@@ -0,0 +1,3 @@
+SELECT count(alert) as 告警数量, date(gmt_create) as 日期 FROM notifier.notifier_alert_statistics
+    where gmt_create between '2023-07-01 00:00:00' and '2023-07-31 23:59:59' 
+    GROUP BY date( gmt_create ) order by date(gmt_create);
diff --git a/source/mysql/query/mysql_query_date_time/count_rows_per_day_output b/source/mysql/query/mysql_query_date_time/count_rows_per_day_output
@@ -0,0 +1,13 @@
++--------------+------------+
+| 告警数量     | 日期       |
++--------------+------------+
+|            1 | 2023-07-01 |
+|            6 | 2023-07-03 |
+|            2 | 2023-07-05 |
+|            7 | 2023-07-06 |
+......
+|            9 | 2023-07-27 |
+|            3 | 2023-07-28 |
+|            1 | 2023-07-29 |
++--------------+------------+
+25 rows in set (0.00 sec)
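To try the per-day grouping without a MySQL server, the same idea can be sketched with Python's built-in ``sqlite3`` module. This is an illustration under stated assumptions: SQLite stands in for MySQL (its ``date()`` function also truncates a ``YYYY-MM-DD HH:MM:SS`` string to the day), the table is reduced to the two columns the query uses, and the sample rows are made up.

```python
import sqlite3


def count_rows_per_day(rows, start, end):
    """Group (alert, gmt_create) rows by day and count them.

    Assumption: SQLite replaces MySQL here; timestamps are ISO-style
    strings, so BETWEEN compares them correctly as text.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE notifier_alert_statistics (alert TEXT, gmt_create TEXT)"
        )
        conn.executemany(
            "INSERT INTO notifier_alert_statistics VALUES (?, ?)", rows
        )
        return conn.execute(
            "SELECT count(alert), date(gmt_create) "
            "FROM notifier_alert_statistics "
            "WHERE gmt_create BETWEEN ? AND ? "
            "GROUP BY date(gmt_create) ORDER BY date(gmt_create)",
            (start, end),
        ).fetchall()
    finally:
        conn.close()


# Made-up sample rows for illustration only.
SAMPLE_ROWS = [
    ("disk_full", "2023-07-01 08:15:00"),
    ("cpu_high", "2023-07-03 01:00:00"),
    ("cpu_high", "2023-07-03 23:59:59"),
    ("cpu_high", "2023-08-01 00:00:00"),  # outside the July range
]

if __name__ == "__main__":
    for count, day in count_rows_per_day(
        SAMPLE_ROWS, "2023-07-01 00:00:00", "2023-07-31 23:59:59"
    ):
        print(day, count)
```

Days without any alert do not appear in the result, which is also why the MySQL output above has 25 rows rather than 31.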
diff --git a/source/mysql/query/mysql_query_date_time/count_rows_per_hour_order_by_number b/source/mysql/query/mysql_query_date_time/count_rows_per_hour_order_by_number
@@ -0,0 +1,3 @@
+SELECT count(alert) as 告警数量, hour(gmt_create) as 小时 FROM notifier.notifier_alert_statistics 
+    where gmt_create between '2023-07-01 00:00:00' and '2023-07-31 23:59:59'
+    GROUP BY hour(gmt_create) order by cast(hour(gmt_create) as unsigned);
diff --git a/source/mysql/query/mysql_query_date_time/count_rows_per_hour_order_by_number_output b/source/mysql/query/mysql_query_date_time/count_rows_per_hour_order_by_number_output
@@ -0,0 +1,10 @@
++--------------+--------+
+| 告警数量     | 小时   |
++--------------+--------+
+|            3 |      0 |
+|            3 |      1 |
+|            2 |      2 |
+......
+|            4 |     23 |
++--------------+--------+
+24 rows in set (0.00 sec)
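The difference between ASCII (lexical) order and numeric order that motivates the ``cast(... as unsigned)`` in the query can be seen in a few lines of Python. This is only an illustration of the sorting behaviour, with hypothetical function names; it does not run any SQL.

```python
def sort_hours_lexically(hours):
    """Sort hour values as strings, the way a text column is ordered by default."""
    return sorted(str(h) for h in hours)


def sort_hours_numerically(hours):
    """Sort the same strings by their numeric value, like casting before ORDER BY."""
    return sorted((str(h) for h in hours), key=int)


if __name__ == "__main__":
    hours = [0, 1, 2, 10, 11, 23]
    print(sort_hours_lexically(hours))    # "10" and "11" sort before "2"
    print(sort_hours_numerically(hours))  # 0, 1, 2, 10, 11, 23
```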
diff --git a/source/shell/utils/find.rst b/source/shell/utils/find.rst
@@ -13,6 +13,14 @@
 
 For example, I search for multiple files in :ref:`virt-install_location_iso_image`
 
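+Limiting directory recursion depth
+==================================
+
+By default ``find`` searches all subdirectories of the given directory recursively. To limit the directory depth, use the ``-maxdepth`` option, for example ``-maxdepth 1``:
+
+.. literalinclude:: find/find_maxdepth
+   :caption: Limiting directory recursion depth with find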
+
 References
 ==========
 

diff --git a/source/shell/utils/find/find_maxdepth b/source/shell/utils/find/find_maxdepth
@@ -0,0 +1,5 @@
+# Do NOT show hidden files (beginning with ".", i.e., .*):
+find DirsRoot/* -maxdepth 0 -type f
+
+#  DO show hidden files:
+find DirsRoot/ -maxdepth 1 -type f
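For scripts that need the same depth limit without calling ``find``, the behaviour of ``find <root> -maxdepth N -type f`` can be approximated with ``os.walk``. This is a rough sketch: the function name ``find_files`` is an assumption, and only regular files are returned (symbolic links are skipped, as ``-type f`` does without ``-L``).

```python
import os


def find_files(root, maxdepth):
    """Return regular files at most `maxdepth` levels below `root`.

    Depth is counted like find's -maxdepth: `root` itself is depth 0 and
    files directly inside it are depth 1.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth + 1 >= maxdepth:
            dirnames[:] = []  # prune: files in deeper directories would exceed maxdepth
        if depth < maxdepth:
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Hidden files are included, like `find DirsRoot/ -maxdepth 1 -type f`.
    for path in find_files(".", 1):
        print(path)
```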