Commit fab4bce

skywalking and gluster

huataihuang committed Aug 1, 2023
1 parent 9169cb8 commit fab4bce
Showing 26 changed files with 388 additions and 8 deletions.
5 changes: 5 additions & 0 deletions source/clang/parallel_make.rst
@@ -26,3 +26,8 @@
.. note::

Running the build concurrently with the ``-j`` flag on a multi-processor server, you can see that the ``real`` time is far less than the sum of the ``system`` and ``user`` times. This is because ``system`` and ``user`` time is accumulated across multiple processors, while concurrency greatly shortens the actual completion time (the ``real`` time). See :ref:`time` for details.
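
For instance, a minimal sketch (the project and the timing numbers are purely illustrative)::

   # run the build across all available cores and time it
   time make -j$(nproc)

   # typical output shape: real (wall clock) is far smaller than user + sys,
   # because user/sys CPU time is summed across all cores
   #   real    0m32.1s
   #   user    3m18.4s
   #   sys     0m41.7s

   # make -j the default for this shell session (see the Stack Overflow reference below)
   export MAKEFLAGS="-j$(nproc)"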

References
==========

- `Parallel make: set -j8 as the default option <https://stackoverflow.com/questions/10567890/parallel-make-set-j8-as-the-default-option>`_
source/gluster/best_practices_for_gluster/gluster_underlay_filesystem.rst
@@ -0,0 +1,13 @@
.. _gluster_underlay_filesystem:

==============================================
Underlying Filesystems for Gluster Storage
==============================================

GlusterFS sits on top of a filesystem provided by the operating system; the recommended options are:

- Use :ref:`xfs` directly
- Use :ref:`xfs` on top of :ref:`linux_lvm` (see the sketch below)
- Adopt :ref:`zfs`, which provides both volume management and filesystem functionality
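
A hedged sketch of the LVM + XFS option (the device name, size, and mount point here are hypothetical)::

   # assume /dev/sdb is a dedicated data disk and the brick will live at /data/brick0
   pvcreate /dev/sdb
   vgcreate vg_gluster /dev/sdb
   lvcreate -n lv_brick0 -L 500G vg_gluster

   # 512-byte inodes leave room for Gluster's extended attributes
   mkfs.xfs -i size=512 /dev/vg_gluster/lv_brick0

   mkdir -p /data/brick0
   echo '/dev/vg_gluster/lv_brick0 /data/brick0 xfs defaults 0 0' >> /etc/fstab
   mount /data/brick0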

To be continued...
18 changes: 18 additions & 0 deletions source/gluster/best_practices_for_gluster/index.rst
@@ -0,0 +1,18 @@
.. _best_practices_for_gluster:

====================================
Best Practices for Gluster Storage
====================================

.. toctree::
:maxdepth: 1

think_best_practices_for_gluster.rst
gluster_underlay_filesystem.rst

.. only:: subproject and html

Indices
=======

* :ref:`genindex`
source/gluster/best_practices_for_gluster/think_best_practices_for_gluster.rst
@@ -0,0 +1,12 @@
.. _think_best_practices_for_gluster:

================================================
Thoughts on Best Practices for Gluster Storage
================================================

Recently, :ref:`deploy_centos7_suse15_suse12_gluster11` reused my earlier deployment scheme, but :ref:`add_centos7_gluster11_server` suddenly made me realize the pros and cons of using raw disks directly. Especially after reading `Gluster Storage for Oracle Linux: Best Practices and Sizing Guideline <https://www.oracle.com/a/ocom/docs/linux/gluster-storage-linux-best-practices.pdf>`_, I feel I need to sort out the approach further, comparing and analyzing the options to distill :ref:`best_practices_for_gluster` and iterate on the technology.

References
==========

- `Gluster Storage for Oracle Linux: Best Practices and Sizing Guideline <https://www.oracle.com/a/ocom/docs/linux/gluster-storage-linux-best-practices.pdf>`_
85 changes: 85 additions & 0 deletions source/gluster/deploy/centos/add_centos7_gluster11_server.rst
@@ -0,0 +1,85 @@
.. _add_centos7_gluster11_server:

==============================================================
Adding Servers to a Gluster 11 Cluster Deployed on CentOS 7
==============================================================

After completing :ref:`deploy_centos7_gluster11`, I need to add server nodes to the cluster to expand its capacity.

Preparation
===========

- :ref:`build_glusterfs_11_for_centos_7`
- :ref:`gluster11_rpm_createrepo` **add the repository configuration**

Installing and Starting the Service
====================================

- The installation method is the same as in :ref:`deploy_centos7_gluster6`:

.. literalinclude:: deploy_centos7_gluster11/yum_install_glusterfs-server
:caption: Install GlusterFS on CentOS

- Start the GlusterFS management service:

.. literalinclude:: deploy_centos7_gluster11/systemctl_enable_glusterd
:caption: Start and enable the GlusterFS management service

- Check the ``glusterd`` service status (a combined sketch of these three steps follows this list):

.. literalinclude:: deploy_centos7_gluster11/systemctl_status_glusterd
:caption: Check the GlusterFS management service
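
The included snippets are not reproduced in this diff; as a hedged sketch, with the local Gluster 11 repository from :ref:`gluster11_rpm_createrepo` configured, the three steps above boil down to::

   # install the server package from the configured repository
   yum install -y glusterfs-server

   # enable at boot and start the management daemon
   systemctl enable glusterd
   systemctl start glusterd

   # verify the daemon is active (running)
   systemctl status glusterd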

Adding a New GlusterFS Node
============================

- Configure gluster peering; **this only needs to be run once, on one server** (the first server is fine). The server added here is the 7th server in our cluster, but since only one is being added there is no need to distinguish it, so I simply name it ``server``:

.. literalinclude:: add_centos7_gluster11_server/gluster_peer_probe
:language: bash
:caption: Run ``gluster peer probe`` **once** on **one** server to add the new server

- When that completes, check the ``gluster peer`` status:

.. literalinclude:: deploy_centos7_gluster11/gluster_peer_status
:caption: Run ``gluster peer status`` on **one** server to check that the new node is correctly connected to the cluster

.. literalinclude:: add_centos7_gluster11_server/gluster_peer_status_output
:caption: ``gluster peer status`` output showing the ``peer`` in ``Connected`` state indicates the join succeeded
:emphasize-lines: 23-25

- We are still using the volume from :ref:`deploy_centos7_gluster11`, so the expansion (that is, ``add_brick``) uses the following simple script:

.. literalinclude:: add_centos7_gluster11_server/add_gluster
:language: bash
:caption: The ``add_gluster`` script: pass the volume name as the argument to **expand** (``add_brick``) the existing ``replica 3`` distributed volume

Here the command reports an error:

.. literalinclude:: add_centos7_gluster11_server/add_gluster_error
:language: bash
:caption: ``add_brick`` warns that multiple bricks of a replicate volume are on the same server, which is not an optimal setup

I verified that appending the ``force`` keyword to the end of the command does complete the ``add_brick``, but it leaves me with the following problem:

- The newly added ``brick`` entries all end up at the tail of the ``bricks`` list:

.. literalinclude:: ../../startup/gluster_architecture/gluster_volume_info
:caption: Run ``gluster volume info`` to check volume information

You can see all the bricks of the newly added ``192.168.1.7``:

.. literalinclude:: add_centos7_gluster11_server/gluster_volume_info_output
:caption: ``gluster volume info`` shows that all bricks on the newly added server are listed at the end
:emphasize-lines: 26-39

The newly expanded node has a serious problem: all its ``bricks`` sit on a single server, so the portion of data hashed to ``brick73`` through ``brick84`` lands entirely on one machine. The limitation of choosing raw-disk :ref:`xfs` as the :ref:`gluster_underlay_filesystem` is that server nodes cannot be added or removed flexibly.
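
For completeness: ``add_brick`` by itself does not move existing data onto the new bricks; a rebalance is normally run afterwards. A minimal sketch, assuming the ``backup`` volume shown above::

   # redistribute existing data across all bricks, including the new ones
   gluster volume rebalance backup start

   # poll until the status reports completed
   gluster volume rebalance backup status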

.. note::

I will explore my practical approach in detail and summarize improvements in :ref:`best_practices_for_gluster`.

References
==========

- `rackspace docs: Add and Remove GlusterFS Servers <https://docs.rackspace.com/docs/add-and-remove-glusterfs-servers>`_
source/gluster/deploy/centos/add_centos7_gluster11_server/add_gluster
@@ -0,0 +1,27 @@
# usage: add_gluster <volume>  (appends the new server's 12 bricks to the existing replica-3 volume)
volume=$1
server=192.168.1.7

gluster volume add-brick ${volume} replica 3 \
    ${server}:/data/brick0/${volume} \
    ${server}:/data/brick1/${volume} \
    ${server}:/data/brick2/${volume} \
    ${server}:/data/brick3/${volume} \
    ${server}:/data/brick4/${volume} \
    ${server}:/data/brick5/${volume} \
    ${server}:/data/brick6/${volume} \
    ${server}:/data/brick7/${volume} \
    ${server}:/data/brick8/${volume} \
    ${server}:/data/brick9/${volume} \
    ${server}:/data/brick10/${volume} \
    ${server}:/data/brick11/${volume}
source/gluster/deploy/centos/add_centos7_gluster11_server/add_gluster_error
@@ -0,0 +1,3 @@
volume add-brick: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Bricks should be on different nodes to have best fault tolerant configuration.
Use 'force' at the end of the command if you want to override this behavior.
source/gluster/deploy/centos/add_centos7_gluster11_server/gluster_peer_probe
@@ -0,0 +1,3 @@
server=192.168.1.7

gluster peer probe ${server}
source/gluster/deploy/centos/add_centos7_gluster11_server/gluster_peer_status_output
@@ -0,0 +1,25 @@
Number of Peers: 6

Hostname: 192.168.1.2
Uuid: c664761a-5973-4e2e-8506-9c142c657297
State: Peer in Cluster (Connected)

Hostname: 192.168.1.3
Uuid: 901b8027-5eab-4f6b-8cf4-aafa4463ca13
State: Peer in Cluster (Connected)

Hostname: 192.168.1.4
Uuid: 5ff667dd-5f45-4daf-900e-913e78e52297
State: Peer in Cluster (Connected)

Hostname: 192.168.1.5
Uuid: ebd1d002-0719-4704-a59d-b4e8b3b28c29
State: Peer in Cluster (Connected)

Hostname: 192.168.1.6
Uuid: 1f958e31-2d55-4904-815a-89f6ade360fe
State: Peer in Cluster (Connected)

Hostname: 192.168.1.7
Uuid: a023c435-097c-411b-9d50-1e84629b9673
State: Peer in Cluster (Connected)
source/gluster/deploy/centos/add_centos7_gluster11_server/gluster_volume_info_output
@@ -0,0 +1,45 @@
Volume Name: backup
Type: Distributed-Replicate
Volume ID: 9ff7cdb3-abf0-4e33-8293-aae69c28b8d9
Status: Started
Snapshot Count: 0
Number of Bricks: 28 x 3 = 84
Transport-type: tcp
Bricks:
Brick1: 192.168.1.1:/data/brick0/backup
Brick2: 192.168.1.2:/data/brick0/backup
Brick3: 192.168.1.3:/data/brick0/backup
Brick4: 192.168.1.4:/data/brick0/backup
Brick5: 192.168.1.5:/data/brick0/backup
Brick6: 192.168.1.6:/data/brick0/backup
Brick7: 192.168.1.1:/data/brick1/backup
Brick8: 192.168.1.2:/data/brick1/backup
Brick9: 192.168.1.3:/data/brick1/backup
Brick10: 192.168.1.4:/data/brick1/backup
Brick11: 192.168.1.5:/data/brick1/backup
Brick12: 192.168.1.6:/data/brick1/backup
...
Brick67: 192.168.1.1:/data/brick11/backup
Brick68: 192.168.1.2:/data/brick11/backup
Brick69: 192.168.1.3:/data/brick11/backup
Brick70: 192.168.1.4:/data/brick11/backup
Brick71: 192.168.1.5:/data/brick11/backup
Brick72: 192.168.1.6:/data/brick11/backup
Brick73: 192.168.1.7:/data/brick0/backup
Brick74: 192.168.1.7:/data/brick1/backup
Brick75: 192.168.1.7:/data/brick2/backup
Brick76: 192.168.1.7:/data/brick3/backup
Brick77: 192.168.1.7:/data/brick4/backup
Brick78: 192.168.1.7:/data/brick5/backup
Brick79: 192.168.1.7:/data/brick6/backup
Brick80: 192.168.1.7:/data/brick7/backup
Brick81: 192.168.1.7:/data/brick8/backup
Brick82: 192.168.1.7:/data/brick9/backup
Brick83: 192.168.1.7:/data/brick10/backup
Brick84: 192.168.1.7:/data/brick11/backup
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
8 changes: 8 additions & 0 deletions source/gluster/deploy/centos/deploy_centos7_gluster11.rst
@@ -89,6 +89,14 @@ Deploy Gluster 11 on CentOS 7
:language: bash
:caption: The ``create_gluster`` script: pass the volume name as the argument to create a ``replica 3`` distributed volume

.. note::

When the number of bricks is an integer multiple (2x or more) of ``replica``, a :ref:`distributed_replicated_glusterfs_volume` is created automatically, providing both high availability and high performance. But the brick ordering matters: ``replica`` first, then ``distribute``.

So, to spread data across different servers, I adopted a specific ordering here, ``A:0,B:0,C:0,A:1,B:1,C:1,A:2,B:2,C:2...``, so that each ``replica 3`` set lands precisely on different servers (a bash sketch of generating this ordering follows this note).

This deployment approach has pros and cons; I will explore it in detail in :ref:`best_practices_for_gluster`.
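
To make the ordering concrete, here is a hedged bash sketch of generating such an interleaved brick list (the server IPs and the 12-bricks-per-server layout are taken from this cluster; the actual ``create_gluster`` script is the one included above, not reproduced in this diff)::

   volume=$1
   servers="192.168.1.1 192.168.1.2 192.168.1.3 192.168.1.4 192.168.1.5 192.168.1.6"

   # emit bricks replica-set first: A:0 B:0 C:0 ... then A:1 B:1 C:1 ...
   # with replica 3, every 3 consecutive bricks form one replica set,
   # so each set spans 3 different servers
   bricks=""
   for i in $(seq 0 11); do
       for server in ${servers}; do
           bricks="${bricks} ${server}:/data/brick${i}/${volume}"
       done
   done

   gluster volume create ${volume} replica 3 transport tcp ${bricks}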

- Make the script executable::

chmod 755 create_gluster
1 change: 1 addition & 0 deletions source/gluster/deploy/centos/index.rst
@@ -12,6 +12,7 @@ GlusterFS Deployment on the CentOS Platform
download_gluster_rpm_createrepo.rst
gluster11_rpm_createrepo.rst
deploy_centos7_gluster11.rst
add_centos7_gluster11_server.rst

.. only:: subproject and html

source/gluster/deploy/gluster_mount_options_multi_volfile_servers.rst
@@ -0,0 +1,26 @@
.. _gluster_mount_options_multi_volfile_servers:

========================================================================
Configuring GlusterFS Clients to Mount with Multiple volfile-servers
========================================================================

When a GlusterFS client mounts a volume exported by the servers, you can specify multiple volfile servers (the client fetches the configuration of the volume being mounted from a volfile server). Early GlusterFS versions provided an ``/etc/fstab`` configuration similar to the following:

.. literalinclude:: centos/deploy_centos7_gluster11/gluster_fuse_fstab
:caption: The GlusterFS client's ``/etc/fstab``

At that time there were only **two** volfile servers: one primary and one backup.
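
That fstab file is not reproduced in this diff; the old single-backup style looked roughly like the following sketch (this cluster's IPs assumed)::

   192.168.1.1:/backup /data/backup glusterfs defaults,_netdev,backupvolfile-server=192.168.1.2 0 0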

Newer GlusterFS versions support more ``volfile-server`` entries; you can even list the IPs of every GlusterFS server. If the primary volfile-server is down, GlusterFS tries the ``backup-volfile-servers`` in order when mounting the volume, until one succeeds:

.. literalinclude:: gluster_mount_options_multi_volfile_servers/gluster_fuse_fstab
:caption: Configure more ``backup-volfile-servers`` to improve availability
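
Equivalently, when mounting by hand rather than via ``/etc/fstab`` (a sketch using this cluster's IPs, assuming the glusterfs-fuse client is installed)::

   mount -t glusterfs \
       -o backup-volfile-servers=192.168.1.2:192.168.1.3:192.168.1.4:192.168.1.5 \
       192.168.1.1:/backup /data/backup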

.. note::

The option ``backupvolfile-server`` from earlier versions has been renamed ``backup-volfile-servers``. Pay close attention, otherwise you cannot specify multiple backup volfile servers!

References
==========

- `Red Hat Gluster Storage > 3.4 > Administration Guide > Chapter 6. Creating Access to Volumes #6.1.3.1. Mount Commands and Options <https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/chap-accessing_data_-_setting_up_clients#Mount_Commands_and_Options>`_
source/gluster/deploy/gluster_mount_options_multi_volfile_servers/gluster_fuse_fstab
@@ -0,0 +1 @@
192.168.1.1:/backup /data/backup glusterfs defaults,_netdev,direct-io-mode=enable,backup-volfile-servers=192.168.1.2:192.168.1.3:192.168.1.4:192.168.1.5 0 0
1 change: 1 addition & 0 deletions source/gluster/deploy/index.rst
@@ -15,6 +15,7 @@ GlusterFS deployment practice, to be continuously improved and refined, so the final scheme will differ from the…
suse/index
centos/index
ubuntu/index
gluster_mount_options_multi_volfile_servers.rst

.. only:: subproject and html

1 change: 1 addition & 0 deletions source/gluster/index.rst
@@ -9,6 +9,7 @@ Gluster Atlas

introduce_gluster.rst
gluster_vs_ceph.rst
best_practices_for_gluster/index
startup/index
deploy/index
build/index
7 changes: 5 additions & 2 deletions source/gluster/startup/gluster_architecture.rst
@@ -34,9 +34,10 @@ Distributed GlusterFS Volume is the GlusterFS volume type created by default, that is, if you do not…

gluster volume create test-volume server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4

Then you can check the volume information::
Then you can check the volume information:

gluster volume info
.. literalinclude:: gluster_architecture/gluster_volume_info
:caption: Run ``gluster volume info`` to check volume information

The output looks like::

@@ -72,6 +73,8 @@ GlusterFS replicated volumes eliminate the data-loss risk of pure distributed volumes. Across all br…
gluster volume create test-volume replica 3 transport tcp \
server1:/exp1 server2:/exp2 server3:/exp3

.. _distributed_replicated_glusterfs_volume:

Distributed Replicated GlusterFS Volume
---------------------------------------------

source/gluster/startup/gluster_architecture/gluster_volume_info
@@ -0,0 +1 @@
gluster volume info
1 change: 1 addition & 0 deletions source/kubernetes/monitor/skywalking/index.rst
@@ -8,4 +8,5 @@ Apache SkyWalking
:maxdepth: 1

intro_skywalking.rst
skywalking_opensource_talk.rst

5 changes: 3 additions & 2 deletions source/kubernetes/monitor/skywalking/intro_skywalking.rst
@@ -7,11 +7,12 @@ Introduction to Apache SkyWalking
Origins
=======

`Apache SkyWalking open source project <https://skywalking.apache.org/>`_
The `Apache SkyWalking open source project <https://skywalking.apache.org/>`_ is an APM system developed and open-sourced by Sheng Wu. He initially developed it while working at Huawei, and it was promoted with Huawei's resources. In 2017 SkyWalking joined the Apache Foundation; after going open source it gained community support and strong adoption, and it is now an Apache top-level open source project.

References
==========

- `SkyWalking documentation in Chinese <https://skywalking.apache.org/zh/>`_ (JD.com offers the e-book `Apache SkyWalking实战 <https://e.jd.com/30640502.html>`_)
- `Sheng Wu: SkyWalking and the Apache Software Foundation | DEV. Together 2021 China Developer Ecosystem Summit <https://developer.aliyun.com/article/805796>`_
- `Apache's first Chinese board member Sheng Wu on open source: I am pessimistic about open source in China in the short term <https://developer.baidu.com/article/detail.html?id=294099>`_
- `Apache's first Chinese board member Sheng Wu on open source: I am pessimistic about open source in China in the short term <https://developer.baidu.com/article/detail.html?id=294099>`_ Sheng Wu's talk explains why SkyWalking succeeded as an open source project and how much its chosen approach mattered; it also covers differing stances on how open source operates and how to take part in open source community collaboration. Highly recommended reading (my notes: :ref:`skywalking_opensource_talk`)
- `Sheng Wu: a programmer can live the ideal life without becoming a CTO | TGO interview <https://www.infoq.cn/article/blkqz6i3as3kjt5izqoo>`_ An InfoQ interview covering Sheng Wu's experience developing SkyWalking: "make sure your time is valuable"
