Commit fab4bce

skywalking and gluster

huataihuang committed Aug 1, 2023
1 parent 9169cb8 commit fab4bce
Showing 26 changed files with 388 additions and 8 deletions.
5 changes: 5 additions & 0 deletions source/clang/parallel_make.rst
@@ -26,3 +26,8 @@
.. note::

Running the build concurrently with the ``-j`` flag on a multi-processor server, you can see that the ``real`` time is far less than the sum of the ``system`` and ``user`` times. This is because ``system`` and ``user`` time is accumulated across multiple processors, while concurrency greatly shortens the actual completion time (the ``real`` time). See :ref:`time` for details.
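
For instance, a minimal sketch (the project and the timing numbers are purely illustrative)::

   # run the build across all available cores and time it
   time make -j$(nproc)

   # typical output shape: real (wall clock) is far smaller than user + sys,
   # because user/sys CPU time is summed across all cores
   #   real    0m32.1s
   #   user    3m18.4s
   #   sys     0m41.7s

   # make -j the default for this shell session (see the Stack Overflow reference below)
   export MAKEFLAGS="-j$(nproc)"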

References
==========

- `Parallel make: set -j8 as the default option <https://stackoverflow.com/questions/10567890/parallel-make-set-j8-as-the-default-option>`_
source/gluster/best_practices_for_gluster/gluster_underlay_filesystem.rst
@@ -0,0 +1,13 @@
.. _gluster_underlay_filesystem:

==============================================
Underlying Filesystems for Gluster Storage
==============================================

GlusterFS sits on top of a filesystem provided by the operating system; the recommended options are:

- Use :ref:`xfs` directly
- Use :ref:`xfs` on top of :ref:`linux_lvm` (see the sketch below)
- Adopt :ref:`zfs`, which provides both volume management and filesystem functionality
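
A hedged sketch of the LVM + XFS option (the device name, size, and mount point here are hypothetical)::

   # assume /dev/sdb is a dedicated data disk and the brick will live at /data/brick0
   pvcreate /dev/sdb
   vgcreate vg_gluster /dev/sdb
   lvcreate -n lv_brick0 -L 500G vg_gluster

   # 512-byte inodes leave room for Gluster's extended attributes
   mkfs.xfs -i size=512 /dev/vg_gluster/lv_brick0

   mkdir -p /data/brick0
   echo '/dev/vg_gluster/lv_brick0 /data/brick0 xfs defaults 0 0' >> /etc/fstab
   mount /data/brick0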

To be continued...
18 changes: 18 additions & 0 deletions source/gluster/best_practices_for_gluster/index.rst
@@ -0,0 +1,18 @@
.. _best_practices_for_gluster:

====================================
Best Practices for Gluster Storage
====================================

.. toctree::
:maxdepth: 1

think_best_practices_for_gluster.rst
gluster_underlay_filesystem.rst

.. only:: subproject and html

Indices
=======

* :ref:`genindex`
source/gluster/best_practices_for_gluster/think_best_practices_for_gluster.rst
@@ -0,0 +1,12 @@
.. _think_best_practices_for_gluster:

================================================
Thoughts on Best Practices for Gluster Storage
================================================

Recently, :ref:`deploy_centos7_suse15_suse12_gluster11` reused my earlier deployment scheme, but :ref:`add_centos7_gluster11_server` suddenly made me realize the pros and cons of using raw disks directly. Especially after reading `Gluster Storage for Oracle Linux: Best Practices and Sizing Guideline <https://www.oracle.com/a/ocom/docs/linux/gluster-storage-linux-best-practices.pdf>`_, I feel I need to sort out the approach further, comparing and analyzing the options to distill :ref:`best_practices_for_gluster` and iterate on the technology.

References
==========

- `Gluster Storage for Oracle Linux: Best Practices and Sizing Guideline <https://www.oracle.com/a/ocom/docs/linux/gluster-storage-linux-best-practices.pdf>`_
85 changes: 85 additions & 0 deletions source/gluster/deploy/centos/add_centos7_gluster11_server.rst
@@ -0,0 +1,85 @@
.. _add_centos7_gluster11_server:

==============================================================
Adding Servers to a Gluster 11 Cluster Deployed on CentOS 7
==============================================================

After completing :ref:`deploy_centos7_gluster11`, I need to add server nodes to the cluster to expand its capacity.

Preparation
===========

- :ref:`build_glusterfs_11_for_centos_7`
- :ref:`gluster11_rpm_createrepo` **add the repository configuration**

Installing and Starting the Service
====================================

- The installation method is the same as in :ref:`deploy_centos7_gluster6`:

.. literalinclude:: deploy_centos7_gluster11/yum_install_glusterfs-server
:caption: Install GlusterFS on CentOS

- Start the GlusterFS management service:

.. literalinclude:: deploy_centos7_gluster11/systemctl_enable_glusterd
:caption: Start and enable the GlusterFS management service

- Check the ``glusterd`` service status (a combined sketch of these three steps follows this list):

.. literalinclude:: deploy_centos7_gluster11/systemctl_status_glusterd
:caption: Check the GlusterFS management service
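
The included snippets are not reproduced in this diff; as a hedged sketch, with the local Gluster 11 repository from :ref:`gluster11_rpm_createrepo` configured, the three steps above boil down to::

   # install the server package from the configured repository
   yum install -y glusterfs-server

   # enable at boot and start the management daemon
   systemctl enable glusterd
   systemctl start glusterd

   # verify the daemon is active (running)
   systemctl status glusterd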

Adding a New GlusterFS Node
============================

- Configure gluster peering; **this only needs to be run once, on one server** (the first server is fine). The server added here is the 7th server in our cluster, but since only one is being added there is no need to distinguish it, so I simply name it ``server``:

.. literalinclude:: add_centos7_gluster11_server/gluster_peer_probe
:language: bash
:caption: Run ``gluster peer probe`` **once** on **one** server to add the new server

- When that completes, check the ``gluster peer`` status:

.. literalinclude:: deploy_centos7_gluster11/gluster_peer_status
:caption: Run ``gluster peer status`` on **one** server to check that the new node is correctly connected to the cluster

.. literalinclude:: add_centos7_gluster11_server/gluster_peer_status_output
:caption: ``gluster peer status`` output showing the ``peer`` in ``Connected`` state indicates the join succeeded
:emphasize-lines: 23-25

- We are still using the volume from :ref:`deploy_centos7_gluster11`, so the expansion (that is, ``add_brick``) uses the following simple script:

.. literalinclude:: add_centos7_gluster11_server/add_gluster
:language: bash
:caption: The ``add_gluster`` script: pass the volume name as the argument to **expand** (``add_brick``) the existing ``replica 3`` distributed volume

Here the command reports an error:

.. literalinclude:: add_centos7_gluster11_server/add_gluster_error
:language: bash
:caption: ``add_brick`` warns that multiple bricks of a replicate volume are on the same server, which is not an optimal setup

I verified that appending the ``force`` keyword to the end of the command does complete the ``add_brick``, but it leaves me with the following problem:

- The newly added ``brick`` entries all end up at the tail of the ``bricks`` list:

.. literalinclude:: ../../startup/gluster_architecture/gluster_volume_info
:caption: Run ``gluster volume info`` to check volume information

You can see all the bricks of the newly added ``192.168.1.7``:

.. literalinclude:: add_centos7_gluster11_server/gluster_volume_info_output
:caption: ``gluster volume info`` shows that all bricks on the newly added server are listed at the end
:emphasize-lines: 26-39

The newly expanded node has a serious problem: all its ``bricks`` sit on a single server, so the portion of data hashed to ``brick73`` through ``brick84`` lands entirely on one machine. The limitation of choosing raw-disk :ref:`xfs` as the :ref:`gluster_underlay_filesystem` is that server nodes cannot be added or removed flexibly.
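
For completeness: ``add_brick`` by itself does not move existing data onto the new bricks; a rebalance is normally run afterwards. A minimal sketch, assuming the ``backup`` volume shown above::

   # redistribute existing data across all bricks, including the new ones
   gluster volume rebalance backup start

   # poll until the status reports completed
   gluster volume rebalance backup status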

.. note::

I will explore my practical approach in detail and summarize improvements in :ref:`best_practices_for_gluster`.

References
==========

- `rackspace docs: Add and Remove GlusterFS Servers <https://docs.rackspace.com/docs/add-and-remove-glusterfs-servers>`_
source/gluster/deploy/centos/add_centos7_gluster11_server/add_gluster
@@ -0,0 +1,27 @@
# usage: add_gluster <volume>  (appends the new server's 12 bricks to the existing replica-3 volume)
volume=$1
server=192.168.1.7

gluster volume add-brick ${volume} replica 3 \
    ${server}:/data/brick0/${volume} \
    ${server}:/data/brick1/${volume} \
    ${server}:/data/brick2/${volume} \
    ${server}:/data/brick3/${volume} \
    ${server}:/data/brick4/${volume} \
    ${server}:/data/brick5/${volume} \
    ${server}:/data/brick6/${volume} \
    ${server}:/data/brick7/${volume} \
    ${server}:/data/brick8/${volume} \
    ${server}:/data/brick9/${volume} \
    ${server}:/data/brick10/${volume} \
    ${server}:/data/brick11/${volume}
source/gluster/deploy/centos/add_centos7_gluster11_server/add_gluster_error
@@ -0,0 +1,3 @@
volume add-brick: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Bricks should be on different nodes to have best fault tolerant configuration.
Use 'force' at the end of the command if you want to override this behavior.
source/gluster/deploy/centos/add_centos7_gluster11_server/gluster_peer_probe
@@ -0,0 +1,3 @@
server=192.168.1.7

gluster peer probe ${server}
source/gluster/deploy/centos/add_centos7_gluster11_server/gluster_peer_status_output
@@ -0,0 +1,25 @@
Number of Peers: 6

Hostname: 192.168.1.2
Uuid: c664761a-5973-4e2e-8506-9c142c657297
State: Peer in Cluster (Connected)

Hostname: 192.168.1.3
Uuid: 901b8027-5eab-4f6b-8cf4-aafa4463ca13
State: Peer in Cluster (Connected)

Hostname: 192.168.1.4
Uuid: 5ff667dd-5f45-4daf-900e-913e78e52297
State: Peer in Cluster (Connected)

Hostname: 192.168.1.5
Uuid: ebd1d002-0719-4704-a59d-b4e8b3b28c29
State: Peer in Cluster (Connected)

Hostname: 192.168.1.6
Uuid: 1f958e31-2d55-4904-815a-89f6ade360fe
State: Peer in Cluster (Connected)

Hostname: 192.168.1.7
Uuid: a023c435-097c-411b-9d50-1e84629b9673
State: Peer in Cluster (Connected)
source/gluster/deploy/centos/add_centos7_gluster11_server/gluster_volume_info_output
@@ -0,0 +1,45 @@
Volume Name: backup
Type: Distributed-Replicate
Volume ID: 9ff7cdb3-abf0-4e33-8293-aae69c28b8d9
Status: Started
Snapshot Count: 0
Number of Bricks: 28 x 3 = 84
Transport-type: tcp
Bricks:
Brick1: 192.168.1.1:/data/brick0/backup
Brick2: 192.168.1.2:/data/brick0/backup
Brick3: 192.168.1.3:/data/brick0/backup
Brick4: 192.168.1.4:/data/brick0/backup
Brick5: 192.168.1.5:/data/brick0/backup
Brick6: 192.168.1.6:/data/brick0/backup
Brick7: 192.168.1.1:/data/brick1/backup
Brick8: 192.168.1.2:/data/brick1/backup
Brick9: 192.168.1.3:/data/brick1/backup
Brick10: 192.168.1.4:/data/brick1/backup
Brick11: 192.168.1.5:/data/brick1/backup
Brick12: 192.168.1.6:/data/brick1/backup
...
Brick67: 192.168.1.1:/data/brick11/backup
Brick68: 192.168.1.2:/data/brick11/backup
Brick69: 192.168.1.3:/data/brick11/backup
Brick70: 192.168.1.4:/data/brick11/backup
Brick71: 192.168.1.5:/data/brick11/backup
Brick72: 192.168.1.6:/data/brick11/backup
Brick73: 192.168.1.7:/data/brick0/backup
Brick74: 192.168.1.7:/data/brick1/backup
Brick75: 192.168.1.7:/data/brick2/backup
Brick76: 192.168.1.7:/data/brick3/backup
Brick77: 192.168.1.7:/data/brick4/backup
Brick78: 192.168.1.7:/data/brick5/backup
Brick79: 192.168.1.7:/data/brick6/backup
Brick80: 192.168.1.7:/data/brick7/backup
Brick81: 192.168.1.7:/data/brick8/backup
Brick82: 192.168.1.7:/data/brick9/backup
Brick83: 192.168.1.7:/data/brick10/backup
Brick84: 192.168.1.7:/data/brick11/backup
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
8 changes: 8 additions & 0 deletions source/gluster/deploy/centos/deploy_centos7_gluster11.rst
@@ -89,6 +89,14 @@ Deploy Gluster 11 on CentOS 7
:language: bash
:caption: The ``create_gluster`` script: pass the volume name as the argument to create a ``replica 3`` distributed volume

.. note::

When the number of bricks is an integer multiple (2x or more) of ``replica``, a :ref:`distributed_replicated_glusterfs_volume` is created automatically, providing both high availability and high performance. But the brick ordering matters: ``replica`` first, then ``distribute``.

So, to spread data across different servers, I adopted a specific ordering here, ``A:0,B:0,C:0,A:1,B:1,C:1,A:2,B:2,C:2...``, so that each ``replica 3`` set lands precisely on different servers (a bash sketch of generating this ordering follows this note).

This deployment approach has pros and cons; I will explore it in detail in :ref:`best_practices_for_gluster`.
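
To make the ordering concrete, here is a hedged bash sketch of generating such an interleaved brick list (the server IPs and the 12-bricks-per-server layout are taken from this cluster; the actual ``create_gluster`` script is the one included above, not reproduced in this diff)::

   volume=$1
   servers="192.168.1.1 192.168.1.2 192.168.1.3 192.168.1.4 192.168.1.5 192.168.1.6"

   # emit bricks replica-set first: A:0 B:0 C:0 ... then A:1 B:1 C:1 ...
   # with replica 3, every 3 consecutive bricks form one replica set,
   # so each set spans 3 different servers
   bricks=""
   for i in $(seq 0 11); do
       for server in ${servers}; do
           bricks="${bricks} ${server}:/data/brick${i}/${volume}"
       done
   done

   gluster volume create ${volume} replica 3 transport tcp ${bricks}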

- Make the script executable::

chmod 755 create_gluster
1 change: 1 addition & 0 deletions source/gluster/deploy/centos/index.rst
@@ -12,6 +12,7 @@ GlusterFS Deployment on the CentOS Platform
download_gluster_rpm_createrepo.rst
gluster11_rpm_createrepo.rst
deploy_centos7_gluster11.rst
add_centos7_gluster11_server.rst

.. only:: subproject and html

source/gluster/deploy/gluster_mount_options_multi_volfile_servers.rst
@@ -0,0 +1,26 @@
.. _gluster_mount_options_multi_volfile_servers:

========================================================================
Configuring GlusterFS Clients to Mount with Multiple volfile-servers
========================================================================

When a GlusterFS client mounts a volume exported by the servers, you can specify multiple volfile servers (the client fetches the configuration of the volume being mounted from a volfile server). Early GlusterFS versions provided an ``/etc/fstab`` configuration similar to the following:

.. literalinclude:: centos/deploy_centos7_gluster11/gluster_fuse_fstab
:caption: The GlusterFS client's ``/etc/fstab``

At that time there were only **two** volfile servers: one primary and one backup.
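
That fstab file is not reproduced in this diff; the old single-backup style looked roughly like the following sketch (this cluster's IPs assumed)::

   192.168.1.1:/backup /data/backup glusterfs defaults,_netdev,backupvolfile-server=192.168.1.2 0 0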

Newer GlusterFS versions support more ``volfile-server`` entries; you can even list the IPs of every GlusterFS server. If the primary volfile-server is down, GlusterFS tries the ``backup-volfile-servers`` in order when mounting the volume, until one succeeds:

.. literalinclude:: gluster_mount_options_multi_volfile_servers/gluster_fuse_fstab
:caption: Configure more ``backup-volfile-servers`` to improve availability
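
Equivalently, when mounting by hand rather than via ``/etc/fstab`` (a sketch using this cluster's IPs, assuming the glusterfs-fuse client is installed)::

   mount -t glusterfs \
       -o backup-volfile-servers=192.168.1.2:192.168.1.3:192.168.1.4:192.168.1.5 \
       192.168.1.1:/backup /data/backup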

.. note::

The option ``backupvolfile-server`` from earlier versions has been renamed ``backup-volfile-servers``. Pay close attention, otherwise you cannot specify multiple backup volfile servers!

References
==========

- `Red Hat Gluster Storage > 3.4 > Administration Guide > Chapter 6. Creating Access to Volumes #6.1.3.1. Mount Commands and Options <https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/chap-accessing_data_-_setting_up_clients#Mount_Commands_and_Options>`_
source/gluster/deploy/gluster_mount_options_multi_volfile_servers/gluster_fuse_fstab
@@ -0,0 +1 @@
192.168.1.1:/backup /data/backup glusterfs defaults,_netdev,direct-io-mode=enable,backup-volfile-servers=192.168.1.2:192.168.1.3:192.168.1.4:192.168.1.5 0 0
1 change: 1 addition & 0 deletions source/gluster/deploy/index.rst
@@ -15,6 +15,7 @@ GlusterFS deployment practice, to be continuously improved and refined, so the final scheme will differ from the…
suse/index
centos/index
ubuntu/index
gluster_mount_options_multi_volfile_servers.rst

.. only:: subproject and html

1 change: 1 addition & 0 deletions source/gluster/index.rst
@@ -9,6 +9,7 @@ Gluster Atlas

introduce_gluster.rst
gluster_vs_ceph.rst
best_practices_for_gluster/index
startup/index
deploy/index
build/index
7 changes: 5 additions & 2 deletions source/gluster/startup/gluster_architecture.rst
@@ -34,9 +34,10 @@ Distributed GlusterFS Volume is the GlusterFS volume type created by default, that is, if you do not…

gluster volume create test-volume server1:/exp1 server2:/exp2 server3:/exp3 server4:/exp4

Then you can check the volume information::
Then you can check the volume information:

gluster volume info
.. literalinclude:: gluster_architecture/gluster_volume_info
:caption: Run ``gluster volume info`` to check volume information

The output looks like::

@@ -72,6 +73,8 @@ GlusterFS replicated volumes eliminate the data-loss risk of pure distributed volumes. Across all br…
gluster volume create test-volume replica 3 transport tcp \
server1:/exp1 server2:/exp2 server3:/exp3

.. _distributed_replicated_glusterfs_volume:

Distributed Replicated GlusterFS Volume
---------------------------------------------

source/gluster/startup/gluster_architecture/gluster_volume_info
@@ -0,0 +1 @@
gluster volume info
1 change: 1 addition & 0 deletions source/kubernetes/monitor/skywalking/index.rst
@@ -8,4 +8,5 @@ Apache SkyWalking
:maxdepth: 1

intro_skywalking.rst
skywalking_opensource_talk.rst

5 changes: 3 additions & 2 deletions source/kubernetes/monitor/skywalking/intro_skywalking.rst
@@ -7,11 +7,12 @@ Introduction to Apache SkyWalking
Origins
=======

`Apache SkyWalking open source project <https://skywalking.apache.org/>`_
The `Apache SkyWalking open source project <https://skywalking.apache.org/>`_ is an APM system developed and open-sourced by Sheng Wu. He initially developed it while working at Huawei, and it was promoted with Huawei's resources. In 2017 SkyWalking joined the Apache Foundation; after going open source it gained community support and strong adoption, and it is now an Apache top-level open source project.

References
==========

- `SkyWalking documentation in Chinese <https://skywalking.apache.org/zh/>`_ (JD.com offers the e-book `Apache SkyWalking实战 <https://e.jd.com/30640502.html>`_)
- `Sheng Wu: SkyWalking and the Apache Software Foundation | DEV. Together 2021 China Developer Ecosystem Summit <https://developer.aliyun.com/article/805796>`_
- `Apache's first Chinese board member Sheng Wu on open source: I am pessimistic about open source in China in the short term <https://developer.baidu.com/article/detail.html?id=294099>`_
- `Apache's first Chinese board member Sheng Wu on open source: I am pessimistic about open source in China in the short term <https://developer.baidu.com/article/detail.html?id=294099>`_ Sheng Wu's talk explains why SkyWalking succeeded as an open source project and how much its chosen approach mattered; it also covers differing stances on how open source operates and how to take part in open source community collaboration. Highly recommended reading (my notes: :ref:`skywalking_opensource_talk`)
- `Sheng Wu: a programmer can live the ideal life without becoming a CTO | TGO interview <https://www.infoq.cn/article/blkqz6i3as3kjt5izqoo>`_ An InfoQ interview covering Sheng Wu's experience developing SkyWalking: "make sure your time is valuable"
