Commit: update
huataihuang committed Aug 31, 2023
1 parent d2fb2f4 commit 35a75fc
Showing 18 changed files with 231 additions and 8 deletions.
82 changes: 82 additions & 0 deletions source/ceph/rbd/ceph_extend_rbd_drive_with_libvirt_xfs.rst
@@ -109,6 +109,88 @@
$ df -h | grep vdb1
/dev/vdb1 50G 8.2G 42G 17% /var/lib/containerd

Extend the Ceph RBD disk vdb1 offline
======================================

:ref:`install_kubeflow_single_command` ran out of disk space and triggered :ref:`node_pressure_eviction`, so I shut the nodes down one at a time in order to:

- Extend the ``/dev/vdb1`` disk
- Change the ``vdb1`` mount point from ``/var/lib/docker`` to ``/var/lib/containerd`` (because :ref:`y-k8s` is deployed with :ref:`kubespray`, which actually uses :ref:`containerd`)

This walkthrough uses ``z-k8s-n9`` as the example

Attach the virtual disks to a maintenance VM
---------------------------------------------

- Since ``z-k8s-n9`` is a virtual machine whose disks are :ref:`ceph_rbd`, first check the :ref:`libvirt` configuration of the virtual disks:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd.xml
:language: xml
   :caption: RBD configuration of the server to be maintained

- I use the ``z-dev`` VM to attach the two disks that need maintenance. On ``z-dev``, the original ``vda`` configuration is:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/z-dev_vda.xml
:language: xml
   :caption: Disk ``vda`` of the maintenance VM ``z-dev``

- Add the above :ref:`ceph_rbd` configuration of ``z-k8s-n9`` to the ``z-dev`` VM (see the attach sketch after this list), with two modifications:

  - Rename the disk ``target`` devices from ``vda`` and ``vdb`` to ``vdb`` and ``vdc``
  - Delete the line resembling ``<address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>`` for each virtual disk and let :ref:`libvirt` assign the address automatically (otherwise it can easily conflict)
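
The exact attach step isn't recorded here; the following is a minimal sketch of one way to do it, assuming the two edited ``<disk>`` definitions are saved to local files (hypothetical file names):

.. code-block:: bash

   # Hypothetical file names: each file holds one edited <disk> element
   # (target renamed to vdb/vdc, <address .../> line removed)
   virsh attach-device z-dev z-k8s-n-9_system_disk.xml --config
   virsh attach-device z-dev z-k8s-n-9_docker_disk.xml --config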

- After starting ``z-dev``, ``fdisk -l`` shows the following output:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/vdb_vdc
   :caption: The attached ``rbd`` disks ``vdb`` and ``vdc``
:emphasize-lines: 10,21

- The two disks above are the ones to adjust; ``vdc`` is first extended to 100G (same method as above):

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd_ls
:language: bash
   :caption: Run ``rbd ls`` to list the RBD images in the storage pool

The image that needs to be extended:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd_ls_output
   :caption: ``rbd ls`` output: the RBD images in the storage pool, including the one to extend

- Resize the RBD image to 100GB (1024x100=102400 MB) and refresh the VM disk with ``virsh blockresize``:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd_resize_virsh_blockresize_100g
:language: bash
   :caption: ``rbd resize`` resizes the RBD block device image, then ``virsh blockresize`` resizes the VM disk to 100G

- Once that is done, ``fdisk -l`` inside the VM shows the disk extended to 100G:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/vdc_100g
   :caption: After ``rbd resize`` and ``virsh blockresize``, the VM sees the extended disk at 100G
:emphasize-lines: 1

- Recreate the filesystem inside the VM (rebuild the GPT partition table with :ref:`parted` and create :ref:`xfs`):

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/vdc_xfs
   :caption: Create :ref:`xfs` on ``vdc``
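
Since all data is being copied off the old 50G partition anyway, recreating the partition table and filesystem is the simplest route here. If the existing data had to stay in place, growing the partition and the filesystem would be the alternative; a rough sketch under that assumption (same device names):

.. code-block:: bash

   # Alternative (not used here): grow the existing partition, then grow XFS online
   parted /dev/vdc resizepart 1 100%
   mount /dev/vdc1 /vdc1
   xfs_growfs /vdc1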

- Mount the disks:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/mount_vdb_vdc
   :caption: Mount the ``vdb`` and ``vdc`` partitions to prepare for data migration

The data that now needs to be migrated::

    /vdb2/var/lib/containerd => /vdc1 (this volume will later be mounted as /var/lib/containerd on the target host)

- Migrate the data:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/migrate_containerd
   :caption: Data migration

- Edit ``/vdb2/etc/fstab`` (this is the mount configuration on the system disk)::

      /dev/vdb1 /var/lib/containerd xfs defaults,quota,gquota,prjquota 0 1
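
After the migration, shut down ``z-dev``, detach the temporarily attached disks, and start the original node again. A rough verification sequence, assuming the same hypothetical XML files as above and ``z-k8s-n-9`` as the libvirt domain name:

.. code-block:: bash

   # On the KVM host: detach the RBD disks from the maintenance VM
   virsh detach-device z-dev z-k8s-n-9_system_disk.xml --config
   virsh detach-device z-dev z-k8s-n-9_docker_disk.xml --config

   # Start the original node (domain name assumed)
   virsh start z-k8s-n-9

   # Inside the node: confirm the resized volume is mounted at the new path
   df -h | grep containerd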

References
==========

@@ -0,0 +1,10 @@
# Rename the original directory and create an empty mount point so /dev/vdb1 can later be mounted on the running host
mv /vdb2/var/lib/containerd /vdb2/var/lib/containerd.old
mkdir /vdb2/var/lib/containerd

# Sync the data with tar (alternative, left commented out)
# (cd /vdb2/var/lib/containerd.old && tar cf - .)|(cd /vdc1 && tar xf -)

# Sync the data with rsync
rsync -a /vdb2/var/lib/containerd.old/ /vdc1

@@ -0,0 +1,3 @@
mkdir /vdb2 /vdc1
mount /dev/vdb2 /vdb2
mount /dev/vdc1 /vdc1
26 changes: 26 additions & 0 deletions source/ceph/rbd/ceph_extend_rbd_drive_with_libvirt_xfs/rbd.xml
@@ -0,0 +1,26 @@
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<auth username='libvirt'>
<secret type='ceph' uuid='3f203352-fcfc-4329-b870-34783e13493a'/>
</auth>
<source protocol='rbd' name='libvirt-pool/z-k8s-n-9'>
<host name='192.168.6.204' port='6789'/>
<host name='192.168.6.205' port='6789'/>
<host name='192.168.6.206' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<auth username='libvirt'>
<secret type='ceph' uuid='3f203352-fcfc-4329-b870-34783e13493a'/>
</auth>
<source protocol='rbd' name='libvirt-pool/z-k8s-n-9.docker'>
<host name='192.168.6.204' port='6789'/>
<host name='192.168.6.205' port='6789'/>
<host name='192.168.6.206' port='6789'/>
</source>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</disk>
@@ -0,0 +1,4 @@
NAME SIZE PARENT FMT PROT LOCK
...
z-k8s-n-9 32 GiB 2
z-k8s-n-9.docker 50 GiB 2
@@ -0,0 +1,2 @@
rbd resize --size 102400 libvirt-pool/z-k8s-n-9.docker
virsh blockresize --domain z-dev --path vdc --size 100G
21 changes: 21 additions & 0 deletions source/ceph/rbd/ceph_extend_rbd_drive_with_libvirt_xfs/vdb_vdc
@@ -0,0 +1,21 @@
Disk /dev/vdb: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D9ADB788-0FE1-45C3-80C5-B412A3C4AB19

Device Start End Sectors Size Type
/dev/vdb1 2048 499711 497664 243M EFI System
/dev/vdb2 499712 67108830 66609119 31.8G Linux filesystem


Disk /dev/vdc: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7B582F7C-AC2D-4D04-B600-35743C17BD96

Device Start End Sectors Size Type
/dev/vdc1 2048 104857566 104855519 50G Linux filesystem
@@ -0,0 +1,9 @@
Disk /dev/vdc: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7B582F7C-AC2D-4D04-B600-35743C17BD96

Device Start End Sectors Size Type
/dev/vdc1 2048 104857566 104855519 50G Linux filesystem
@@ -0,0 +1,4 @@
parted -s /dev/vdc mklabel gpt
parted -s -a optimal /dev/vdc mkpart primary 0% 100%
parted -s /dev/vdc name 1 data
mkfs.xfs -n ftype=1 /dev/vdc1 -f
@@ -0,0 +1,6 @@
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/vg-libvirt/z-dev'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>
@@ -13,6 +13,7 @@ kubelet uses various parameters to make eviction decisions:
- monitoring interval



References
==========

@@ -43,7 +43,7 @@

.. note::

   I used extremely small virtual disks in the VMs deployed for :ref:`y-k8s` and ran into an awkward problem, :ref:`node_pressure_eviction`: running Pods were evicted because of insufficient disk space. Once the Pod readiness checks above exposed the problem, it was resolved by extending the disks via :ref:`ceph_extend_rbd_drive_with_libvirt_xfs`
   I used extremely small virtual disks in the VMs deployed for :ref:`y-k8s` and ran into an awkward problem, :ref:`node_pressure_eviction`: running Pods were evicted because of insufficient disk space. Once the Pod readiness checks above exposed the problem, it was resolved by extending the disks via :ref:`ceph_extend_rbd_drive_with_libvirt_xfs` (offline extension, which also migrated ``/var/lib/docker`` to ``/var/lib/containerd``)

References
==========
1 change: 1 addition & 0 deletions source/kvm/debug/index.rst
@@ -8,6 +8,7 @@ KVM troubleshooting
:maxdepth: 1

dracut-initqueue_timeout.rst
qemu_vfio_connect_timeout.rst

.. only:: subproject and html

44 changes: 44 additions & 0 deletions source/kvm/debug/qemu_vfio_connect_timeout.rst
@@ -0,0 +1,44 @@
.. _qemu_vfio_connect_timeout:

================================================
VM startup times out accessing the vfio device
================================================

Today, while working on :ref:`ceph_extend_rbd_drive_with_libvirt_xfs`, I attached two :ref:`ceph_rbd` devices to a separate maintenance VM to handle the disk extension. Unexpectedly, after I shut down the maintenance VM ``z-dev`` and tried to bring ``y-k8s-n-1``, which originally used the :ref:`ceph_rbd`, back up, it failed with a vfio device connection timeout:

.. literalinclude:: qemu_vfio_connect_timeout/vfio_used_error
   :caption: When starting the VM, the vfio device is reported as already in use and the connection times out

The busy device in this error, ``3eb9d560-0b31-11ee-91a9-bb28039c61eb``, turns out (see ``virsh dumpxml y-k8s-n-1``) to be one of the 2 :ref:`vgpu` devices configured in :ref:`vgpu_quickstart`:

.. literalinclude:: ../vgpu/vgpu_quickstart/vgpu_create_output_1
:language: bash
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu`: the **first** vgpu device

This is an ``mdev`` device (a `VFIO Mediated devices <https://docs.kernel.org/driver-api/vfio-mediated-device.html>`_ device)
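
To see which mediated devices exist on the host and confirm the UUID from the error message, something along these lines can be used (a sketch; ``mdevctl`` may not be installed everywhere):

.. code-block:: bash

   # List mediated devices registered with the kernel
   ls /sys/bus/mdev/devices/

   # If mdevctl is available, show each mdev with its parent device and type
   mdevctl list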

Looking back at the :ref:`install_vgpu_manager` notes, ``nvidia-vgpu-mgr.service`` must be running first, which in turn means :ref:`vgpu_unlock` has to be working

- Check the status of ``nvidia-vgpu-mgr.service``:

.. literalinclude:: ../vgpu/install_vgpu_manager/systemctl_staus_nvidia-vgpu-mgr
:language: bash
   :caption: Check the ``nvidia-vgpu-mgr`` service status

Sure enough, the service had failed to start again... back to the old problem: :ref:`vgpu_unlock` was broken:

Query the ``vgpu``:

.. literalinclude:: ../vgpu/install_vgpu_manager/nvidia-smi_vgpu_q
:language: bash
   :caption: Query vGPUs with ``nvidia-smi vgpu -q``

The output shows ``0`` active vGPUs:

.. literalinclude:: ../vgpu/install_vgpu_manager/nvidia-smi_vgpu_q_output
:language: bash
   :caption: ``nvidia-smi vgpu -q`` shows only ``0`` vGPUs

Then I remembered: :ref:`vgpu_unlock` requires installing the :ref:`vgpu` driver as a :ref:`dkms` module. I upgraded the kernel recently, and a kernel upgrade recompiles and reinstalls the :ref:`vgpu` module. I went through the procedure again and found that the original modifications were all still correct, but could the recently upgraded kernel simply have unstable support?

I recompiled :ref:`vgpu_unlock` once more (probably unnecessary) and rebooted the server
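
After the reboot, a quick re-check along the lines of the commands referenced above tells whether the vGPUs are back:

.. code-block:: bash

   # The vGPU manager service should now be active
   systemctl status nvidia-vgpu-mgr.service

   # Should now report the configured vGPUs instead of 0
   nvidia-smi vgpu -q
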
5 changes: 5 additions & 0 deletions source/kvm/debug/qemu_vfio_connect_timeout/vfio_used_error
@@ -0,0 +1,5 @@
error: Failed to start domain 'y-k8s-n-1'
error: internal error: qemu unexpectedly closed the monitor: 2023-08-31T13:35:04.132854Z qemu-system-x86_64:
-device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/3eb9d560-0b31-11ee-91a9-bb28039c61eb,display=off,bus=pci.7,addr=0x0:
vfio 3eb9d560-0b31-11ee-91a9-bb28039c61eb: error getting device from group 123: Connection timed out
Verify all devices in group 123 are bound to vfio-<bus> or pci-stub and not already in use
9 changes: 7 additions & 2 deletions source/kvm/vgpu/vgpu_quickstart.rst
@@ -58,9 +58,14 @@ NVIDIA License Server installation is a separate step

Take the output from running the script:

.. literalinclude:: vgpu_quickstart/vgpu_create_output
.. literalinclude:: vgpu_quickstart/vgpu_create_output_1
:language: bash
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu` (used to configure the VMs)
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu`: the **first** vgpu device

.. literalinclude:: vgpu_quickstart/vgpu_create_output_2
:language: bash
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu`: the **second** vgpu device


(one per vGPU) and add it to the **2** virtual machines ``y-k8s-n-1`` and ``y-k8s-n-2`` respectively, then start the virtual machines

@@ -3,8 +3,3 @@
<address uuid='3eb9d560-0b31-11ee-91a9-bb28039c61eb'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3eb9d718-0b31-11ee-91aa-2b17f51ee12d'/>
</source>
</hostdev>
5 changes: 5 additions & 0 deletions source/kvm/vgpu/vgpu_quickstart/vgpu_create_output_2
@@ -0,0 +1,5 @@
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3eb9d718-0b31-11ee-91aa-2b17f51ee12d'/>
</source>
</hostdev>
