Commit: update
huataihuang committed Aug 31, 2023
1 parent d2fb2f4 commit 35a75fc
Showing 18 changed files with 231 additions and 8 deletions.
82 changes: 82 additions & 0 deletions source/ceph/rbd/ceph_extend_rbd_drive_with_libvirt_xfs.rst
@@ -109,6 +109,88 @@
$ df -h | grep vdb1
/dev/vdb1 50G 8.2G 42G 17% /var/lib/containerd

Extend the Ceph RBD disk vdb1 offline
======================================

:ref:`install_kubeflow_single_command` ran out of disk space and triggered :ref:`node_pressure_eviction`, so I shut the nodes down one at a time in order to:

- Extend the ``/dev/vdb1`` disk
- Change the ``vdb1`` mount point from ``/var/lib/docker`` to ``/var/lib/containerd`` (because :ref:`y-k8s` is deployed with :ref:`kubespray`, which actually uses :ref:`containerd`)

This walkthrough uses ``z-k8s-n9`` as the example

Attach the virtual disks to a maintenance VM
---------------------------------------------

- Since ``z-k8s-n9`` is a virtual machine whose disks are :ref:`ceph_rbd`, first check the :ref:`libvirt` configuration of the virtual disks:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd.xml
:language: xml
   :caption: RBD configuration of the server to be maintained

- I use the ``z-dev`` VM to attach the two disks that need maintenance. On ``z-dev``, the original ``vda`` configuration is:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/z-dev_vda.xml
:language: xml
   :caption: Disk ``vda`` of the maintenance VM ``z-dev``

- Add the above :ref:`ceph_rbd` configuration of ``z-k8s-n9`` to the ``z-dev`` VM (see the attach sketch after this list), with two modifications:

  - Rename the disk ``target`` devices from ``vda`` and ``vdb`` to ``vdb`` and ``vdc``
  - Delete the line resembling ``<address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>`` for each virtual disk and let :ref:`libvirt` assign the address automatically (otherwise it can easily conflict)
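
The exact attach step isn't recorded here; the following is a minimal sketch of one way to do it, assuming the two edited ``<disk>`` definitions are saved to local files (hypothetical file names):

.. code-block:: bash

   # Hypothetical file names: each file holds one edited <disk> element
   # (target renamed to vdb/vdc, <address .../> line removed)
   virsh attach-device z-dev z-k8s-n-9_system_disk.xml --config
   virsh attach-device z-dev z-k8s-n-9_docker_disk.xml --config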

- After starting ``z-dev``, ``fdisk -l`` shows the following output:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/vdb_vdc
   :caption: The attached ``rbd`` disks ``vdb`` and ``vdc``
:emphasize-lines: 10,21

- The two disks above are the ones to adjust; ``vdc`` is first extended to 100G (same method as above):

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd_ls
:language: bash
   :caption: Run ``rbd ls`` to list the RBD images in the storage pool

The image that needs to be extended:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd_ls_output
   :caption: ``rbd ls`` output: the RBD images in the storage pool, including the one to extend

- Resize the RBD image to 100GB (1024x100=102400 MB) and refresh the VM disk with ``virsh blockresize``:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/rbd_resize_virsh_blockresize_100g
:language: bash
   :caption: ``rbd resize`` resizes the RBD block device image, then ``virsh blockresize`` resizes the VM disk to 100G

- Once that is done, ``fdisk -l`` inside the VM shows the disk extended to 100G:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/vdc_100g
   :caption: After ``rbd resize`` and ``virsh blockresize``, the VM sees the extended disk at 100G
:emphasize-lines: 1

- Recreate the filesystem inside the VM (rebuild the GPT partition table with :ref:`parted` and create :ref:`xfs`):

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/vdc_xfs
   :caption: Create :ref:`xfs` on ``vdc``
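
Since all data is being copied off the old 50G partition anyway, recreating the partition table and filesystem is the simplest route here. If the existing data had to stay in place, growing the partition and the filesystem would be the alternative; a rough sketch under that assumption (same device names):

.. code-block:: bash

   # Alternative (not used here): grow the existing partition, then grow XFS online
   parted /dev/vdc resizepart 1 100%
   mount /dev/vdc1 /vdc1
   xfs_growfs /vdc1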

- Mount the disks:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/mount_vdb_vdc
   :caption: Mount the ``vdb`` and ``vdc`` partitions to prepare for data migration

The data that now needs to be migrated::

    /vdb2/var/lib/containerd => /vdc1 (this volume will later be mounted as /var/lib/containerd on the target host)

- Migrate the data:

.. literalinclude:: ceph_extend_rbd_drive_with_libvirt_xfs/migrate_containerd
   :caption: Data migration

- Edit ``/vdb2/etc/fstab`` (this is the mount configuration on the system disk)::

      /dev/vdb1 /var/lib/containerd xfs defaults,quota,gquota,prjquota 0 1
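
After the migration, shut down ``z-dev``, detach the temporarily attached disks, and start the original node again. A rough verification sequence, assuming the same hypothetical XML files as above and ``z-k8s-n-9`` as the libvirt domain name:

.. code-block:: bash

   # On the KVM host: detach the RBD disks from the maintenance VM
   virsh detach-device z-dev z-k8s-n-9_system_disk.xml --config
   virsh detach-device z-dev z-k8s-n-9_docker_disk.xml --config

   # Start the original node (domain name assumed)
   virsh start z-k8s-n-9

   # Inside the node: confirm the resized volume is mounted at the new path
   df -h | grep containerd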

References
==========

@@ -0,0 +1,10 @@
# Rename the original directory and create an empty mount point so /dev/vdb1 can later be mounted on the running host
mv /vdb2/var/lib/containerd /vdb2/var/lib/containerd.old
mkdir /vdb2/var/lib/containerd

# Sync the data with tar (alternative, left commented out)
# (cd /vdb2/var/lib/containerd.old && tar cf - .)|(cd /vdc1 && tar xf -)

# Sync the data with rsync
rsync -a /vdb2/var/lib/containerd.old/ /vdc1

@@ -0,0 +1,3 @@
mkdir /vdb2 /vdc1
mount /dev/vdb2 /vdb2
mount /dev/vdc1 /vdc1
26 changes: 26 additions & 0 deletions source/ceph/rbd/ceph_extend_rbd_drive_with_libvirt_xfs/rbd.xml
@@ -0,0 +1,26 @@
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<auth username='libvirt'>
<secret type='ceph' uuid='3f203352-fcfc-4329-b870-34783e13493a'/>
</auth>
<source protocol='rbd' name='libvirt-pool/z-k8s-n-9'>
<host name='192.168.6.204' port='6789'/>
<host name='192.168.6.205' port='6789'/>
<host name='192.168.6.206' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<auth username='libvirt'>
<secret type='ceph' uuid='3f203352-fcfc-4329-b870-34783e13493a'/>
</auth>
<source protocol='rbd' name='libvirt-pool/z-k8s-n-9.docker'>
<host name='192.168.6.204' port='6789'/>
<host name='192.168.6.205' port='6789'/>
<host name='192.168.6.206' port='6789'/>
</source>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
</disk>
@@ -0,0 +1,4 @@
NAME SIZE PARENT FMT PROT LOCK
...
z-k8s-n-9 32 GiB 2
z-k8s-n-9.docker 50 GiB 2
@@ -0,0 +1,2 @@
rbd resize --size 102400 libvirt-pool/z-k8s-n-9.docker
virsh blockresize --domain z-dev --path vdc --size 100G
21 changes: 21 additions & 0 deletions source/ceph/rbd/ceph_extend_rbd_drive_with_libvirt_xfs/vdb_vdc
@@ -0,0 +1,21 @@
Disk /dev/vdb: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D9ADB788-0FE1-45C3-80C5-B412A3C4AB19

Device Start End Sectors Size Type
/dev/vdb1 2048 499711 497664 243M EFI System
/dev/vdb2 499712 67108830 66609119 31.8G Linux filesystem


Disk /dev/vdc: 50 GiB, 53687091200 bytes, 104857600 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7B582F7C-AC2D-4D04-B600-35743C17BD96

Device Start End Sectors Size Type
/dev/vdc1 2048 104857566 104855519 50G Linux filesystem
@@ -0,0 +1,9 @@
Disk /dev/vdc: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 7B582F7C-AC2D-4D04-B600-35743C17BD96

Device Start End Sectors Size Type
/dev/vdc1 2048 104857566 104855519 50G Linux filesystem
@@ -0,0 +1,4 @@
parted -s /dev/vdc mklabel gpt
parted -s -a optimal /dev/vdc mkpart primary 0% 100%
parted -s /dev/vdc name 1 data
mkfs.xfs -n ftype=1 /dev/vdc1 -f
@@ -0,0 +1,6 @@
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none' io='native'/>
<source dev='/dev/vg-libvirt/z-dev'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</disk>
@@ -13,6 +13,7 @@ kubelet uses various parameters to make eviction decisions:
- monitoring interval



References
==========

@@ -43,7 +43,7 @@

.. note::

   I used extremely small virtual disks in the VMs deployed for :ref:`y-k8s` and ran into an awkward problem, :ref:`node_pressure_eviction`: running Pods were evicted because of insufficient disk space. Once the Pod readiness checks above exposed the problem, it was resolved by extending the disks via :ref:`ceph_extend_rbd_drive_with_libvirt_xfs`
   I used extremely small virtual disks in the VMs deployed for :ref:`y-k8s` and ran into an awkward problem, :ref:`node_pressure_eviction`: running Pods were evicted because of insufficient disk space. Once the Pod readiness checks above exposed the problem, it was resolved by extending the disks via :ref:`ceph_extend_rbd_drive_with_libvirt_xfs` (offline extension, which also migrated ``/var/lib/docker`` to ``/var/lib/containerd``)

References
==========
1 change: 1 addition & 0 deletions source/kvm/debug/index.rst
@@ -8,6 +8,7 @@ KVM troubleshooting
:maxdepth: 1

dracut-initqueue_timeout.rst
qemu_vfio_connect_timeout.rst

.. only:: subproject and html

44 changes: 44 additions & 0 deletions source/kvm/debug/qemu_vfio_connect_timeout.rst
@@ -0,0 +1,44 @@
.. _qemu_vfio_connect_timeout:

================================================
VM startup times out accessing the vfio device
================================================

Today, while working on :ref:`ceph_extend_rbd_drive_with_libvirt_xfs`, I attached two :ref:`ceph_rbd` devices to a separate maintenance VM to handle the disk extension. Unexpectedly, after I shut down the maintenance VM ``z-dev`` and tried to bring ``y-k8s-n-1``, which originally used the :ref:`ceph_rbd`, back up, it failed with a vfio device connection timeout:

.. literalinclude:: qemu_vfio_connect_timeout/vfio_used_error
   :caption: When starting the VM, the vfio device is reported as already in use and the connection times out

The busy device in this error, ``3eb9d560-0b31-11ee-91a9-bb28039c61eb``, turns out (see ``virsh dumpxml y-k8s-n-1``) to be one of the 2 :ref:`vgpu` devices configured in :ref:`vgpu_quickstart`:

.. literalinclude:: ../vgpu/vgpu_quickstart/vgpu_create_output_1
:language: bash
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu`: the **first** vgpu device

This is an ``mdev`` device (a `VFIO Mediated devices <https://docs.kernel.org/driver-api/vfio-mediated-device.html>`_ device)
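
To see which mediated devices exist on the host and confirm the UUID from the error message, something along these lines can be used (a sketch; ``mdevctl`` may not be installed everywhere):

.. code-block:: bash

   # List mediated devices registered with the kernel
   ls /sys/bus/mdev/devices/

   # If mdevctl is available, show each mdev with its parent device and type
   mdevctl list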

Looking back at the :ref:`install_vgpu_manager` notes, ``nvidia-vgpu-mgr.service`` must be running first, which in turn means :ref:`vgpu_unlock` has to be working

- Check the status of ``nvidia-vgpu-mgr.service``:

.. literalinclude:: ../vgpu/install_vgpu_manager/systemctl_staus_nvidia-vgpu-mgr
:language: bash
   :caption: Check the ``nvidia-vgpu-mgr`` service status

Sure enough, the service had failed to start again... back to the old problem: :ref:`vgpu_unlock` was broken:

Query the ``vgpu``:

.. literalinclude:: ../vgpu/install_vgpu_manager/nvidia-smi_vgpu_q
:language: bash
   :caption: Query vGPUs with ``nvidia-smi vgpu -q``

The output shows ``0`` active vGPUs:

.. literalinclude:: ../vgpu/install_vgpu_manager/nvidia-smi_vgpu_q_output
:language: bash
   :caption: ``nvidia-smi vgpu -q`` shows only ``0`` vGPUs

Then I remembered: :ref:`vgpu_unlock` requires installing the :ref:`vgpu` driver as a :ref:`dkms` module. I upgraded the kernel recently, and a kernel upgrade recompiles and reinstalls the :ref:`vgpu` module. I went through the procedure again and found that the original modifications were all still correct, but could the recently upgraded kernel simply have unstable support?

I recompiled :ref:`vgpu_unlock` once more (probably unnecessary) and rebooted the server
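
After the reboot, a quick re-check along the lines of the commands referenced above tells whether the vGPUs are back:

.. code-block:: bash

   # The vGPU manager service should now be active
   systemctl status nvidia-vgpu-mgr.service

   # Should now report the configured vGPUs instead of 0
   nvidia-smi vgpu -q
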
5 changes: 5 additions & 0 deletions source/kvm/debug/qemu_vfio_connect_timeout/vfio_used_error
@@ -0,0 +1,5 @@
error: Failed to start domain 'y-k8s-n-1'
error: internal error: qemu unexpectedly closed the monitor: 2023-08-31T13:35:04.132854Z qemu-system-x86_64:
-device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/3eb9d560-0b31-11ee-91a9-bb28039c61eb,display=off,bus=pci.7,addr=0x0:
vfio 3eb9d560-0b31-11ee-91a9-bb28039c61eb: error getting device from group 123: Connection timed out
Verify all devices in group 123 are bound to vfio-<bus> or pci-stub and not already in use
9 changes: 7 additions & 2 deletions source/kvm/vgpu/vgpu_quickstart.rst
@@ -58,9 +58,14 @@ NVIDIA License Server installation is a separate step

Take the output from running the script:

.. literalinclude:: vgpu_quickstart/vgpu_create_output
.. literalinclude:: vgpu_quickstart/vgpu_create_output_1
:language: bash
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu` (used to configure the VMs)
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu`: the **first** vgpu device

.. literalinclude:: vgpu_quickstart/vgpu_create_output_2
:language: bash
   :caption: Output of running the ``vgpu_create`` script to create 2 ``P40-12C`` :ref:`vgpu`: the **second** vgpu device


(one per vGPU) and add it to the **2** virtual machines ``y-k8s-n-1`` and ``y-k8s-n-2`` respectively, then start the virtual machines

@@ -3,8 +3,3 @@
<address uuid='3eb9d560-0b31-11ee-91a9-bb28039c61eb'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3eb9d718-0b31-11ee-91aa-2b17f51ee12d'/>
</source>
</hostdev>
5 changes: 5 additions & 0 deletions source/kvm/vgpu/vgpu_quickstart/vgpu_create_output_2
@@ -0,0 +1,5 @@
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3eb9d718-0b31-11ee-91aa-2b17f51ee12d'/>
</source>
</hostdev>
