The latest Ceph release at the time of writing is version 18 (Reef); this deployment uses version 14 (Nautilus).
Hostname | OS | IP | Role |
---|---|---|---|
dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 | CentOS 7.9 | 10.104.2.23 | ceph-deploy, mon, osd |
dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 | CentOS 7.9 | 10.104.2.24 | mon, osd |
dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 | CentOS 7.9 | 10.104.2.25 | mon, osd |
1. Disable the firewall & SELinux
systemctl stop firewalld
systemctl disable firewalld
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0
2. Disable swap
# temporary
swapoff -a
# permanent, takes effect after reboot
sed -ri 's/.*swap.*/#&/' /etc/fstab
3. Add hosts entries
cat >> /etc/hosts << EOF
10.104.2.23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
10.104.2.24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24
10.104.2.25 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
EOF
4. Raise the file descriptor limit
ulimit -SHn 65535
cat >> /etc/security/limits.conf << EOF
* soft nofile 65535
* hard nofile 65535
EOF
5. Time synchronization
yum install ntpdate -y
ntpdate time.windows.com
6. Passwordless SSH authentication
Only the ceph-deploy node needs password-free access to the other hosts.
ssh-keygen -t rsa
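Then, on the ceph-deploy node only, copy the public key to each host. A minimal sketch, assuming root logins and the hostnames from the table above:
for host in dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25; do
  ssh-copy-id root@$host   # prompts for the root password once per host
done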
**All of the initialization steps above must be performed on every node.**
Deployment methods for version 14 (Nautilus) and earlier:
- yum: the conventional manual deployment method
- ceph-deploy: a lightweight deployment tool provided by Ceph that makes it very easy to stand up a cluster
- ceph-ansible: the official Ansible-based automated deployment tool
Deployment methods after version 14 (Nautilus):
- cephadm: deploys and manages the Ceph cluster with containers; Docker or Podman plus Python 3 must be installed first
- rook: deploys and manages Ceph clusters on Kubernetes
1. Configure the Aliyun yum repositories (all hosts)
wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
cat > /etc/yum.repos.d/ceph.repo << EOF
[Ceph]
name=Ceph packages for \$basearch
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/\$basearch
gpgcheck=0
[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/noarch
gpgcheck=0
[ceph-source]
name=Ceph source packages
baseurl=http://mirrors.aliyun.com/ceph/rpm-nautilus/el7/SRPMS
gpgcheck=0
EOF
yum -y install ceph-common
2. Install the ceph-deploy tool
yum install python2-pip -y
yum -y install ceph-deploy
3. Create the working directory
Create a /ceph-cluster directory; all subsequent commands are run from it.
mkdir -p /ceph-cluster && cd /ceph-cluster
Set shell variables and create the cluster.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# node1=dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# node2=dx-lt-yd-zhejiang-jinhua-5-10-104-2-24
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# node3=dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# ceph-deploy new $node1 $node2 $node3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy new dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] func : <function new at 0x7f1de8b25398>
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f1de8b43c68>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] ssh_copykey : True
[ceph_deploy.cli][INFO ] mon : ['dx-lt-yd-zhejiang-jinhua-5-10-104-2-23', 'dx-lt-yd-zhejiang-jinhua-5-10-104-2-24', 'dx-lt-yd-zhejiang-jinhua-5-10-104-2-25']
[ceph_deploy.cli][INFO ] public_network : None
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] cluster_network : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] fsid : None
[ceph_deploy.new][DEBUG ] Creating new cluster named ceph
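The option dump above shows public_network: None. If the cluster should bind to a dedicated network, the setting can be appended to the generated ceph.conf in /ceph-cluster before the install step. A sketch; the CIDR below is an assumption based on the host addresses and must match the actual environment:
cat >> /ceph-cluster/ceph.conf << EOF
public_network = 10.104.0.0/16
EOF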
4. Install Ceph
Install the Ceph packages on the specified nodes. Note: the --no-adjust-repos flag tells ceph-deploy to use the locally configured repositories instead of the official default source.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# ceph-deploy install --no-adjust-repos $node1 $node2 $node3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy install --no-adjust-repos dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] testing : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fe7a6dffbd8>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] dev_commit : None
[ceph_deploy.cli][INFO ] install_mds : False
[ceph_deploy.cli][INFO ] stable : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] adjust_repos : False
[ceph_deploy.cli][INFO ] func : <function install at 0x7fe7a744ab18>
[ceph_deploy.cli][INFO ] install_mgr : False
[ceph_deploy.cli][INFO ] install_all : False
[ceph_deploy.cli][INFO ] repo : False
[ceph_deploy.cli][INFO ] host : ['dx-lt-yd-zhejiang-jinhua-5-10-104-2-23', 'dx-lt-yd-zhejiang-jinhua-5-10-104-2-24', 'dx-lt-yd-zhejiang-jinhua-5-10-104-2-25']
[ceph_deploy.cli][INFO ] install_rgw : False
[ceph_deploy.cli][INFO ] install_tests : False
[ceph_deploy.cli][INFO ] repo_url : None
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] install_osd : False
[ceph_deploy.cli][INFO ] version_kind : stable
[ceph_deploy.cli][INFO ] install_common : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] dev : master
[ceph_deploy.cli][INFO ] nogpgcheck : False
[ceph_deploy.cli][INFO ] local_mirror : None
[ceph_deploy.cli][INFO ] release : None
[ceph_deploy.cli][INFO ] install_mon : False
[ceph_deploy.cli][INFO ] gpg_url : None
[ceph_deploy.install][DEBUG ] Installing stable version mimic on cluster ceph hosts dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
[ceph_deploy.install][DEBUG ] Detecting platform for host dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ...
5. Deploy the Monitor service
Initialize and deploy the monitors and gather all keys:
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# ceph-deploy mon create-initial
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy mon create-initial
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] subcommand : create-initial
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f58ba5874d0>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] func : <function mon at 0x7f58ba562938>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] keyrings : None
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
[ceph_deploy.mon][DEBUG ] detecting platform for host dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ...
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] connected to host: dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] detect platform information from remote host
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] detect machine type
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] find the location of an executable
[ceph_deploy.mon][INFO ] distro info: CentOS Linux 7.9.2009 Core
Use ceph-deploy to copy the configuration file and admin key to the admin node and the Ceph nodes, so that ceph CLI commands can be run without specifying the monitor address or ceph.client.admin.keyring every time.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# ceph-deploy admin $node1 $node2 $node3
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy admin dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7fd92ffc7a28>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] client : ['dx-lt-yd-zhejiang-jinhua-5-10-104-2-23', 'dx-lt-yd-zhejiang-jinhua-5-10-104-2-24', 'dx-lt-yd-zhejiang-jinhua-5-10-104-2-25']
[ceph_deploy.cli][INFO ] func : <function admin at 0x7fd930ae7758>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.admin][DEBUG ] Pushing admin keys and conf to dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] connected to host: dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] detect platform information from remote host
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] detect machine type
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.admin][DEBUG ] Pushing admin keys and conf to dx-lt-yd-zhejiang-jinhua-5-10-104-2-24
6. Add OSDs
Deploy the OSD service and add disks: add one disk on each node as an OSD.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# ceph-deploy osd create --data /dev/sdh dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (2.0.1): /bin/ceph-deploy osd create --data /dev/sdh dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[ceph_deploy.cli][INFO ] ceph-deploy options:
[ceph_deploy.cli][INFO ] verbose : False
[ceph_deploy.cli][INFO ] bluestore : None
[ceph_deploy.cli][INFO ] cd_conf : <ceph_deploy.conf.cephdeploy.Conf instance at 0x7f7304bf8a70>
[ceph_deploy.cli][INFO ] cluster : ceph
[ceph_deploy.cli][INFO ] fs_type : xfs
[ceph_deploy.cli][INFO ] block_wal : None
[ceph_deploy.cli][INFO ] default_release : False
[ceph_deploy.cli][INFO ] username : None
[ceph_deploy.cli][INFO ] journal : None
[ceph_deploy.cli][INFO ] subcommand : create
[ceph_deploy.cli][INFO ] host : dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[ceph_deploy.cli][INFO ] filestore : None
[ceph_deploy.cli][INFO ] func : <function osd at 0x7f7304bbade8>
[ceph_deploy.cli][INFO ] ceph_conf : None
[ceph_deploy.cli][INFO ] zap_disk : False
[ceph_deploy.cli][INFO ] data : /dev/sdh
[ceph_deploy.cli][INFO ] block_db : None
[ceph_deploy.cli][INFO ] dmcrypt : False
[ceph_deploy.cli][INFO ] overwrite_conf : False
[ceph_deploy.cli][INFO ] dmcrypt_key_dir : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO ] quiet : False
[ceph_deploy.cli][INFO ] debug : False
[ceph_deploy.osd][DEBUG ] Creating OSD on cluster ceph with data device /dev/sdh
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] connected to host: dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] detect platform information from remote host
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] detect machine type
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO ] Distro info: CentOS Linux 7.9.2009 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] osd keyring does not exist yet, creating one
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] create a keyring file
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] find the location of an executable
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][INFO ] Running command: /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdh
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph-authtool --gen-print-key
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new a17311b8-ee17-41ab-9242-ed0ef43418a6
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /sbin/vgcreate --force --yes ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883 /dev/sdh
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stdout: Wiping xfs signature on /dev/sdh.
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stdout: Physical volume "/dev/sdh" successfully created.
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stdout: Volume group "ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883" successfully created
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /sbin/lvcreate --yes -l 953861 -n osd-block-a17311b8-ee17-41ab-9242-ed0ef43418a6 ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stdout: Logical volume "osd-block-a17311b8-ee17-41ab-9242-ed0ef43418a6" created.
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph-authtool --gen-print-key
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-2
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -h ceph:ceph /dev/ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883/osd-block-a17311b8-ee17-41ab-9242-ed0ef43418a6
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -R ceph:ceph /dev/dm-1
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ln -s /dev/ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883/osd-block-a17311b8-ee17-41ab-9242-ed0ef43418a6 /var/lib/ceph/osd/ceph-2/block
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-2/activate.monmap
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stderr: 2023-08-08 14:28:05.570 7f57fabd0700 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] 2023-08-08 14:28:05.570 7f57fabd0700 -1 AuthRegistry(0x7f57f4066318) no keyring found at /etc/ceph/ceph.client.bootstrap-osd.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,, disabling cephx
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stderr: got monmap epoch 1
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph-authtool /var/lib/ceph/osd/ceph-2/keyring --create-keyring --name osd.2 --add-key AQBz4NFk62FVHhAA1TVsMepg54fwLrP2tn06sA==
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stdout: creating /var/lib/ceph/osd/ceph-2/keyring
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] added entity osd.2 auth(key=AQBz4NFk62FVHhAA1TVsMepg54fwLrP2tn06sA==)
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/keyring
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2/
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 2 --monmap /var/lib/ceph/osd/ceph-2/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-2/ --osd-uuid a17311b8-ee17-41ab-9242-ed0ef43418a6 --setuser ceph --setgroup ceph
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stderr: 2023-08-08 14:28:06.078 7f87265f3a80 -1 bluestore(/var/lib/ceph/osd/ceph-2/) _read_fsid unparsable uuid
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] --> ceph-volume lvm prepare successful for: /dev/sdh
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883/osd-block-a17311b8-ee17-41ab-9242-ed0ef43418a6 --path /var/lib/ceph/osd/ceph-2 --no-mon-config
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/ln -snf /dev/ceph-29bcb30f-c0d3-4743-b4a5-be9d160fa883/osd-block-a17311b8-ee17-41ab-9242-ed0ef43418a6 /var/lib/ceph/osd/ceph-2/block
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -R ceph:ceph /dev/dm-1
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/systemctl enable ceph-volume@lvm-2-a17311b8-ee17-41ab-9242-ed0ef43418a6
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stderr: Created symlink from /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-2-a17311b8-ee17-41ab-9242-ed0ef43418a6.service to /usr/lib/systemd/system/ceph-volume@.service.
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/systemctl enable --runtime ceph-osd@2
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] stderr: Created symlink from /run/systemd/system/ceph-osd.target.wants/ceph-osd@2.service to /usr/lib/systemd/system/ceph-osd@.service.
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] Running command: /bin/systemctl start ceph-osd@2
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] --> ceph-volume lvm activate successful for osd ID: 2
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][WARNIN] --> ceph-volume lvm create successful for: /dev/sdh
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][INFO ] checking OSD status...
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][DEBUG ] find the location of an executable
[dx-lt-yd-zhejiang-jinhua-5-10-104-2-23][INFO ] Running command: /bin/ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 is now ready for osd use.
7. Deploy the MGR service
ceph-deploy mgr create $node1 $node2 $node3
Note: MGR is a component added in the Ceph Luminous (L) release. Its main role is to take over and extend part of the monitor's functionality and reduce the monitor's load. It is recommended to run one mgr on every monitor node to achieve the same level of high availability.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ceph-cluster]# ceph -s
cluster:
id: 16b28327-7eca-494b-93fe-4218a156107a
health: HEALTH_WARN
mons are allowing insecure global_id reclaim
......
Resolve the "mons are allowing insecure global_id reclaim" warning (it takes a little while to clear):
ceph config set mon auth_allow_insecure_global_id_reclaim false
Finally, check the Ceph version:
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 ~]# ceph -v
ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable)
Create a storage pool with a pg_num of 64. Note that the third positional argument of ceph osd pool create is pgp_num, not the replica count; the replica count is controlled by the pool's size parameter, e.g. ceph osd pool set nomad size 3.
ceph osd pool create nomad 64 3
ceph osd pool ls
Tag the pool for use by RBD.
ceph osd pool application enable nomad rbd
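As an optional alternative, rbd pool init performs the RBD-specific initialization of the pool and tags the rbd application at the same time:
rbd pool init nomad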
Create an image as follows:
rbd create nomad/ceph-volume --size 2T --image-format 2 --image-feature layering
Map the image, format it, and mount it locally.
rbd map nomad/ceph-volume
sudo mkfs.xfs /dev/rbd0 -f
mkdir -p /mnt/rbd0
mount /dev/rbd0 /mnt/rbd0
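The mapping above does not survive a reboot. One option is the rbdmap service that ships with ceph-common; a minimal sketch, assuming the default admin client and keyring used earlier:
cat >> /etc/ceph/rbdmap << EOF
nomad/ceph-volume id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
EOF
systemctl enable rbdmap
A matching /etc/fstab entry with the _netdev option can then mount the filesystem automatically once the device is mapped.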
1. Dashboard
Starting with the L (Luminous) release, Ceph ships a native Dashboard for viewing cluster status and performing basic management.
It must be installed on all nodes:
yum install ceph-mgr-dashboard -y
Next, enable the module; this only needs to be run on the primary node.
ceph mgr module enable dashboard --force
Adjust the default configuration:
ceph config set mgr mgr/dashboard/server_addr 0.0.0.0
ceph config set mgr mgr/dashboard/server_port 7000
ceph config set mgr mgr/dashboard/ssl false
Create a dashboard login user and password.
echo "123456" > password.txt
ceph dashboard ac-user-create admin administrator -i password.txt
Check how to access it:
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 ~]# ceph mgr services
{
"dashboard": "http://dx-lt-yd-zhejiang-jinhua-5-10-104-2-23:7000/",
}
If the configuration is changed later, restart the module (disable and re-enable) for the change to take effect:
ceph mgr module disable dashboard
ceph mgr module enable dashboard
2. Prometheus
Enable the MGR Prometheus module.
ceph mgr module enable prometheus
Enable RBD-related metrics.
ceph config set mgr mgr/prometheus/rbd_stats_pools nomad
Test the Prometheus metrics endpoint.
curl 127.0.0.1:9283/metrics
Check how to access it:
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 ~]# ceph mgr services
{
"dashboard": "http://dx-lt-yd-zhejiang-jinhua-5-10-104-2-23:7000/",
"prometheus": "http://dx-lt-yd-zhejiang-jinhua-5-10-104-2-23:9283/"
}
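The metrics are served by the active mgr. To scrape them, a job can be added to an existing Prometheus configuration; a minimal sketch, assuming prometheus.yml lives at /etc/prometheus/prometheus.yml and already ends with a scrape_configs: section (host and port taken from the output above):
cat >> /etc/prometheus/prometheus.yml << EOF
  - job_name: 'ceph'
    static_configs:
      - targets: ['10.104.2.23:9283']
EOF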
A high-performance NVMe SSD can be used to store OSD metadata (the BlueStore WAL and DB). Wipe the device and partition it first:
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# mkfs.xfs -f /dev/nvme0n1
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mktable gpt
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.wal1 1M 50G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.wal2 50G 100G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.wal3 100G 150G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.wal4 150G 200G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.wal5 200G 250G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.wal6 250G 300G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.db1 300G 500G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.db2 500G 700G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.db3 700G 900G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.db4 900G 1100G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.db5 1100G 1300G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 mkpart osd.db6 1300G 1500G
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# parted -s /dev/nvme0n1 print
Model: NVMe Device (nvme)
Disk /dev/nvme0n1: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 50.0GB 50.0GB osd.wal1
2 50.0GB 100GB 50.0GB osd.wal2
3 100GB 150GB 50.0GB osd.wal3
4 150GB 200GB 50.0GB osd.wal4
5 200GB 250GB 50.0GB osd.wal5
6 250GB 300GB 50.0GB osd.wal6
7 300GB 500GB 200GB osd.db1
8 500GB 700GB 200GB osd.db2
9 700GB 900GB 200GB osd.db3
10 900GB 1100GB 200GB osd.db4
11 1100GB 1300GB 200GB osd.db5
12 1300GB 1500GB 200GB osd.db6
Deploy an OSD, specifying where its db and wal are stored.
$ ceph-volume lvm create --bluestore --data /dev/sdd --block.db /dev/nvme0n1p9 --block.wal /dev/nvme0n1p3
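The same wal/db separation is also available through ceph-deploy via the --block-db and --block-wal flags that appear in the option dump earlier; a sketch using the partitions created above:
$ ceph-deploy osd create --data /dev/sdd --block-db /dev/nvme0n1p9 --block-wal /dev/nvme0n1p3 dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90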
View the OSD's information.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# ceph-volume lvm list /dev/sdc
====== osd.9 =======
[block] /dev/ceph-992f2429-562f-47da-8001-1398c9d3925b/osd-block-d8aec7ef-611e-491f-957c-335ae16623eb
block device /dev/ceph-992f2429-562f-47da-8001-1398c9d3925b/osd-block-d8aec7ef-611e-491f-957c-335ae16623eb
block uuid qHobxU-XUNs-hAXU-JxEM-5pOF-dorm-YWfxbr
cephx lockbox secret
cluster fsid 8490f6a8-d99d-43bd-85fb-43abc81bd261
cluster name ceph
crush device class None
db device /dev/nvme0n1p8
db uuid 235844c6-4c1e-4ab1-a7c0-c73f44f6c9a5
encrypted 0
osd fsid d8aec7ef-611e-491f-957c-335ae16623eb
osd id 9
osdspec affinity
type block
vdo 0
wal device /dev/nvme0n1p2
wal uuid 3570dafa-f0c6-4595-98bc-44dd1375780f
devices /dev/sdc
[db] /dev/nvme0n1p8
PARTUUID 235844c6-4c1e-4ab1-a7c0-c73f44f6c9a5
To avoid migrating the data twice, lower the OSD's CRUSH weight gradually first.
# ceph osd crush reweight osd.1 3
# ceph osd crush reweight osd.1 2
# ceph osd crush reweight osd.1 0
Stop the OSD service.
# Confirm that stopping the OSD will not affect data availability.
$ ceph osd ok-to-stop osd.1
$ systemctl stop ceph-osd@1
Mark the OSD down and out; data migration begins.
$ ceph osd down osd.1
$ ceph osd out osd.1
After the data has been migrated, run the following to confirm that destroying the OSD will not affect data availability.
$ ceph osd safe-to-destroy osd.1
Remove the OSD from the CRUSH map.
$ ceph osd crush remove osd.1
Delete its authentication key (otherwise the OSD ID remains occupied).
$ ceph auth rm osd.1
Finally, remove the OSD from the OSD map; this mainly cleans up its state information, including metadata such as its network address.
$ ceph osd rm osd.1
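On this Nautilus release the last three steps (crush remove, auth rm, osd rm) can also be combined into a single purge command, if preferred:
$ ceph osd purge osd.1 --yes-i-really-mean-it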
Running ceph-volume lvm list still shows an OSD ID that is no longer part of the cluster; the osd.1 entry is leftover information.
$ ceph-volume lvm list
View the LVM/partition layout.
$ lsblk -l
Clean it up with dmsetup remove.
$ dmsetup ls
$ dmsetup remove ceph--56b5a16e--a01d4b97731b-osd--block--a0a15a95
Running ceph-volume lvm list again confirms the leftover entry has been cleaned up.
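If the disk will be reused for a new OSD, ceph-volume can wipe the leftover LVM metadata and data in one step instead of the manual dmsetup cleanup (the device name below is an example):
$ ceph-volume lvm zap /dev/sdh --destroy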
ceph osd tree shows that some hosts no longer have any OSDs and can be removed from the CRUSH map.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-1-159 ~]# ceph osd tree
254 hdd 3.63869 osd.254 up 1.00000 1.00000
-64 0 host dx-lt-yd-zhejiang-jinhua-5-10-104-1-21
-21 43.66425 host dx-lt-yd-zhejiang-jinhua-5-10-104-1-37
52 hdd 3.63869 osd.52 up 1.00000 1.00000
Run the cleanup:
ceph osd crush remove dx-lt-yd-zhejiang-jinhua-5-10-104-1-21
Basic command: view detailed cluster health.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-2-12 ~]# ceph health detail
HEALTH_WARN 95 pgs not deep-scrubbed in time; 100 pgs not scrubbed in time
PG_NOT_DEEP_SCRUBBED 95 pgs not deep-scrubbed in time
pg 1.fb9 not deep-scrubbed since 2023-07-11 12:37:44.526932
pg 1.f19 not deep-scrubbed since 2023-07-20 23:57:06.398897
pg 1.ec5 not deep-scrubbed since 2023-07-17 01:59:38.137176
pg 1.eac not deep-scrubbed since 2023-07-02 06:58:50.516897
pg 1.68f not deep-scrubbed since 2023-06-27 23:26:45.383738
pg 1.603 not deep-scrubbed since 2023-06-17 14:56:39.413443
pg 1.5c8 not deep-scrubbed since 2023-06-25 16:32:47.583276
pg 1.5b1 not deep-scrubbed since 2023-07-01 12:58:55.506933
Run the OSD creation command locally.
$ /usr/sbin/ceph-volume --cluster ceph lvm create --bluestore --data /dev/sdc
Check OSD status.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-1-159 ~]# ceph osd stat
156 osds: 154 up (since 2d), 154 in (since 8w); epoch: e132810
View the distribution of all OSDs.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 65.49600 root default
-7 21.83200 host dx-lt-yd-zhejiang-jinhua-5-10-104-2-23
2 hdd 3.63899 osd.2 up 1.00000 1.00000
3 hdd 3.63899 osd.3 up 1.00000 1.00000
6 hdd 3.63899 osd.6 up 1.00000 1.00000
9 hdd 3.63899 osd.9 up 1.00000 1.00000
12 hdd 3.63899 osd.12 up 1.00000 1.00000
15 hdd 3.63899 osd.15 up 1.00000 1.00000
-3 21.83200 host dx-lt-yd-zhejiang-jinhua-5-10-104-2-24
0 hdd 3.63899 osd.0 up 1.00000 1.00000
4 hdd 3.63899 osd.4 up 1.00000 1.00000
7 hdd 3.63899 osd.7 up 1.00000 1.00000
10 hdd 3.63899 osd.10 up 1.00000 1.00000
13 hdd 3.63899 osd.13 up 1.00000 1.00000
16 hdd 3.63899 osd.16 up 1.00000 1.00000
-5 21.83200 host dx-lt-yd-zhejiang-jinhua-5-10-104-2-25
1 hdd 3.63899 osd.1 up 1.00000 1.00000
5 hdd 3.63899 osd.5 up 1.00000 1.00000
8 hdd 3.63899 osd.8 up 1.00000 1.00000
11 hdd 3.63899 osd.11 up 1.00000 1.00000
14 hdd 3.63899 osd.14 up 1.00000 1.00000
17 hdd 3.63899 osd.17 up 1.00000 1.00000
View disk usage of all OSDs.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 hdd 3.63899 1.00000 3.6 TiB 20 GiB 19 GiB 108 KiB 1024 MiB 3.6 TiB 0.54 0.90 50 up
3 hdd 3.63899 1.00000 3.6 TiB 28 GiB 27 GiB 41 KiB 1024 MiB 3.6 TiB 0.76 1.26 72 up
6 hdd 3.63899 1.00000 3.6 TiB 25 GiB 24 GiB 144 KiB 1024 MiB 3.6 TiB 0.66 1.09 63 up
9 hdd 3.63899 1.00000 3.6 TiB 20 GiB 19 GiB 64 KiB 1024 MiB 3.6 TiB 0.53 0.87 49 up
12 hdd 3.63899 1.00000 3.6 TiB 22 GiB 21 GiB 44 KiB 1024 MiB 3.6 TiB 0.59 0.97 56 up
15 hdd 3.63899 1.00000 3.6 TiB 23 GiB 22 GiB 80 KiB 1024 MiB 3.6 TiB 0.61 1.01 58 up
0 hdd 3.63899 1.00000 3.6 TiB 27 GiB 26 GiB 132 KiB 1024 MiB 3.6 TiB 0.72 1.19 69 up
4 hdd 3.63899 1.00000 3.6 TiB 22 GiB 21 GiB 92 KiB 1024 MiB 3.6 TiB 0.59 0.98 55 up
7 hdd 3.63899 1.00000 3.6 TiB 19 GiB 18 GiB 40 KiB 1024 MiB 3.6 TiB 0.52 0.86 48 up
10 hdd 3.63899 1.00000 3.6 TiB 23 GiB 22 GiB 68 KiB 1024 MiB 3.6 TiB 0.62 1.02 59 up
13 hdd 3.63899 1.00000 3.6 TiB 22 GiB 21 GiB 52 KiB 1024 MiB 3.6 TiB 0.58 0.96 54 up
16 hdd 3.63899 1.00000 3.6 TiB 21 GiB 20 GiB 88 KiB 1024 MiB 3.6 TiB 0.56 0.93 54 up
1 hdd 3.63899 1.00000 3.6 TiB 23 GiB 22 GiB 108 KiB 1024 MiB 3.6 TiB 0.61 1.00 57 up
5 hdd 3.63899 1.00000 3.6 TiB 20 GiB 19 GiB 313 KiB 1024 MiB 3.6 TiB 0.55 0.90 51 up
8 hdd 3.63899 1.00000 3.6 TiB 27 GiB 26 GiB 108 KiB 1024 MiB 3.6 TiB 0.73 1.21 68 up
11 hdd 3.63899 1.00000 3.6 TiB 19 GiB 18 GiB 48 KiB 1024 MiB 3.6 TiB 0.52 0.86 48 up
14 hdd 3.63899 1.00000 3.6 TiB 24 GiB 23 GiB 84 KiB 1024 MiB 3.6 TiB 0.65 1.08 62 up
17 hdd 3.63899 1.00000 3.6 TiB 20 GiB 19 GiB 48 KiB 1024 MiB 3.6 TiB 0.54 0.88 51 up
TOTAL 65 TiB 406 GiB 388 GiB 1.6 MiB 18 GiB 65 TiB 0.61
MIN/MAX VAR: 0.86/1.26 STDDEV: 0.07
View the full OSD map.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-1-159 ~]# ceph osd dump
View the block device information of a specific OSD.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# ceph-volume lvm list /dev/sdj
====== osd.8 =======
[block] /dev/ceph-4093f374-fc36-4768-ac98-c544d36be9b0/osd-block-ee85c7c9-1b37-4d69-bfa0-a12db586facf
block device /dev/ceph-4093f374-fc36-4768-ac98-c544d36be9b0/osd-block-ee85c7c9-1b37-4d69-bfa0-a12db586facf
block uuid 9nwPUB-Z5rE-9TqH-vvEc-3x6M-D7ze-QeX3pC
cephx lockbox secret
cluster fsid 16b28327-7eca-494b-93fe-4218a156107a
cluster name ceph
crush device class None
encrypted 0
osd fsid ee85c7c9-1b37-4d69-bfa0-a12db586facf
osd id 8
osdspec affinity
type block
vdo 0
devices /dev/sdj
Stop and start an OSD service.
$ systemctl stop ceph-osd@2
$ systemctl start ceph-osd@2
View an OSD's key configuration.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 ~]# ceph config show osd.44
NAME VALUE SOURCE OVERRIDES IGNORES
auth_client_required cephx file
auth_cluster_required cephx file
auth_service_required cephx file
daemonize false override
keyring $osd_data/keyring default
leveldb_log default
mon_host 10.103.3.134,10.103.3.137,10.103.3.139 file
mon_initial_members dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134, dx-lt-yd-hebei-shijiazhuang-10-10-103-3-137, dx-lt-yd-hebei-shijiazhuang-10-10-103-3-139 file
osd_max_backfills 3 override (mon[3])
osd_recovery_max_active 9 override (mon[9])
osd_recovery_max_single_start 1 mon
osd_recovery_sleep 0.500000 mon
rbd_default_features 1 mon default[61]
rbd_default_format 2 mon
setgroup ceph cmdline
setuser ceph
View OSD commit and apply latency.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-90 ~]# ceph osd perf
osd commit_latency(ms) apply_latency(ms)
0 109 109
6 0 0
3 1 1
17 1 1
16 0 0
5 0 0
4 0 0
View OSD status information.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 ~]# ceph osd status
+-----+---------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+-----+---------------------------------------------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 3760G | 3690G | 7 | 6055k | 1 | 179k | exists,up |
| 1 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 3791G | 3660G | 25 | 4291k | 3 | 426k | exists,up |
| 2 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-44 | 1302G | 485G | 63 | 45.6M | 0 | 0 | exists,up |
| 3 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 4267G | 3184G | 15 | 7214k | 8 | 345k | exists,up |
| 4 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 2715G | 4735G | 6 | 2773k | 5 | 609k | exists,up |
| 5 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 3290G | 4161G | 15 | 4640k | 7 | 580k | exists,up |
| 6 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 3947G | 3504G | 14 | 6533k | 18 | 2003k | exists,up |
| 7 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-41 | 1368G | 419G | 20 | 25.8M | 0 | 0 | exists,up |
| 8 | dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 | 3614G | 3836G | 12 | 5960k | 7 | 804k | exists,up |
View overall cluster usage and all storage pools.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ~]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 65 TiB 65 TiB 178 GiB 196 GiB 0.29
TOTAL 65 TiB 65 TiB 178 GiB 196 GiB 0.29
POOLS:
POOL ID PGS STORED OBJECTS USED %USED MAX AVAIL
nomad 1 512 71 GiB 560.91k 176 GiB 0.28 31 TiB
Get pool parameters and modify them.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ~]# ceph osd pool get nomad size
size: 2
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ~]# ceph osd pool get nomad pg_num
pg_num: 64
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-23 ~]# ceph osd pool set nomad pg_num 512
Get all parameters of a pool.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-10-92 ~]# ceph osd pool get nomad all
size: 2
min_size: 1
pg_num: 512
pgp_num: 512
crush_rule: replicated_rule
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: true
nodeep-scrub: true
use_gmt_hitset: 1
fast_read: 0
pg_autoscale_mode: warn
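Note that noscrub and nodeep-scrub are true on this pool; pools with these flags set will eventually report "pgs not (deep-)scrubbed in time" warnings like the ones shown earlier. If scrubbing should run again, clear the flags:
$ ceph osd pool set nomad noscrub false
$ ceph osd pool set nomad nodeep-scrub false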
Get information about an image.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# rbd ls nomad
ceph-volume
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# rbd -p nomad info ceph-volume
rbd image 'ceph-volume':
size 2 TiB in 524288 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 1254ebed8401
block_name_prefix: rbd_data.1254ebed8401
format: 2
features: layering
op_features:
flags:
create_timestamp: Wed Aug 9 11:41:55 2023
access_timestamp: Wed Aug 9 19:52:11 2023
modify_timestamp: Wed Aug 9 20:44:17 2023
View mapped RBD devices.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-3-139 ceph]# rbd showmapped
id pool namespace image snap device
0 nomad csi-vol-1c563042-de7c-11ed-9750-6c92bf9d36fc - /dev/rbd0
1 nomad test-image - /dev/rbd1
2 nomad csi-vol-88daec7d-7195-11ee-bb87-246e96073af4 - /dev/rbd2
3 nomad csi-vol-e78c5c60-84d2-11ed-9fe3-0894ef7dd406 - /dev/rbd3
Unmap a device.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-3-139 ceph]# rbd unmap -o force /dev/rbd2
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-3-139 ceph]# cat /sys/kernel/debug/ceph/380a1e72-da89-4041-8478-76383f5f6378.client636459/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
18446462598732840962 osd10 1.ac5aac28 1.c28 [10,51]/10 [10,51]/10 e445078 rbd_header.92dc4d5698745 0x20 14 WC/0
BACKOFFS
Check which client node has an RBD image mapped.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-4-17 ~]# rbd status nomad/csi-vol-9afdfd47-94b5-11ee-be28-e8611f394983
Watchers:
watcher=10.104.5.13:0/4205102659 client.274090 cookie=18446462598732840971
Get information about the mgr services.
$ ceph mgr dump
Dump the current CRUSH configuration.
$ ceph osd crush dump
Create a rule:
$ ceph osd crush rule create-replicated rule-hdd default host hdd
Apply the rule to a pool:
$ ceph osd pool set nomad crush_rule rule-hdd
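Verify that the pool picked up the new rule:
$ ceph osd pool get nomad crush_rule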
Modify the CRUSH map by hand:
$ ceph osd getcrushmap -o compiled-crushmap
8
$ crushtool -d compiled-crushmap -o decompiled-crushmap
$ vim decompiled-crushmap
$ crushtool -c decompiled-crushmap -o new-crushmap
$ ceph osd setcrushmap -i new-crushmap
Get basic status information for all PGs; useful for checking scrub times.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 ~]# ceph pg ls
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
1.0 1165 0 0 0 515784704 0 0 3091 active+clean 20h 1703'148431 1703:292944 [8,15]p8 [8,15]p8 2023-08-14 17:14:44.751649 2023-08-09 11:33:21.874603
1.1 1231 0 0 0 459677696 0 0 3058 active+clean 29h 1703'148800 1703:318795 [3,0]p3 [3,0]p3 2023-08-14 08:38:22.575081 2023-08-09 11:33:21.874603
1.2 1147 0 0 0 500555776 0 0 3015 active+clean 29h 1703'203541 1703:388279 [8,16]p8 [8,16]p8 2023-08-14 09:02:27.178658 2023-08-09 11:33:21.874603
1.3 1225 0 0 0 558604288 0 0 3066 active+clean 27h 1703'147294 1703:332381 [4,1]p4 [4,1]p4 2023-08-14 11:10:06.946977 2023-08-13 08:39:59.122150
1.4 1256 0 0 0 619810816 0 0 3045 active+clean 25h 1703'197198 1703:411128 [3,8]p3 [3,8]p3 2023-08-14 12:23:55.370157 2023-08-10 19:00:04.712304
1.5 1183 0 0 0 516734976 0 0 3026 active+clean 3h 1703'150475 1703:352785 [15,10]p15 [15,10]p15 2023-08-15 10:49:32.150982 2023-08-09 11:33:21.874603
1.6 1158 0 0 0 516968448 0 0 3072 active+clean 33h 1703'155869 1703:351394 [6,14]p6 [6,14]p6 2023-08-14 05:06:41.818069 2023-08-10 14:58:40.268083
1.7 1154 0 0 0 494632960 0 0 3010 active+clean 31h 1703'142384 1703:341985 [14,6]p14 [14,6]p14 2023-08-14 06:44:28.852279 2023-08-13 01:25:37.973756
1.8 1228 0 0 0 541659136 0 0 3017 active+clean 24h 1703'206843 1703:389781 [5,4]p5 [5,4]p5 2023-08-14 14:01:58.874771 2023-08-11 22:37:30.565692
1.9 1188 0 0 0 501682176 0 0 3014 active+clean 7h 1703'175567 1703:361147 [1,3]p1 [1,3]p1 2023-08-15 07:09:57.125092 2023-08-09 11:33:21.874603
View PG status and distribution in more detail. The omap statistics are only collected during deep scrubs, so treat them as approximate; the same applies below.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# ceph pg dump
......
OSD_STAT USED AVAIL USED_RAW TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
17 19 GiB 3.6 TiB 20 GiB 3.6 TiB [0,2,3,4,6,7,9,12,13,15,16] 51 23
16 20 GiB 3.6 TiB 21 GiB 3.6 TiB [1,3,5,8,9,11,12,14,15,17] 54 22
15 22 GiB 3.6 TiB 23 GiB 3.6 TiB [0,1,4,5,7,8,10,11,13,14,16,17] 58 32
14 23 GiB 3.6 TiB 24 GiB 3.6 TiB [0,3,4,6,7,10,12,13,15,16] 62 21
13 21 GiB 3.6 TiB 22 GiB 3.6 TiB [1,2,3,5,6,8,9,11,12,14,15,17] 54 31
12 21 GiB 3.6 TiB 22 GiB 3.6 TiB [0,1,4,5,7,8,10,11,13,14,16,17] 56 30
11 18 GiB 3.6 TiB 19 GiB 3.6 TiB [0,2,3,4,6,7,9,10,12,13,16] 48 24
10 22 GiB 3.6 TiB 23 GiB 3.6 TiB [1,2,3,5,6,8,9,11,12,14,15,17] 59 35
3 27 GiB 3.6 TiB 28 GiB 3.6 TiB [0,1,2,4,5,7,8,10,11,13,14,17] 72 39
2 19 GiB 3.6 TiB 20 GiB 3.6 TiB [1,3,4,5,7,8,10,11,13,14,16,17] 50 24
1 22 GiB 3.6 TiB 23 GiB 3.6 TiB [0,2,3,4,6,7,9,10,12,13,15,16] 57 29
0 26 GiB 3.6 TiB 27 GiB 3.6 TiB [1,2,3,4,5,6,8,9,10,11,12,14,15,17] 69 40
4 21 GiB 3.6 TiB 22 GiB 3.6 TiB [1,2,3,5,6,8,9,11,14,15,17] 55 31
5 19 GiB 3.6 TiB 20 GiB 3.6 TiB [0,2,3,4,6,7,9,10,12,13,15,16] 51 21
6 24 GiB 3.6 TiB 25 GiB 3.6 TiB [0,1,3,4,5,7,8,10,11,13,14,16,17] 63 32
7 18 GiB 3.6 TiB 19 GiB 3.6 TiB [1,2,3,5,6,8,9,11,14,17] 48 23
8 26 GiB 3.6 TiB 27 GiB 3.6 TiB [0,2,3,4,6,7,9,10,12,13,15,16] 68 34
9 19 GiB 3.6 TiB 20 GiB 3.6 TiB [0,1,4,5,8,10,11,13,14,16,17] 49 21
sum 388 GiB 65 TiB 406 GiB 65 TiB
View details of PGs stuck in special states.
[root@dx-lt-yd-hebei-shijiazhuang-10-10-103-3-134 ~]# ceph pg dump_stuck
PG_STAT STATE UP UP_PRIMARY ACTING ACTING_PRIMARY
1.e87 active+remapped+backfill_wait [23,113] 23 [23,116] 23
1.e7c active+remapped+backfilling [113,87] 113 [17,89] 17
1.e76 active+remapped+backfill_wait [21,82] 21 [21,107] 21
1.e70 active+remapped+backfill_wait [33,109] 33 [33,25] 33
1.e61 active+remapped+backfilling [30,108] 30 [30,57] 30
1.e60 active+remapped+backfill_wait [62,9] 62 [9,26] 9
1.e2c active+remapped+backfill_wait [116,66] 116 [116,10] 116
1.e21 active+remapped+backfill_wait [64,11] 64 [11,103] 11
1.dff active+remapped+backfilling [111,27] 111 [27,17] 27
1.dc3 active+remapped+backfilling [73,32] 73 [32,94] 32
1.daa active+remapped+backfilling [88,82] 88 [88,58] 88
1.d1d active+remapped+backfill_wait [111,58] 111 [26,76] 26
Get basic status information for PGs whose primary is the specified OSD.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-25 ~]# ceph pg ls-by-primary osd.5
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
1.8 1189 0 0 0 378081280 0 0 3033 active+clean 71m 1701'117659 1701:213562 [5,4]p5 [5,4]p5 2023-08-11 22:37:30.565692 2023-08-11 22:37:30.565692
1.10 1158 0 0 0 329056256 0 0 3249 active+clean 67m 1701'105715 1701:226836 [5,6]p5 [5,6]p5 2023-08-11 22:40:56.596985 2023-08-09 11:33:21.874603
1.18 1195 0 0 0 368934912 0 0 3080 active+clean 68m 1701'111229 1701:226586 [5,16]p5 [5,16]p5 2023-08-11 22:39:46.167104 2023-08-09 11:33:21.874603
1.97 1137 0 0 0 434163712 0 0 3352 active+clean 52m 1701'100890 1701:247114 [5,12]p5 [5,12]p5 2023-08-11 22:56:13.836447 2023-08-09 11:33:21.874603
1.9c 1136 0 0 0 398704640 0 0 4385 active+clean 52m 1701'102417 1701:266931 [5,13]p5 [5,13]p5 2023-08-11 22:56:26.655758 2023-08-11 22:56:26.655758
1.c9 1144 0 0 0 344027136 0 0 4901 active+clean 52m 1701'99001 1701:216962 [5,10]p5 [5,10]p5 2023-08-11 22:56:26.962591 2023-08-09 11:33:21.874603
1.d9 1164 0 0 0 385499136 0 0 3384 active+clean 52m 1701'109884 1701:230833 [5,15]p5 [5,15]p5 2023-08-11 22:56:27.937661 2023-08-10 20:24:38.161190
1.df 1147 0 0 0 354439168 0 0 3257 active+clean 52m 1701'100467 1701:273232 [5,10]p5 [5,10]p5 2023-08-11 22:56:29.934775 2023-08-09 11:33:21.874603
1.e3 1146 0 0 0 352804864 0 0 3527 active+clean 51m 1701'115509 1701:311209 [5,16]p5 [5,16]p5 2023-08-11 22:56:34.927850 2023-08-11 21:36:33.341214
1.e8 1102 0 0 0 365084672 0 0 3048 active+clean 51m 1701'107744 1701:282304 [5,3]p5 [5,3]p5 2023-08-11 22:56:39.886345 2023-08-10 16:54:12.475772
1.f4 1177 0 0 0 343142400 0 0 3247 active+clean 51m 1701'99630 1701:269765 [5,10]p5 [5,10]p5 2023-08-11 22:56:42.882092 2023-08-09 11:33:21.874603
1.ff 1142 0 0 0 374489088 0 0 3720 active+clean 51m 1701'111168 1701:334898 [5,16]p5 [5,16]p5 2023-08-11 22:56:43.908609 2023-08-10 19:20:36.589840
1.119 1134 0 0 0 377098240 0 0 3450 active+clean 51m 1701'109950 1701:227026 [5,10]p5 [5,10]p5 2023-08-11 22:57:17.188768 2023-08-10 20:24:38.161190
1.144 1152 0 0 0 407887872 0 0 3045 active+clean 51m 1701'115413 1701:255511 [5,3]p5 [5,3]p5 2023-08-11 22:56:45.965646 2023-08-10 19:00:04.712304
1.14f 1140 0 0 0 372760576 0 0 3484 active+clean 51m 1701'103580 1701:268104 [5,0]p5 [5,0]p5 2023-08-11 22:57:08.216644 2023-08-09 11:33:21.874603
1.15f 1159 0 0 0 385597440 0 0 3384 active+clean 51m 1701'100594 1701:272070 [5,12]p5 [5,12]p5 2023-08-11 22:56:46.999844 2023-08-09 11:33:21.874603
1.17a 1138 0 0 0 384913408 0 0 3263 active+clean 51m 1701'96836 1701:227578 [5,6]p5 [5,6]p5 2023-08-11 22:57:13.190288 2023-08-09 11:33:21.874603
1.19f 1087 0 0 0 375009280 0 0 3376 active+clean 51m 1701'100586 1701:273153 [5,9]p5 [5,9]p5 2023-08-11 22:57:18.188223 2023-08-09 11:33:21.874603
1.1a4 1141 0 0 0 404832256 0 0 5535 active+clean 51m 1701'100090 1701:217159 [5,7]p5 [5,7]p5 2023-08-11 22:57:19.168426 2023-08-09 11:33:21.874603
1.1b2 1167 0 0 0 365268992 0 0 3353 active+clean 51m 1701'105670 1701:243116 [5,12]p5 [5,12]p5 2023-08-11 22:57:24.224763 2023-08-09 11:33:21.874603
1.1f8 1175 0 0 0 399880192 0 0 3657 active+clean 52m 1701'119453 1701:290254 [5,2]p5 [5,2]p5 2023-08-11 22:41:16.365644 2023-08-11 21:47:39.203228
Get basic status information for all PGs on the specified OSD; the fields are the same as above.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 ~]# ceph pg ls-by-osd osd.5
PG OBJECTS DEGRADED MISPLACED UNFOUND BYTES OMAP_BYTES* OMAP_KEYS* LOG STATE SINCE VERSION REPORTED UP ACTING SCRUB_STAMP DEEP_SCRUB_STAMP
1.8 1228 0 0 0 541659136 0 0 3086 active+clean 24h 1703'207512 1703:391129 [5,4]p5 [5,4]p5 2023-08-14 14:01:58.874771 2023-08-11 22:37:30.565692
1.10 1192 0 0 0 471662592 0 0 3052 active+clean 8h 1703'138027 1703:289376 [5,6]p5 [5,6]p5 2023-08-15 06:07:13.350737 2023-08-09 11:33:21.874603
1.18 1229 0 0 0 511541248 0 0 3064 active+clean 26h 1703'219513 1703:439831 [5,16]p5 [5,16]p5 2023-08-14 12:01:21.151893 2023-08-14 12:01:21.151893
1.70 1237 0 0 0 586428416 0 0 3043 active+clean 24h 1703'188548 1703:353307 [2,5]p2 [2,5]p2 2023-08-14 14:23:26.260207 2023-08-11 17:28:31.199338
1.97 1170 0 0 0 572575744 0 0 3090 active+clean 22h 1703'168454 1703:379925 [5,12]p5 [5,12]p5 2023-08-14 15:53:23.611347 2023-08-09 11:33:21.874603
1.9c 1168 0 0 0 532922368 0 0 3060 active+clean 24h 1703'174105 1703:407977 [5,13]p5 [5,13]p5 2023-08-14 13:47:38.192312 2023-08-11 22:56:26.655758
1.b8 1164 0 0 0 545398784 0 0 3081 active+clean 2h 1703'161378 1703:366626 [15,5]p15 [15,5]p15 2023-08-15 11:29:00.295327 2023-08-15 11:29:00.295327
1.c8 1149 0 0 0 523264000 0 0 3092 active+clean 28h 1703'196036 1703:375087 [9,5]p9 [9,5]p9 2023-08-14 10:27:15.350574 2023-08-10 22:14:20.771845
1.c9 1183 0 0 0 507604992 0 0 3100 active+clean 8h 1703'142302 1703:301476 [5,10]p5 [5,10]p5 2023-08-15 05:37:37.965928 2023-08-09 11:33:21.874603
1.d9 1200 0 0 0 536494080 0 0 3009 active+clean 26h 1703'127113 1703:262978 [5,15]p5 [5,15]p5 2023-08-14 12:13:10.846808 2023-08-10 20:24:38.161190
1.dc 1177 0 0 0 497799168 0 0 3013 active+clean 3h 1703'163892 1703:391947 [13,5]p13 [13,5]p13 2023-08-15 10:55:36.024320 2023-08-09 11:33:21.874603
1.df 1177 0 0 0 480268288 0 0 3040 active+clean 23h 1703'133319 1703:336717 [5,10]p5 [5,10]p5 2023-08-14 15:02:17.859279 2023-08-09 11:33:21.874603
1.e3 1173 0 0 0 461873152 0 0 3074 active+clean 28h 1703'161053 1703:400210 [5,16]p5 [5,16]p5 2023-08-14 10:06:55.247139 2023-08-11 21:36:33.34121
Get the current cluster-wide PG status.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-2-24 ~]# ceph pg stat
128 pgs: 67 active+undersized+degraded, 61 active+clean; 35 GiB data, 106 GiB used, 2.1 TiB / 2.2 TiB avail; 4642/27156 objects degraded (17.094%)
If a PG becomes inconsistent, you can try to repair it:
ceph pg repair 1.1f8
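Before repairing, the inconsistent PGs and objects can be located first; a sketch reusing the pool and PG id from the examples above:
rados list-inconsistent-pg nomad
rados list-inconsistent-obj 1.1f8 --format=json-pretty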
More commands:
ceph pg dump
ceph pg dump all
ceph pg dump summary
ceph pg dump pgs
ceph pg dump pools
ceph pg ls
View cluster IOPS and read/write bandwidth.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-1-159 ~]# ceph iostat
+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+
| Read | Write | Total | Read IOPS | Write IOPS | Total IOPS |
+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+--------------------------------------------+
| 149 MiB/s | 381 MiB/s | 530 MiB/s | 1915 | 1455 | 3370 |
| 149 MiB/s | 381 MiB/s | 530 MiB/s | 1915 | 1455 | 3370 |
| 149 MiB/s | 382 MiB/s | 531 MiB/s | 1917 | 1457 | 3375 |
| 149 MiB/s | 382 MiB/s | 531 MiB/s | 1917 | 1457 | 3375 |
| 233 MiB/s | 525 MiB/s | 759 MiB/s | 3007 | 2157 | 5164 |
| 233 MiB/s | 525 MiB/s | 759 MiB/s | 3007 | 2157 | 5164 |
First, check the key OSD configuration options that affect recovery speed.
[root@dx-lt-yd-zhejiang-jinhua-5-10-104-1-159 ~]# ceph daemon osd.13 config show | egrep "osd_max_backfills|osd_recovery_max_active|osd_recovery_sleep|osd_recovery_op_priority|osd_recovery_max_single_start"
"osd_max_backfills": "1",
"osd_recovery_max_active": "1",
"osd_recovery_max_single_start": "1",
"osd_recovery_op_priority": "3",
"osd_recovery_sleep": "0.500000",
"osd_recovery_sleep_hdd": "0.100000",
"osd_recovery_sleep_hybrid": "0.025000",
"osd_recovery_sleep_ssd": "0.000000",
Key parameters that affect recovery speed:
- osd_max_backfills: a single OSD hosts many PGs, so several of its PGs may need recovery at the same time; this limits each OSD to at most osd_max_backfills concurrent backfills.
- osd_recovery_op_priority: the priority of recovery operations relative to client I/O; the higher the value, the more scheduling weight recovery gets, which degrades client performance until recovery finishes.
- osd_recovery_max_active: an OSD may have several PGs needing recovery at once; this limits how many PGs a single OSD recovers concurrently.
- osd_recovery_max_single_start: not clearly documented upstream; it appears to limit how many recovery operations an OSD newly starts in one batch.
- osd_recovery_sleep: the pause between recovery operations, in seconds (0.5 above means half a second).
Adjust the configuration to speed up recovery, while watching cluster latency.
$ ceph tell "osd.*" injectargs --osd_max_backfills=15
$ ceph tell "osd.*" injectargs --osd_recovery_max_active=15
$ ceph tell "osd.*" injectargs --osd_recovery_max_single_start=10
$ ceph tell "osd.*" injectargs --osd_recovery_sleep=0.3
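Once recovery has caught up, it is worth dropping the values back to the defaults reported by the config show output above, so client I/O is not penalized permanently:
$ ceph tell "osd.*" injectargs --osd_max_backfills=1
$ ceph tell "osd.*" injectargs --osd_recovery_max_active=1
$ ceph tell "osd.*" injectargs --osd_recovery_max_single_start=1
$ ceph tell "osd.*" injectargs --osd_recovery_sleep=0.5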
Get the full configuration of an OSD.
$ ceph daemon osd.0 config show
Get the full configuration of a mon.
$ ceph daemon mon.dx-lt-yd-zhejiang-jinhua-5-10-104-1-159 config show
View the current log level.
$ ceph tell osd.0 config get debug_osd
15/20
Set the log level.
$ ceph tell osd.0 config set debug_osd 30/30
Set debug_osd to 30/30
$ ceph tell osd.0 config get debug_osd
30/30
Apply to all OSD daemons:
$ ceph tell "osd.*" config set debug_osd 30/30
Set debug_osd to 30/30
When a server node needs to be rebooted without migrating OSD data:
Set the noout flag globally to prevent rebalancing and data migration.
$ ceph osd set noout
Stop the OSD services on the node.
$ systemctl stop ceph-osd@{}
Then reboot the server and restore service:
$ ceph osd unset noout
$ systemctl start ceph-osd@{}
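After the OSD services are back up, confirm the flag is gone and the cluster returns to a healthy state:
$ ceph -s
# the 'noout flag(s) set' warning should disappear and health should return to HEALTH_OK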