
OVN DB Recovery

oilbeater edited this page Jun 27, 2022 · 9 revisions

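The Chinese documentation under this Wiki is no longer maintained. Please visit our latest Chinese documentation site for up-to-date documentation.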

Files on a minority of nodes are corrupted; the cluster still works

One ovn-central instance fails to start, and its log shows:

 * ovn-northd is not running
ovsdb-server: ovsdb error: error reading record 2739 from OVN_Northbound log: record 2739 advances commit index to 6308 but last log index is 6307
 * Starting ovsdb-nb

If this node has previously had unsynchronized clocks or a full disk, the database file can be considered corrupted.

Depending on whether the message names OVN_Northbound or OVN_Southbound, pick the corresponding leader node to operate on:

kubectl get ep -n kube-system
ovn-nb                                             10.0.128.61:6641                                                       2d1h
ovn-sb                                             10.0.128.61:6642                                                       2d1h
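To decide which of the two endpoints above is relevant, the database name can be read from the error line in the log. The following is a minimal sketch, not part of Kube-OVN: the function names `db_from_log` and `endpoint_for_db` are hypothetical, and the pattern assumes the error line has the same shape as the one quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print the database name
# from the first "error reading record N from <DB> log" line.
db_from_log() {
    sed -n 's/.*error reading record [0-9][0-9]* from \(OVN_[A-Za-z]*\) log.*/\1/p' | head -n 1
}

# Hypothetical helper: map a database name to the Endpoints object
# listed by `kubectl get ep -n kube-system`.
endpoint_for_db() {
    case "$1" in
        OVN_Northbound) echo ovn-nb ;;
        OVN_Southbound) echo ovn-sb ;;
        *) echo "unknown database: $1" >&2; return 1 ;;
    esac
}
```

For example, piping the ovn-central log into `db_from_log` and passing the result to `endpoint_for_db` gives the name of the endpoint whose address is the leader to exec into.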

After exec'ing into the corresponding Pod, check the current state of the database cluster:

root@VM-128-61-centos:/kube-ovn# ovs-appctl -t /var/run/ovn/ovnnb_db.ctl  cluster/status OVN_Northbound
9182
Name: OVN_Northbound
Cluster ID: e75f (e75fa340-49ed-45ab-990e-26cb865ebc85)
Server ID: 9182 (9182e8dd-b5b0-4dd8-8518-598cc1e374f3)
Address: tcp:[10.0.128.61]:6643
Status: cluster member
Role: leader
Term: 1454
Leader: self
Vote: self

Last Election started 1732603 ms ago, reason: timeout
Last Election won: 1732587 ms ago
Election timer: 1000
Log: [7332, 12512]
Entries not yet committed: 1
Entries not yet applied: 1
Connections: ->f080 <-f080 <-e631 ->e631
Disconnections: 1
Servers:
    f080 (f080 at tcp:[10.0.129.139]:6643) next_index=12512 match_index=12510 last msg 63 ms ago
    9182 (9182 at tcp:[10.0.128.61]:6643) (self) next_index=10394 match_index=12510
    e631 (e631 at tcp:[10.0.131.173]:6643) next_index=12512 match_index=0
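In the output above, server e631 stands out because the leader reports `match_index=0` for it, meaning the leader has not confirmed any replicated log entry on that server. As a rough sketch, such entries can be filtered out of the status text. The function name `lagging_servers` is hypothetical, and treating `match_index=0` as the sign of a broken member is a heuristic taken from this example, not a rule stated by OVN.

```shell
#!/bin/sh
# Hypothetical helper: read `cluster/status` output on stdin and print
# "ID ADDRESS" for every entry under "Servers:" that has match_index=0.
lagging_servers() {
    awk '
        /^Servers:/ { in_servers = 1; next }
        in_servers && /match_index=0( |$)/ {
            addr = $4              # e.g. "tcp:[10.0.131.173]:6643)"
            sub(/\)$/, "", addr)   # drop the trailing parenthesis
            print $1, addr
        }
    '
}
```

It could be fed directly from the status command, for instance `ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound | lagging_servers`. The result should still be checked by hand before kicking any member.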

Kick the abnormal node out of the cluster:

ovs-appctl -t /var/run/ovn/ovnnb_db.ctl  cluster/kick OVN_Northbound e631

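Back on the abnormal node, remove the corresponding database file (the command below moves it to /tmp rather than deleting it):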

mv /etc/origin/ovn/ovnnb_db.db /tmp

Delete the corresponding ovn-central Pod so that it restarts and recovers.
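The commands in this section are written for the northbound database. The sketch below prints, without running them, the equivalent kick and move commands for either database. The function name `print_recovery_cmds` is hypothetical, and the southbound control socket name `ovnsb_db.ctl` is an assumption made by analogy with `ovnnb_db.ctl`; verify it on the node before use.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the recovery commands.
# $1: OVN_Northbound or OVN_Southbound
# $2: short server ID of the member to kick, e.g. e631
print_recovery_cmds() {
    case "$1" in
        OVN_Northbound) prefix=ovnnb ;;
        OVN_Southbound) prefix=ovnsb ;;   # assumed to mirror the NB naming
        *) echo "unknown database: $1" >&2; return 1 ;;
    esac
    echo "# on the leader:"
    echo "ovs-appctl -t /var/run/ovn/${prefix}_db.ctl cluster/kick $1 $2"
    echo "# on the abnormal node:"
    echo "mv /etc/origin/ovn/${prefix}_db.db /tmp"
}
```

Printing instead of executing keeps the operator in control: each line can be reviewed and then run on the right machine.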

Files on a majority of nodes are corrupted; the cluster cannot work

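When ovn-central nodes cannot start or their databases are corrupted, so that a healthy majority can no longer be guaranteed, the ovn-central cluster can be recovered with the steps below.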

  1. Record the current number of ovn-central replicas, then stop ovn-central to prevent further database changes
kubectl scale deployment -n kube-system ovn-central --replicas=0
  2. Pick the node listed first in NODE_IPS to recover the database files. If the database files on that node are corrupted, copy the files from /etc/origin/ovn on another machine to it. Then run the following commands to convert the database files.
docker run -it -v /etc/origin/ovn:/etc/ovn kubeovn/kube-ovn:v1.10.0 bash
cd /etc/ovn/
ovsdb-tool cluster-to-standalone ovnnb_db_standalone.db ovnnb_db.db
ovsdb-tool cluster-to-standalone ovnsb_db_standalone.db ovnsb_db.db
  3. Exit the container and move the database files out of the way on every ovn-central node
mv /etc/origin/ovn/ovnnb_db.db /tmp
mv /etc/origin/ovn/ovnsb_db.db /tmp
  4. Restore the database files on the first node
mv /etc/origin/ovn/ovnnb_db_standalone.db /etc/origin/ovn/ovnnb_db.db
mv /etc/origin/ovn/ovnsb_db_standalone.db /etc/origin/ovn/ovnsb_db.db
  5. Start the ovn-central containers
kubectl scale deployment -n kube-system ovn-central --replicas={previous replica count}
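The first and last steps depend on remembering the original replica count. One possible way to avoid losing it is to store it in a file before scaling down. This is a minimal sketch: the function names and the file path passed as `$1` are hypothetical choices, while the `kubectl` commands use the same Deployment and namespace as above.

```shell
#!/bin/sh
# Print the replica count currently set on the ovn-central Deployment.
current_replicas() {
    kubectl get deployment -n kube-system ovn-central -o jsonpath='{.spec.replicas}'
}

# Save the replica count to the file given as $1, then scale to 0.
stop_ovn_central() {
    current_replicas > "$1" &&
    kubectl scale deployment -n kube-system ovn-central --replicas=0
}

# Scale back to the replica count saved in the file given as $1.
start_ovn_central() {
    replicas=$(cat "$1") || return 1
    case "$replicas" in
        ''|*[!0-9]*) echo "invalid replica count in $1" >&2; return 1 ;;
    esac
    kubectl scale deployment -n kube-system ovn-central --replicas="$replicas"
}
```

A typical use would be `stop_ovn_central /tmp/ovn-central.replicas` at the start and `start_ovn_central /tmp/ovn-central.replicas` at the end. Running the stop function a second time while the Deployment is already at 0 would overwrite the saved value, so it should be called only once.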