Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix chassis gc #3525

Merged
merged 1 commit into from
Dec 15, 2023
Merged

fix chassis gc #3525

merged 1 commit into from
Dec 15, 2023

Conversation

zhangzujian
Copy link
Member

@zhangzujian zhangzujian commented Dec 14, 2023

Pull Request

What type of this PR

Examples of user facing changes:

  • Features
  • Bug fixes
  • Docs
  • Tests

Which issue(s) this PR fixes

Fixes #(issue-number)

WHAT

🤖[deprecated] Generated by Copilot at 6904498

Refactor and improve chassis management in gc.go and node.go. Enhance performance, reliability, and compatibility of garbage collection and chassis tag update functions.

🤖[deprecated] Generated by Copilot at 6904498

To clean up the chassis in OVN
The gcChassis function was honed
It deletes only those
That have no node to host
And avoids any needless work done

HOW

🤖[deprecated] Generated by Copilot at 6904498

  • Simplify and implement garbage collection logic for chassis in OVN southbound database
  • Use ListChassis method to get all chassis instead of filtering by vendor tag (link)
  • Delete chassis name from chassisNodes map after finding matching node (link)
  • Iterate over remaining chassisNodes map and delete chassis that are not in use by any node (link)
  • Use GetChassis method with kubeovnOnly parameter set to true to get chassis with kube-ovn vendor tag (link)
  • Call gcChassis function and return error if chassis is not registered for node (link)
  • Remove redundant return statement (link)

Signed-off-by: zhangzujian <zhangzujian.7@gmail.com>
@zhangzujian zhangzujian added bug Something isn't working need backport labels Dec 14, 2023
@zhangzujian zhangzujian marked this pull request as ready for review December 14, 2023 10:18
pkg/controller/gc.go Show resolved Hide resolved
if err != nil {
klog.Errorf("failed to get node %s chassis: %s, %v", node.Name, annoChassisName, err)
return err
}
if chassis == nil {
Copy link
Collaborator

@bobz965 bobz965 Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里如果有多个chassis 或者 只有一个chassis 但是和 node name不一致,直接触发 gc。

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

节点被删除时,chassis 会被删除,但 ovn-controller 还会自动注册一个 chassis,这个 chassis 没有 external-ids:vendor=kube-ovn。后续如果一个同名且同 IP 的节点加入集群,ovn-controller 无法注册新的 chassis。

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

那就是说两种情况:

  1. 你说的这种,node之前被清理了,但是 ovn-controller 还在跑或者重新跑,导致有脏记录。 这种node不存在,但是有多余的chassis,现在你的 gcChassis 已经覆盖到了。 但是 node 更新的时候触发gc的逻辑还是有问题,目前没触发到。
  2. 我清理了整个集群,想快速重建下kube-ovn,可能会出现有两个chassis的情况

if err != nil {
klog.Errorf("failed to get node %s chassis: %s, %v", node.Name, annoChassisName, err)
return err
}
if chassis == nil {
Copy link
Collaborator

@bobz965 bobz965 Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里如果有多个chassis 或者 只有一个chassis 但是和 node name不一致,直接触发 gc。

@zhangzujian zhangzujian merged commit a8ee70b into kubeovn:master Dec 15, 2023
60 checks passed
@zhangzujian zhangzujian deleted the fix-gc branch December 15, 2023 01:34
zhangzujian added a commit that referenced this pull request Dec 15, 2023
Signed-off-by: zhangzujian <zhangzujian.7@gmail.com>
bobz965 pushed a commit that referenced this pull request Dec 15, 2023
Signed-off-by: zhangzujian <zhangzujian.7@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working need backport
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants