Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix problem with empty cache after reboot #304

Merged

Conversation

ykulazhenkov
Copy link
Contributor

@ykulazhenkov ykulazhenkov commented Apr 15, 2024

What this PR does / why we need it:
Use /var/lib/cni/ovs-cni/cache dir to store config cache.
Old path /tmp/ovscache is cleaned up after reboot on some operating systems.
Lack of the cache entries may prevent ovs-cni to do a proper cleanup on CmdDel.
Use /var/lib/cni/ovs-cni/cache as a persistent cache dir.

Special notes for your reviewer:

Problem with the current code (before this patch):

  1. /tmp/ovscache is used as a cache path by default
    DefaultCacheDir = "/tmp/ovscache"
  2. /tmp dir will be cleaned up during the host reboot on some distros
  3. CmdDel exit without an error if cache entry not found
    return nil
  4. ovs-cni will not be able to do a cleanup after the reboot, this will cause, the following state in ovs
d178a096-20dd-4aa1-931a-0a3e82613228
    Bridge mybr
        Port mybr
            Interface mybr
                type: internal
        Port vethd25775b1
            Interface vethd25775b1
                error: "could not open network device vethd25775b1 (No such device)"
    ovs_version: "2.17.9"
  1. on next CmdAdd all port with an error will be removed
    if err := cleanPorts(ovsDriver); err != nil {
  2. for non HW offloading use-case this will fix the issue (symptom, not the root cause)
  3. for HW offloading use-case, Ports will have no error and these ports will not be released by the cleanPorts function. As a result VFs will stuck in the OVS and manual clean-up is required before they can be used again.

Release note:

NONE

cc @e0ne @SchSeba

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Apr 15, 2024
@kubevirt-bot
Copy link
Collaborator

Hi @ykulazhenkov. Thanks for your PR.

I'm waiting for a k8snetworkplumbingwg member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kubevirt-bot kubevirt-bot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 15, 2024
@kubevirt-bot kubevirt-bot added dco-signoff: yes Indicates the PR's author has DCO signed all their commits. and removed dco-signoff: no Indicates the PR's author has not DCO signed all their commits. labels Apr 15, 2024
@ykulazhenkov ykulazhenkov marked this pull request as draft April 18, 2024 12:28
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 18, 2024
@ykulazhenkov ykulazhenkov changed the title Add support for cache_path configuration option Fxi empty cache problem after reboot Apr 18, 2024
@ykulazhenkov
Copy link
Contributor Author

After discussion with @e0ne we decided that it is better to implement alternative option. I marked PR as draft to apply required changes.

@ykulazhenkov ykulazhenkov changed the title Fxi empty cache problem after reboot Fix empty cache problem after reboot Apr 18, 2024
Old path ("/tmp/ovscache") is cleaned up after reboot
on some operating systems. Lack of the cache entries
may prevent ovs-cni to do a proper cleanup on CmdDel.

Use /var/lib/cni/ovs-cni/cache as a persistent cache dir.

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 19, 2024
@ykulazhenkov ykulazhenkov marked this pull request as ready for review April 19, 2024 13:23
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2024
@ykulazhenkov
Copy link
Contributor Author

After discussion with @e0ne we decided that it is better to implement alternative option. I marked PR as draft to apply required changes.

required changes implemented. The PR is ready for reviews

@ykulazhenkov ykulazhenkov changed the title Fix empty cache problem after reboot Fix problem with empty cache after reboot Apr 19, 2024
@SchSeba
Copy link
Collaborator

SchSeba commented Apr 24, 2024

/ok-to-test

@kubevirt-bot kubevirt-bot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 24, 2024
@SchSeba
Copy link
Collaborator

SchSeba commented Apr 24, 2024

/lgtm
/approve

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 24, 2024
@kubevirt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SchSeba, ykulazhenkov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2024
@ykulazhenkov
Copy link
Contributor Author

/test pull-e2e-ovs-cni

@kubevirt-bot kubevirt-bot merged commit 3663c3d into k8snetworkplumbingwg:main Apr 24, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants