No timeout when Kubelet Calling cni plugin #65743

Closed
liucimin opened this issue Jul 3, 2018 · 29 comments · Fixed by #71653
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@liucimin
Contributor

liucimin commented Jul 3, 2018

BUG REPORT:

/kind bug

What happened:
When the kubelet creates a sandbox through a CNI plugin, it simply uses exec.Cmd to call the CNI binary.
With some CNI plugins, such as Contiv, the CNI binary may block on a lock and never return.
In this case, the first RunPodSandbox call from the kubelet fails with a ctx timeout, and every later attempt to re-create the PodSandbox reports the same timeout.
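
For illustration only, a call to an external binary can be bounded by running it under a context deadline. The Go sketch below is not the kubelet or libcni code: `runPlugin` and `errPluginTimeout` are hypothetical names, and it relies only on `context.WithTimeout` and `exec.CommandContext` from the standard library.

```go
package main

import (
	"context"
	"errors"
	"os/exec"
	"time"
)

// errPluginTimeout is a hypothetical sentinel error used only in this sketch.
var errPluginTimeout = errors.New("plugin did not return before the deadline")

// runPlugin (hypothetical helper) runs the binary at path and kills it if it
// has not exited when timeout elapses, so a hung binary cannot block forever.
func runPlugin(path string, args []string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	// CommandContext kills the child process once ctx is done.
	err := exec.CommandContext(ctx, path, args...).Run()
	if err != nil && errors.Is(ctx.Err(), context.DeadlineExceeded) {
		return errPluginTimeout
	}
	return err
}
```

With a helper like this, a binary that never returns produces `errPluginTimeout` once the deadline passes instead of blocking the caller, and each new call starts with a fresh deadline.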

image

What you expected to happen:
The next time the kubelet re-creates the PodSandbox, it should call the CNI plugin again.

How to reproduce it (as minimally and precisely as possible):
Create a pod on a node and make the CNI binary never return.

Environment:

  • Kubernetes version (use kubectl version): v1.9.2

@kubernetes/sig-node-bugs

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. kind/bug Categorizes issue or PR as related to a bug. labels Jul 3, 2018
@liucimin
Contributor Author

liucimin commented Jul 3, 2018

/assign liucimin

@liucimin
Contributor Author

liucimin commented Jul 3, 2018

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 3, 2018
@huydinhle

We are also affected by this issue (#45419). Can someone take a look at this PR? We will be happy to test it out.

@liucimin
Contributor Author

@huydinhle
Yes, I'm trying to fix this problem.
My change first sets a timeout in CNI.
Once CNI merges my code, the kubelet will be easier to fix.
containernetworking/cni#568

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 19, 2018
@nikopen
Contributor

nikopen commented Oct 24, 2018

containernetworking/cni#568 got merged, this is resolved I presume? @liucimin

/close

@k8s-ci-robot
Contributor

@nikopen: Closing this issue.

@nikopen
Contributor

nikopen commented Oct 24, 2018

Just saw that a fix is needed in kubelet as well, can you confirm @liucimin ?

/reopen

@k8s-ci-robot
Contributor

@nikopen: Reopening this issue.

@k8s-ci-robot k8s-ci-robot reopened this Oct 24, 2018
@bboreham
Contributor

Yes kubelet needs to be changed.
And add a test for this case.

@liucimin
Contributor Author

@nikopen
Yes, the kubelet should update its CNI version.
I will change the kubelet when the CNI project releases a new version.

@nikopen
Contributor

nikopen commented Oct 29, 2018

@bboreham what are the plans for cutting a new CNI release and bumping it in k8s? Does it need a new major k8s release to be bumped, v1.13?

Here I can see the latest is v0.7.3, listed as plain 'cni' in Fedora, but on the official page the latest is 0.6.0 with some 0.7.0-alpha versions. Maybe that is because, according to the v1.12 changelog, k8s is still using 0.6.0, but it's still inconsistent?

It's an old issue affecting many (#45419). If the fix in kubelet is straightforward, then this can be a great addition, given that v1.13 is marked as a stability release.

@nikopen
Contributor

nikopen commented Oct 29, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 29, 2018
@bboreham
Contributor

Please can you check that all necessary changes are made in Kubernetes. The CNI change is all in the library so just vendor in whatever commit you want to, and check it all works as you need.

Here I can see the latest is v0.7.3

That's the plugins - different thing.

@nikopen
Contributor

nikopen commented Nov 16, 2018

@liucimin feel free to prepare the Kubelet PR, it will then be easier to coordinate the full fix.

@liucimin
Contributor Author

@nikopen
Yes, I'm just waiting for the CNI plugin to release a new version. Then I will upgrade our kubelet's dockershim to use the new CNI plugin.

@bboreham
Contributor

There is no relevant change in CNI plugins.

The change you are waiting for is in libcni, and is already merged to master. Do not wait for it: test at will.

(I expect the calling code will need changing to make use of the new timeout, but I haven’t studied it)
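
As a rough sketch of what "changing the calling code" could mean, the caller would create a context with a deadline for each attempt and pass it down to a context-aware network call. Everything named below (`networkAdder`, `fakePlugin`, `setUpPod`, `isTimeout`) is hypothetical and stands in for the real libcni and kubelet code, which is not reproduced here.

```go
package main

import (
	"context"
	"errors"
	"time"
)

// networkAdder is a hypothetical stand-in for a context-aware CNI call.
type networkAdder interface {
	AddNetwork(ctx context.Context, podName string) error
}

// fakePlugin simulates a plugin; when hang is true it never returns on its
// own and only gives up once the caller's context is done.
type fakePlugin struct{ hang bool }

func (p fakePlugin) AddNetwork(ctx context.Context, podName string) error {
	if !p.hang {
		return nil
	}
	<-ctx.Done()
	return ctx.Err()
}

// setUpPod bounds a single network setup attempt with its own deadline, so a
// later retry starts with a fresh context instead of an already expired one.
func setUpPod(n networkAdder, podName string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return n.AddNetwork(ctx, podName)
}

// isTimeout reports whether err was caused by an expired deadline.
func isTimeout(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}
```

In this sketch a hanging plugin makes `setUpPod` return a deadline error after `timeout`, while a later attempt against a responsive plugin succeeds because it gets a new context.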

@roywangtj

The .go files under [https://github.com/kubernetes/kubernetes/tree/master/vendor/github.com/containernetworking/cni/libcni] were all last changed two years ago. Where can we find the merged libcni code?

@bboreham
Contributor

@roywangtj the repo for libcni is https://github.com/containernetworking/cni/

Part of the next step is to copy updated files into the location you pointed at.

@roywangtj

@bboreham, got it, Thanks!

@liucimin
Contributor Author

liucimin commented Dec 3, 2018

#71653

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2019
@ijumps

ijumps commented Mar 3, 2019

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2019
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 1, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 1, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

@liucimin
Contributor Author

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@liucimin: Reopened this issue.

@k8s-ci-robot k8s-ci-robot reopened this Jul 31, 2019
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 31, 2019
8 participants