1.7.0-beta.2 kubelet does not restart containers when /etc/kubernetes/manifests change #48219
@asac There are no sig labels on this issue. Please add a sig label by:
/sig node

cc @kubernetes/sig-node-bugs PTAL. @asac Did you have the rc.1 image prepulled? Otherwise the kubelet probably started pulling the image, which can take some time.
Yes, on the apiserver I didn't have it pulled beforehand, but I waited _forever_. On the scheduler and controller manager I explicitly did the docker pull first, but they still didn't come back up. Note that I am editing with vi, so it creates the .swp files...
That might also be a problem...

I remembered that the .swp issue had been fixed.

Are there any related kubelet logs?

I can't reproduce this. @asac could you post the relevant kubelet log?
Will try to repro and get logs tonight...
OK, I tailed syslog right before editing the manifest to change the apiserver from rc.1 to the final 1.7.0. Again the apiserver didn't come back (I pulled the docker image beforehand). Here is the syslog part:
Attaching a long log that covers 4 minutes in case you want more.

Then stopping the kubelet and starting it again makes it start the apiserver; attaching a log of that start. While the apiserver is not fully up at the time I stop the log, it is already running.

Here are the last few lines of that log, where the kubelet finally stops complaining that the apiserver is unreachable:
Notice that there is a line in the log:
Jun 30 14:31:59 8bc81f5a-d92a-428f-b71c-0f31b3ce958f kubelet[22024]: E0630 14:31:59.860279 22024 file_linux.go:114] can't process config file "/etc/kubernetes/manifests/4913": open /etc/kubernetes/manifests/4913: no such file or directory

What's the 4913 manifest file?
Good question... right now (i.e. right after starting the kubelet again via
systemctl restart kubelet) it's NOT THERE anymore, and I definitely didn't put
it there manually.
Feels like a PID?
Yeah, seems like it. That's odd. @yujuhong any ideas?

Nope. Some suggestions to help diagnose the issue:

BTW, it'd be better to include the logs prior to the "can't process config file" line.
Another log from just the event we care about: I managed to reproduce by merely opening the kube-scheduler manifest in vim and saving it without changes. Log attached (and the kubelet gives up, not restarting the scheduler until I restart the kubelet). I could also validate that if I just copy the edited yaml file over into manifests, the kubelet indeed restarts the service correctly. So it seems we are most likely looking at a bug where random files appearing temporarily in the manifests directory confuse the kubelet in a way that it loses its ability to start downed services.
Also note that the number "4913" seems to be stable, so it's probably not a PID. I tried creating a random file called "1111" with binary garbage, which didn't cause problems restarting kube-scheduler on changes via a "cp" from a pre-edited place. So far the only way I can reliably reproduce it is by using vim from xenial (I am the root user when using vim). I straced vim and got the following, suggesting that it's vim creating this file.

Looking further, I found that apparently vim is doing this: neovim/neovim#3460. So I guess our case is about a "rapidly appearing and disappearing" file that uses the syscalls from the strace?
@asac nice digging! If your editor creates a temporary file which contains exactly the same pod object, this would leave the kubelet confused, and it will delete the pod when the file is deleted. This is a known issue, and the fix would be for the kubelet to scan the content of the directory periodically to ensure the correct pods are started eventually. There is a bug tracking this but I couldn't find it at this moment. Please note that even if the kubelet can self-recover by syncing periodically, the temporary file can still cause unnecessary disruption to your pod/workload (e.g., temporary downtime until the periodic sync kicks in). The best way to avoid disruptions like this is to copy your file to a separate directory, modify it, and then copy it back.
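To make that failure mode concrete, here is a toy, purely event-driven model of a static-pod watcher (an illustrative Go sketch, not the real kubelet source; all names are made up): pods are keyed by the pod object parsed from each file, so a short-lived duplicate of a manifest takes the pod down with it when the duplicate is deleted, and without a periodic resync nothing ever brings it back.

```go
package main

import "fmt"

// toyWatcher reacts only to file events, with no periodic resync.
// It keys pods by pod name, so two files carrying the same pod
// object collide on one map entry.
type toyWatcher struct {
	sourceOf map[string]string // pod name -> file it came from
}

func (w *toyWatcher) fileCreated(file, podName string) {
	w.sourceOf[podName] = file // last writer wins
}

func (w *toyWatcher) fileDeleted(file string) {
	for pod, src := range w.sourceOf {
		if src == file {
			delete(w.sourceOf, pod) // the pod gets stopped here
		}
	}
}

func main() {
	w := &toyWatcher{sourceOf: map[string]string{}}
	w.fileCreated("kube-apiserver.yaml", "kube-apiserver")
	// An editor drops a temporary copy holding the same pod object...
	w.fileCreated("kube-apiserver.yaml~", "kube-apiserver")
	// ...then removes it, and the watcher stops the pod even though
	// the real manifest is still on disk:
	w.fileDeleted("kube-apiserver.yaml~")
	fmt.Println("pods running:", len(w.sourceOf)) // pods running: 0
}
```

This is exactly why the copy-aside-then-copy-back workaround helps: the watched directory only ever sees one event for the final file.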
Found it. It's #40123 |
Hi,
I don't see that vim is creating a file with the same content. It just
creates an empty file, with the right flags, to see if it can create a file
in the directory it wants to write to, AFAIUI. Check out the code here:
https://github.com/vim/vim/blob/master/src/fileio.c#L3744
Can you explain the problems such behaviour causes for the kubelet too, or are we
looking at a different variant that might be better to fix?
If the temporary file did not contain the pod manifest, then that's not the problem. Can you post your kubelet log again, but this time include the messages around/before you edit the file? I don't think the log you posted before is complete.
The same problem in v1.7.3.
Here are my log and system information.
But if I move kube-apiserver.yaml to another dir outside /etc/kubernetes/manifests, and then move it back, it works.
@ALL I face a problem like the one this issue describes, and I have filed a new issue (#55928) because the version of Kubernetes is different from this one.
FWIW: When I edited /etc/kubernetes/manifests/kube-apiserver.manifest yesterday, that caused the kubelet v1.7.10 to restart the API server. Reproducibly, several times. So this problem may have been solved?
Hmm... will try when upgrading my apiserver next time... hopefully in a couple of days.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle stale
Still facing this on v1.9.0. cc @kubernetes/sig-node-bugs @yujuhong
@xiangpengzhao, do you mind posting the details: how you edited the file, and the associated kubelet log?
I was facing the same issue with 1.10 when trying to add OIDC parameters. For now the best thing is to avoid editing the manifest in place; copying the file in from another location works just fine.
I just encountered this issue on a kubeadm cluster version 1.10.1 after editing
I encountered this issue on a kube cluster running 1.9.3. After editing /etc/kubernetes/manifests/kube-apiserver.yaml with nano I was able to make updates to the config file (it did not work after modifying the file with vi).
Guys, I filed #63910 to solve this issue. Please help review it.
@dixudx I wonder whether #63910 is the same problem as the one this issue reports. I have compiled the part at https://gist.github.com/wklken/145c8d70389c3f11381a1771623a3ba8
Automatic merge from submit-queue (batch tested with PRs 58690, 64773, 64880, 64915, 64831). If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

ignore not found file error when watching manifests

**What this PR does / why we need it**: An alternative to #63910. When using vim to create a new file in the manifest folder, a temporary file with an arbitrary number (like 4913) as its name is created to check whether the directory is writable and to see the resulting ACL. These temporary files are deleted later and should be ignored when watching the manifest folder.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*: Fixes #55928, #59009, #48219

**Special notes for your reviewer**: /cc dims luxas yujuhong liggitt tallclair

**Release note**:
```release-note
ignore not found file error when watching manifests
```
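The gist of that fix can be sketched as follows (an illustrative stdlib-only Go sketch, not the actual kubelet patch; `processEvent` is a hypothetical name): when a watch event arrives for a path that has already vanished, treat it as a no-op instead of an error.

```go
package main

import (
	"fmt"
	"os"
)

// processEvent handles one file-watch event for a manifest path.
// If the path no longer exists by the time the event is handled,
// it was a short-lived editor probe file (like vim's "4913"), so
// it is ignored rather than treated as an error that derails the
// watch loop.
func processEvent(path string) error {
	if _, err := os.Stat(path); err != nil {
		if os.IsNotExist(err) {
			return nil // transient file already gone; nothing to do
		}
		return err
	}
	// ...otherwise parse the manifest and sync the pod here...
	return nil
}

func main() {
	// The event for vim's probe file now resolves to a no-op:
	fmt.Println(processEvent("/definitely/missing/manifests/4913"))
}
```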
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
I have the same issue. I tried editing kube-apiserver.yaml using nano, but somehow I am not able to connect to the apiserver; I get a "cannot connect to the apiserver" error.
IIRC, you need to restart the kubelet. The API container somehow knows the manifest was updated and reloads it.
Thank you netdisciple. I had the older parameter --admission-control but needed to use --enable-admission-plugins instead. Now it works, thank you.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Maybe a feature, most likely a dupe (I found a few similar issues while reading, some were closed), but since I am on an RC I'd better mention it...
I upgraded from 1.7.0-beta.2 to rc.1 on my arm master node by:
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Environment:
Kubernetes version (kubectl version):
Client Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-beta.2", GitCommit:"ceab7f7a6753c20d3be75463b17402fdcea856ba", GitTreeState:"clean", BuildDate:"2017-06-15T17:12:53Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"7+", GitVersion:"v1.7.0-rc.1", GitCommit:"6b9ded1649cfb512d4e88570c738aca9f8265639", GitTreeState:"clean", BuildDate:"2017-06-24T05:30:00Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/arm"}
self-hosted / scaleway
OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (e.g. uname -a):
Linux 8bc81f5a-d92a-428f-b71c-0f31b3ce958f.pub.cloud.scaleway.com 4.9.20-std-1 #1 SMP Wed Apr 5 15:38:34 UTC 2017 armv7l armv7l armv7l GNU/Linux
Install tools: kubeadm during the alphas...