Host events cannot be obtained #100236

Open
aimuz opened this issue Mar 15, 2021 · 21 comments
Assignees
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
  • sig/instrumentation: Categorizes an issue or PR as relevant to SIG Instrumentation.
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@aimuz
Contributor

aimuz commented Mar 15, 2021

What happened:

I shut down the host. The related events do not show up in kubectl describe, but I can see them with kubectl get events:

% kubectl get events
LAST SEEN   TYPE      REASON                    OBJECT                                  MESSAGE
31m         Normal    NodeNotReady              node/i-w0msdm7d                         Node i-w0msdm7d status is now: NodeNotReady
31m         Warning   NodeNotReady              pod/nginx-deployment-66b6c48dd5-g75sl   Node is not ready
31m         Warning   NodeNotReady              pod/nginx-deployment-66b6c48dd5-lj5hn   Node is not ready
29m         Normal    Starting                  node/i-w0msdm7d                         Starting kube-proxy.
29m         Normal    Starting                  node/i-w0msdm7d                         Starting kubelet.
29m         Warning   InvalidDiskCapacity       node/i-w0msdm7d                         invalid capacity 0 on image filesystem
29m         Normal    NodeAllocatableEnforced   node/i-w0msdm7d                         Updated Node Allocatable limit across pods
29m         Warning   Rebooted                  node/i-w0msdm7d                         Node i-w0msdm7d has been rebooted, boot id: 37f114e1-e547-4f58-b1f1-c16268f942cf
29m         Normal    NodeHasSufficientMemory   node/i-w0msdm7d                         Node i-w0msdm7d status is now: NodeHasSufficientMemory
29m         Normal    NodeHasSufficientPID      node/i-w0msdm7d                         Node i-w0msdm7d status is now: NodeHasSufficientPID
29m         Normal    NodeHasNoDiskPressure     node/i-w0msdm7d                         Node i-w0msdm7d status is now: NodeHasNoDiskPressure
29m         Normal    NodeReady                 node/i-w0msdm7d                         Node i-w0msdm7d status is now: NodeReady
29m         Normal    TaintManagerEviction      pod/nginx-deployment-66b6c48dd5-g75sl   Cancelling deletion of Pod default/nginx-deployment-66b6c48dd5-g75sl
29m         Normal    TaintManagerEviction      pod/nginx-deployment-66b6c48dd5-lj5hn   Cancelling deletion of Pod default/nginx-deployment-66b6c48dd5-lj5hn
29m         Warning   FailedMount               pod/nginx-deployment-66b6c48dd5-g75sl   MountVolume.SetUp failed for volume "default-token-qvb7z" : failed to sync secret cache: timed out waiting for the condition
29m         Warning   FailedMount               pod/nginx-deployment-66b6c48dd5-lj5hn   MountVolume.SetUp failed for volume "default-token-qvb7z" : failed to sync secret cache: timed out waiting for the condition
29m         Normal    SandboxChanged            pod/nginx-deployment-66b6c48dd5-lj5hn   Pod sandbox changed, it will be killed and re-created.
29m         Normal    SandboxChanged            pod/nginx-deployment-66b6c48dd5-g75sl   Pod sandbox changed, it will be killed and re-created.
29m         Normal    Pulled                    pod/nginx-deployment-66b6c48dd5-g75sl   Container image "nginx:1.14.2" already present on machine
29m         Normal    Pulled                    pod/nginx-deployment-66b6c48dd5-lj5hn   Container image "nginx:1.14.2" already present on machine
29m         Normal    Created                   pod/nginx-deployment-66b6c48dd5-g75sl   Created container nginx
29m         Normal    Created                   pod/nginx-deployment-66b6c48dd5-lj5hn   Created container nginx
29m         Normal    Started                   pod/nginx-deployment-66b6c48dd5-g75sl   Started container nginx
29m         Normal    Started                   pod/nginx-deployment-66b6c48dd5-lj5hn   Started container nginx

Here is the event in question. It cannot be seen through kubectl describe. I wonder whether this is the intended behavior:

31m         Normal    NodeNotReady              node/i-w0msdm7d                         Node i-w0msdm7d status is now: NodeNotReady

Events obtained through kubectl describe

Events:
  Type     Reason                   Age                From        Message
  ----     ------                   ----               ----        -------
  Normal   Starting                 30m                kube-proxy  Starting kube-proxy.
  Normal   Starting                 30m                kubelet     Starting kubelet.
  Warning  InvalidDiskCapacity      30m                kubelet     invalid capacity 0 on image filesystem
  Normal   NodeAllocatableEnforced  30m                kubelet     Updated Node Allocatable limit across pods
  Warning  Rebooted                 30m                kubelet     Node i-w0msdm7d has been rebooted, boot id: 37f114e1-e547-4f58-b1f1-c16268f942cf
  Normal   NodeHasSufficientMemory  30m (x2 over 30m)  kubelet     Node i-w0msdm7d status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     30m (x2 over 30m)  kubelet     Node i-w0msdm7d status is now: NodeHasSufficientPID
  Normal   NodeHasNoDiskPressure    30m (x2 over 30m)  kubelet     Node i-w0msdm7d status is now: NodeHasNoDiskPressure
  Normal   NodeReady                30m                kubelet     Node i-w0msdm7d status is now: NodeReady
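
For context, kubectl describe appears to look up a node's events with a field selector over the involvedObject fields, including involvedObject.uid, so an event recorded with a different uid for the same node is simply filtered out. Below is a minimal client-go sketch of such a lookup, for illustration only; the kubeconfig loading, the default namespace, and the uid argument are assumptions, not kubectl's actual code:

package sketch

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// listNodeEvents lists events whose involvedObject matches the given node
// name and uid. If uid here is the node name but an event was recorded with
// the Node object's real UID (or vice versa), that event is not returned.
func listNodeEvents(nodeName, uid string) error {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		return err
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	selector := fmt.Sprintf("involvedObject.kind=Node,involvedObject.name=%s,involvedObject.uid=%s", nodeName, uid)
	events, err := cs.CoreV1().Events(metav1.NamespaceDefault).List(context.TODO(), metav1.ListOptions{FieldSelector: selector})
	if err != nil {
		return err
	}
	for _, e := range events.Items {
		fmt.Printf("%s\t%s\t%s\n", e.Type, e.Reason, e.Message)
	}
	return nil
}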

After investigating, I found that the uid of the two events differs. The event that does not appear in kubectl describe looks like this:

apiVersion: v1
count: 2
eventTime: null
firstTimestamp: "2021-03-06T12:40:22Z"
involvedObject:
  apiVersion: v1
  kind: Node
  name: i-w0msdm7d
  uid: 0ceac5fb-a393-49d7-b04f-9ea5f18de5e9
kind: Event

The events that do show up in kubectl describe have a uid like this:

apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2021-03-15T06:56:53Z"
involvedObject:
  kind: Node
  name: i-w0msdm7d
  uid: i-w0msdm7d
kind: Event
lastTimestamp: "2021-03-15T06:56:53Z"
message: 'Node i-w0msdm7d has been rebooted, boot id: 37f114e1-e547-4f58-b1f1-c16268f942cf'

uid: 0ceac5fb-a393-49d7-b04f-9ea5f18de5e9 turns out to be the uid of the Node object itself.

uid: i-w0msdm7d appears to be caused by this line:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet.go#L478
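
For reference, the node reference constructed around that line looks roughly like the following. This is a paraphrased sketch, not an exact copy of the linked revision:

package sketch

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// nodeEventRef mirrors how the kubelet builds the reference it attaches to
// node events: the UID field is filled with the node name rather than the
// Node object's real UID, which is why kubelet events carry uid: i-w0msdm7d.
func nodeEventRef(nodeName string) *v1.ObjectReference {
	return &v1.ObjectReference{
		Kind:      "Node",
		Name:      nodeName,
		UID:       types.UID(nodeName), // node name used as the UID
		Namespace: "",
	}
}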

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Restart a host

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@aimuz aimuz added the kind/bug Categorizes issue or PR as related to a bug. label Mar 15, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 15, 2021
@neolit123
Member

/sig node instrumentation

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 15, 2021
@logicalhan
Member

/assign @dashpole

@dashpole
Contributor

Good find. That is definitely wrong.

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 24, 2021
@dashpole
Contributor

The only thing I'm not sure of is whether the UID is actually available when the event is created. That reference is created while the node is booting up, but the UID isn't available until the Node object is created.
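
For illustration of that timing concern, here is a minimal sketch of what fetching the real UID after registration could look like; the clientset wiring is assumed, and this is only an illustration, not a proposed fix:

package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeRefWithRealUID can only succeed once the Node object exists: before
// registration there is no UID to put into the reference, which is exactly
// the window in which the kubelet currently builds its node reference.
func nodeRefWithRealUID(ctx context.Context, cs kubernetes.Interface, nodeName string) (*v1.ObjectReference, error) {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return nil, err // Node not registered yet, so no real UID is available.
	}
	return &v1.ObjectReference{
		Kind: "Node",
		Name: nodeName,
		UID:  node.UID,
	}, nil
}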

@aimuz
Contributor Author

aimuz commented Mar 25, 2021

Would it be possible to keep compatibility by making the uid of node events always the nodeName?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 23, 2021
@aimuz
Contributor Author

aimuz commented Jun 23, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 23, 2021
@n4j n4j added this to Triaged in SIG Node Bugs Jul 9, 2021
@aimuz
Contributor Author

aimuz commented Sep 17, 2021

@pacoxu #100847 (comment)

@dashpole
Switching to the standard approach would indeed be best practice, but the impact of that change is very large.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 16, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 15, 2022
@aimuz
Contributor Author

aimuz commented Jan 15, 2022

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2022
@aimuz
Contributor Author

aimuz commented Apr 15, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 15, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2022
@vaibhav2107
Member

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 17, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2022
@aimuz
Contributor Author

aimuz commented Oct 16, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 16, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2023
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels Jan 20, 2024
@dgrisonnet
Member

ping @dashpole

@dashpole
Contributor

Was this fixed by #106485?
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 25, 2024