Graduate 3rd party device monitoring plugins to beta! #77274

Merged

Conversation

@RenaudWasTaken
Member

commented Apr 30, 2019

Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>

What type of PR is this?
/kind feature

What this PR does / why we need it: Graduates the KubeletPodResources API to beta.
See kubernetes/enhancements#606

Which issue(s) this PR fixes: N/A

Special notes for your reviewer:
cc @dashpole @derekwaynecarr @dchen1107 @guptaNswati

Does this PR introduce a user-facing change?:

Enable 3rd party device monitoring by default
@dashpole

Contributor

commented Apr 30, 2019

/sig node
/priority important-soon
/retest

@RenaudWasTaken

Member Author

commented Apr 30, 2019

/retest

@dashpole

Contributor

commented Apr 30, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Apr 30, 2019

@RenaudWasTaken RenaudWasTaken force-pushed the RenaudWasTaken:kubeletpodresources-beta branch from 96e394e to defbac4 May 22, 2019

@RenaudWasTaken RenaudWasTaken force-pushed the RenaudWasTaken:kubeletpodresources-beta branch from defbac4 to d1b55d3 May 22, 2019

@RenaudWasTaken

Member Author

commented May 27, 2019

/retest
/assign @dchen1107 @derekwaynecarr

@RenaudWasTaken

Member Author

commented May 27, 2019

/retest

@RenaudWasTaken

Member Author

commented May 27, 2019

/retest

@RenaudWasTaken

Member Author

commented May 27, 2019

/retest

@RenaudWasTaken

Member Author

commented May 28, 2019

/retest

@RenaudWasTaken

Member Author

commented May 29, 2019

/retest

@fejta-bot


commented May 30, 2019

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.


Graduate KubeletPodResources to beta!
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
@RenaudWasTaken

Member Author

commented May 30, 2019

I isolated a cleanup race condition while looking into the persistent failure of the e2e job (which I had assumed was a transient error).

It seems to be linked to how gRPC handles connection crashes on a Unix socket. The kubelet's log clearly shows the following sequence:

  • Kubelet starts correctly
    • Kubelet creates the socket file correctly
    • Kubelet listens on the socket file
    • Kubelet crashes on some later unrelated error
  • Kubelet restarts
    • Kubelet crashes on socket creation (net.Listen)

Logs are available here: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/77274/pull-kubernetes-kubemark-e2e-gce-big/1134075490731036672/artifacts/e2e-77274-ac87c-minion-group-1tpz/kubelet.log

Searching for that specific error message, others seem to suggest that it is caused by a cleanup issue.

EDIT: I settled on a nicer solution: create the socket at a temporary path and move it over the final socket file.

F0530 13:02:08.494004    1554 server.go:179] Failed to create listener for podResources endpoint: listen unix /var/lib/kubelet/pod-resources/kubelet.sock: bind: no such file or directory

@RenaudWasTaken RenaudWasTaken force-pushed the RenaudWasTaken:kubeletpodresources-beta branch from d1b55d3 to 74887e8 May 30, 2019

@k8s-ci-robot k8s-ci-robot added size/S and removed lgtm size/XS labels May 30, 2019

@k8s-ci-robot

Contributor

commented May 30, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, RenaudWasTaken

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@RenaudWasTaken RenaudWasTaken force-pushed the RenaudWasTaken:kubeletpodresources-beta branch 2 times, most recently from 7cd5750 to fca6c59 May 30, 2019

@dashpole

Contributor

commented May 30, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 30, 2019

@RenaudWasTaken RenaudWasTaken force-pushed the RenaudWasTaken:kubeletpodresources-beta branch from fca6c59 to a03d2d4 May 30, 2019

@k8s-ci-robot k8s-ci-robot removed the lgtm label May 30, 2019

@RenaudWasTaken

Member Author

commented May 30, 2019

/retest

@RenaudWasTaken

Member Author

commented May 30, 2019

/retest

@RenaudWasTaken

Member Author

commented May 30, 2019

/retest

@dashpole

Contributor

commented May 30, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 30, 2019

@RenaudWasTaken

Member Author

commented May 31, 2019

/test pull-kubernetes-kubemark-e2e-gce-big

@k8s-ci-robot k8s-ci-robot merged commit fc00578 into kubernetes:master May 31, 2019

21 checks passed

cla/linuxfoundation RenaudWasTaken authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-conformance-image-test Skipped.
pull-kubernetes-cross Skipped.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-csi-serial Skipped.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-gce-storage-slow Skipped.
pull-kubernetes-godeps Skipped.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-local-e2e Skipped.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
pull-publishing-bot-validate Skipped.
tide In merge pool.