Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node: sample-device-plugin: register to kubelet by default and ensure re-registration to kubelet on kubelet restarts #118534

Merged

Conversation

swatisehgal
Copy link
Contributor

@swatisehgal swatisehgal commented Jun 7, 2023

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

In #115107 we added an environment variable to control the registration of sample
device plugin to kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register to kubelet (in case no environment
variable is specified).
In addition to that, we want to ensure that the plugin registers itself not just once.
It should re-register itself to kubelet in case of node reboot or kubelet restarts.

Which issue(s) this PR fixes:

Related to:

Special notes for your reviewer:

This PR would require a follow-up to promote the sample device plugin image similar to the work done here.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 7, 2023
@swatisehgal
Copy link
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 7, 2023
@swatisehgal swatisehgal changed the title node: sample-device-plugin: register to kubelet by default [WIP] node: sample-device-plugin: register to kubelet by default Jun 7, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@swatisehgal swatisehgal changed the title [WIP] node: sample-device-plugin: register to kubelet by default node: sample-device-plugin: register to kubelet by default Jun 7, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@swatisehgal swatisehgal changed the title node: sample-device-plugin: register to kubelet by default [WIP]node: sample-device-plugin: register to kubelet by default Jun 7, 2023
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@swatisehgal swatisehgal force-pushed the sample-dp-register-by-default branch 2 times, most recently from ec9c8de to b6f43aa Compare June 7, 2023 15:37
@swatisehgal swatisehgal changed the title [WIP]node: sample-device-plugin: register to kubelet by default node: sample-device-plugin: register to kubelet by default Jun 7, 2023
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 7, 2023
@swatisehgal
Copy link
Contributor Author

/cc @ffromani

@swatisehgal
Copy link
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 26, 2023
@swatisehgal
Copy link
Contributor Author

@mrunalp @klueska Can you please take a look? Once we have approval on container manager (pkg/kubelet/cm) updates, we can unhold this PR.

swatisehgal and others added 6 commits October 17, 2023 12:14
In issue: 115107 we added an environment variable to control the registration of sample
device plugin to kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register to kubelet (in case no environment
variable is specified).

In addition to that, we want to ensure that the plugin registers itself not just once.
It should re-register itself to kubelet in case of node reboot or kubelet restarts.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the
registration is triggered by deletion of the control file. This is
applicable both when the registration happens for the first time and
subsequent ones because of kubelet restarts.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
If the user specifies the intent to control registration process, we rely on
registration triggers (deletion of control file) to prompt registration.

This behvaiour is expected to be consistent across kubelet restarts and therefore
across the watch calls where we watch for changes to the unix socket so we make
this part of Stub object instead of a parameter.

Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Setting a reasonable default in case `PLUGIN_SOCK_DIR`
environment variable is not specified.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Add kubeletSocket file to fsnotify instead of polling and waiting for deletion
of device plugin unix socket as a way of detecting kubelet restart. We need to
ensure that the device plugin re-registers itself after kubelet restart depending
on the configured registration mode (auto-registration or controller registration).

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Add retry mechanism to handle cases where after kubelet restarts, the device
plugin unix socket(s) were created but not ready to serve yet.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 17, 2023
@swatisehgal
Copy link
Contributor Author

LGTM label was removed as the PR had to be rebased.

@SergeyKanzhelev
Copy link
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 18, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: d4cbc01fed243e279f7cade8611f421b067d65a4

@derekwaynecarr
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, dims, SergeyKanzhelev, swatisehgal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@swatisehgal
Copy link
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 23, 2023
@k8s-ci-robot k8s-ci-robot merged commit f41ede6 into kubernetes:master Oct 23, 2023
14 of 15 checks passed
SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done Oct 23, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Oct 23, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 23, 2023
@swatisehgal swatisehgal deleted the sample-dp-register-by-default branch October 23, 2023 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

None yet

10 participants