Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor all device-plugin logic into separate 'plugin' package under the devicemanager #109016

Merged
merged 3 commits into from May 4, 2022

Conversation

klueska
Copy link
Contributor

@klueska klueska commented Mar 25, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This PR moves all logic specific to interacting with a device plugin into a separate plugin directory under the devicemanager. This change makes the code much more readable and is in preparation for an upcoming change to allow the devicemanager to service multiple plugin APIs at once (as we begin to introduce a v1beta2 API).

The core of the logic was moved mostly unchanged, and the tests were only updated to accommodate the new structure. None of the core logic of the tests themselves has changed.

Special notes for your reviewer:

Feel free to begin reviewing this PR, but it should not land in master until after the 1.24 release branch has been cut and master is opened up again for 1.25 contributions.
/hold

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 25, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 25, 2022
@klueska
Copy link
Contributor Author

klueska commented Mar 25, 2022

/sig node

@klueska klueska force-pushed the refactor-devicemanager branch 2 times, most recently from 8f2be3a to b63f706 Compare March 25, 2022 12:42
@klueska klueska changed the title Reactor all device-plugin logic into separate 'plugin' package under the devicemanager Refactor all device-plugin logic into separate 'plugin' package under the devicemanager Mar 25, 2022
@klueska klueska force-pushed the refactor-devicemanager branch 3 times, most recently from 9eebca5 to ad5c6a6 Compare March 25, 2022 17:24
@ehashman ehashman added this to Triage in SIG Node PR Triage Mar 25, 2022
@SergeyKanzhelev SergeyKanzhelev moved this from Triage to Archive-it in SIG Node CI/Test Board Mar 30, 2022
@klueska
Copy link
Contributor Author

klueska commented Apr 4, 2022

Now that main has been frozen for 1.24

/unhold

/cc @fromanirh @swatisehgal @kad @bart0sh

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 4, 2022
@klueska
Copy link
Contributor Author

klueska commented Apr 29, 2022

@bart0sh
Copy link
Contributor

bart0sh commented Apr 29, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2022
This is the first step towards being able to support a new plugin API version
in parallel with the existing one.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Signed-off-by: Kevin Klues <kklues@nvidia.com>
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2022
Copy link
Member

@mkumatag mkumatag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm for test/images changes
/approve
/hold for others to review

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska, mkumatag

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2022
@bart0sh
Copy link
Contributor

bart0sh commented Apr 29, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 29, 2022
@ffromani
Copy link
Contributor

/lgtm

@klueska
Copy link
Contributor Author

klueska commented Apr 29, 2022

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 29, 2022
@swatisehgal
Copy link
Contributor

Apologies for the delay in getting to this. Looks good. Looking forward to the upcoming changes in the device manager that would allow to service multiple plugin APIs at once!

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 05e3919 into kubernetes:master May 4, 2022
SIG Node CI/Test Board automation moved this from PRs Waiting on Author to Done May 4, 2022
SIG Node PR Triage automation moved this from Triage to Done May 4, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone May 4, 2022
@zhuchenwang
Copy link

We encountered an issue with this PR. The device plugin simply returns nil for GetDevicePluginOptions call. There is no problem in k8s 1.24. However, this PR will call GetDevicePluginOptions during the plugin registration which makes the device plugin crash due to nil pointer.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x1 pc=0x7fd2ab]

goroutine 22 [running]:
k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1.(*DevicePluginOptions).MarshalToSizedBuffer(...)
	/go/src/github.com/kubevirt/macvtap-cni/vendor/k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1/api.pb.go:1546
k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1.(*DevicePluginOptions).Marshal(0x0)
	/go/src/github.com/kubevirt/macvtap-cni/vendor/k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1/api.pb.go:1529 +0x6b
google.golang.org/protobuf/internal/impl.legacyMarshal({{}, {0x9e4858, 0xc00021e660}, {0x0, 0x0, 0x0}, 0x0})
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/protobuf/internal/impl/legacy_message.go:404 +0xa2
google.golang.org/protobuf/proto.MarshalOptions.marshal({{}, 0x80, 0x0, 0x0}, {0x0, 0x90ae20, 0x0}, {0x9e4858, 0xc00021e660})
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/protobuf/proto/encode.go:163 +0x27b
google.golang.org/protobuf/proto.MarshalOptions.MarshalAppend({{}, 0x20, 0xae, 0x90}, {0x0, 0x0, 0x0}, {0x9cba80, 0xc00021e660})
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/protobuf/proto/encode.go:122 +0x79
github.com/golang/protobuf/proto.marshalAppend({0x0, 0x0, 0x0}, {0x7f21911ceb28, 0x0}, 0x58)
	/go/src/github.com/kubevirt/macvtap-cni/vendor/github.com/golang/protobuf/proto/wire.go:40 +0xa5
github.com/golang/protobuf/proto.Marshal(...)
	/go/src/github.com/kubevirt/macvtap-cni/vendor/github.com/golang/protobuf/proto/wire.go:23
google.golang.org/grpc/encoding/proto.codec.Marshal({}, {0x90ae20, 0x0})
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/encoding/proto/proto.go:45 +0x4e
google.golang.org/grpc.encode({0x7f21911cd1e8, 0xd7c718}, {0x90ae20, 0x0})
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/rpc_util.go:594 +0x44
google.golang.org/grpc.(*Server).sendResponse(0xc0000b5340, {0x9e0e60, 0xc0001f6180}, 0xc00023e120, {0x90ae20, 0x0}, {0x0, 0x0}, 0x0, {0x0, ...})
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/server.go:1082 +0x18e
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0000b5340, {0x9e0e60, 0xc0001f6180}, 0xc00023e120, 0xc000185290, 0xd3f860, 0x0)
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/server.go:1332 +0xd93
google.golang.org/grpc.(*Server).handleStream(0xc0000b5340, {0x9e0e60, 0xc0001f6180}, 0xc00023e120, 0x0)
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/server.go:1626 +0xa2a
google.golang.org/grpc.(*Server).serveStreams.func1.2()
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/server.go:941 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/src/github.com/kubevirt/macvtap-cni/vendor/google.golang.org/grpc/server.go:939 +0x294

I guess our device plugin probably should return an empty object instead of nil. However, I couldn't find any document that states this enforcement. Maybe this could be an enhancement to the device plugin document.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Development

Successfully merging this pull request may close these issues.

None yet

7 participants