
kubelet plugin #12

Closed

Conversation

@pohly (Owner) commented Aug 10, 2022:

This takes the latest code from PR #7 and updates the code to make it simpler and enhance various aspects (for example, logging).

@pohly (Owner, Author) commented Aug 10, 2022:

Example run:

go run ./test/integration/cdi/example-driver --feature-gates ContextualLogging=true -v=6 kubelet-plugin  2>&1 | tee /tmp/log
I0810 14:00:40.251694  124569 server.go:107] Found KUBECONFIG environment variable set, using that..
I0810 14:00:40.252376  124569 loader.go:372] Config loaded from file:  /var/run/kubernetes/admin.kubeconfig
I0810 14:00:40.253071  124569 nonblockinggrpcserver.go:73] "dra: GRPC server started" endpoint="/var/lib/kubelet/plugins/example-driver/dra.sock"
I0810 14:00:40.253127  124569 nonblockinggrpcserver.go:73] "registrar: GRPC server started" endpoint="/var/lib/kubelet/plugins_registry/example-driver.cdi.k8s.io-reg.sock"
I0810 14:00:40.675216  124569 nonblockinggrpcserver.go:84] "registrar: handling request" requestID=1 request="&InfoRequest{}"
I0810 14:00:40.675267  124569 nonblockinggrpcserver.go:95] "registrar: handling request succeeded" requestID=1 response="&PluginInfo{Type:DRAPlugin,Name:example-driver.cdi.k8s.io,Endpoint:/var/lib/kubelet/plugins/example-driver/dra.sock,SupportedVersions:[1.0.0],}"
I0810 14:00:40.676105  124569 nonblockinggrpcserver.go:84] "registrar: handling request" requestID=2 request="&RegistrationStatus{PluginRegistered:true,Error:,}"
I0810 14:00:40.676136  124569 nonblockinggrpcserver.go:95] "registrar: handling request succeeded" requestID=2 response="&RegistrationStatusResponse{}"
I0810 14:00:54.020965  124569 nonblockinggrpcserver.go:84] "dra: handling request" requestID=1 request="&NodePrepareResourceRequest{Namespace:default,ClaimUid:801bfcb5-d9d2-48c3-b04f-84db86bc27e7,ClaimName:test-inline-claim-resource,ResourceHandle:{\"user_a\":\"b\"},}"
I0810 14:00:54.021381  124569 kubeletplugin.go:145] "dra: CDI file created" requestID=1 path="/var/run/cdi/example-driver.cdi.k8s.io-801bfcb5-d9d2-48c3-b04f-84db86bc27e7.json" device="example-driver.cdi.k8s.io/exampledevice=exampledevice"
I0810 14:00:54.021416  124569 nonblockinggrpcserver.go:95] "dra: handling request succeeded" requestID=1 response="&NodePrepareResourceResponse{CdiDevice:[example-driver.cdi.k8s.io/exampledevice=exampledevice],}"
I0810 14:00:58.795100  124569 nonblockinggrpcserver.go:84] "dra: handling request" requestID=2 request="&NodeUnprepareResourceRequest{Namespace:default,ClaimUid:801bfcb5-d9d2-48c3-b04f-84db86bc27e7,ClaimName:test-inline-claim-resource,CdiDevice:[example-driver.cdi.k8s.io/exampledevice=exampledevice],}"
I0810 14:00:58.795184  124569 kubeletplugin.go:159] "dra: CDI file removed" requestID=2 path="/var/run/cdi/example-driver.cdi.k8s.io-801bfcb5-d9d2-48c3-b04f-84db86bc27e7.json"
I0810 14:00:58.795210  124569 nonblockinggrpcserver.go:95] "dra: handling request succeeded" requestID=2 response="&NodeUnprepareResourceResponse{}"

  // create json file
  filePath := ex.getJsonFilePath(req.ClaimUid)
- if err := os.WriteFile(filePath, jsonBytes, os.FileMode(0644)); err != nil {
+ if err := os.WriteFile(filePath, buffer, os.FileMode(0644)); err != nil {
Reviewer:

Have you considered using this API?

Reviewer:

btw, it would be nice to avoid rewriting the file if its content is the same. We might need support from CDI for comparing specs to implement that.

pohly (Owner, Author):

I haven't looked at the new API yet. Let's wait for cncf-tags/container-device-interface#77 before making further changes.

Rewriting the file should happen very infrequently. I wouldn't write extra code for it.

Reviewer:

Well, NodePrepareResource can be called by the Kubelet at any time and with any frequency. Rewriting the CDI file should either be atomic (btw, spec.Write seems to do it atomically) or, better, not happen at all. We can't predict when the Kubelet will decide to call NodePrepareResource; it could happen at the same time the runtime reads a previously created CDI file.
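[Editor's note: for illustration only, not code from this PR. A minimal sketch of the atomic-rewrite idea, assuming a hypothetical helper built on the usual write-to-temp-file-then-rename pattern; the rename is atomic on POSIX filesystems, so a concurrent reader sees either the old or the new CDI file, never a half-written one.]

package example

import (
	"os"
	"path/filepath"
)

// writeFileAtomic is a hypothetical helper: it writes data to a temporary
// file in the target directory and renames it over path, so readers never
// observe a partially written CDI file.
func writeFileAtomic(path string, data []byte, perm os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), filepath.Base(path)+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op once the rename has succeeded

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(perm); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}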

Reviewer:

You can use that API as it's most probably not going to change. Even if they add APIs for generating CDI file names, that will probably not change the Write() API: a file path is passed to the spec constructor, and Write() uses the path that's set in the spec object.

bart0sh commented Aug 10, 2022:

> So your current code doesn't serialize that?

It does. There is a lock for this purpose, so two calls can't be made at the same time. However, that doesn't guarantee that NodePrepareResource isn't called for a (second) pod while the runtime reads the CDI file created by the previous call (for the first pod referencing the same claim).
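[Editor's note: a minimal sketch of the serialization described above; type and method names are simplified stand-ins, not the PR's actual code. The mutex keeps two handler calls from overlapping, but it is released as soon as the file is written, so it cannot cover a runtime that is still reading the result of an earlier call.]

package example

import "sync"

// examplePlugin is a simplified stand-in for the plugin in this PR.
type examplePlugin struct {
	mutex sync.Mutex // serializes NodePrepareResource/NodeUnprepareResource
}

// prepare shows what the lock does and does not guarantee: calls never
// overlap with each other, but once the lock is released a later call may
// rewrite the CDI file while a container runtime still reads the old one.
func (ex *examplePlugin) prepare(writeCDIFile func() error) error {
	ex.mutex.Lock()
	defer ex.mutex.Unlock()
	return writeCDIFile()
}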

Reviewer:

Yes, it's unlikely to happen, I agree. But it can happen, so in my opinion it's better to avoid regenerating CDI files.

Reviewer:

@klihub:

> WriteSpec() uses Spec.Write() under the hood... which maybe should be an unexported Spec.write() instead.

Code in this PR does the following:

  • creates spec object (passing CDI file path to it)
  • marshals it to JSON
  • writes the JSON to the file (with the same CDI file path as was passed to cdi.NewSpec)

That's why I thought that creating a spec object and then calling spec.Write() looked like a reasonable way to replace that code.

It seems that's not the case, though. Can you explain what needs to be done instead?

Reviewer:

@klihub:

> But preferably the Cache/Registry WriteSpec() interface should be used instead.

Wouldn't this require reading all CDI files present in the CDI directories? That could potentially slow down the plugin. However, if it's done once at plugin start, it shouldn't be a problem, right?
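[Editor's note: for illustration, a sketch of what the Cache/Registry path could look like, assuming the pkg/cdi API mentioned in this thread (GetRegistry, WithSpecDirs, SpecDB().WriteSpec); function name and values are hypothetical, and exact signatures should be checked against the CDI version in go.mod.]

package example

import (
	cdiapi "github.com/container-orchestrated-devices/container-device-interface/pkg/cdi"
	cdispec "github.com/container-orchestrated-devices/container-device-interface/specs-go"
)

// writeSpecViaRegistry sketches the Registry-based alternative discussed
// above. The registry scans its spec directories once and caches them, so
// the per-call cost after startup should be just the write itself.
func writeSpecViaRegistry(kind, deviceName, specName string) error {
	registry := cdiapi.GetRegistry(
		cdiapi.WithSpecDirs("/var/run/cdi"), // assumption: the directory seen in this PR's logs
	)
	spec := &cdispec.Spec{
		Version: "0.2.0", // see the version discussion further down
		Kind:    kind,
		Devices: []cdispec.Device{{
			Name: deviceName,
			ContainerEdits: cdispec.ContainerEdits{
				Env: []string{"EXAMPLE=1"}, // hypothetical edit
			},
		}},
	}
	return registry.SpecDB().WriteSpec(spec, specName)
}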

pohly (Owner, Author):

I added a TODO about using the newer API(s), but for now I would prefer to merge the code as-is so that we can make progress with other, more important aspects (like moving the code around).

@@ -114,8 +110,10 @@ func (ex *examplePlugin) NodePrepareResource(ctx context.Context, req *drapbv1.N
  }

  spec := specs.Spec{
- Version: cdiVersion,
+ Version: "0.2.0",
pohly (Owner, Author):

@klihub: what is the "right" version string to use here?

Isn't it defined indirectly through the Go API, i.e. specs.Spec, specs.Device, etc.?
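[Editor's note: if the version is indeed defined through the Go API, the literal could be derived from the library instead. This sketch assumes specs-go exports a CurrentVersion constant — it does in later CDI releases; whether it already did at the time of this PR would need checking.]

package example

import (
	cdispec "github.com/container-orchestrated-devices/container-device-interface/specs-go"
)

// newSpec sketches taking the version from the Go API rather than
// hardcoding "0.2.0". Assumption: cdispec.CurrentVersion exists in the
// vendored CDI release.
func newSpec(kind string) cdispec.Spec {
	return cdispec.Spec{
		Version: cdispec.CurrentVersion,
		Kind:    kind,
	}
}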

pohly (Owner, Author):

Moved to cncf-tags/container-device-interface#60 (comment) and added a TODO to the code.

allErrs = append(allErrs, field.Required(fldPath, ""))
}

if len(driverName) > 63 {
Reviewer:

Why 63? I believe a constant or variable with a meaningful name would be more readable here.
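[Editor's note: for illustration, the suggestion could look like this; the constant and function names are hypothetical, and field.TooLong is the validation helper normally used for length limits in Kubernetes API validation.]

package example

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// maxDriverNameLength mirrors the CSI driver name limit; 63 is the maximum
// length of a DNS label. The constant name is hypothetical.
const maxDriverNameLength = 63

func validateDriverName(driverName string, fldPath *field.Path) field.ErrorList {
	var allErrs field.ErrorList
	if len(driverName) > maxDriverNameLength {
		allErrs = append(allErrs, field.TooLong(fldPath, driverName, maxDriverNameLength))
	}
	return allErrs
}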

pohly (Owner, Author):

This is the same length limit as for CSI driver name. The entire function was copied from there.

I am following how API validation is handled elsewhere in Kubernetes:

  • not all limitations are fully documented in code comments (but perhaps they should be; I'm not sure)
  • no constants for such limits, neither in the API nor in the code

I can imagine that defining this limit as part of the API would be useful. I need to check with the API reviewers whether that is a good or bad idea.

pohly (Owner, Author):

I've added a TODO.

logger := klog.FromContext(ctx)
deviceName := "claim-" + req.ClaimUid
kind := ex.driverName + "/test"
filePath := ex.getJSONFilePath(req.ClaimUid)
Reviewer:

nit: I'd propose moving these variables to the place where they're used (filePath), or getting rid of them (kind, deviceName) if they're only used once.

pohly (Owner, Author):

Done. I kept logger where it is (at the top of the function): if it were created directly before its first use and another log line later got added earlier in the function, the logger initialization would have to be moved.

pohly (Owner, Author):

deviceName is used twice.

@pohly force-pushed the dynamic-resource-allocation branch from 4b19411 to dad1a28 on August 12, 2022 14:36
@pohly (Owner, Author) commented Aug 12, 2022:

All of these changes are now part of the dynamic-resource-allocation branch, after squashing the commits.

@pohly closed this on Aug 12, 2022