Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRA:add example driver #7

Conversation

hj-johannes-lee
Copy link

@hj-johannes-lee hj-johannes-lee commented Aug 4, 2022

DRA: add an example driver

Copy link
Owner

@pohly pohly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs more work. Can you address this in the next few working days?

klog.Infof("Creating CDI Files")

containerEditsInDevices := &cdiapi.ContainerEdits{}
containerEditsInDevices.Append(&cdiapi.ContainerEdits{ContainerEdits: &specs.ContainerEdits{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard-coding some content, decode the handle string into a string/string map and set that as environment variables. That's what has been documented as functionality of the example driver in the README and what the controller implements.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe done.! If you think it's done. Please, resolve it.! I am not sure.

@@ -0,0 +1,134 @@
package server
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When adding new files, remember to add the copyright header.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.!

@@ -44,6 +44,7 @@ import (
"k8s.io/component-base/metrics/legacyregistry"
"k8s.io/component-base/term"
"k8s.io/klog/v2"
plugin "k8s.io/kubernetes/test/integration/cdi/example-driver/kubeletplugin"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refactor the code:

  • generic functionality belongs under "example-driver" (like "example-driver/controller") and will be moved into a staging repo later
  • the driver code itself is under "example-driver/app"

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe done.! I think the structure is almost same as Kevin's one. example-driver/kubelet-plugin may also move together with example-driver/controller. I do not know where it's moved to, but I understand your point roughly.

Anyway, the directory is for generic functionalities of kubelet-plugin (registration server, registrar, grpc server).
And, the only kubeletplugin.go file in example-driver/app is the part that actually has the example driver functionalities (prepare/unprepare).

If you think it's done. Please, resolve it.! I am not sure.

return nil
}

var jsonFilePath = "/etc/cdi/vendor.json"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot hard-code a file name like this.

@bart0sh: is there a naming convention that vendors should follow to avoid overwriting each others files?

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this when I deployed shared-resourceclaim.yaml.
Interestingly, claimNames are actually unique whereas claimUids are not.
Do you think, we can use this for naming the json files?

I0805 02:00:47.574983 2621373 kubeletplugin.go:86] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:default,ClaimUid:094e7ec3-4713-4404-97b3-08ddd180a338,ClaimName:f1f5c13c-e908-4edf-9fbc-71ccbc643995-example,ResourceHandle:{"user_a":"b"},}
I0805 02:00:47.575035 2621373 kubeletplugin.go:94] Creating CDI Files
I0805 02:00:47.575060 2621373 kubeletplugin.go:135] CDI spec is validated
I0805 02:00:47.575197 2621373 kubeletplugin.go:147] Json file is created at /etc/cdi/vendor.json
I0805 02:00:47.575217 2621373 kubeletplugin.go:152] Return: vendor.com/device=example
I0805 02:00:47.580004 2621373 kubeletplugin.go:86] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:default,ClaimUid:094e7ec3-4713-4404-97b3-08ddd180a338,ClaimName:441e23ec-9cac-47d1-898b-b1e879d68646-example,ResourceHandle:{"user_a":"b"},}
I0805 02:00:47.580055 2621373 kubeletplugin.go:94] Creating CDI Files
I0805 02:00:47.580090 2621373 kubeletplugin.go:135] CDI spec is validated
I0805 02:00:47.580274 2621373 kubeletplugin.go:147] Json file is created at /etc/cdi/vendor.json
I0805 02:00:47.580298 2621373 kubeletplugin.go:152] Return: vendor.com/device=example

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe done.! It is based on the guese that ClaimName seems unique for each pod though they share same resource claim. I tested when Prepare and Unprepare functions are called.

If you think it's done. Please, resolve it.! I am not sure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh @pohly Please see this log and let me know what to do. Idk why uid is same, and name is unique.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hj-johannes-lee uid is the same because 2 pods share the same claim.

Claim name is not passed to this call. It's a bug in my code. Here is the fix

Here is a plugin output with the applied fix:

I0806 02:16:14.676485    3776 nodeserver.go:51] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:default,ClaimUid:61e69ce7-e1f7-4d15-9f6f-7c6192ba7a91,ClaimName:shared-claim,ResourceHandle:{"user_a":"b"},}
I0806 02:16:14.676552    3776 nodeserver.go:59] Creating CDI Files
I0806 02:16:14.676575    3776 nodeserver.go:89] CDI spec is validated
I0806 02:16:14.676979    3776 nodeserver.go:101] Json file is created at /etc/cdi/vendor.json
I0806 02:16:14.677001    3776 nodeserver.go:106] Return: vendor.com/device=example
I0806 02:16:14.684296    3776 nodeserver.go:51] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:default,ClaimUid:61e69ce7-e1f7-4d15-9f6f-7c6192ba7a91,ClaimName:shared-claim,ResourceHandle:{"user_a":"b"},}
I0806 02:16:14.684429    3776 nodeserver.go:59] Creating CDI Files
I0806 02:16:14.684465    3776 nodeserver.go:89] CDI spec is validated
I0806 02:16:14.684733    3776 nodeserver.go:101] Json file is created at /etc/cdi/vendor.json
I0806 02:16:14.684756    3776 nodeserver.go:106] Return: vendor.com/device=example
I0806 02:16:18.556491    3776 nodeserver.go:112] NodeUnprepareResource is called: request: &NodeUnprepareResourceRequest{Namespace:default,ClaimUid:61e69ce7-e1f7-4d15-9f6f-7c6192ba7a91,ClaimName:shared-claim,CdiDevice:[vendor.com/device=example],}
I0806 02:16:19.561129    3776 nodeserver.go:112] NodeUnprepareResource is called: request: &NodeUnprepareResourceRequest{Namespace:default,ClaimUid:61e69ce7-e1f7-4d15-9f6f-7c6192ba7a91,ClaimName:shared-claim,CdiDevice:[vendor.com/device=example],}

Copy link

@bart0sh bart0sh Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea is to make a hash out of parameters and use that as a suffix. This would complicate your code a little bit though as you should keep track of used file names and only delete them when processing the last NodeUnprepareResource call for the same resource.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is NodePrepareResource called twice when it is the same claim? It should only be called once!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly should be fixed in #9

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a plugin output fore the shared claim example with changes from #9:

I0809 00:31:42.190905   11002 server.go:106] Found KUBECONFIG environment variable set, using that..
I0809 00:31:42.194965   11002 noderegistrar.go:49] Starting Registration Server at: /var/lib/kubelet/plugins_registry/example-driver.cdi.k8s.io-reg.sock
I0809 00:31:42.195060   11002 noderegistrar.go:58] Registration Server started at: /var/lib/kubelet/plugins_registry/example-driver.cdi.k8s.io-reg.sock
I0809 00:31:43.710827   11002 registrationserver.go:31] Received GetInfo call: &InfoRequest{}
I0809 00:31:43.711683   11002 registrationserver.go:42] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
I0809 00:31:54.967270   11002 nodeserver.go:51] NodePrepareResource is called: request: &NodePrepareResourceRequest{Namespace:default,ClaimUid:0df67e02-93ff-4e14-8f00-c815af4f511f,ClaimName:shared-claim,ResourceHandle:{"user_a":"b"},}
I0809 00:31:54.967332   11002 nodeserver.go:59] Creating CDI Files
I0809 00:31:54.967363   11002 nodeserver.go:89] CDI spec is validated
I0809 00:31:54.967747   11002 nodeserver.go:101] Json file is created at /etc/cdi/vendor.json
I0809 00:31:54.967767   11002 nodeserver.go:106] Return: vendor.com/device=example
I0809 00:31:58.788329   11002 nodeserver.go:112] NodeUnprepareResource is called: request: &NodeUnprepareResourceRequest{Namespace:default,ClaimUid:0df67e02-93ff-4e14-8f00-c815af4f511f,ClaimName:shared-claim,CdiDevice:[vendor.com/device=example],}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh Thanks! :)

*/

// Package app does all of the work necessary to configure and run a
// Kubernetes app process.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this comment. We would need to rewrite it and if we wanted to, it should be in a stand-alone doc.go file.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AH, i thought I checked all comments, but still there were unnecessary things. Let me check once more.

func newExampleDriver(config pluginConfig) (*examplePlugin, error) {

if err := os.MkdirAll(filepath.Dir(config.draAddress), 0750); err != nil {
return nil, err
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General remark about error handling:

  • Always wrap errors with fmt.Errorf("do something: %v", err) unless it is absolutely certain that err itself already contains all information that a user or developer might need. In this case, os guarantees that the file path is included, but it might not be clear what the code was trying to do, so fmt.Errorf("create socket directory: %v", err) would be better.
  • Don't log errors that get returned to the caller. The caller gets to decide how to report the error to the user, not the callee. This isn't done here, but I've seen it below.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating this directory in the constructor feels weird. startPlugin looks like a better place for this.

}), nil
}

func startPlugin(config *pluginConfig) error {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a stand-alone function and not a method of examplePlugin? This seems backwards to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, true. let me fix it.

}

// Run a grpc server for the driver, that listens on draAddress
plugin.StartNonblockingGrpcServer(config.draAddress, driver)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the GRPC server get started first before telling kubelet about it?

}

var cdiVersion = "0.2.0"
var kind = "vendor.com/device"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like the JSON file name, the "kind" also needs to be unique. Work with @bart0sh to figure out how to do this, I am not sure.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look like constants, not variables.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @pohly here. 'device' is too generic name. vendor.com is ok.
I'd suggest two things regarding CDI file naming:

  • get CDI directory from the command line with /var/run/cdi as a default
  • use this naming convention: <cdi directory>/<vendor>-<device>-<claim uid or something unique>.cdi

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this on Slack.

Let's use <cdi directory>/<driver name>-<claim uid>.json, with <cdi directory being configurable and defaulting to /var/run/cdi. Inside the file, let's use k8s.io as vendor and <driver name>/exampledevice as kind.

The kind doesn't need to be unique, does it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't, but fully qualified device name must be unique. Otherwise runtime wouldn't be able to get CDI file by device fqdn. This is not the case in the current implementation and needs to be fixed.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Drop the random suffix to simplify idempotency.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly what about kind? You said before <driver name>/example-device, but @bart0sh now says <vendor>/exmaple-device

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly @bart0sh I didn't get any answer about 'vendor'?

Please read what you were talking. Who do I need to listen to if you talk different things?

Patrick: kind = /example-device // drivername = vendor
Ed: kind = /example-device // drivername != vendor

What do I need to do? Did I understand wrong?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're different people, why should we talk the same thing? @pohly is a main reviewer as I've mentioned before.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't get any answer about 'vendor'?

Here is mine in case you've missed it:

[bart0sh](https://github.com/bart0sh) [4 hours ago](https://github.com/pohly/kubernetes/pull/7#discussion_r941174593)
vendor is "k8s.io", driverName is "example-driver.cdi." + vendor, kind is vendor+"/exampledevice" in my understanding

@@ -1,32 +1,41 @@
# One external resource claim, one pod, two containers.
# One container uses resource, one does not.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove changes to the example YAMLs from this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, I was testing with Ed's one,,


// Package app does all of the work necessary to configure and run a
// Kubernetes app process.
package server
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy-and-paste error. This is not the "server" package...

Add doc.go file where you explain what this package is about.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, sorry it's not the matter of copy and paste..
it used to be in 'server' directory under 'kubernetes-plugin' and decided to remove the folder since it seems to be incorrect name. So,, I just forgot to rename it..

return false, nil
}

func StartRegistrar(driverName, draAddress, pluginRegistrationPath string) error {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation?

s := newNonBlockingGRPCServer()

s.Start(draAddress, driver)
s.Wait()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can this be non-blocking when it waits for the server to terminate?

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's also from.. csi hostpath..
https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/pkg/hostpath/hostpath.go#L126

Let me check it if I understand..

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Run there is blocking. But here you use it in a functions saying "non-blocking".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah.. it's blocking..

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really resolved?

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh It does not block any more?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, resolved then. Thanks.


func (s *nonBlockingGRPCServer) Start(endpoint string, ns drapbv1.NodeServer) {

s.wg.Add(1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems to haven gotten lost as you copied this code around: where is the corresponding s.wg.Done()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did I?
https://github.com/kubernetes-csi/csi-driver-host-path/blob/master/pkg/hostpath/server.go#L43
Maybe,, the example I got from was not really good option...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, I studed about waitinggroup a bit, and I understand your point.!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where it went of the rails, but s.wg.Add(1) followed by s.wg.Done() makes no sense. It becomes a no-op. The entire wait group serves no purpose anymore in the current revision of the code.

Perhaps it is not needed (always spawn goroutine, never wait for it), but then the entire code could be simpler.

@hj-johannes-lee
Copy link
Author

hj-johannes-lee commented Aug 5, 2022

@pohly
Updates:

  1. unresolved things above are left not changed. Those are about drivername, jsonfilepath, etc that are hard coded. Let me change after making it sure that "claimuid" is unique. It seems not.
  2. I set "valid" form of "ResourceHandle" that user can write in yaml files as "env: foo bar". Plugin checks very simply just by seeing whether the key is "user_env" or not. If we are gonna make it more complex (with mounts, hooks, or whatever specific for the example plugin) then I will change it as you think as good. But, as you said, for now it's just handling "env" for now. So, I made it like that.
  3. Pod yaml files should not work with the plugin for now, because it checks 'if the parameters are valid form or not'. Do I update those yamls also?

@bart0sh
Copy link

bart0sh commented Aug 5, 2022

@hj-johannes-lee re 2 and 3: can you implement it the way it's described in the README?

@hj-johannes-lee
Copy link
Author

hj-johannes-lee commented Aug 8, 2022

@bart0sh AH, it's what I meant last time: two types of parameters are user_ and admin_.I didn't remember that the another one has prefix 'admin_'.

But, anyway, as I told, for now, I can see only one type, user_, from the request. Is there any reason why admin_ is not possible to get from the request.GetResourceHandle()?
If you want me to make as it is, I need to get admin_ resourcehandle also from the request?

@pohly
Copy link
Owner

pohly commented Aug 8, 2022

I can see only one type, user_, from the request. ... Is there any reason why admin_ is not possible to get from the request.GetResourceHandle()?

That it is not in any of the current examples does not mean that it isn't possible (cannot draw conclusions from an example).

But there is a reason why it is not in the examples already, and that reason is that the reference to the ConfigMap isn't possible at the moment without a namespace in the parameter reference: kubernetes#111023 (comment)

@hj-johannes-lee hj-johannes-lee force-pushed the dra-example-driver branch 2 times, most recently from 58c50f5 to 2989811 Compare August 9, 2022 01:03
@pohly pohly force-pushed the dynamic-resource-allocation branch from 1f00794 to ca95a8a Compare August 9, 2022 06:21
@pohly
Copy link
Owner

pohly commented Aug 9, 2022

@hj-johannes-lee: can you rebase on top of the latest branch?

return nil, fmt.Errorf("failed to marshal: %v", err)
}

if err := os.WriteFile(ex.getJsonFilePath(req.ClaimUid), doc, os.FileMode(0644)); err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was suggested by CDI devs to use Tempfile to avoid name conflicts. Please, use that.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! But the document says it's deprecated. So, maybe another recommendation in the document 'os.CreateTemp'.!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, use os.CreateTemp or any API that creates temporary file.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this marked as resolved? It's still using os.WriteFile.

Regarding the proposal to use some tempfile function: that makes it harder to ensure that the function is idempotent. Before writing a new temp file, it would have to read all CDI files to check whether a previous invocation already wrote it.

I prefer to have a deterministic name that is guaranteed (by convention) to be unique. If no such convention exists, then CDI should define one.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly Because,, you are writing hundreds of comments, I cannot see what is what if I do not just resolve after solving in my local env.

Do you mean, we need to still use writefile?
getJsonFilePath will return the name you suggested earlier.

Copy link

@bart0sh bart0sh Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because,, you are writing hundreds of comments

Just pick up one conversation, fix it. Push the code and only after that resolve. You don't need to resolve them all at once. Alternatively we can agree that reviewers will resolve them when they think they are resolved.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, from now on, I will never click 'Resolve'. Either of you resolve please!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

klog.Infof("Json file " + ex.getJsonFilePath(req.ClaimUid) + " is removed")
} else if errors.Is(err, os.ErrNotExist) {
// jsonFile does not exist
klog.Infof("Json file " + ex.getJsonFilePath(req.ClaimUid) + " is already deleted")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be a warning I believe. The fact that it's deleted signals that something went wrong.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It's a bit difficult to know what to put. There are two many options, and, to be honest, documents do not help that much no matter how many times I read... Thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest not to overcomplicate things here. It's an example code. I'd do something like this:

if err := WriteFile(path); err != nil {
    return nil, fmt.Errorf("error creating CDI file %s: %v", path, err)
}

err will contain more info about the reason.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not resolved either?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pohly You will see a bit later if I already clicked "resolve". If I keep it open, this page is ..... too long

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if it is marked resolved without me checking it first, then I won't know that I still need to check it because GitHub hides the discussion thread.

// jsonFile does not exist
klog.Infof("Json file " + ex.getJsonFilePath(req.ClaimUid) + " is already deleted")
} else {
return nil, fmt.Errorf("unexpected error: %v", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give more details here, please. 'Unexpected error' doesn't explain much.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do actually not know what can go there..
File is neither 'existing', nor 'not-existing'.. So, I thought the error itself would be 'unexpected error'.

Do you have an idea?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're trying to write a CDI file and getting unexpected error. How about fmt.Errorf("Unexpected error writing CDI file %s: %v", path, err)?

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha!! Thanks! But, this part is not 'writing' but 'checking fileinfo', and I think path is not necessary since os.Stat() returns '*PathError'!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error messages should start with a lowercase letter. Log messages should start with uppercase.

@@ -77,6 +77,7 @@ func NewCommand() *cobra.Command {

fs = sharedFlagSets.FlagSet("CDI")
driverName := fs.String("drivername", "example-driver.cdi.k8s.io", "Resource driver name.")
cdiDir := fs.String("cdi-dir", "/var/run/cdi", "CDI directory where the driver sockets and CDI files will be")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"/var/run/cdi" should be a constant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, sorry??
What you said before:

get CDI directory from the command line with /var/run/cdi as a default

What @pohly said right below your comment:

with <cdi directory being configurable and defaulting to /var/run/cdi

In go language, is there a way to make it constant after getting the value from the user...?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about "/var/run/cdi", not about cdiDir.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I got it, now. Thanks!! :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not resolved, but marked as resolved again!

kubeletPluginFlagSets := cliflag.NamedFlagSets{}
fs = kubeletPluginFlagSets.FlagSet("kubelet plugin")

kubeletDir := fs.String("kubelet-dir", "/var/lib/kubelet", "Path to the Kubelet directory.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid using hardcoded paths. Either import it from the k8s code or at least define your own constant.

Copy link
Author

@hj-johannes-lee hj-johannes-lee Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I do not understand what is 'hard code'. Isn't it configurable by users, and the default is /var/lib/kubelet?

I cannot find how to make it configurable.
When I asked before @pohly with the example os.Getenv, he said about 'drivername'.

driverName := fs.String("drivername", "example-driver.cdi.k8s.io", "Resource driver name.")

Isn't it then also.. hard-coded as 'example-driver.cdi.k8s.io'?

Copy link

@bart0sh bart0sh Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define a constant, e.g. const defaultKubeletDir = "/var/lib/kubelet" and use it in the fs.String call. Or, better, import it from k8s if it exists there.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bart0sh AHA!!!!!! Thanks!

fs = kubeletPluginFlagSets.FlagSet("kubelet plugin")

kubeletDir := fs.String("kubelet-dir", "/var/lib/kubelet", "Path to the Kubelet directory.")
draAddress := fs.String("dra-address", *kubeletDir+"/plugins/"+*driverName+"/dra.sock", "Path to the DRA driver socket kubelet will use to issue CDI operations.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use path.Join

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, I forgot to do for this part though I changed other parts. Thanks!


kubeletDir := fs.String("kubelet-dir", "/var/lib/kubelet", "Path to the Kubelet directory.")
draAddress := fs.String("dra-address", *kubeletDir+"/plugins/"+*driverName+"/dra.sock", "Path to the DRA driver socket kubelet will use to issue CDI operations.")
pluginRegistrationPath := fs.String("plugin-registration-path", *kubeletDir+"/plugins_registry", "Path to Kubernetes plugin registration socket.")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same thing: use path.Join

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@bart0sh
Copy link

bart0sh commented Aug 9, 2022

@hj-johannes-lee please rebase your branch on top of the latest target branch. Thanks.


func (ex *examplePlugin) NodePrepareResource(ctx context.Context, req *drapbv1.NodePrepareResourceRequest) (*drapbv1.NodePrepareResourceResponse, error) {

const (
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move constants to the top of the file


//return qualified name of the cdi device
dev := spec.GetDevice(deviceName).GetQualifiedName()
klog.Infof("Return: " + dev)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is misleading. you're not returning dev, you're returning NodePrepareResourceResponse.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and don't log debug information at klog.Infof = klog.V(0).Infof. That's way too verbose.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, use contextual logging as in my code.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here. this is not resolved.

file.Close()
return nil, fmt.Errorf("failed to write json file: %v", err)
}
if err := file.Close(); err != nil {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use defer file.Close() golang idiom.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please stop resolving conversation before submitting a code that fixes it! Thanks.

@bart0sh
Copy link

bart0sh commented Aug 9, 2022

So, please resolve them if you think they are resolved.

I don't see "Resolve" button. @pohly can do that I think.

}

// create ContainerEdits in device spec and ContainerEdits
containerEditsInDeviceSpec := &cdiapi.ContainerEdits{ContainerEdits: &specs.ContainerEdits{}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this variable? Can we just initialise the whole spec below, where we call cdiapi.NewSpec?

containerEditsInDeviceSpec.Append(&cdiapi.ContainerEdits{ContainerEdits: &specs.ContainerEdits{
Env: envs,
}})
containerEdits := &cdiapi.ContainerEdits{ContainerEdits: &specs.ContainerEdits{}}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one can be also dropped, I believe.

@hj-johannes-lee
Copy link
Author

@pohly I cannot know what I need to see any more because all are unresolved. Please resolve them, if you think they are resolved.

// create bytes from spec, which would be written into the json file
jsonBytes, err := json.Marshal(*spec)
if err != nil {
return nil, fmt.Errorf("failed to marshal: %v", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed to marshal what? please, be more specific.

// create json files
filePath := ex.getJsonFilePath(req.ClaimUid)
if err := os.WriteFile(filePath, jsonBytes, os.FileMode(0644)); err != nil {
return nil, fmt.Errorf("failed to write file %v", err)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to use 'CDI file' instead of just file in error messages. This is more descriptive in my pov.

if err := os.WriteFile(filePath, jsonBytes, os.FileMode(0644)); err != nil {
return nil, fmt.Errorf("failed to write file %v", err)
}
klog.Infof("Json file is created at " + filePath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CDI file is created

s := newNonBlockingGRPCServer()

s.Start(draAddress, driver)
s.Wait()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, resolved then. Thanks.

@pohly
Copy link
Owner

pohly commented Aug 9, 2022

@pohly I cannot know what I need to see any more because all are unresolved. Please resolve them, if you think they are resolved.

I cannot resolve either. What you can do is give the person you are talking with a chance to acknowledge that your response answers the question (via comment or a thumbs up reaction), then resolve it.


// create spec
spec, err := cdiapi.NewSpec(
&specs.Spec{
Copy link

@bart0sh bart0sh Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make this a separate variable as its content is quite big, which makes NewSpec call less readable, e.g.

spec := spec.Spec{ ....
...
}

cdiSpec, err := cdiapi.NewSpec(&spec)
...


dev := spec.GetDevice(deviceName).GetQualifiedName()
res := &drapbv1.NodePrepareResourceResponse{CdiDevice: []string{dev}}
klog.V(5).Infof("Returned: %s", res.String())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use something like this to match the starting message for this call:

klog.Infof("NodePrepareResource is successfuly finished, response: %+v", resp)

@hj-johannes-lee
Copy link
Author

@pohly, @bart0sh As I told, I won't resolve anything. If I miss something because it's super messy from 'everything-is-unresolved' condition, then please just make a comment again to make me solve. I have no energy to see from the beginning to the end every time.

@hj-johannes-lee hj-johannes-lee force-pushed the dra-example-driver branch 2 times, most recently from 3210390 to 42511d5 Compare August 9, 2022 13:27
}

// newExamplePlugin returns an initialized examplePlugin instance.
func newExamplePlugin(config examplePlugin) (*examplePlugin, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, you're passing examplePlugin struct as a parameter to create examplePlugin struct? This looks quite strange to be honest. Can you just pass 4 parameters instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's actually a good argument for passing a struct: it's easy to mix up the order of parameters when a function takes four parameters of the same type (strings in this case). Optional parameters are easy with a struct and impossible with args.

No need to change this again, I just wanted to point it out.


klog.Infof("NodePrepareResource is called: request: %+v", req)

klog.Infof("Creating CDI Files")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we create multiple CDI files?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, from the begining, asked multiple times

  1. why you say the plugin generates CDI fileS though we create only JSON file.
  2. Is there any other file I need to generate?

I got no answer, and kept it like that?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is only one file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, changed.

}
cdiSpec, err := cdiapi.NewSpec(
&spec,
ex.getJsonFilePath(req.ClaimUid), 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd put all parameters in one line. This way error checking is closer to the call and more readable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

@hj-johannes-lee hj-johannes-lee force-pushed the dra-example-driver branch 3 times, most recently from 89422ef to 90dd92e Compare August 9, 2022 13:47

func (ex *examplePlugin) NodePrepareResource(ctx context.Context, req *drapbv1.NodePrepareResourceRequest) (*drapbv1.NodePrepareResourceResponse, error) {

kind := ex.driverName + "/" + deviceName
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move this variable lower, where it's used or get rid of it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should use driver name as a vendor. We don't validate it, so any string can be passed as a driver name to the plugin with a command line parameter. If we put it as a vendor into the CDI file it might cause validation errors.

ContainerEdits: specs.ContainerEdits{
Env: envs,
},
}},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we don't have any devices here can we skip this and set env vars only in the global container edits section below?

}
klog.V(5).InfoS("CDI spec is validated")

// create bytes from spec, which would be written into the json file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a cdiSpec.Write() API that does exactly what we need here. I've asked CDI maintainer to tag it. After that we can use it here.

@pohly
Copy link
Owner

pohly commented Aug 9, 2022

@hj-johannes-lee: I know that you have worked hard on this, but we are not making progress quickly enough with the current approach (you write some code, @bart0sh and I comment, you revise it again, etc.).

Let's try something else: I'll take your latest revision (90dd92e) and then I'll go through everything and make all changes that I consider necessary or useful. When I am done, I can post that as a separate commit on top of yours and then we can discuss why I made changes in those case where it isn't obvious before squashing.

@pohly pohly mentioned this pull request Aug 10, 2022
@pohly pohly force-pushed the dynamic-resource-allocation branch 3 times, most recently from d4c9729 to d9b1636 Compare August 12, 2022 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants