Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make it possible to select supported devices in the operator #599

Merged
merged 7 commits into from
Apr 12, 2021

Conversation

ozhuraki
Copy link
Contributor

In order to support per device deployment in the operator:

  • Add customization overlays for FPGA, DSA, SGX, GPU, QAT under deployments/operator/device/
  • Add --device cmdline to operator main.go which defines the controllers/webhooks to set up

@mythi
Copy link
Contributor

mythi commented Mar 15, 2021

/cc @dmitsh FYI

pm := patcher.NewPatcherManager(mgr.GetLogger().WithName("webhooks").WithName("Fpga"))
if deviceType == "fpga" {
pm := patcher.NewPatcherManager(mgr.GetLogger().WithName("webhooks").WithName("Fpga"))
}

mgr.GetWebhookServer().Register("/pods", &webhook.Admission{
Handler: admission.HandlerFunc(pm.GetPodMutator()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also part of FPGA.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks, moved it under "if".

@@ -0,0 +1,5 @@
bases:
- ../../default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also pulls in ../../crd/bases which creates the CRDs for all devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a way to drop the unneeded CRDs?

I tried with the approach to pick only specific CRDs (for example for FPGA), however couldn't get it to work:

diff --git a/deployments/operator/device/fpga/kustomization.yaml b/deployments/operator/device/fpga/kustomization.yaml
index cea26f2..60de267 100644
--- a/deployments/operator/device/fpga/kustomization.yaml
+++ b/deployments/operator/device/fpga/kustomization.yaml
@@ -1,5 +1,16 @@
+namespace: inteldeviceplugins-system
+namePrefix: inteldeviceplugins-
+
+resources:
+- deviceplugin.intel.com_fpgadeviceplugins.yaml
+- fpga.intel.com_acceleratorfunctions.yaml
+- fpga.intel.com_fpgaregions.yaml
+
 bases:
- - ../../default
+- ../../rbac
+- ../../manager
+- ../../webhook
+- ../../certmanager
 
 patchesStrategicMerge:
  - fpga.yaml
# kustomize build deployments/operator/device/fpga
Error: no matches for IdId apps_v1_Deployment|inteldeviceplugins-system|inteldeviceplugins-controller-manager; failed to find unique target for patch apps_v1_Deployment|inteldeviceplugins-controller-manager

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be a way to drop the unneeded CRDs?

maybe the only way to do it is to have the CRDs created by the manager itself. This could follow the idea I demonstrate in #394 (comment)

kind: Deployment
metadata:
name: inteldeviceplugins-controller-manager
namespace: inteldeviceplugins-system
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was the namespace required for this to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried to omit it, however kustomize errors without namespace:

Error: no matches for IdId apps_v1_Deployment|~X|inteldeviceplugins-controller-manager; failed to find unique target for patch apps_v1_Deployment|inteldeviceplugins-controller-manager

@ozhuraki ozhuraki force-pushed the operator-select-device-type branch 3 times, most recently from d5671cb to 546fca5 Compare March 16, 2021 16:42
- args:
- --metrics-addr=127.0.0.1:8080
- --enable-leader-election
- --device=fpga
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be --device=gpu?

- args:
- --metrics-addr=127.0.0.1:8080
- --enable-leader-election
- --device=fpga
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same here, should be --device=qat I guess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good catch. Fixed both.

flag.StringVar(&metricsAddr, "metrics-addr", ":8080", "The address the metric endpoint binds to.")
flag.StringVar(&devicePluginNamespace, "deviceplugin-namespace", metav1.NamespaceSystem, "The namespace where deviceplugin daemonsets are created")
flag.BoolVar(&enableLeaderElection, "enable-leader-election", false,
"Enable leader election for controller manager. "+
"Enabling this will ensure there is only one active controller manager.")
flag.StringVar(&deviceType, "device", "", "Device type to set up.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that only one device can be chosen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can allow multiple devices. However as kustmizations overlays are per device type, would it make sense, what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this is possible, but it would be nice to support more than one device at a time. Otherwise if cluster has more than one device type users will have to use default operator with support of all devices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's support also more then one device at a time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be done so that -devices parameter is a comma separated list, you split it, validate and loop. fpga/sgx webhook and fpga crd controllers' registration could also be moved into the loop with if fpga {} and if sgx {}.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let's make it so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest a bit more user friendly approach:

package main
  
import "fmt"
import "flag"
import "strings"

type flagList []string

func (fl *flagList) String() string {
        return strings.Join(*fl, ", ")
}

func (fl *flagList) Set(value string) error {
        *fl = append(*fl, value)
        return nil
}

var devices flagList

func main() {
        flag.Var(&devices, "device", "device to enable, can be used multiple times")
        flag.Parse()
        fmt.Println(devices)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, forgot to show how it looks in the command line:

$ go run flags.go --device fpga --device gpu
[fpga gpu]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks, let's do so.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

cmd/operator/main.go Show resolved Hide resolved
@ozhuraki ozhuraki force-pushed the operator-select-device-type branch 5 times, most recently from 984b9cf to 112d660 Compare March 22, 2021 10:23
README.md Outdated Show resolved Hide resolved
@ozhuraki ozhuraki force-pushed the operator-select-device-type branch 3 times, most recently from f2e067f to f3bf258 Compare March 22, 2021 13:15
@bart0sh
Copy link
Member

bart0sh commented Mar 22, 2021

@ozhuraki There was some activity on combining components in kustomize last year: kubernetes-sigs/kustomize#1251 Would this approach help us to generate combined operator resources for multiple devices?

@ozhuraki
Copy link
Contributor Author

@bart0sh

Would this approach help us to generate combined operator resources for multiple devices?

Thanks. It might help when we solve the problem of dropping the unwanted CRDs (at the moment they are combined).

I tried to pull only the required bases, however it didn't work (see the discussion above). One possibility is:

maybe the only way to do it is to have the CRDs created by the manager itself. This could follow the idea I demonstrate in #394 (comment)

@ozhuraki ozhuraki force-pushed the operator-select-device-type branch 4 times, most recently from 6a1fe7b to ce34ca9 Compare March 22, 2021 17:21
}).SetupWithManager(mgr); err != nil {
setupLog.Error(err, "unable to create controller", "controller", "FpgaRegion")
os.Exit(1)
if devices["fpga"] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this if condition and above if devices["fpga"] can be merged into one. The same can be done for 2 if devices["sgx"] conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks! I will merge them.

@intel intel deleted a comment from ozhuraki Mar 23, 2021
@ozhuraki ozhuraki force-pushed the operator-select-device-type branch from ce34ca9 to 8f68925 Compare March 23, 2021 12:55
@ozhuraki ozhuraki force-pushed the operator-select-device-type branch from 8f68925 to 95be9d1 Compare March 23, 2021 13:22
@ozhuraki
Copy link
Contributor Author

OK, updated, please take a look:

  • Added a validation of --device arguments
  • Added a check for duplicates
  • Switched to string array
  • Merged "fpga" clauses

bart0sh
bart0sh previously approved these changes Mar 23, 2021
Copy link
Member

@bart0sh bart0sh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@bart0sh
Copy link
Member

bart0sh commented Mar 31, 2021

@mythi @rojkov please review.

README.md Outdated
$ kubectl apply -k deployments/operator/device/fpga
```

Operator also supports deployments with multiple selected device types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a note that the CRDs are created but no controller is registered.

Also, please rebase. This section got moved to cmd/operator/README

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mythi

OK, thanks!

Copy link
Contributor

@mythi mythi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ozhuraki sorry for the delay. I added a couple of comments.

marking as 'request changes' from me.

Add a patch to operator's manager.yaml to add "--device fpga"
command line in orfer to enable per device deployment.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
Add --device command line to operator's main.go which defines
the controllers/webhooks to set up.

Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
Signed-off-by: Oleg Zhurakivskyy <oleg.zhurakivskyy@intel.com>
@bart0sh bart0sh merged commit 3161459 into intel:main Apr 12, 2021
@ozhuraki ozhuraki deleted the operator-select-device-type branch December 17, 2021 10:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants