-
Notifications
You must be signed in to change notification settings - Fork 506
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sysched: add the implementation of SySched #568
Conversation
Hi @salmanyam. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@Huang-Wei This is the implementation of KEP #399, a system call-aware scheduler plugin called SySched. Could you please take a look to see if there are any concerns or comments? Thank you! |
5ec6eb3
to
ca6eee7
Compare
Thanks for the PR. I'm a bit swamped recently. So if any reviewers have free cycles, please assign to yourself. /help |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My first pass on this PR. Didn't dive deep into the logic.
#serviceAccountName: default | ||
#schedulerName: scheduler-plugins-scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
#serviceAccountName: default | ||
#schedulerName: scheduler-plugins-scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
#serviceAccountName: default | ||
#schedulerName: scheduler-plugins-scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
#serviceAccountName: default | ||
#schedulerName: scheduler-plugins-scheduler |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
#resources: | ||
# requests: | ||
# cpu: 100m | ||
# memory: 100Mi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
pkg/sysched/sysched.go
Outdated
W := 1 | ||
score := len(syscalls) - tot_crit | ||
score = score + W*tot_crit | ||
klog.Info(score, " ", tot_crit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function will always return same score. tot_crit
is always 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, our plugin does not support syscall weights. We are working on it and update the weighting in our next version. We leave that portion of code intentionally. Also, we have put a note there regarding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a comment explaining that weight is hard-coded for now.
pkg/sysched/sysched.go
Outdated
|
||
nodeIP := node.Status.Addresses[0].Address | ||
|
||
podSyscalls, _ := sc.getSyscalls(pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always check errors
pkg/sysched/sysched.go
Outdated
} | ||
|
||
func (sc *SySched) updateHostSyscalls(pod *v1.Pod) { | ||
syscall, _ := sc.getSyscalls(pod) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always check errors
pkg/sysched/sysched.go
Outdated
syscalls := make(map[string]bool) | ||
|
||
for _, p := range pods { | ||
syscall, _ := sc.getSyscalls(p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always check errors
pkg/sysched/sysched.go
Outdated
if p.Name == pod.Name { | ||
sc.HostToPods[nodeIP] = remove(sc.HostToPods[nodeIP], i) | ||
sc.HostSyscalls[nodeIP] = sc.recomputeHostSyscalls(sc.HostToPods[nodeIP]) | ||
c, _ := sc.getHostSyscalls(nodeIP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
always check errors
/ok-to-test |
Thank you very much, @PiotrProkop! We have addressed all the comments. Please let us know if you have more comments or concerns. |
214e36c
to
5265676
Compare
Hi @Huang-Wei, we have rebased our code as the bot triggered |
49100dc
to
2da5026
Compare
|
||
// Score invoked at the score extension point. | ||
func (sc *SySched) Score(ctx context.Context, cs *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) { | ||
node, err := sc.handle.ClientSet().CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be able to get cached NodeInfo from scheduler framework, which is the source of truth to do scheduling. (if reading from API server, the "cached" info would not be there)
nodeInfo, err := sc.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We noticed the cached state is not always up-to-date, especially when there are activities on other schedulers in the system or when our scheduler comes up after a bunch of pods and nodes have already appeared, hence, we opted not to use the SnapshotSharedLister.
We tried inserting some comments in the code to explain our rationale. Please let us know if this makes sense.
pkg/sysched/sysched.go
Outdated
return 0, nil | ||
} | ||
|
||
nodeIP := node.Status.Addresses[0].Address |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why using a node IP instead of node.Name
as the canonical identifier to refer to a Node?
pkg/sysched/sysched.go
Outdated
type SySched struct { | ||
handle framework.Handle | ||
clientSet v1alpha1.SPOV1Alpha1Interface | ||
HostToPods map[string][]*v1.Pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the frameworkHandle already holds this: sc.handle.SnapshotSharedLister()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained above, we keep a local map of HostToPods in our plugin rather than rely on the cached state to reliably keep track of all pods on the cluster, regardless of which scheduler it used.
pkg/sysched/sysched.go
Outdated
handle framework.Handle | ||
clientSet v1alpha1.SPOV1Alpha1Interface | ||
HostToPods map[string][]*v1.Pod | ||
HostSyscalls map[string]map[string]bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a comment explaining what's keyed, and what's valued?
Commented in another thread, in Golang we don't use map[string]bool
to represent a set; instead, we use map[string]struct{}
. And you even don't need to implement you self - reuse Set
as well as its Untion/Intersect functions.
pkg/sysched/sysched.go
Outdated
clientSet v1alpha1.SPOV1Alpha1Interface | ||
HostToPods map[string][]*v1.Pod | ||
HostSyscalls map[string]map[string]bool | ||
CritSyscalls map[string][]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also comment it.
test/integration/sysched_test.go
Outdated
} | ||
cfg.Profiles[0].Plugins.Score = schedapi.PluginSet{ | ||
Enabled: []schedapi.Plugin{{Name: sysched.Name}}, | ||
//Disabled: []schedapi.Plugin{{Name: "*"}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be disabling it. Otherwise score of other plugins may impact the eventual result.
test/integration/sysched_test.go
Outdated
_, err = SPOCreate(extClient, &fullseccompSPOCR, metav1.CreateOptions{}) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
_, err = SPOCreate(extClient, &badSPOCR, metav1.CreateOptions{}) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
_, err = SPOCreate(extClient, &good1SPOCR, metav1.CreateOptions{}) | ||
if err != nil { | ||
t.Fatal(err) | ||
} | ||
_, err = SPOCreate(extClient, &good2SPOCR, metav1.CreateOptions{}) | ||
if err != nil { | ||
t.Fatal(err) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
De-duple the code by using a for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you make the test more table-driven? you can refer to other unit tests like pkg/capacityscheduling/capacityscheduling_test.go
pkg/sysched/sysched_test.go
Outdated
package sysched | ||
|
||
import ( | ||
//"fmt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove useless comments please (applies elsewhere).
pkg/sysched/sysched_test.go
Outdated
v1 "k8s.io/api/core/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/apimachinery/pkg/runtime/schema" | ||
informers "k8s.io/client-go/informers" | ||
clientsetfake "k8s.io/client-go/kubernetes/fake" | ||
"k8s.io/client-go/kubernetes/scheme" | ||
restclient "k8s.io/client-go/rest" | ||
restfake "k8s.io/client-go/rest/fake" | ||
"k8s.io/kubernetes/pkg/scheduler/framework" | ||
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/defaultbinder" | ||
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/queuesort" | ||
frameworkruntime "k8s.io/kubernetes/pkg/scheduler/framework/runtime" | ||
st "k8s.io/kubernetes/pkg/scheduler/testing" | ||
"net/http" | ||
pluginconfig "sigs.k8s.io/scheduler-plugins/apis/config" | ||
v1alpha1 "sigs.k8s.io/scheduler-plugins/pkg/sysched/clientset/v1alpha1" | ||
"sigs.k8s.io/security-profiles-operator/api/seccompprofile/v1beta1" | ||
"strings" | ||
"testing" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please sort the imports by groups: (applies elsewhere)
- native imports (followed an empty line)
- upstream imports (followed by an empty line)
- imports in this repo
Thanks @Huang-Wei for taking the time to review our patch. We should be able to address all of your concerns in the next PR. |
4e9cd1e
to
871e7d1
Compare
✅ Deploy Preview for kubernetes-sigs-scheduler-plugins canceled.
|
@Huang-Wei Thanks for all your comments so far. We've updated the code to cover most of the comments, with some exceptions as explained in the comments above. Main items we have not changed:
Please let us know your thoughts on the changes. |
@Huang-Wei Hi! Wondering if you'll have a chance to take a look at this PR in the coming days? Looking forward to see if we can get this closer to being merged. |
@mvle could you rebase to resolve the CI failure? I will take a look afterwards. |
22dad6e
to
bf69b05
Compare
@Huang-Wei we've rebased. when you get a chance, please provide feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you eliminate the in-house clientset? We already have mature semantics to interact with any kind of CR objects.
Also, given this PR introduced a CRD, could you also update hack/verify-crdgen.sh
to include the source types.go of security-profiles-operator.x-k8s.io_seccompprofiles.yaml?
scheduler-plugins/hack/verify-crdgen.sh
Lines 32 to 33 in c752c54
# Generate CRD | |
api_paths="./apis/scheduling/v1alpha1/...;./vendor/github.com/k8stopologyawareschedwg/noderesourcetopology-api/pkg/apis/...;./vendor/github.com/diktyo-io/appgroup-api/pkg/apis/...;./vendor/github.com/diktyo-io/networktopology-api/pkg/apis/..." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you follow other crds to make it symbolic linked from ../seccompprofile/crd.yaml
?:
pr/568 ⇒ ll manifests/crds
total 12K
lrwxr-xr-x 1 weih 20 Apr 14 2023 appgroup.diktyo.x-k8s.io_appgroups.yaml -> ../appgroup/crd.yaml
lrwxr-xr-x 1 weih 27 Apr 14 2023 networktopology.diktyo.x-k8s.io_networktopologies.yaml -> ../networktopology/crd.yaml
lrwxr-xr-x 1 weih 30 Apr 14 2023 scheduling.x-k8s.io_elasticquotas.yaml -> ../capacityscheduling/crd.yaml
lrwxr-xr-x 1 weih 24 Apr 14 2023 scheduling.x-k8s.io_podgroups.yaml -> ../coscheduling/crd.yaml
-rw-r--r-- 1 weih 9.6K Jan 27 09:51 seccompprofiles.security-profiles-operator.x-k8s.io_sysched.yaml
lrwxr-xr-x 1 weih 32 May 25 2022 topology.node.k8s.io_noderesourcetopologies.yaml -> ../noderesourcetopology/crd.yaml
test/integration/sysched_test.go
Outdated
@@ -0,0 +1,322 @@ | |||
/* | |||
Copyright 2020 The Kubernetes Authors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2024
pkg/sysched/sysched.go
Outdated
// Currently, this function does not change score as the weight is set to 1, and | ||
// no critical syscalls are given as input | ||
func (sc *SySched) calcScore(syscalls []string) int { | ||
tot_crit := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment was given on tot_crit := 0
, not the function name.
RestClient rest.Interface | ||
} | ||
|
||
func NewForConfig(c *rest.Config) (*SPOV1Alpha1Client, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this as this should have been abstracted via generic (dynamic) clientset. You can just use the generic clientset in pkg/sysched/sysched.go w/o composing pkg/sysched/clientset at all.
Example:
scheduler-plugins/pkg/coscheduling/coscheduling.go
Lines 69 to 73 in c752c54
scheme := runtime.NewScheme() | |
_ = clientscheme.AddToScheme(scheme) | |
_ = v1.AddToScheme(scheme) | |
_ = v1alpha1.AddToScheme(scheme) | |
client, err := client.New(handle.KubeConfig(), client.Options{Scheme: scheme}) |
test/integration/sysched_test.go
Outdated
err := client.RestClient. | ||
Get(). | ||
Namespace("default"). | ||
Resource("seccompprofiles"). | ||
Name(name). | ||
VersionedParams(&opts, scheme.ParameterCodec). | ||
Do(context.TODO()). | ||
Into(&result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto. We don't directly manipulate the client in this manner.
- apiGroups: ["security-profiles-operator.x-k8s.io"] | ||
resources: ["seccompprofiles", "profilebindings"] | ||
verbs: ["get", "list", "watch", "create", "delete", "update", "patch"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you comment it out? for newly introduced plugin, we don't enable it until it's tested enough.
- apiGroups: ["security-profiles-operator.x-k8s.io"] | ||
resources: ["seccompprofiles", "profilebindings"] | ||
verbs: ["get", "list", "watch", "create", "delete", "update", "patch"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Comment them out.
@@ -17,7 +17,7 @@ controller: | |||
# as they need extra RBAC privileges on metrics.k8s.io. | |||
|
|||
plugins: | |||
enabled: ["Coscheduling","CapacityScheduling","NodeResourceTopologyMatch","NodeResourcesAllocatable"] | |||
enabled: ["Coscheduling","CapacityScheduling","NodeResourceTopologyMatch","NodeResourcesAllocatable", "SySched"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Don't enable it by default.
Hi @Huang-Wei, we have addressed all the comments and removed the in-house clientset. Please take a look and let us know your feedback. Thanks! |
Hi @Huang-Wei, when you get a chance, please provide us feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work. It's very close to merge, just a new nits.
BTW: could you squash the commits into 3:
- API files (
apis/
) - manually added by you - API codegen files (
apis/../zz_*.go
) - Other logic files - i.e., all files exclude 1,2 and 4
vendor/
,go.mod
,go.sum
- name: SySched | ||
args: | ||
defaultProfileNamespace: "default" | ||
defaultProfileName: "full-seccomp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment these out.
manifests/install/all-in-one.yaml
Outdated
- apiGroups: ["security-profiles-operator.x-k8s.io"] | ||
resources: ["seccompprofiles", "profilebindings"] | ||
verbs: ["get", "list", "watch", "create", "delete", "update", "patch"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment these out (to be consistent with default values.yaml in Helm)
manifests/install/all-in-one.yaml
Outdated
- apiGroups: ["security-profiles-operator.x-k8s.io"] | ||
resources: ["seccompprofiles", "profilebindings"] | ||
verbs: ["get", "list", "watch", "create", "delete", "update", "patch"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment these out (to be consistent with default values.yaml in Helm)
pkg/sysched/sysched.go
Outdated
|
||
type SySched struct { | ||
handle framework.Handle | ||
//clientSet v1alpha1.SPOV1Alpha1Interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this line.
pkg/sysched/sysched.go
Outdated
// NOTE: not used at this time | ||
// Key: category of critical system call, e.g., CVE based, admin policy, etc. | ||
// Value: list of system calls | ||
//CritSyscalls map[string][]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Remove this line.
pkg/sysched/sysched.go
Outdated
// SCMP_ACT_TRACE --> ActTrace, seccomp.ActNotify | ||
if element.Action == seccomp.ActAllow || element.Action == seccomp.ActLog { | ||
syscalls = syscalls.Union(sets.New[string](element.Names...)) | ||
//syscalls = unionList(syscalls, element.Names) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. Let's remove this line.
pkg/sysched/sysched.go
Outdated
// NOTE: weight W is hardcoded for now | ||
// TODO: add critical/cve syscalls | ||
// TODO: adjust weight W for critical/cve syscalls | ||
tot_crit := 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change it to totCrit
? to be Go-idiomatic.
pkg/sysched/sysched.go
Outdated
return 0, nil | ||
} | ||
|
||
//diffSyscalls := setSubtract(hostSyscalls, podSyscalls) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line please.
8ae9391
to
f3c4d5a
Compare
Manually update the API files for the implementation of SySched, a system call-aware scheduler(KEP kubernetes-sigs#399). SySched is a new scoring plugin that ranks feasible nodes based on the relative risks of pods' system call usage.
f3c4d5a
to
4b6ea59
Compare
Hi @Huang-Wei, we made the changes accordingly. Please take a look when you a chance. Thanks a lot! |
@salmanyam thanks! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Huang-Wei, salmanyam The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks a lot, @Huang-Wei! We greatly appreciate your time and effort on this PR. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implementation of SySched, a system call-aware scheduler. The KEP of this PR is KEP #399.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Thank you very much for reviewing this PR.
Please let us know if there are questions or comments.
Your comments and feedback are appreciated.
Does this PR introduce a user-facing change?