CNF-9173: e2e: cgroups: introduce cgroup package #906

Merged
openshift-merge-bot merged 3 commits into openshift:master on Feb 5, 2024

Tal-or
Contributor

Tal-or commented Jan 4, 2024

This package provides a Getter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features that require cgroups inspection (for example, runc vs. crun).

How to use:

getter := BuildGetter(context.TODO(), controllerRuntimeClient, k8sClient)
cpusetCfg := &controller.CpuSet{}
err := getter.Container(context.TODO(), pod, containerName, cpusetCfg)
if err != nil {
	// handle the error
}
fmt.Println(cpusetCfg.Cpus)

@Tal-or
Contributor Author

Tal-or commented Jan 4, 2024

/cc @mrniranjan

openshift-ci bot requested a review from mrniranjan January 4, 2024 16:09
@Tal-or
Contributor Author

Tal-or commented Jan 4, 2024

This PR was part of #892, which we decided to extract in order to merge it more quickly, since it is needed for the cgroupv2 testing.

openshift-ci bot requested review from ffromani and yanirq January 4, 2024 16:15
Tal-or changed the title from "e2e: cgroups: introduce cgroup package" to "CNF-9173: e2e: cgroups: introduce cgroup package" Jan 4, 2024
openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Jan 4, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 4, 2024

@Tal-or: This pull request references CNF-9173 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

This package provides a Getter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features that require cgroups inspection (for example, runc vs. crun).

How to use:

getter := BuildGetter(context.TODO())
cfg, err := getter.GetConfig(context.TODO(), client, pod, containerName)

Contributor

ffromani left a comment

looks nice, but we need some form of testing, at least partial

Period string
}

type Getter interface {
Contributor

I'd actually get rid of this interface entirely.
Also, in Go it is better to define interfaces on the consumer side: https://www.thoughtworks.com/insights/blog/programming-languages/mistakes-to-avoid-when-coming-from-an-object-oriented-language

Contributor Author

Tal-or Jan 7, 2024

Thank you for the useful information; it really helps me understand the principles better.

So in our case it means the interface would be defined in the test files themselves, i.e. 11_mixedcpus/mixedcpus.go and optionally 7_performance_kubelet_node/cgroups.go.
But what if both packages want to use the exact same interface? Should both declare it?

My second question is: how would you suggest BuildGetter return a struct, when it is supposed to return an abstraction for getting the cgroup information?

Of course we could implement BuildGetter on the consumer side (where it would make sense to expect an interface), but again that duplicates code, since at least two packages (that I know of) would need to implement it to retrieve the existing cgroup configuration on the node.

CPUSet string
Quota string
Period string
}
Contributor

Can we define the structure in a more hierarchical fashion, i.e. Controllers -> Controller -> interface files:

type CgroupController struct {
	// supported cgroup controllers
	Cpuset controller.CpuSet
	Cpu    controller.Cpu
	// any future cgroup controllers
}

Then create methods that return the interface files of that particular controller?
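One way to read the hierarchical suggestion, sketched with hypothetical names and a stand-alone `Controller` type (not the package's `controller` types): the controllers are grouped in one struct, each controller lists its interface files, and a method derives the paths to read. The file names are cgroup v2 names; a v1 layout would need different file names and per-controller mount points.

```go
package main

import (
	"fmt"
	"path"
)

// Controller describes one cgroup controller and the interface files a
// test may want to read from it.
type Controller struct {
	Name  string
	Files []string
}

// CgroupControllers groups the supported controllers.
type CgroupControllers struct {
	CpuSet Controller
	Cpu    Controller
	// any future cgroup controllers
}

// defaultV2Controllers lists some cgroup v2 interface files per controller.
func defaultV2Controllers() CgroupControllers {
	return CgroupControllers{
		CpuSet: Controller{Name: "cpuset", Files: []string{
			"cpuset.cpus", "cpuset.cpus.effective", "cpuset.cpus.partition", "cpuset.mems",
		}},
		Cpu: Controller{Name: "cpu", Files: []string{"cpu.max"}},
	}
}

// Paths returns the full path of each interface file under a cgroup dir.
func (c Controller) Paths(cgroupDir string) []string {
	paths := make([]string, 0, len(c.Files))
	for _, f := range c.Files {
		paths = append(paths, path.Join(cgroupDir, f))
	}
	return paths
}

func main() {
	for _, p := range defaultV2Controllers().Cpu.Paths("/sys/fs/cgroup") {
		fmt.Println(p)
	}
}
```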


func (m *Manager) GetConfig(ctx context.Context, c *kubernetes.Clientset, pod *corev1.Pod, containerName string) (*config.Config, error) {
var cmd []string
cfg := &config.Config{}
Contributor

GetConfig here supports getting only cpuset.cpus and cpu.max, but there are other interface files that are equally important and that we will need in the future, such as
cpuset.cpus.partition, cpuset.cpus.effective and cpuset.mems (the latter is specifically required for memory manager testing, to know which NUMA node the pod has been assigned to).
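For context on the `cpu.max` file mentioned above: in cgroup v2 it replaces the v1 pair `cpu.cfs_quota_us`/`cpu.cfs_period_us`, and its content is `$MAX $PERIOD`, where `$MAX` is either a number of microseconds or the literal `max`. The sketch below, with hypothetical names, parses it into a quota and a period and maps `max` to -1 to mirror the v1 convention for "no limit".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// CpuMax holds the parsed content of a cgroup v2 cpu.max file.
// Quota is -1 when the file reports "max" (no limit), as in v1.
type CpuMax struct {
	Quota  int64
	Period uint64
}

// parseCpuMax parses "$MAX $PERIOD", e.g. "max 100000" or "50000 100000".
func parseCpuMax(s string) (CpuMax, error) {
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return CpuMax{}, fmt.Errorf("unexpected cpu.max content %q", s)
	}
	period, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return CpuMax{}, fmt.Errorf("bad period %q: %w", fields[1], err)
	}
	if fields[0] == "max" {
		return CpuMax{Quota: -1, Period: period}, nil
	}
	quota, err := strconv.ParseInt(fields[0], 10, 64)
	if err != nil {
		return CpuMax{}, fmt.Errorf("bad quota %q: %w", fields[0], err)
	}
	return CpuMax{Quota: quota, Period: period}, nil
}

func main() {
	for _, s := range []string{"max 100000", "50000 100000"} {
		m, err := parseCpuMax(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> %+v\n", s, m)
	}
}
```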

@mrniranjan
Contributor

This PR is a good starting point; I think we can improve it in further iterations.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 10, 2024

@Tal-or: This pull request references CNF-9173 which is a valid jira issue.

In response to this:

This package provides a Getter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features that require cgroups inspection (for example, runc vs. crun).

How to use:

getter := BuildGetter(context.TODO(), controllerRuntimeClient, k8sClient)
cfg, err := getter.GetConfig(context.TODO(), client, pod, containerName)

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 10, 2024

@Tal-or: This pull request references CNF-9173 which is a valid jira issue.

In response to this:

This package provides a Getter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features that require cgroups inspection (for example, runc vs. crun).

How to use:

getter := BuildGetter(context.TODO(), controllerRuntimeClient, k8sClient)
cpusetCfg := &controller.CpuSet{}
err := getter.Container(context.TODO(), client, pod, containerName, cpusetCfg)
// error handling
fmt.Println(cpusetCfg.Cpus)

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 10, 2024

@Tal-or: This pull request references CNF-9173 which is a valid jira issue.

In response to this:

This package provides a Getter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features that require cgroups inspection (for example, runc vs. crun).

How to use:

getter := BuildGetter(context.TODO(), controllerRuntimeClient, k8sClient)
cpusetCfg := &controller.CpuSet{}
err := getter.Container(context.TODO(), client, pod, containerName, cpusetCfg)
// error handling
fmt.Println(cpusetCfg.Cpus)

@openshift-ci-robot
Contributor

openshift-ci-robot commented Jan 10, 2024

@Tal-or: This pull request references CNF-9173 which is a valid jira issue.

In response to this:

This package provides a Getter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features that require cgroups inspection (for example, runc vs. crun).

How to use:

getter := BuildGetter(context.TODO(), controllerRuntimeClient, k8sClient)
cpusetCfg := &controller.CpuSet{}
err := getter.Container(context.TODO(), pod, containerName, cpusetCfg)
// error handling
fmt.Println(cpusetCfg.Cpus)

@Tal-or
Contributor Author

Tal-or commented Jan 10, 2024

@mrniranjan IMO the existing implementation is good enough for now and we can expand it later

@ffromani
Contributor

It's probably wasteful to retry until we can consume cri-o/cri-o#7643 in our CI clusters.

}

func (cm *ControllersManager) Cpu(ctx context.Context, pod *corev1.Pod, containerName, runtimeType string) (*controller.Cpu, error) {
cfg := &controller.Cpu{}
Contributor

We are passing runtimeType, but we are not using this variable in the Cpu function.

Contributor Author

Yes, I need to learn the details; I just passed the argument for now.

return &ControllersManager{client: c, k8sClient: k8sClient}
}

func (cm *ControllersManager) CpuSet(ctx context.Context, pod *corev1.Pod, containerName, runtimeType string) (*controller.CpuSet, error) {
Contributor

We are passing runtimeType, but we are not using this variable in the CpuSet function.

@mrniranjan
Contributor

LGTM from my side, we can improve this further in future iterations.

@Tal-or
Contributor Author

Tal-or commented Jan 15, 2024

It's probably wasteful to retry until we can consume cri-o/cri-o#7643 in our CI clusters.

Is this why the updating lane is failing?

@Tal-or
Contributor Author

Tal-or commented Jan 16, 2024

/test-required

@Tal-or
Contributor Author

Tal-or commented Jan 16, 2024

/test required

Contributor

openshift-ci bot commented Jan 16, 2024

@Tal-or: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test e2e-aws-operator
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-techpreview
  • /test e2e-gcp-pao
  • /test e2e-gcp-pao-updating-profile
  • /test e2e-gcp-pao-workloadhints
  • /test e2e-hypershift
  • /test e2e-no-cluster
  • /test e2e-upgrade
  • /test images
  • /test unit
  • /test verify
  • /test vet

The following commands are available to trigger optional jobs:

  • /test lint

Use /test all to run all jobs.

In response to this:

/test required

If no name is provided, it takes the name from
the first container in the pod.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
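The defaulting rule in this commit can be sketched as below. The types are minimal stand-ins that mirror the shape of `corev1.Pod` (`pod.Spec.Containers[i].Name`) so the example runs without Kubernetes dependencies, and `resolveContainerName` is a hypothetical name, not the helper used in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Minimal stand-ins for corev1.Pod / corev1.PodSpec / corev1.Container.
type Container struct{ Name string }
type PodSpec struct{ Containers []Container }
type Pod struct{ Spec PodSpec }

// resolveContainerName returns name when it is set, otherwise the name of
// the first container in the pod.
func resolveContainerName(pod *Pod, name string) (string, error) {
	if name != "" {
		return name, nil
	}
	if len(pod.Spec.Containers) == 0 {
		return "", errors.New("pod has no containers")
	}
	return pod.Spec.Containers[0].Name, nil
}

func main() {
	pod := &Pod{Spec: PodSpec{Containers: []Container{{Name: "app"}, {Name: "sidecar"}}}}
	name, err := resolveContainerName(pod, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // app
}
```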
openshift-merge-robot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jan 18, 2024
@Tal-or
Contributor Author

Tal-or commented Jan 18, 2024

/hold cancel

openshift-ci bot removed the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) Jan 18, 2024
@Tal-or
Contributor Author

Tal-or commented Jan 21, 2024

/retest

This package provides a ControllerGetter which is cgroup version agnostic.
It provides a unified API for testing the mixedcpus feature for both v1 and v2.

The package can be expanded in the future for testing additional features
that require cgroups inspection.

Also, in the future this package should be runtime (crun/runc) agnostic.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
@Tal-or
Contributor Author

Tal-or commented Jan 29, 2024

/retest

The exec command might return an empty value.
In order to have better control over what we get,
we should read the files in separate exec calls.

Signed-off-by: Talor Itzhak <titzhak@redhat.com>
@mrniranjan
Contributor

/lgtm

openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged.) Feb 1, 2024
Tal-or requested a review from ffromani February 1, 2024 08:10
Contributor

ffromani left a comment

LGTM
nice addition

@@ -148,7 +148,7 @@ var _ = Describe("[performance]Hugepages", Ordered, func() {
Expect(err).ToNot(HaveOccurred())

cmd2 := []string{"/bin/bash", "-c", "tmux new -d 'LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes top -b > /dev/null'"}
-_, err = pods.ExecCommandOnPod(testclient.K8sClient, testpod, cmd2)
+_, err = pods.ExecCommandOnPod(testclient.K8sClient, testpod, "", cmd2)
Contributor

This works. An alternative would have been to add a variant function that accepts a container name and reimplement the existing ExecCommandOnPod (with its current signature) on top of it. But we can keep this approach.
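The alternative described above could look roughly like this. `Pod` is a minimal stand-in for `*corev1.Pod`, the Kubernetes client parameter of the real helper is omitted, the `[]byte` return type is an assumption, and `ExecCommandOnPodContainer` is a hypothetical name. The point is that the old signature survives as a thin wrapper, so existing callers need no changes.

```go
package main

import (
	"fmt"
	"strings"
)

// Pod is a minimal stand-in for *corev1.Pod.
type Pod struct{ Name string }

// ExecCommandOnPodContainer is a hypothetical variant taking an explicit
// container name; "" means "use the default container". It only formats a
// string here, where a real implementation would call the exec API.
func ExecCommandOnPodContainer(pod *Pod, containerName string, cmd []string) ([]byte, error) {
	target := containerName
	if target == "" {
		target = "<default>"
	}
	return []byte(fmt.Sprintf("%s/%s: %s", pod.Name, target, strings.Join(cmd, " "))), nil
}

// ExecCommandOnPod keeps the pre-change shape (minus the client) and
// delegates to the variant, so existing callers compile unchanged.
func ExecCommandOnPod(pod *Pod, cmd []string) ([]byte, error) {
	return ExecCommandOnPodContainer(pod, "", cmd)
}

func main() {
	out, err := ExecCommandOnPod(&Pod{Name: "testpod"}, []string{"cat", "/proc/self/status"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```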

Contributor Author

Yes, this is an option as well.

@@ -99,3 +93,25 @@ func (cm *ControllersManager) Child(ctx context.Context, pod *corev1.Pod, contai
// TODO
return nil
}

func (cm *ControllersManager) execAndStore(pod *corev1.Pod, containerName, dirPath string, store map[string]*string) error {
Contributor

Not a big fan of mutating arguments (or of maps whose values are pointers), but I see why you are doing it like this, and I don't have compelling suggestions.

Contributor Author

Yeah, it was the only way to generalize the calls for each controller file.

@@ -98,3 +92,24 @@ func (cm *ControllersManager) Child(ctx context.Context, pod *corev1.Pod, contai
}
return nil
}
func (cm *ControllersManager) execAndStore(pod *corev1.Pod, containerName, dirPath string, store map[string]*string) error {
Contributor

This seems like one of the instances where it would be better to generalize the function. But I think you did not because the v1 and v2 packages have no common package to share library functions, and adding one doesn't seem so great. If that's the case, I agree to keep it like this for starters.

Contributor Author

Thanks. We may consider a shared library in the future if more common functions come up.

@ffromani
Contributor

ffromani commented Feb 5, 2024

/approve

we want this change in

@ffromani
Contributor

ffromani commented Feb 5, 2024

/approve

Contributor

openshift-ci bot commented Feb 5, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, Tal-or

openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Feb 5, 2024
Contributor

openshift-ci bot commented Feb 5, 2024

@Tal-or: all tests passed!


openshift-merge-bot bot merged commit 51e073f into openshift:master Feb 5, 2024
15 checks passed
Tal-or deleted the e2e_cgroups_util branch February 5, 2024 18:18
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

This PR has been included in build cluster-node-tuning-operator-container-v4.16.0-202402051840.p0.g51e073f.assembly.stream for distgit cluster-node-tuning-operator.
All builds following this will include this PR.

@Tal-or
Contributor Author

Tal-or commented Feb 8, 2024

/cherry-pick release-4.15

Needed for mixed-cpus e2e testing

@openshift-cherrypick-robot

@Tal-or: #906 failed to apply on top of branch "release-4.15":

Applying: e2e: change `ExecCommandOnPod` to accept container name
Using index info to reconstruct a base tree...
M	test/e2e/performanceprofile/functests/1_performance/hugepages.go
M	test/e2e/performanceprofile/functests/7_performance_kubelet_node/cgroups.go
M	test/e2e/performanceprofile/functests/utils/pods/pods.go
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/performanceprofile/functests/utils/pods/pods.go
CONFLICT (content): Merge conflict in test/e2e/performanceprofile/functests/utils/pods/pods.go
Auto-merging test/e2e/performanceprofile/functests/7_performance_kubelet_node/cgroups.go
Auto-merging test/e2e/performanceprofile/functests/1_performance/hugepages.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 e2e: change `ExecCommandOnPod` to accept container name
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-4.15

Needed for mixed-cpus e2e testing

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
8 participants