Force PAO installation on master nodes #351
Conversation
/retest
Depends on #350
if err := testclient.Client.List(context.TODO(), pods, opts); err != nil {
	return nil, err
}
if len(pods.Items) != 1 {
I wonder if we should just use len(pods.Items) < 1 (just in case we eventually add HA and leader election). Or maybe it would be premature to change the test now, so just asking.
Fair point, I prefer to keep the test as-is and see it break first when/if we add the HA/leader election.
Looks mostly OK. I have some (relatively minor) concerns about the e2e test and the GetPerformanceProfilePod
function; the changes to the CSV look good.
@@ -118,6 +118,13 @@ spec:
      labels:
        name: performance-operator
    spec:
      affinity:
It seems the OCP control plane is using
nodeSelector:
node-role.kubernetes.io/master: ""
but the end result is the same, and affinity rules are more powerful than nodeSelector, so this seems fine.
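For comparison, a minimal sketch of the two forms being discussed (surrounding Deployment fields omitted; the nodeAffinity block is an assumed equivalent written for illustration, not copied from this PR):

```yaml
# nodeSelector form, as used by the OCP control plane:
nodeSelector:
  node-role.kubernetes.io/master: ""

# Equivalent nodeAffinity form (assumed; affinity is more expressive than nodeSelector):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-role.kubernetes.io/master
          operator: Exists
```

Both restrict scheduling to nodes carrying the master role label; affinity additionally supports operators like `Exists` and `NotIn` and preferred (soft) rules.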
I preferred affinity since it is reflected in the official docs
Is affinity reflected in the OpenShift official docs or the Kubernetes official docs, or both?
both
if len(pods.Items) != 1 {
	return nil, fmt.Errorf("incorrect performance operator pods count: %d", len(pods.Items))
}
Any way to check that the pod is indeed running performance-operator
and not some random pod which happens to have the same label?
In the same namespace as PAO ("openshift-performance-addon")? I think it is a relatively safe bet...
It's still a bet, meaning it is still anecdotal evidence, and we need quite a lot of it to gain confidence. Checking the image name seems stronger (does it contain 'performance-operator'?). Feel free to add more checks, or to look up better ones.
The best way would probably be to check a PAO endpoint and to verify it is behaving correctly; that's sufficient proof the service is there. It's probably complex, maybe too complex to be done here, though.
We are also checking that we have only one pod in the namespace. The check is the first step in the func tests; I suppose the next steps will fail if for some reason PAO is not in the namespace but another pod with the same label is.
While I agree that theoretically this is not enough, is this the place to add more checks?
I can add a check that the actual pod name starts with performance-operator-{something}, but this relies on a side effect of how k8s names pods. Or maybe it would add to confidence?
Added a check that the performance operator pod name (not label) starts with "performance-operator"
That should increase the confidence in the bet
Still not completely happy with this solution, but good enough for now.
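The extra check settled on above reduces to a simple name-prefix predicate. A minimal sketch under stated assumptions: the function name is hypothetical, and the client-go call that lists the pods by label in the namespace is omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// isPerformanceOperatorPod is a hypothetical helper mirroring the check added
// in this PR: besides matching the label selector, the single pod found in the
// namespace must have a name (not a label) that starts with
// "performance-operator", which k8s derives from the Deployment name.
func isPerformanceOperatorPod(podName string) bool {
	return strings.HasPrefix(podName, "performance-operator")
}

func main() {
	fmt.Println(isPerformanceOperatorPod("performance-operator-75c8d7f9b-x2kql")) // prints true
	fmt.Println(isPerformanceOperatorPod("some-other-pod-with-same-label"))       // prints false
}
```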
force-pushed from 56ae514 to ac70c96
@@ -9,6 +9,7 @@ import (

	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
	. "github.com/onsi/gomega/gstruct"
unused?
used by Reject()
Fair enough. And BTW this is exactly why I dislike the dot imports.
force-pushed from ac70c96 to 3b7b70a
/lgtm
good enough for now
/retest
1 similar comment
/retest
@marcel-apf Please fix the CI job
PAO is meant to run on the "control plane". Use podAffinity to schedule PAO only on master nodes (https://docs.openshift.com/container-platform/4.5/nodes/scheduling/nodes-scheduler-pod-affinity.html). Use taint tolerations to allow PAO scheduling on the master nodes (https://docs.openshift.com/container-platform/4.5/nodes/scheduling/nodes-scheduler-taints-tolerations.html). Update the manifests accordingly. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Straightforward test that checks that the node PAO is running on is a master node. Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
force-pushed from 3b7b70a to 414cb91
Done, thanks!
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: fromanirh, marcel-apf. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest
1 similar comment
/retest
/cherry-pick release-4.6
@cynepco3hahue: once the present PR merges, I will cherry-pick it on top of release-4.6 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/override ci/prow/e2e-gcp-operator-upgrade
@fromanirh: Overrode contexts on behalf of fromanirh: ci/prow/e2e-gcp-operator-upgrade In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
The single flaky failure is because of
/override ci/prow/e2e-gcp
@cynepco3hahue: Overrode contexts on behalf of cynepco3hahue: ci/prow/e2e-gcp In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
OK, for the record I agree with the evaluation and I was about to override it myself. @cynepco3hahue was just faster.
/retest
@marcel-apf: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
@marcel-apf The failure is connected to the functional test under this PR
The new test runs before the deployment of the PAO, so it is not surprising that the test fails. I propose making the check of where the PAO pod runs part of the deployment test.
This means moving the test from the
let's see: c61e2d1 I'll take care of this PR and have it merged ASAP.
/retest
actually obsoleted because #373 got merged
yay! closing this one
PAO is meant to run on the "control plane".
Use podAffinity to schedule PAO only on master nodes.
(https://docs.openshift.com/container-platform/4.5/nodes/scheduling/nodes-scheduler-pod-affinity.html)
Use taint tolerations to allow PAO scheduling on the master nodes.
(https://docs.openshift.com/container-platform/4.5/nodes/scheduling/nodes-scheduler-taints-tolerations.html)
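A sketch of the toleration the commit message refers to, assuming the standard master taint key (the actual manifest content is not shown in this thread, so this is illustrative only):

```yaml
# Pod spec fragment: tolerate the taint that masters carry by default,
# so the scheduler is allowed to place PAO on master nodes.
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule
```

The toleration only permits scheduling on masters; it is the affinity rule that actually restricts PAO to them.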
Add a test verifying PAO runs on master nodes
Straightforward test that verifies that the node PAO is running on
is a master node.
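The test's core assertion can be reduced to a pure predicate on the node's labels. This is a minimal sketch: the helper name is hypothetical, and the e2e plumbing that fetches the PAO pod and its node via the API is omitted.

```go
package main

import "fmt"

// isMasterNode is a hypothetical reduction of the e2e assertion: the node
// running the PAO pod must carry the master role label. The label key is the
// one discussed in the review of this PR.
func isMasterNode(nodeLabels map[string]string) bool {
	_, ok := nodeLabels["node-role.kubernetes.io/master"]
	return ok
}

func main() {
	fmt.Println(isMasterNode(map[string]string{"node-role.kubernetes.io/master": ""})) // prints true
	fmt.Println(isMasterNode(map[string]string{"node-role.kubernetes.io/worker": ""})) // prints false
}
```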