-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-2793: deployment: ignore defaultmode for volumes, accept unsolicited containers #176
OCPBUGS-2793: deployment: ignore defaultmode for volumes, accept unsolicited containers #176
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alebedev87 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
085c21c
to
fe120f4
Compare
/assign |
@alebedev87 can you add a Jira link for this? @Miciah is going to contact you about this informal story on this one too. |
@candita : I was wondering whether we need a formal Jira bug for things we noticed ourselves which weren't reported by the customer and don't have real bugs (performance improvement at best). Maybe I missed this, is this a requirement now? |
Yes, we all create our own Jira bugs for things we've noticed ourselves. Without it, there is no qe, docs, release note, etc. On all the other repos there is a requirement for a valid bugzilla in the tide output. Not sure why it wasn't set up on this repo. But it does require docs-approved, px-approved, and qe-approved labels, which you won't get without a Jira. Update - I notice you usually add those labels yourself! |
for currName, currCont := range currentContMap { | ||
// if the current container is expected: check its fields | ||
if expCont, found := expectedContMap[currName]; found { | ||
// ensure all expected containers are present, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, but have you considered using a deep copy instead of assignment in line 368, where you reset them all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could do a deep copy, that would be a little more cpu cycles but a bit safer in terms of the expected containers modifications done via updated
variable by mistake. But here, we were safe as 1) no other function touches containers and 2) we return immediately so the same function doesn't have a chance to modify updated
variable.
However your comment made me think about the if
condition which you pointed into. Actually it prevents the extra containers from being kept in case the lengths of the current pod.spec.containers
is not the same as expected, so in a lot of cases. I removed this condition to enable the injection of the extra container(s) in any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
externalDNSDeploymentChanged
already makes a deep copy of current
to make update
, so it seems to me that it would be superfluous to do another deep copy here.
testConfigMapVolume("extravolume", "extra", "files", "/etc"), | ||
), | ||
mutate: func(dep *appsv1.Deployment) { | ||
dep.Spec.Template.Spec.Volumes = []corev1.Volume{testSecretVolume("testcreds", "testsecret", "creds", "/run/secrets")} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to append a new volume here? Or is this a test of adding a duplicate volume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The testcase is weirdly designed. It makes my head turn a little every time I approach it. Which is a sign to rewrite it.
originalDeployment
is current state, mutate
function changes the current state into the desired deployment. Then the current and desired deployments are passed to externalDNSDeploymentChanged
which says if the current state needs to be updated and to which state (expectedDeployment
).
So here, we have 2 volumes in the current (cluster) state and 1 volume as desired by the operator and we expect no changes to the current state as we now allow unsolicited volumes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you now allow unsolicited containers. Is it unsolicited volumes are allowed as well as unsoliciated containers? And neither of them are marked as changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we expect no changes to the current state as we now allow unsolicited volumes.
But shouldn't you expect the expectedDeployment to only have one volume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Miciah could you take a quick look at this test case and let me know if you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You take the current, mutate it, then compare it to the expected state after the mutation
No. The mutation creates the desired (by the operator) state.
but if you have applied the mutate function to the current set of volumes, it should have removed that volume. The expected state should only have the single volume you set in the mutate function.
Desired deployment != expected deployment. The desired deployment is what the operator desires to see. The expected deployment is what we expect to be in the cluster after the operator made the reconciliation. The enforcement of the desired state is done in the soft manner - only the volumes mentioned in the desired deployment has to be present, the rest is left intact. This test case demonstrates exactly this: we have 2 volumes in the cluster and one of them is exactly as desired, so we expect
no change (false
). expectedDepolyment
field here is set but it's not even used as we only check it if expect
is true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Andrey's explanations in #176 (comment) and #176 (comment) make sense to me. original
represents whatever is in etcd, tc.mutate(original.DeepCopy())
represents what the operator expects, and tc.expect
is the result of reconciling those two things, and since the goal is for the operator to ignore volumes and sidecar containers that some external agent may inject, tc.expect
should include any additional sidecars or volumes.
The unsolicited volumes were accepted even before this PR, that was done to cover the cases when, for instance, the kube api token is added by the api server.
Those only get injected into the pods, not into the deployment's pod template spec.
The testcase is weirdly designed. It makes my head turn a little every time I approach it.
This whole discussion reminds me of another: openshift/cluster-ingress-operator#611 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unsolicited volumes were accepted even before this PR, that was done to cover the cases when, for instance, the kube api token is added by the api server.
Those only get injected into the pods, not into the deployment's pod template spec.
Yes, you are right, I saw the api token volume in the POD, not in the deployment. Do you think we don't need to be that soft for the volumes?
I personally like the concept of the hard enforcement only of those things which can change the behavior of the operated PODs like args
command
, image
etc. The volumes, in my opinion, can be approached softer - letting the cluster user to add some if it's needed.
The testcase is weirdly designed. It makes my head turn a little every time I approach it.
This whole discussion reminds me of another: openshift/cluster-ingress-operator#611 (comment)
I agree more with Steve's ordering but that's more religious than factual. Let me remove the TODO comment though and add your explanation to the test case, I think it helps understand the idea behind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right, I saw the api token volume in the POD, not in the deployment. Do you think we don't need to be that soft for the volumes?
I don't think we need to be that permissive about volumes, but it's fine to leave it permissive in this PR (which is focused on allowing sidecars).
I personally like the concept of the hard enforcement only of those things which can change the behavior of the operated PODs like
args
command
,image
etc. The volumes, in my opinion, can be approached softer - letting the cluster user to add some if it's needed.
The risk of loose enforcement is that someone might start modifying the deployment in unexpected ways, and if some future change in external-dns-operator breaks that customization (for example, adding a conflicting volume with the same name), then that person will complain. However, we can leave things permissive for the purposes of getting this PR merged.
expect: false, | ||
originalDeployment: testDeploymentWithVolumes(testSecretVolume("testcreds", "testsecret", "creds", "/run/secrets")), | ||
mutate: func(dep *appsv1.Deployment) { | ||
dep.Spec.Template.Spec.Volumes[0].VolumeSource.Secret.DefaultMode = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dep.Spec.Template.Spec.Volumes[0].VolumeSource.Secret.DefaultMode = nil | |
dep.Spec.Template.Spec.Volumes[0].VolumeSource.Secret.DefaultMode = somethingthat'snotnil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it comes form the misleading test design. Here we are saying that the desired deployment doesn't have the defaultMode set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test where the defaultMode is set to something other than nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
expect: false, | ||
originalDeployment: testDeploymentWithVolumes(testConfigMapVolume("testcerts", "testcerts", "key", "/etc/pki/trust")), | ||
mutate: func(dep *appsv1.Deployment) { | ||
dep.Spec.Template.Spec.Volumes[0].VolumeSource.ConfigMap.DefaultMode = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dep.Spec.Template.Spec.Volumes[0].VolumeSource.ConfigMap.DefaultMode = nil | |
dep.Spec.Template.Spec.Volumes[0].VolumeSource.ConfigMap.DefaultMode = somethingnotnil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nil
seems correct to me. Maybe the test case would be clearer if it were named something like "if the API sets a default mode on a configmap volume".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the name to the suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just a few comments.
fe120f4
to
c2fd2a8
Compare
Failed GCP installation. /retest |
@alebedev87: This pull request references Jira Issue OCPBUGS-2793, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh |
@alebedev87: This pull request references Jira Issue OCPBUGS-2793, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
updated.Spec.Template.Spec.Containers = expected.Spec.Template.Spec.Containers | ||
return true | ||
} | ||
|
||
currentContMap := buildIndexedContainerMap(current.Spec.Template.Spec.Containers) | ||
expectedContMap := buildIndexedContainerMap(expected.Spec.Template.Spec.Containers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to not have a container map for the updated.Spec.Template.Spec.Containers
? You are mixing assignment by index with append in L397, within the same loop - are you sure that's going to work for every scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok to not have a container map for the
updated.Spec.Template.Spec.Containers
?
Yes, the current and the updated are the same in terms of the containers (what this function is supposed to check) at the beginning of the function. That's actually the whole point: updated == current + updated things
.
You are mixing assignment by index with append in L397, within the same loop - are you sure that's going to work for every scenario?
That's an interesting finding, didn't think of it indeed. Good enough it doesn't impact the algorithm: the indexes are still pointing to the right containers as we append to the end of the container list and the indexed container map is not refreshed after each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay
c2fd2a8
to
eba1f54
Compare
Infoblox license renewed (3 months have flied by). /retest |
api/v1alpha1/groupversion_info.go
Outdated
@@ -15,8 +15,8 @@ limitations under the License. | |||
*/ | |||
|
|||
// Package v1alpha1 contains API Schema definitions for the operator v1alpha1 API group | |||
//+kubebuilder:object:generate=true | |||
//+groupName=externaldns.olm.openshift.io | |||
// +kubebuilder:object:generate=true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to regenerate generated code after the api changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was just golang 1.19 new formatting of the comment markers, now they need a whitespace after //
. No need to regenerate anything, if there would be anything we would catch it in the CI: it runs the markers to code and code to manifests generators to see if there is anything not committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already in a935fda#diff-dab150f060822d42b5ee452169a22a98dca6669fef3b58f23a116caf26084955. Presumably this change and the one in api/v1beta1/groupversion_info.go
would vanish with a rebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, rebased from main
.
/label qe-approved |
/label docs-approved |
eba1f54
to
7bdb49e
Compare
// deepequal is fine here as we don't have more than 1 item | ||
// neither in the secret nor in the configmap | ||
if !reflect.DeepEqual(currVol.Volume, expVol.Volume) { | ||
if !cmp.Equal(currVol.Volume, expVol.Volume, ignoreFieldsConfigMap, ignoreFieldsSecret) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we changed cluster-ingress-operator to hash the deployment, cluster-ingress-operator similarly used go-cmp to compare volumes—it might be worth a look to see whether there are any logic or unit tests worth adapting to external-dns-operator:
https://github.com/openshift/cluster-ingress-operator/blob/5a82b658c4952e8af5d4658d13e8802b93ded870/pkg/operator/controller/ingress/deployment.go#L359
openshift/cluster-ingress-operator@48c17ff
openshift/cluster-ingress-operator@d32578a
openshift/cluster-ingress-operator@5a82b65
For example, the "if the volumes change ordering" test case might be a good one to add to external-dns-operator.
Cluster-ingress-operator's cmpSecretVolumeSource
and cmpConfigMapVolumeSource
also explicitly tested each field in the volume source types, which would reduce the likelihood of problems resulting from defaulting if upstream added new fields in these types. Explicitly testing each field might not be worth the bother though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, now I see where Steve was inspired from :)
For example, the "if the volumes change ordering" test case might be a good one to add to external-dns-operator.
The order of the volumes doesn't matter in this comparison as we construct maps from the volume slices and compare by keys, added the test case to confirm that though.
Cluster-ingress-operator's cmpSecretVolumeSource and cmpConfigMapVolumeSource also explicitly tested each field in the volume source types, which would reduce the likelihood of problems resulting from defaulting if upstream added new fields in these types. Explicitly testing each field might not be worth the bother though.
To me, the real added value of cmpSecretVolumeSource
and cmpConfigMapVolumeSource
is the check of the defaultMode
if it's set in the desired deployment. EquateEmpty
options works well even if specified on the top level of the comparison chain, so I added it on the Volume
level. I'll keep defaultMode
ignored for the moment, let's revisit the comparison once we'll need to set the desired value for it.
// TODO: refactor this test case to remove the mutation logic as it's misleading. | ||
// This test case should have current/desired/expected fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, it makes the test cases easier to read when each test case only specifies what changes, which is what the mutate
func shows. I'm biased though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird, though, that some test cases also specify the original deployment and the expected deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think mutate()
indeed may keep test cases shorter and clearer (clearer what changed, I use the test case name for this but it's less accurate). However I expected the mutate function to modify the desired (by the operator) state to get the current state while it's the opposite now.
The expected deployment may be different from the desired (e.g. unsolicited containers, volumes). I think it makes sense to verify what externalDNSDeploymentChanged
sends to the API server at the end, that's why we need expectedDeployment
.
expect: true, | ||
expect: false, | ||
expectedDeployment: testDeploymentWithContainers([]corev1.Container{ | ||
testContainer(), | ||
testContainerWithName("extra"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If expect
is false, why specify expectedDeployment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason, removed it in all the test cases to avoid confusion.
f5ccc50
to
77bdc43
Compare
…olicited containers
77bdc43
to
0342aad
Compare
/lgtm |
/label px-approved As per Chris Fields' statement for this release of ExtDNS Operator - the docs are enough. |
@alebedev87: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@alebedev87: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-2793 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Some lessons learnt from the other operators:
defaultMode
for volumes is not part of the expected specification however it's added by the defaulting mechanism which leads to the fight between the operator and API serverOperator
type: unused fields removed