New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-13830: Add support for overriding container resources #136
OCPBUGS-13830: Add support for overriding container resources #136
Conversation
@arkadeepsen: This pull request references Jira Issue OCPBUGS-13830, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/jira refresh |
@arkadeepsen: This pull request references Jira Issue OCPBUGS-13830, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign @thejasn |
e8edfac
to
8293e70
Compare
/test all |
1 similar comment
/test all |
/test alpha-e2e-operator |
/test e2e-operator |
Can PR description and subject be updated? Maybe https://hypershift-docs.netlify.app/contribute/ is of help. |
@arkadeepsen: This pull request references Jira Issue OCPBUGS-13830, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few suggestions, mostly looks good.
Thanks!
@@ -483,6 +579,54 @@ spec: | |||
additionalProperties: | |||
type: string | |||
type: object | |||
overrideResources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we please have the API change and the implementation in separate commits?
@@ -664,7 +664,10 @@ spec: | |||
image: openshift.io/cert-manager-operator:latest | |||
imagePullPolicy: Always | |||
name: cert-manager-operator | |||
resources: {} | |||
resources: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, by default the operator would end up adding resources.requests with CPU 10m and 32Mi RAM, but no resources.limits for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting to myself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the operator container resource requests/limits customizable in any way?
We're setting it to these values here in the CSV which is a default. Asking this because, the original bug report is also about the operator resources and not just for operand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are only setting the requests
so should be ok, but yes limits need to be customizable. Can be done using subscription.spec.config
so not an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, we need to document it later (like we did for logLevels).
+doc-req, cc: @xenolinux
return corev1.ResourceRequirements{ | ||
Limits: mergedResourceLimits, | ||
Requests: mergedResourceRequests, | ||
Claims: mergedResourceClaims, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cert-manager-operator/vendor/k8s.io/api/core/v1/types.go
Lines 2317 to 2329 in 585e060
// Claims lists the names of resources, defined in spec.resourceClaims, | |
// that are used by this container. | |
// | |
// This is an alpha field and requires enabling the | |
// DynamicResourceAllocation feature gate. | |
// | |
// This field is immutable. | |
// | |
// +listType=map | |
// +listMapKey=name | |
// +featureGate=DynamicResourceAllocation | |
// +optional | |
Claims []ResourceClaim `json:"claims,omitempty" protobuf:"bytes,3,opt,name=claims"` |
Not completely sure about ResourceClaims (and if it is supported in OpenShift by default). The struct documentation says it is an alpha field and isn't enabled by default. Probably, for now we shouldn't support this right away is what am leaning towards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#resource-requests-and-limits-of-pod-and-container doesn't seem to have claims :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I had thought as the field is included, better to support the merge functionality. Will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a follow-up comment in #136 (comment).
@@ -76,6 +76,74 @@ func parseArgMap(argMap map[string]string, args []string) { | |||
} | |||
} | |||
|
|||
// mergeContainerResources merges source container resources with that | |||
// provided as override resources. | |||
func mergeContainerResources(sourceResources corev1.ResourceRequirements, overrideResources corev1.ResourceRequirements) corev1.ResourceRequirements { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to make use of https://github.com/kubernetes/kubernetes/blob/7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647/pkg/api/v1/resource/helpers.go#L377?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The func signature in this MergeContainerResourceLimits
fn uses v1.Container
type and two things to note:
- it doesn't set
hugepage
related resources - it doesn't set the overriden limit in case the limit was already set in the container (i.e. if we use this and in the future we end up adding resource limits to our operand deployment (which is currently not present) we have to change this implementation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signature change shouldn't matter to us, but anyways we can't reuse.
it doesn't set hugepage related resources
We do not want to support hugepages either since this will involve overriding volumes and volumemounts. So until someone files a bug/rfe.
it doesn't set the overriden limit in case the limit was already set in the container (i.e. if we use this and in the future we end up adding resource limits to our operand deployment (which is currently not present) we have to change this implementation)
Yeah, this might be an issue.
if sourceResourceList == nil && overrideResourceList == nil { | ||
return nil | ||
} | ||
|
||
destResourceList := corev1.ResourceList{} | ||
parseResourceList(destResourceList, sourceResourceList) | ||
parseResourceList(destResourceList, overrideResourceList) | ||
|
||
return destResourceList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels redundant as ResourceList
is already a map[resourceName]Quantity. Why can't we just explicitly set values for the Memory
and CPU
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if sourceResourceList == nil && overrideResourceList == nil { | |
return nil | |
} | |
destResourceList := corev1.ResourceList{} | |
parseResourceList(destResourceList, sourceResourceList) | |
parseResourceList(destResourceList, overrideResourceList) | |
return destResourceList | |
if sourceResourceList == nil && overrideResourceList == nil { | |
return nil | |
} | |
destResourceList := corev1.ResourceList{} | |
if !sourceResourceList[corev1.ResourceCPU].IsZero() { | |
destResourceList[corev1.ResourceCPU] = sourceResourceList[corev1.ResourceCPU].DeepCopy() | |
} | |
if !sourceResourceList[corev1.ResourceMemory].IsZero() { | |
destResourceList[corev1.ResourceMemory] = sourceResourceList[corev1.ResourceMemory].DeepCopy() | |
} | |
return destResourceList |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if sourceResourceList == nil && overrideResourceList == nil { | |
return nil | |
} | |
destResourceList := corev1.ResourceList{} | |
parseResourceList(destResourceList, sourceResourceList) | |
parseResourceList(destResourceList, overrideResourceList) | |
return destResourceList | |
if overrideResourceList == nil { | |
return sourceResourceList | |
} | |
if sourceResourceList == nil { | |
sourceResourceList := corev1.ResourceList{} | |
} | |
if !overrideResourceList[corev1.ResourceCPU].IsZero() { | |
sourceResourceList[corev1.ResourceCPU] = overrideResourceList[corev1.ResourceCPU].DeepCopy() | |
} | |
if !overrideResourceList[corev1.ResourceMemory].IsZero() { | |
sourceResourceList[corev1.ResourceMemory] = overrideResourceList[corev1.ResourceMemory].DeepCopy() | |
} | |
return sourceResourceList | |
Would this serve the purpose?
|
||
// mergeContainerResourceList merges source resource list with that | ||
// provided as override resource list. | ||
func mergeContainerResourceList(sourceResourceList corev1.ResourceList, overrideResourceList corev1.ResourceList) corev1.ResourceList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically not a merge in case of ResourceList
since we are basically overriding only here. So might as well rename this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't we merge the overriding resources with the source's resources if, say, only limits is provided in the overriding resources and not requests, whereas source's resources has both limits and requests?
If we override then whatever is in the overriding resources, even if either of empty limits or requests, will override the source's limits and requests even if something is already set there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overriding resources takes priority so we set whatever is set there as the final value. What you are suggesting would be the case if one could update source's resources via a different channel as compared overrideResources
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the overrideResources is empty would we remove the default resources, though currently we don't set any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will never have default resources, so yes. And if overrideResources
is nil
nothing to do but if overrideResources
is {}
then yes that will take precedence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, it is possible that API server endpoints use a cert-manager issued certificate and in that case if the operator/cert-manager operand goes down it could be a SLO problem. I'd suggest preferring to refrain from adding any resource limits as of now (unless user themselves configures as such: which with this implementation they can do through overrideResources for operand and through Subscription object for the operator), also partly because upstream cert-manager also doesn't enforce these limits yet so we don't know what are good enough sane values for clusters with large certificate usage.
We might not set it right now, but the implementation should take into account the changes we might have in the future. So if we decide to add some default requests, an empty overrideResources
will always end up removing that if we decide that it will override and not merge with default resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, implementation should be in a way that it takes care of such scenarios.
I would say,
originalDeployment:
resources:
requests: X
limits: nil
overrides:
resources:
resources: nil
limits: Y
should translate to:
finalDeployment:
resources:
requests: X
limits: Y
Hope it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whereas,
originalDeployment:
resources:
requests: X
limits: nil
overrides:
resources:
resources: Z
limits: Y
should still translate to:
finalDeployment:
resources:
requests: Z
limits: Y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this scenario?
originalDeployment:
resources:
requests:
cpu: X
limits: nil
overrides:
resources:
requests:
memory: Y
limits: nil
Should it translate to:
finalDeployment:
resources:
requests:
memory: Y
limits: nil
OR:
finalDeployment:
resources:
requests:
cpu: X
memory: Y
limits: nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was meaning to say, we stick to:
finalDeployment:
resources:
requests:
cpu: X
memory: Y
limits: nil
in such scenarios (which involves merging individual resource types, iff not nil
).
|
||
// +kubebuilder:validation:Optional | ||
// +optional | ||
OverrideResources corev1.ResourceRequirements `json:"overrideResources,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this type to use a self-created struct that just contains Resources and Limits (essentially stripping out Claims as well, which would be counter-intuitive for discouraging users about adding any Claims as well).
It's better to not rely upon upstream type, rather have a copied one with only the necessary fields, xref: openshift/api#1387 (comment)
…r, and cert-manager-webhook deploymemts. Add resource requests for cert-manager-operator deployment. Add e2e test for OverrideResources field in DeploymentConfig. Signed-off-by: arkadeepsen <arsen@redhat.com>
be33077
to
2bae4c7
Compare
}, | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a add another or a couple test case(s) to test the behaviour we discussed in case of: #136 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkadeepsen: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest-required |
}, | ||
{ | ||
name: "override resources limits and requests merges with source resource limits and requests respectively", | ||
sourceResources: corev1.ResourceRequirements{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add test cases for testing limits and request as nil explicitly?
Similar to:
{
name: "override resources limits and requests merges with source resource limits and requests respectively",
sourceResources: corev1.ResourceRequirements{
Limits: nil,
Requests: corev1.ResourceList{
corev1.ResourceCPU: k8sresource.MustParse("10m"),
},
},
overrideResources: v1alpha1.CertManagerResourceRequirements{
Limits: corev1.ResourceList{
corev1.ResourceCPU: k8sresource.MustParse("400m"),
},
Requests: corev1.ResourceList{
corev1.ResourceMemory: k8sresource.MustParse("64Mi"),
},
},
expected: corev1.ResourceRequirements{
Limits: corev1.ResourceList{
corev1.ResourceCPU: k8sresource.MustParse("400m"),
},
Requests: corev1.ResourceList{
corev1.ResourceCPU: k8sresource.MustParse("10m"),
corev1.ResourceMemory: k8sresource.MustParse("64Mi"),
},
},
}
t.Fatal("Informer did not get the added cert manager object") | ||
} | ||
|
||
err = withContainerResourcesValidateHook(certManagerInformers, tc.deploymentName)(nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkadeepsen I believe we are not really testing anything with this function correct me if I'm mistaken. Since you passing nil
as args to the returned function which doesn't use any of the arguments in the returned function. I don't see the value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can ditch this in favour of covering this as part of e2e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkadeepsen I believe we are not really testing anything with this function correct me if I'm mistaken. Since you passing nil as args to the returned function which doesn't use any of the arguments in the returned function. I don't see the value here.
The function which is returned doesn't use any of the arguments. Thus passing anything in the call wouldn't make any difference. Also the withContainerResourcesValidateHook
function is similar to the existing ValidateHook
functions.
We can ditch this in favour of covering this as part of e2e.
The unit test is added to test whether this function is working as expected. We are anyway testing the overall functionality through the e2e test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The unit test is added to test whether this function is working as expected.
Yes, but if what is the critical business logic that you are testing here? All the functions that are constructed inside withContainerResourcesValidateHook()
are not dependent on the args and is only dependent on the cluster object. validateResources
is the actual function you we want to test (and this function is pretty much static if we remove the dep on the informer), I'd suggest to update the scope and make it testable directly. WDYT? This would reduce the boiler plate required for tests and also tests the critical bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refer to my other comment on #136 (comment), ditto.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll change the scope of validateResources
and add unit test for it. I'll remove the unit test for withContainerResourcesValidateHook
.
}, | ||
}, | ||
}, | ||
overrideFn: func(_ certmanagerinformer.CertManagerInformer, _ string) (v1alpha1.CertManagerResourceRequirements, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arkadeepsen I don't the value running withContainerResourcesOverrideHook()
in the test if are also passing the override test function. We can ditch this in favour of covering this as part of e2e.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This unit test is for checking whether the withContainerResourcesOverrideHook
is working correctly or not with the inputs provided. The e2e test will anyway test the overall functionality of the feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all deps/critical functions mocked, it is not really testing anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for removing this, since that'd favour us having less redundant tests to maintain in the future.
The e2e test will anyway test the overall functionality of the feature.
Yup, right. @arkadeepsen as discussed previously, (for future/follow-up) let's stick to our plan of adding an extensible function that inputs (overrideX, expectedDeployment) and e2e tests just that. It'd help test overrideArgs, overrideEnv, overrideResources in a neat way with all test cases in a unified place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll remove the unit test for withContainerResourcesOverrideHook
.
}, | ||
}, | ||
{ | ||
name: "override resources limits and requests merges with source resource limits and requests respectively", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit: would you mind changing to:
"override resources replace source limits and requests" instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name: "override resources limits and requests merges with source resource limits and requests respectively", | |
name: "override resources replace source limits and requests", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test actually merges. There's not replace of existing values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, then I don’t honestly have a better naming (wanted to remove “respectively”, but fine for now :). Please leave it as is.
} | ||
} | ||
|
||
func TestGetOverrideResourcesFor(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn’t remove this one, but rather keep it. Cause, it effectively tests the switch case in the actual func that switches between different deployments (in the past we’ve seen a bug where we accidentally made mistakes in such For functions, so better to retain this).
… TestWithContainerResourcesValidateHook and TestWithContainerResourcesOverrideHook.
/lgtm |
/cc @davemulford TIA all! |
/label docs-approved |
2 similar comments
/label docs-approved |
/label docs-approved |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: arkadeepsen, swghosh, thejasn The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/label px-approved |
/label qe-approved |
@arkadeepsen: Jira Issue OCPBUGS-13830: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-13830 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
This PR adds the OverrideResources field to DeploymentConfig. This would add the support of overriding the resources fields of the cert-manager, the cert-manager-cainjector, and the cert-manager-webhook deploymemts. Additionally, the resource requests for cert-manager-operator deployment is added.