New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MGMT-4066 Add enabled operator CPU and memory requirements to host validation #1167
MGMT-4066 Add enabled operator CPU and memory requirements to host validation #1167
Conversation
Can one of the admins verify this patch? |
15f8a5b
to
fc8fa9c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the spirit of the https://issues.redhat.com/browse/MGMT-4066 story was to take operator requirements into account while checking general OCP memory and cpu requirements in the host validation (like:
id: HasCPUCoresForRole, |
id: HasMemoryForRole, |
GetCPURequirementFor*
and GetMemoryRequirementFor*
Operator
methods.
@gamli75 - is the above description correct?
internal/cluster/validator.go
Outdated
// sum(Node usable memory) > sum (minimum memory required for each enabled operator) | ||
func (v *clusterValidator) sufficientMemoryAvailableForOperators(c *clusterPreprocessContext) validationStatus { | ||
|
||
opMgr := operators.NewManager(v.log) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the manager that's created in main.go
- the OCS validator is stateful and that may in the future have impact on this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Will do.
internal/cluster/validator.go
Outdated
return boolValue(true) | ||
} | ||
|
||
// gather the total memory usable available in the cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about extracting total cluster memory collection to a method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look at the code to see if this logic is being used elsewhere. Thanks for the suggestion 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: moved to its own struct method
internal/cluster/validator.go
Outdated
|
||
// gather the total CPU available in the cluster | ||
var tca int64 | ||
for _, h := range c.cluster.Hosts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved as well.
internal/cluster/validation_id.go
Outdated
) | ||
|
||
func (v validationID) category() (string, error) { | ||
switch v { | ||
case IsMachineCidrDefined, isMachineCidrEqualsToCalculatedCidr, isApiVipDefined, isApiVipValid, isIngressVipDefined, isIngressVipValid, | ||
isClusterCidrDefined, isServiceCidrDefined, noCidrOverlapping, networkPrefixValid, IsDNSDomainDefined, IsNtpServerConfigured: | ||
return "network", nil | ||
case AllHostsAreReadyToInstall, SufficientMastersCount: | ||
case AllHostsAreReadyToInstall, SufficientMastersCount, SufficientMemoryAvailable, SufficientCPUAvailable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wonder if this shouldn't be as part of "operators" instead of "hosts-data". Because it's actually operators
related. But I would leave this decision to someone from assisted team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I'd also like to hear @gamli75 's feedback on this.
Agreed. I will refactor the code to align with this idea. |
fc8fa9c
to
0f365e3
Compare
b44b1c1
to
d6dae62
Compare
Yes, this is correct. |
@avishayt What do you think about adding host disk requirement validation? OCS has such requirements and we can add them to the CPU and memory validations. |
Understood, I will refactor the code to do the validation per host instead. |
d6dae62
to
1a82b33
Compare
@priyanka19-98 can you please take a look at this PR? One thing that I found puzzling is that the minimum memory validation for hosts uses the Physical memory (here and here), but OCS and I see CNV as well now, use the Usable memory instead. I think they should all use the usable memory but I leave it up to the maintainers to decide. /cc @gamli75 |
@pkliczewski @jakub-dzon @machacekondra @masayag can you take another look? |
233d5ce
to
47a8e90
Compare
} | ||
|
||
// getHostCPURequirement returns the minimum amount of CPU core count the cluster must have to install the OCS operator | ||
func (o *ocsOperator) getPerHostCPURequirement(cluster *common.Cluster) (int64, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The per host requirements are different and yet to be implemented. As we only perform aggregate resource validations,
for eg, if in compact mode (3 masters), its not compulsary that per host must have atleast 12 cpu, we rely on aggregate 36 cpu and can be present as 7,15,14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you suggest at this point in time? We are implementing per host validation to ensure that all operators can be deployed based on the available hardware.
var cpu, count int64 | ||
switch depType { | ||
case compact: | ||
cpu = o.ocsValidatorConfig.OCSRequiredCompactModeCPUCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OCSRequiredCompactModeCPUCount
, are aggregate resources, and not per host.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're looking at per host validation on this PR, what do you suggest we should do for OCS at this point?
return string(b) | ||
} | ||
|
||
func GenerateInventoryWithResources(cpu, memory int64, hostname string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do we use this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's being used to generate a host manifest for testing purposes in the /internal/host/transition_test.go test file
default: | ||
v.log.Errorf("Unexpected role %s", c.host.Role) | ||
return ValidationError | ||
if c.host.Role == models.HostRoleMaster || c.host.Role == models.HostRoleWorker || c.host.Role == models.HostRoleAutoAssign { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to check whether c.host.Role != models.HostRoleBootstrap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not successful at creating a test case with this role type, the logic didn't seem to support a state transition with bootstrap. It might be that there is a specific state in which it is supported, but after looking at the OCS validation code, I did not see any mention of the bootstrap as a type of role to check, so I dropped it altogether.
2c844a5
to
bd052cd
Compare
/test assisted-service-aws |
@jordigilh: The
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test assisted-service-aws |
bd052cd
to
6233a9f
Compare
6233a9f
to
3ac034e
Compare
3ac034e
to
fd696be
Compare
internal/operators/api/api.go
Outdated
// GetMemoryRequirementForMaster provides master memory requirements for the operator in MB | ||
GetMemoryRequirementForMaster(ctx context.Context, cluster *common.Cluster) (int64, error) | ||
GetMemoryRequirementForMaster(cluster *common.Cluster) (int64, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency please either restore context parameter in the four methods above or remove it from GetDisksRequirement*
methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that these methods will be used by an API endpoint (https://issues.redhat.com/browse/MGMT-4477), so context.Context
parameter will be needed to allow for request-id-bound logging; please revert the removal of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood.
fd696be
to
165c230
Compare
/test e2e-metal-assisted |
/test assisted-service-aws |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gamli75, jordigilh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@jordigilh please talk with @lalon4 - we need to add e2e test cases for this validation when operators are enabled. |
165c230
to
7a2bd57
Compare
7a2bd57
to
cb38226
Compare
cb38226
to
6078837
Compare
/lgtm |
This PR updates the CPU and memory validations for the host to add the resources request for the enabled operators to check that the cluster's CPU and usable memory are sufficient to deploy the selected operators (LSO, CNV or OCS) in that node.
@machacekondra @masayag @pkliczewski @jakub-dzon @gamli75 can you take a look in the meanwhile?
/Jordi