Skip to content

[release-4.13] OCPBUGS-14432: Check that number of replicas matches hosts#7221

Closed
openshift-cherrypick-robot wants to merge 1 commit intoopenshift:release-4.13from
openshift-cherrypick-robot:cherry-pick-7059-to-release-4.13
Closed

[release-4.13] OCPBUGS-14432: Check that number of replicas matches hosts#7221
openshift-cherrypick-robot wants to merge 1 commit intoopenshift:release-4.13from
openshift-cherrypick-robot:cherry-pick-7059-to-release-4.13

Conversation

@openshift-cherrypick-robot
Copy link
Copy Markdown

This is an automated cherry-pick of #7059

/assign bfournie

If the number of compute replicas is not configured, a default
of 3 will be used. If the number of baremetal hosts configured
does not match this, a failure will occur at installation time.

Add a couple messages when the ISO is created to help catch
config problems with the replicas before installation.
- An info message with the number of replicas being used which is
useful if not configured
- A warning message if the number of replicas does not match
the number of configured hosts
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@openshift-cherrypick-robot: Jira Issue OCPBUGS-10342 has been cloned as Jira Issue OCPBUGS-14432. Will retitle bug to link to clone.
/retitle [release-4.13] OCPBUGS-14432: Check that number of replicas matches hosts

Details

In response to this:

This is an automated cherry-pick of #7059

/assign bfournie

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot changed the title [release-4.13] OCPBUGS-10342: Check that number of replicas matches hosts [release-4.13] OCPBUGS-14432: Check that number of replicas matches hosts Jun 1, 2023
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 1, 2023
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@openshift-cherrypick-robot: This pull request references Jira Issue OCPBUGS-14432, which is invalid:

  • expected dependent Jira Issue OCPBUGS-10342 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), but it is MODIFIED instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This is an automated cherry-pick of #7059

/assign bfournie

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot requested review from andfasano and bfournie June 1, 2023 14:43
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 1, 2023

@openshift-cherrypick-robot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-aws-ovn-disruptive cb02bfe link false /test e2e-aws-ovn-disruptive

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@andfasano
Copy link
Copy Markdown
Contributor

/approve

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andfasano

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2023
numWorkers++
}
}
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will fail spuriously in a case like:

- role: ""
- role: ""
- role: master
- role: master
- role: master

We need to iterate over the hosts twice, once to count the explicit roles and once for the implicit ones.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the original validation was taken from the #5205, where the very same test scenario was addressed by having the hosts sorted by role before applying the validation.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andfasano, yes it was based on the baremetal validation. As this list will be sorted I think we can use the one loop as it currently has and not need to iterate over the hosts a 2nd time.

// If not defined, the roles will be matched to replicas
if numMasters < numRequiredMasters {
numMasters++
} else if numWorkers < numRequiredWorkers {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be unconditional if we want to catch the case where there are more hosts defined than replicas.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. As this PR is auto cherry-pick of #7059 I'll create a new master PR/bug and fix it there, then cherry-pick that.

if numMasters != numRequiredMasters {
logrus.Warnf("The number of hosts configured as masters (%d) does not match the master replicas (%d)", numMasters, numRequiredMasters)
}
if numWorkers != numRequiredWorkers {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that defining hosts is completely optional, I don't think we should have a warning in the case where numWorkers == 0 (and likewise for numMasters == 0 above).
It would be really unusual to define host-specific data for only some hosts in a particular role, so I think in those cases it makes sense to have a warning.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a check for numWorkers/numMasters != 0. Per the previous comment I'll add this a new master PR.

@bfournie
Copy link
Copy Markdown
Contributor

/jira refresh

@openshift-ci-robot openshift-ci-robot added jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. and removed jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jun 26, 2023
@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@bfournie: This pull request references Jira Issue OCPBUGS-14432, which is valid.

6 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.z) matches configured target version for branch (4.13.z)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
  • dependent bug Jira Issue OCPBUGS-10342 is in the state Verified, which is one of the valid states (VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE))
  • dependent Jira Issue OCPBUGS-10342 targets the "4.14.0" version, which is one of the valid target versions: 4.14.0
  • bug has dependents

Requesting review from QA contact:
/cc @mhanss

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci Bot requested a review from mhanss June 26, 2023 17:44
@zaneb
Copy link
Copy Markdown
Member

zaneb commented Jun 26, 2023

/hold Let's fix the issues with this on master first, and backport everything at the same time.

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2023
@bfournie
Copy link
Copy Markdown
Contributor

This backport is now included in #7423 along with the backport for #7268. Closing this
/close

@openshift-ci openshift-ci Bot closed this Aug 15, 2023
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Aug 15, 2023

@bfournie: Closed this PR.

Details

In response to this:

This backport is now included in #7423 along with the backport for #7268. Closing this
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants