
[DNM] increase compute nodes for serverless lp-interop jobs #79864

Closed
maschmid wants to merge 5 commits into openshift:main from maschmid:maschmid-serverless-lp-interop-investigation

@maschmid commented May 29, 2026

Contributor

Test only: see whether increasing the node count improves the stability of the serverless lp-interop jobs.

Summary by CodeRabbit

This PR updates OpenShift CI job configuration for the OpenShift Knative serverless-operator lp-interop jobs (release 1.37 on OCP 4.22) to investigate stability by changing how AWS test jobs are provisioned.

What changed, in practical terms:

  • ci-operator configuration file updated: ci-operator/config/openshift-knative/serverless-operator/openshift-knative-serverless-operator-release-1.37__ocp-4.22-lp-interop.yaml
  • Two AWS-based lp-interop jobs (the cr-operator-e2e-aws and aws-fips variants) now:
    • Set COMPUTE_NODE_REPLICAS: "6" — increase compute node count for the job.
    • Set ZONES_COUNT: "1" — constrain the job to a single availability zone.
    • Append SKIP_SPOT_INSTANCES=true to relevant test/command steps to avoid spot instances.
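The following is a rough sketch of where these settings sit in a ci-operator `tests` entry. It is not the actual diff. Only `COMPUTE_NODE_REPLICAS`, `ZONES_COUNT`, and the `SKIP_SPOT_INSTANCES=true` prefix come from this PR. The cluster profile, step name, image, make target, and workflow are hypothetical placeholders. The two `env` values are assumed to be consumed by the workflow's AWS install-configuration steps rather than by the test commands themselves.

```yaml
tests:
- as: cr-operator-e2e-aws
  steps:
    # Hypothetical cluster profile name.
    cluster_profile: aws-example-profile
    env:
      # From this PR: more compute nodes than the install step's default.
      COMPUTE_NODE_REPLICAS: "6"
      # From this PR: provision every node in one availability zone.
      ZONES_COUNT: "1"
    test:
    # Hypothetical step name, image, and make target.
    - as: operator-e2e
      # The VAR=value prefix sets SKIP_SPOT_INSTANCES only for this command.
      commands: SKIP_SPOT_INSTANCES=true make test-e2e
      from: serverless-source-image
      resources:
        requests:
          cpu: 100m
    # Hypothetical workflow name.
    workflow: example-lp-interop-aws-workflow
```

The same `env` and command changes apply to the aws-fips job.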

Why:

  • Test-only change to determine whether increasing compute node count, avoiding spot instances, and restricting to a single zone improves stability of the serverless lp-interop CI runs.

Status and notes:

  • The author has put the PR on hold, and rehearsal runs have been triggered for the affected periodic jobs to validate the changes.
  • Configuration-only YAML changes; no code or public API changes.

@maschmid

Contributor Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 29, 2026
coderabbitai Bot commented May 29, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds COMPUTE_NODE_REPLICAS: "6" and ZONES_COUNT: "1" to two AWS test jobs and appends SKIP_SPOT_INSTANCES=true to several e2e test command lines in the serverless-operator release CI YAML.

Changes

AWS Test Job Environment and Command Updates

| Layer | File(s) | Summary |
|---|---|---|
| cr-operator-e2e-aws env and command updates | ci-operator/config/openshift-knative/serverless-operator/openshift-knative-serverless-operator-release-1.37__ocp-4.22-lp-interop.yaml | Adds COMPUTE_NODE_REPLICAS: "6" and ZONES_COUNT: "1" to the cr-operator-e2e-aws job env and appends SKIP_SPOT_INSTANCES=true to multiple operator-e2e/e2e test commands. |
| aws-fips env and command updates | ci-operator/config/openshift-knative/serverless-operator/openshift-knative-serverless-operator-release-1.37__ocp-4.22-lp-interop.yaml | Adds COMPUTE_NODE_REPLICAS: "6" and ZONES_COUNT: "1" to the aws-fips job env and appends SKIP_SPOT_INSTANCES=true to multiple operator-e2e/e2e test commands. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

suggested labels: rehearsals-ack

🚥 Pre-merge checks | ✅ 5 | ❌ 10

❌ Failed checks (10 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Stable And Deterministic Test Names | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Test Structure And Quality | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Microshift Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Single Node Openshift (Sno) Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Topology-Aware Scheduling Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Ote Binary Stdout Contract | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Ipv6 And Disconnected Network Test Compatibility | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| No-Weak-Crypto | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| Container-Privileges | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |
| No-Sensitive-Data-In-Logs | ❓ Inconclusive | Repository clone failed, so this custom check could not run with code access. | Retry the review run. If this persists, inspect pre-merge custom-check logs for infrastructure or agent runtime failures. |

✅ Passed checks (5 passed)
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[DNM] increase compute nodes for serverless lp-interop jobs' accurately reflects the main change: updating compute node configuration (COMPUTE_NODE_REPLICAS: "6") in the lp-interop YAML file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.



@openshift-ci openshift-ci Bot requested review from creydr and matzew May 29, 2026 10:18
openshift-ci Bot commented May 29, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: maschmid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 29, 2026
@maschmid

Contributor Author

/pj-rehearse

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

so that SO `use_spot_instances` does not need to replace the machinesets
@maschmid commented Jun 1, 2026

Contributor Author

/pj-rehearse

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@maschmid commented Jun 1, 2026

Contributor Author

/pj-rehearse periodic-ci-openshift-knative-serverless-operator-release-1.37-ocp-4.22-lp-interop-cr-operator-e2e-aws

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@maschmid commented Jun 2, 2026

Contributor Author

/pj-rehearse

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@maschmid commented Jun 3, 2026

Contributor Author

/pj-rehearse

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@maschmid commented Jun 3, 2026

Contributor Author

/pj-rehearse

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
ci-operator/config/openshift-knative/serverless-operator/openshift-knative-serverless-operator-release-1.37__ocp-4.22-lp-interop.yaml (1)

Line 174: Single-zone deployment reduces redundancy.

Setting ZONES_COUNT: "1" means all 6 compute nodes will be provisioned in a single availability zone, eliminating multi-AZ redundancy. This is reasonable for a stability investigation (reduces network variability), but be aware that this configuration differs from typical production multi-zone deployments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/config/openshift-knative/serverless-operator/openshift-knative-serverless-operator-release-1.37__ocp-4.22-lp-interop.yaml`
at line 174: the manifest currently sets ZONES_COUNT: "1", which forces all
compute nodes into a single availability zone and removes multi-AZ redundancy;
if this was accidental, change ZONES_COUNT to "3" (or the cluster's expected AZ
count) to restore multi-zone redundancy, otherwise explicitly document the
intent by adding a comment/annotation near the ZONES_COUNT setting indicating
this is a deliberate single-zone configuration for stability testing so
reviewers know it is intentional.
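The reviewer's second option (keeping the single-zone setting and documenting the intent) could look roughly like the fragment below. This is a sketch only. The surrounding `env:` key is assumed to match the job's existing block, and the comment wording is hypothetical.

```yaml
env:
  COMPUTE_NODE_REPLICAS: "6"
  # Deliberate single-zone configuration for the lp-interop stability
  # investigation; this is not a production-like multi-AZ topology.
  ZONES_COUNT: "1"
```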

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 63536e7c-3097-42a0-8b97-bcadd5934aab

📥 Commits

Reviewing files that changed from the base of the PR and between 60798f6 and 6631bf1.

📒 Files selected for processing (1)
  • ci-operator/config/openshift-knative/serverless-operator/openshift-knative-serverless-operator-release-1.37__ocp-4.22-lp-interop.yaml

@openshift-merge-bot

Contributor

[REHEARSALNOTIFIER]
@maschmid: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
|---|---|---|---|
| periodic-ci-openshift-knative-serverless-operator-release-1.37-ocp-4.22-lp-interop-aws-fips | N/A | periodic | Ci-operator config changed |
| periodic-ci-openshift-knative-serverless-operator-release-1.37-ocp-4.22-lp-interop-cr-operator-e2e-aws | N/A | periodic | Ci-operator config changed |
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@maschmid commented Jun 3, 2026

Contributor Author

/pj-rehearse

@openshift-merge-bot

Contributor

@maschmid: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci Bot commented Jun 3, 2026

Contributor

@maschmid: all tests passed!


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@maschmid closed this Jun 4, 2026

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
