OCPSTRAT-2321:Adding TLS profile observed ci #77236

Open
gangwgr wants to merge 1 commit into openshift:main from gangwgr:tls-ci-observer

@gangwgr gangwgr commented Apr 1, 2026

Adding ci jobs for openshift/origin#30801

Summary by CodeRabbit

  • Tests
    • Added TLS observed-config test suite for standard and Hypershift environments.
    • Enabled periodic automated testing (72-hour intervals) for TLS configuration validation.
    • Added pre-submission validation checks for TLS configuration across main and release branches.
    • Expanded test coverage for default TLS and TLS 1.3 conformance scenarios.

@openshift-ci openshift-ci Bot requested review from p0lyn0mial and sjenning April 1, 2026 11:33
@gangwgr gangwgr changed the title Adding TLS profile observed test cases Adding TLS profile observed ci Apr 1, 2026

gangwgr commented Apr 1, 2026

/testwith openshift/origin/main/e2e-aws-tls-observed-config openshift/origin#30801

openshift-ci Bot commented Apr 1, 2026

@gangwgr, testwith: could not generate prow job. ERROR:

BUG: test 'e2e-aws-tls-observed-config' not found in injected config

gangwgr commented Apr 2, 2026

/p-rehearse

gangwgr commented Apr 2, 2026

/pj-rehearse ack

@openshift-ci-robot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-ci-robot openshift-ci-robot added the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Apr 2, 2026
@gangwgr gangwgr changed the title Adding TLS profile observed ci OCPSTRAT-2321:Adding TLS profile observed ci Apr 2, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 2, 2026

openshift-ci-robot commented Apr 2, 2026

@gangwgr: This pull request references OCPSTRAT-2321 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the outcome to target the "4.22.0" version, but no target version was set.

@wangke19 wangke19 left a comment

Code Review: OCPSTRAT-2321 - Adding TLS profile observed CI

Summary

This PR adds CI jobs for testing TLS configuration propagation across OpenShift components. While the YAML configuration is syntactically correct and follows OpenShift CI conventions, there are critical resource efficiency concerns that should be addressed before merging.


❌ Critical Issues

1. Resource Waste: Dedicated Cluster for Small Test Suite

The hypershift-aws-conformance workflow defaults to running the full openshift/conformance/parallel suite, but this PR overrides it to run only openshift/tls-observed-config.

Impact:

  • Provisions entire HyperShift cluster (~10-15 min)
  • Runs only ~90 TLS-specific tests (~40 min)
  • Tears down cluster (~5 min)
  • Total: ~1 hour of infrastructure for a small subset of tests

Test suite breakdown (from origin#30801):

  • 11 target components (image-registry, controller-manager, kube-apiserver, etc.)
  • 8 test cases per target (ObservedConfig, ConfigMap injection, env vars, wire-level TLS)
  • 2 cluster-wide disruptive tests (Modern/Custom TLS profile changes)
  • Total: ~90 test cases
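
The totals quoted in this comment can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and not taken from the PR, and the numbers are the reviewer's estimates rather than measured values.

```python
# Counts quoted in the review comment (estimates, not measurements).
targets = 11                  # components under test
cases_per_target = 8          # test cases per component
cluster_wide_disruptive = 2   # Modern and Custom profile tests

total_cases = targets * cases_per_target + cluster_wide_disruptive

# Approximate job phases in minutes, as (low, high) ranges.
phases = {
    "provision": (10, 15),
    "run_tests": (40, 40),
    "teardown": (5, 5),
}
total_low = sum(low for low, _ in phases.values())
total_high = sum(high for _, high in phases.values())

print(f"test cases: {total_cases}")
print(f"job duration: {total_low}-{total_high} min")
```

Under these estimates the suite has 90 cases and one job occupies a cluster for roughly 55 to 60 minutes, which matches the "~90" and "~1 hour" figures in the comment.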

Questions:

  1. Why does openshift/tls-observed-config need dedicated cluster infrastructure instead of running as part of existing conformance jobs?
  2. Have you considered the cost/benefit ratio of provisioning entire clusters for this test suite?

Recommendations:

  • Option A (Preferred): Include openshift/tls-observed-config in the regular openshift/conformance/parallel suite
  • Option B: Combine multiple test suites in one job to amortize cluster provisioning costs
  • Option C: If isolation is required, document the justification in the PR description

2. Inefficient Test Architecture: In-Test Profile Switching

The test suite includes 2 disruptive tests that:

  1. Change cluster TLS profile to Modern
  2. Wait for cluster rollout (~20-30 min)
  3. Validate all targets
  4. Restore original profile
  5. Wait for rollout again (~20-30 min)
  6. Repeat for Custom profile

This is extremely wasteful. Instead of changing profiles mid-test and waiting for rollouts, consider pre-configuring different test environments with different TLS profiles:

# Job 1: Default/Intermediate profile
- as: e2e-aws-tls-observed-config
  env:
    TEST_SUITE: openshift/tls-observed-config

# Job 2: Modern TLS profile (pre-configured)
- as: e2e-aws-tls-observed-config-modern
  env:
    TEST_SUITE: openshift/tls-observed-config
  # Add pre-install step to configure Modern TLS profile

# Job 3: Custom TLS profile (pre-configured)
- as: e2e-aws-tls-observed-config-custom
  env:
    TEST_SUITE: openshift/tls-observed-config
  # Add pre-install step to configure Custom TLS profile

Benefits:

  • Parallel execution - All 3 profiles tested simultaneously
  • No disruptive changes - Cluster pre-configured, no rollout waiting
  • Faster execution - Each job ~20 min instead of 60+ min
  • Better isolation - Profile-specific failures isolated
  • Easier debugging - Clear separation of concerns

Note: OpenShift CI already has precedent for this - see openshift-e2e-aws-ovn-tls-13-workflow.yaml which uses a tls-13 pre-install step.


⚠️ Questions & Clarifications Needed

3. Observer Configuration Inconsistency

# e2e-aws-tls-observed-config
observers:
  enable:
  - observers-resource-watch

# e2e-hypershift-tls-observed-config
# NO observers section

Question: Why is observers-resource-watch only enabled on the AWS job and not on HyperShift? Is this intentional or an oversight?


4. Resource Allocation

Both Prow jobs specify:

resources:
  requests:
    cpu: 10m

Question: Is 10m CPU sufficient for e2e tests? This seems very low for test execution that provisions clusters and runs 90+ test cases.


📝 Recommendations Summary

Before merging, please address:

  1. Justify dedicated cluster infrastructure or modify to run in existing conformance jobs
  2. Consider pre-configured TLS profile environments instead of in-test profile switching
  3. Clarify observer configuration difference between AWS and HyperShift
  4. Verify CPU resource allocation is appropriate

Optional improvements:

  • Add comments in YAML explaining why tests are optional: true and always_run: false
  • Document relationship to OCPSTRAT-2321 and the test architecture decisions

Related References

  • Upstream test implementation: openshift/origin#30801
  • Existing TLS profile workflow: openshift-e2e-aws-ovn-tls-13-workflow.yaml

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2026
@openshift-ci openshift-ci Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 7, 2026
@openshift-ci-robot openshift-ci-robot removed the rehearsals-ack Signifies that rehearsal jobs have been acknowledged label Apr 7, 2026
@gangwgr gangwgr force-pushed the tls-ci-observer branch 4 times, most recently from dace9dc to 268539c Compare April 7, 2026 16:04
steps:
  cluster_profile: openshift-org-aws
  env:
    TEST_SUITE: openshift/tls-observed-config

See line 77. We needed to start using large compute nodes so that we could process the scans faster by running them in parallel.

Did you experience issues with long run times when you tried this?

steps:
  cluster_profile: hypershift-aws
  env:
    TEST_SUITE: openshift/tls-observed-config

Similar comment here.

@richardsonnick

Is there any reason against using this over the tls-scanner CI step? The test suite looks like it would be more declarative than the tls-scanner CI step that observes the entire cluster.

gangwgr commented Apr 10, 2026

Is there any reason against using this over the tls-scanner CI step? The test suite looks like it would be more declarative than the tls-scanner CI step that observes the entire cluster.

The tls-observed-config suite is complementary to the tls-scanner, not a replacement. The scanner does a broad cluster-wide audit of actual TLS versions/ciphers on every pod. The observed-config tests validate the propagation chain — that operators correctly observe the APIServer TLS profile, inject it into ConfigMaps, set deployment env vars, and actually serve the right TLS version at the wire level. It also tests dynamic config changes (switching to Modern/Custom profiles and verifying propagation + self-healing). The scanner wouldn't catch a broken propagation path if the default TLS version happens to be correct.
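
The last link in that chain, checking what an endpoint negotiates at the wire level, can be illustrated with a short probe. This is a minimal sketch in Python using only the standard library; it is not how the origin test suite implements the check, and the function names `build_probe_context` and `negotiated_version` are hypothetical. Certificate verification is disabled on the assumption that the probe only cares about the protocol version.

```python
import socket
import ssl
from typing import Optional


def build_probe_context(min_version: ssl.TLSVersion) -> ssl.SSLContext:
    """Build a client context that refuses anything below min_version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumption: the probe inspects only the protocol version, so
    # certificate and hostname verification are switched off.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    ctx.minimum_version = min_version
    return ctx


def negotiated_version(
    host: str,
    port: int,
    min_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2,
    timeout: float = 5.0,
) -> Optional[str]:
    """Return the negotiated protocol name, for example 'TLSv1.3'.

    Raises ssl.SSLError if the server cannot meet min_version.
    """
    ctx = build_probe_context(min_version)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_version(host, port, ssl.TLSVersion.TLSv1_3)` against an endpoint that should be serving a Modern profile would either return the negotiated version or raise `ssl.SSLError` during the handshake. The host and port are placeholders to be filled in for the service under test.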

gangwgr commented Apr 10, 2026

@richardsonnick also we can add one step in your job to run these cases

@richardsonnick

@richardsonnick also we can add one step in your job to run these cases

Adding a new ci job is probably best, since this would add a considerable runtime overhead cost for scanning the whole cluster + running these more targeted tests

gangwgr commented Apr 14, 2026

@richardsonnick also we can add one step in your job to run these cases

Adding a new ci job is probably best, since this would add a considerable runtime overhead cost for scanning the whole cluster + running these more targeted tests

Can you please approve this PR so we can merge it?

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 21, 2026

/test periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift openshift/origin#31046

openshift-ci Bot commented Apr 21, 2026

@gangwgr: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

/test app-ci-config-dry
/test boskos-config
/test boskos-config-generation
/test build03-dry
/test build04-dry
/test build05-dry
/test build06-dry
/test build07-dry
/test build08-dry
/test build09-dry
/test build10-dry
/test build11-dry
/test check-gh-automation
/test check-gh-automation-tide
/test check-trigger-trusted-apps
/test ci-operator-config
/test ci-operator-config-metadata
/test ci-operator-registry
/test ci-secret-bootstrap-config-validation
/test ci-testgrid-allow-list
/test clusterimageset-validate
/test config
/test core-ci-config-dry
/test core-valid
/test generated-config
/test generated-dashboards
/test hosted-mgmt-dry
/test image-mirroring-config-validation
/test jira-lifecycle-config
/test labels
/test openshift-image-mirror-mappings
/test ordered-prow-config
/test owners
/test pr-reminder-config
/test prow-config
/test prow-config-filenames
/test prow-config-semantics
/test pylint
/test release-config
/test release-controller-config
/test rover-groups-config-validation
/test secret-generator-config-valid
/test services-valid
/test stackrox-stackrox-stackrox-stackrox-check
/test step-registry-metadata
/test step-registry-shellcheck
/test sync-rover-groups
/test verified-config
/test vsphere02-dry
/test yamllint

The following commands are available to trigger optional jobs:

/test check-cluster-profiles-config

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-release-check-gh-automation
pull-ci-openshift-release-main-ci-operator-config
pull-ci-openshift-release-main-ci-operator-config-metadata
pull-ci-openshift-release-main-ci-operator-registry
pull-ci-openshift-release-main-config
pull-ci-openshift-release-main-core-valid
pull-ci-openshift-release-main-generated-config
pull-ci-openshift-release-main-ordered-prow-config
pull-ci-openshift-release-main-owners
pull-ci-openshift-release-main-prow-config-filenames
pull-ci-openshift-release-main-prow-config-semantics
pull-ci-openshift-release-main-release-controller-config
pull-ci-openshift-release-openshift-image-mirror-mappings
pull-ci-openshift-release-yamllint

gangwgr commented Apr 21, 2026

/testwith periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift openshift/origin#31046

openshift-ci Bot commented Apr 21, 2026

@gangwgr, testwith: Error processing request. ERROR:

could not determine job runs: requested job is invalid. needs to be formatted like: <org>/<repo>/<branch>/<variant?>/<job>. instead it was: periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift
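
The format named in this error can be illustrated with a small parser. This is a minimal sketch in Python, not the plugin's actual implementation; `JobSpec` and `parse_job_spec` are hypothetical names, and the sketch assumes that no field contains a slash of its own.

```python
from typing import NamedTuple, Optional


class JobSpec(NamedTuple):
    org: str
    repo: str
    branch: str
    variant: Optional[str]
    job: str


def parse_job_spec(spec: str) -> JobSpec:
    """Split '<org>/<repo>/<branch>/<variant?>/<job>' into its fields."""
    parts = spec.split("/")
    if not all(parts) or len(parts) not in (4, 5):
        raise ValueError(
            "expected <org>/<repo>/<branch>/<variant?>/<job>, got: " + spec
        )
    if len(parts) == 4:
        org, repo, branch, job = parts
        variant = None
    else:
        org, repo, branch, variant, job = parts
    return JobSpec(org, repo, branch, variant, job)


# The full Prow job name has no slashes, so it is rejected.
try:
    parse_job_spec(
        "periodic-ci-openshift-tls-scanner-main-"
        "periodic-tls-observed-config-hypershift"
    )
except ValueError as err:
    print("rejected:", err)

# The slash-separated form is accepted.
print(parse_job_spec(
    "openshift/tls-scanner/main/periodic-tls-observed-config-hypershift"
))
```

A well-formed spec only passes this syntactic check; whether the named test exists in the injected config is a separate question.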

gangwgr commented Apr 21, 2026

/testwith openshift/tls-scanner/main/periodic-tls-observed-config-hypershift openshift/origin#31046

openshift-ci Bot commented Apr 21, 2026

@gangwgr, testwith: could not generate prow job. ERROR:

BUG: test 'periodic-tls-observed-config-hypershift' not found in injected config

gangwgr commented Apr 22, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

gangwgr commented Apr 22, 2026

/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config

@openshift-merge-bot

@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

openshift-ci Bot commented Apr 22, 2026

@gangwgr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/rehearse/periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift | 955ace7 | link | unknown | /pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift |
| ci/rehearse/periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift | 955ace7 | link | unknown | /pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift |
| ci/rehearse/periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config | 955ace7 | link | unknown | /pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config |

Add tls-observed-config presubmit and periodic jobs to tls-scanner
for main and release-4.22. Add hypershift base images required by
hypershift-aws-conformance workflow.

Made-with: Cursor
@openshift-merge-bot

[REHEARSALNOTIFIER]
@gangwgr: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-tls-scanner-main-tls-observed-config | openshift/tls-scanner | presubmit | Presubmit changed |
| pull-ci-openshift-tls-scanner-main-tls-observed-config-hypershift | openshift/tls-scanner | presubmit | Presubmit changed |
| pull-ci-openshift-tls-scanner-release-4.22-tls-observed-config | openshift/tls-scanner | presubmit | Presubmit changed |
| pull-ci-openshift-tls-scanner-release-4.22-tls-observed-config-hypershift | openshift/tls-scanner | presubmit | Presubmit changed |
| pull-ci-openshift-tls-scanner-main-default-pqc-readiness | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-main-default-tls | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-main-images | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-main-tls13-conformance | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-main-tls13-pqc-readiness | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-main-unit | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-release-4.22-default-tls | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-release-4.22-images | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-release-4.22-tls13-conformance | openshift/tls-scanner | presubmit | Ci-operator config changed |
| pull-ci-openshift-tls-scanner-release-4.22-unit | openshift/tls-scanner | presubmit | Ci-operator config changed |
| periodic-ci-openshift-tls-scanner-main-periodic-default-tls | N/A | periodic | Ci-operator config changed |
| periodic-ci-openshift-tls-scanner-main-periodic-tls13-conformance | N/A | periodic | Ci-operator config changed |
| periodic-ci-openshift-tls-scanner-release-4.22-periodic-default-tls | N/A | periodic | Ci-operator config changed |
| periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config | N/A | periodic | Periodic changed |
| periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift | N/A | periodic | Periodic changed |
| periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift | N/A | periodic | Periodic changed |
| periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config | N/A | periodic | Periodic changed |
| periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls13-conformance | N/A | periodic | Ci-operator config changed |

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
