OCPSTRAT-2321:Adding TLS profile observed ci#77236
OCPSTRAT-2321:Adding TLS profile observed ci#77236gangwgr wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
/testwith openshift/origin/main/e2e-aws-tls-observed-config openshift/origin#30801 |
|
@gangwgr, |
|
/p-rehearse |
|
/pj-rehearse ack |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@gangwgr: This pull request references OCPSTRAT-2321 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the outcome to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Code Review: OCPSTRAT-2321 - Adding TLS profile observed CI
Summary
This PR adds CI jobs for testing TLS configuration propagation across OpenShift components. While the YAML configuration is syntactically correct and follows OpenShift CI conventions, there are critical resource efficiency concerns that should be addressed before merging.
❌ Critical Issues
1. Resource Waste: Dedicated Cluster for Small Test Suite
The hypershift-aws-conformance workflow defaults to running the full openshift/conformance/parallel suite, but this PR overrides it to run only openshift/tls-observed-config.
Impact:
- Provisions entire HyperShift cluster (~10-15 min)
- Runs only ~90 TLS-specific tests (~40 min)
- Tears down cluster (~5 min)
- Total: ~1 hour of infrastructure for a small subset of tests
Test suite breakdown (from origin#30801):
- 11 target components (image-registry, controller-manager, kube-apiserver, etc.)
- 8 test cases per target (ObservedConfig, ConfigMap injection, env vars, wire-level TLS)
- 2 cluster-wide disruptive tests (Modern/Custom TLS profile changes)
- Total: ~90 test cases
Questions:
- Why does
openshift/tls-observed-configneed dedicated cluster infrastructure instead of running as part of existing conformance jobs? - Have you considered the cost/benefit ratio of provisioning entire clusters for this test suite?
Recommendations:
- Option A (Preferred): Include
openshift/tls-observed-configin the regularopenshift/conformance/parallelsuite - Option B: Combine multiple test suites in one job to amortize cluster provisioning costs
- Option C: If isolation is required, document the justification in the PR description
2. Inefficient Test Architecture: In-Test Profile Switching
The test suite includes 2 disruptive tests that:
- Change cluster TLS profile to Modern
- Wait for cluster rollout (~20-30 min)
- Validate all targets
- Restore original profile
- Wait for rollout again (~20-30 min)
- Repeat for Custom profile
This is extremely wasteful. Instead of changing profiles mid-test and waiting for rollouts, consider pre-configuring different test environments with different TLS profiles:
# Job 1: Default/Intermediate profile
- as: e2e-aws-tls-observed-config
env:
TEST_SUITE: openshift/tls-observed-config
# Job 2: Modern TLS profile (pre-configured)
- as: e2e-aws-tls-observed-config-modern
env:
TEST_SUITE: openshift/tls-observed-config
# Add pre-install step to configure Modern TLS profile
# Job 3: Custom TLS profile (pre-configured)
- as: e2e-aws-tls-observed-config-custom
env:
TEST_SUITE: openshift/tls-observed-config
# Add pre-install step to configure Custom TLS profileBenefits:
- ✅ Parallel execution - All 3 profiles tested simultaneously
- ✅ No disruptive changes - Cluster pre-configured, no rollout waiting
- ✅ Faster execution - Each job ~20 min instead of 60+ min
- ✅ Better isolation - Profile-specific failures isolated
- ✅ Easier debugging - Clear separation of concerns
Note: OpenShift CI already has precedent for this - see openshift-e2e-aws-ovn-tls-13-workflow.yaml which uses a tls-13 pre-install step.
⚠️ Questions & Clarifications Needed
3. Observer Configuration Inconsistency
# e2e-aws-tls-observed-config
observers:
enable:
- observers-resource-watch
# e2e-hypershift-tls-observed-config
# NO observers sectionQuestion: Why is observers-resource-watch only enabled on the AWS job and not on HyperShift? Is this intentional or an oversight?
4. Resource Allocation
Both Prow jobs specify:
resources:
requests:
cpu: 10mQuestion: Is 10m CPU sufficient for e2e tests? This seems very low for test execution that provisions clusters and runs 90+ test cases.
📝 Recommendations Summary
Before merging, please address:
- Justify dedicated cluster infrastructure or modify to run in existing conformance jobs
- Consider pre-configured TLS profile environments instead of in-test profile switching
- Clarify observer configuration difference between AWS and HyperShift
- Verify CPU resource allocation is appropriate
Optional improvements:
- Add comments in YAML explaining why tests are
optional: trueandalways_run: false - Document relationship to OCPSTRAT-2321 and the test architecture decisions
Related References
- Upstream test implementation: openshift/origin#30801
- Existing TLS profile workflow:
openshift-e2e-aws-ovn-tls-13-workflow.yaml
dace9dc to
268539c
Compare
| steps: | ||
| cluster_profile: openshift-org-aws | ||
| env: | ||
| TEST_SUITE: openshift/tls-observed-config |
There was a problem hiding this comment.
See line 77. We needed to start using large compute nodes so that we could process the scans faster by running them in parallel.
Did you experience issues with long run times when you tried this?
| steps: | ||
| cluster_profile: hypershift-aws | ||
| env: | ||
| TEST_SUITE: openshift/tls-observed-config |
|
Is there any reason against using this over the tls-scanner CI step? The test suite looks like it would be more declarative than the tls-scanner CI step that observes the entire cluster. |
The tls-observed-config suite is complementary to the tls-scanner, not a replacement. The scanner does a broad cluster-wide audit of actual TLS versions/ciphers on every pod. The observed-config tests validate the propagation chain — that operators correctly observe the APIServer TLS profile, inject it into ConfigMaps, set deployment env vars, and actually serve the right TLS version at the wire level. It also tests dynamic config changes (switching to Modern/Custom profiles and verifying propagation + self-healing). The scanner wouldn't catch a broken propagation path if the default TLS version happens to be correct. |
|
@richardsonnick also we can add one step in your job to run these cases |
Adding a new ci job is probably best, since this would add a considerable runtime overhead cost for scanning the whole cluster + running these more targeted tests |
can you please approve this pr? so we can merge this pr |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config-hypershift |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/test periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift openshift/origin#31046 |
|
@gangwgr: The specified target(s) for The following commands are available to trigger optional jobs: Use DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/testwith periodic-ci-openshift-tls-scanner-main-periodic-tls-observed-config-hypershift openshift/origin#31046 |
|
@gangwgr, |
|
/testwith openshift/tls-scanner/main/periodic-tls-observed-config-hypershift openshift/origin#31046 |
|
@gangwgr, |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse periodic-ci-openshift-tls-scanner-release-4.22-periodic-tls-observed-config |
|
@gangwgr: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
@gangwgr: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add tls-observed-config presubmit and periodic jobs to tls-scanner for main and release-4.22. Add hypershift base images required by hypershift-aws-conformance workflow. Made-with: Cursor
955ace7 to
b696148
Compare
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
Adding ci jobs for openshift/origin#30801
Summary by CodeRabbit