SREP-715: Using the new --dump-guest-cluster-through-kube-service option of the hypershift CLI when dumping dataplane content for must-gathers #883

Open
Nikokolas3270 wants to merge 1 commit into openshift:master from Nikokolas3270:SREP-715

Nikokolas3270 commented Apr 30, 2026

Related changes:

What this PR does / why we need it:

Port forwarding is not possible because debug handlers are disabled on MC clusters. As a result, data plane content (including worker node logs) currently fails to be dumped.

The new --dump-guest-cluster-through-kube-service option can be used to bypass that limitation by targeting the kube-apiserver service exposed by HCP namespaces.
Note that this option is only suitable when:

  • The command runs within an MC cluster (i.e. in a pod which has access to the service)
  • The MC cluster has the debug handlers disabled
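
As a rough illustration of the two conditions above, here is a minimal Go sketch that appends the option only when both hold. Reading the conditions as jointly required is an assumption, and the `dumpEnv` type, the `withKubeServiceFlag` helper and the placeholder base argument are hypothetical names used for illustration; only the flag name itself comes from this description.

```go
package main

import "fmt"

// kubeServiceFlag is the option named in this PR description.
const kubeServiceFlag = "--dump-guest-cluster-through-kube-service"

// dumpEnv is a hypothetical description of where the dump is executed.
type dumpEnv struct {
	insideMC              bool // running in a pod on the MC cluster, with access to the service
	debugHandlersDisabled bool // the MC cluster has its debug handlers disabled
}

// withKubeServiceFlag returns a copy of baseArgs, with the flag appended
// only when both conditions hold (assumed reading of the list above).
func withKubeServiceFlag(env dumpEnv, baseArgs []string) []string {
	args := append([]string{}, baseArgs...)
	if env.insideMC && env.debugHandlersDisabled {
		args = append(args, kubeServiceFlag)
	}
	return args
}

func main() {
	// "--example-arg" is a placeholder, not a real hypershift argument.
	base := []string{"--example-arg"}
	fmt.Println(withKubeServiceFlag(dumpEnv{insideMC: true, debugHandlersDisabled: true}, base))
	fmt.Println(withKubeServiceFlag(dumpEnv{insideMC: true, debugHandlersDisabled: false}, base))
}
```

Returning a copy keeps the caller's slice untouched, so the same base arguments can be reused for runs where port forwarding still works.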

Which issue(s) this PR fixes
Contributes to SREP-715

Disclaimer

Note that using the quay.io/stolostron/must-gather:latest image is only a temporary solution. The long-term fix is to use the registry.redhat.io/rhacm2/acm-must-gather-rhel9:v2.17 image, which will at least pin the code branch in use. Unfortunately, this "release" image does not exist yet, and the proposed image is the only one with an acceptable lifetime.

The long-term fix is tracked here:
https://redhat.atlassian.net/browse/SREP-4777

Summary by CodeRabbit

  • Chores
    • Updated the container image used for must-gather diagnostics to use the latest image by default. The hosted/HyperShift must-gather now follows the configured must-gather image, ensuring consistent diagnostic image selection across environments.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 30, 2026

openshift-ci-robot commented Apr 30, 2026

@Nikokolas3270: This pull request references SREP-715 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "5.0.0" version, but no target version was set.


coderabbitai Bot commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 91332a5c-f62e-4f89-91f6-2dc65d43f801

📥 Commits

Reviewing files that changed from the base of the PR and between 8d3cdc3 and 9b6734e.

📒 Files selected for processing (1)
  • cmd/hcp/mustgather/mustGather.go

Walkthrough

Change the default --acm_image from a pinned must-gather tag to quay.io/stolostron/must-gather:latest, and make the HCP gather path pass mg.acmMustGatherImage to createMustGather so the --acm_image flag controls the image used for HCP must-gather.
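
The wiring described in this walkthrough can be sketched roughly as follows. This is a simplified illustration, not the actual osdctl code: it uses the standard library `flag` package for self-containment (the real command may register flags differently), the `newMustGather` and `hcpGatherArgs` helpers are hypothetical, and the gather script and destination directory values are placeholders. The names `defaultAcmImage`, `mg.acmMustGatherImage`, `--acm_image`, `--dest-dir=` and `--image=` are taken from the review text.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultAcmImage mirrors the new default described in the walkthrough.
const defaultAcmImage = "quay.io/stolostron/must-gather:latest"

// mustGather holds only the field relevant to this sketch.
type mustGather struct {
	acmMustGatherImage string
}

// newMustGather (hypothetical helper) registers --acm_image with the new
// default and parses the given command-line arguments.
func newMustGather(args []string) (*mustGather, error) {
	mg := &mustGather{}
	fs := flag.NewFlagSet("must-gather", flag.ContinueOnError)
	fs.StringVar(&mg.acmMustGatherImage, "acm_image", defaultAcmImage,
		"image used for ACM and HCP must-gathers")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return mg, nil
}

// hcpGatherArgs (hypothetical helper) builds the arguments passed to
// createMustGather in the hcp case, using the configurable image rather
// than a hardcoded one.
func (mg *mustGather) hcpGatherArgs(destDir, gatherScript string) []string {
	return []string{
		"--dest-dir=" + destDir,
		"--image=" + mg.acmMustGatherImage,
		gatherScript,
	}
}

func main() {
	// example.test/must-gather:pinned is a made-up image reference.
	mg, err := newMustGather([]string{"--acm_image=example.test/must-gather:pinned"})
	if err != nil {
		panic(err)
	}
	fmt.Println(mg.hcpGatherArgs("/tmp/out", "gather-script-placeholder"))
}
```

With this shape, omitting `--acm_image` yields the `:latest` default, while passing it overrides the image for the HCP gather as well.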

Changes

Must-Gather Image Update

| Layer / File(s) | Summary |
| --- | --- |
| Default value (cmd/hcp/mustgather/mustGather.go) | defaultAcmImage changed from a pinned snapshot tag to quay.io/stolostron/must-gather:latest. |
| HCP invocation wiring (cmd/hcp/mustgather/mustGather.go) | In the hcp case, createMustGather now receives --image= set to mg.acmMustGatherImage instead of a hardcoded acmHyperShiftImage, so the --acm_image flag affects HCP must-gather image selection. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title mentions the --dump-guest-cluster-through-kube-service option and hypershift CLI changes, but the actual code changes update the --acm_image default value and how it controls the HCP must-gather image, which is not the primary focus of the title. | Update the title to reflect the actual changes: updating the default --acm_image value and ensuring the flag controls the HCP must-gather image, or provide clarification on how the title relates to these specific code modifications. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (10 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | No Ginkgo test definitions found in modified files; standard Go testing package used instead. |
| Test Structure And Quality | ✅ Passed | The custom check assesses Ginkgo test code quality, but this PR uses standard Go testing with testify assertions, not Ginkgo. |
| Microshift Test Compatibility | ✅ Passed | PR does not add new Ginkgo e2e tests; changes are limited to command-line tool argument handling in mustGather.go. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR adds only standard Go unit tests with testify assertions; no Ginkgo e2e tests found that could have cluster topology assumptions. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | This pull request modifies a CLI diagnostic tool, not deployment manifests or controllers. Changes only update a default container image URL and pass it as a flag to external commands. No Kubernetes workload scheduling logic is introduced. |
| Ote Binary Stdout Contract | ✅ Passed | The OTE Binary Stdout Contract check does not apply to osdctl, a standalone CLI tool not part of openshift-tests or OTE infrastructure. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR does not add any new Ginkgo e2e tests. The modified test file contains only unit tests using Go's standard testing package, not Ginkgo e2e tests. |


openshift-ci Bot commented Apr 30, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Nikokolas3270

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 30, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/hcp/mustgather/mustGather.go`:
- Around line 198-200: The HCP must-gather call hardcodes acmHyperShiftImage
("quay.io/stolostron/must-gather:latest") instead of using the configurable
image from the flag; change the image passed to createMustGather to use the
existing configuration value mg.acmMustGatherImage (the same value used earlier
for other gather targets) so the --acm_image flag is honored; update the
acmHyperShiftImage assignment or inline the value when building the args for
createMustGather (which is called with mcRestCfg, mcK8sCli,
[]string{"--dest-dir="+destDir, "--image="+<use mg.acmMustGatherImage>,
gatherScript}) to reference mg.acmMustGatherImage and leave gatherScript,
hcNamespace, hcName, and destDir unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 79676da4-8332-45c8-bfe0-0af1e1f796b0

📥 Commits

Reviewing files that changed from the base of the PR and between a875df0 and 8d3cdc3.

📒 Files selected for processing (1)
  • cmd/hcp/mustgather/mustGather.go

Comment thread cmd/hcp/mustgather/mustGather.go Outdated

// TODO(ACM-16170): replace this with an official ACM release image once it's available
- acmHyperShiftImage := "quay.io/rokejungrh/must-gather:v2.13.0-33-linux"
+ acmHyperShiftImage := "quay.io/stolostron/must-gather:latest"

@rolandmkunkel: to be checked.

Here is the script run through the image:
https://github.com/stolostron/must-gather/blob/main/collection-scripts/gather_spoke_logs

At some point this script will call this function:
https://github.com/stolostron/must-gather/blob/main/collection-scripts/gather_utils#L111

It remains to be checked whether CAD does the same, or at least something similar.
It would be nice to see whether we can converge on a common approach to must-gathers for CAD and osdctl. This exceeds the scope of SREP-715, so I will create a new ticket to study it further.

…ion of the hypershift CLI when dumping dataplane content for must-gathers

openshift-ci Bot commented May 4, 2026

@Nikokolas3270: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/verify-docs | 9b6734e | link | true | /test verify-docs |

