Skip to content

ROSAENG-2066: Add osdctl account aws-creds command#913

Open
nephomaniac wants to merge 1 commit into
openshift:masterfrom
nephomaniac:ROSAENG-2066-aws-creds-diagnostic
Open

ROSAENG-2066: Add osdctl account aws-creds command#913
nephomaniac wants to merge 1 commit into
openshift:masterfrom
nephomaniac:ROSAENG-2066-aws-creds-diagnostic

Conversation

@nephomaniac

@nephomaniac nephomaniac commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Summary

This PR is a continuation of the work originally started in #601 (July 2024) to port the aws-creds-rotate.sh bash script into osdctl. Since then, #828 added multi-environment OCM/backplane support (--hive-ocm-url), and #878 extracted SimulatePrincipalPolicy, --dry-run, --cluster-id, and SyncSet-based credential syncing into pkg/controller/rotatesecret.go. This PR refactors the original #601 work to incorporate the updates from #828 and #878, adding a diagnostic-first aws-creds command group with additional robustness and UX improvements.

Note that #878 already addresses the core requirements of ROSAENG-2066 and ROSAENG-2067. Reviewers should assess whether the enhancements below justify the added scope.

References: ROSAENG-2066, ROSAENG-2067

Built on: #601 (original), #828 (multi-env OCM), #878 (SREP-726, SimulatePrincipalPolicy)

What this PR adds beyond #878

Adds osdctl account aws-creds with two subcommands:

  • snapshot — read-only diagnostic report of AWS IAM credentials, Hive secrets, CredentialRequests, and IAM permission simulation
  • rotate — credential rotation for osdManagedAdmin and/or osdCcsAdmin with pre-flight checks, interactive key management, sequential CR secret recreation with CCO health verification, and post-rotation validation

The existing rotate-secret command remains functional with a runtime warning pointing users to aws-creds.

Capability rotate-secret (post-#878) aws-creds (this PR)
Accept cluster ID Yes (--cluster-id) Yes (-C)
SimulatePrincipalPolicy preflight Yes Yes
Dry-run mode Yes Yes
Read-only diagnostic snapshot No Yes
Backplane AWS credentials (no --aws-profile) No — requires profile Yes
Interactive key deletion (max 2 keys) No — errors out Yes
CCS-only rotation (independent of managed-admin) No Yes
CR secret staleness detection (key ID comparison) No Yes
Sequential CR secret recreation with CCO verification No — bulk delete Yes
Post-rotation validation No Yes
Sensitive data redaction in logs No Yes
IO stream plumbing for testability No Yes
Context propagation for cancellation No Yes

Change Scope

Category Files Lines Added Lines Removed
New Go files 5 (aws-creds.go, aws-creds-rotate.go, aws-creds-snapshot.go, awscreds.go, awscreds_test.go) 3,373 0
Edited Go files 4 (cmd.go, rotate-secret.go, rotatesecret.go, rotatesecret_test.go) 1,215 226
New docs (md) 3 (aws-creds.md, aws-creds_rotate.md, aws-creds_snapshot.md) 180 0
Edited docs (md) 3 (README.md, osdctl_account.md, rotate-secret.md) 127 32
Total 15 files 4,895 258

Usage

# Read-only snapshot (safe, no changes)
osdctl account aws-creds snapshot -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod

# CR secrets only (fastest, no AWS client needed)
osdctl account aws-creds snapshot -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod --cr-secrets

# Dry-run rotation
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --hive-ocm-url prod --managed-admin --dry-run

# Rotate osdManagedAdmin
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --managed-admin

# Rotate CCS admin only
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --ccs-admin

# Rotate both
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --managed-admin --ccs-admin

# Refresh CR secrets only (no key rotation)
osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "OHSS-12345" --refresh-secrets

Example Output

osdctl account aws-creds snapshot
$ osdctl account aws-creds snapshot -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "ROSAENG-2066 testing" --hive-ocm-url prod

INFO Creating OCM connection
INFO Fetching cluster info from OCM    cluster=1abc2def3ghi4jkl5mno6pqr7stu8vwx
INFO Cluster resolved                  id=1abc2def3ghi4jkl5mno6pqr7stu8vwx name=my-test-cluster

==========================================================================
 Cluster: my-test-cluster (1abc2def3ghi4jkl5mno6pqr7stu8vwx)
 External ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
 CCS/BYOC: Yes
 Reason: ROSAENG-2066 testing
==========================================================================
Continue? (y/N): y
INFO Connecting to Hive cluster
INFO Account claim resolved           claim=osd-creds-mgmt-abc123
INFO Setting up AWS client
INFO Using backplane for AWS credentials (no --aws-profile specified)
INFO Connecting to managed cluster via backplane

IAM Permission Check (SimulatePrincipalPolicy)
  SimulatePrincipalPolicy for user: osdManagedAdmin-abc123

  Rotation tooling (IAM key management):
  ACTION                         RESULT
  ------------------------------ ----------
  iam:CreateAccessKey            Allow
  iam:DeleteAccessKey            Allow
  iam:GetUser                    Allow
  iam:ListAccessKeys             Allow
  ...                            (10 actions, all Allow)

  CredentialRequest actions (required by cluster operators):
  ACTION                                     RESULT     CREDENTIAL REQUEST(S)
  ------------------------------------------ ---------- ----------------------------------------
  ec2:RunInstances                           Allow      openshift-machine-api-aws
  s3:GetObject                               Allow      openshift-image-registry, velero-iam-credentials
  kms:Encrypt                                Allow      aws-ebs-csi-driver-operator, openshift-machine-api-aws
  route53:ChangeResourceRecordSets           Allow      openshift-ingress, cloud-ingress-operator-credentials-aws
  ...                                                   (90+ actions, all Allow)

  SimulatePrincipalPolicy for user: osdCcsAdmin (parent user, manages osdManagedAdmin)
  [OK] osdCcsAdmin has required rotation and CR permissions.

  Additional OSD CCS required services (SCP coverage):
  [OK] SCP allows all required OSD CCS services.

  [OK] No SCP or IAM policy restrictions detected.

==========================================================================
 AWS Account: 123456789012    Account CR: osd-creds-mgmt-abc123
==========================================================================

Credential Request Secrets (managed cluster)
  CREDENTIAL REQUEST                       NAMESPACE                    SECRET                   AGE      KEY STATUS
  ---------------------------------------- ---------------------------- ------------------------ -------- --------------
  aws-ebs-csi-driver-operator              openshift-cluster-csi-driv.. ebs-cloud-credentials    6m       CURRENT
  cloud-credential-operator-iam-ro         openshift-cloud-credential.. cloud-credential-opera.. 5m       CURRENT
  openshift-image-registry                 openshift-image-registry     installer-cloud-creden.. 5m       CURRENT
  openshift-ingress                        openshift-ingress-operator   cloud-credentials        5m       CURRENT
  openshift-machine-api-aws                openshift-machine-api        aws-cloud-credentials    5m       CURRENT
  ...                                                                                                     (9 CRs total)

AWS Credentials
  USER                     KEY ID           AGE    STATUS   REF'd by HIVE SECRET(S)                SYNC
  ------------------------ ---------------- ------ -------- -------------------------------------- ------
  osdManagedAdmin-abc123  AKIA...XXXX      25m    Active   osd-creds-mgmt-abc123-secret, aws             OK
  osdManagedAdmin-abc123  AKIA...YYYY      2h     Active   (not referenced by this cluster)      --
  osdCcsAdmin              AKIA...ZZZZ      24m    Active   byoc                                   OK
  osdCcsAdmin              AKIA...WWWW      2h     Active   (not referenced by this cluster)        --

 Summary:
 [OK] osdManagedAdmin-abc123 has adequate permissions to fulfill CRs.
 [OK] osdCcsAdmin has adequate permissions.
 [OK] SCP allows all required OSD CCS services.
 No issues found during pre-rotation checks.
==========================================================================
osdctl account aws-creds rotate --dry-run
$ osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "ROSAENG-2066 testing" \
    --hive-ocm-url prod --managed-admin --ccs-admin --dry-run

INFO Creating OCM connection
INFO Cluster resolved                  id=1abc2def3ghi4jkl5mno6pqr7stu8vwx name=my-test-cluster
Continue? (y/N): y
INFO Running pre-rotation diagnostic snapshot
  ... (snapshot output as above) ...
INFO Dry-run mode — no changes will be made

[Dry Run] Would create a new IAM access key for user: osdManagedAdmin-abc123
[Dry Run] [OK] AWS: ListAccessKeys for osdManagedAdmin-abc123
[Dry Run] Would update secret aws-account-operator/osd-creds-mgmt-abc123-secret with new credentials
[Dry Run] [OK] Hive: GET secret aws-account-operator/osd-creds-mgmt-abc123-secret
[Dry Run] [OK] Hive: auth can-i update secrets in aws-account-operator
[Dry Run] Would update secret uhc-production-abc123/aws with new credentials
[Dry Run] [OK] Hive: auth can-i update secrets in uhc-production-abc123
[Dry Run] Would create SyncSet uhc-production-abc123/aws-sync to sync credentials to cluster
[Dry Run] [OK] Hive: LIST ClusterDeployments in uhc-production-abc123
[Dry Run] [OK] Hive: auth can-i create syncsets in uhc-production-abc123
[Dry Run] [OK] Managed cluster: LIST CredentialRequests (all namespaces)
[Dry Run] Would delete secret openshift-cluster-csi-drivers/ebs-cloud-credentials (CredentialRequest aws-ebs-csi-driver-operator)
[Dry Run] [OK] Managed cluster: GET + auth can-i delete secrets
  ... (9 credential secrets verified) ...
[Dry Run] Would create a new IAM access key for user: osdCcsAdmin
[Dry Run] [OK] AWS: ListAccessKeys for osdCcsAdmin
[Dry Run] Would update secret uhc-production-abc123/byoc with new osdCcsAdmin credentials
[Dry Run] [OK] Hive: auth can-i update secrets in uhc-production-abc123

[Dry Run] OK All pre-flight checks passed. No changes were made.
INFO Credential rotation completed successfully
osdctl account aws-creds rotate --managed-admin --ccs-admin
$ osdctl account aws-creds rotate -C 1abc2def3ghi4jkl5mno6pqr7stu8vwx --reason "ROSAENG-2066 testing" \
    --hive-ocm-url prod --managed-admin --ccs-admin

INFO Creating OCM connection
INFO Cluster resolved                  id=1abc2def3ghi4jkl5mno6pqr7stu8vwx name=my-test-cluster
Continue? (y/N): y
INFO Running pre-rotation diagnostic snapshot
  ... (snapshot output) ...

Proceed with credential rotation?
Continue? (y/N): y
INFO Starting credential rotation

==========================================================================
 Phase 1: Rotating osdManagedAdmin-abc123 credentials
==========================================================================

  IAM User: osdManagedAdmin-abc123
  #    KEY ID           AGE    STATUS   REF'd by HIVE SECRET(S)                SYNC
  ---- ---------------- ------ -------- -------------------------------------- ------
  1 *  AKIA...YYYY      2h     Active   (not referenced by this cluster)       --
  2    AKIA...XXXX      25m    Active   osd-creds-mgmt-abc123-secret, aws              OK

  * Key #1 is not referenced by this cluster and is the recommended candidate for deletion.

Enter the number of the key to delete (or 'q' to quit): 1
[OK] Deleted access key AKIA...YYYY

INFO Updating AAO account secret on Hive
INFO Updating cluster namespace secret on Hive
INFO Syncing credentials to cluster via SyncSet
Sync completed...
Successfully rotated access keys for osdManagedAdmin-abc123

Deleting credential secrets one at a time, verifying CCO recreates each before continuing...

  [1/9] CredentialRequest: aws-ebs-csi-driver-operator
    Deleted secret openshift-cluster-csi-drivers/ebs-cloud-credentials
    Waiting for CCO to recreate secret (first secret — validating CCO health)...
    [OK] Secret recreated by CCO
  [2/9] CredentialRequest: cloud-credential-operator-iam-ro
    [OK] Secret recreated by CCO
  ... (9/9 all recreated successfully) ...

Deleted and verified 9 credential secret(s).

==========================================================================
 Phase 2: Rotating osdCcsAdmin credentials
==========================================================================

Updating Hive secret uhc-production-abc123/byoc with new osdCcsAdmin credentials...
[OK] Updated Hive secret uhc-production-abc123/byoc
Successfully rotated credentials for osdCcsAdmin

Post-rotation: verifying cluster health...
  [OK] Hive secret osd-creds-mgmt-abc123-secret contains the new key
  [OK] Hive secret uhc-production-abc123/aws contains the new key
  [OK] Hive secret uhc-production-abc123/byoc contains the new key
  [OK] Post-rotation checks passed.

  Suggested follow-up steps:
  - Remove old access key(s) from the AWS account
  - Verify CCO is healthy:
      oc logs -n openshift-cloud-credential-operator deploy/cloud-credential-operator --tail=50
  - Verify credential secrets are recreated:
      oc get credentialsrequests -A -o wide

INFO Credential rotation completed successfully

Test plan

  • make lint — 0 issues
  • make format — clean
  • go build — compiles
  • Unit tests — 40 controller tests (26 awscreds + 14 rotatesecret), all passing
  • 13 integration tests (expect script) covering snapshot, rotate, dry-run, CCS-only, cancellation, validation
  • Tested on staging CCS cluster with --hive-ocm-url prod

🤖 Generated with Claude Code

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 6, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 6, 2026

Copy link
Copy Markdown

@nephomaniac: This pull request references ROSAENG-2066 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Summary

Adds osdctl account aws-creds with two subcommands:

  • snapshot — read-only diagnostic report of AWS IAM credentials, Hive secrets, CredentialRequests, and IAM permission simulation
  • rotate — credential rotation for osdManagedAdmin and/or osdCcsAdmin with pre-flight checks, interactive key management, sequential CR secret recreation with CCO health verification, and post-rotation validation

The existing rotate-secret command remains functional with a runtime warning pointing users to aws-creds.

Resolves: ROSAENG-2066, ROSAENG-2067

Extends prior work from: SREP-726 (SimulatePrincipalPolicy), ROSAENG-2119 (multi-env OCM)

Change Scope

Category Files Lines Added Lines Removed
New Go files 5 (aws-creds.go, aws-creds-rotate.go, aws-creds-snapshot.go, awscreds.go, awscreds_test.go) 3,296 0
Edited Go files 4 (cmd.go, rotate-secret.go, rotatesecret.go, rotatesecret_test.go) 1,024 210
New docs (md) 3 (aws-creds.md, aws-creds_rotate.md, aws-creds_snapshot.md) 180 0
Edited docs (md) 3 (README.md, osdctl_account.md, rotate-secret.md) 132 27
Total 15 files 4,791 253

Key Features

  • Cluster-ID-first UX: accepts -C <cluster-id> instead of requiring account CR name
  • Backplane credentials by default: no --aws-profile needed, uses OCM/backplane for AWS access
  • Diagnostic-first: default snapshot mode shows full read-only report before any mutation
  • Two-part IAM permission simulation: checks both rotation tooling actions AND all CredentialRequest actions via SimulatePrincipalPolicy, with per-CR attribution
  • CCS/BYOC support: separate osdCcsAdmin checks including SCP coverage validation against OSD CCS minimum requirements
  • Independent CCS-only path: --ccs-admin without --managed-admin skips all managed-admin artifacts and permissions
  • Interactive key management: when max keys (2) reached, shows key table with Hive secret references and recommends which to delete
  • Sequential CR secret recreation: deletes one secret at a time, verifies CCO recreates (UID-based) before proceeding, with timeout/pause/manual-completion flow
  • Pre-flight and post-rotation validation: verifies client connections, secret existence, RBAC (auth can-i), and post-rotation key/secret sync
  • State tracking on failure: reports exactly which steps completed before a failure, with re-run safety guidance
  • --refresh-secrets: delete/recreate CR secrets without rotating keys
  • --cr-secrets on snapshot: fast CR-only view using Hive secret timestamp for staleness (no AWS client needed)
  • Reason validation: extracts Jira tickets and PD incidents from --reason, cross-references with cluster
  • Colored output: green/yellow/red/blue status indicators, structured logging to stderr via logrus
  • --hive-ocm-url: multi-environment OCM support for staging/integration testing

Usage

# Read-only credential status report
osdctl account aws-creds snapshot -C $CLUSTER_ID --reason "$TICKET"

# Quick CR secret status (no AWS client needed)
osdctl account aws-creds snapshot -C $CLUSTER_ID --reason "$TICKET" --cr-secrets

# Rotate osdManagedAdmin credentials
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --managed-admin

# Rotate both users
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --managed-admin --ccs-admin

# Rotate CCS admin only (no managed-admin dependency)
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --ccs-admin

# Refresh CR secrets only (no key rotation)
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --refresh-secrets

# Dry-run
osdctl account aws-creds rotate -C $CLUSTER_ID --reason "$TICKET" --managed-admin --dry-run

Test plan

  • make lint — 0 issues
  • make format — clean
  • go build — compiles
  • Unit tests — 40 controller tests (26 awscreds + 14 rotatesecret), all passing
  • snapshot --cr-secrets — lightweight path, no AWS client
  • snapshot — full diagnostic report
  • rotate --managed-admin --dry-run — pre-flight checks only
  • rotate --ccs-admin --dry-run — CCS-only preflight (no managed-admin checks)
  • rotate --managed-admin --ccs-admin --dry-run — both preflight checks
  • rotate --refresh-secrets — CR secret deletion/recreation only
  • rotate --managed-admin — single user rotation
  • rotate --ccs-admin — CCS-only rotation (no managed-admin dependency)
  • rotate --managed-admin --ccs-admin — both users with Phase 1/2 banners
  • User cancellation at rotation and refresh prompts — graceful exit
  • Validation: rotate with no flags → error
  • Validation: --refresh-secrets --managed-admin → error
  • Tested on staging CCS cluster with --hive-ocm-url prod

Follow-up items

  • Update credential rotation SOPs to reference aws-creds instead of rotate-secret
  • Investigate SimulatePrincipalPolicy target principal (currently simulates against osdManagedAdmin, rotation tooling runs as assumed role)
  • Add CCO log health check to post-rotation validation

🤖 Generated with Claude Code

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown

Walkthrough

Adds an aws-creds CLI with snapshot and rotate subcommands, a diagnostic/reporting library, rotation controller refactors for interactive key creation/retry and credential-secret deletion polling, extensive tests, and documentation updates.

Changes

AWS Credential Diagnostics and Rotation

Layer / File(s) Summary
CLI core and options
cmd/account/aws-creds.go
Top-level aws-creds command, cluster/account resolution, reason validation, Hive/AWS client construction, and mapping to controller.AWSCredsInput.
Rotate command & orchestration
cmd/account/aws-creds-rotate.go
Implements aws-creds rotate: flags validation, pre-rotation diagnostic snapshot, strict-YES gating on permission failures, dry-run/confirm flows, and delegating to controller.RotateSecret or credential-secret refresh logic.
Snapshot command
cmd/account/aws-creds-snapshot.go
Implements aws-creds snapshot: runs DiagnoseCredentials or DiagnoseCRSecrets and renders reports or CredentialRequest-only tables.
Command registration & deprecation
cmd/account/cmd.go, cmd/account/rotate-secret.go
Registers aws-creds subcommand and updates rotate-secret help to reference aws-creds (plus stderr logging tweak).
Documentation
docs/*
Adds aws-creds docs and examples, updates README and rotate-secret docs with SEE ALSO references and usage examples.
Diagnostic library
pkg/controller/awscreds.go
Adds DiagnosticReport model and functions: Hive secret checks, IAM access-key diagnostics, CredentialRequest secret diagnostics, SimulatePrincipalPolicy permission simulation, and human-readable report rendering.
Diagnostic tests
pkg/controller/awscreds_test.go
Tests covering healthy/failing diagnostics, CR-secret staleness, permission simulation behavior, and report rendering.
Rotation controller updates
pkg/controller/rotatesecret.go
Refactors RotateSecret API (diagnostic input, flags), adds createAccessKeyWithRetry (fallbacks/limit handling), preflight/post-rotation checks, interactive selection helpers, and rewritten credential-secret deletion with dry-run RBAC checks and polling.
Rotation controller tests
pkg/controller/rotatesecret_test.go
Updated and new tests: caller-identity mocking, relaxed AWS mock expectations, CCS-only flows, credential-secret deletion/polling timeouts, and post-rotation validation cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • clcollins

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error Untruncated AWS Access Key ID logged in error message at pkg/controller/rotatesecret.go:446, while other uses of the same variable are properly truncated with truncateKeyID(). Wrap keyToDelete.AccessKeyID in truncateKeyID() call on line 446, or avoid logging the key ID entirely in the error message.
Docstring Coverage ⚠️ Warning Docstring coverage is 43.48% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (13 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ROSAENG-2066: Add osdctl account aws-creds command' clearly and accurately summarizes the main change: introducing a new aws-creds command group with subcommands for AWS credential diagnostics and rotation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR contains no Ginkgo tests. All 40 test functions use stable, descriptive static names with standard Go testing framework (testing.T).
Test Structure And Quality ✅ Passed Custom check targets Ginkgo tests specifically, but PR adds standard Go tests (testing.T) to pkg/controller. Check is not applicable; tests follow codebase patterns and quality standards.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. All new test code uses standard Go testing package (Test* functions); the check applies only when new Ginkgo e2e tests are added.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR adds only standard Go unit tests using the testing package, not Ginkgo e2e tests. No SNO compatibility checks are needed.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds CLI commands and controller business logic only; no deployment manifests, pod specs, or scheduling constraints introduced. Check is not applicable.
Ote Binary Stdout Contract ✅ Passed osdctl is not an OTE extension binary; this custom check applies only to OpenShift Tests Extension binaries that communicate via JSON stdout with openshift-tests framework.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No Ginkgo e2e tests were added in this PR. All tests are standard Go unit tests using mocked clients with no IPv4 assumptions or external connectivity requirements.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms, custom implementations, or non-constant-time secret comparisons found. AccessKeyId comparisons involve public identifiers only, never secret keys.
Container-Privileges ✅ Passed PR contains only Go source code, tests, and documentation—no container/Kubernetes manifests to check for privilege escalation.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from dustman9000 and sam-nguyen7 June 6, 2026 02:11
@openshift-ci

openshift-ci Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nephomaniac
Once this PR has been reviewed and has the lgtm label, please assign petrkotas for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@nephomaniac

Copy link
Copy Markdown
Contributor Author

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2026
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/account/aws-creds-rotate.go`:
- Around line 60-63: The code currently discards Cobra's cancellation context by
creating context.TODO() in callers; update the signatures of runRotate and
runRefreshSecrets to accept a context.Context parameter (e.g., func
runRotate(ctx context.Context, ops *... ) error), replace any use of
context.TODO() with the passed ctx (cmd.Context()) at each call site, and thread
that ctx through into DeleteCredentialSecrets and any AWS/Kubernetes or polling
functions so they all honor cancellation/timeouts; ensure callers in the Cobra
Run function call runRotate(cmd.Context(), ops) and propagate ctx through
intermediate helpers.

In `@cmd/account/aws-creds-snapshot.go`:
- Around line 45-48: The snapshot flow currently uses context.TODO() and needs a
cancelable timed context: obtain cmd.Context(), wrap it with context.WithTimeout
(choose a reasonable timeout such as 30s) and defer cancel; pass that ctx into
runSnapshot instead of using TODO, and propagate it into all backend calls by
updating signatures and call sites for runSnapshot, identifyCluster, any
resolve* functions and Diagnose* functions so they accept context.Context and
use it for OCM/Hive/managed-cluster calls; ensure every function that performs
network/backend operations takes and forwards the ctx to enable cancellation and
timeouts.

In `@cmd/account/aws-creds.go`:
- Around line 661-668: After fetching the Account CR with GetAWSAccount (where
rc.account and rc.accountID are set), add the same ManualSTSMode guard used
earlier (the check around ManualSTSMode/DiganoseCRSecrets path) to reject
STS-mode accounts for the --cr-secrets flow: inspect account.Spec.Mode (the
ManualSTSMode enum/constant) and return an error if it equals the ManualSTSMode
value so that DiagnoseCRSecrets is not run for STS clusters and the misleading
missing-claim-secret warning/report is avoided.

In `@pkg/controller/awscreds.go`:
- Around line 327-331: The remediation command in the appended Finding is
outdated; update the Guidance string in the report.Findings append so it uses
the new aws-creds rotate subcommand. Replace the current guidance
fmt.Sprintf("... aws-creds %s --reason <ticket> --rotate-managed-admin",
report.ClusterID) with the rotate form, e.g. fmt.Sprintf("Credentials are out of
sync. Rotate to fix:\n       osdctl account aws-creds rotate --name %s --reason
<ticket> --rotate-managed-admin", report.ClusterID), referencing the same
report.ClusterID in the fmt.Sprintf call used in the report.Findings append.
- Around line 983-1021: The current success message uses optimistic defaults
(rotationPermsOK and credReqPermsOK initialized true) and can report [OK] when
no permissions were evaluated; update the success gating to require actual
evaluation: either (A) use report.AllPermissionsOK instead of rotationPermsOK &&
credReqPermsOK, or (B) add explicit evaluation flags (e.g., rotationEvaluated
and credReqEvaluated set to true inside the switch when encountering "rotation"
and "credreq") and change the success condition to rotationPermsOK &&
credReqPermsOK && rotationEvaluated && credReqEvaluated; also ensure the failure
branches still report missing categories when evaluated false.
- Around line 372-420: The code currently uses Secret.CreationTimestamp to mark
a CredentialRequest secret stale; instead, find the active managed admin key ID
from report.Keys (the same place you already derive the Hive key) and compare
that key ID to the CR secret's aws_access_key_id value. Replace the age-based
logic around managedAdminKeyAge and the block that sets crs.Age/NeedsRecreation
with: extract the active key ID into a variable (e.g. managedAdminKeyID) from
report.Keys (the entry with HiveMatch && UserName != "osdCcsAdmin"), read
secret.Data["aws_access_key_id"] (convert to string), and set
crs.NeedsRecreation = true when they differ (set Exists/Err fields unchanged);
apply the same replacement for the other occurrence mentioned (lines ~728-779)
so both paths compare access key IDs instead of creation times.

In `@pkg/controller/rotatesecret.go`:
- Around line 88-96: The controller struct exposes Out but lacks an In reader,
so createAccessKeyWithRetry() uses fmt.Scanln (process stdin) and ignores
injected IOStreams.In; add an In io.Reader field to the controller (alongside
Report, Log, Out) and update createAccessKeyWithRetry() to read prompts from
that In reader (e.g., using fmt.Fscanln or bufio.Reader against the
controller.In) instead of calling fmt.Scanln so automated tests and injected CLI
IOStreams work correctly.
- Around line 131-135: The managed-admin rotation path is re-running
VerifyRotationPermissions via resolveAdminUsername and blocks the operator's
strict-YES override; fix by threading the precomputed permission simulation
result (or an explicit "operator confirmed despite denied simulation" boolean)
into RotateSecret and resolveAdminUsername so you don't re-run the simulation.
Update the RotateSecret signature to accept the precomputed report/confirmation
flag and change callers (including where adminUsername is resolved) to pass that
value, then alter resolveAdminUsername (or add a wrapper) to skip calling
VerifyRotationPermissions when the confirmation flag/report indicates the
operator already overrode the denial. Ensure references to resolveAdminUsername,
RotateSecret, adminUsername, input.AwsClient and accountIDSuffixLabel are used
to locate and update the logic.
- Around line 235-243: The deferred cleanup currently ignores the return value
of input.AwsClient.DeleteAccessKey (inside the deferred closure that checks
rotationCommitted and uses adminUsername/newKeyID), which can leave an orphaned
key without surfacing the cleanup failure; update that defer to capture the
DeleteAccessKey error, log it at error level with context (including
adminUsername and newKeyID) and propagate it so the caller/operator is notified
— e.g., record the cleanup error into the function's named error return (or
wrap/append it to the primary error) rather than discarding it, ensuring
DeleteAccessKey's error is never ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a88be861-94fb-4999-b132-7907e53cd407

📥 Commits

Reviewing files that changed from the base of the PR and between 486780d and 3ec12df.

📒 Files selected for processing (15)
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds-snapshot.go
  • cmd/account/aws-creds.go
  • cmd/account/cmd.go
  • cmd/account/rotate-secret.go
  • docs/README.md
  • docs/osdctl_account.md
  • docs/osdctl_account_aws-creds.md
  • docs/osdctl_account_aws-creds_rotate.md
  • docs/osdctl_account_aws-creds_snapshot.md
  • docs/osdctl_account_rotate-secret.md
  • pkg/controller/awscreds.go
  • pkg/controller/awscreds_test.go
  • pkg/controller/rotatesecret.go
  • pkg/controller/rotatesecret_test.go

Comment thread cmd/account/aws-creds-rotate.go
Comment thread cmd/account/aws-creds-snapshot.go
Comment thread cmd/account/aws-creds.go
Comment thread pkg/controller/awscreds.go
Comment thread pkg/controller/awscreds.go Outdated
Comment thread pkg/controller/awscreds.go
Comment thread pkg/controller/rotatesecret.go
Comment thread pkg/controller/rotatesecret.go Outdated
Comment thread pkg/controller/rotatesecret.go Outdated
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from 3ec12df to 0d45d32 Compare June 6, 2026 03:05
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

1 similar comment
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

1 similar comment
@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from 0d45d32 to 809f194 Compare June 6, 2026 03:39

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/rotatesecret.go`:
- Around line 792-796: The confirmation prompt currently calls
utils.ConfirmPrompt() which reads from os.Stdin and ignores the injected
RotateSecretInput.In; change ConfirmPrompt (or add a new helper like
ConfirmPromptWithIO) to accept an io.Reader (and optionally an io.Writer) and
read the response from that reader (use bufio.Reader, TrimSpace, check first
rune for 'y'/'Y' and treat other input/EOF as false). Then update the call site
in RotateSecret (the fmt.Fprintf prompt and the ConfirmPrompt call) to pass
RotateSecretInput.In (and RotateSecretInput.Out) so tests/automation can inject
input; ensure the function returns false on EOF or non-affirmative input.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ee68c51a-fcba-4cfd-b1b3-ef00378cd2e6

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec12df and 809f194.

📒 Files selected for processing (15)
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds-snapshot.go
  • cmd/account/aws-creds.go
  • cmd/account/cmd.go
  • cmd/account/rotate-secret.go
  • docs/README.md
  • docs/osdctl_account.md
  • docs/osdctl_account_aws-creds.md
  • docs/osdctl_account_aws-creds_rotate.md
  • docs/osdctl_account_aws-creds_snapshot.md
  • docs/osdctl_account_rotate-secret.md
  • pkg/controller/awscreds.go
  • pkg/controller/awscreds_test.go
  • pkg/controller/rotatesecret.go
  • pkg/controller/rotatesecret_test.go
✅ Files skipped from review due to trivial changes (2)
  • docs/osdctl_account.md
  • docs/osdctl_account_rotate-secret.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • cmd/account/rotate-secret.go
  • cmd/account/aws-creds-snapshot.go
  • pkg/controller/awscreds_test.go
  • cmd/account/aws-creds-rotate.go
  • cmd/account/aws-creds.go
  • pkg/controller/awscreds.go

Comment thread pkg/controller/rotatesecret.go Outdated
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from 809f194 to c899f9c Compare June 6, 2026 04:00
@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai resolve

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Comments resolved. Approval is disabled; enable reviews.request_changes_workflow to allow explicit top-level @coderabbitai resolve or @coderabbitai approve commands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@nephomaniac nephomaniac force-pushed the ROSAENG-2066-aws-creds-diagnostic branch from c899f9c to fe26067 Compare June 6, 2026 04:26
@openshift-ci

openshift-ci Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

@nephomaniac: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 6, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@nephomaniac

Copy link
Copy Markdown
Contributor Author

/hold cancel

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2026

@bergmannf bergmannf left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good direction - I think the main issue for me that we don't want to keep an outdated command around and just remove it completely.

Also seems there is some duplication between the two controllers which should be removed.

without rotating AWS keys or modifying Hive secrets. This is useful when
CCO needs to re-provision secrets with existing credentials.

AWS credentials are obtained via backplane by default. Use --aws-profile

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AWS credentials are obtained via backplane by default. Use --aws-profile
AWS credentials are obtained via backplane by default. Use --aws-profile

I think the local profiles aren't really used anymore with rh-aws-saml-login.

So the command also has to work without profile and without backplane (which would be the call after logging in via rh-aws-saml-login).

Comment on lines +205 to +208
if !utils.ConfirmPrompt() {
o.log.Info("Refresh cancelled by user")
return nil
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if !utils.ConfirmPrompt() {
o.log.Info("Refresh cancelled by user")
return nil
}
if !utils.ConfirmPrompt() {
o.log.Info("Refresh cancelled by user")
return nil
}

Should we really give this option at all instead of stopping immediately?

Any issue will likely result in one or multiple operators on the cluster breaking after the rotation - maybe we should instead call out opening a proactive case / responding in-case with the problems first?

Comment thread cmd/account/aws-creds.go
}

var (
jiraTicketPattern = regexp.MustCompile(`[A-Z]+-\d+`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jiraTicketPattern = regexp.MustCompile(`[A-Z]+-\d+`)
jiraTicketPattern = regexp.MustCompile(`[A-Z]+-\d+`)

Not a big issue, but this might report some false positives by accident.

"Rotating cluster D-1" would match this one but not be a ticket - I think we could just match for OHSS-*.

Comment on lines +36 to +41
NOTE: Consider using 'osdctl account aws-creds' instead. The newer command:
- Accepts a cluster ID (no need to look up the account CR name)
- Provides a diagnostic snapshot before rotation
- Handles account claim resolution automatically
- Supports interactive key management and CR secret health verification
- Validates IAM permissions via SimulatePrincipalPolicy

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NOTE: Consider using 'osdctl account aws-creds' instead. The newer command:
- Accepts a cluster ID (no need to look up the account CR name)
- Provides a diagnostic snapshot before rotation
- Handles account claim resolution automatically
- Supports interactive key management and CR secret health verification
- Validates IAM permissions via SimulatePrincipalPolicy
NOTE: Consider using 'osdctl account aws-creds' instead. The newer command:
- Accepts a cluster ID (no need to look up the account CR name)
- Provides a diagnostic snapshot before rotation
- Handles account claim resolution automatically
- Supports interactive key management and CR secret health verification
- Validates IAM permissions via SimulatePrincipalPolicy

The new command is a complete superset - I think it would be better to just remove this one and ensure that only the new command is referenced in any SOPs that might use it.

Keeping these old commands tends to be something that is forgotten to be removed and there seems to be no reason to keep a worse version around.

@@ -0,0 +1,1171 @@
package controller

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
package controller
package controller

Thanks for sticking with the controller having (most) logic - it's something we try to move osdctl to, so we don't have all logic in the cobra commands.


// isAWSCredReqForDiag is a broader filter for diagnostics — includes all
// CredentialRequests with AWSProviderSpec regardless of name prefix.
func isAWSCredReqForDiag(cr *ccov1.CredentialsRequest) bool {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func isAWSCredReqForDiag(cr *ccov1.CredentialsRequest) bool {
func isAWSCredReqForDiag(cr *ccov1.CredentialsRequest) bool {

I think the function name can be 'isAWSCredentialRequestForDiagnostic' - the current one sounds a bit C-like.

Comment on lines +499 to +510
rotationActions := []string{
"iam:CreateAccessKey",
"iam:CreateUser",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:GetUserPolicy",
"iam:ListAccessKeys",
"iam:PutUserPolicy",
"iam:TagUser",
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rotationActions := []string{
"iam:CreateAccessKey",
"iam:CreateUser",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:GetUserPolicy",
"iam:ListAccessKeys",
"iam:PutUserPolicy",
"iam:TagUser",
}
rotationActions := []string{
"iam:CreateAccessKey",
"iam:CreateUser",
"iam:DeleteAccessKey",
"iam:DeleteUser",
"iam:DeleteUserPolicy",
"iam:GetUser",
"iam:GetUserPolicy",
"iam:ListAccessKeys",
"iam:PutUserPolicy",
"iam:TagUser",
}

Duplicating this list between this controller and the older rotate-secret one.

Let's extract this to a shared const so we can't accidentally forgot to update one occurrence if ever needed.

}

// simulateActions runs SimulatePrincipalPolicy for a set of IAM actions and records allow/deny results in the report.
func simulateActions(awsClient awsprovider.Client, policySourceArn string, actions []string, category string, actionToCRs map[string][]string, report *DiagnosticReport) error {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func simulateActions(awsClient awsprovider.Client, policySourceArn string, actions []string, category string, actionToCRs map[string][]string, report *DiagnosticReport) error {
func simulateActions(awsClient awsprovider.Client, policySourceArn string, actions []string, category string, actionToCRs map[string][]string, report *DiagnosticReport) error {

This one should probably replace the function in the other controller (or vice versa):

func VerifyRotationPermissions(out io.Writer, awsClient awsprovider.Client, accountID string, username string) error {

No need to have two functions doing the same thing really.

fmt.Fprintf(out, " CR secret staleness detection will be skipped.\n\n")
} else {
if !accountSecret.CreationTimestamp.IsZero() {
lastRotationAge = time.Since(accountSecret.CreationTimestamp.Time)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lastRotationAge = time.Since(accountSecret.CreationTimestamp.Time)
lastRotationAge = time.Since(accountSecret.CreationTimestamp.Time)

The secrets are/were update with updateSecret which would not update the CreationTimestamp so I think this will not work as intended.

An update operation will not update creationTimestamp.


// renderKeySelectionTable prints an interactive numbered table of IAM keys with
// Hive secret references, highlighting the recommended deletion candidate.
func renderKeySelectionTable(keys []KeyStatus, report *DiagnosticReport, out io.Writer) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a useful piece of information to decide which key to remove is generally seeing when it was last used ( https://docs.aws.amazon.com/cli/latest/reference/iam/get-access-key-last-used.html)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants