Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-16684: Set cr.status.provisioned=false on syncErr path #583

Merged
merged 5 commits into from Aug 10, 2023

Conversation

bentito
Copy link
Contributor

@bentito bentito commented Jul 26, 2023

I think this cures the bug as long as I tracked from the correct part of the code that was setting cr.status.provisioned to true despite a failure to actually provision the Secret.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jul 26, 2023
@openshift-ci-robot
Copy link
Contributor

@bentito: This pull request references Jira Issue OCPBUGS-16684, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @fxierh

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

I think this cures the bug as long as I tracked from the correct part of the code that was setting cr.status.provisioned to true despite a failure to actually provision the Secret.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested a review from fxierh July 26, 2023 18:10
@bentito
Copy link
Contributor Author

bentito commented Jul 26, 2023

@fxierh I think I could have fixed it with this change, but I'm not totally sure. If you can run this PR, set conditions again, and we'll see if it is fixed. If it's not fixed, I'll see how I can debug on this PR, perhaps with the STS workflow with clusterbot.

@codecov
Copy link

codecov bot commented Jul 26, 2023

Codecov Report

Merging #583 (eb6ef9b) into master (b4d1c3d) will increase coverage by 0.23%.
Report is 19 commits behind head on master.
The diff coverage is 0.00%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #583      +/-   ##
==========================================
+ Coverage   48.15%   48.38%   +0.23%     
==========================================
  Files          94       94              
  Lines       11543    11641      +98     
==========================================
+ Hits         5558     5633      +75     
- Misses       5377     5398      +21     
- Partials      608      610       +2     
Files Changed Coverage Δ
pkg/aws/actuator/actuator.go 63.90% <0.00%> (-0.16%) ⬇️
...redentialsrequest/credentialsrequest_controller.go 53.25% <0.00%> (+4.32%) ⬆️

... and 5 files with indirect coverage changes

@@ -688,20 +688,17 @@ func (r *ReconcileCredentialsRequest) Reconcile(ctx context.Context, request rec

var syncErr error
syncErr = r.CreateOrUpdateOnCredsExist(ctx, credsExists, syncErr, cr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When CR.spec.providerSpec.stsIAMRoleARN is unset, syncErr is still nil

if awsSTSIAMRoleARN == "" {
logger.Debug("CredentialsRequest has no awsSTSIAMRoleARN, no reason to sync")
return nil
}

So we fall into the else branch below where we call

In other words CR.status.provisioned is still set to true when CR.spec.providerSpec.stsIAMRoleARN is unset and there is no Secret provisioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above aligns with what I observed today in a test (on this PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think my next commit fixes it then.

@fxierh
Copy link
Contributor

fxierh commented Jul 27, 2023

For debugging IMO you can try to install an STS cluster, scale down CVO & CCO and then run the controllers locally with the latest commit.

@fxierh
Copy link
Contributor

fxierh commented Jul 27, 2023

Btw just wanted to check if you're planning to address the comment I left under https://issues.redhat.com/browse/OCPBUGS-16684 in this PR (or I can open another card for it if you want them to be separated) ?

@bentito
Copy link
Contributor Author

bentito commented Jul 27, 2023

For debugging IMO you can try to install an STS cluster, scale down CVO & CCO and then run the controllers locally with the latest commit.

Yeah, that's basically what's happening with clusterbot, but with automation, but it's way slow for a quick test to see if I got something wrong or right.

@bentito
Copy link
Contributor Author

bentito commented Jul 27, 2023

Btw just wanted to check if you're planning to address the comment I left under https://issues.redhat.com/browse/OCPBUGS-16684 in this PR (or I can open another card for it if you want them to be separated) ?

If you could make a new card about the lastSyncGeneration problems that would be good. Thanks.

Comment on lines 342 to -344
if awsSTSIAMRoleARN == "" {
logger.Debug("CredentialsRequest has no awsSTSIAMRoleARN, no reason to sync")
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested your latest commit in action yet but it appears to me that:
if the stsIAMRoleARN field is unset, this awsSTSIAMRoleARN string would be empty and err would be nil. So returning err would be equivalent to returning nil.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, new commit, didn't think that through very well. Thanks!

@fxierh
Copy link
Contributor

fxierh commented Jul 28, 2023

@bentito With the current implementation the .status field of CRs without .stsIAMRoleARN (e.g. the ones extracted via oc adm release extract $IMG --cloud=aws --credentials-requests) looks like the following:

status:
  conditions:
  - lastProbeTime: "2023-07-28T16:08:52Z"
    lastTransitionTime: "2023-07-28T16:08:52Z"
    message: 'failed to grant creds: an empty roleARN was found so no Secret was created'
    reason: CredentialsProvisionFailure
    status: "True"
    type: CredentialsProvisionFailure
  lastSyncGeneration: 0
  lastSyncTimestamp: "2023-07-28T16:16:09Z"
  provisioned: false

Just wanted to make sure this is expected behavior.

@fxierh
Copy link
Contributor

fxierh commented Jul 28, 2023

On the other hand, the aforementioned failing condition (CredentialsProvisionFailure) is not reflected at the CO level (i.e. we won't see "%d of %d credentials requests are failing to sync" when calling oc get co cloud-credential -o yaml:

if operatorIsDisabled {
return conditions

This makes me wonder if we should display Secret provision failures on CO/cloud-credential in STS mode just like in mint.
cc. @abutcher

@fxierh
Copy link
Contributor

fxierh commented Jul 28, 2023

If you could make a new card about the lastSyncGeneration problems that would be good. Thanks.

Done.

@bentito
Copy link
Contributor Author

bentito commented Aug 1, 2023

/test e2e-aws-ovn

@bentito
Copy link
Contributor Author

bentito commented Aug 1, 2023

Just wanted to make sure this is expected behavior.
@fxierh Yes, I think this is as desired...
And if so, @abutcher , I think this can get an LGTM?

Co-authored-by: Andrew Butcher <abutcher@redhat.com>
@abutcher
Copy link
Member

abutcher commented Aug 9, 2023

I'll make a card for the CO level status.
/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 9, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 9, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abutcher, bentito

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 9, 2023
@openshift-ci-robot
Copy link
Contributor

/retest-required

Remaining retests: 0 against base HEAD 4031723 and 2 for PR HEAD eb6ef9b in total

@bentito
Copy link
Contributor Author

bentito commented Aug 9, 2023

@abutcher ci/prow/e2e-azure-manual-oidc is showing up as a required test here and it definitely shouldn't be here on an AWS related PR I'd think. Or at least not as a required...

@abutcher
Copy link
Member

abutcher commented Aug 9, 2023

Related to a recent change we made to the tests which should have made e2e-azure-manual-oidc required for changes which impact specific pathing but are instead making the test required.

/override ci/prow/e2e-azure-manual-oidc

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 9, 2023

@abutcher: Overrode contexts on behalf of abutcher: ci/prow/e2e-azure-manual-oidc

In response to this:

Related to a recent change we made to the tests which should have made e2e-azure-manual-oidc required for changes which impact specific pathing but are instead making the test required.

/override ci/prow/e2e-azure-manual-oidc

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bentito
Copy link
Contributor Author

bentito commented Aug 10, 2023

/test e2e-hypershift

3 similar comments
@bentito
Copy link
Contributor Author

bentito commented Aug 10, 2023

/test e2e-hypershift

@bentito
Copy link
Contributor Author

bentito commented Aug 10, 2023

/test e2e-hypershift

@bentito
Copy link
Contributor Author

bentito commented Aug 10, 2023

/test e2e-hypershift

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 10, 2023

@bentito: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit f5b6c10 into openshift:master Aug 10, 2023
10 checks passed
@openshift-ci-robot
Copy link
Contributor

@bentito: Jira Issue OCPBUGS-16684: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-16684 has been moved to the MODIFIED state.

In response to this:

I think this cures the bug as long as I tracked from the correct part of the code that was setting cr.status.provisioned to true despite a failure to actually provision the Secret.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants