pkg/destroy/aws: Set 'matched' on tag-pagination errors #1129


wking
Member

@wking wking commented Jan 25, 2019

Fixing a bug from e24c7dc (#1039). Before this commit, we were setting loopError, so we'd still take another pass through the loop. But we weren't setting matched, because fetch errors (e.g. because the caller lacked tag:GetResources) would not have returned any resources. The deletion code would wrongly assume that there were no matching resources behind that tagClient and remove the client from tagClients. Run would end up exiting non-zero despite having abandoned the resources behind that tagClient.

With this commit, we no longer prune the tagClient. And since we don't distinguish between fatal and non-fatal errors, we'll just loop forever until the caller notices the problem and kills us. That's not great, but with permission pre-checks in the pipe via install-time credential operator calls, I don't know if it's worth putting in a fatal/nonfatal distinction now.

CC @dgoodwin

Fixing a bug from e24c7dc (pkg/destroy/aws: Use the resource-groups
service for tag->ARN lookup, 2019-01-10, openshift#1039).  Before this commit,
we were setting loopError, so we'd still take another pass through the
loop.  But we weren't setting 'matched', because fetch errors
(e.g. because the caller lacked tag:GetResources) would not have
returned *any* resources.  The deletion code would wrongly assume that
there were no matching resources behind that tagClient and remove the
client from tagClients.  'Run' would end up exiting non-zero despite
having abandoned the resources behind that tagClient.

With this commit, we no longer prune the tagClient.  And since we
don't distinguish between fatal and non-fatal errors, we'll just loop
forever until the caller notices the problem and kills us.  That's not
great, but with permission pre-checks in the pipe via install-time
credential operator calls, I don't know if it's worth putting in a
fatal/nonfatal distinction now.
@openshift-ci-robot openshift-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 25, 2019
@wking wking force-pushed the aws-destroy-tag-search-error-handling branch from 3b6f748 to 6a32dcf Compare January 25, 2019 19:08
@wking wking added kind/bug Categorizes issue or PR as related to a bug. platform/aws labels Jan 25, 2019
@dgoodwin
Contributor

Thanks, I won't LGTM... but LGTM.

@abhinavdahiya
Contributor

Can an error check for Unauthorized help us short-circuit?

@wking
Member Author

wking commented Jan 25, 2019

Can an error check for Unauthorized help us short-circuit?

Yes, but see the last paragraph in my topic post for why I didn't bother ;). Did you want me to bother?

@wking
Member Author

wking commented Jan 25, 2019

Also linking #1100, which is the cred pre-checker. There would still be a possibility for "destroy run with different permissions than create", but it seems like a low probability.

Edit: never mind, #1100 is something else. I hear @joelddiaz is working on cred-operator pre-checks.

@wking wking added this to the 0.11.0 milestone Jan 25, 2019
@wking
Member Author

wking commented Jan 25, 2019

e2e-aws:

Flaky tests:

[sig-auth] ServiceAccounts should allow opting out of API token automount  [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]

Failing tests:

[Feature:DeploymentConfig] deploymentconfigs when run iteratively [Conformance] should immediately start a new deployment [Suite:openshift/conformance/parallel/minimal]

/retest

@wking
Member Author

wking commented Jan 25, 2019

e2e-aws:


Flaky tests:

[Feature:DeploymentConfig] deploymentconfigs with failing hook [Conformance] should get all logs from retried hooks [Suite:openshift/conformance/parallel/minimal]
[sig-storage] In-tree Volumes [Driver: hostPath] [Testpattern: Inline-volume (default fs)] subPath should support readOnly directory specified in the volumeMount [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-storage] In-tree Volumes [Driver: nfs] [Testpattern: Pre-provisioned PV (default fs)] subPath should support non-existent path [Suite:openshift/conformance/parallel] [Suite:k8s]

Failing tests:

[sig-storage] Volume limits should verify that all nodes have volume limits [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@wking
Member Author

wking commented Jan 26, 2019

e2e-aws:

Failing tests:

[sig-storage] Volume limits should verify that all nodes have volume limits [Suite:openshift/conformance/parallel] [Suite:k8s]

So... close... :p

/retest

@crawford
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jan 26, 2019
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crawford, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.


@wking
Member Author

wking commented Jan 26, 2019

/retest

Pick up openshift/origin#21867.

@wking
Member Author

wking commented Jan 26, 2019

e2e-aws:

Failing tests:

[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@wking
Member Author

wking commented Jan 27, 2019

e2e-aws:

Flaky tests:

[Feature:DeploymentConfig] deploymentconfigs when run iteratively [Conformance] should immediately start a new deployment [Suite:openshift/conformance/parallel/minimal]

Failing tests:

[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should not deadlock when a pod's predecessor fails [Suite:openshift/conformance/parallel] [Suite:k8s]
[sig-apps] StatefulSet [k8s.io] Basic StatefulSet functionality [StatefulSetBasic] should provide basic identity [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@wking
Member Author

wking commented Jan 27, 2019

e2e-aws:

Failing tests:

[Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes [Suite:openshift/conformance/parallel]
[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should report telemetry if a cloud.openshift.com token is present [Suite:openshift/conformance/parallel/minimal]
[Feature:Prometheus][Conformance] Prometheus when installed on the cluster should start and expose a secured proxy and unsecured metrics [Suite:openshift/conformance/parallel/minimal]
[sig-storage] Dynamic Provisioning DynamicProvisioner should provision storage with different parameters [Suite:openshift/conformance/parallel] [Suite:k8s]

/retest

@openshift-merge-robot openshift-merge-robot merged commit f8a946e into openshift:master Jan 27, 2019
@wking wking deleted the aws-destroy-tag-search-error-handling branch January 27, 2019 15:54
wking added a commit to wking/openshift-installer that referenced this pull request Jan 27, 2019
Through f8a946e (Merge pull request openshift#1129 from
wking/aws-destroy-tag-search-error-handling, 2019-01-27).
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. platform/aws size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
7 participants