HOSTEDCP-1051: addition of grace period for aws infra destruction #2967
cmd/infra/aws/destroy.go
Outdated
@@ -77,9 +80,14 @@ func NewDestroyCommand() *cobra.Command {
 	return cmd
 }

-func (o *DestroyInfraOptions) Run(ctx context.Context) error {
+func (o *DestroyInfraOptions) Run(ctx context.Context, timeout time.Duration) error {
Why do you need to pass the timeout as an argument here? You can just get it from the DestroyInfraOptions struct as o.AwsInfraGracePeriod.
I don't, good spot.
06b9821 to fbfec64
/retest-required
cmd/infra/aws/destroy.go
Outdated
@@ -78,8 +81,13 @@ func NewDestroyCommand() *cobra.Command {
 }

 func (o *DestroyInfraOptions) Run(ctx context.Context) error {
+	destroyInfraCtx, destroyInfraCtxCancel := context.WithTimeout(ctx, o.AwsInfraGracePeriod)
You probably need to check whether AwsInfraGracePeriod is set here, and fall back to a default if it is not, before using it.
Setting the default value in the args/flags is not enough, as this code could be called from somewhere else, like we do in our e2e here: https://github.com/openshift/hypershift/blob/main/test/e2e/util/fixture.go#L81-L87
/retest-required
cmd/infra/aws/destroy.go
Outdated
+	destroyInfraCtx, destroyInfraCtxCancel := context.WithTimeout(ctx, o.AwsInfraGracePeriod)
+	defer destroyInfraCtxCancel()
+
+	o.Log.Info(fmt.Sprintf("waiting %d s", int(o.AwsInfraGracePeriod.Seconds())))
Careful with the log entries: we usually put most things in conditions to avoid the "too much verbosity" issue. Maybe it makes sense to explore updating the reason of the InfrastructureReady condition to say something like "Deprovisioning... Grace Period remaining X" (IMHO). WDYT @muraee?
Yeah, makes sense. Shall I remove it for now and, if need be, look into reporting it in the condition?
Logs and conditions are signals for different problem spaces. Conditions are a contractual signal for consumers, subject to API support policies. Logs are non-contractual UX for humans.
External clients like the CLI should not often manipulate status for resources owned by controllers. Also, at the time of deleting the infra, all the CRs should be gone anyway.
Agree on wanting to avoid unnecessary logs, but as far as I can see this line would only log once, which is fine. I'd even consider adding a line with the remaining timeout alongside the existing o.Log.Info("WARNING: error during destroy, will retry", "error", err.Error()).
I changed the log to be more descriptive, so it now looks like "Infra destruction timeout set to 5 s". I think reporting the configured timeout back to whoever set it should be enough; I don't think there's a need for a countdown.
What happens if the timeout is met? I'd expect we return an err and communicate deletion is moving ahead without infra deletion succeeding.
At the moment it exits with an error and doesn't proceed with the destruction. I don't think implementing what you described above would be too much work; however, I think it would lead to confusion, as there would be resources left over after a successful destruction. I think it makes more sense to leave it as is, but I'm open to discussion.
@@ -60,6 +61,7 @@ func NewDestroyCommand() *cobra.Command {
 	cmd.Flags().StringVar(&opts.Name, "name", opts.Name, "A name for the cluster")
 	cmd.Flags().StringVar(&opts.BaseDomain, "base-domain", opts.BaseDomain, "The ingress base domain for the cluster")
 	cmd.Flags().StringVar(&opts.BaseDomainPrefix, "base-domain-prefix", opts.BaseDomainPrefix, "The ingress base domain prefix for the cluster, defaults to cluster name. se 'none' for an empty prefix")
+	cmd.Flags().DurationVar(&opts.AwsInfraGracePeriod, "aws-infra-grace-period", opts.AwsInfraGracePeriod, "Timeout for destroying infrastructure in minutes")
Can we include what the default is in the flag description?
The flag is optional and there is no default value.
Thanks! Can you please squash the commits and update the message to follow the conventions in https://hypershift-docs.netlify.app/contribute/?
4a34b10 to 7283d58
@@ -32,6 +31,7 @@ func NewDestroyCommand(opts *core.DestroyOptions) *cobra.Command {
 	cmd.Flags().StringVar(&opts.AWSPlatform.BaseDomain, "base-domain", opts.AWSPlatform.BaseDomain, "Cluster's base domain; inferred from the hosted cluster by default")
 	cmd.Flags().StringVar(&opts.AWSPlatform.BaseDomainPrefix, "base-domain-prefix", opts.AWSPlatform.BaseDomainPrefix, "Cluster's base domain prefix; inferred from the hosted cluster by default")
 	cmd.Flags().StringVar(&opts.CredentialSecretName, "secret-creds", opts.CredentialSecretName, "A Kubernetes secret with a platform credential, pull-secret and base-domain. The secret must exist in the supplied \"--namespace\"")
+	cmd.Flags().DurationVar(&opts.AWSPlatform.AwsInfraGracePeriod, "aws-infra-grace-period", opts.AWSPlatform.AwsInfraGracePeriod, "Timeout for destroying infrastructure in minutes")
Should this be added to the product CLI?
Yep, addressed in the latest commit.
/retest-required
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Patryk-Stefanski, sjenning
@Patryk-Stefanski: all tests passed!
/retitle HOSTEDCP-1051: addition of grace period for aws infra destruction
What this PR does / why we need it:
Adds an --aws-infra-grace-period timeout flag (AwsInfraGracePeriod) for deleting AWS infrastructure resources, so that the deletion process does not get stuck if something goes wrong while deleting them.
Which issue(s) this PR fixes
Fixes #
HOSTEDCP-1051