Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HOSTEDCP-1051: addition of grace period for aws infra destruction #2967

Merged
merged 2 commits into from Oct 11, 2023

Conversation

Patryk-Stefanski
Copy link
Contributor

@Patryk-Stefanski Patryk-Stefanski commented Aug 31, 2023

What this PR does / why we need it:

Adds InfraGracePeriod timeout flag for deleting AWS infrastructure resources so that if something goes wrong with deleting the infrastructure resources, the deletion process is not stuck.

Which issue(s) this PR fixes
Fixes #

HOSTEDCP-1051

Checklist

  • Subject and description added to both, commit and PR.
  • Relevant issues have been referenced.
  • This change includes docs.
  • This change includes unit tests.

@openshift-ci openshift-ci bot added area/cli Indicates the PR includes changes for CLI and removed do-not-merge/needs-area labels Aug 31, 2023
@@ -77,9 +80,14 @@ func NewDestroyCommand() *cobra.Command {
return cmd
}

func (o *DestroyInfraOptions) Run(ctx context.Context) error {
func (o *DestroyInfraOptions) Run(ctx context.Context, timeout time.Duration) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to pass the timeout as an argument here?
you can just get it from the DestroyInfraOptions struct as o.AwsInfraGracePeriod

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't, good spot.

@Patryk-Stefanski Patryk-Stefanski force-pushed the HOSTEDCP-1051 branch 2 times, most recently from 06b9821 to fbfec64 Compare September 4, 2023 11:23
@netlify
Copy link

netlify bot commented Sep 4, 2023

Deploy Preview for hypershift-docs ready!

Name Link
🔨 Latest commit fbfec64
🔍 Latest deploy log https://app.netlify.com/sites/hypershift-docs/deploys/64f5be3add53b6000895145f
😎 Deploy Preview https://deploy-preview-2967--hypershift-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@Patryk-Stefanski Patryk-Stefanski changed the title Draft: HOSTEDCP-1051 addition of grace period for aws infra destruction HOSTEDCP-1051 addition of grace period for aws infra destruction Sep 5, 2023
@Patryk-Stefanski
Copy link
Contributor Author

/retest-required

@@ -78,8 +81,13 @@ func NewDestroyCommand() *cobra.Command {
}

func (o *DestroyInfraOptions) Run(ctx context.Context) error {
destroyInfraCtx, destroyInfraCtxCancel := context.WithTimeout(ctx, o.AwsInfraGracePeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably need to check if AwsInfraGracePeriod is set here and use a default if not set before using it.

Setting the default value in the args/flags is not enough as this code could be called from somewhere else like we do in our e2e here https://github.com/openshift/hypershift/blob/main/test/e2e/util/fixture.go#L81-L87

@Patryk-Stefanski
Copy link
Contributor Author

/retest-required

destroyInfraCtx, destroyInfraCtxCancel := context.WithTimeout(ctx, o.AwsInfraGracePeriod)
defer destroyInfraCtxCancel()

o.Log.Info(fmt.Sprintf("waiting %d s", int(o.AwsInfraGracePeriod.Seconds())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carefully with the log entries, we usually set most of the things on conditions to avoid the "too much verbosity" issue. Maybe make sense to explore to update the reason of InfrastructureReady condition saying something like Deprovisioning... Grace Period remaining X or something like that (IMHO). WDYT @muraee ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense. Will I remove it for now and if needs be I can look into reporting it in the condition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logs and conditions are signal for different problem spaces. Conditions are contractual signal for consumers subject to API support policies. Logs are non contractual UX for humans.
External clients like the cli should no often manipulate status for resources owned by controllers. Also at the time of deleting the infra, all the CRs should be gone anyways.
Agree on wanting to avoid unnecessary logs, but as far as I cans see this line would only log once which is fine. I'd consider even adding a line with the remaining timeout along the existing o.Log.Info("WARNING: error during destroy, will retry", "error", err.Error()).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the log to be more descriptive so it now looks like Infra destruction timeout set to 5 s. I think when someone sets a timeout and the timeout duration set is reported back to him should be enough don't think there's a need for a countdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the timeout is met? I'd expect we return an err and communicate deletion is moving ahead without infra deletion succeeding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Atm it exits out with an error and doesn’t proceed with the destruction. Don’t think implementing what you explained above would be too much work however I think it would lead to confusion as there would be resources left over after a successful destruction. I think it makes more sense to leave it as is but I’m open to discussion.

cmd/cluster/aws/destroy.go Outdated Show resolved Hide resolved
cmd/infra/aws/destroy.go Outdated Show resolved Hide resolved
cmd/infra/aws/destroy.go Outdated Show resolved Hide resolved
@@ -60,6 +61,7 @@ func NewDestroyCommand() *cobra.Command {
cmd.Flags().StringVar(&opts.Name, "name", opts.Name, "A name for the cluster")
cmd.Flags().StringVar(&opts.BaseDomain, "base-domain", opts.BaseDomain, "The ingress base domain for the cluster")
cmd.Flags().StringVar(&opts.BaseDomainPrefix, "base-domain-prefix", opts.BaseDomainPrefix, "The ingress base domain prefix for the cluster, defaults to cluster name. se 'none' for an empty prefix")
cmd.Flags().DurationVar(&opts.AwsInfraGracePeriod, "aws-infra-grace-period", opts.AwsInfraGracePeriod, "Timeout for destroying infrastructure in minutes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we comment include what default is in the flag description?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flag is optional and there is no default value

@enxebre
Copy link
Member

enxebre commented Sep 25, 2023

Thanks! can you please squash commits and update the message to follow conventions in https://hypershift-docs.netlify.app/contribute/?

@@ -32,6 +31,7 @@ func NewDestroyCommand(opts *core.DestroyOptions) *cobra.Command {
cmd.Flags().StringVar(&opts.AWSPlatform.BaseDomain, "base-domain", opts.AWSPlatform.BaseDomain, "Cluster's base domain; inferred from the hosted cluster by default")
cmd.Flags().StringVar(&opts.AWSPlatform.BaseDomainPrefix, "base-domain-prefix", opts.AWSPlatform.BaseDomainPrefix, "Cluster's base domain prefix; inferred from the hosted cluster by default")
cmd.Flags().StringVar(&opts.CredentialSecretName, "secret-creds", opts.CredentialSecretName, "A Kubernetes secret with a platform credential, pull-secret and base-domain. The secret must exist in the supplied \"--namespace\"")
cmd.Flags().DurationVar(&opts.AWSPlatform.AwsInfraGracePeriod, "aws-infra-grace-period", opts.AWSPlatform.AwsInfraGracePeriod, "Timeout for destroying infrastructure in minutes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be added to the product CLI?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, addressed in latest commit

@Patryk-Stefanski
Copy link
Contributor Author

/retest-required

@sjenning
Copy link
Contributor

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 10, 2023
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Patryk-Stefanski, sjenning

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2023
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 111b136 and 2 for PR HEAD 4d47f04 in total

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Oct 11, 2023

@Patryk-Stefanski: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit 8f5fd6d into openshift:main Oct 11, 2023
12 checks passed
@celebdor
Copy link
Contributor

celebdor commented Oct 17, 2023

/retitle HOSTEDCP-1051: addition of grace period for aws infra destruction

@openshift-ci openshift-ci bot changed the title HOSTEDCP-1051 addition of grace period for aws infra destruction HOSTEDCP-1051: addition of grace period for aws infra destruction Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cli Indicates the PR includes changes for CLI lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants