Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gracefully shutdown the terraform process #36

Merged
merged 1 commit into from Mar 27, 2020

Conversation

ialidzhikov
Copy link
Member

@ialidzhikov ialidzhikov commented Mar 16, 2020

What this PR does / why we need it:
Currently signals sent to PID 1 in the terraformer image are not forwarded to the child processes as PID 1 is sh. This prevents the terraform process itself to receive a SIGTERM and start graceful termination. After a terminationGracePeriodSeconds, the terraformer Pod can be SIGKILL-ed and the terraform process itself can lose its current terraform.tfstate.

This PR:

  • introduces a wrapper terraformer.sh process which runs as PID 1 and sends SIGTERM to the running child terraform.sh process (if any) on termination and waits for its completion
  • modifies terraform.sh to sent SIGTERM to the running terraform process (if any) on termination and waits for its completion. On the other side, if the current command is apply, terraform stops creating new resources and waits only for the creation of the resources which were already in creating when the SIGTERM was sent.

Which issue(s) this PR fixes:
Ref gardener-attic/gardener-extensions#597

Special notes for your reviewer:

Release note:

`terraformer` does no longer ignore the termination signals sent to PID 1. It does now send a termination signal to the terraform process itself and waits for its completion. This should prevent rare cases in which the `terraformer` was not storing the state of created infrastructure resources.

@ialidzhikov ialidzhikov added the kind/bug Bug label Mar 16, 2020
@ialidzhikov ialidzhikov requested a review from a team as a code owner March 16, 2020 17:35
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 16, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 16, 2020
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with AWS - works perfectly, great!

$ k -n shoot--foobar--aws logs infrastructure.infra.tf-apply-mq7bq -f
Initializing the backend...

[...]

* provider.aws: version = "~> 2.26"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
aws_iam_role.nodes: Creating...
aws_vpc_dhcp_options.vpc_dhcp_options: Creating...
aws_vpc.vpc: Creating...
aws_iam_role.bastions: Creating...
Tue Mar 17 06:04:30 UTC 2020 Sending SIGTERM to terraform.sh process 7.
Tue Mar 17 06:04:30 UTC 2020 Waiting for terraform.sh process 7 to complete...
Tue Mar 17 06:04:30 UTC 2020 Sending SIGTERM to terraform process 31.
Interrupt received.
Please wait for Terraform to exit or data loss may occur.
Gracefully shutting down...
Stopping operation...
Tue Mar 17 06:04:30 UTC 2020 Waiting for terraform process 31 to complete...
aws_vpc_dhcp_options.vpc_dhcp_options: Creation complete after 1s [id=dopt-0bb24fabf45522cbc]
aws_iam_role.bastions: Creation complete after 1s [id=shoot--foobar--aws-bastions]
aws_iam_role.nodes: Creation complete after 2s [id=shoot--foobar--aws-nodes]
aws_vpc.vpc: Creation complete after 4s [id=vpc-0f2faa33eba498d7d]

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

nodes_role_arn = arn:aws:iam::***:role/shoot--foobar--aws-nodes
vpc_id = vpc-0f2faa33eba498d7d
Tue Mar 17 06:04:33 UTC 2020 Terraform process 31 completed.

ConfigMap successfully updated with terraform state.
Tue Mar 17 06:04:34 UTC 2020 Command executed successful.
Tue Mar 17 06:04:34 UTC 2020 terraform.sh process 7 completed.
Tue Mar 17 06:04:34 UTC 2020 terraform.sh exited with 143.

terraform.sh Show resolved Hide resolved
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 17, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 17, 2020
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

terraform.sh Outdated Show resolved Hide resolved
@rfranzke rfranzke added the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Mar 17, 2020
Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 17, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Mar 17, 2020
@ialidzhikov ialidzhikov added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 17, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Mar 17, 2020
@ialidzhikov ialidzhikov removed the reviewed/do-not-merge Has no approval for merging as it may break things, be of poor quality or have (ext.) dependencies label Mar 17, 2020
Copy link
Member

@rfranzke rfranzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm <3

Copy link
Member

@timebertt timebertt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Bug needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants