feat: add option ephemeral runners (#1374)
* add option ephemeral runners

* fix tests

* Add retry mechanism for scaling errors

* Add tests for lambda handler

* Add basic test for ephemeral case

* Add basic test for scale down in lambda wrapper

* Ensure check_runs are ignored for ephemeral runners

* limit termination to only the instance itself

* fix: add logging context to runner lambda (#1399)

* fix(logging): Add context to scale logs

Signed-off-by: Nathaniel McAuliffe <nmcauliffe@expediagroup.com>

* Remove testing

Signed-off-by: Nathaniel McAuliffe <nmcauliffe@expediagroup.com>

* Remove unnecessary import

Signed-off-by: Nathaniel McAuliffe <nmcauliffe@expediagroup.com>

* Moving log fields to end, adjusting format

* feat: Add hooks for prebuilt images (AMI), including amazon linux packer example (#1444)

* Initial creation of runner image

* Refactored startup script and added it to the per-boot folder

* Make the runner location a variable

So we can pass the runner version in at packer build time if we want to update the runner version.

* Retrieve external config setting via tags

Retrieve the required config via the instance tags so we don't have to pass in and set the environment on the instance in an awkward way.

* Enable tag based config

Give the instance the permission to query its own tags and set the correct tags on the instance.

* Add a CI job

* Fix the CI build

* Fix the formatting

* Retain user_data provisioning and remove duplication

Refactored to make sure user_data continues to work with minimal breaking changes.
Use a single set of scripts shared between image and user_data provisioning.

* Fix interpolation issues in template file

* fix build

* Fix formatting

* minor tweaks and fixes

* Fixes from testing

* Enable docker on boot

* Add in output of start time for the runner

* Scoop up the runner log

* Add a powershell build script for windows users

* Fix formatting

* Use SSM parameters for configuration

It's best practice to use SSM parameters for configuration of the runners. In adding this I have also added parameter-path-based config so it's easy to extend in the future.

* Make the SSM policy more specific

* Update .github/workflows/packer-build.yml

Co-authored-by: Niek Palm <npalm@users.noreply.github.com>

* Added condition to the describe tags policy

* Don't use templatefile on the tags policy

Because the policy contains ${}, Terraform tries to interpolate it.

* Added an option to turn off userdata scripting

* Added/updated documentation

* Revert policy as it has no effect on the permissions

* Add reference to prebuilt images in the main readme

* Add an example of deploying with prebuilt images

* Update readme

* Use current user as ami_owner

* Update example to 5 secs

* Updated ami name to include the arch

* Fixed log file variable

* Added explicit info about required settings to the readme

* Change userdata_enabled to enabled_userdata

Keep within existing naming convention

Co-authored-by: Niek Palm <npalm@users.noreply.github.com>

* add option ephemeral runners

* Add retry mechanism for scaling errors

* add dead letter queue, and refactor

* cleanup

* cleanup

* sync develop

* review fix

Co-authored-by: Scott Guymer <scott@scottguymer.co.uk>

* review fix

Co-authored-by: Scott Guymer <scott@scottguymer.co.uk>

* review fix

Co-authored-by: Scott Guymer <scott@scottguymer.co.uk>

* review fix

Co-authored-by: Scott Guymer <scott@scottguymer.co.uk>

* fix review

* process review comments

* process review comments

* review comment

* process review comments

* Update examples/ephemeral/README.md

Co-authored-by: Nathaniel McAuliffe <nmcauliffe@expediagroup.com>

* Process review comments

* Add docs

* review comments

* update docs

Co-authored-by: Scott Guymer <scott@scottguymer.co.uk>
Co-authored-by: Nathaniel McAuliffe <nmcauliffe@expediagroup.com>
3 people committed Dec 22, 2021
1 parent 7cb73c8 commit 2f323d6
Showing 37 changed files with 803 additions and 133 deletions.
11 changes: 11 additions & 0 deletions .ci/build-yarn.sh
@@ -0,0 +1,11 @@
#!/usr/bin/env bash

# Build all the lambdas; output goes to the default place (inside the lambda module)

lambdaSrcDirs=("modules/runner-binaries-syncer/lambdas/runner-binaries-syncer" "modules/runners/lambdas/runners" "modules/webhook/lambdas/webhook")
# Quote every expansion so paths containing spaces survive word splitting
repoRoot=$(dirname "$(dirname "$(realpath "${BASH_SOURCE[0]}")")")

for lambdaDir in "${lambdaSrcDirs[@]}"; do
  # Stop if the directory is missing instead of running yarn in the wrong place
  cd "$repoRoot/${lambdaDir}" || exit 1
  yarn && yarn run dist
done
47 changes: 41 additions & 6 deletions README.md

Large diffs are not rendered by default.

12 changes: 9 additions & 3 deletions examples/default/main.tf
@@ -30,11 +30,13 @@ module "runners" {
webhook_secret = random_id.random.hex
}

# Grab zip files via lambda_download
webhook_lambda_zip = "lambdas-download/webhook.zip"
runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
runners_lambda_zip = "lambdas-download/runners.zip"
enable_organization_runners = false
runner_extra_labels = "default,example"

enable_organization_runners = false
runner_extra_labels = "default,example"

# enable access to the runners via SSM
enable_ssm_on_runners = true
@@ -61,7 +63,11 @@ module "runners" {
instance_types = ["m5.large", "c5.large"]

# override delay of events in seconds
delay_webhook_event = 5
delay_webhook_event = 5
runners_maximum_count = 1

# set up a FIFO queue to preserve message order
fifo_build_queue = true

# override scaling down
scale_down_schedule_expression = "cron(* * * * ? *)"
57 changes: 57 additions & 0 deletions examples/ephemeral/.terraform.lock.hcl

Some generated files are not rendered by default.

30 changes: 30 additions & 0 deletions examples/ephemeral/README.md
@@ -0,0 +1,30 @@
# Action runners deployment ephemeral example

This example is based on the default setup, but shows how runners can be used with the ephemeral flag enabled. Once enabled, each ephemeral runner is used for one job only, so every job requires a fresh instance. This feature should be used in combination with the `workflow_job` event. See GitHub webhook endpoint configuration (link needed here). It is also suggested to use a pre-built AMI to minimize runner launch times.

## Usages

Steps for the full setup, such as creating a GitHub app, can be found in the root module's [README](../../README.md). First download the Lambda releases from GitHub. Alternatively, you can build the lambdas locally with Node or Docker; there is a simple build script in `<root>/.ci/build.sh`. In that case you can simply remove the lambda zip file locations from `main.tf`, since the default location will work.

> Ensure you have set the version in `lambdas-download/main.tf` before running the example. The version needs to be set to a GitHub release version, see https://github.com/philips-labs/terraform-aws-github-runner/releases

```bash
cd lambdas-download
terraform init
terraform apply
cd ..
```

Before running Terraform, ensure the GitHub app is configured. See the [configuration details](../../README.md#usages) for more details.

```bash
terraform init
terraform apply
```

You can retrieve the webhook secret by running:

```bash
terraform output -raw webhook_secret
```

Be aware that some shells print an end-of-line marker (`%`) after the output; it is not part of the secret.
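
The `%` typically appears because `terraform output -raw` prints the value without a trailing newline, and some interactive shells (zsh, for instance) mark such a partial line. A minimal sketch of one way to sidestep this is to capture the value in a variable and print it with an explicit newline. The `print_raw` function is a hypothetical stand-in for `terraform output -raw webhook_secret`, so the snippet runs without a Terraform state, and `example-secret` is a placeholder value.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for `terraform output -raw webhook_secret`:
# like -raw, it prints the value with no trailing newline.
print_raw() {
  printf '%s' "example-secret"
}

# Command substitution captures only the value itself.
secret="$(print_raw)"

# Print with an explicit newline so the shell has no partial line to mark.
printf '%s\n' "$secret"
```

In a real checkout, replacing the body of `print_raw` with the `terraform output -raw webhook_secret` call should behave the same way.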
25 changes: 25 additions & 0 deletions examples/ephemeral/lambdas-download/main.tf
@@ -0,0 +1,25 @@
locals {
version = "<REPLACE_BY_GITHUB_RELEASE_VERSION>"
}

module "lambdas" {
source = "../../../modules/download-lambda"
lambdas = [
{
name = "webhook"
tag = local.version
},
{
name = "runners"
tag = local.version
},
{
name = "runner-binaries-syncer"
tag = local.version
}
]
}

output "files" {
value = module.lambdas.files
}
71 changes: 71 additions & 0 deletions examples/ephemeral/main.tf
@@ -0,0 +1,71 @@
locals {
environment = "ephemeral"
aws_region = "eu-west-1"
}

resource "random_id" "random" {
byte_length = 20
}

data "aws_caller_identity" "current" {}

module "runners" {
source = "../../"
create_service_linked_role_spot = true
aws_region = local.aws_region
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets

environment = local.environment
tags = {
Project = "ProjectX"
}

github_app = {
key_base64 = var.github_app_key_base64
id = var.github_app_id
webhook_secret = random_id.random.hex
}

# Grab the lambda packages from local directory. Must run /.ci/build.sh first
webhook_lambda_zip = "../../lambda_output/webhook.zip"
runner_binaries_syncer_lambda_zip = "../../lambda_output/runner-binaries-syncer.zip"
runners_lambda_zip = "../../lambda_output/runners.zip"

enable_organization_runners = true
runner_extra_labels = "default,example"

# enable access to the runners via SSM
enable_ssm_on_runners = true

# Let the module manage the service linked role
# create_service_linked_role_spot = true

instance_types = ["m5.large", "c5.large"]

# override delay of events in seconds
delay_webhook_event = 0

# Ensure you do not set this number too low; each build requires a new instance
runners_maximum_count = 20

# override scaling down
scale_down_schedule_expression = "cron(* * * * ? *)"

enable_ephemeral_runners = true

# configure your pre-built AMI
# enabled_userdata = false
# ami_filter = { name = ["github-runner-amzn2-x86_64-2021*"] }
# ami_owners = [data.aws_caller_identity.current.account_id]

# Enable logging
# log_level = "debug"

# Set up a dead letter queue. By default the scale up lambda will keep retrying to process the event in case of a scaling error.
# redrive_build_queue = {
# enabled = true
# maxReceiveCount = 50 # 50 retries every 30 seconds => 25 minutes
# deadLetterTargetArn = null
# }
}
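
The retry window quoted in the commented-out dead letter queue settings follows from simple arithmetic. A rough sketch, assuming messages are retried about every 30 seconds as that comment states; in the root module the queue's visibility timeout is set from `runners_scale_up_lambda_timeout`, so the real interval may differ. The variable names here are illustrative only.

```bash
#!/usr/bin/env bash

max_receive_count=50       # maxReceiveCount from the example settings
retry_interval_seconds=30  # assumed interval, taken from the example's comment

# Total time a message can keep being retried before it is moved aside.
total_seconds=$(( max_receive_count * retry_interval_seconds ))
total_minutes=$(( total_seconds / 60 ))

echo "about ${total_minutes} minutes before the message moves to the dead letter queue"
```

Raising `maxReceiveCount` or the retry interval lengthens the window proportionally.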
15 changes: 15 additions & 0 deletions examples/ephemeral/outputs.tf
@@ -0,0 +1,15 @@
output "runners" {
value = {
lambda_syncer_name = module.runners.binaries_syncer.lambda.function_name
}
}

output "webhook_endpoint" {
value = module.runners.webhook.endpoint
}

output "webhook_secret" {
sensitive = true
value = random_id.random.hex
}

3 changes: 3 additions & 0 deletions examples/ephemeral/providers.tf
@@ -0,0 +1,3 @@
provider "aws" {
region = local.aws_region
}
5 changes: 5 additions & 0 deletions examples/ephemeral/variables.tf
@@ -0,0 +1,5 @@

variable "github_app_key_base64" {}

variable "github_app_id" {}

15 changes: 15 additions & 0 deletions examples/ephemeral/versions.tf
@@ -0,0 +1,15 @@
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = ">= 3.27"
}
local = {
source = "hashicorp/local"
}
random = {
source = "hashicorp/random"
}
}
required_version = ">= 0.14"
}
7 changes: 7 additions & 0 deletions examples/ephemeral/vpc.tf
@@ -0,0 +1,7 @@
module "vpc" {
source = "git::https://github.com/philips-software/terraform-aws-vpc.git?ref=2.2.0"

environment = local.environment
aws_region = local.aws_region
create_private_hosted_zone = false
}
21 changes: 17 additions & 4 deletions main.tf
@@ -19,13 +19,24 @@ resource "random_string" "random" {
}

resource "aws_sqs_queue" "queued_builds" {
name = "${var.environment}-queued-builds.fifo"
name = "${var.environment}-queued-builds${var.fifo_build_queue ? ".fifo" : ""}"
delay_seconds = var.delay_webhook_event
visibility_timeout_seconds = var.runners_scale_up_lambda_timeout
message_retention_seconds = var.job_queue_retention_in_seconds
fifo_queue = true
receive_wait_time_seconds = 10
content_based_deduplication = true
fifo_queue = var.fifo_build_queue
receive_wait_time_seconds = 0
content_based_deduplication = var.fifo_build_queue
redrive_policy = var.redrive_build_queue.enabled ? jsonencode({
deadLetterTargetArn = aws_sqs_queue.queued_builds_dlq[0].arn,
maxReceiveCount = var.redrive_build_queue.maxReceiveCount
}) : null

tags = var.tags
}

resource "aws_sqs_queue" "queued_builds_dlq" {
count = var.redrive_build_queue.enabled ? 1 : 0
name = "${var.environment}-queued-builds_dead_letter"

tags = var.tags
}
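
The conditional in the queue name exists because SQS requires FIFO queue names to end in `.fifo`, while standard queue names must not. A small sketch of the same naming rule in shell, where `queue_name` is a hypothetical helper written for illustration and not part of the module:

```bash
#!/usr/bin/env bash

# Mirrors "${var.environment}-queued-builds${var.fifo_build_queue ? ".fifo" : ""}".
queue_name() {
  local environment="$1" fifo="$2" suffix=""
  if [ "$fifo" = "true" ]; then
    suffix=".fifo"
  fi
  printf '%s-queued-builds%s\n' "$environment" "$suffix"
}

queue_name "default" true   # FIFO queue name
queue_name "default" false  # standard queue name
```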
@@ -48,6 +59,7 @@ module "webhook" {
kms_key_arn = var.kms_key_arn

sqs_build_queue = aws_sqs_queue.queued_builds
sqs_build_queue_fifo = var.fifo_build_queue
github_app_webhook_secret_arn = module.ssm.parameters.github_app_webhook_secret.arn

lambda_s3_bucket = var.lambda_s3_bucket
@@ -92,6 +104,7 @@ module "runners" {
sqs_build_queue = aws_sqs_queue.queued_builds
github_app_parameters = local.github_app_parameters
enable_organization_runners = var.enable_organization_runners
enable_ephemeral_runners = var.enable_ephemeral_runners
scale_down_schedule_expression = var.scale_down_schedule_expression
minimum_running_time_in_minutes = var.minimum_running_time_in_minutes
runner_boot_time_in_minutes = var.runner_boot_time_in_minutes