CloudFormation templates to set up the necessary cloud infrastructure for AWS pipeline tests.
- Process overview
- Prerequisites
- Setting up the AWS infrastructure
- Step 0: Create an S3 bucket
- Step 1: Set up a Virtual Private Cloud (VPC)
- Step 2: Set up the Core Environment
- Step 3: Set up the Nextflow resources
- Step 4: Set up GitHub Actions
- Using the same infrastructure for Nextflow Tower Launch
- Launch your AWS jobs with Nextflow Tower
When this GitHub Actions workflow is triggered, it starts an AWS Batch job, which has permissions to subsequently spawn more AWS Batch jobs. The first Batch job pulls the nf-core pipeline from GitHub and starts it. The running pipeline can access a previously created S3 bucket to store any output files, such as the work directory, the trace directory, and the results. The pipeline's progress can be monitored on nf-core's Nextflow Tower instance. The final results are provided to nf-co.re.
- Access to nf-core AWS account
This process was set up by following this guide, but some of the templates were adapted to include missing permissions, so we recommend using the templates in this repository instead.
If an S3 bucket does not exist yet, create one to store the run's intermediate files and results. An S3 bucket was created to store all work and results directories for the AWS tests: `s3://nf-core-awsmegatests`.
- Log in to AWS
- Navigate to S3
- Create a new bucket and remember the name, e.g. `nf-core-awsmegatests`
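Since the rest of this setup is driven by CloudFormation, the bucket can alternatively be created from a small template instead of the console. This is a minimal sketch only; the parameter name and default are illustrative, and the bucket name must be globally unique:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Minimal S3 bucket for pipeline test outputs (illustrative sketch)

Parameters:
  BucketName:
    Type: String
    Default: nf-core-awsmegatests   # replace with your own globally unique name

Resources:
  TestDataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Ref BucketName

Outputs:
  BucketArn:
    Description: ARN of the created bucket, useful when writing IAM policies later
    Value: !GetAtt TestDataBucket.Arn
```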
- Our template is based on the 'Launch Quick Start' template from this tutorial. You can launch that one directly, or use the `VPCsetup.yml` template in this repository.
- Make sure the selected region is `eu-west-1`.
- Under CloudFormation, select 'Create stack' with new resources.
- Select 'Template is ready' and 'Upload a template file'
- Upload the template `VPCsetup.yml`
- Set 'Availability Zones' to `eu-west-1a`, `eu-west-1b`, `eu-west-1c`
- Set 'Number of Availability Zones' to `3`
- Proceed through the remaining steps of the wizard, acknowledge the capabilities, and create the stack.
The template used in this step is based on the template available here under 'Option A: Full Stack':
- Press 'Launch Stack'
- Select 'Template is ready'
- Select 'Upload a template file'
- Use `GWFcore.yml` and press 'Next'
- Follow the launch wizard. Give the stack a name (e.g. GWFcore).
- Set 'S3 bucket name' to `nf-core-awsmegatests` (or the name specified in the previous step). Set existing to 'true' if the S3 bucket already exists.
- Set 'Workflow orchestrator' to `Nextflow`
- Private subnet -> select the VPC created in the previous step (check the Resources tab of the previous stack), private subnets 1A/2A/3A
- Set the max spot bid % to a reasonable number (e.g. 60 %) and alter any other default settings as necessary. We left the rest of the settings at their defaults.
- Proceed to the next step of the wizard, acknowledge the capabilities, and create the stack.
- Launch the Nextflow resources template
- Provide a name for the stack (e.g. NextflowResources).
- Provide the S3 bucket name for the data and Nextflow logs (we provided the `nf-core-awsmegatests` bucket). The bucket must exist.
- Provide the default job queue ARNs that were generated as output when running the previous template.
- Leave the Nextflow container image field empty. You can leave the optional fields at their defaults.
- Provide the high-priority job queue ARNs generated as output of the previous template.
- Acknowledge the capabilities and create the stack.
A GitHub Actions workflow example to trigger the AWS tests can be found here. The secrets that it uses need to be set up at an organization level, so that all pipelines can use them:
- AWS_ACCESS_KEY_ID: IAM key ID for the AWS user in the nf-core account organization. A specific user was set with restricted roles to run these tests.
- AWS_SECRET_ACCESS_KEY: IAM key secret for the same user.
- AWS_TOWER_TOKEN: token for Nextflow tower, to be able to track all pipeline tests running on AWS.
- AWS_JOB_DEFINITION: this job definition needs to be created once manually in AWS Batch and can then be used in all pipeline runs. Currently, it is called `nextflow`.
- AWS_JOB_QUEUE: the name of the default queue that was created with the CloudFormation templates.
- AWS_S3_BUCKET: the name of the S3 bucket specified during the template launch (`nf-core-awsmegatests`).
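As a sketch, a workflow typically consumes these organization-level secrets as follows. The job and step names are illustrative, and the `submit-job` arguments are an assumption about how the first Batch job is started, not a copy of the actual workflow:

```yaml
# Illustrative excerpt of a GitHub Actions job that submits the first AWS Batch job.
# Secret names match those listed above; everything else is a sketch.
jobs:
  run-awstest:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
      TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
    steps:
      - name: Submit the pipeline as an AWS Batch job
        run: |
          aws batch submit-job \
            --region eu-west-1 \
            --job-name nf-core-aws-test \
            --job-queue "${{ secrets.AWS_JOB_QUEUE }}" \
            --job-definition "${{ secrets.AWS_JOB_DEFINITION }}"
```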
The GitHub Actions workflow installs Miniconda, as it is needed to install `awscli`. To use Miniconda, the latest stable release of a GitHub Action offered in the marketplace is used. Subsequently, `awscli` is installed via the `conda-forge` channel.
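These install steps look roughly like the following in a workflow. This is a sketch: the specific marketplace action (`conda-incubator/setup-miniconda`) and its version are assumptions, not a quote from the actual workflow file:

```yaml
steps:
  - name: Set up Miniconda
    # Marketplace action for Miniconda; name and version are assumptions
    uses: conda-incubator/setup-miniconda@v2
    with:
      auto-update-conda: true
  - name: Install awscli from the conda-forge channel
    shell: bash -l {0}   # login shell so that conda is on PATH
    run: conda install -y -c conda-forge awscli
```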
For accessing the nf-core AWS account as well as the Nextflow Tower instance, secrets have to be set. This can only be done by one of the core team members within the repository under Settings > Secrets > Add new secret.
To use the same infrastructure to perform manual pipeline launches via Nextflow Tower Launch, the following additional setup steps are required:

- Add the necessary permissions to the Instance Role in use.
  For this, you will have to manually create 3 different policies containing the specified permissions (under the IAM service):
- policy batchNextflowTower:
- "batch:DescribeJobQueues"
- "batch:CancelJob"
- "batch:SubmitJob"
- "batch:ListJobs"
- "batch:DescribeComputeEnvironments"
- "batch:TerminateJob"
- "batch:DescribeJobs"
- "batch:RegisterJobDefinition"
- "batch:DescribeJobDefinitions"
- policy ec2NextflowTower:
- "ec2:DescribeInstances"
- "ec2:DescribeInstanceAttribute"
- "ec2:DescribeInstanceTypes"
- "ec2:DescribeInstanceStatus"
- policy ecsNextflowTower:
- "ecs:DescribeContainerInstances"
- "ecs:DescribeTasks"
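Each of these is a standard IAM policy document. For example, the `batchNextflowTower` policy could look like the sketch below; the `Resource: "*"` scope is an assumption for illustration, and you may want to restrict it to specific queues and job definitions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "batchNextflowTower",
      "Effect": "Allow",
      "Action": [
        "batch:DescribeJobQueues",
        "batch:CancelJob",
        "batch:SubmitJob",
        "batch:ListJobs",
        "batch:DescribeComputeEnvironments",
        "batch:TerminateJob",
        "batch:DescribeJobs",
        "batch:RegisterJobDefinition",
        "batch:DescribeJobDefinitions"
      ],
      "Resource": "*"
    }
  ]
}
```

The `ec2NextflowTower` and `ecsNextflowTower` policies follow the same pattern with their respective action lists.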
Then, under Roles, find the "GenomicsEnvBatchInstance" role (`GWFcore-IamStack-1TEL4W6B-GenomicsEnvBatchInstance` in our case) and attach the policies to this role.
- Create an "nf-core-head" queue. The head queue used to launch the Nextflow head job for Tower is required to run on on-demand instances. Therefore, a queue can be created from the "ondemand" compute environment that was already created together with the compute queues by the CloudFormation templates.
Go to tower.nf and follow these steps:
- Create a compute environment
- Choose a name
- Select Amazon Batch as platform
- Use as credentials your user's AWS access key ID and secret for the nf-core AWS account.
- Region: eu-west-1
- Storage type: AWS S3
- Pipeline work directory: any path under `s3://nf-core-awsmegatests` will work (if it does not exist, it will be created)
- Head queue: nf-core-head
- Compute queue: default-9bc88600-ab1e-11ea-8954-02ea27505f6c (the exact numbers will change if the CloudFormation template is re-run, but the name should start with "default")
- Leave the Head Job role and Compute Job role fields empty.
- Launch your pipeline with the defined compute environment.
- Fill in the fields; it's straightforward. Don't forget to specify the `outdir` parameter so you can find your results, e.g.:

```nextflow
params {
    outdir = 's3://nf-core-awsmegatests/foobar'
}
```
- Enjoy monitoring your AWS run!