Set AWS container properties from container options (nextflow-io#2471)
This commit adds support for the `containerOptions` directive for AWS Batch jobs.

This is useful to fine-tune the container execution properties using the same
options as the Docker container runtime.

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
2 people authored and phue committed Jan 25, 2022
1 parent 36d064f commit 7db9b4b
Showing 15 changed files with 705 additions and 68 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/build.yml
@@ -5,7 +5,6 @@ on:
push:
branches:
- '*'
- '!refs/tags/.*'
tags-ignore:
- '*'
pull_request:
@@ -14,7 +13,6 @@ on:
jobs:
build:
name: Build Nextflow
if: "!contains(github.event.head_commit.message, '[ci skip]') && (github.event == 'push' || github.repository != github.event.pull_request.head.repo.full_name)"
runs-on: ubuntu-latest
timeout-minutes: 120
strategy:
80 changes: 63 additions & 17 deletions docs/awscloud.rst
@@ -32,7 +32,7 @@ See :ref:`AWS configuration<config-aws>` for more details.
AWS IAM policies
=================

`AIM policies <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html>`_ are the mechanism used by AWS to
`IAM policies <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html>`_ are the mechanism used by AWS to
define permissions for IAM identities. In order to access certain AWS services, the proper policies must be
attached to the identity associated with the AWS credentials.

@@ -76,10 +76,11 @@ Minimal permissions policies to be attached to the AWS account used by Nextflow

S3 policies
------------
Nextflow requires policies also to access `S3 buckets <https://aws.amazon.com/s3/>`_ in order to::
- use the workdir
- pull input data
- publish results
Nextflow also requires policies to access `S3 buckets <https://aws.amazon.com/s3/>`_ in order to:

1. use the workdir
2. pull input data
3. publish results

Depending on the pipeline configuration, the above actions may all target a single bucket or, more likely, be spread across multiple
buckets. Once the list of buckets used by the pipeline is identified, there are two alternative ways to grant Nextflow access to these buckets:
@@ -165,10 +166,10 @@ Get started
-------------

1 - In the AWS Console, create a `Compute environment <http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html>`_ (CE) in your AWS Batch Service.
* if are using a custom AMI (see following sections), the AMI ID must be specified in the CE configuration
* make sure to select an AMI (either custom or existing) with Docker installed (see following sections)
* make sure the policy ``AmazonS3FullAccess`` (granting access to S3 buckets) is attached to the instance role configured for the CE
* if you plan to use Docker images from Amazon ECS container, make sure the ``AmazonEC2ContainerServiceforEC2Role`` policy is also attached to the instance role
1.1 - if you are using a custom AMI (see the following sections), the AMI ID must be specified in the CE configuration
1.2 - make sure to select an AMI (either custom or existing) with Docker installed (see following sections)
1.3 - make sure the policy ``AmazonS3FullAccess`` (granting access to S3 buckets) is attached to the instance role configured for the CE
1.4 - if you plan to use Docker images from the Amazon ECS container registry, make sure the ``AmazonEC2ContainerServiceforEC2Role`` policy is also attached to the instance role

2 - In the AWS Console, create (at least) one `Job Queue <https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html>`_
and bind it to the Compute environment
@@ -186,9 +187,10 @@ Configuration

When configuring your pipeline:

- import the `nf-amazon` plugin
- specify the AWS Batch :ref:`executor<awsbatch-executor>`
- specify one or more AWS Batch queues for the execution by using the :ref:`process-queue` directive.
1 - import the `nf-amazon` plugin
2 - specify the AWS Batch :ref:`executor<awsbatch-executor>`
3 - specify one or more AWS Batch queues for the execution by using the :ref:`process-queue` directive
4 - specify the AWS job container properties by using the :ref:`process-containerOptions` directive.

An example ``nextflow.config`` file is shown below::

@@ -200,6 +202,7 @@
executor = 'awsbatch'
queue = 'my-batch-queue'
container = 'quay.io/biocontainers/salmon'
containerOptions = '--shm-size 16000000 --ulimit nofile=1280:2560 --ulimit nproc=16:32'
}

aws {
@@ -212,19 +215,62 @@

Different queues bound to the same or different Compute environments can be configured according to each process' requirements.

Container Options
=================

As of version ``21.12.0-edge``, the Nextflow :ref:`process-containerOptions` directive can be used to fine-tune
the properties of the container execution associated with each Batch job.

Not all of the standard container options are supported by AWS Batch. The following options are accepted::

-e, --env string
Set environment variables (format: <name> or <name>=<value>)
--init
Run an init inside the container that forwards signals and reaps processes
--memory-swap int
The total amount of swap memory (in MiB) the container can use: '-1' to enable unlimited swap
--memory-swappiness int
Tune container memory swappiness (0 to 100) (default -1)
--privileged
Give extended privileges to the container
--read-only
Mount the container's root filesystem as read only
--shm-size int
Size (in MiB) of /dev/shm
--tmpfs string
Mount a tmpfs directory (format: <path>:<options>,size=<int>), size is in MiB
-u, --user string
Username or UID (format: <name|uid>[:<group|gid>])
--ulimit string
Ulimit options (format: <type>=<soft limit>[:<hard limit>])

Container options must be passed in their long form ``--option value`` or short form ``-o value``, if available.

A few examples::

containerOptions '--tmpfs /run:rw,noexec,nosuid,size=128 --tmpfs /app:ro,size=64'

containerOptions '-e MYVAR1 --env MYVAR2=foo2 --env MYVAR3=foo3 --memory-swap 3240000 --memory-swappiness 20 --shm-size 16000000'

containerOptions '--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged'


See the `AWS documentation <https://docs.aws.amazon.com/batch/latest/APIReference/API_ContainerProperties.html>`_ for further details.

Custom AMI
==========
There are several reasons why you might need to create your own `AMI (Amazon Machine Image) <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html>`_
to use in your Compute environments. Typically:

- you do not want to modify your existing Docker images and prefer to install the CLI tool on the hosting environment
1 - you do not want to modify your existing Docker images and prefer to install the CLI tool on the hosting environment

- the existing AMI (selected from the marketplace) does not have Docker installed
2 - the existing AMI (selected from the marketplace) does not have Docker installed

- you need to attach a larger storage to your EC2 instance (the default ECS instance AMI has only a 30G storage
volume which may not be enough for most data analysis pipelines)
3 - you need to attach a larger storage to your EC2 instance (the default ECS instance AMI has only a 30G storage
volume which may not be enough for most data analysis pipelines)

- you need to install additional software, not available in the Docker image used to execute the job
4 - you need to install additional software, not available in the Docker image used to execute the job

Create your custom AMI
----------------------
2 changes: 1 addition & 1 deletion docs/process.rst
@@ -1533,7 +1533,7 @@ only for a specific process e.g. mount a custom path::
}


.. warning:: This feature is not supported by :ref:`awsbatch-executor` and :ref:`k8s-executor` executors.
.. warning:: This feature is not supported by :ref:`k8s-executor` and :ref:`azurebatch-executor` executors.

.. _process-cpus:

@@ -17,6 +17,8 @@

package nextflow.processor

import nextflow.util.CmdLineOptionMap

import static nextflow.processor.TaskProcessor.*

import java.nio.file.Path
@@ -412,9 +414,15 @@ class TaskConfig extends LazyMap implements Cloneable {
return opts instanceof CharSequence ? opts.toString() : null
}

Map getContainerOptionsMap() {
CmdLineOptionMap getContainerOptionsMap() {
def opts = get('containerOptions')
return opts instanceof Map ? opts : Collections.emptyMap()
if( opts instanceof Map )
return CmdLineOptionMap.fromMap(opts)
if( opts instanceof CharSequence )
return CmdLineHelper.parseGnuArgs(opts.toString())
if( opts!=null )
throw new IllegalArgumentException("Invalid `containerOptions` directive value: $opts [${opts.getClass().getName()}]")
return CmdLineOptionMap.emptyOption()
}

/**
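The `getContainerOptionsMap()` change above dispatches on the runtime type of the `containerOptions` value: a `Map` is converted directly, a string is parsed as GNU-style arguments, `null` yields an empty holder, and anything else is rejected. A minimal Java sketch of that dispatch (illustrative only, not part of the commit; the string branch uses a simplified whitespace parser in place of `CmdLineHelper.parseGnuArgs`, so it does not handle quoting):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative re-implementation of the type dispatch performed by
// TaskConfig.getContainerOptionsMap(): Map -> converted as-is,
// CharSequence -> parsed as GNU-style arguments, null -> empty,
// anything else -> error.
class ContainerOptionsDispatch {

    static Map<String, List<String>> resolve(Object opts) {
        if (opts == null)
            return Map.of();                              // CmdLineOptionMap.emptyOption()
        if (opts instanceof Map) {                        // CmdLineOptionMap.fromMap(opts)
            Map<String, List<String>> result = new LinkedHashMap<>();
            ((Map<?, ?>) opts).forEach((k, v) ->
                result.put(String.valueOf(k), List.of(String.valueOf(v))));
            return result;
        }
        if (opts instanceof CharSequence)                 // CmdLineHelper.parseGnuArgs(opts)
            return parseSimple(opts.toString());
        throw new IllegalArgumentException(
            "Invalid `containerOptions` directive value: " + opts
            + " [" + opts.getClass().getName() + "]");
    }

    // Simplified stand-in for CmdLineHelper.parseGnuArgs: whitespace
    // tokenization only, no quoting support; bare flags become 'true'.
    static Map<String, List<String>> parseSimple(String cmdline) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        String opt = null;
        for (String token : cmdline.trim().split("\\s+")) {
            if (token.startsWith("-")) {
                if (opt != null)                          // previous option had no value
                    result.computeIfAbsent(opt, k -> new ArrayList<>()).add("true");
                opt = token.replaceFirst("^--?", "");
            } else if (opt != null) {
                result.computeIfAbsent(opt, k -> new ArrayList<>()).add(token);
                opt = null;
            }
        }
        if (opt != null)                                  // trailing bare flag
            result.computeIfAbsent(opt, k -> new ArrayList<>()).add("true");
        return result;
    }
}
```

This mirrors why both `containerOptions '--privileged'` (string) and a map form are accepted by the directive after this change.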
87 changes: 62 additions & 25 deletions modules/nf-commons/src/main/nextflow/util/CmdLineHelper.groovy
@@ -16,27 +16,36 @@
*/

package nextflow.util

import groovy.transform.CompileStatic

import java.util.regex.Pattern

/**
*
* Implement command line parsing helpers
*
* @author Paolo Di Tommaso <paolo.ditommaso@gmail.com>
*/
@CompileStatic
class CmdLineHelper {

def List<String> args
static private Pattern CLI_OPT = ~/--([a-zA-Z_-]+)(?:\W.*)?$|-([a-zA-Z])(?:\W.*)?$/

private List<String> args

CmdLineHelper( String cmdLineToBeParsed ) {
args = splitter(cmdLineToBeParsed ?: '')
}

def boolean contains(String argument) {
private boolean contains(String argument) {
return args.indexOf(argument) != -1
}

def getArg( String argument ) {
def pos = args.indexOf(argument)
private getArg( String argument ) {
int pos = args.indexOf(argument)
if( pos == -1 ) return null

def result = []
List<String> result = []
for( int i=pos+1; i<args.size(); i++ ) {
if( args[i].startsWith('-') ) {
break
@@ -55,25 +64,6 @@ class CmdLineHelper {
}
}

def asList( String argument, String splitter=',' ) {
def val = getArg(argument)
if( !val ) return val

if( val instanceof Boolean ) {
return []
}

if( val instanceof String ) {
val = [val]
}

for( int i=0; i<val.size(); i++ ) {
val[i] = val[i] ?. split(splitter)
}

return val.flatten()
}


/**
* Given a string, the splitter method separates it by blanks, returning a list of strings.
@@ -112,4 +102,51 @@
return result.join(' ')
}

/**
* Parse a command line string and return the options and their values as a map object.
*
* @param cmdline
The command line as a single string
* @return
* A map object holding the option key-value(s) associations
*/
static CmdLineOptionMap parseGnuArgs(String cmdline) {
final BLANK = ' ' as char
final result = new CmdLineOptionMap()

if( !cmdline )
return result

final tokenizer = new QuoteStringTokenizer(cmdline, BLANK);
String opt = null
String last = null
while( tokenizer.hasNext() ) {
final String token = tokenizer.next()
if( !token || token=='--')
continue
final matcher = CLI_OPT.matcher(token)
if( matcher.matches() ) {
if( opt ) {
result.addOption(opt,'true')
}
opt = matcher.group(1) ?: matcher.group(2)
}
else {
if( !opt ) {
if( !last ) continue
result.addOption(last, token)
}
else {
result.addOption(opt, token)
last = opt
opt = null
}
}
}

if( opt )
result.addOption(opt, 'true')

return result
}
}
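The parsing loop in `parseGnuArgs` keeps two pieces of state: `opt`, an option still waiting for its first value, and `last`, the option that most recently received one. This lets repeated value tokens (e.g. ``--tmpfs /a /b``) accumulate under the same key, while bare flags are recorded as ``'true'``. A Java sketch of the same state machine (illustrative; plain whitespace splitting stands in for `QuoteStringTokenizer`, and the option regex is simplified):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java sketch of CmdLineHelper.parseGnuArgs: walk the tokens, pair each
// --long/-s option with the value tokens that follow it, record bare
// flags as 'true', and append extra value tokens to the last seen option.
class GnuArgsSketch {

    private static final Pattern CLI_OPT = Pattern.compile("--([a-zA-Z_-]+)|-([a-zA-Z])");

    static Map<String, List<String>> parseGnuArgs(String cmdline) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (cmdline == null || cmdline.isBlank())
            return result;
        String opt = null;   // option waiting for its first value
        String last = null;  // option that already received a value
        for (String token : cmdline.trim().split("\\s+")) {   // real code uses QuoteStringTokenizer
            if (token.isEmpty() || token.equals("--"))
                continue;
            Matcher m = CLI_OPT.matcher(token);
            if (m.matches()) {
                if (opt != null)                              // pending option was a bare flag
                    result.computeIfAbsent(opt, k -> new ArrayList<>()).add("true");
                opt = m.group(1) != null ? m.group(1) : m.group(2);
            } else if (opt != null) {                         // first value for the pending option
                result.computeIfAbsent(opt, k -> new ArrayList<>()).add(token);
                last = opt;
                opt = null;
            } else if (last != null) {                        // extra value for the previous option
                result.computeIfAbsent(last, k -> new ArrayList<>()).add(token);
            }
        }
        if (opt != null)                                      // trailing bare flag
            result.computeIfAbsent(opt, k -> new ArrayList<>()).add("true");
        return result;
    }
}
```

For example, ``--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged`` yields ``ulimit -> [nofile=1280:2560, nproc=16:32]`` and ``privileged -> [true]``, which is why the holder class must support multiple values per key.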
82 changes: 82 additions & 0 deletions modules/nf-commons/src/main/nextflow/util/CmdLineOptionMap.groovy
@@ -0,0 +1,82 @@
package nextflow.util


import com.google.common.hash.Hasher
import groovy.transform.CompileStatic
import groovy.transform.EqualsAndHashCode
import groovy.transform.ToString

/**
* Holder for parsed command line options.
*
* @author Manuele Simi <manuele.simi@gmail.com>
*/
@CompileStatic
@ToString(includes = 'options', includeFields = true)
@EqualsAndHashCode(includes = 'options', includeFields = true)
class CmdLineOptionMap implements CacheFunnel {

final private Map<String, List<String>> options = new LinkedHashMap<String, List<String>>()
final private static CmdLineOptionMap EMPTY = new CmdLineOptionMap()

protected CmdLineOptionMap addOption(String key, String value) {
if ( !options.containsKey(key) )
options[key] = new ArrayList<String>(10)
options[key].add(value)
return this
}

boolean hasMultipleValues(String key) {
options.containsKey(key) ? options[key].size() > 1 : false
}

boolean hasOptions() {
options.size()
}

List<String> getValues(String key) {
return options.containsKey(key) ? options[key] : Collections.emptyList() as List<String>
}

def getFirstValue(String key) {
getFirstValueOrDefault(key, null)
}

boolean asBoolean() {
return options.size()>0
}

boolean exists(String key) {
options.containsKey(key)
}

def getFirstValueOrDefault(String key, String alternative) {
options.containsKey(key) && options[key].get(0) ? options[key].get(0) : alternative
}

static CmdLineOptionMap fromMap(final Map map) {
def optionMap = new CmdLineOptionMap()
map.each {
optionMap.addOption(it.key as String, it.value as String)
}
return optionMap
}

static CmdLineOptionMap emptyOption() {
return EMPTY
}

@Override
String toString() {
def serialized = []
options.each {
serialized << "option{${it.key}: ${it.value.each {it}}}"
}
return "[${serialized.join(', ')}]"
}

@Override
Hasher funnel(Hasher hasher, CacheHelper.HashMode mode) {
return CacheHelper.hasher(hasher, options, mode)
}
}
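`CmdLineOptionMap` is essentially an insertion-ordered multimap with convenience accessors. A Java sketch of its contract (illustrative, not part of the commit; note that, following the Groovy truth semantics of the original, an empty first value also falls back to the default):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Java sketch of CmdLineOptionMap's semantics: each option key maps to
// the ordered list of values it received; convenience accessors expose
// existence, multi-value detection, and a first-value-or-default lookup.
class OptionMapSketch {

    private final Map<String, List<String>> options = new LinkedHashMap<>();

    OptionMapSketch addOption(String key, String value) {
        options.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        return this;
    }

    boolean exists(String key) {
        return options.containsKey(key);
    }

    boolean hasMultipleValues(String key) {
        return options.containsKey(key) && options.get(key).size() > 1;
    }

    List<String> getValues(String key) {
        return options.getOrDefault(key, List.of());
    }

    // Groovy truth in the original: a null or empty first value yields the default.
    String getFirstValueOrDefault(String key, String alternative) {
        List<String> vals = options.get(key);
        return (vals != null && !vals.isEmpty() && vals.get(0) != null && !vals.get(0).isEmpty())
                ? vals.get(0) : alternative;
    }
}
```

The chained `addOption` calls match how the parser feeds repeated options (e.g. two `--ulimit` entries) into the same key.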
@@ -26,7 +26,7 @@ package nextflow.util
* @author Paolo Di Tommaso <paolo.ditommaso@gmail.com>
*
*/
public class QuoteStringTokenizer implements Iterator<String>, Iterable<String> {
class QuoteStringTokenizer implements Iterator<String>, Iterable<String> {

private List<Character> chars = Arrays.<Character>asList(' ' as Character);

