Set AWS container properties from container options #2471

Merged
merged 31 commits into from Dec 12, 2021
Commits
c664e63 Update changelog (pditommaso, Nov 30, 2021)
b9008c8 Map the container options of the task into the equivalent AWS job con… (manuelesimi, Dec 1, 2021)
dad83bf Adjust format. (manuelesimi, Dec 2, 2021)
9dce577 Set size and options for tmpfs mounts. (manuelesimi, Dec 2, 2021)
5e9c1cc Add containerOptions map [ci fast] (pditommaso, Oct 4, 2021)
f2ef534 Merge remote-tracking branch 'origin/aws-container-options' into AWS/… (manuelesimi, Dec 2, 2021)
5e9ae82 Use a custom handler as command line option map. Support repeatable o… (manuelesimi, Dec 3, 2021)
8fa9ea7 Support command line options with dashes. (manuelesimi, Dec 3, 2021)
0c79646 Use the option map to set AWS container options. (manuelesimi, Dec 3, 2021)
e12edaa Adjust format. (manuelesimi, Dec 3, 2021)
6c6cbf4 Fix misleading comment. (manuelesimi, Dec 4, 2021)
7cfbbb0 Add static compilation to new classses. Adapt the code to comply. (manuelesimi, Dec 5, 2021)
52bd6e1 Test for CmdLineOptionMap. (manuelesimi, Dec 5, 2021)
75c5623 Update AWS doc with supported container options. (manuelesimi, Dec 5, 2021)
fdcbe5b Merge branch 'master' into aws/container_properties (manuelesimi, Dec 5, 2021)
963d97c Fix some typos in the AWS cloud page. (manuelesimi, Dec 6, 2021)
28c5361 Merge remote-tracking branch 'origin/aws/container_properties' into A… (manuelesimi, Dec 6, 2021)
a75742f Specify ulimit format. (manuelesimi, Dec 6, 2021)
01aa2a2 Check that the container options are not empty. (manuelesimi, Dec 8, 2021)
0576d5c Resolve some glitches with one or zero container options. (manuelesimi, Dec 8, 2021)
522d2c7 Avoid to create unnecessary objects for AWS properties. (manuelesimi, Dec 9, 2021)
27e12a8 Merge branch 'master' into aws/container_properties (pditommaso, Dec 10, 2021)
f43be44 Aws Container options improvements (pditommaso, Dec 10, 2021)
80dbfa8 Fix build (pditommaso, Dec 10, 2021)
a724067 Remove unused code. (manuelesimi, Dec 10, 2021)
d1d9da0 Define a unit test for each container properties. (manuelesimi, Dec 10, 2021)
96c3fd4 Fix failing tests. (manuelesimi, Dec 11, 2021)
4c90edd Remove invalid test [ci fast] (pditommaso, Dec 12, 2021)
4922034 Groovified code [ci fast] (pditommaso, Dec 12, 2021)
064eacd Fix failing test (pditommaso, Dec 12, 2021)
397f3f0 Add required version to docs [ci skip] (pditommaso, Dec 12, 2021)

2 changes: 0 additions & 2 deletions .github/workflows/build.yml
@@ -5,7 +5,6 @@ on:
push:
branches:
- '*'
- '!refs/tags/.*'
tags-ignore:
- '*'
pull_request:
@@ -14,7 +13,6 @@ on:
jobs:
build:
name: Build Nextflow
if: "!contains(github.event.head_commit.message, '[ci skip]') && (github.event == 'push' || github.repository != github.event.pull_request.head.repo.full_name)"
runs-on: ubuntu-latest
timeout-minutes: 120
strategy:
78 changes: 61 additions & 17 deletions docs/awscloud.rst
@@ -32,7 +32,7 @@ See :ref:`AWS configuration<config-aws>` for more details.
AWS IAM policies
=================

`AIM policies <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html>`_ are the mechanism used by AWS to
`IAM policies <https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html>`_ are the mechanism used by AWS to
defines permissions for IAM identities. In order to access certain AWS services, the proper policies must be
attached to the identity associated to the AWS credentials.

@@ -76,10 +76,11 @@ Minimal permissions policies to be attached to the AWS account used by Nextflow

S3 policies
------------
Nextflow requires policies also to access `S3 buckets <https://aws.amazon.com/s3/>`_ in order to::
- use the workdir
- pull input data
- publish results
Nextflow requires policies also to access `S3 buckets <https://aws.amazon.com/s3/>`_ in order to:

1. use the workdir
2. pull input data
3. publish results

Depending on the pipeline configuration, the above actions can be done all in a single bucket but, more likely, spread across multiple
buckets. Once the list of buckets used by the pipeline is identified, there are two alternative ways to give Nextflow access to these buckets:
@@ -165,10 +166,10 @@ Get started
-------------

1 - In the AWS Console, create a `Compute environment <http://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html>`_ (CE) in your AWS Batch Service.
* if are using a custom AMI (see following sections), the AMI ID must be specified in the CE configuration
* make sure to select an AMI (either custom or existing) with Docker installed (see following sections)
* make sure the policy ``AmazonS3FullAccess`` (granting access to S3 buckets) is attached to the instance role configured for the CE
* if you plan to use Docker images from Amazon ECS container, make sure the ``AmazonEC2ContainerServiceforEC2Role`` policy is also attached to the instance role
1.1 - if are using a custom AMI (see following sections), the AMI ID must be specified in the CE configuration
1.2 - make sure to select an AMI (either custom or existing) with Docker installed (see following sections)
1.3 - make sure the policy ``AmazonS3FullAccess`` (granting access to S3 buckets) is attached to the instance role configured for the CE
1.4 - if you plan to use Docker images from Amazon ECS container, make sure the ``AmazonEC2ContainerServiceforEC2Role`` policy is also attached to the instance role

2 - In the AWS Console, create (at least) one `Job Queue <https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html>`_
and bind it to the Compute environment
@@ -186,9 +187,10 @@ Configuration

When configuring your pipeline:

- import the `nf-amazon` plugin
- specify the AWS Batch :ref:`executor<awsbatch-executor>`
- specify one or more AWS Batch queues for the execution by using the :ref:`process-queue` directive.
1 - import the `nf-amazon` plugin
2 - specify the AWS Batch :ref:`executor<awsbatch-executor>`
3 - specify one or more AWS Batch queues for the execution by using the :ref:`process-queue` directive
4 - specify the AWS job container properties by using the :ref:`process-containerOptions` directive.

An example ``nextflow.config`` file is shown below::

@@ -200,6 +202,7 @@ An example ``nextflow.config`` file is shown below::
executor = 'awsbatch'
queue = 'my-batch-queue'
container = 'quay.io/biocontainers/salmon'
containerOptions = '--shm-size 16000000 --ulimit nofile=1280:2560 --ulimit nproc=16:32'
}

aws {
@@ -212,19 +215,60 @@ An example ``nextflow.config`` file is shown below::

Different queues bound to the same or different Compute environments can be configured according to each process' requirements.

Container Options
=======================
Container options are mapped into AWS job container properties.

Not all the container options are supported by AWS Batch. These are the options accepted ::


-e, --env string
Set environment variables (format: <name> or <name>=<value>)
--init
Run an init inside the container that forwards signals and reaps processes
--memory-swap int
The total amount of swap memory (in MiB) the container can use: '-1' to enable unlimited swap
--memory-swappiness int
Tune container memory swappiness (0 to 100) (default -1)
--privileged
Give extended privileges to the container
--read-only
Mount the container's root filesystem as read only
--shm-size int
Size (in MiB) of /dev/shm
--tmpfs string
Mount a tmpfs directory (format: <path>:<options>,size=<int>), size is in MiB
-u, --user string
Username or UID (format: <name|uid>[:<group|gid>])
--ulimit string
Ulimit options (format: <type>=<soft limit>[:<hard limit>])

Container options must be passed in their long from for "--option value" or short form "-o value", if available.

Few examples ::

containerOptions '--tmpfs /run:rw,noexec,nosuid,size=128 --tmpfs /app:ro,size=64'

containerOptions '-e MYVAR1 --env MYVAR2=foo2 --env MYVAR3=foo3 --memory-swap 3240000 --memory-swappiness 20 --shm-size 16000000'

containerOptions '--ulimit nofile=1280:2560 --ulimit nproc=16:32 --privileged'


Check the `AWS doc <https://docs.aws.amazon.com/batch/latest/APIReference/API_ContainerProperties.html>`_ for further details.

Custom AMI
==========
There are several reasons why you might need to create your own `AMI (Amazon Machine Image) <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html>`_
to use in your Compute environments. Typically:

- you do not want to modify your existing Docker images and prefer to install the CLI tool on the hosting environment
1 - you do not want to modify your existing Docker images and prefer to install the CLI tool on the hosting environment

- the existing AMI (selected from the marketplace) does not have Docker installed
2 - the existing AMI (selected from the marketplace) does not have Docker installed

- you need to attach a larger storage to your EC2 instance (the default ECS instance AMI has only a 30G storage
volume which may not be enough for most data analysis pipelines)
3 - you need to attach a larger storage to your EC2 instance (the default ECS instance AMI has only a 30G storage
volume which may not be enough for most data analysis pipelines)

- you need to install additional software, not available in the Docker image used to execute the job
4 - you need to install additional software, not available in the Docker image used to execute the job

Create your custom AMI
----------------------
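The `--ulimit` and `--tmpfs` values documented above are packed strings, whereas an AWS Batch container definition stores each ulimit as a name plus soft and hard limits, and each tmpfs mount as a container path, a size and a list of mount options. The Java sketch below shows one way to split the documented formats into those fields; it is an illustration, not the code this PR adds. The class names (`ContainerOptionSketch`, `UlimitSpec`, `TmpfsSpec`) are hypothetical, and returning `null` for an omitted hard limit and rejecting a tmpfs value without `size=` are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits Docker-style --ulimit and --tmpfs values into separate fields. */
final class ContainerOptionSketch {

    /** A parsed ulimit; hardLimit is null when the value omits ":<hard limit>" (assumption). */
    static final class UlimitSpec {
        final String name;
        final int softLimit;
        final Integer hardLimit;

        UlimitSpec(String name, int softLimit, Integer hardLimit) {
            this.name = name;
            this.softLimit = softLimit;
            this.hardLimit = hardLimit;
        }
    }

    /** A parsed tmpfs mount; the size is in MiB, as the docs state. */
    static final class TmpfsSpec {
        final String containerPath;
        final int sizeMiB;
        final List<String> mountOptions;

        TmpfsSpec(String containerPath, int sizeMiB, List<String> mountOptions) {
            this.containerPath = containerPath;
            this.sizeMiB = sizeMiB;
            this.mountOptions = mountOptions;
        }
    }

    /** Parses "<type>=<soft limit>[:<hard limit>]", e.g. "nofile=1280:2560". */
    public static UlimitSpec parseUlimit(String value) {
        int eq = value.indexOf('=');
        if (eq <= 0)
            throw new IllegalArgumentException("Invalid ulimit: " + value);
        String name = value.substring(0, eq);
        String[] limits = value.substring(eq + 1).split(":");
        int soft = Integer.parseInt(limits[0]);
        Integer hard = limits.length > 1 ? Integer.valueOf(limits[1]) : null;
        return new UlimitSpec(name, soft, hard);
    }

    /** Parses "<path>:<options>,size=<int>", e.g. "/run:rw,noexec,nosuid,size=128". */
    public static TmpfsSpec parseTmpfs(String value) {
        int colon = value.indexOf(':');
        if (colon <= 0)
            throw new IllegalArgumentException("Invalid tmpfs: " + value);
        String path = value.substring(0, colon);
        Integer size = null;
        List<String> options = new ArrayList<>();
        for (String opt : value.substring(colon + 1).split(",")) {
            if (opt.startsWith("size="))
                size = Integer.valueOf(opt.substring("size=".length()));  // size is not a mount option
            else if (!opt.isEmpty())
                options.add(opt);
        }
        if (size == null)
            throw new IllegalArgumentException("Missing size in tmpfs: " + value);
        return new TmpfsSpec(path, size, options);
    }
}
```

For example, `parseUlimit("nofile=1280:2560")` yields the name `nofile` with soft limit 1280 and hard limit 2560, and `parseTmpfs("/run:rw,noexec,nosuid,size=128")` yields the path `/run`, a size of 128 and the mount options `[rw, noexec, nosuid]`.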
2 changes: 1 addition & 1 deletion docs/process.rst
@@ -1533,7 +1533,7 @@ only for a specific process e.g. mount a custom path::
}


.. warning:: This feature is not supported by :ref:`awsbatch-executor` and :ref:`k8s-executor` executors.
.. warning:: This feature is not supported by :ref:`k8s-executor` and :ref:`azurebatch-executor` executors.

.. _process-cpus:

@@ -17,6 +17,8 @@

package nextflow.processor

import nextflow.util.CmdLineOptionMap

import static nextflow.processor.TaskProcessor.*

import java.nio.file.Path
@@ -412,9 +414,15 @@ class TaskConfig extends LazyMap implements Cloneable {
return opts instanceof CharSequence ? opts.toString() : null
}

Map getContainerOptionsMap() {
CmdLineOptionMap getContainerOptionsMap() {
def opts = get('containerOptions')
return opts instanceof Map ? opts : Collections.emptyMap()
if( opts instanceof Map )
return CmdLineOptionMap.fromMap(opts)
if( opts instanceof CharSequence )
return CmdLineHelper.parseGnuArgs(opts.toString())
if( opts!=null )
throw new IllegalArgumentException("Invalid `containerOptions` directive value: $opts [${opts.getClass().getName()}]")
return CmdLineOptionMap.emptyOption()
}

/**
87 changes: 62 additions & 25 deletions modules/nf-commons/src/main/nextflow/util/CmdLineHelper.groovy
@@ -16,27 +16,36 @@
*/

package nextflow.util

import groovy.transform.CompileStatic

import java.util.regex.Pattern

/**
*
* Implement command line parsing helpers
*
* @author Paolo Di Tommaso <paolo.ditommaso@gmail.com>
*/
@CompileStatic
class CmdLineHelper {

def List<String> args
static private Pattern CLI_OPT = ~/--([a-zA-Z_-]+)(?:\W.*)?$|-([a-zA-Z])(?:\W.*)?$/

private List<String> args

CmdLineHelper( String cmdLineToBeParsed ) {
args = splitter(cmdLineToBeParsed ?: '')
}

def boolean contains(String argument) {
private boolean contains(String argument) {
return args.indexOf(argument) != -1
}

def getArg( String argument ) {
def pos = args.indexOf(argument)
private getArg( String argument ) {
int pos = args.indexOf(argument)
if( pos == -1 ) return null

def result = []
List<String> result = []
for( int i=pos+1; i<args.size(); i++ ) {
if( args[i].startsWith('-') ) {
break
@@ -55,25 +64,6 @@ class CmdLineHelper {
}
}

def asList( String argument, String splitter=',' ) {
def val = getArg(argument)
if( !val ) return val

if( val instanceof Boolean ) {
return []
}

if( val instanceof String ) {
val = [val]
}

for( int i=0; i<val.size(); i++ ) {
val[i] = val[i] ?. split(splitter)
}

return val.flatten()
}


/**
* Given a string the splitter method separate it by blank returning a list of string.
@@ -112,4 +102,51 @@ class CmdLineHelper {
return result.join(' ')
}

/**
* Parse command line and returns the options and their values as a map object.
*
* @param cmdline
* The command line as single string
* @return
* A map object holding the option key-value(s) associations
*/
static CmdLineOptionMap parseGnuArgs(String cmdline) {
final BLANK = ' ' as char
final result = new CmdLineOptionMap()

if( !cmdline )
return result

final tokenizer = new QuoteStringTokenizer(cmdline, BLANK);
String opt = null
String last = null
while( tokenizer.hasNext() ) {
final String token = tokenizer.next()
if( !token || token=='--')
continue
final matcher = CLI_OPT.matcher(token)
if( matcher.matches() ) {
if( opt ) {
result.addOption(opt,'true')
}
opt = matcher.group(1) ?: matcher.group(2)
}
else {
if( !opt ) {
if( !last ) continue
result.addOption(last, token)
}
else {
result.addOption(opt, token)
last = opt
opt = null
}
}
}

if( opt )
result.addOption(opt, 'true')

return result
}
}
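`parseGnuArgs` makes a single pass over the tokens and tracks two pieces of state: `opt`, an option still waiting for its value, and `last`, the option that most recently received one. An option followed by another option, or by the end of the line, is stored as the flag value `true`; a non-option token becomes the value of `opt`, or is appended to `last` when no option is pending; tokens before the first option are ignored. The Java sketch below reproduces that loop with the same `CLI_OPT` pattern so the rules can be tried in isolation. `GnuArgsSketch` is a hypothetical name, and its tokenizer is a simplified stand-in for `QuoteStringTokenizer`: it splits on blanks, keeps quoted text together and drops the quote characters, which is an assumption about the original's behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java re-creation of the parseGnuArgs loop, for experimentation only. */
final class GnuArgsSketch {

    // Same expression as CLI_OPT: a long "--name" option or a single-letter "-x" option.
    private static final Pattern CLI_OPT =
            Pattern.compile("--([a-zA-Z_-]+)(?:\\W.*)?$|-([a-zA-Z])(?:\\W.*)?$");

    /** Simplified tokenizer (assumption): blank-separated, quoted text grouped, quotes dropped. */
    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;            // 0 means "not inside a quoted section"
        boolean inToken = false;
        for (char c : line.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0; else current.append(c);
            } else if (c == '"' || c == '\'') {
                quote = c;
                inToken = true;
            } else if (c == ' ') {
                if (inToken) {
                    tokens.add(current.toString());
                    current.setLength(0);
                    inToken = false;
                }
            } else {
                current.append(c);
                inToken = true;
            }
        }
        if (inToken) tokens.add(current.toString());
        return tokens;
    }

    /** Returns option name -> values, preserving the order in which options first appear. */
    public static Map<String, List<String>> parse(String cmdline) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (cmdline == null || cmdline.isEmpty()) return result;
        String opt = null;   // option still waiting for a value
        String last = null;  // option that most recently received a value
        for (String token : tokenize(cmdline)) {
            if (token.isEmpty() || token.equals("--")) continue;
            Matcher m = CLI_OPT.matcher(token);
            if (m.matches()) {
                if (opt != null) add(result, opt, "true");   // previous option was a flag
                opt = m.group(1) != null ? m.group(1) : m.group(2);
            } else if (opt == null) {
                if (last != null) add(result, last, token);  // extra value for the last option
            } else {
                add(result, opt, token);
                last = opt;
                opt = null;
            }
        }
        if (opt != null) add(result, opt, "true");           // trailing flag
        return result;
    }

    private static void add(Map<String, List<String>> map, String key, String value) {
        map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }
}
```

For instance, `parse("--memory-swap -1 --privileged")` gives `{memory-swap=[-1], privileged=[true]}`: `-1` does not match the pattern because a short option must be a single letter, so it is kept as a value. With the same pattern, a joined form such as `--env=FOO` is recognised as the option `env` and the `=FOO` part is discarded, which is consistent with the docs asking for the `--option value` form.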
82 changes: 82 additions & 0 deletions modules/nf-commons/src/main/nextflow/util/CmdLineOptionMap.groovy
@@ -0,0 +1,82 @@
package nextflow.util


import com.google.common.hash.Hasher
import groovy.transform.CompileStatic
import groovy.transform.EqualsAndHashCode
import groovy.transform.ToString

/**
* Holder for parsed command line options.
*
* @author Manuele Simi <manuele.simi@gmail.com>
*/
@CompileStatic
@ToString(includes = 'options', includeFields = true)
@EqualsAndHashCode(includes = 'options', includeFields = true)
class CmdLineOptionMap implements CacheFunnel {

final private Map<String, List<String>> options = new LinkedHashMap<String, List<String>>()
final private static CmdLineOptionMap EMPTY = new CmdLineOptionMap()

protected CmdLineOptionMap addOption(String key, String value) {
if ( !options.containsKey(key) )
options[key] = new ArrayList<String>(10)
options[key].add(value)
return this
}

boolean hasMultipleValues(String key) {
options.containsKey(key) ? options[key].size() > 1 : false
}

boolean hasOptions() {
options.size()
}

List<String> getValues(String key) {
return options.containsKey(key) ? options[key] : Collections.emptyList() as List<String>
}

def getFirstValue(String key) {
getFirstValueOrDefault(key, null)
}

boolean asBoolean() {
return options.size()>0
}

boolean exists(String key) {
options.containsKey(key)
}

def getFirstValueOrDefault(String key, String alternative) {
options.containsKey(key) && options[key].get(0) ? options[key].get(0) : alternative
}

static CmdLineOptionMap fromMap(final Map map) {
def optionMap = new CmdLineOptionMap()
map.each {
optionMap.addOption(it.key as String, it.value as String)
}
return optionMap
}

static CmdLineOptionMap emptyOption() {
return EMPTY
}

@Override
String toString() {
def serialized = []
options.each {
serialized << "option{${it.key}: ${it.value.each {it}}}"
}
return "[${serialized.join(', ')}]"
}

@Override
Hasher funnel(Hasher hasher, CacheHelper.HashMode mode) {
return CacheHelper.hasher(hasher, options, mode)
}
}
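`CmdLineOptionMap` behaves like an insertion-ordered multimap: each option name keeps every value it was given, which is what lets repeatable options such as `--ulimit` or `--tmpfs` survive parsing. The Java sketch below mirrors its main accessors for illustration only. `OptionMapSketch` is a hypothetical name, not part of Nextflow; treating a null or empty first value as missing mirrors the Groovy-truth check in `getFirstValueOrDefault`, while returning an unmodifiable view from `getValues` is a choice of the sketch (the Groovy class returns the stored list itself). Hashing, equality and `toString` are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java mirror of CmdLineOptionMap: an insertion-ordered option multimap. */
final class OptionMapSketch {

    private final Map<String, List<String>> options = new LinkedHashMap<>();

    /** Appends a value, so repeated options accumulate instead of overwriting. */
    public OptionMapSketch addOption(String key, String value) {
        options.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        return this;
    }

    public boolean exists(String key) {
        return options.containsKey(key);
    }

    public boolean hasOptions() {
        return !options.isEmpty();
    }

    public boolean hasMultipleValues(String key) {
        return options.containsKey(key) && options.get(key).size() > 1;
    }

    /** Read-only view of the values for a key; an empty list when the key is absent. */
    public List<String> getValues(String key) {
        List<String> values = options.get(key);
        return values == null ? Collections.emptyList() : Collections.unmodifiableList(values);
    }

    /** First value, or the alternative when the key is absent or its first value is null/empty. */
    public String getFirstValueOrDefault(String key, String alternative) {
        List<String> values = options.get(key);
        if (values == null || values.isEmpty()) return alternative;
        String first = values.get(0);
        return first == null || first.isEmpty() ? alternative : first;
    }

    /** Like fromMap: one value per entry; null values stay null, others become strings. */
    public static OptionMapSketch fromMap(Map<?, ?> map) {
        OptionMapSketch result = new OptionMapSketch();
        for (Map.Entry<?, ?> e : map.entrySet()) {
            Object v = e.getValue();
            result.addOption(String.valueOf(e.getKey()), v == null ? null : String.valueOf(v));
        }
        return result;
    }
}
```

Keeping a list per key, rather than a single string, is what allows two `--ulimit` options to coexist; callers that expect a single value read it through `getFirstValueOrDefault`.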
@@ -26,7 +26,7 @@ package nextflow.util
* @author Paolo Di Tommaso <paolo.ditommaso@gmail.com>
*
*/
public class QuoteStringTokenizer implements Iterator<String>, Iterable<String> {
class QuoteStringTokenizer implements Iterator<String>, Iterable<String> {

private List<Character> chars = Arrays.<Character>asList(' ' as Character);
