s3 download retry #1164

Merged
merged 79 commits into from Jun 8, 2019
5ef6c51
Remove reference to old send mail
pditommaso Mar 31, 2019
aac90f2
Fix missing return type for Date.format method
pditommaso Mar 31, 2019
7eedacd
Update process.rst
evanfloden Apr 2, 2019
0f812e6
Add files via upload
evanfloden Apr 2, 2019
8778902
Update conf.py
evanfloden Apr 2, 2019
541d353
Improve error reporting for null publishDir
pditommaso Apr 3, 2019
62fe6e7
Adds parameter and workflow metadata to weblog payload (#1077)
sven1103 Apr 5, 2019
d93a9ac
Fix splitCsv fails to parse values containing commas #1102
pditommaso Apr 6, 2019
8fc6d33
Fixed Do not mount inputs when stageInMode == copy #1105
pditommaso Apr 6, 2019
d8f3ad8
Add warning on publish with a null var
pditommaso Apr 10, 2019
a75a975
Adds note for operators that consume channels (#1113)
sven1103 Apr 16, 2019
b329105
Fix Echo directive when Ansi log is enabled #1112 #996
pditommaso Apr 16, 2019
06e7444
Add warning message when using untested Java version (2)
pditommaso Apr 16, 2019
313ce60
Update changelog
pditommaso Apr 17, 2019
778368c
[release 19.04.0] Update timestamp and build number
pditommaso Apr 17, 2019
964a267
Remove unused build script
pditommaso Apr 18, 2019
69d0bb7
Fix Unable to list S3 bucket content #1121
pditommaso Apr 22, 2019
f8d0847
Fix LSF executor should use mem setting in lsf.config #1124
pditommaso Apr 22, 2019
0817b2b
Experimental support for gpu resources #997
pditommaso Apr 24, 2019
8a29607
Add gpu directive experimental warning #997
pditommaso Apr 24, 2019
87b6887
Update changelog
pditommaso Apr 24, 2019
0fb5095
[release 19.04.0-edge] Update timestamp and build number
pditommaso Apr 24, 2019
f18da86
Bump 19.05.0-SNAPSHOT version number
pditommaso Apr 25, 2019
91c395e
Fix Launcher should return non-zero exit when fail to setp env #1126
pditommaso Apr 25, 2019
ef5c8a5
Syntax enhancement aka DLS-2 #984
pditommaso Apr 27, 2019
48dfa60
Improve CI tests scripts
pditommaso Apr 28, 2019
af4f9f0
Add support for AWS user volumes and jobRoleArn
pditommaso Apr 29, 2019
be2165e
Provide scm credentials when fetching information about remote branch…
olifly Apr 30, 2019
f60d129
Fix Log messages don't show in the console #1129
pditommaso Apr 30, 2019
221f473
Fix Quiet cli option is not honoured
pditommaso Apr 30, 2019
4bb8764
Add to gitignore build subdirectories
pditommaso May 2, 2019
fbb93d9
Fix nextflow build timestamp json rendering
pditommaso May 2, 2019
015c33e
Update aws sdk to version 1.11.542
pditommaso May 3, 2019
565ec31
Update get started docs
pditommaso May 3, 2019
1e709a9
Sync build scripts
pditommaso May 3, 2019
7d37c96
Fix unit test execution
pditommaso May 3, 2019
63be5f4
Render last tag along with the process name #1144
pditommaso May 11, 2019
83a1d2d
Add NXF_ANSI_SUMMARY var to disable log summary
pditommaso May 11, 2019
2e81d5d
Fix method isDynamic visibilty
pditommaso May 11, 2019
513419d
Fix Env variable with blanks is not resolved correctly in containers …
pditommaso May 11, 2019
db7415b
Add Aws Batch maxParallelTransfers config #1107
pditommaso May 11, 2019
a4b6e6c
Code cleanup
pditommaso May 11, 2019
2239e0d
Print summary only the run takes > 3mins
pditommaso May 11, 2019
11d5506
Improve docs for config profiles
pditommaso May 11, 2019
5a62383
Update readme IDE version
pditommaso May 15, 2019
e251458
Update created label in timestamp
pditommaso May 15, 2019
6aeb084
Add Support for LSF per task resource reserve mode #1071
pditommaso May 15, 2019
3f3af75
Fix WebLogObserver data leak on completion #1010
pditommaso May 15, 2019
aaa2a6b
Refactor AWS cli path option
pditommaso May 16, 2019
4ac5b6e
Refactored params package structure
pditommaso Apr 22, 2019
40d1cdc
Fix Groovy deps overriding via CI_GROOVY_VERSION var
pditommaso May 19, 2019
101f80c
Fix urls and typos in readme and contributing files
pditommaso May 19, 2019
e38897b
Improvd Dsl-2 error reporting
pditommaso May 20, 2019
c3454db
Fix typo
pditommaso May 20, 2019
c30b1a5
Update changelog
pditommaso May 20, 2019
6107fbd
[release 19.05.0-egde] Update timestamp and build numbers [ci skip]
pditommaso May 20, 2019
a0a923f
Print ansi summary for exc > 60 secs
pditommaso May 20, 2019
9b30c62
Fix failing test with groovy 2.5.8
pditommaso May 23, 2019
9c10ce7
Fix comparable test with groovy 2.5.8
pditommaso May 23, 2019
395422e
Improve errorStrategy docs [ci skip]
pditommaso May 24, 2019
88f602f
retry download from s3
sivkovic May 30, 2019
52524c3
fix tests
sivkovic May 30, 2019
12134da
configurable transferAttempts, remove echo and rename retry function
sivkovic May 31, 2019
7231fe6
fix tests
sivkovic May 31, 2019
dcced15
Kuberun should honour -bg (background) option #1159
pditommaso Jun 1, 2019
bcf7edf
Fix Disable ansi logging when using kuberun command #1161
pditommaso Jun 1, 2019
1c93fdf
Fix typo in the docs
pditommaso Jun 1, 2019
8e393c6
Fix indentation typo
pditommaso Jun 1, 2019
1170c69
Fix kuberun doesn't delete config maps #1165
pditommaso Jun 1, 2019
5c4b6de
Bump 19.06.0-SNAPSHOT version number
pditommaso Jun 1, 2019
ddce7be
rename sleepBetweenAttempts to delayBetweenAttempts
sivkovic Jun 3, 2019
8e9cd1c
support blank spaces in file names
sivkovic Jun 3, 2019
07542cd
fix tests for S3Helper
sivkovic Jun 3, 2019
0075c6f
Fix Invalid response cause Google pipelines exec to crash #1163
pditommaso Jun 3, 2019
ff6f95c
fix AwsBatch tests
sivkovic Jun 4, 2019
a8031e4
Fix Invalid response cause Google pipelines exec to crash (2) #1163
pditommaso Jun 4, 2019
414ea43
code refactor and test for download retry
sivkovic Jun 5, 2019
352a89a
Merge branch 'master' into feature/download_retry
sivkovic Jun 6, 2019
4b474eb
update docs for aws cli retry parameters
sivkovic Jun 7, 2019
2 changes: 2 additions & 0 deletions docs/config.rst
@@ -511,6 +511,8 @@ cliPath The path where the AWS command line tool is installe
jobRole The AWS Job Role ARN that needs to be used to execute the Batch Job.
maxParallelTransfers Max parallel upload/download transfer operations *per job* (default: ``16``).
volumes One or more container mounts. Mounts can be specified in simple format, e.g. ``/some/path``, or in canonical format, e.g. ``/host/path:/mount/path[:ro|rw]``. Multiple mounts can be specified by separating them with a comma or by using a list object.
maxTransferAttempts The maximum number of download attempts from S3 (default: ``1``).
delayBetweenAttempts Delay in seconds between download attempts from S3 (default: ``10``).
=========================== ================

.. _config-cloud:
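
For anyone trying the two new options, a minimal sketch of a config that enables retries might look like the following. The option names come from the table rows above; the values `3` and `10` are only illustrative, and the temporary directory is an assumption made to keep the snippet self-contained. The delay ends up as the argument to `sleep` in the generated script, so it is read as seconds.

```shell
# Write a minimal nextflow.config that enables S3 download retries.
# Values are illustrative; the retry wrapper is only used when
# maxTransferAttempts is greater than 1.
config_dir=$(mktemp -d)
cat > "$config_dir/nextflow.config" <<'EOF'
aws.batch.maxTransferAttempts = 3
aws.batch.delayBetweenAttempts = 10
EOF
```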
@@ -93,7 +93,10 @@ class AwsBatchFileCopyStrategy extends SimpleFileCopyStrategy {
@Override
String stageInputFile( Path path, String targetName ) {
// third param should not be escaped, because it's used in the grep match rule
- "downloads+=(\"nxf_s3_download s3:/${Escape.path(path)} ${Escape.path(targetName)}\")"
+ def stage_cmd = opts.maxTransferAttempts > 1
+ ? "downloads+=(\"nxf_s3_retry nxf_s3_download s3:/${Escape.path(path)} ${Escape.path(targetName)}\")"
+ : "downloads+=(\"nxf_s3_download s3:/${Escape.path(path)} ${Escape.path(targetName)}\")"
+ return stage_cmd
}
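
To make the two entry shapes concrete, here is a rough sketch of what ends up in the `downloads` array and how an entry resolves to a call. The function bodies are stubs, the bucket and file names are hypothetical, and running the entries with `eval` in a plain loop is an assumption: the part of `nxf_parallel` that actually launches the commands is not shown in this diff.

```shell
# Stubs standing in for the generated helpers (hypothetical bodies).
nxf_s3_download() { echo "download $1 -> $2"; }
nxf_s3_retry() { "$@"; }   # single attempt only, for illustration

downloads=()
# Entry shape when maxTransferAttempts > 1: the download is wrapped by the retry helper.
downloads+=("nxf_s3_retry nxf_s3_download s3://my-bucket/data/sample.fq sample.fq")
# Entry shape otherwise: the download is called directly.
downloads+=("nxf_s3_download s3://my-bucket/data/other.fq other.fq")

# Run each entry in turn (assumed; the real script hands the array to nxf_parallel).
for cmd in "${downloads[@]}"; do
    eval "$cmd"
done
```

Because `nxf_s3_retry` just executes its arguments, both shapes end in the same `nxf_s3_download` call; only the retry behaviour around it differs.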

/**
@@ -34,6 +34,10 @@ class AwsOptions {

static final public int MAX_TRANSFER = 16

static final public int MAX_TRANSFER_ATTEMPTS = 1

static final public int DEFAULT_DELAY_BETWEEN_ATTEMPTS = 10

String cliPath

String storageClass
@@ -46,6 +50,10 @@ class AwsOptions {

int maxParallelTransfers = MAX_TRANSFER

int maxTransferAttempts = MAX_TRANSFER_ATTEMPTS

int delayBetweenAttempts = DEFAULT_DELAY_BETWEEN_ATTEMPTS

/**
* The job role ARN that should be used
*/
@@ -71,6 +79,8 @@ class AwsOptions {
storageClass = session.config.navigate('aws.client.uploadStorageClass') as String
storageEncryption = session.config.navigate('aws.client.storageEncryption') as String
maxParallelTransfers = session.config.navigate('aws.batch.maxParallelTransfers', MAX_TRANSFER) as int
maxTransferAttempts = session.config.navigate('aws.batch.maxTransferAttempts', MAX_TRANSFER_ATTEMPTS) as int
delayBetweenAttempts = session.config.navigate('aws.batch.delayBetweenAttempts', DEFAULT_DELAY_BETWEEN_ATTEMPTS) as int
region = session.config.navigate('aws.region') as String
volumes = makeVols(session.config.navigate('aws.batch.volumes'))
jobRole = session.config.navigate('aws.batch.jobRole')
@@ -133,4 +143,4 @@ class AwsOptions {
throw new IllegalArgumentException("Not a valid `aws.batch.volumes` value: $obj [${obj.getClass().getName()}]")
}

}
}
@@ -26,6 +26,8 @@ class S3Helper {
def storage = opts.storageClass ?: 'STANDARD'
def encryption = opts.storageEncryption ? "--sse $opts.storageEncryption " : ''
def maxConnect = opts.maxParallelTransfers ?: AwsOptions.MAX_TRANSFER
def attempts = opts.maxTransferAttempts ?: AwsOptions.MAX_TRANSFER_ATTEMPTS
def delayBetweenAttempts = opts.delayBetweenAttempts ?: AwsOptions.DEFAULT_DELAY_BETWEEN_ATTEMPTS

"""
# aws helper
@@ -43,6 +45,29 @@ class S3Helper {
unset IFS
}

nxf_s3_retry() {
local max_attempts=$attempts
local timeout=$delayBetweenAttempts
local attempt=0
local exitCode=0
while (( \$attempt < \$max_attempts ))
do
if "\$@"
then
return 0
else
exitCode=\$?
fi
if [[ \$exitCode == 0 ]]
then
break
fi
sleep \$timeout
attempt=\$(( attempt + 1 ))
timeout=\$(( timeout * 2 ))
done
}

nxf_s3_download() {
local source=\$1
local target=\$2
@@ -56,6 +81,7 @@ class S3Helper {
}

nxf_parallel() {
IFS=\$'\\n'
local cmd=("\$@")
local cpus=\$(nproc 2>/dev/null || < /proc/cpuinfo grep '^process' -c)
local max=\$(if (( cpus>$maxConnect )); then echo $maxConnect; else echo \$cpus; fi)
@@ -80,6 +106,7 @@ class S3Helper {
done
((\${#pid[@]}>0)) && wait \${pid[@]}
)
unset IFS
}
""".stripIndent()
}
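
One observation on `nxf_s3_retry` as it appears in this diff: the `exitCode == 0` check after the `if "$@"` branch looks unreachable, the loop sleeps once more after the final failed attempt, and when attempts run out the function appears to fall through with the status of the last loop command rather than the saved `exitCode`. Below is a small sketch of a wrapper that propagates the failure instead. It is not the PR's code; the name `retry_with_backoff` and its argument order are made up for illustration.

```shell
# Sketch of a retry wrapper with exponential backoff (hypothetical, not the PR's code).
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    local exit_code=0
    while true; do
        # Run the wrapped command; stop immediately on success.
        "$@" && return 0
        exit_code=$?
        # Out of attempts: report the last failure without sleeping again.
        if (( attempt >= max_attempts )); then
            return "$exit_code"
        fi
        sleep "$delay"
        attempt=$(( attempt + 1 ))
        delay=$(( delay * 2 ))
    done
}

# Demo: a hypothetical command that fails twice, then succeeds.
calls=0
flaky() { calls=$(( calls + 1 )); (( calls >= 3 )); }
retry_with_backoff 5 0 flaky && echo "succeeded after $calls calls"
```

The demo uses a delay of `0` so it finishes instantly; with a real delay the waits would double after each failure, as in the template above.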
@@ -123,6 +123,29 @@ class AwsBatchFileCopyStrategyTest extends Specification {
done
unset IFS
}

nxf_s3_retry() {
local max_attempts=1
local timeout=10
local attempt=0
local exitCode=0
while (( \$attempt < \$max_attempts ))
do
if "\$@"
then
return 0
else
exitCode=\$?
fi
if [[ \$exitCode == 0 ]]
then
break
fi
sleep \$timeout
attempt=\$(( attempt + 1 ))
timeout=\$(( timeout * 2 ))
done
}

nxf_s3_download() {
local source=$1
@@ -137,6 +160,7 @@ class AwsBatchFileCopyStrategyTest extends Specification {
}

nxf_parallel() {
IFS=$'\\n\'
local cmd=("$@")
local cpus=$(nproc 2>/dev/null || < /proc/cpuinfo grep '^process' -c)
local max=$(if (( cpus>16 )); then echo 16; else echo $cpus; fi)
@@ -161,6 +185,7 @@ class AwsBatchFileCopyStrategyTest extends Specification {
done
((${#pid[@]}>0)) && wait ${pid[@]}
)
unset IFS
}
'''.stripIndent()

@@ -186,6 +211,29 @@ class AwsBatchFileCopyStrategyTest extends Specification {
done
unset IFS
}

nxf_s3_retry() {
local max_attempts=1
local timeout=10
local attempt=0
local exitCode=0
while (( \$attempt < \$max_attempts ))
do
if "\$@"
then
return 0
else
exitCode=\$?
fi
if [[ \$exitCode == 0 ]]
then
break
fi
sleep \$timeout
attempt=\$(( attempt + 1 ))
timeout=\$(( timeout * 2 ))
done
}

nxf_s3_download() {
local source=$1
@@ -200,6 +248,7 @@ class AwsBatchFileCopyStrategyTest extends Specification {
}

nxf_parallel() {
IFS=$'\\n\'
local cmd=("$@")
local cpus=$(nproc 2>/dev/null || < /proc/cpuinfo grep '^process' -c)
local max=$(if (( cpus>16 )); then echo 16; else echo $cpus; fi)
@@ -224,6 +273,7 @@ class AwsBatchFileCopyStrategyTest extends Specification {
done
((${#pid[@]}>0)) && wait ${pid[@]}
)
unset IFS
}
'''.stripIndent()
}
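
The expected scripts above are rendered with the defaults (`max_attempts=1`, `timeout=10`). For other settings it can help to estimate the worst-case time spent sleeping. The sketch below assumes the loop shape of `nxf_s3_retry` in this diff, which sleeps after every failed attempt and doubles the delay each time; the helper name `backoff_total` is hypothetical.

```shell
# Total seconds slept when every attempt fails, for a loop that sleeps after
# each failure and doubles the delay (hypothetical helper, not part of the PR).
backoff_total() {
    local attempts=$1 delay=$2 total=0 i
    for (( i = 0; i < attempts; i++ )); do
        total=$(( total + delay ))
        delay=$(( delay * 2 ))
    done
    echo "$total"
}

backoff_total 2 10   # prints 30 (10 + 20)
backoff_total 3 10   # prints 70 (10 + 20 + 40)
```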