Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

s3 download retry #1164

Merged
merged 79 commits into from Jun 8, 2019
Merged

Conversation

sivkovic
Copy link
Contributor

Hi Paolo,
Since the 26d3755 didn't solve our problem #1107 I implemented a retry mechanism when downloading files from s3. We tested it with 200 TN samples (15536 processes). This way we are able to use 'optimal' for instance types for our compute environment and having a single queue for all processes. Also what we detected as really beneficial in this kind of setup is that wait time for instances is shorter and cost is lower since we are not limiting AWS Batch scheduler to just some instance types. If you think this is beneficial I can also add configurable number of retries or any cleanup you think is needed.
Best,

Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pull request. One thing that's not completely like is that the retry is triggered for any error condition, for example when the file does not exist, but I guess the exit code of awscli does not make such kind of difference.

Anyhow, I would propose to keep as an opt-in feature controlled by the number of attempts that should be > 0 (default: 0)

Copy link
Member

@pditommaso pditommaso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a minor comment. If you sign-off the DCO, I think it could be merged.

@pditommaso
Copy link
Member

The PR look ok, thanks a lot. To proceed with the merge I need a commit with your sign-off.

@sivkovic sivkovic force-pushed the feature/download_retry branch 4 times, most recently from 0750024 to a75683b Compare June 6, 2019 21:11
pditommaso and others added 22 commits June 6, 2019 17:13
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
)

Signed-off-by: Sven Fillinger <sven.fillinger@qbic.uni-tuebingen.de>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
When input files are copied in the task work dir, the
input files should not be mounted in the container environment.

Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Sven Fillinger <sven.fillinger@qbic.uni-tuebingen.de>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
…-io#996

This commit enables the process echo when the ansi logging is enabled
that was turned off in a previous commit.

It also redirected the console output to the Ansi renderer for for the
`print/ln` and `view` operators when the ansi logging is enabled.

Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
The unit used by LSF batch scheduler for memory limits depends
on the setting of the attribute LSF_UNIT_FOR_LIMITS defined in
the lsf.config file.

This commit adds to the LSF executor the ability to fetch such
setting and properly apply when a job is submitted for execution.

Moreover if fixes the default mem unit to KB (instead of MB)
when the above setting is not defined.

Solves nextflow-io#1124. See also PR nextflow-io#1035.

Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
sivkovic and others added 15 commits June 6, 2019 17:13
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
This commit disables the rich ansi logging feature when
using the `kuberun` command since the output is not rendered
correctly. Likely this is due to the wrong encoding of ANSI
escape code when written/read over the http api.

Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
This commit implements the deletion of the K8s configMap
create by the `kuberun` command to propagate the nextflow
configuration to the K8s cluster.

Deletion take place only when the pod terminated sucessfully.
This policy can be overriden by the `k8s.cleanup=false|true`
option.

Bonus: the kuberun command now returns the exit status of the
driver nextflow pod.

Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
@sivkovic sivkovic force-pushed the feature/download_retry branch 2 times, most recently from 0750024 to 414ea43 Compare June 6, 2019 21:14
@pditommaso
Copy link
Member

pditommaso commented Jun 7, 2019

Thanks. One last thing, could you please add an entry in the documentation regarding the new setting options. See config.rst line 512.

Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
@pditommaso pditommaso changed the base branch from master to testing June 8, 2019 16:59
@pditommaso pditommaso merged commit 1abac2f into nextflow-io:testing Jun 8, 2019
@pditommaso
Copy link
Member

OK, merged on testing branch. Thanks a lot for your contribution!

pditommaso added a commit that referenced this pull request Jun 8, 2019
Use duration unit to define S3 delayBetweenAttemps config
options.
pditommaso pushed a commit that referenced this pull request Jun 9, 2019
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
pditommaso added a commit that referenced this pull request Jun 9, 2019
Use duration unit to define S3 delayBetweenAttemps config
options.
pditommaso pushed a commit that referenced this pull request Jul 27, 2019
Signed-off-by: Ivkovic <sinisa.ivkovic@gmail.com>
pditommaso added a commit that referenced this pull request Jul 27, 2019
Use duration unit to define S3 delayBetweenAttemps config
options.
@pditommaso pditommaso added this to the v19.07.0 milestone Jul 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants