
Support docker container option --shm-size, per process #2282

Closed
maxbates opened this issue Aug 27, 2021 · 4 comments


@maxbates

maxbates commented Aug 27, 2021

New feature

Some programs that use large file-based databases, like jackhmmer, see a significant performance increase when the database can be loaded into memory via shared memory (e.g. /dev/shm) or a tmpfs volume.

I would like to specify the size of a ramdisk, dependent on the memory allocated to the container, e.g. if I allocate 64 GB to the container, I would pass --shm-size=64g.

The size needs to be dynamic, so it matches the memory available to the container. Mounting /dev/shm is problematic in this case, because I do not want processes to compete for the host machine's shared memory (i.e. when multiple containers are scheduled onto the same instance).
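In Nextflow terms, the ask could look like the sketch below. Note that `shmSize` is the hypothetical directive proposed by this issue, not an existing feature, and the process name, sizes, and file names are illustrative:

```groovy
// Hypothetical: shmSize is the directive proposed in this issue,
// not an existing Nextflow feature. Sizes and paths are illustrative.
process jackhmmerSearch {
    memory '64 GB'
    shmSize '64 GB'   // size /dev/shm to match the container's memory

    script:
    """
    cp uniref90.fasta /dev/shm/
    jackhmmer query.fasta /dev/shm/uniref90.fasta
    """
}
```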

On AWS, because Nextflow creates the job definitions itself and does not support the containerOptions process directive, I do not think it is possible to provision a dynamically sized shared-memory volume without manually creating a job definition (which I would like to avoid).

By default, Docker allocates 64 MB to /dev/shm, but this can be configured with --shm-size ([ref](https://docs.docker.com/engine/reference/run/)). The size cannot be changed from within the container, as that requires remounting the volume.

AWS supports specifying sharedMemorySize in the job definition, which simply passes through to Docker's --shm-size.
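For reference, a minimal Batch job definition carrying that setting might look like the fragment below (the job-definition name and image are illustrative). Note that sharedMemorySize sits under linuxParameters and is expressed in MiB, so 64 GB is 65536:

```json
{
  "jobDefinitionName": "jackhmmer-shm",
  "type": "container",
  "containerProperties": {
    "image": "myorg/jackhmmer:latest",
    "linuxParameters": {
      "sharedMemorySize": 65536
    }
  }
}
```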

Usage scenario

The use of a ramdisk in alphafold's colab notebook for running jackhmmer can be seen here (creating /tmp/ramdisk). There is a similar recommendation on GitHub for speeding up hhblits.
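The approach in those references amounts to something like the following (requires root; the mount point, size, and database file are illustrative):

```shell
# Create a tmpfs-backed ramdisk and stage the database on it
# (illustrative; requires root privileges on the host or container):
sudo mkdir -p /tmp/ramdisk
sudo mount -t tmpfs -o size=9G tmpfs /tmp/ramdisk
cp uniref90.fasta /tmp/ramdisk/
```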

Suggested implementation

Add a process directive shmSize, update the AWS Batch plugin's newSubmitRequest(TaskRun task) (ref), and update the local container executor.

@pditommaso
Member

I think what could be done here is parsing the containerOptions and mapping selected options to the corresponding Batch API fields, such as --shm-size, --ulimit, etc.
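As a rough sketch of that mapping (a hypothetical helper, not actual Nextflow code): Docker-style size strings such as `64g` would need to be normalized to the MiB integer that Batch's sharedMemorySize field expects, e.g.:

```shell
# Hypothetical helper: convert a docker-style --shm-size value
# (e.g. "64g", "512m") into the MiB integer that AWS Batch's
# sharedMemorySize field expects. Only lowercase suffixes handled here.
shm_size_to_mib() {
  case "$1" in
    *g) echo $(( ${1%g} * 1024 )) ;;     # GiB -> MiB
    *m) echo "${1%m}" ;;                 # already MiB
    *k) echo $(( ${1%k} / 1024 )) ;;     # KiB -> MiB
    *)  echo $(( $1 / 1048576 )) ;;      # assume bytes when no suffix
  esac
}

shm_size_to_mib 64g   # prints 65536
```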

@maxbates
Author

maxbates commented Aug 30, 2021

That would absolutely work for me!

Nice to have: make the directive dynamic (i.e. able to use $task.memory), so it can scale up with each attempt?
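If containerOptions were honoured on Batch and accepted a dynamic closure, the scale-up scenario could look like this sketch (values are illustrative, and whether containerOptions evaluates a closure here is an assumption):

```groovy
// Sketch: grow memory with each retry and derive --shm-size from it.
// Assumption: containerOptions accepts a dynamic closure and is
// applied by the executor in use.
process jackhmmerSearch {
    errorStrategy 'retry'
    maxRetries 2
    memory { 32.GB * task.attempt }
    containerOptions { "--shm-size=${task.memory.toGiga()}g" }

    script:
    """
    jackhmmer query.fasta /dev/shm/uniref90.fasta
    """
}
```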

@pditommaso
Member

Would this also require the use of the --tmpfs Docker option?
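For comparison: --tmpfs mounts a fresh tmpfs at an arbitrary path, whereas --shm-size only resizes /dev/shm. An illustrative invocation (requires a running Docker daemon; size and path are arbitrary):

```shell
docker run --rm --tmpfs /tmp/ramdisk:rw,size=64g ubuntu df -h /tmp/ramdisk
```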

@pditommaso
Member

Solved by #2471

@pditommaso pditommaso added this to the 22.04.0 milestone Dec 22, 2021