Add support for running inside Docker containers #56

Merged (21 commits) on Aug 31, 2019

@marmistrz commented Jul 10, 2019

This change takes advantage of the Docker capabilities present in golem-unlimited. Closes #12. Closes #2. Closes #29.

Changes:

  • A new, slimmed-down Docker image has been introduced and pushed to Docker Hub.
  • The directory structure has been sanitized (/app/, /output/, /input/), and the output directory now defaults to /output.
  • Administrators no longer need to install the required tools on the provider machines or set up inter-node connectivity; both are now automated through the use of Docker.
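
The directory layout above could be mapped onto `docker run` arguments roughly as follows. This is a minimal sketch, not the code from this PR: the helper names `resolve_output_dir` and `build_mounts`, the use of bind mounts, and the host paths are all assumptions made for illustration. Only the container-side directories (/app, /input, /output) and the /output default come from the description.

```python
from pathlib import PurePosixPath

# Container-side directories named in the PR description.
APP_DIR = PurePosixPath("/app")
INPUT_DIR = PurePosixPath("/input")
DEFAULT_OUTPUT_DIR = PurePosixPath("/output")


def resolve_output_dir(requested=None):
    """Return the container output directory, falling back to /output."""
    if requested is None:
        return DEFAULT_OUTPUT_DIR
    path = PurePosixPath(requested)
    if not path.is_absolute():
        raise ValueError(f"output directory must be absolute: {requested}")
    return path


def build_mounts(host_app, host_input, host_output, output_dir=None):
    """Build `docker run` -v arguments (hypothetical helper, assumes bind mounts)."""
    out = resolve_output_dir(output_dir)
    return [
        "-v", f"{host_app}:{APP_DIR}:ro",      # application files, read-only
        "-v", f"{host_input}:{INPUT_DIR}:ro",  # input data, read-only
        "-v", f"{host_output}:{out}",          # results, writable
    ]
```

Keeping the application and input mounts read-only is a design choice of this sketch; it makes /output the only place a job can write to.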

TODO:

  • properly set the user (currently we run as root) [needs-gu-changes]
  • set the working directory to output [needs-gu-changes]
  • set the user to mpirun [needs-gu-changes]
  • fix the output directory in make/cmake (they're different, I've commented out the mv_cmd)
  • automatically start/stop the container [needs-gu-changes] (now it's a mess)
  • do proper separation of input, output & app data. [partially done]
  • simplify the tmp-related code
  • check what happens if an executable doesn't exist inside the container?
  • add extra logging to gu-client
  • working deploy+exec
  • deploy the ssh keys
  • test on single node
  • test on multiple nodes, with different OSes
  • update input&output
  • check if we're still affected by the mca btl hack (nope)
  • upload input to all the nodes
  • add a default for output
  • enable SYS_PTRACE capability

TODO-GU:

GU blockers:

Deferred issues: (follow-up issues will be created)

  • For efficient shared memory in OpenMPI we may need a bigger /dev/shm; Docker limits it to 64 MB by default. Using --tmpfs /dev/shm should probably be enough to lift that limit.
  • Handle GU processing errors in one place.
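
The /dev/shm item can be illustrated with a small sketch that assembles the relevant `docker run` flags. `--tmpfs` and `--shm-size` are existing Docker CLI options; the function name `shm_args` and the idea of choosing between the two are assumptions, and how large an unsized tmpfs can actually grow depends on the host.

```python
def shm_args(size=None):
    """Return `docker run` flags that enlarge /dev/shm (hypothetical helper).

    size=None  -> mount a tmpfs over /dev/shm without an explicit size
    size="1g"  -> keep Docker's own /dev/shm but raise its limit
    """
    if size is None:
        return ["--tmpfs", "/dev/shm"]
    return ["--shm-size", size]


def docker_run_command(image, size=None):
    """Assemble a full argv; `image` is whatever image name the caller uses."""
    return ["docker", "run", "--rm", *shm_args(size), image]
```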

@marmistrz changed the title from "Add support for running inside Docker containers" to "WIP: Add support for running inside Docker containers" on Jul 10, 2019
@marmistrz

The PR is basically ready; we're only waiting for the GU blockers.

@marmistrz

We'll probably need --cap-add=SYS_PTRACE for MPI to work efficiently under Docker. See open-mpi/ompi#4948.
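
One way to confirm from inside a container that the capability was actually granted is to inspect the effective capability mask in /proc. The sketch below is an illustration only and not part of this PR; the function name `has_sys_ptrace` is hypothetical, while the capability number 19 for CAP_SYS_PTRACE comes from the Linux capability header.

```python
CAP_SYS_PTRACE = 19  # capability number from <linux/capability.h>


def has_sys_ptrace(status_text):
    """Return True if the CapEff mask in /proc/<pid>/status includes CAP_SYS_PTRACE."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The mask is printed as a hexadecimal bit field.
            mask = int(line.split(":", 1)[1].strip(), 16)
            return bool(mask & (1 << CAP_SYS_PTRACE))
    raise ValueError("no CapEff line found")
```

On Linux, calling it as `has_sys_ptrace(open("/proc/self/status").read())` inside the container would report on the current process.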

@marmistrz changed the title from "WIP: Add support for running inside Docker containers" to "Add support for running inside Docker containers" on Aug 30, 2019
One of the indirect dependencies, flate2, no longer builds on Rust 1.33
@marmistrz

Because this PR has been pending for a very long time, I'll make an exception and merge it without a review. @kubkon, please take a look when you have a moment; I'll address any issues in a subsequent PR.

@marmistrz merged commit 3a0cc17 into master on Aug 31, 2019
Issues closed by merging this pull request:

  • Clean up artifacts in /tmp
  • Docker integration
  • Automatically detect common subnet