This repository was archived by the owner on Jan 11, 2023. It is now read-only.
Home
png edited this page May 11, 2020 · 3 revisions
The goals of this plugin are as follows:
- Allow Singularity to start with DMTCP included
- Allow programs running within Singularity to be checkpointed
- Allow programs that have been checkpointed within Singularity to be resumed within another instance of that same container
- Allow checkpointing to occur without modifying the base image.
Several problems had to be solved to support this. Primarily, DMTCP runs as a service, so under most circumstances Singularity must be run as an instance rather than by simply calling `exec` or `run`. The plugin, then, must perform a few actions:
- Start a Singularity container as an instance, with the DMTCP executables mounted read-only somewhere within the container
  - The current implementation binds to `/.dmtcp/`
  - TODO: Maybe clone the executable on start that we destroy on stop?
    - Safety first!
- Add a command that mimics the regular `exec`/`run` commands but actually runs the targeted executable under DMTCP
  - This is done with the `checkpoint` command that has been added.
- Add a command that can restart a checkpointed job with DMTCP
  - This requires starting the instance, then starting a coordinator and attaching the old job.
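
The three actions above can be sketched as the underlying commands a user would otherwise run by hand. This is a rough illustration, not the plugin's implementation: the helper names, the `bin/` subdirectory under `/.dmtcp`, the host DMTCP directory, and the checkpoint file names are all assumptions made for the example. It only builds argument lists and does not run anything.

```python
from typing import List, Sequence

# Bind target named in the text above; the bin/ layout beneath it is assumed.
DMTCP_MOUNT = "/.dmtcp"


def instance_start_cmd(image: str, name: str, host_dmtcp_dir: str) -> List[str]:
    """Start the container as an instance with DMTCP bind-mounted read-only."""
    bind = f"{host_dmtcp_dir}:{DMTCP_MOUNT}:ro"
    return ["singularity", "instance", "start", "--bind", bind, image, name]


def in_instance(name: str, argv: Sequence[str]) -> List[str]:
    """Wrap a command so it executes inside the running instance."""
    return ["singularity", "exec", f"instance://{name}", *argv]


def checkpointed_run_cmd(name: str, program: Sequence[str]) -> List[str]:
    """Run the target program under DMTCP instead of directly."""
    return in_instance(name, [f"{DMTCP_MOUNT}/bin/dmtcp_launch", *program])


def restart_cmds(name: str, checkpoint_files: Sequence[str]) -> List[List[str]]:
    """Start a coordinator in the instance, then attach the old job to it."""
    coordinator = in_instance(
        name, [f"{DMTCP_MOUNT}/bin/dmtcp_coordinator", "--daemon"]
    )
    restart = in_instance(
        name, [f"{DMTCP_MOUNT}/bin/dmtcp_restart", *checkpoint_files]
    )
    return [coordinator, restart]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)`. For a restart, `instance_start_cmd` would be run first against the same image, followed by the two commands from `restart_cmds` in order.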
Instructions for use may be found in the Instructions. If you wish to learn how it works so you, too, can add features to Singularity, check out HowItWorks.