This repository was archived by the owner on Jan 11, 2023. It is now read-only.
png edited this page May 11, 2020 · 3 revisions

Welcome to the dmtcp-singularity-plugin wiki!

The goals of this plugin are as follows:

  1. Allow Singularity to start with DMTCP included
  2. Allow programs running within Singularity to be checkpointed
  3. Allow programs that have been checkpointed within Singularity to be resumed within another instance of that same container
  4. Allow checkpointing to occur without modifying the base image

Several problems had to be solved to support this. Primarily, DMTCP runs as a service, so under most circumstances Singularity must be run as an instance rather than simply invoked with exec or run. The plugin, then, must perform a few actions:

  • Start a Singularity container as an instance with DMTCP executables mounted as read-only within the container somewhere
    • The current implementation binds to /.dmtcp/
    • TODO: Maybe clone the executables on start and destroy the copy on stop?
      • Safety first!
  • Add a command that, when used, actually runs the targeted executable under DMTCP, despite mimicking the regular exec/run commands.
    • This is done with the checkpoint command that has been added.
  • Add a command that can restart a checkpointed program with DMTCP
    • This requires starting the instance, then starting a coordinator and attaching the old job.
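
As a rough illustration of the actions listed above, the following sketch builds the stock `singularity` and DMTCP command lines that those steps roughly correspond to. It is not the plugin's implementation and does not show the syntax of the plugin's own checkpoint command. The host directory, image name, instance name, and checkpoint file name are hypothetical, and it assumes the DMTCP executables sit directly under the /.dmtcp/ bind point.

```python
import shlex

# Bind target inside the container, as named in the list above.
DMTCP_MOUNT = "/.dmtcp"


def instance_start_cmd(host_dmtcp_dir, image, instance):
    """Start a Singularity instance with the host's DMTCP executables bound read-only."""
    bind = f"{host_dmtcp_dir}:{DMTCP_MOUNT}:ro"
    return ["singularity", "instance", "start", "--bind", bind, image, instance]


def _exec_in_instance(instance, argv):
    """Run a command inside an already running instance."""
    return ["singularity", "exec", f"instance://{instance}", *argv]


def coordinator_cmd(instance):
    """Start a DMTCP coordinator in the background inside the instance."""
    return _exec_in_instance(instance, [f"{DMTCP_MOUNT}/dmtcp_coordinator", "--daemon"])


def launch_cmd(instance, program_argv):
    """Run the target program under DMTCP so that it can be checkpointed."""
    return _exec_in_instance(instance, [f"{DMTCP_MOUNT}/dmtcp_launch", *program_argv])


def restart_cmd(instance, checkpoint_image):
    """Resume a previously checkpointed program from its checkpoint image."""
    return _exec_in_instance(instance, [f"{DMTCP_MOUNT}/dmtcp_restart", checkpoint_image])


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    host_dir, image, name = "/opt/dmtcp/bin", "app.sif", "job1"

    # Fresh run: start the instance, then launch the program under DMTCP.
    # Restart: start the instance, start a coordinator, then attach the old job.
    steps = [
        instance_start_cmd(host_dir, image, name),
        launch_cmd(name, ["./simulate", "--steps", "100"]),
        coordinator_cmd(name),
        restart_cmd(name, "ckpt_simulate.dmtcp"),
    ]
    for cmd in steps:
        print(shlex.join(cmd))
```

The functions only return argument lists, so the commands can be inspected or logged before being passed to something like `subprocess.run`.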

Usage instructions may be found on the Instructions page. If you wish to learn how the plugin works so that you, too, can add features to Singularity, check out HowItWorks.
