Before you start, create a directory to hold the local container images and assign its path to the environment variable `DEEPCLEAN_IMAGES`. Then build the training container image:

```console
apptainer build $DEEPCLEAN_IMAGES/train.sif projects/train/apptainer.def
```
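
As a quick sanity check before launching any task, the image location can be verified from Python. This is a minimal sketch: `resolve_image` is a hypothetical helper, not part of `deepclean`, and it assumes the `image_root` used in the task definitions is read from `DEEPCLEAN_IMAGES`.

```python
import os
from pathlib import Path


def resolve_image(name, env=None):
    """Return the expected path of a container image under DEEPCLEAN_IMAGES.

    `resolve_image` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    root = env.get("DEEPCLEAN_IMAGES")
    if not root:
        raise RuntimeError("DEEPCLEAN_IMAGES is not set")
    # expanduser() handles a literal "~" that the shell did not expand
    return Path(root).expanduser() / name


# Example with an explicit mapping standing in for os.environ
image = resolve_image("train.sif", {"DEEPCLEAN_IMAGES": "/tmp/images/deepclean"})
print(image, "exists" if image.exists() else "not built yet")
```
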

In [12]:
from deepclean.tasks import train
from deepclean.base import DeepCleanSandbox, DeepCleanTask
from inspect import getsource


def print_command(cls, sandbox_type="singularity"):
    task_name = cls.__module__ + "." + cls.__qualname__
    source = getsource(cls)
    print(f"Executing task {task_name}, defined as:\n{source}")

    cmd = f"""
    DEEPCLEAN_IMAGES=~/images/deepclean SANDBOX_TYPE={sandbox_type} law run {task_name} \
        --data-fname ~/deepclean/data/deepclean-1251335314-4097.h5 \
        --output-dir ~/deepcelean/results/law-test \
        --gpus 7 \
        --local-scheduler
    """
    print(cmd)
    return cmd

Start by trying to run the "optimal" training task using our custom DeepClean sandbox, defined as:

In [2]:
print(getsource(DeepCleanSandbox))

class DeepCleanSandbox(singularity.SingularitySandbox):
    sandbox_type = "deepclean"

    def _get_volumes(self):
        volumes = super()._get_volumes()
        if self.task and getattr(self.task, "dev", False):
            volumes[root] = "/opt/deepclean"
        return volumes



All this sandbox does is try to map the local copy of the code into the container at `/opt/deepclean` when a `DeepCleanTask` is run with the `dev=True` option. This task is defined as:

In [3]:
print(getsource(DeepCleanTask))

class DeepCleanTask(law.SandboxTask):
    dev = luigi.BoolParameter(default=False)
    gpus = luigi.Parameter(default="")

    @property
    def singularity_forward_law(self) -> bool:
        return False

    @property
    def singularity_allow_binds(self) -> bool:
        return True

    @property
    def singularity_args(self) -> Callable:
        def arg_getter():
            if self.gpus:
                return ["--nv"]
            return []
        return arg_getter

    def sandbox_env(self, env):
        if self.gpus:
            return {"CUDA_VISIBLE_DEVICES": self.gpus}
        return {}



In principle, `dev` could also be a sandbox config option, which might be cleaner, but this is moot for now because the task fails when we try to run with this sandbox:

In [4]:
cmd = print_command(train.Train, "deepclean")

Executing task deepclean.tasks.train.Train, defined as:
class Train(DeepCleanTask):
    data_fname = luigi.Parameter()
    output_dir = luigi.Parameter()
    sandbox = f"{sandbox_type}::{image_root}/train.sif"

    def get_args(self):
        return [
            "--config",
            "/opt/deepclean/projects/train/config.yaml",
            "--data.fname",
            self.data_fname,
            "--trainer.logger.save_dir",
            self.output_dir,
            "--trainer.max_epochs",
            "1"
        ]

    def run(self):
        from train.cli import main

        main(self.get_args())

    def output(self):
        return law.LocalDirectoryTarget(self.output_dir)


    DEEPCLEAN_IMAGES=~/images/deepclean SANDBOX_TYPE=deepclean law run deepclean.tasks.train.Train         --data-fname ~/deepclean/data/deepclean-1251335314-4097.h5         --output-dir ~/deepcelean/results/law-test         --gpus 7         --local-scheduler
    


In [5]:
! {cmd}

DEBUG: Checking if Train(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test) is complete
INFO: Informed scheduler that task   Train__home_alec_gunny_False_7_62f9ee1aac   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 1826162] Worker Worker(salt=4099067170, workers=1, host=dgx1, username=alec.gunny, pid=1826162) running   Train(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test)
ERROR: [pid 1826162] Worker Worker(salt=4099067170, workers=1, host=dgx1, username=alec.gunny, pid=1826162) failed    Train(dev=False, g

So evidently implementing a custom sandbox is nontrivial: the global `law` Config appears to look for an entry corresponding to the sandbox type and raises an error when that entry is missing. We hit this error even though, following how `law` [handles this for its `contrib` packages](https://github.com/riga/law/blob/2ca2340eab8630f97ade3899e3ad4ca0befbf0a0/law/config.py#L580-L605), we try to insert the relevant defaults into the `law` Config at the bottom of `deepclean.tasks.base`, like so:

```python
config_defaults = {
    "deepclean_sandbox": {},
    "deepclean_sandbox_env": {},
    "deepclean_sandbox_volumes": {}
}
law.util.merge_dicts(
    law.Config._default_config,
    config_defaults,
    deep=True,
    inplace=True,
)
```
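
To make explicit what we expect that merge to accomplish, here is a pure-Python sketch of a deep, in-place dictionary merge. It is an illustration only, not `law`'s implementation; `deep_merge` and the `existing_section` entry are made up for the example.

```python
def deep_merge(target, *others):
    """Recursively merge `others` into `target` in place and return it."""
    for other in others:
        for key, value in other.items():
            if isinstance(value, dict) and isinstance(target.get(key), dict):
                # Both sides are dicts: merge them instead of overwriting
                deep_merge(target[key], value)
            else:
                target[key] = value
    return target


# Hypothetical stand-in for the library's default config
defaults = {"existing_section": {"option": "value"}}
config_defaults = {
    "deepclean_sandbox": {},
    "deepclean_sandbox_env": {},
    "deepclean_sandbox_volumes": {},
}

deep_merge(defaults, config_defaults)
print(sorted(defaults))
```

If the merge behaves like this, the new sections exist in the defaults afterwards, so the remaining suspect is when the merge runs relative to when the Config object is first built.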

## Question #1: What's the best way to map the local repo into our containers when a `dev=True` flag is specified? Is it with a custom sandbox, and if so, how do we get such a sandbox to be supported?


But barring this, let's move back to using the vanilla `singularity` sandbox and see if we can run with that:

In [6]:
cmd = print_command(train.Train)

Executing task deepclean.tasks.train.Train, defined as:
class Train(DeepCleanTask):
    data_fname = luigi.Parameter()
    output_dir = luigi.Parameter()
    sandbox = f"{sandbox_type}::{image_root}/train.sif"

    def get_args(self):
        return [
            "--config",
            "/opt/deepclean/projects/train/config.yaml",
            "--data.fname",
            self.data_fname,
            "--trainer.logger.save_dir",
            self.output_dir,
            "--trainer.max_epochs",
            "1"
        ]

    def run(self):
        from train.cli import main

        main(self.get_args())

    def output(self):
        return law.LocalDirectoryTarget(self.output_dir)


    DEEPCLEAN_IMAGES=~/images/deepclean SANDBOX_TYPE=singularity law run deepclean.tasks.train.Train         --data-fname ~/deepclean/data/deepclean-1251335314-4097.h5         --output-dir ~/deepcelean/results/law-test         --gpus 7         --local-scheduler
    


In [7]:
! {cmd}

DEBUG: Checking if Train(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test) is complete
INFO: Informed scheduler that task   Train__home_alec_gunny_False_7_62f9ee1aac   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 1826173] Worker Worker(salt=4909706705, workers=1, host=dgx1, username=alec.gunny, pid=1826173) running   Train(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test)

task   : Train__home_alec_gunny_False_7_62f9ee1aac
sandbox: singularity::/home/alec.gunny/images/deepclean/train.sif


So evidently the `train` library isn't visible to the default Python interpreter. Can we fix this with the hacky solution of just inserting the container's `site-packages` at the front of `sys.path`?

In [8]:
cmd = print_command(train.TrainWithInsert)

Executing task deepclean.tasks.train.TrainWithInsert, defined as:
class TrainWithInsert(Train):
    def run(self):
        import sys
        sys.path.insert(0, "/usr/local/lib/python3.10/site-packages")
        super().run()


    DEEPCLEAN_IMAGES=~/images/deepclean SANDBOX_TYPE=singularity law run deepclean.tasks.train.TrainWithInsert         --data-fname ~/deepclean/data/deepclean-1251335314-4097.h5         --output-dir ~/deepcelean/results/law-test         --gpus 7         --local-scheduler
    


In [9]:
! {cmd}

DEBUG: Checking if TrainWithInsert(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test) is complete
INFO: Informed scheduler that task   TrainWithInsert__home_alec_gunny_False_7_62f9ee1aac   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 1826503] Worker Worker(salt=670383571, workers=1, host=dgx1, username=alec.gunny, pid=1826503) running   TrainWithInsert(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test)

task   : TrainWithInsert__home_alec_gunny_False_7_62f9ee1aac
sandbox: singularity::/home/ale

For some reason this doesn't work either. So fine, we'll run the training script as a subprocess called with the full path to our desired interpreter:

In [10]:
cmd = print_command(train.TrainWithSubprocess)

Executing task deepclean.tasks.train.TrainWithSubprocess, defined as:
class TrainWithSubprocess(Train):
    def run(self):
        import subprocess, shlex

        cmd = [
            "/usr/local/bin/python",
            "/opt/deepclean/projects/train/train"
        ]
        cmd += self.get_args()

        try:
            proc = subprocess.run(
                cmd, capture_output=True, check=True, text=True
            )
        except subprocess.CalledProcessError as e:
            raise RuntimeError(
                "Command '{}' failed with return code {} "
                "and stderr:\n{}".format(
                    shlex.join(e.cmd), e.returncode, e.stderr
                )
            ) from None
        print(proc.stdout)


    DEEPCLEAN_IMAGES=~/images/deepclean SANDBOX_TYPE=singularity law run deepclean.tasks.train.TrainWithSubprocess         --data-fname ~/deepclean/data/deepclean-1251335314-4097.h5         --output-dir ~/deepcelean/results/law-test         --gpus 7      

In [11]:
! {cmd}

DEBUG: Checking if TrainWithSubprocess(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test) is complete
INFO: Informed scheduler that task   TrainWithSubprocess__home_alec_gunny_False_7_62f9ee1aac   has status   PENDING
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Pending tasks: 1
INFO: [pid 1826895] Worker Worker(salt=2989404497, workers=1, host=dgx1, username=alec.gunny, pid=1826895) running   TrainWithSubprocess(dev=False, gpus=7, data_fname=/home/alec.gunny/deepclean/data/deepclean-1251335314-4097.h5, output_dir=/home/alec.gunny/deepcelean/results/law-test)

task   : TrainWithSubprocess__home_alec_gunny_False_7_62f9ee1aac
sandbox: singu

I'm running into the error above because I'm currently using GPU 0 on this machine for a different project, and its memory is fully occupied. So evidently our override of `sandbox_env` to expose only GPU 7 didn't work, since the training task didn't get scheduled there. But at least we know the library got installed properly and has GPU support!

## Question #2: How do we leverage the container's python interpreter inside of `run` without having to resort to subprocesses?
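
A first diagnostic step here could be to report, from inside `run`, which interpreter is actually executing and whether it can locate the module at all. The sketch below uses only the standard library; `describe_import` is a hypothetical helper, and `json` stands in for `train` so that the snippet runs anywhere.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report which interpreter is running and where a top-level module resolves."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "version": tuple(sys.version_info[:2]),
        "module": module_name,
        # None means this interpreter cannot find the module on sys.path
        "origin": spec.origin if spec is not None else None,
        "sys_path_head": list(sys.path[:5]),
    }


info = describe_import("json")  # swap in "train" inside the sandboxed task
for key, value in info.items():
    print(f"{key}: {value}")
```

If the reported version differs from the `python3.10` whose `site-packages` we inserted, that mismatch is one possibility worth ruling out, since compiled extension modules built for one Python minor version generally cannot be imported by another.
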
## Question #3: What's the best way to set up GPU isolation in this context?
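
As long as we are launching training through a subprocess anyway, one fallback is to pass `CUDA_VISIBLE_DEVICES` to the child process explicitly rather than relying on the sandbox to forward it. This is a minimal sketch, not a confirmed fix: `run_with_gpus` is a hypothetical helper, and the child command is a stand-in that only echoes the variable it sees.

```python
import os
import subprocess
import sys


def run_with_gpus(cmd, gpus):
    """Run `cmd` with CUDA_VISIBLE_DEVICES restricted to `gpus`, if given."""
    env = dict(os.environ)
    if gpus:
        # Must be in the child's environment before it initializes CUDA
        env["CUDA_VISIBLE_DEVICES"] = str(gpus)
    return subprocess.run(
        cmd, env=env, capture_output=True, check=True, text=True
    )


# Stand-in for the training command: report the variable the child sees
check = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES'))",
]
proc = run_with_gpus(check, "7")
print(proc.stdout.strip())
```

Running the same check command through the sandboxed task, without the explicit `env`, would also tell us whether the value returned by `sandbox_env` reaches the container at all.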