Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the way options are provided #28

Open
multimeric opened this issue May 18, 2021 · 5 comments
Open

Refactor the way options are provided #28

multimeric opened this issue May 18, 2021 · 5 comments

Comments

@multimeric
Copy link
Contributor

toil_container version: 1.1.6
Python version: 3.9.4
Operating System: Ubuntu 20.10

Description

Currently the most useful entry point to toil_container() is the call() function of ContainerJob. Unfortunately, it uses the options dictionary, which is defined at initialization time. This means that:

a) All options have to be known before the job actually runs. This is not always true, as bind mounts might be dynamic, and even the docker/singularity choice could be changed dynamically
b) We have to actually pass options every time we instantiate a job. This is verbose and not DRY.

New feature

I propose that we:

  • Extract containerJob.call out into a static function such as containerCall(). containerCall() then takes the "options" (docker/singularity, mounts etc) as an argument, and does not rely on self.options. This resolves my first point.
  • containerJob then uses containerCall() internally for backwards-compatibility (I have no intention to use containerJob myself, though)
  • Make containerCall() pull its default options directly from the Toil config object, accessed through self.jobStore.config from any given job. These can still be overridden by arguments. This system means that, if you have --singularity enabled, all jobs will automatically use singularity (by default) without having to pass options around (my second point above)
@jsmedmar
Copy link
Contributor

jsmedmar commented May 18, 2021

This sounds fair. However, if containerCall is a static function, self will not be available.

If you are not planning to use containerJob, then all we really need to do is to extract the little logic in containerJob.call and put that in a single function toil_container.containers.call for example. This function could look like this:

def call(docker=None, singularity=None, working_dir=None, volumes=None, cwd=None, env=None, check_output=False):
    call_kwargs = {}

    if singularity and docker:
        raise exceptions.UsageError("use docker or singularity, not both.")

    if singularity or docker:
        call_kwargs["check_output"] = check_output

        # used for testing only
        call_kwargs["remove_tmp_dir"] = getattr(self, "_rm_tmp_dir", True)

        if working_dir:
            call_kwargs["working_dir"] = working_dir

        if volumes:
            call_kwargs["volumes"] = volumes

        if singularity:
            call_kwargs["image"] = singularity
            call_function = singularity_call
        else:
            call_kwargs["image"] = docker
            call_function = docker_call

    elif check_output:
        call_function = subprocess.check_output

    else:
        call_function = subprocess.check_call

    errors = (exceptions.ContainerError, subprocess.CalledProcessError, OSError)

    try:
        output = call_function(**call_kwargs)
    except errors as error:  # pylint: disable=catching-non-exception
        raise exceptions.SystemCallError(error)

    try:
        return output.decode()
    except AttributeError:
        return output

Then containerJob could just call that function, and you can import container.call wherever you like, passing the arguments you want.

Am I understanding this correctly?

@multimeric
Copy link
Contributor Author

Yes that's basically exactly what I want to do (and have done, in my WIP code)! I think that would be a big improvement.

The only issue I've encountered so far is in solving point b). Toil doesn't seem to have a mechanism for adding arbitrary data to the config, and accessing it in jobs (see DataBiosphere/toil#3619). So something simple like "use singularity for container jobs" has to be passed to each job which is a pain. The closest I've got is to accessing fileStore.jobStore.config from a job's run function, but sadly this returns the job store config which is only a subset of the total toil config.

@jsmedmar
Copy link
Contributor

jsmedmar commented May 18, 2021

I think I understand what you are trying to do.

The --docker and --singularity options come from toil_container and have nothing to do with Toil see:

class ContainerArgumentParser(ToilShortArgumentParser):
"""Toil Argument Parser with options for containerized system calls."""
_ARGUMENT_GROUP_NAME = "container arguments"
def __init__(self, *args, **kwargs):
"""Add container options to parser."""
super(ContainerArgumentParser, self).__init__(*args, **kwargs)
settings = self.add_argument_group(self._ARGUMENT_GROUP_NAME)
settings.add_argument(
"--docker",
help="name/path of the docker image available in daemon",
default=None,
required=False,
)
settings.add_argument(
"--singularity",
help="name/path of the singularity image available in deamon",
default=None,
required=False,
)
settings.add_argument(
"--volumes",
help="tuples of (local path, absolute container path)",
required=False,
default=None,
action="append",
nargs=2,
)
self.add_argument(
"--help-container",
action=_ContainerHelpAction,
default=argparse.SUPPRESS,
help="show help with container arguments and exit",
)
def hide_action_group(self, action_group):
"""Return falsey if group shouldn't be showed."""
if action_group.title.startswith(self._ARGUMENT_GROUP_NAME):
return not getattr(self, SHOW_CONTGROUPS_PROPERTY, False)
return super(ContainerArgumentParser, self).hide_action_group(action_group)
def parse_args(self, args=None, namespace=None):
"""Validate parsed options."""
args = super(ContainerArgumentParser, self).parse_args(
args=args, namespace=namespace
)
images = [args.docker, args.singularity]
if all(images):
raise click.UsageError("use --singularity or --docker, not both.")
if args.volumes and not any(images):
raise click.UsageError(
"--volumes should be used only with " "--singularity or --docker."
)
if any(images):
validate_kwargs = {}
if args.volumes:
validate_kwargs["volumes"] = args.volumes
if args.workDir:
validate_kwargs["working_dir"] = args.workDir
if args.docker:
validate_kwargs["image"] = args.docker
validators.validate_docker(**validate_kwargs)
else:
validate_kwargs["image"] = args.singularity
validators.validate_singularity(**validate_kwargs)
return args

Where the toil options are added in a parent class:

Job.Runner.addToilOptions(self)

One reason you might not be seeing --singularity and --docker in self.jobStore.config is because you are not using toil_container.parsers.ContainerArgumentParser . Can you confirm you are using it? Here is an example of how to use it (you can ignore completely the ContainerJob stuff:

# whalesay.py
from toil_container import ContainerJob
from toil_container import ContainerArgumentParser


class WhaleSayJob(ContainerJob):

    def run(self, fileStore):
        """Run `cowsay` with Docker, Singularity or Subprocess."""
        msg = self.call(["cowsay", self.options.msg], check_output=True)
        fileStore.logToMaster(msg)


def main():
    parser = ContainerArgumentParser()
    parser.add_argument("-m", "--msg", default="Hello from the ocean!")
    options = parser.parse_args()
    job = WhaleSayJob(options=options)
    ContainerJob.Runner.startToil(job, options)


if __name__ == "__main__":
    main()

If after checking this and confirming that --docker and --singularity can be retrieved from self.jobStore.config, we might also be able to remove the requirement of the options attribute in containerJob. This would make our pipelines DRYer, so looking forward to your reporting.

@jsmedmar
Copy link
Contributor

jsmedmar commented May 18, 2021

update -- I just ran a quick test on this. So there is no self.jobStore. jobStore is an attribute of the fileStore argument passed to the run method. I checked, and it seems that only Toil arguments make it to fileStore.jobStore.config, any additional argument added to the argparse object doesn't make it to config.

@multimeric
Copy link
Contributor Author

Yes I noted that exact result above:

The closest I've got is to accessing fileStore.jobStore.config from a job's run function, but sadly this returns the job store config which is only a subset of the total toil config.

I'll just have to hope I get an answer back from the toil folk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants