Improvements to Step create/run methods #9

Open
eslavich opened this issue Jan 20, 2021 · 0 comments

eslavich commented Jan 20, 2021

This issue is for discussion of potential improvements to the Python interface for creating and running steps and pipelines. First, a summary of the current state of affairs:

Methods for creating steps

Column key:

  • Selects subclass: Does the method attempt to select the correct Step subclass based on arguments or config file contents?
  • Runs step: Does the method run the step before returning?
  • CRDS pars: Does the method incorporate parameters from CRDS pars files?
  • Config input: Does the method accept a path to a user config?
  • Override parameters: Does the method support overriding parameters on an individual basis?
  • Override style: If parameter overrides are supported, are they passed as standard keyword arguments or CLI-style arguments?
Method               Override style  Notes
__init__             keyword         Accepts a config_file argument but does not apply parameters from it.
call                 keyword         User config file passed as keyword argument. Config file's class field ignored.
from_cmdline         CLI             Selects Step subclass based on config class field or class name argument.
from_config_file                     Selects Step subclass based on config class field.
from_config_section                  Probably not intended to be part of the public API.
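The "selects subclass" behavior in the table can be illustrated with a toy model. The sketch below is hypothetical and is not stpipe's code: subclasses register themselves by name, and a classmethod (here called `from_config`, an invented name) invoked on one class returns an instance of whatever class the config's `class` field names. The config is an already-parsed dict rather than a .cfg file, and `StepA`/`StepB` are placeholder classes.

```python
from typing import Any, Dict, Type


class Step:
    """Toy stand-in for a step base class (hypothetical, not stpipe's Step)."""

    # Maps class names to subclasses so a config's "class" field can be resolved.
    registry: Dict[str, Type["Step"]] = {}

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        Step.registry[cls.__name__] = cls

    def __init__(self, **params: Any) -> None:
        for name, value in params.items():
            setattr(self, name, value)

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "Step":
        # Hypothetical analogue of from_config_file, taking a parsed config:
        # the class named in the config wins, even when the method was
        # invoked on a different class. Without a "class" field, cls is used.
        config = dict(config)
        step_class = Step.registry.get(config.pop("class", None), cls)
        return step_class(**config)


class StepA(Step):
    pass


class StepB(Step):
    pass


step = StepB.from_config({"class": "StepA", "foo": 42})
print(type(step).__name__, step.foo)  # StepA 42
```

Calling `StepB.from_config(...)` and receiving a `StepA` is the kind of surprise that motivates moving such methods out of the class and into module-level functions.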

Methods for running steps

Column key:

  • Creates step: Does the method create the Step instance before running it?
Method        Creates step  Notes
__call__      no            Alias for run.
call          yes           Python API only (not used by CLI code).
from_cmdline  yes           The strun script is a thin wrapper around this method.
process       no            Subclass implementation method. Not intended to be called directly by general users.
run           no            Eventually called by any method that needs to run the step.
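A simplified, hypothetical model of the call chain in the table: `__call__` is an alias for `run`, `run` wraps the subclass's `process`, and the classmethod `call` both creates and runs an instance. Signatures are reduced to the essentials, config loading and CRDS lookups are omitted, and `AddOffsetStep` is an invented example; the point is only that `call` takes creation arguments and run inputs in one signature.

```python
from typing import Any, Optional


class Step:
    """Toy model of the current create/run methods (not stpipe's code)."""

    def __init__(self, config_file: Optional[str] = None, **params: Any) -> None:
        # The config file is remembered, but its parameters are not applied.
        self.config_file = config_file
        for name, value in params.items():
            setattr(self, name, value)

    def process(self, *inputs: Any) -> Any:
        # Subclass implementation hook; not meant to be called by users.
        raise NotImplementedError

    def run(self, *inputs: Any) -> Any:
        # Common setup/teardown would surround the call to process().
        return self.process(*inputs)

    def __call__(self, *inputs: Any) -> Any:
        # Alias for run.
        return self.run(*inputs)

    @classmethod
    def call(cls, *inputs: Any, config_file: Optional[str] = None,
             **params: Any) -> Any:
        # Creation arguments (config_file, params) and run inputs are
        # blended into one signature.
        step = cls(config_file=config_file, **params)
        return step.run(*inputs)


class AddOffsetStep(Step):
    def process(self, value: int) -> int:
        return value + self.offset


print(AddOffsetStep.call(1, offset=10))  # 11
print(AddOffsetStep(offset=10)(1))       # 11
```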

Suggestions for improvement

  • Eliminate run-and-call methods. They add little value and present a confusing interface in which step creation arguments and step run arguments are blended together into one method signature.
  • Move methods that may return an instance of a class other than the one they were invoked on out of the Step class. This behavior is confusing and better handled by module-level functions.
  • Remove from_cmdline and instead call the corresponding function in the cmdline module directly.
  • Rename process to make clear that it shouldn't be invoked by users, e.g. with a leading underscore or a name like run_impl.
  • Remove one of run or __call__ so that usage is uniform.
  • Rename the config_file argument of __init__ to something like working_dir, to make clear that the config is not loaded.
  • Change the CLI code to be a relatively thin wrapper around the Python interface (instead of parallel implementations like the current from_cmdline and call). This will ensure consistency between the two interfaces. The two paths have already diverged: for example, call does not set the _pars_model attribute, and call doesn't know how to select the Step subclass based on a config.
  • Pass step parameters around as a separate dict argument instead of **kwargs. This provides a clear separation between the parameters and other method arguments.
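The last suggestion can be made concrete with two hypothetical functions (neither is an stpipe API). `run_with_kwargs` mixes method arguments and step parameters in `**kwargs`, so a step parameter that happens to share a name with a method argument (here, one called `config_file`) is captured by the method argument and never reaches the step. `run_with_params` takes parameters as a separate dict, so the two meanings cannot collide.

```python
from typing import Any, Dict, Optional, Tuple


def run_with_kwargs(*inputs: Any, config_file: Optional[str] = None,
                    **params: Any) -> Tuple[Optional[str], Dict[str, Any]]:
    # Any keyword named "config_file" binds to the method argument,
    # so it can never reach the step parameters.
    return config_file, params


def run_with_params(*inputs: Any, config_file: Optional[str] = None,
                    params: Optional[Dict[str, Any]] = None
                    ) -> Tuple[Optional[str], Dict[str, Any]]:
    # Step parameters live in their own namespace.
    return config_file, dict(params or {})


# The value meant for the step is silently treated as the method argument.
print(run_with_kwargs("in.asdf", config_file="step_param_value"))
# ('step_param_value', {})

# With a separate dict, the two meanings stay distinct.
print(run_with_params("in.asdf", config_file="user.cfg",
                      params={"config_file": "step_param_value"}))
# ('user.cfg', {'config_file': 'step_param_value'})
```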

Possible new interface

  • Step.__init__(self, params=None, working_dir=None, ...): Parameters are passed to the initializer in a dict.
  • Step.call_impl(self, *args): Step subclass implementation.
  • Step.__call__(self, *args): Wrapper around call_impl that handles common setup and teardown.
  • stpipe.create_step(*, step_class=None, config_path=None, crds_params_enabled=True, dataset=None, params=None, working_dir=None, ...): Convenience function for creating steps. At least one of step_class or config_path is required to determine the step class. dataset is required if crds_params_enabled is True.
  • stpipe.cmdline.from_cmdline(args): Function that parses CLI arguments. It ends with a call to stpipe.create_step.
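A minimal, runnable sketch of the proposed interface, under several assumptions. The toy `Step`, the `_REGISTRY` dict, `_read_config` (which only understands flat `key = value` lines, not real .cfg or .asdf files), and `_fetch_crds_params` (which returns nothing instead of querying CRDS) are hypothetical placeholders. The parameter precedence (CRDS pars, then config file, then `params`) is also an assumption, not something the proposal specifies. What the sketch does follow from the list above: the two argument rules for `create_step`, a params dict passed to `__init__`, and `__call__` wrapping `call_impl`.

```python
import os
from typing import Any, Dict, Optional, Type

# Hypothetical registry so a config's "class" field can be resolved to a subclass.
_REGISTRY: Dict[str, Type["Step"]] = {}


class Step:
    """Toy base class following the proposed interface (not stpipe's Step)."""

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        _REGISTRY[cls.__name__] = cls

    def __init__(self, params: Optional[Dict[str, Any]] = None,
                 working_dir: Optional[str] = None) -> None:
        # Assumed default: the current directory.
        self.working_dir = working_dir or os.getcwd()
        # Each parameter becomes an attribute, e.g. self.foo.
        for name, value in (params or {}).items():
            setattr(self, name, value)

    def call_impl(self, *args: Any) -> Any:
        raise NotImplementedError("subclasses implement call_impl")

    def __call__(self, *args: Any) -> Any:
        # Shared setup/teardown (logging, opening models, ...) would go here.
        self._setup()
        try:
            return self.call_impl(*args)
        finally:
            self._teardown()

    def _setup(self) -> None:
        pass

    def _teardown(self) -> None:
        pass


def _read_config(path: str) -> Dict[str, str]:
    """Hypothetical stand-in for config parsing: flat `key = value` lines."""
    config: Dict[str, str] = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip().strip('"')
    return config


def _fetch_crds_params(step_class: Type[Step], dataset: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a CRDS pars lookup; returns nothing here."""
    return {}


def create_step(*, step_class: Optional[Type[Step]] = None,
                config_path: Optional[str] = None,
                crds_params_enabled: bool = True,
                dataset: Optional[str] = None,
                params: Optional[Dict[str, Any]] = None,
                working_dir: Optional[str] = None) -> Step:
    if step_class is None and config_path is None:
        raise ValueError("one of step_class or config_path is required")
    if crds_params_enabled and dataset is None:
        raise ValueError("dataset is required when crds_params_enabled is True")

    config = _read_config(config_path) if config_path else {}
    class_name = config.pop("class", None)
    if step_class is None:
        if class_name not in _REGISTRY:
            raise ValueError(f"config names unknown step class {class_name!r}")
        step_class = _REGISTRY[class_name]

    # Assumed precedence, lowest to highest: CRDS pars, config file, params.
    merged: Dict[str, Any] = {}
    if crds_params_enabled:
        merged.update(_fetch_crds_params(step_class, dataset))
    merged.update(config)
    merged.update(params or {})
    return step_class(params=merged, working_dir=working_dir)
```

Keeping `create_step` at module level means the class of the returned step is decided in one visible place, rather than by a classmethod whose result may not match the class it was invoked on.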

Run step from CLI

from stpipe.cmdline import from_cmdline

# stpipe step run config.cfg dataset.asdf --foo=42
step, inputs = from_cmdline(args)
step(*inputs)
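One way `from_cmdline` could split its arguments before handing off to `stpipe.create_step`: the first positional argument is the config path, the remaining positionals are inputs, and `--name=value` options become the `params` dict. The function name `split_cmdline` is hypothetical, only the `--name=value` form is handled, and values are left as strings (converting them to each parameter's declared type is out of scope here).

```python
from typing import Dict, List, Sequence, Tuple


def split_cmdline(args: Sequence[str]) -> Tuple[str, List[str], Dict[str, str]]:
    """Split CLI arguments into (config_path, inputs, params).

    Hypothetical helper: only `--name=value` overrides are recognized,
    and values are kept as strings.
    """
    positionals: List[str] = []
    params: Dict[str, str] = {}
    for arg in args:
        if arg.startswith("--"):
            name, sep, value = arg[2:].partition("=")
            if not sep or not name:
                raise ValueError(f"expected --name=value, got {arg!r}")
            params[name] = value
        else:
            positionals.append(arg)
    if not positionals:
        raise ValueError("a config path is required")
    return positionals[0], positionals[1:], params


config_path, inputs, params = split_cmdline(
    ["config.cfg", "dataset.asdf", "--foo=42"])
print(config_path, inputs, params)
# config.cfg ['dataset.asdf'] {'foo': '42'}
```

A caller could then pass these to the proposed `stpipe.create_step`, for example with `dataset` set to the first input; that choice is an assumption, since the proposal does not say how `dataset` is selected on the command line.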

Run step from Python

from stpipe import create_step

step = create_step(config_path="config.cfg", dataset="dataset.asdf", params={"foo": 42})
step("dataset.asdf")

or

from stpipe import create_step

step = create_step(config_path="config.asdf", dataset="dataset.asdf")
step.some_param = "some_value"
step("dataset.asdf")

Developing a step

from stpipe import Step

class MyStep(Step):
    def call_impl(self, dataset):
        print(f"Value of foo: {self.foo}")

step = MyStep(params={"foo": 42})
step("dataset.asdf")