Improvements to Step create/run methods #9

eslavich · 2021-01-20T15:27:42Z

This issue is for discussion of potential improvements to the Python interface for creating and running steps and pipelines. First, a summary of the current state of affairs:

Methods for creating steps

Column key:

Selects subclass: Does the method attempt to select the correct Step subclass based on arguments or config file contents?
Runs step: Does the method run the step before returning?
CRDS pars: Does the method incorporate parameters from CRDS pars files?
Config input: Does the method accept a path to a user config?
Override parameters: Does the method support overriding parameters on an individual basis?
Override style: If parameter overrides are supported, are they passed as standard keyword arguments or CLI-style arguments?

| Method | Selects subclass | Runs step | CRDS pars | Config input | Override parameters | Override style | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `__init__` | ✗ | ✗ | ✗ | ✗ | ✓ | keyword | Accepts a `config_file` argument but does not apply parameters from it. |
| `call` | ✗ | ✓ | ✓ | ✓ | ✓ | keyword | User config file passed as keyword argument. Config file's `class` field ignored. |
| `from_cmdline` | ✓ | ✓ | ✓ | ✓ | ✓ | CLI | Selects Step subclass based on config `class` field or class name argument. |
| `from_config_file` | ✓ | ✗ | ✗ | ✓ | ✗ | | Selects Step subclass based on config `class` field. |
| `from_config_section` | ✗ | ✗ | ✗ | ✓ | ✗ | | Probably not intended to be part of the public API. |

Methods for running steps

Column key:

Creates step: Does the method create the Step instance before running it?

| Method | Creates step | Notes |
| --- | --- | --- |
| `__call__` | ✗ | Alias for `run`. |
| `call` | ✓ | Python API only (not used by CLI code). |
| `from_cmdline` | ✓ | The `strun` script is a thin wrapper around this method. |
| `process` | ✗ | Subclass implementation method. Not intended to be called directly by general users. |
| `run` | ✗ | Eventually called by any method that needs to run the step. |
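To make the relationships in the table concrete, here is a simplified model of the current run path. `ToyStep` and `Doubler` are hypothetical stand-ins written for this illustration, not stpipe code, and the sketch leaves out config files and CRDS pars entirely.

```python
class ToyStep:
    """Simplified stand-in for the current Step run path (not stpipe code)."""

    def __init__(self, **params):
        # Today, parameter overrides arrive as plain keyword arguments.
        self.params = params

    def process(self, *args):
        # Subclass hook; not meant to be called directly by users.
        raise NotImplementedError

    def run(self, *args):
        # Common entry point: every other run path ends up here.
        return self.process(*args)

    # __call__ is an alias for run, so step(x) and step.run(x) are equivalent.
    __call__ = run

    @classmethod
    def call(cls, *args, **params):
        # Create-and-run: creation and run arguments share one signature.
        return cls(**params).run(*args)


class Doubler(ToyStep):
    """Hypothetical subclass used only to exercise the three run paths."""

    def process(self, value):
        return value * self.params.get("factor", 2)


step = Doubler(factor=3)
results = (step.run(5), step(5), Doubler.call(5, factor=3))
```

All three entry points give the same result here, which is the redundancy the suggestions below aim to reduce.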

Suggestions for improvement

Eliminate run-and-call methods. They add little value and present a confusing interface in which step creation arguments and step run arguments are blended together into one method signature.
Move methods that may return an instance of a class other than the one the method was invoked on. This behavior is confusing and is better handled by module-level functions.
Remove from_cmdline and instead call the corresponding cmdline module method directly.
Rename process to make clear that it shouldn't be invoked by users. Maybe a name with a leading underscore, or something like run_impl.
Remove one of run or __call__ so that usage is uniform.
Rename the config_file argument of __init__ to something like working_dir, to make clear that the config is not loaded.
Change CLI code to be a relatively thin wrapper around the Python interface (instead of parallel implementations like the current from_cmdline vs call). This will ensure consistency between the two interfaces. There is already some divergence between call and from_cmdline, e.g. the _pars_model attribute is not set by call, and call doesn't know how to select the Step subclass based on a config.
Pass around step parameters as a separate dict argument instead of **kwargs. This provides a clear separation between the parameters and other method arguments.
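The last suggestion can be illustrated with a small sketch. Both functions below are hypothetical and only model the signatures. They show how a step parameter whose name matches a method argument gets lost with `**kwargs` but survives when parameters travel in their own dict.

```python
def create_with_kwargs(working_dir=None, **kwargs):
    # Every unrecognized keyword is treated as a step parameter, so a step
    # parameter that happens to be named "working_dir" cannot be passed.
    return {"working_dir": working_dir, "params": kwargs}


def create_with_params(params=None, working_dir=None):
    # Step parameters live in their own dict and cannot collide with the
    # function's own arguments.
    return {"working_dir": working_dir, "params": dict(params or {})}


# Assume a hypothetical step that defines a parameter called "working_dir".
blended = create_with_kwargs(working_dir="/tmp/run", foo=42)
separated = create_with_params(
    params={"working_dir": "scratch", "foo": 42},
    working_dir="/tmp/run",
)
```

In `blended` the step never sees a `working_dir` parameter, while in `separated` both values are kept apart.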

Possible new interface

Step.__init__(self, params=None, working_dir=None, ...): Parameters are passed to initializer in a dict.
Step.call_impl(self, *args): Step subclass implementation.
Step.__call__(self, *args): Wrapper around call_impl that handles common setup and teardown.
stpipe.create_step(*, step_class=None, config_path=None, crds_params_enabled=True, dataset=None, params=None, working_dir=None, ...): Convenience function for creating steps. At least one of step_class or config_path is required to determine the step class. dataset is required if crds_params_enabled is True.
stpipe.cmdline.from_cmdline(args): Function that parses CLI arguments. Ends in a call to stpipe.create_step.
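As a rough sketch of how `create_step` could enforce the rules stated above, the following uses placeholders throughout: `STEP_CLASSES`, `load_config`, `load_crds_params`, and `FlatFieldStep` are hypothetical names invented for this example, and the loaders return canned values instead of reading files or querying CRDS. The merge order (CRDS pars, then config, then explicit `params`) and letting an explicit `step_class` win over the config's `class` field are assumptions, not something this proposal specifies.

```python
class Step:
    """Stand-in base class (not stpipe code)."""

    def __init__(self, params=None, working_dir=None):
        self.working_dir = working_dir
        for name, value in (params or {}).items():
            setattr(self, name, value)


class FlatFieldStep(Step):
    """Hypothetical subclass for illustration."""


# Hypothetical registry mapping a config's `class` field to a Step subclass.
STEP_CLASSES = {"FlatFieldStep": FlatFieldStep}


def load_config(config_path):
    # Placeholder for real config parsing: returns (class name, parameters).
    return "FlatFieldStep", {"foo": 1, "bar": "from-config"}


def load_crds_params(step_class, dataset):
    # Placeholder for a CRDS pars lookup keyed on the dataset.
    return {"foo": 0, "baz": "from-crds"}


def create_step(*, step_class=None, config_path=None, crds_params_enabled=True,
                dataset=None, params=None, working_dir=None):
    if step_class is None and config_path is None:
        raise ValueError("at least one of step_class or config_path is required")
    if crds_params_enabled and dataset is None:
        raise ValueError("dataset is required when crds_params_enabled is True")

    config_params = {}
    if config_path is not None:
        class_name, config_params = load_config(config_path)
        if step_class is None:
            step_class = STEP_CLASSES[class_name]

    # Assumed precedence: CRDS pars < user config < explicit params.
    merged = {}
    if crds_params_enabled:
        merged.update(load_crds_params(step_class, dataset))
    merged.update(config_params)
    merged.update(params or {})
    return step_class(params=merged, working_dir=working_dir)


step = create_step(config_path="config.cfg", dataset="dataset.asdf",
                   params={"foo": 42})
```

Because the subclass is selected in a module-level function, no class method ever returns an instance of a class other than the one it was invoked on.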

Run step from CLI

```python
from stpipe.cmdline import from_cmdline

# stpipe step run config.cfg dataset.asdf --foo=42
step, inputs = from_cmdline(args)
step(*inputs)
```

Run step from Python

```python
from stpipe import create_step

step = create_step(config_path="config.cfg", dataset="dataset.asdf", params={"foo": 42})
step("dataset.asdf")
```

or

```python
from stpipe import create_step

step = create_step(config_path="config.asdf", dataset="dataset.asdf")
step.some_param = "some_value"
step()
```

Developing a step

```python
from stpipe import Step

class MyStep(Step):
    def call_impl(self, dataset):
        print(f"Value of foo: {self.foo}")

step = MyStep(params={"foo": 42})
step("dataset.asdf")
```
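Since the proposed `Step` base class does not exist yet, here is a self-contained sketch of how `__call__` might wrap `call_impl`. The `Step` class below is a stand-in written for this example, and the `log` attribute is a hypothetical placeholder for whatever common setup and teardown the real wrapper would perform.

```python
class Step:
    """Stand-in base class sketching the proposed interface (not stpipe code)."""

    def __init__(self, params=None, working_dir=None):
        self.working_dir = working_dir
        self.log = []  # hypothetical record of setup/teardown events
        # Parameters arrive in a dict and become instance attributes.
        for name, value in (params or {}).items():
            setattr(self, name, value)

    def call_impl(self, *args):
        raise NotImplementedError("subclasses implement call_impl")

    def __call__(self, *args):
        # Common setup runs before the subclass implementation...
        self.log.append("setup")
        try:
            return self.call_impl(*args)
        finally:
            # ...and teardown runs even if call_impl raises.
            self.log.append("teardown")


class MyStep(Step):
    def call_impl(self, dataset):
        return f"Value of foo for {dataset}: {self.foo}"


step = MyStep(params={"foo": 42})
message = step("dataset.asdf")
```

With this shape, `__call__` is the only public way to run a step, and subclass authors override only `call_impl`.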


stscijgbot-jp mentioned this issue Jan 20, 2021

stpipe.Step methods run() and call() don't treat kwargs the same way spacetelescope/jwst#1098

Closed
