New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

DM-16867: Move pieces of pipe_supertask to pipe_base #73

Merged

andy-slac merged 49 commits into master from tickets/DM-16867

Jan 4, 2019

Contributor

andy-slac commented Dec 31, 2018 •

edited

Modules that have been moved here:

configOverrides
graph
graphBuilder
pipeline
pipeTools
pipelineBuilder
new ABC TaskFactory was added
all existing unit tests for the above

More or less complete history was copied too, branch tickets/DM-16867-supertask-history contains that history and its HEAD corresponds to the master of pipe_supertask. All migration changes to the code were done after merging that branch into tickets/DM-16867.

TallJimbo approved these changes

View reviewed changes

Member

TallJimbo left a comment

I've left lots of little minor comments, some of which I'm sure you'll want to defer to future tickets.

My only big question for this change is whether this is the right home for the expr_parser subpackage. My understanding was that we were not using this code at all yet, and that makes me a bit wary about putting it on master now, especially since it seems unlikely to be broken by changes to other parts of the codebase, and hence not obviously useful to get into CI now. I was also hoping that these expressions would ultimately be useful in high-level Registry APIs, not Pipeline activators. What do you think about moving this subpackage to a non-master (for now) branch of daf_butler instead?

python/lsst/pipe/base/pipelineBuilder.py

@@ @@ -19,21 +19,16 @@ @@
               # the GNU General Public License along with this program.  If not,
               # see <http://www.lsstcorp.org/LegalNotices/>.
               #
+              """Module defining PipelineBuilder class and related methods.

Member

TallJimbo Jan 2, 2019

I'd be curious to get @jonathansick's opinion on this, but my inclination would be to remove this docstring, as it provides no real information and this module is itself something of an implementation detail (in that all of its public symbols are lifted into the package namespace).

Contributor Author

andy-slac Jan 3, 2019

I do agree that docstring like that add very little (or very little negative) value to the module. OTOH modules are supposed to have a summary docstring in them. I could either drop them entirely or try to expand them (but that would probably duplicate class docstring content).

python/lsst/pipe/base/pipelineBuilder.py Outdated

		@@ -55,123 +50,83 @@


		class PipelineBuilder(object):

Member

TallJimbo Jan 2, 2019

Should not inherit from object in Python 3; that's implied.

Contributor Author

andy-slac Jan 3, 2019

This is leftover of 2-to-3 migration which was never completed for pipe_supertask. It is removed in one of the latter commits.

python/lsst/pipe/base/pipelineBuilder.py Outdated

    
                  def makePipeline(self, args):

                      """Build pipeline from command line arguments.

                      Pipeline will be checked for possible inconsistencies before

                      returning, exception is raised if any problems are detected.

Member

TallJimbo Jan 2, 2019

Better to put this in a Raises section and document the exception type raised.

python/lsst/pipe/base/pipelineBuilder.py Outdated

-                  def __init__(self, taskFactory):
-                      self.taskFactory = taskFactory
+                  def pipeline(self, order_pipeline=False):

Member

TallJimbo Jan 2, 2019

Keyword arguments in this corner of the codebase should be camelCase. Given that "pipeline" is already the method name, though, I'd just called this argument ordered.

Contributor Author

andy-slac Jan 4, 2019

Does the same naming rule apply to package names - should my expr_parser package be renamed to exprParser?

Member

TallJimbo Jan 4, 2019

Yes, I believe so.

python/lsst/pipe/base/pipelineBuilder.py

-                          pipeline = orderedPipeline
-                      return pipeline
+                      orderedPipeline = pipeTools.orderPipeline(self._pipeline, self._taskFactory)

Member

TallJimbo Jan 2, 2019

Missing import for pipeTools?

Contributor Author

andy-slac Jan 3, 2019

I think it should be there, unit test would fail without that import

Member

TallJimbo Jan 4, 2019

I suspect it was fixed on a later commit and I didn't notice.

python/lsst/pipe/base/pipelineBuilder.py Outdated

    
                  def _moveTask(self, pipeline, label, new_idx):

                  def moveTask(self, label, new_idx):

Member

TallJimbo Jan 2, 2019

Is the index of a PipelineTask in a Pipeline or PipelineBuilder really user-visible? I'd think it'd be better for these objects to make the ordering deterministic (i.e. topologically sorted given DatasetType input/output relationships, maybe lexicographical to break ties). I'm not opposed to letting them having an unsorted state with an explicit validation or sort operation, but I don't think it's a good idea for them to expose an index-based interface to the Tasks.

Contributor Author

andy-slac Jan 3, 2019

The index is not visible (though we can make it visible when doing pipetask --show=pipeline), but tasks are ordered in a pipeline as Pipeline is just a list. For execution we'll sort things topologically of course, but my idea was that people may prefer some tasks to be prioritized compared to others (within topological constraints) and one way to do it is re-ordering pipeline. If re-ordering is not useful I can certainly drop it, and there is always an option for an activator to implement its own prioritization later.

python/lsst/pipe/base/taskFactory.py Outdated

+              # You should have received a copy of the GNU General Public License
+              # along with this program.  If not, see <http://www.gnu.org/licenses/>.
+              """Module definig TaskFactory interface.

Member

TallJimbo Jan 2, 2019

Typo. Also not sure this docstring adds any value.

python/lsst/pipe/base/taskFactory.py

+                      ----------
+                      taskName : `str`
+                          Name of the PipelineTask class, interpretation depends entirely on
+                          activator, e.g. it may or may not include dots.

Member

TallJimbo Jan 2, 2019

Would it be appropriate to say that the interpretation now is now entirely up to the TaskFactory and the contexts in which it is used? Or are there other places where assumptions are made about the form of these names?

Contributor Author

andy-slac Jan 3, 2019 •

edited

At this point there are no other assumption, task names are just passed from Pipeline to task factory. TaskFactory implemented in ctrl_mpexec expects either something that can be resolved w.r.t. (~~PYTHONPATH~~) --package option (e.g. module.TaskClass) or just a task class name (super-inefficient to search). Other factories may use different conventions, e.g. registry can be built which uses abstract task names instead of class names, though that would probably make pipeline definitions non-portable.

Contributor Author

andy-slac Jan 3, 2019

And I may be confusing myself here, I think that logic may be a bit different - TaskFactory is used to resolve any user-provided name to a full class name and that is passed then to a Pipeline. Though even that is an implementation detail now and is not documented anywhere.

Member

TallJimbo Jan 4, 2019

I don't know that we need to enforce this, but I can't actually think of a use case for having different TaskFactory implementations that might expect different kinds of names, and I think it would be desirable for all activators to handle task names consistently.

I gather the reason you didn't want to move the TaskFactory implementation here was because you didn't quite consider it mature (and DM-17044 backs that up). Perhaps after it does mature we should just move the implementation here and remove the ABC?

Contributor Author

andy-slac Jan 4, 2019

Well, I think I actually prefer TaskFactory to be an ABC here, at least it tells clients what minimal interface is needed for the rest of the thing to work. Implementation in ctrl_mpexec does a little bit more that is needed from this interface - it also supports functionality that is needed by pipetask script like making a list of available tasks.

python/lsst/pipe/base/graphBuilder.py

@@ @@ -85,7 +85,7 @@ def __init__(self, taskName, refs): @@
               # ------------------------
-              class GraphBuilder:
+              class GraphBuilder(object):

Member

TallJimbo Jan 2, 2019

No need to inherit from object in Python 3.

tests/test_exprParserLex.py Outdated

+                      pass
+                  def tearDown(self):
+                      pass

Member

TallJimbo Jan 2, 2019 •

edited

Better to just leave these methods out if they do nothing, I think.

Member

TallJimbo commented Jan 2, 2019

Also: I looked at each of the commits since the transfer and commented on them individually (and didn't look carefully at the mass-rename ones). So there may be comments on things you'd already fixed in later commits, and if you want me to look at any of the code you transferred as whole rather than the changes, please let me know.

andy-slac force-pushed the tickets/DM-16867 branch from b58cae0 to 4d05848 Compare

January 3, 2019 23:40

Contributor Author

andy-slac commented Jan 4, 2019

Regarding expr_parser stuff - it is used though I'm not very happy with its current "interface" (or rather lack of it). pipetask accepts user expression string which it parses using expr_parser and then makes a string again out of that parsed expression and passes it to a pre-flight. I think better approach would be to have some abstract interface for user expression defined in daf_butler, trouble is that I have no idea how that interface can look like. One possible option is to define an interface as a parsed expression tree (AST) as it is defined now in expr_paser/exprTree.py but that implies that we "freeze" the grammar for the expression. I need to think a little bit about that, maybe having expr_paser/exprTree.py in daf_butler is a good first step and the rest can be kept in pipe_base as a reusable parser for one sort of syntax.

Member

TallJimbo commented Jan 4, 2019

I think it'd make slightly more sense to just put it all in daf_butler for now - while I agree that we don't want to freeze the grammar now, I think any kind of engineering to make extending the parser easier (like trying to divide interface and implementation across packages) is premature. I am also skeptical that there is any parsing functionality we'd want in pipe_base but not in daf_butler.

Contributor Author

andy-slac commented Jan 4, 2019

OK, I'm going to move expr_parser to daf_butler then and undo some commits here, hope I will not make too much mess with that.

andy-slac added 24 commits

January 3, 2019 23:14


          Initial version of the Activator/Quantum interfaces.

f5329f0

First sketch, things are very likely to change as we start implementing
and understand more details.


          Add trivial implementation of Pipeline class.

965453e

For now Pipeline is just a list, it inherits from list and does not add
anything extra. We'll probably add few convenience method as we start
implementing client of this class.


          Small scale cleanup of the supertask modules.

ade523d

Deleted obsolete stuff. renamed couple of modules for LSST convention.
Ran pylint on everyhting, fixed few easy-to-fix things.


          Start refactoring of command line activator.

0c8a93c

Moved few utility methods into util.py. Move SuperTaskFactory into
taskFactory module and rename calss into TaskFactory. Add TaskInfo class
to be used as item type in Pipeline.


          First round of refactoring of command line "activator" (DM-10733)

d034ee4

Instead of pre-flight activator we now define a common "graph builder"
class which takes Pipeline as input and produces "execution graph" as
output. The exact structure of that graph is not clear yet, for now it
is just a list of tuples which is sufficient for command line
nvironment.

Pipeline task was revised slightly to provide more efficient
implementation of task definition when reasonable.


          Update SuperTask unit tests

2d5f29a

Updated pipeline tests for new TaskDef class. Replaced SuperTask unit
test with nnew version which has tests for defineQuanta/runQuantum
methods.


          Add unit test for graphBuilder module

4b12863


          Implement a bunch of missing features (DM-10839)

ebc540b

Add support for specifying more than one task on command line and also
option for reading pipeline definition from pickle file (and also write
it to pickle file). Add a class representing execution graph, or at
least one representation of graph that is used by command-line tool.
Implemented a bunch of --show options.


          Fix unit tests

4354afc


          Rename pipelineTools.py to pipeTools.py

b51ca37


          Improve pipeline building CLI.

efa0627

Added new options for building/modifying pipelines. Also added quantum
graph options, but they are not used yet by cmdLineFwk (will work on
qgraph next). Code that builds pipelines based on command line arguments
was factored out into a PipelineBuilder class.


          Fix Copyright lines in all Python files

4bf4d5a


          Update supertask interface and many related classes.

f115136

Still WIP, many updates to parser, SuperTask, SuperTaskConfig, unit test
and examples.


          Fix all flake8 complaints, add travis.yml

6dcef1d


          Add method pipeline2gv for GraphViz output.

4b204f9


          Add support for making pipeline DOT files

329f826


          Implement DOT rendering of QuantumGraph

020c8b0


          Move DOT code to separate module

8f3f839


          Rename parser module into cmdLineParser to reduce confusion

4bf1d4c


          Add user expr parsing to graphBuilder

cb360ee


          Initial incomplete implementation of pre-flight solver.

e2c7793

Part of a pre-flight algorithm is in SqlRegistry class in daf_butler.
Core in this package takes data generated by registry, conbines that
into Qunata and builds complete QuantumGraph.

I have added couple of example tasks which are less trivial than
examples that we had before and they also work with actual unit types
that exist in registry today. Can be used to build QGraph based on
ci_hsc registry for example.


          Move SuperTask class to pipe_base package (DM-15220)

fbef8fc

Class is actually renamed to PipelineTask. Updated all dependencies here
to use PipelineTask from pipe_base.


          Change PipelineTask.run() method arguments (DM-14823)

cf40c7d

The interface of PipelineTask class has changed, updated all
dependencies here.


          Change in pre-flight which adds special per-row data class

daaabe2

andy-slac and others added 19 commits

January 3, 2019 23:14


          Preflight API changes to support output collection

2a6ec72


          Support multiple collection names in CLI

fa62021


          Add an option to skip quanta with existing outputs

2edc4c8


          Rename PreFlightCollections into DatasetOriginInfo

2ed0e7f


          Fix registry data member access after shema update

d5ccb3c


          Always verify pipeline for cycle absence (DM-15801)

d95131d

Loops are only detected when we order pipeline and previously ordering
was only done when --order-pipeline option was set. Now we do checks
unconditionally. Added one more test for trivial loop.


          Re-implement task run stage with gen3 middleware (DM-15686)

1191f61

To run things we have to have all input DataRefs in the same collection
as output, added association of the all quantumGraph inputs with output
collection. Current assumption is that output collection is empty, will
have to extend implementation to support non-empty collections.

Updated example tasks to return data that can be saved to butler, with
that `stac` is now able to run complete chain on `ci_hsc` repo and write
something into that repo.


          Add dataset components to output collection

08480d1


          Rename camera/sensor units (DM-16312)

fbcdd9a

Data units were renamed in daf_butler schema, Camera becomes Instrument
and Sensor becomes Detector. These names are used in few places in this
package too, follow that renaming in all usage of the unit names.


          Replace lsst.log with logging.

f48e5ad

Consistently use `logging` in Python code, CmdLineFwk configures logging
output to be forwarded to lsst.log.


          Update code for new return type of get*DatasetTypes (DM-16275)

ff2a898

These methods now return DatasetTypeDescriptor instance instead of plain
DatasetType, need to change all uses of the returned maps.


          Get rid of StorageClass instantiation (DM-15887)

9e3c2c3

Change in DatasetType avoids the need to register StorageClass with
factory when StorageClass is not going to be used. Removed all
registration of example StorageClass in unit tests and task examples.


          Rename DataUnit -> Dimension.

c297fe0


          Record Init(Input/Output)s as graph properties

9f4af61


          Fix typo in method name lookup

9bf0b16


          Make PipelineTask activator handle config lists

1963a0a

Make the PipelineTask activator able to parse config parameter
overrides which are lists when the overrides are passed from the
command line.


          Add option to automatically register DatasetTypes.

a567fc1

In addition, all DatasetTypes are now always checked against those
in the Registry.


          Merge branch 'tickets/DM-16867-supertask-history-take2' into tickets/…

ac1bc6e

…DM-16867-take2


          Move supertask modules to base/ folder

6f2c1ce

andy-slac force-pushed the tickets/DM-16867 branch from 4d05848 to d02f863 Compare

January 4, 2019 06:09

andy-slac added 6 commits

January 4, 2019 11:42


          Replace supertask with base for moved modules

4b56e80


          Re-design PipelineBuilder class.

481c719

PipelineBuilder is now looking more or less like normal class with
methods rather than something executing external actions.

Few other small changes to unit tests, exclude parser stuff from flake8
testing.


          Add TaskFactory ABC

baaa15f


          Update Copyright; remove 2-to-3 migration stuff

7e7e42f


          Update docstrings for conventional format

e4ed9ae


          Update GraphBuilder for recent change in Dimension interface

3393fcb

andy-slac force-pushed the tickets/DM-16867 branch from 6e051c1 to 3393fcb Compare

January 4, 2019 17:46

andy-slac merged commit 76b0903 into master

andy-slac deleted the tickets/DM-16867 branch

July 10, 2022 02:46

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment