New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-20205: Update to Final PipelineTasks API #98
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With all travel I managed to look at just one file today, will continue reviewing later but want to checkpoint what has been done so far.
# You should have received a copy of the GNU General Public License | ||
# along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
||
"""Module defining a butler like object specified to a specific quantum. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specified -> specialized?
in practice is that the only gets and puts that this class allows | ||
are DatasetRefs that are contained in the quantum. | ||
|
||
In the future this class will also be used to record providence on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish we had providence as a part of butler, but I guess you mean provenance? 👼
butler : `lsst.daf.butler.Butler` | ||
Butler object from/to which datasets will be get/put | ||
quantum : `lsst.daf.butler.core.Quantum` | ||
Quantum object that describes the data-sets which will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data-sets -> datasets?
|
||
Parameters | ||
---------- | ||
dataset : `InputQuantizedConnection` or `list` of `lsst.daf.butler.DatasetRef` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our latest convention is to write it as
`list` [`lsst.daf.butler.DatasetRef`]
Raises | ||
------ | ||
ValueError | ||
If a `DatasetRef` is passed to get that is not defined in the quantum object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs module name for DatasetRef
. I'd add a ~
for each of those too to reduce verbosity in generated docs.
all the inputs of a quantum, a list of `lsst.daf.butler.DatasetRef`, or | ||
a single `lsst.daf.butler.DatasetRef`. The function will get and return | ||
the corresponding datasets from the butler. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns
section would be useful too.
elif isinstance(dataset, DatasetRef) or isinstance(dataset, DeferredDatasetRef): | ||
return self._get(dataset) | ||
else: | ||
raise TypeError("Dataset argument is not a type that can be used to get") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of the isinstance
-based switch, it feels a bit fragile for my taste. But I'm not sure anything can be done to avoid it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I dont think so if we are going to try and preserve this sort of overload. That is until we get to python 3.8. If (when?) we can take that as the base line, there is a nice built in single dispatch for methods. That will make the code cleaner and more extensible, but fundamentally we want this method overload based on the type of the input.
if isinstance(dataset, OutputQuantizedConnection): | ||
if not isinstance(values, Struct): | ||
raise ValueError("dataset is a OutputQuantizedConnection, a Struct with corresponding" | ||
"attributes must be passed as the values to put") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a space before "attributes".
for i, ref in enumerate(refs): | ||
self._put(valuesAttribute[i], ref) | ||
else: | ||
self._put(valuesAttribute, refs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be slightly simplified if _put()
supported lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I debated back and forth on this, if it was better to have a simple function that did one thing, or do duplicate the handling the lists and single values. I also debated having an intermediate function to handle lists. Ultimately I decided on a simpler _put, but I don't really have strong feelings.
if inout is RefDirection.OUT: | ||
for r in ref: | ||
if (r.datasetType, r.dataId) not in self.allOutputs: | ||
raise ValueError("DatasetRef is not part of the Quantum being processed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is repetition in this method, I think you can simplify it if you just pass self.allInputs
or self.allOutputs
as a parameter instead of enum (and enum will not even have to exist in that case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really good point, I changed this so many times I could not see the forest for the trees
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked at few more files, not finished reviewing yet. Will try to finish it tomorrow if nothing urgent happens.
python/lsst/pipe/base/config.py
Outdated
"InitOutputDatasetConfig", "InitOutputDatasetField", | ||
"ResourceConfig", "QuantumConfig", "PipelineTaskConfig"] | ||
__all__ = ["ResourceConfig", "PipelineTaskConfig", | ||
"PipelineTaskConnections"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PipelineTaskConnections
lives in connections
, do you want to export it from this module too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, was just something left over from before I moved it to its own file, good catch
python/lsst/pipe/base/config.py
Outdated
Possibly raises a TypeError if attempting to create a factory function | ||
from an incompatible type | ||
This metaclass ensures a `PipelineTaskConnections` class is specified in the class construction | ||
parameters with a parameter name of pipelineConnections. Using The supplied connection class, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -> the
python/lsst/pipe/base/config.py
Outdated
this metaclass constructs a `lsst.pex.config.Config` instance which can be used to configure the | ||
connections class. This config is added to the config class under declaration with the name | ||
"connections" used as an identifier. The connections config also has a reference to the connections | ||
class used in its construction with associated with an atttribute named "connectionsClass". Finally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extraneous "with". I'd put connectionsClass
in backticks here.
python/lsst/pipe/base/config.py
Outdated
connections class. This config is added to the config class under declaration with the name | ||
"connections" used as an identifier. The connections config also has a reference to the connections | ||
class used in its construction with associated with an atttribute named "connectionsClass". Finally | ||
The newly constructed config class (not an instance of it) is assigned to the Config class under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The -> the
python/lsst/pipe/base/config.py
Outdated
"connections" used as an identifier. The connections config also has a reference to the connections | ||
class used in its construction with associated with an atttribute named "connectionsClass". Finally | ||
The newly constructed config class (not an instance of it) is assigned to the Config class under | ||
construction with the attribute name "ConnectionsConfigClass". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConnectionsConfigClass
in backticks here too.
WE have a limit on docstring/comment line length at 79 characters, I think this docstring exceeds it.
python/lsst/pipe/base/connections.py
Outdated
|
||
Parameters | ||
---------- | ||
quantum : `lsst.daf.butler.core.Quantum` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think lsst.daf.butler.Quantum
should be used here.
python/lsst/pipe/base/pipeline.py
Outdated
"""Extract and classify the dataset types from a single `PipelineTask`. | ||
|
||
Parameters | ||
---------- | ||
taskClass: `type` | ||
A concrete `PipelineTask` subclass. | ||
config: `PipelineTaskConfig` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config
was removed too, add description for connectionsInstance
parameter
connection fields describing input dataset types. Argument values will | ||
be data objects retrieved from data butler. If a dataset type is | ||
configured with ``multiple`` field set to ``True`` then the argument | ||
value will be a list of objects, otherwise it will be a single objects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single objects - > single object
tests/test_connections.py
Outdated
# You should have received a copy of the GNU General Public License | ||
# along with this program. If not, see <http://www.gnu.org/licenses/>. | ||
|
||
"""Simple unit test for ResourceConfig. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ResourceConfig -> connections?
tests/test_connections.py
Outdated
import lsst.pipe.base as pipeBase | ||
|
||
|
||
class TestConncetionsClass(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conncetions -> Connections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks OK (if my understanding of how it work is correct), though a bit too much meta-magic for my taste. I think it probably needs good user manual or maybe more code examples in docstrings.
python/lsst/pipe/base/connections.py
Outdated
try: | ||
dct['dimensions'] = set(kwargs['dimensions']) | ||
except TypeError: | ||
raise dimensionsValueError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise ... from ...
may be better here for less confusing message. from None
will suppress TypeError traceback if you prefer.
python/lsst/pipe/base/connections.py
Outdated
lsst.pipe.base.connectionTypes and are listed as follows: | ||
|
||
* InitInput - Defines connections in a quantum graph which are used as inputs to the __init__ function | ||
of the PipelineTask corresponding to this class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lines should not be indented or they will become indented in sphinx output too. I think you want this:
* InitInput - Defines connections in a quantum graph which are used as inputs to the __init__ function
of the PipelineTask corresponding to this class
(but also properly wrapped at 79 characters).
You can also try try definition list syntax: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#definition-lists
python/lsst/pipe/base/connections.py
Outdated
match dimensions that exist in the butler registry which will be used in executing the corresponding | ||
`PipelineTask`. | ||
|
||
The second parameter is labeled defaultTemplates and is conditionally optional. The name attributes of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd put defaultTemplates in double backticks.
python/lsst/pipe/base/connections.py
Outdated
The second parameter is labeled defaultTemplates and is conditionally optional. The name attributes of | ||
connections can be specified as python format strings, with named format arguments. If any of the | ||
name parameters on connections defined in a `PipelineTaskConnections` class contain a template, then | ||
a default template value must be specified in the defaultTemplate argument. This is done by passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultTemplate -> defaultTemplates?
python/lsst/pipe/base/connections.py
Outdated
"""PipelineTaskConnections is a class used to declare desired IO when a PipelineTask is run by an | ||
activator | ||
|
||
PipelineTaskConnection classes are created by declaring class attributes of types defined in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This long description should probably be moved into Notes section after Parameters.
python/lsst/pipe/base/connections.py
Outdated
Once a `PipelineTaskConnections` class is created, it is used in the creation of a | ||
`PipelineTaskConfig`. This is further documented in the documentation of `PipelineTaskConfig`. For the | ||
purposes of this documentation, the relevant information is that the config class allows configuration | ||
of connection names by users when running a pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small example of defining connections class and task config class would be super-helpful here.
I am working on as a separate body of work (as to not hold this up) documentation that will go in the package documentation talking about how this works, and including a long tutorial on how this all gets used to make PipelineTasks. |
2056f2b
to
532716c
Compare
This commit overhauls the framework for writing and using PipelineTasks. It introduces a new class for defining all the connections a PipelineTask expects during execution. It is connected to a PipelineTaskConfig at declaration at which point relevant configuration are added to the Config class. The connections class allows for better introspection from the execution framework, and simplifies the implementation of a PipelineTask.
532716c
to
a0fa78a
Compare
python/lsst/pipe/base/config.py
Outdated
doc=docString, | ||
default=default) | ||
# add a reference to the connection class used to create this sub config | ||
configConnectionsNamespace['connectionsClass'] = connectionsClass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While the right capitalization for variables that refer to types is never very clear, "connectionsClass" here is sufficiently similar to "Task.ConfigClass" in usage and context that I think it's a pretty serious problem to capitalize it differently.
No description provided.