
DM-20205: Update to Final PipelineTasks API #98

Merged: 4 commits from tickets/DM-20205 into master on Aug 30, 2019

Conversation

natelust (Contributor):

No description provided.

andy-slac (Contributor) left a comment:

With all the travel I managed to look at just one file today; I will continue reviewing later, but I want to checkpoint what has been done so far.

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

"""Module defining a butler like object specified to a specific quantum.
Reviewer comment:

specified -> specialized?

in practice is that the only gets and puts that this class allows
are DatasetRefs that are contained in the quantum.

In the future this class will also be used to record providence on
Reviewer comment:

I wish we had providence as a part of butler, but I guess you mean provenance? 👼

butler : `lsst.daf.butler.Butler`
Butler object from/to which datasets will be get/put
quantum : `lsst.daf.butler.core.Quantum`
Quantum object that describes the data-sets which will
Reviewer comment:

data-sets -> datasets?


Parameters
----------
dataset : `InputQuantizedConnection` or `list` of `lsst.daf.butler.DatasetRef`
Reviewer comment:

I think our latest convention is to write it as

`list` [`lsst.daf.butler.DatasetRef`]

Raises
------
ValueError
If a `DatasetRef` is passed to get that is not defined in the quantum object
Reviewer comment:

Needs the module name for `DatasetRef`. I'd add a `~` for each of those too, to reduce verbosity in the generated docs.
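For illustration, the numpydoc style the reviewer suggests (bracketed element type, `~` prefix so Sphinx renders only the final name component) might look like this in a hypothetical docstring; the method name and summary are illustrative, not the actual source:

```python
def get(self, dataset):
    """Fetch one or more datasets from the butler.

    Parameters
    ----------
    dataset : `list` [`~lsst.daf.butler.DatasetRef`]
        References to the datasets to fetch. The ``~`` prefix makes
        Sphinx render just ``DatasetRef`` rather than the fully
        qualified name.

    Raises
    ------
    ValueError
        If a `~lsst.daf.butler.DatasetRef` is passed that is not
        defined in the quantum object.
    """
```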

all the inputs of a quantum, a list of `lsst.daf.butler.DatasetRef`, or
a single `lsst.daf.butler.DatasetRef`. The function will get and return
the corresponding datasets from the butler.

Reviewer comment:

A Returns section would be useful too.

elif isinstance(dataset, DatasetRef) or isinstance(dataset, DeferredDatasetRef):
return self._get(dataset)
else:
raise TypeError("Dataset argument is not a type that can be used to get")
Reviewer comment:

I'm not a big fan of the isinstance-based switch, it feels a bit fragile for my taste. But I'm not sure anything can be done to avoid it?

Author reply (natelust):

In this case I don't think so, if we are going to try and preserve this sort of overload. That is, until we get to Python 3.8. If (when?) we can take that as the baseline, there is a nice built-in single dispatch for methods. That will make the code cleaner and more extensible, but fundamentally we want this method overloaded based on the type of the input.
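For reference, the Python 3.8 feature alluded to here is `functools.singledispatchmethod`. A minimal sketch of replacing the isinstance switch with registered overloads; the `QuantumGetter` class and its payloads are hypothetical stand-ins, not the real butler API:

```python
from functools import singledispatchmethod


class QuantumGetter:
    """Hypothetical stand-in for the quantum-scoped butler wrapper."""

    @singledispatchmethod
    def get(self, dataset):
        # fallback mirrors the final else branch of the isinstance switch
        raise TypeError("Dataset argument is not a type that can be used to get")

    @get.register
    def _(self, dataset: list):
        # lists recurse into the single-item overload
        return [self.get(item) for item in dataset]

    @get.register
    def _(self, dataset: str):
        # single item; the real code would register DatasetRef here instead
        return f"got {dataset}"
```

Each `@get.register` overload is selected on the runtime type of the first argument after `self`, so supporting a new reference type becomes a new registration rather than another `elif`.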

if isinstance(dataset, OutputQuantizedConnection):
if not isinstance(values, Struct):
raise ValueError("dataset is a OutputQuantizedConnection, a Struct with corresponding"
"attributes must be passed as the values to put")
Reviewer comment:

Needs a space before "attributes".

for i, ref in enumerate(refs):
self._put(valuesAttribute[i], ref)
else:
self._put(valuesAttribute, refs)
Reviewer comment:

I think this could be slightly simplified if _put() supported lists.

Author reply (natelust):

I debated back and forth on this: whether it was better to have a simple function that did one thing, or to duplicate the handling of lists and single values. I also debated having an intermediate function to handle lists. Ultimately I decided on a simpler _put, but I don't really have strong feelings.
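The alternative the reviewer suggests (a `_put` that handles lists itself) might be sketched as follows; `Writer`, `stored`, and the method shape are illustrative stand-ins, not the actual implementation:

```python
class Writer:
    """Hypothetical sketch of a _put that accepts lists or single items."""

    def __init__(self):
        self.stored = []  # records (value, ref) pairs for illustration

    def _put(self, values, refs):
        # parallel lists of values and refs, or a single value/ref pair
        if isinstance(refs, list):
            for value, ref in zip(values, refs):
                self.stored.append((value, ref))
        else:
            self.stored.append((values, refs))
```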

if inout is RefDirection.OUT:
for r in ref:
if (r.datasetType, r.dataId) not in self.allOutputs:
raise ValueError("DatasetRef is not part of the Quantum being processed")
Reviewer comment:

There is repetition in this method; I think you can simplify it if you just pass self.allInputs or self.allOutputs as a parameter instead of the enum (and the enum will not even have to exist in that case).

Author reply (natelust):

Really good point; I changed this so many times I could not see the forest for the trees.
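The suggested refactor, passing the membership container itself rather than a direction enum, might look roughly like this; the `Ref` tuple and class/method names are hypothetical stand-ins for the real `DatasetRef` and quantum-butler types:

```python
from collections import namedtuple

# hypothetical stand-in for lsst.daf.butler.DatasetRef
Ref = namedtuple("Ref", ["datasetType", "dataId"])


class QuantumContext:
    def __init__(self, allInputs, allOutputs):
        self.allInputs = set(allInputs)
        self.allOutputs = set(allOutputs)

    def _checkMembership(self, refs, inout):
        # inout is self.allInputs or self.allOutputs; no enum needed,
        # and the IN/OUT branches collapse into one loop
        for r in refs:
            if (r.datasetType, r.dataId) not in inout:
                raise ValueError("DatasetRef is not part of the Quantum being processed")
```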

andy-slac (Contributor) left a comment:

I have looked at a few more files, but have not finished reviewing yet. I will try to finish tomorrow if nothing urgent happens.

"InitOutputDatasetConfig", "InitOutputDatasetField",
"ResourceConfig", "QuantumConfig", "PipelineTaskConfig"]
__all__ = ["ResourceConfig", "PipelineTaskConfig",
"PipelineTaskConnections"]
Reviewer comment:

PipelineTaskConnections lives in connections, do you want to export it from this module too?

Author reply (natelust):

No, it was just something left over from before I moved it to its own file. Good catch.

Possibly raises a TypeError if attempting to create a factory function
from an incompatible type
This metaclass ensures a `PipelineTaskConnections` class is specified in the class construction
parameters with a parameter name of pipelineConnections. Using The supplied connection class,
Reviewer comment:

The -> the

this metaclass constructs a `lsst.pex.config.Config` instance which can be used to configure the
connections class. This config is added to the config class under declaration with the name
"connections" used as an identifier. The connections config also has a reference to the connections
class used in its construction with associated with an atttribute named "connectionsClass". Finally
Reviewer comment:

Extraneous "with". I'd put connectionsClass in backticks here.

The newly constructed config class (not an instance of it) is assigned to the Config class under
Reviewer comment:

The -> the

construction with the attribute name "ConnectionsConfigClass".
Reviewer comment:

ConnectionsConfigClass in backticks here too.
We have a limit on docstring/comment line length of 79 characters; I think this docstring exceeds it.


Parameters
----------
quantum : `lsst.daf.butler.core.Quantum`
Reviewer comment:

I think `lsst.daf.butler.Quantum` should be used here.

"""Extract and classify the dataset types from a single `PipelineTask`.

Parameters
----------
taskClass: `type`
A concrete `PipelineTask` subclass.
config: `PipelineTaskConfig`
Reviewer comment:

``config`` was removed too; add a description for the ``connectionsInstance`` parameter.

connection fields describing input dataset types. Argument values will
be data objects retrieved from data butler. If a dataset type is
configured with ``multiple`` field set to ``True`` then the argument
value will be a list of objects, otherwise it will be a single objects.
Reviewer comment:

single objects - > single object

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

"""Simple unit test for ResourceConfig.
Reviewer comment:

ResourceConfig -> connections?

import lsst.pipe.base as pipeBase


class TestConncetionsClass(unittest.TestCase):
Reviewer comment:

Conncetions -> Connections

andy-slac (Contributor) left a comment:

It looks OK (if my understanding of how it works is correct), though a bit too much meta-magic for my taste. I think it probably needs a good user manual, or maybe more code examples in docstrings.

try:
dct['dimensions'] = set(kwargs['dimensions'])
except TypeError:
raise dimensionsValueError
Reviewer comment:

`raise ... from ...` may be better here for a less confusing message; `from None` will suppress the TypeError traceback if you prefer.
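The two chaining options mentioned here, sketched on a standalone helper (the function name and message are illustrative, not the actual source):

```python
def extractDimensions(kwargs):
    """Hypothetical helper mirroring the try/except under review."""
    try:
        return set(kwargs["dimensions"])
    except TypeError as err:
        # "from err" records the TypeError as the explicit __cause__,
        # giving a "The above exception was the direct cause of..." message;
        # "raise ... from None" would suppress the chained traceback instead.
        raise ValueError("dimensions must be an iterable of dimension names") from err
```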

lsst.pipe.base.connectionTypes and are listed as follows:

* InitInput - Defines connections in a quantum graph which are used as inputs to the __init__ function
of the PipelineTask corresponding to this class
Reviewer comment:

The lines should not be indented or they will become indented in sphinx output too. I think you want this:

* InitInput - Defines connections in a quantum graph which are used as inputs to the __init__ function
  of the PipelineTask corresponding to this class

(but also properly wrapped at 79 characters).

You can also try definition list syntax: http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#definition-lists
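For illustration, the same entry rewritten with the reST definition-list syntax linked above might look like (wrapping and wording are a sketch, not the merged docstring):

```rst
InitInput
    Defines connections in a quantum graph which are used as
    inputs to the ``__init__`` function of the `PipelineTask`
    corresponding to this class.
```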

match dimensions that exist in the butler registry which will be used in executing the corresponding
`PipelineTask`.

The second parameter is labeled defaultTemplates and is conditionally optional. The name attributes of
Reviewer comment:

I'd put defaultTemplates in double backticks.

The second parameter is labeled defaultTemplates and is conditionally optional. The name attributes of
connections can be specified as python format strings, with named format arguments. If any of the
name parameters on connections defined in a `PipelineTaskConnections` class contain a template, then
a default template value must be specified in the defaultTemplate argument. This is done by passing
Reviewer comment:

defaultTemplate -> defaultTemplates?

"""PipelineTaskConnections is a class used to declare desired IO when a PipelineTask is run by an
activator

PipelineTaskConnection classes are created by declaring class attributes of types defined in
Reviewer comment:

This long description should probably be moved into Notes section after Parameters.

Once a `PipelineTaskConnections` class is created, it is used in the creation of a
`PipelineTaskConfig`. This is further documented in the documentation of `PipelineTaskConfig`. For the
purposes of this documentation, the relevant information is that the config class allows configuration
of connection names by users when running a pipeline.
Reviewer comment:

One small example of defining connections class and task config class would be super-helpful here.
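To make the request concrete: since the real `lsst.pipe.base` classes are not available here, the following is a stripped-down, self-contained sketch of the declarative pattern only. `Connection`, `ConnectionsMeta`, and all field names are illustrative stand-ins, not the actual API:

```python
class Connection:
    """Illustrative stand-in for the real connection field types."""

    def __init__(self, name, doc=""):
        self.name = name  # dataset type name, configurable in the real API
        self.doc = doc


class ConnectionsMeta(type):
    """Toy metaclass that collects declared Connection attributes."""

    def __new__(mcls, name, bases, dct, dimensions=()):
        # drop the keyword class argument before calling type.__new__
        return super().__new__(mcls, name, bases, dct)

    def __init__(cls, name, bases, dct, dimensions=()):
        super().__init__(name, bases, dct)
        cls.dimensions = set(dimensions)
        cls.allConnections = {k: v for k, v in dct.items()
                              if isinstance(v, Connection)}


class PipelineTaskConnections(metaclass=ConnectionsMeta):
    pass


class ExampleConnections(PipelineTaskConnections,
                         dimensions=("instrument", "visit", "detector")):
    # each class attribute declares one piece of task IO
    inputCatalog = Connection(name="src", doc="Input source catalog")
    outputCatalog = Connection(name="calibrated", doc="Calibrated catalog")
```

The point of the pattern is that the metaclass can introspect `allConnections` at class-creation time, which is what lets the real framework build a matching config class automatically.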

timj changed the title from "Tickets/DM-20205 Update to Final PipelineTasks API" to "DM-20205: Update to Final PipelineTasks API" on Aug 19, 2019
natelust (Author):

> It looks OK (if my understanding of how it works is correct), though a bit too much meta-magic for my taste. I think it probably needs a good user manual or maybe more code examples in docstrings.

As a separate body of work (so as not to hold this up), I am working on documentation that will go in the package documentation, describing how this works and including a long tutorial on how this all gets used to make PipelineTasks.

natelust force-pushed the tickets/DM-20205 branch 3 times, most recently from 2056f2b to 532716c, on August 27, 2019 at 13:21
This commit overhauls the framework for writing and using PipelineTasks.
It introduces a new class for defining all the connections a
PipelineTask expects during execution. It is connected to a
PipelineTaskConfig at declaration, at which point relevant configuration
fields are added to the Config class. The connections class allows for better
introspection from the execution framework, and simplifies the
implementation of a PipelineTask.
doc=docString,
default=default)
# add a reference to the connection class used to create this sub config
configConnectionsNamespace['connectionsClass'] = connectionsClass
TallJimbo (Member), Aug 27, 2019:

While the right capitalization for variables that refer to types is never very clear, "connectionsClass" here is sufficiently similar to "Task.ConfigClass" in usage and context that I think it's a pretty serious problem to capitalize it differently.

TallJimbo merged commit 6c9d669 into master on Aug 30, 2019
TallJimbo deleted the tickets/DM-20205 branch on August 30, 2019 at 12:15