DM-26371 Introduce new QuantumGraph object #143

natelust · 2020-09-11T16:48:13Z

No description provided.

andy-slac

Looks OK, few minor comments, but I'd prefer hash methods be improved for TaskDef and QuantumNode classes.

andy-slac · 2020-09-14T15:39:39Z

python/lsst/pipe/base/graph/_implDetails.py

+    def add(self, element: _T):
+        super().add(element)


Do you need to override this method?

This class is gone now

andy-slac · 2020-09-14T15:53:12Z

python/lsst/pipe/base/graph/_implDetails.py

+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+from __future__ import annotations
+
+__all__ = ("_DatasetTracker", "_OutputSet", "_datasetDictToEdgeIterator")


Do you need this __all__ at all?

At some point there was issues in some code I wrote where the documentation was messed up because of imports being exposed at the module scope, and as explicit is better than implicit I like to include an __all__

andy-slac · 2020-09-14T15:54:19Z

python/lsst/pipe/base/graph/graph.py

+from collections import defaultdict
+
+from collections import abc


Could do both imports in one line.

andy-slac · 2020-09-14T17:24:00Z

python/lsst/pipe/base/graph/graph.py

+        """
+        cumHash = hash(self.nodeNumber)
+        if self.quantum.taskName is None:
+            cumHash ^= hash(self.quantum.taskClass.taskClass.__qualname__)


taskClass.taskClass?

andy-slac · 2020-09-14T17:35:03Z

python/lsst/pipe/base/graph/graph.py

+                        cumHash ^= _hashDsRef(value)
+                else:
+                    cumHash ^= _hashDsRef(item)
+        return cumHash


I am not sure that this complicated hash is consistent with __eq__ definition (or that it's possible to prove that). If nodeNumber is unique within a graph could you use just that for both __eq__ and __hash__? Or maybe object identity is even OK for that?

I'm also curious what problem __eq__ and __hash__ solve here. Is it just for test comparisons? Under what contexts are these being compared (i.e. within one graph, the task label and data ID would be sufficient for uniqueness).

If we do need some complicated hierarchical hash, I'd be inclined to implement more of it on the objects Quantum holds, but it seems even better to try to avoid needing it.

So this was a bit rushed (you may remember my comments at the middleware meeting) and there is a few other things to be considered here. One such use is if you had a node that came from one of two graphs, and you wanted to know which graph that node came from, it is important to know something about the whole QuantumNode. I kept going back and forth on this, but thinking about this response has convinced me that TaskDef should go back into QuantumNode, as thinks like label are important to deciding if nodes are equals. Now for this constraint you strictly dont need all the logic in the QuantumNode object, some could be in the __contains__ method on QuantumGraph or something, but I think that A) having it all centralized is good (lets you compare nodes w/o a graph) B) keep it fast by exploiting hash algorithm.

Other uses are things like comparing the equality of QuantumGraphs. This is useful since we support subsetting, and may want to see if one subset is equal to another. Related to this, a discussion with Michelle happened yesterday raised the possibility of keeping NodeNumber constant in a subset, which would allow a node to be better tracked back to the original graph. But there are other considerations with this, like should NodeNumber be part of a hash and equality, or should it just be Quantum and TaskDef? That would allow you to see if a node is in a graph w/o caring about where it is located within the graph (This is currently what I am leaning towards), in which case NodeNumber is just a convenient number to locate a Node w/n a QuantumGraph instance, but has no bearing on two graphs being considered equal (which I like).

As it is now Quantum contains a lot of mutable things that unimportant to the equality of a Node (at least I have conceived it currently) it seems better to have a hash on the node object on the attributes that are important. There may be cases where we do care about those attributes to hash/equals a Quantum. So unless we refactor quantum too, I would like to leave this hear.

Most of this is thinking 'out loud' so if it is rambling I apologize. A lot of this code has grown organically as new features the graph should have were discovered and so were not "designed" as a whole from the beginning, something I am trying to go back and reason through now.

andy-slac · 2020-09-14T19:39:53Z

python/lsst/pipe/base/graph/graph.py

+            label, config = q2Task[nodeNumber]
+            # This dict is simply to ensure that there is only one TaskDef
+            # created for each label
+            taskDef = taskDefs.setdefault(label, TaskDef(quantum.taskName, config, quantum.taskClass, label))


There is still an overhead of creating TaskDef (and discarding it) even if it's in the dict already.

There is, but there will only be small numbers of TaskDefs, even if there are hundreds its small for something like this. I feel the simplicity of the code outweighs the small amount of extra runtime.

andy-slac · 2020-09-14T19:41:43Z

python/lsst/pipe/base/graph/graph.py

+        # class is known, and decreases un-persisting time for configs by
+        # about a factor of 2
+        for quantum, task in self._quantumToTask.items():
+            q2Task[quantum] = (task.label, task.config)


I have slight preference for storing complete TaskDef, that way you could reduce dependency if TaskDef interface/constructor ever changes.

If saving the TaskDef instance here would still mean we only save the TaskDef once (due to pickles internal memoization), I'm all for it. If not (I don't recall the details of how pickle works, and haven't quite followed all of the data structures here), we should be saving only the label, as that's sufficient to uniquely identify the TaskDef within the graph, and that's sufficient to find the config. We definitely don't want to be saving the same config instance multiple times without memoization, as that'd be hugely wasteful.

I wrote a test and verified that Pickle was not duplicating storage space for TaskDefs, and so I simplified things

andy-slac · 2020-09-14T19:44:24Z

python/lsst/pipe/base/graph/graph.py

+        self._datasetRefDict = datasetRefDict
+        self._datasetDict = datasetDict
+        self._taskGraph = taskGraph
+        self._count = None


This looks like it repeats much of the logic in constructor, would it be simpler if we just save constructor parameters? Or do we loose in size/performance (if we care about that)?

Dropped in favor of doing things differently with changes to construction

andy-slac · 2020-09-14T20:05:22Z

python/lsst/pipe/base/pipeline.py

+    def __hash__(self):
+        with io.StringIO() as sio:
+            self.config.saveToStream(sio)
+            configHash = hash(sio.getvalue())


I'm not very comfortable with this, I'm not convinced that formatting pex_config will always produce the same output for two identical configs. Can you just ignore config and not include it in a hash?

No, as it is the only thing that can disambiguate a TaskDef created in the same pipeline with the same input repos, only differing by a config option. This is useful for comparing one QuantumGraph to another. I will look to see if there is a different way to compare the configs directly

The hash is not required to be unique, the requirement is that objects that compare as equal have the same hash, but does not say that objects that are different must have different hash (and hash in general is not unique). You can drop config from hash and still keep it in __eq__ (comparing configs is well defined). And in the same pipeline tasks have to have different labels so hash that includes labels should already be OK.

But the same pipeline might be run two ways, I don't see anything that says the TaskDefs should have the context of anything else other than themselves. I have verified creating a new config and turning it to a string over 2000 times leads to the same serialization. This may not be true from one version of the stack to another, but that is possible with any stack version changes.

Well, that sort of test is probably not 100% conclusive. I can probably agree that formatting the same config instance should produce identical output (though even that is not guaranteed by anything). But if you take to separate config instances, they could compare equal but may produce differently formatted strings. I'm thinking about a set-like Field class where comparison does not depend on items order but formatted output could order them differently.

Shouldn't we be letting Config define how it handles equality and __hash__?

I'm not sure Config wants to be hashable. It is a mutable structure unless you freeze it and in general making complicated dynamic data structures hashable is not worth trouble in my opinion. In this particular case we don't even need its hash, other attributes of TaskDef are sufficient for building reasonable hash.

Any you make a good point, I was so focused on the tree I was not seeing the forest (thinking too closely of equals and not hash interplay). Thanks for pointing out the data model again, equal things must hash the same, but the converse of hash must be everything in equals is not necessary (if you are fine with a bit higher collision rate). I'll update this now.

andy-slac · 2020-09-14T20:06:27Z

python/lsst/pipe/base/pipeline.py

+        with io.StringIO() as sio:
+            self.config.saveToStream(sio)
+            configHash = hash(sio.getvalue())
+        return hash(self.taskName) ^ configHash ^ hash(self.taskClass) ^ hash(self.label)


Hmm, usually people do hash(tuple(....)) on multiple values, I don't know if ^ is better or worse.

So it is close to the same thing, but without the overhead of creating temporary containers. This is what is recommended in official documentation for __hash__ in python (though its not exclusively restricted to this method)

It seems like the suggestion has changed sometime between python 3.4 (the last time I looked at it) and python 3.8 docs. I will update to use the new suggestion of making a tuple

TallJimbo · 2020-09-14T21:00:42Z

python/lsst/pipe/base/connections.py

@@ -497,7 +498,8 @@ def adjustQuantum(self, datasetRefMap: NamedKeyDict[DatasetType, typing.Set[Data
        return datasetRefMap


-def iterConnections(connections: PipelineTaskConnections, connectionType: str) -> typing.Generator:
+def iterConnections(connections: PipelineTaskConnections,
+                    connectionType: Union[str, Iterable[str]]) -> typing.Generator:


Making the return type typing.Generator[BaseConnection, None, None] is a lot more useful than saying Generator with no generics. And I know we might differ on this, but if the user shouldn't care that this is implemented as a generator, consider Iterator instance

I didn't think to expand this when making the change, but I updated it now

TallJimbo · 2020-09-14T21:04:47Z

python/lsst/pipe/base/graph/_implDetails.py

+    def first(self) -> Optional[_T]:
+        if len(self) == 0:
+            return None
+        return next(iter(self))


I think this would be a lot better as free function; you don't want to have to copy some external set (or even convert some other set-like object into one of these) just to be able to use this method.

Dropped in a refactor

TallJimbo · 2020-09-14T21:10:53Z

python/lsst/pipe/base/graph/graph.py

+                        cumHash ^= _hashDsRef(value)
+                else:
+                    cumHash ^= _hashDsRef(item)
+        return cumHash


I'm also curious what problem __eq__ and __hash__ solve here. Is it just for test comparisons? Under what contexts are these being compared (i.e. within one graph, the task label and data ID would be sufficient for uniqueness).

If we do need some complicated hierarchical hash, I'd be inclined to implement more of it on the objects Quantum holds, but it seems even better to try to avoid needing it.

TallJimbo · 2020-09-14T21:11:27Z

python/lsst/pipe/base/graph/graph.py

+        and the graph that holds the relations between quanta
+        """
+        # Datastructures used to identify relations between components; TaskDef
+        # for taks, and QunatumNode for the quanta


TallJimbo · 2020-09-14T21:21:40Z

python/lsst/pipe/base/graph/graph.py

+
+    @property
+    def inputQuanta(self) -> List[QuantumNode]:
+        """Makes a `list` of all `QuantumNode`s that are 'input' nodes to the


Start-of-docstring verbs should be in the imperative voice, i.e. "Make" rather than "Makes" here (also true for several methods below, with "Returns" -> "Return").

TallJimbo · 2020-09-14T21:35:43Z

python/lsst/pipe/base/graph/graph.py

+
+        Parameters
+        ----------
+        datasetTypeName : `DSTypeName`


I'm a little uncomfortable including a NewType in docstrings. It'll be unfamiliar to most developers, and while they can ignore type annotations type annotations if they find them confusing, making the fact that this is just a str at runtime so indirected from usage point seems bad for them.

I changed it to str, since that is basically what it is, but added notes in the documentation about ability to use DatasetTypeName for type safety in type checking

TallJimbo · 2020-09-14T21:36:39Z

python/lsst/pipe/base/graph/_implDetails.py

+_S = TypeVar("_S")
+
+# NewTypes
+DSTypeName = NewType("DSTypeName", str)


I hate to be That Guy, but our conventions say this really should be spelled out as DatasetTypeName, or at least DsTypeName (and while I find the latter detestable, the former really is not too much longer and much more readable, at least to me).

TallJimbo · 2020-09-14T21:38:02Z

python/lsst/pipe/base/graph/graph.py

+
+        Returns
+        -------
+        `TaskDef` or None


None should always be in backticks in documentation.

TallJimbo · 2020-09-14T21:39:07Z

python/lsst/pipe/base/graph/graph.py

+    def findTaskDefByName(self, taskName: str) -> List[TaskDef]:
+        """Determines which `TaskDef`s in this graph are associated with a `str`
+        representing a tasks name (looks at the taskName property of
+        `TaskDefs`).


TaskDefs is plural and won't render properly. FWIW, I'm not sure

`TaskDef`s

(which I've seen a few times as well) will either.

I think it's

`TaskDef`\ s

TallJimbo · 2020-09-14T21:50:01Z

python/lsst/pipe/base/graph/graph.py

+        # class is known, and decreases un-persisting time for configs by
+        # about a factor of 2
+        for quantum, task in self._quantumToTask.items():
+            q2Task[quantum] = (task.label, task.config)


If saving the TaskDef instance here would still mean we only save the TaskDef once (due to pickles internal memoization), I'm all for it. If not (I don't recall the details of how pickle works, and haven't quite followed all of the data structures here), we should be saving only the label, as that's sufficient to uniquely identify the TaskDef within the graph, and that's sufficient to find the config. We definitely don't want to be saving the same config instance multiple times without memoization, as that'd be hugely wasteful.

natelust · 2020-09-15T18:01:49Z

A lot of comments on the implementation details (which are great) but have people looked at the interface? How does it feel to people, are there methods you think that are missing that could be useful?

MichelleGower

As for whether the class has all needed functionality, that is a difficult question to answer (especially since the networkx library used internally has a large set of graph operations). I pointed out one function I specifically noticed wasn't available. The good news is that I was able to successfully prototype changes to BPS to use this. And the BPS code <-> QuantumGraph interaction happens more at the API level than before (e.g., save and load, and some because BPS was just never modified to use new QuantumGraph API).

MichelleGower · 2020-09-15T21:45:18Z

python/lsst/pipe/base/graph/_implDetails.py

+
+
+class _OutputSet(set, Generic[_T]):
+    """Simple set wrapper that provides a "first" method


Is the set ordered? Will first always return the same item? (Asking if the name "first" implies more than what it is doing)

This class is now gone

MichelleGower · 2020-09-15T21:47:33Z

python/lsst/pipe/base/graph/_implDetails.py

+from dataclasses import dataclass, field
+from typing import (Generic, Mapping, Optional, Set, TypeVar, Generator, Tuple, NewType, Union)
+
+# Generic type parameters


Confusing why one needs 3 generic type definitions. If for specific uses, is it typical to not name them in a way to hint at what they should be used for?

In the new implementation it is down to two TypeVars. From what it seems, it does currently seem to be the convention that generic parameters be short one or two letter variables. The new implementation has restricted TypeVars which does give more guidance. If this was not an implementation detail module that is not exported anywhere, I would generally agree that this information would probably belong in the class level docstring.

MichelleGower · 2020-09-15T21:59:11Z

python/lsst/pipe/base/graph/graph.py

+        return cumHash
+
+
+class QuantumGraph:


It feels like this class suffers from the implementation being two separate graphs (TaskDefs vs Quanta). Sometimes the separation is hidden and other times one has to understand it (e.g., when iterating over Quanta, we get the QuantumNode, but not the TaskDef. So outside code has to turn around and ask the QuantaGraph for the TaskDef.) I think there are implementation possibilities in either direction (separate classes for each graph or completely hide the fact that internally stored separately). I guess the question is whether code would use one without the other on whether to keep as single object or separate.

I have added TaskDef back to QuantumNode, it is something I kept going back and forth with. I think that addresses most of your concerns. Related though is the question if the taskGraph property and iterTaskGraph method should exist, and if no one objects I would like to keep it as a convenience, so if someone wants to interact with the QuantumGraph for only specific tasks (such as subset), they do not need to iterate over the whole graph to build the associations themselves.

MichelleGower · 2020-09-15T22:00:50Z

python/lsst/pipe/base/graph/graph.py

+        self._taskGraph.add_edges_from(_datasetDictToEdgeIterator(self._datasetDict))
+        # Attempting to create a graph that contains a single task will create an
+        # edge from None to the task, but None is not a node we actually want
+        # so remove it here. This is a consequence of using add_edges from and


Typo: add_edges from (missing underscore...was looking for the "from where")

Also is the comment true or a result of how _datasetDictToEdgeIterator was implemented? Is there some modification that can be made to _datasetDictToEdgeIterator to fix this since every call to _datasetDictToEdgeIterator seems to be followed by the check for None and remove_node(None)?

👍 As said in another comment, I think factoring all this away would probably make things cleaner, and will do that.

MichelleGower · 2020-09-15T22:06:53Z

python/lsst/pipe/base/graph/graph.py

+
+    @property
+    def taskGraph(self) -> nx.DiGraph:
+        """Returns a graph representing the relations between the tasks inside


Why does the graph() function rate a "you shouldn't really be using this method, but just in case" message, but this one doesn't?

Because QuantumGraph is a extension/abstraction around the graph property (if you iterate over both containers you get the relations between QantumNodes so in some cases they could be interchangeable. The taskGraph property is just a convenience that exposes some information, and could never be used as or confused with QuantumGraph

MichelleGower · 2020-09-15T22:08:54Z

python/lsst/pipe/base/graph/graph.py

+        nodes.add(node)
+        return self.subset(nodes)
+
+    def ansestorsOfQuantumNode(self: _T, node: QuantumNode) -> _T:


Typo: ansestors -> ancestors

MichelleGower · 2020-09-15T22:13:40Z

python/lsst/pipe/base/graph/graph.py

+        """
+        write_dot(self._connectedQuanta, output)
+
+    def subset(self: _T, nodes: Union[QuantumNode, Iterable[QuantumNode]]) -> _T:


As mentioned in another comment, while starting to play with the code I was looking for a subset function that took nodeNumbers. Also mentioned in that comment, it would make debugging more human-friendly if there was a way to find a node in a subset graph in the full graph using something similar to the nodeNumber.

MichelleGower · 2020-09-15T22:16:50Z

python/lsst/pipe/base/graph/graph.py

+        """Builds the graph that is used to store the relation between tasks,
+        and the graph that holds the relations between quanta
+        """
+        # Datastructures used to identify relations between components; TaskDef


Datastructure is typically written as two words.

MichelleGower · 2020-09-15T22:18:36Z

python/lsst/pipe/base/graph/_implDetails.py

+    of all inputs and outputs
+    """
+    inputs: Set[_U] = field(default_factory=set)
+    output: _OutputSet[_U] = field(default_factory=_OutputSet)


Why is inputs plural and output singular?

because where this is used (tracking dsTypes / quanta) there can be multiple inputs, but only one thing should output it, so it is named such, it is a full container to catch cases where there is an error rather than clobbering. I could probably roll all these things together into a unified class though, which might make a few things cleaner looking.

MichelleGower · 2020-09-15T22:26:13Z

python/lsst/pipe/base/graph/graph.py

+        """
+        return set(self._quanta[taskDef])
+
+    def tasksWithInput(self, datasetTypeName: DSTypeName) -> List[TaskDef]:


Naming of class attributes doesn't seem to be consistent. Sometimes methods are nouns and sometimes verbs.

andy-slac · 2020-09-16T19:26:48Z

One comment regarding interfaces which also touches on sub-graphs and provenance.

One thing that I think sticks out in a new interface is the QuantumNode.nodeNumber thing. I think it is supposed to work OK for single graph but using it across multiple graphs is clearly non-trivial. If we need to identify nodes that may be shared across multiple graphs we need better abstraction to replace nodeNumber. Using numbers for identification is probably not going to work, we would need something like globally unique IDs (uuid) for the node itself, and maybe for the "origin" graph where the node was originally created. And then of course you need to track them down through all transformations/serializations/etc.

natelust · 2020-09-16T20:14:39Z

I intended for nodeNumber to be only meaningful w/n one creation of a graph (and live through pickling), Something that you say give something a single pickle of the QuantumGraph and a list of node numbers to execute. This would save the need for creating many pickles of subgraphs (but would be a pipetask API change so I was just putting it in place for now). Then Michelle suggested it would be useful to preserve in sub setting for debugging reasons, which is reasonable. However, the changing nature of this property does have me re-evaluating it. That does not mean I am rushing to keep or get rid of it, but as you say it is a much more complex thing to think about.

andy-slac

I have done a quick re-review, small bunch of comments. I understand it's still not final, so I don't check "Approve" yet.

andy-slac · 2020-09-21T00:31:35Z

python/lsst/pipe/base/graph/graph.py

+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+from __future__ import annotations
+import io


This import should be moved below __all__

Thanks, if I use something i have not imported my editor is "helpful" and adds the import. I'll try to keep a closer eye on this.

andy-slac · 2020-09-21T00:34:22Z

python/lsst/pipe/base/graph/graph.py

+        inputNodes : list of `QuantumNode`
+            A list of nodes that are inputs to the graph


It returns generator, not list.

andy-slac · 2020-09-21T00:35:15Z

python/lsst/pipe/base/graph/graph.py

+
+        Returns
+        -------
+        list of `DatasetTypeName`


tuple, not list, needs a name for returned item.

I really cant wait until we upgrade our sphinx, it supposedly has the ability to get the types from the annotations

andy-slac · 2020-09-21T00:42:54Z

python/lsst/pipe/base/graph/graph.py

+
+        Raises
+        ------
+        `IndexError`


No backticks around exception names, please check all docstrings.

andy-slac · 2020-09-21T00:46:26Z

python/lsst/pipe/base/graph/graph.py

+
+        Returns
+        -------
+        tasks : list of `TaskDef`


It returns generator

andy-slac · 2020-09-21T00:46:56Z

python/lsst/pipe/base/graph/graph.py

+        Raises
+        ------
+        `KeyError`
+            Raises if the `DatasetTypeName` is not part of the `QuantumGraph`


Raised if

andy-slac · 2020-09-21T00:53:15Z

python/lsst/pipe/base/graph/graph.py

+        self._taskGraph = taskGraph
+        self._count = None
+
+    def __eq__(self, other: object) -> bool:


If comparison for different graphs is useful then it may also need to support subset/superset checking or other set-like operations?

andy-slac · 2020-09-21T00:59:27Z

python/lsst/pipe/base/pipeline.py

+    def __hash__(self):
+        with io.StringIO() as sio:
+            self.config.saveToStream(sio)
+            configHash = hash(sio.getvalue())


Well, that sort of test is probably not 100% conclusive. I can probably agree that formatting the same config instance should produce identical output (though even that is not guaranteed by anything). But if you take to separate config instances, they could compare equal but may produce differently formatted strings. I'm thinking about a set-like Field class where comparison does not depend on items order but formatted output could order them differently.

andy-slac

Looks good, few minor comments

andy-slac · 2020-09-25T03:50:05Z

tests/test_quantumGraph.py

@@ -175,6 +176,8 @@ def testGetQuantumNodeByNodeId(self):
        inputQuanta = tuple(self.qGraph.inputQuanta)
        node = self.qGraph.getQuantumNodeByNodeId(inputQuanta[0].nodeId)
        self.assertEqual(node, inputQuanta[0])
+        wrongNode = NodeId(15, BuildId("alternative build Id"))
+        self.assertRaises(IncompatibleGraphError, self.qGraph.getQuantumNodeByNodeId, wrongNode)


I find context manager version of this test easier to read:

with self.assertRaises(IncompatibleGraphError): self.qGraph.getQuantumNodeByNodeId(wrongNode)

I agree completely. You know exactly what is being tested in the context manager version and it's easy to make it trigger simply by commenting out the with and putting in an if True: (so you don't need to change the indent).

andy-slac · 2020-09-25T03:50:52Z

tests/test_quantumGraph.py

@@ -310,6 +313,10 @@ def testSaveLoad(self):
                loaded.taskName = loaded.taskName.split('.')[-1]
            self.assertEqual(self.qGraph, restore)

+    def testContains(self):
+        oldNode = next(iter(self.qGraph))


andy-slac · 2020-09-25T03:55:55Z

python/lsst/pipe/base/graph/quantumNode.py

@@ -84,42 +82,12 @@ class QuantumNode:
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, QuantumNode):
            return False


May not matter in this case but I think usual pattern for such cases is to return NotImplemented so that Python can do something smart about it.

but surely you don't want python to do something so clever that somehow it ends up returning True?

I don't but maybe someone else will 🙂 NotImplemented utility is in raising an exception for non-comparable types, maybe we do not want that, but OTOH I have no idea why it's OK to compare Quantum and non-Quantum.

Thanks. I just read up on this and I see that returning NotImplemented causes python to switch round the equality and try to see if other.___eq__(self) gives a different answer. Then it seems it tries an is check and eventually returns false. So it seems that returning NotImplemented is the "right" thing to do but also doesn't it guarantee to slow everything down when we already know the answer is false?

I think returning False is fine here; it's the equivalence-relationship analog of marking a class as "final" for inheritance relationships. So it's not something you want to do accidentally - which is why NotImplemented is the general recommendation - but a useful way to express "no future type shall ever compare as equal to instances of this type".

I actually think that making disparate types actually compare as equal to each other is almost always more trouble than its worth (and some Python devs recognized this; note that a list never compares as equal to a tuple), so I'd personally be willing to go farther and say that we should always consider False to be a better default than NotImplemented, and the Python "general recommendation" of NotImplemented is actually slightly bad advice.

andy-slac · 2020-09-25T03:56:24Z

python/lsst/pipe/base/graph/quantumNode.py

-                else:
-                    cumHash ^= _hashDsRef(item)
-        return cumHash
+        return hash((self.taskDef, self.quantum))


MichelleGower

Was able to successfully use this in ctrl_bps. Would use a subset by NodeId list if available, but only a couple of lines without it. Approve merging changes.

natelust force-pushed the tickets/DM-26371 branch from f4877e9 to d4551c0 Compare September 11, 2020 16:50

andy-slac approved these changes Sep 14, 2020

View reviewed changes

TallJimbo reviewed Sep 14, 2020

View reviewed changes

MichelleGower reviewed Sep 15, 2020

View reviewed changes

andy-slac reviewed Sep 21, 2020

View reviewed changes

andy-slac approved these changes Sep 25, 2020

View reviewed changes

MichelleGower approved these changes Sep 25, 2020

View reviewed changes

Add explicit hashing to TaskDef object

a6b9555

natelust force-pushed the tickets/DM-26371 branch 2 times, most recently from 01cbe1d to 02fcdcc Compare September 28, 2020 16:32

natelust added 2 commits September 28, 2020 11:46

Introduce new QuantumGraph object

bb61530

Fix erros in GraphBuilder

88376a3

natelust force-pushed the tickets/DM-26371 branch from 02fcdcc to 88376a3 Compare September 28, 2020 16:46

natelust merged commit a05bd1d into master Sep 28, 2020

timj deleted the tickets/DM-26371 branch April 13, 2022 22:20

		from collections import defaultdict

		from collections import abc



		class _OutputSet(set, Generic[_T]):
		"""Simple set wrapper that provides a "first" method

		inputNodes : list of `QuantumNode`
		A list of nodes that are inputs to the graph

DM-26371 Introduce new QuantumGraph object #143

DM-26371 Introduce new QuantumGraph object #143

Conversation

natelust commented Sep 11, 2020

andy-slac left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

natelust Sep 16, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

natelust commented Sep 15, 2020

MichelleGower left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andy-slac commented Sep 16, 2020

natelust commented Sep 16, 2020

andy-slac left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

natelust Sep 16, 2020 •

edited