Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-26371 Introduce new QuantumGraph object #143

Merged
merged 3 commits into from Sep 28, 2020
Merged

Conversation

natelust
Copy link
Contributor

No description provided.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks OK, few minor comments, but I'd prefer hash methods be improved for TaskDef and QuantumNode classes.

Comment on lines 44 to 45
def add(self, element: _T):
super().add(element)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to override this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is gone now

# along with this program. If not, see <http://www.gnu.org/licenses/>.
from __future__ import annotations

__all__ = ("_DatasetTracker", "_OutputSet", "_datasetDictToEdgeIterator")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this __all__ at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point there was issues in some code I wrote where the documentation was messed up because of imports being exposed at the module scope, and as explicit is better than implicit I like to include an __all__

Comment on lines 26 to 28
from collections import defaultdict

from collections import abc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do both imports in one line.

"""
cumHash = hash(self.nodeNumber)
if self.quantum.taskName is None:
cumHash ^= hash(self.quantum.taskClass.taskClass.__qualname__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

taskClass.taskClass?

cumHash ^= _hashDsRef(value)
else:
cumHash ^= _hashDsRef(item)
return cumHash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that this complicated hash is consistent with __eq__ definition (or that it's possible to prove that). If nodeNumber is unique within a graph could you use just that for both __eq__ and __hash__? Or maybe object identity is even OK for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what problem __eq__ and __hash__ solve here. Is it just for test comparisons? Under what contexts are these being compared (i.e. within one graph, the task label and data ID would be sufficient for uniqueness).

If we do need some complicated hierarchical hash, I'd be inclined to implement more of it on the objects Quantum holds, but it seems even better to try to avoid needing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this was a bit rushed (you may remember my comments at the middleware meeting) and there is a few other things to be considered here. One such use is if you had a node that came from one of two graphs, and you wanted to know which graph that node came from, it is important to know something about the whole QuantumNode. I kept going back and forth on this, but thinking about this response has convinced me that TaskDef should go back into QuantumNode, as thinks like label are important to deciding if nodes are equals. Now for this constraint you strictly dont need all the logic in the QuantumNode object, some could be in the __contains__ method on QuantumGraph or something, but I think that A) having it all centralized is good (lets you compare nodes w/o a graph) B) keep it fast by exploiting hash algorithm.

Other uses are things like comparing the equality of QuantumGraphs. This is useful since we support subsetting, and may want to see if one subset is equal to another. Related to this, a discussion with Michelle happened yesterday raised the possibility of keeping NodeNumber constant in a subset, which would allow a node to be better tracked back to the original graph. But there are other considerations with this, like should NodeNumber be part of a hash and equality, or should it just be Quantum and TaskDef? That would allow you to see if a node is in a graph w/o caring about where it is located within the graph (This is currently what I am leaning towards), in which case NodeNumber is just a convenient number to locate a Node w/n a QuantumGraph instance, but has no bearing on two graphs being considered equal (which I like).

As it is now Quantum contains a lot of mutable things that unimportant to the equality of a Node (at least I have conceived it currently) it seems better to have a hash on the node object on the attributes that are important. There may be cases where we do care about those attributes to hash/equals a Quantum. So unless we refactor quantum too, I would like to leave this hear.

Most of this is thinking 'out loud' so if it is rambling I apologize. A lot of this code has grown organically as new features the graph should have were discovered and so were not "designed" as a whole from the beginning, something I am trying to go back and reason through now.

label, config = q2Task[nodeNumber]
# This dict is simply to ensure that there is only one TaskDef
# created for each label
taskDef = taskDefs.setdefault(label, TaskDef(quantum.taskName, config, quantum.taskClass, label))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still an overhead of creating TaskDef (and discarding it) even if it's in the dict already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is, but there will only be small numbers of TaskDefs, even if there are hundreds its small for something like this. I feel the simplicity of the code outweighs the small amount of extra runtime.

# class is known, and decreases un-persisting time for configs by
# about a factor of 2
for quantum, task in self._quantumToTask.items():
q2Task[quantum] = (task.label, task.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have slight preference for storing complete TaskDef, that way you could reduce dependency if TaskDef interface/constructor ever changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If saving the TaskDef instance here would still mean we only save the TaskDef once (due to pickles internal memoization), I'm all for it. If not (I don't recall the details of how pickle works, and haven't quite followed all of the data structures here), we should be saving only the label, as that's sufficient to uniquely identify the TaskDef within the graph, and that's sufficient to find the config. We definitely don't want to be saving the same config instance multiple times without memoization, as that'd be hugely wasteful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a test and verified that Pickle was not duplicating storage space for TaskDefs, and so I simplified things

self._datasetRefDict = datasetRefDict
self._datasetDict = datasetDict
self._taskGraph = taskGraph
self._count = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it repeats much of the logic in constructor, would it be simpler if we just save constructor parameters? Or do we loose in size/performance (if we care about that)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped in favor of doing things differently with changes to construction

def __hash__(self):
with io.StringIO() as sio:
self.config.saveToStream(sio)
configHash = hash(sio.getvalue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not very comfortable with this, I'm not convinced that formatting pex_config will always produce the same output for two identical configs. Can you just ignore config and not include it in a hash?

Copy link
Contributor Author

@natelust natelust Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as it is the only thing that can disambiguate a TaskDef created in the same pipeline with the same input repos, only differing by a config option. This is useful for comparing one QuantumGraph to another. I will look to see if there is a different way to compare the configs directly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash is not required to be unique, the requirement is that objects that compare as equal have the same hash, but does not say that objects that are different must have different hash (and hash in general is not unique). You can drop config from hash and still keep it in __eq__ (comparing configs is well defined). And in the same pipeline tasks have to have different labels so hash that includes labels should already be OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the same pipeline might be run two ways, I don't see anything that says the TaskDefs should have the context of anything else other than themselves. I have verified creating a new config and turning it to a string over 2000 times leads to the same serialization. This may not be true from one version of the stack to another, but that is possible with any stack version changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that sort of test is probably not 100% conclusive. I can probably agree that formatting the same config instance should produce identical output (though even that is not guaranteed by anything). But if you take to separate config instances, they could compare equal but may produce differently formatted strings. I'm thinking about a set-like Field class where comparison does not depend on items order but formatted output could order them differently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be letting Config define how it handles equality and __hash__?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure Config wants to be hashable. It is a mutable structure unless you freeze it and in general making complicated dynamic data structures hashable is not worth trouble in my opinion. In this particular case we don't even need its hash, other attributes of TaskDef are sufficient for building reasonable hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any you make a good point, I was so focused on the tree I was not seeing the forest (thinking too closely of equals and not hash interplay). Thanks for pointing out the data model again, equal things must hash the same, but the converse of hash must be everything in equals is not necessary (if you are fine with a bit higher collision rate). I'll update this now.

with io.StringIO() as sio:
self.config.saveToStream(sio)
configHash = hash(sio.getvalue())
return hash(self.taskName) ^ configHash ^ hash(self.taskClass) ^ hash(self.label)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, usually people do hash(tuple(....)) on multiple values, I don't know if ^ is better or worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it is close to the same thing, but without the overhead of creating temporary containers. This is what is recommended in official documentation for __hash__ in python (though its not exclusively restricted to this method)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the suggestion has changed sometime between python 3.4 (the last time I looked at it) and python 3.8 docs. I will update to use the new suggestion of making a tuple

@@ -497,7 +498,8 @@ def adjustQuantum(self, datasetRefMap: NamedKeyDict[DatasetType, typing.Set[Data
return datasetRefMap


def iterConnections(connections: PipelineTaskConnections, connectionType: str) -> typing.Generator:
def iterConnections(connections: PipelineTaskConnections,
connectionType: Union[str, Iterable[str]]) -> typing.Generator:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the return type typing.Generator[BaseConnection, None, None] is a lot more useful than saying Generator with no generics. And I know we might differ on this, but if the user shouldn't care that this is implemented as a generator, consider Iterator instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't think to expand this when making the change, but I updated it now

def first(self) -> Optional[_T]:
if len(self) == 0:
return None
return next(iter(self))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be a lot better as free function; you don't want to have to copy some external set (or even convert some other set-like object into one of these) just to be able to use this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped in a refactor

cumHash ^= _hashDsRef(value)
else:
cumHash ^= _hashDsRef(item)
return cumHash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also curious what problem __eq__ and __hash__ solve here. Is it just for test comparisons? Under what contexts are these being compared (i.e. within one graph, the task label and data ID would be sufficient for uniqueness).

If we do need some complicated hierarchical hash, I'd be inclined to implement more of it on the objects Quantum holds, but it seems even better to try to avoid needing it.

and the graph that holds the relations between quanta
"""
# Datastructures used to identify relations between components; TaskDef
# for taks, and QunatumNode for the quanta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: taks


@property
def inputQuanta(self) -> List[QuantumNode]:
"""Makes a `list` of all `QuantumNode`s that are 'input' nodes to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start-of-docstring verbs should be in the imperative voice, i.e. "Make" rather than "Makes" here (also true for several methods below, with "Returns" -> "Return").


Parameters
----------
datasetTypeName : `DSTypeName`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little uncomfortable including a NewType in docstrings. It'll be unfamiliar to most developers, and while they can ignore type annotations type annotations if they find them confusing, making the fact that this is just a str at runtime so indirected from usage point seems bad for them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to str, since that is basically what it is, but added notes in the documentation about ability to use DatasetTypeName for type safety in type checking

_S = TypeVar("_S")

# NewTypes
DSTypeName = NewType("DSTypeName", str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate to be That Guy, but our conventions say this really should be spelled out as DatasetTypeName, or at least DsTypeName (and while I find the latter detestable, the former really is not too much longer and much more readable, at least to me).


Returns
-------
`TaskDef` or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None should always be in backticks in documentation.

def findTaskDefByName(self, taskName: str) -> List[TaskDef]:
"""Determines which `TaskDef`s in this graph are associated with a `str`
representing a tasks name (looks at the taskName property of
`TaskDefs`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TaskDefs is plural and won't render properly. FWIW, I'm not sure

`TaskDef`s

(which I've seen a few times as well) will either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's

`TaskDef`\ s

# class is known, and decreases un-persisting time for configs by
# about a factor of 2
for quantum, task in self._quantumToTask.items():
q2Task[quantum] = (task.label, task.config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If saving the TaskDef instance here would still mean we only save the TaskDef once (due to pickles internal memoization), I'm all for it. If not (I don't recall the details of how pickle works, and haven't quite followed all of the data structures here), we should be saving only the label, as that's sufficient to uniquely identify the TaskDef within the graph, and that's sufficient to find the config. We definitely don't want to be saving the same config instance multiple times without memoization, as that'd be hugely wasteful.

@natelust
Copy link
Contributor Author

A lot of comments on the implementation details (which are great) but have people looked at the interface? How does it feel to people, are there methods you think that are missing that could be useful?

Copy link
Contributor

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for whether the class has all needed functionality, that is a difficult question to answer (especially since the networkx library used internally has a large set of graph operations). I pointed out one function I specifically noticed wasn't available. The good news is that I was able to successfully prototype changes to BPS to use this. And the BPS code <-> QuantumGraph interaction happens more at the API level than before (e.g., save and load, and some because BPS was just never modified to use new QuantumGraph API).



class _OutputSet(set, Generic[_T]):
"""Simple set wrapper that provides a "first" method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the set ordered? Will first always return the same item? (Asking if the name "first" implies more than what it is doing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is now gone

from dataclasses import dataclass, field
from typing import (Generic, Mapping, Optional, Set, TypeVar, Generator, Tuple, NewType, Union)

# Generic type parameters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing why one needs 3 generic type definitions. If for specific uses, is it typical to not name them in a way to hint at what they should be used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new implementation it is down to two TypeVars. From what it seems, it does currently seem to be the convention that generic parameters be short one or two letter variables. The new implementation has restricted TypeVars which does give more guidance. If this was not an implementation detail module that is not exported anywhere, I would generally agree that this information would probably belong in the class level docstring.

return cumHash


class QuantumGraph:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like this class suffers from the implementation being two separate graphs (TaskDefs vs Quanta). Sometimes the separation is hidden and other times one has to understand it (e.g., when iterating over Quanta, we get the QuantumNode, but not the TaskDef. So outside code has to turn around and ask the QuantaGraph for the TaskDef.) I think there are implementation possibilities in either direction (separate classes for each graph or completely hide the fact that internally stored separately). I guess the question is whether code would use one without the other on whether to keep as single object or separate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added TaskDef back to QuantumNode, it is something I kept going back and forth with. I think that addresses most of your concerns. Related though is the question if the taskGraph property and iterTaskGraph method should exist, and if no one objects I would like to keep it as a convenience, so if someone wants to interact with the QuantumGraph for only specific tasks (such as subset), they do not need to iterate over the whole graph to build the associations themselves.

self._taskGraph.add_edges_from(_datasetDictToEdgeIterator(self._datasetDict))
# Attempting to create a graph that contains a single task will create an
# edge from None to the task, but None is not a node we actually want
# so remove it here. This is a consequence of using add_edges from and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: add_edges from (missing underscore...was looking for the "from where")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also is the comment true or a result of how _datasetDictToEdgeIterator was implemented? Is there some modification that can be made to _datasetDictToEdgeIterator to fix this since every call to _datasetDictToEdgeIterator seems to be followed by the check for None and remove_node(None)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 As said in another comment, I think factoring all this away would probably make things cleaner, and will do that.


@property
def taskGraph(self) -> nx.DiGraph:
"""Returns a graph representing the relations between the tasks inside
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the graph() function rate a "you shouldn't really be using this method, but just in case" message, but this one doesn't?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because QuantumGraph is a extension/abstraction around the graph property (if you iterate over both containers you get the relations between QantumNodes so in some cases they could be interchangeable. The taskGraph property is just a convenience that exposes some information, and could never be used as or confused with QuantumGraph

nodes.add(node)
return self.subset(nodes)

def ansestorsOfQuantumNode(self: _T, node: QuantumNode) -> _T:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: ansestors -> ancestors

"""
write_dot(self._connectedQuanta, output)

def subset(self: _T, nodes: Union[QuantumNode, Iterable[QuantumNode]]) -> _T:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in another comment, while starting to play with the code I was looking for a subset function that took nodeNumbers. Also mentioned in that comment, it would make debugging more human-friendly if there was a way to find a node in a subset graph in the full graph using something similar to the nodeNumber.

"""Builds the graph that is used to store the relation between tasks,
and the graph that holds the relations between quanta
"""
# Datastructures used to identify relations between components; TaskDef
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datastructure is typically written as two words.

of all inputs and outputs
"""
inputs: Set[_U] = field(default_factory=set)
output: _OutputSet[_U] = field(default_factory=_OutputSet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is inputs plural and output singular?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because where this is used (tracking dsTypes / quanta) there can be multiple inputs, but only one thing should output it, so it is named such, it is a full container to catch cases where there is an error rather than clobbering. I could probably roll all these things together into a unified class though, which might make a few things cleaner looking.

"""
return set(self._quanta[taskDef])

def tasksWithInput(self, datasetTypeName: DSTypeName) -> List[TaskDef]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming of class attributes doesn't seem to be consistent. Sometimes methods are nouns and sometimes verbs.

@andy-slac
Copy link
Contributor

One comment regarding interfaces which also touches on sub-graphs and provenance.

One thing that I think sticks out in a new interface is the QuantumNode.nodeNumber thing. I think it is supposed to work OK for single graph but using it across multiple graphs is clearly non-trivial. If we need to identify nodes that may be shared across multiple graphs we need better abstraction to replace nodeNumber. Using numbers for identification is probably not going to work, we would need something like globally unique IDs (uuid) for the node itself, and maybe for the "origin" graph where the node was originally created. And then of course you need to track them down through all transformations/serializations/etc.

@natelust
Copy link
Contributor Author

I intended for nodeNumber to be only meaningful w/n one creation of a graph (and live through pickling), Something that you say give something a single pickle of the QuantumGraph and a list of node numbers to execute. This would save the need for creating many pickles of subgraphs (but would be a pipetask API change so I was just putting it in place for now). Then Michelle suggested it would be useful to preserve in sub setting for debugging reasons, which is reasonable. However, the changing nature of this property does have me re-evaluating it. That does not mean I am rushing to keep or get rid of it, but as you say it is a much more complex thing to think about.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done a quick re-review, small bunch of comments. I understand it's still not final, so I don't check "Approve" yet.

# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
from __future__ import annotations
import io
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This import should be moved below __all__

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, if I use something i have not imported my editor is "helpful" and adds the import. I'll try to keep a closer eye on this.

Comment on lines 171 to 172
inputNodes : list of `QuantumNode`
A list of nodes that are inputs to the graph
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns generator, not list.


Returns
-------
list of `DatasetTypeName`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tuple, not list, needs a name for returned item.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really cant wait until we upgrade our sphinx, it supposedly has the ability to get the types from the annotations


Raises
------
`IndexError`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No backticks around exception names, please check all docstrings.


Returns
-------
tasks : list of `TaskDef`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns generator

Raises
------
`KeyError`
Raises if the `DatasetTypeName` is not part of the `QuantumGraph`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised if

self._taskGraph = taskGraph
self._count = None

def __eq__(self, other: object) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If comparison for different graphs is useful then it may also need to support subset/superset checking or other set-like operations?

def __hash__(self):
with io.StringIO() as sio:
self.config.saveToStream(sio)
configHash = hash(sio.getvalue())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that sort of test is probably not 100% conclusive. I can probably agree that formatting the same config instance should produce identical output (though even that is not guaranteed by anything). But if you take to separate config instances, they could compare equal but may produce differently formatted strings. I'm thinking about a set-like Field class where comparison does not depend on items order but formatted output could order them differently.

Copy link
Contributor

@andy-slac andy-slac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, few minor comments

@@ -175,6 +176,8 @@ def testGetQuantumNodeByNodeId(self):
inputQuanta = tuple(self.qGraph.inputQuanta)
node = self.qGraph.getQuantumNodeByNodeId(inputQuanta[0].nodeId)
self.assertEqual(node, inputQuanta[0])
wrongNode = NodeId(15, BuildId("alternative build Id"))
self.assertRaises(IncompatibleGraphError, self.qGraph.getQuantumNodeByNodeId, wrongNode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find context manager version of this test easier to read:

with self.assertRaises(IncompatibleGraphError):
    self.qGraph.getQuantumNodeByNodeId(wrongNode)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree completely. You know exactly what is being tested in the context manager version and it's easy to make it trigger simply by commenting out the with and putting in an if True: (so you don't need to change the indent).

@@ -310,6 +313,10 @@ def testSaveLoad(self):
loaded.taskName = loaded.taskName.split('.')[-1]
self.assertEqual(self.qGraph, restore)

def testContains(self):
oldNode = next(iter(self.qGraph))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old?

@@ -84,42 +82,12 @@ class QuantumNode:
def __eq__(self, other: object) -> bool:
if not isinstance(other, QuantumNode):
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May not matter in this case but I think usual pattern for such cases is to return NotImplemented so that Python can do something smart about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but surely you don't want python to do something so clever that somehow it ends up returning True?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't but maybe someone else will 🙂 NotImplemented utility is in raising an exception for non-comparable types, maybe we do not want that, but OTOH I have no idea why it's OK to compare Quantum and non-Quantum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I just read up on this and I see that returning NotImplemented causes python to switch round the equality and try to see if other.___eq__(self) gives a different answer. Then it seems it tries an is check and eventually returns false. So it seems that returning NotImplemented is the "right" thing to do but also doesn't it guarantee to slow everything down when we already know the answer is false?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think returning False is fine here; it's the equivalence-relationship analog of marking a class as "final" for inheritance relationships. So it's not something you want to do accidentally - which is why NotImplemented is the general recommendation - but a useful way to express "no future type shall ever compare as equal to instances of this type".

I actually think that making disparate types actually compare as equal to each other is almost always more trouble than its worth (and some Python devs recognized this; note that a list never compares as equal to a tuple), so I'd personally be willing to go farther and say that we should always consider False to be a better default than NotImplemented, and the Python "general recommendation" of NotImplemented is actually slightly bad advice.

else:
cumHash ^= _hashDsRef(item)
return cumHash
return hash((self.taskDef, self.quantum))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@MichelleGower MichelleGower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was able to successfully use this in ctrl_bps. Would use a subset by NodeId list if available, but only a couple of lines without it. Approve merging changes.

@natelust natelust force-pushed the tickets/DM-26371 branch 2 times, most recently from 01cbe1d to 02fcdcc Compare September 28, 2020 16:32
@natelust natelust merged commit a05bd1d into master Sep 28, 2020
@timj timj deleted the tickets/DM-26371 branch April 13, 2022 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants