Replace inheritance of NetworkX by encapsulation #501
node_type_attribute="label",
seed=None,
):
def run(self, nodes=None, n=None, length=None, metapaths=None, seed=None):
Refactor this function to reduce its Cognitive Complexity from 16 to the 15 allowed.
seed=None,
weighted=False,
edge_weight_label="weight",
self, nodes=None, n=None, p=1.0, q=1.0, length=None, seed=None, weighted=False
Method "run" has 8 parameters, which is greater than the 7 authorized.
stellargraph/core/graph.py
Outdated
)

# XXX This has not yet been standardised in the interface.
def adjacency_types(self, graph_schema: GraphSchema):
Cyclomatic complexity is too high in method adjacency_types. (7)
stellargraph/core/graph.py
Outdated
Implementation based on encapsulating a NetworkX graph.
"""

def __init__(self, graph=None, is_directed=False, **attr):
Code Climate has analyzed commit e70e637 and detected 23 issues on this pull request.
Note: there are 2 critical issues.
Returns:
    bool: The graph directedness status.
"""
raise NotImplementedError
These are essentially "abstract" methods for the StellarGraph class, right? Is it worth using something more specific to convey this, like the abc module (https://docs.python.org/3/library/abc.html)?
Purely abstract, yes. I would have to look more closely at ABC classes, but I have no objection.
Yes, I agree. Using the abc module and decorating the methods with @abstractmethod would be better.
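As a rough illustration of the abc suggestion above (a minimal sketch, not the code in this PR; GraphInterface and UndirectedGraph are hypothetical names standing in for the StellarGraph interface and one implementation):

```python
from abc import ABC, abstractmethod


class GraphInterface(ABC):
    """Hypothetical stand-in for the abstract graph interface."""

    @abstractmethod
    def is_directed(self) -> bool:
        """Return True if the graph is directed."""


class UndirectedGraph(GraphInterface):
    """Hypothetical concrete implementation."""

    def is_directed(self) -> bool:
        return False


# Instantiating the abstract class fails immediately, instead of failing
# later when an unimplemented method is first called.
try:
    GraphInterface()
    abstract_instantiated = True
except TypeError:
    abstract_instantiated = False

graph = UndirectedGraph()
```

Note that if the class keeps a custom metaclass (as StellarGraphFactory does in this diff), that metaclass would presumably need to derive from abc.ABCMeta to avoid a metaclass conflict.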
return type.__call__(cls, *args, **kwargs)


class StellarGraph(metaclass=StellarGraphFactory):
Instead of having this be an abstract class, and using a metaclass here, did you consider storing the implementation graph internally? This is thinking along the lines of "Composition over inheritance".
This might look like:
    class StellarGraph:
        def __init__(self, *args, **kwargs):
            if self.__class__ is StellarGraph:
                self._graph = NetworkXStellarGraph(*args, is_directed=False, **kwargs)
            elif self.__class__ is StellarDiGraph:
                # the directed class wraps a directed backend graph
                self._graph = NetworkXStellarGraph(*args, is_directed=True, **kwargs)
            else:
                ...

        def is_directed(self):
            return self._graph.is_directed()

        ...
This seems to simplify the relationship between things, and means users only see StellarGraph (or StellarDiGraph) as the class, rather than being able to observe the NetworkXStellarGraph class values directly.
Any thoughts?
While I like the ability to have different backends to support different underlying data types, I think it is confusing that instantiating the StellarGraph class gives back a different class, e.g. NetworkXStellarGraph.
Options for encapsulation:
1. Subclass StellarGraph:
   - NetworkXStellarGraph
   - PandasStellarGraph
2. Use a graph class as a member:
   StellarGraph.graph = NetworkXGraphInterface(G)
   StellarGraph.graph = PandasGraphInterface(nodes_pd, edges_pd)
3. Convert all inputs to an underlying format:
   StellarGraph.graph = convert_from_nx(G)
   StellarGraph.graph = convert_from_pandas(nodes_pd, edges_pd)
I tend to think 2 & 3 are best for users, since they see a consistent StellarGraph object. I favour option 3 myself.
I'm happy to go ahead with the metaclass style for now on the develop branch, but I think we should think about this carefully from a user perspective before we release.
I'm not convinced composition would buy us anything here. Firstly, every implemented method would have to delegate to the underlying implementation. Secondly, if the user instantiates StellarGraph with a NetworkX graph, then I don't see a particular problem if they get a NetworkXStellarGraph back.
Don't forget here that we still do not know exactly what methods the StellarGraph interface requires. This is being found by trial and error. Thus, the initial NetworkXStellarGraph keeps accessible implementation details that are not necessarily yet part of StellarGraph. I know this is a hack, but the point of the ticket was simply to make sure NetworkXStellarGraph no longer inherited from NetworkX graph types, without breaking existing functionality (where possible).
> I'm not convinced composition would buy us anything here
It buys a simpler relationship between the various classes, because there's no metaclass trying to swap in different implementations.
But fair enough, about the rest of your points.
As I have said before, there have been no firm requirements for this work, and we can treat this code as purely exploratory if we want. If the requirement is that users instantiating StellarGraph get StellarGraph, we can work to that, but it will mean a performance hit due to delegating all calls to the underlying implementation.
On the other hand, Java programmers don't seem to have a problem asking for a class implementing interface X and actually getting Y. Given that g = StellarGraph() gives back something that satisfies isinstance(g, StellarGraph), I'm not sure that casual users will notice, and expert users shouldn't be surprised.
The only odd thing about my implementation is that g = StellarDiGraph() satisfies isinstance(g, StellarGraph) by design, which might be a bit unexpected.
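To make the behaviour described above concrete, here is a minimal sketch of a factory metaclass in the style of the diff. The names GraphFactory, Graph, DiGraph and BackendGraph are hypothetical, and the class hierarchy (backend subclasses Graph, DiGraph subclasses Graph) is an assumption made for illustration, not necessarily the layout in this PR:

```python
class GraphFactory(type):
    """Hypothetical metaclass modelled on the factory shown in the diff."""

    def __call__(cls, *args, **kwargs):
        # Calling either public class yields the backend implementation.
        if cls is Graph:
            return BackendGraph(*args, is_directed=False, **kwargs)
        if cls is DiGraph:
            return BackendGraph(*args, is_directed=True, **kwargs)
        # Any other class (e.g. the backend itself) is constructed normally.
        return type.__call__(cls, *args, **kwargs)


class Graph(metaclass=GraphFactory):
    pass


class DiGraph(Graph):
    pass


class BackendGraph(Graph):
    def __init__(self, is_directed=False):
        self._is_directed = is_directed

    def is_directed(self):
        return self._is_directed


g = Graph()
d = DiGraph()
```

In this sketch both g and d are BackendGraph instances and both pass isinstance(..., Graph); what isinstance(d, DiGraph) returns depends entirely on the hierarchy chosen, which is part of why the behaviour can surprise users.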
It's true that it's not that unusual to have classes change, but the ramifications of this logic are more than just the types that get returned; e.g. the user has to be able to work out what arguments get passed to the StellarGraph constructor: documenting this usefully is much harder when there isn't an __init__ that lists them all, and tooling like PyCharm & mypy likely doesn't work that well.
This raises the question once again whether it is better to document the interface in the class or the __init__. In this case, the NetworkX-y interface is documented in the class, leaving the init free. Answers on the web suggest that people are more likely to type help(StellarGraph) than help(StellarGraph.__init__), so class docs seem the best place.
Again, the problem we have is that we do not really have fully-defined arguments, due to organic growth of functionality. Multiply this by the polymorphism inherent in trying to support multiple interfaces, and I don't really see a neat encapsulation to be able to be explicit about all arguments.
> This raises the question once again whether it is better to document the interface in the class or the __init__. In this case, the NetworkX-y interface is documented in the class, leaving the init free. Answers on the web suggest that people are more likely to type help(StellarGraph) than help(StellarGraph.__init__), so class docs seem the best place.
I think the location of the documentation is an orthogonal issue (also, help(StellarGraph) includes the documentation for __init__, and, in particular, its signature, which is what I care most about).
In particular, I'm getting at a more fundamental issue: having a computer-understandable definition of how a StellarGraph can be constructed. The best case for a library (and, indeed, anything) is no documentation needed, because the interface is "perfectly intuitive", meaning a user is guided down the right path automatically without having to read about how to use it ("the easy path is the right path"). Obviously, doing that entirely is unattainable for anything non-trivial, but it's still something that can be kept as a goal.
IME, a critical part of this is making it easy for computers to understand the easy path, because then the user's computer can tell the user when they've gone wrong, as soon as possible. In this specific case, this means having the expected arguments visible to the user and their editor/IDE/tooling. As this code is written, the only way to even know the names of the arguments is to read the documentation, which is something a computer cannot easily do.
Another part of it is making it easy for library maintainers to keep the documentation up to date, so when a user is forced to consult the docs, they are useful. With highly dynamic approaches, more documentation is required (i.e. more to maintain), and, there's less chance of tooling helping maintain it (e.g. if we remove or rename an argument, remind us to update the docs for it).
For instance, you may already know that IntelliJ/PyCharm can be helpful about constructing things and calling functions:
[screenshot]
It can even show type annotations, if they exist on the function in question:
[screenshot]
However, with the dynamic metaclasses, it can't show anything like this:
[screenshot]
There's no hint that I might want to be passing two arguments in there.
(NB. my example of IntelliJ/PyCharm is just a proxy for any tooling that might want to reason about our python code, other examples are mypy, pylint and even documentation generators.)
> Again, the problem we have is that we do not really have fully-defined arguments, due to organic growth of functionality. Multiply this by the polymorphism inherent in trying to support multiple interfaces, and I don't really see a neat encapsulation to be able to be explicit about all arguments.
This is a bit unfortunate, but it seems to me we should still think quite carefully about steps that make it harder to have great tooling.
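One way to see the tooling concern without an IDE is to ask the standard library what it can discover about each constructor. This is a sketch under assumptions: DynamicFactory, DynamicGraph and ExplicitGraph are hypothetical classes, and the argument names are only loosely based on the diff. As far as I'm aware, inspect.signature consults a user-defined metaclass __call__ before looking at __init__:

```python
import inspect


class DynamicFactory(type):
    """Hypothetical metaclass that forwards arbitrary arguments."""

    def __call__(cls, *args, **kwargs):
        return type.__call__(cls, *args, **kwargs)


class DynamicGraph(metaclass=DynamicFactory):
    def __init__(self, graph=None, is_directed=False, **attr):
        self.graph = graph
        self.is_directed = is_directed


class ExplicitGraph:
    def __init__(self, graph=None, *, edge_weight_label="weight"):
        self.graph = graph
        self.edge_weight_label = edge_weight_label


# Parameter names that introspection-based tooling can discover.
dynamic_params = list(inspect.signature(DynamicGraph).parameters)
explicit_params = list(inspect.signature(ExplicitGraph).parameters)
```

With the explicit __init__, the argument names are available to any tool that introspects the class; with the forwarding metaclass, the tool is likely to see only the catch-all parameters.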
I understand that, and agree with everything you said. BUT in this case we still have the problem of having two incompatible class argument signatures, and will continue to have until major rationalisation of the NetworkX-based version can be done.
That said, the NodeData, EdgeData and EdgeCache interfaces could all be simplified back to standard init arguments.
Moving forward, I think we can drop the NetworkX backend; I don't see it as useful. This will mean we can control our class arguments more easily. I see this PR as a temporary exploration and test for isolating our API from NetworkX.
stellargraph/core/graph.py
Outdated
self._graph = graph

# Name of optional attribute for edge weights
self._edge_weight_label = attr.get("edge_weight_label", "weight")
This seems to require that there's a single notion of "edge weight" for any given graph, but it seems to me that's not quite true. E.g. in a transport network one could weight edges (roads/paths) by:
- distance
- time while driving
- time while cycling
- time while walking
- cost
- elevation change
- arbitrary combinations of the above
Do you think having a single one is an appropriate trade-off?
The existing use-cases seem to pick exactly one attribute, typically weight by default, to be the edge weight. I agree that there can be multiple types of edge weights, but if these exist at the same time, I imagine they would become edge features (not yet implemented). Again, I chose the simplest interface to match actual usage. I have no objection to this changing in future if necessary.
I guess I don't understand why this had to move into the StellarGraph class instead of remaining in the algorithms as it was before. It seems more flexible to be part of the algorithm, because different invocations can choose different attributes, if they're available, and, on the surface, it seems like this change isn't related to breaking the networkx-inheritance.
Could you expand?
I think this is an intermediate solution for now. In the future we need to think about something more general that will store arbitrary node & edge features in a memory and performance efficient way.
Also, I think that edge weights should be "just" a special edge feature expected by the random walk algorithms, I don't think that the StellarGraph class needs to know that it is special.
Ok, line 489 and down (edge_weight_label, etc.) were existing code, I think (been a while). What part of this has moved into StellarGraph? Ok, maybe you are talking about why this has to be defined up front instead of passed to each algorithm individually? Future creep, I guess.
A fundamental assertion of the subsequent branch (which includes the new StandardStellarGraph implementation) is that node data and edge data need to be fully encapsulated up front into standardised interfaces. Whilst it would be possible to include a method giving back data for an arbitrary 'column', I instead made it part of the design to pre-designate all necessary columns.
Based on the usages I have seen, it didn't seem problematic to just have different StellarGraph instances if one wanted to experiment with different edge weight variables. We can rethink this if it is or becomes a major decision point.
> What part of this has moved into StellarGraph? Ok, maybe you are talking about why this has to be defined up front instead of passed to each algorithm individually?
Yes, this edge_weight_label argument and the self._edge_weight_label processing have moved from each algorithm into the StellarGraph class.
> A fundamental assertion of the subsequent branch (which includes the new StandardStellarGraph implementation) is that node data and edge data need to be fully encapsulated up front into standardised interfaces
Nodes support arbitrary attributes in some sort of standardised interface, I imagine?
> Whilst it would be possible to include a method giving back data for an arbitrary 'column', I instead made it part of the design to pre-designate all necessary columns.
I don't think edge weight is a necessary column (it doesn't even seem to be necessary for most of the algorithms that support obeying weights). How does this model an unweighted graph? Does it force the user to manually specify "weight": 1 for every edge?
From a neon/platform perspective, the more generic approach of arbitrary edge attributes is the right one: e.g. the bitcoin graph has timestamps on the edges, which don't make sense to pretend to be weights (and, some feature engineering uses these timestamps, although that processing is currently done purely in spark).
My plan is to implement edge features along the lines of node features (see #520). Whether or not this solves the problem of 'arbitrary' columns, I don't know. By default, if you don't specify an edge weight, it is usually assumed to be 1 - typically this would be a virtual value.
If we can come up with some simple use-cases for different sorts of edge information, then I am happy to have a (group) rethink about the design. I've been purely guessing so far.
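A small sketch of what a "virtual" default weight could look like when the attribute is chosen per call rather than fixed on the graph. The edge_weight helper and the dictionary layout are hypothetical and only illustrate the defaulting idea, not how StellarGraph stores edge data:

```python
from typing import Any, Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_weight(
    edge_data: Dict[Edge, Dict[str, Any]],
    edge: Edge,
    label: str = "weight",
    default: float = 1.0,
) -> float:
    """Return the attribute named `label` for `edge`, or `default` if absent."""
    return edge_data[edge].get(label, default)


# One edge carries several candidate "weights"; the other carries none.
edges = {
    ("a", "b"): {"distance": 2.5, "time": 10.0},
    ("b", "c"): {},
}

by_distance = edge_weight(edges, ("a", "b"), label="distance")
unweighted = edge_weight(edges, ("b", "c"))
```

Here the default of 1 is never stored on the edge, so an unweighted graph needs no explicit "weight" entries, and different invocations can pick different attributes.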
> Whether or not this solves the problem of 'arbitrary' columns, I don't know
I would think so; in my mind, edge features and columns of data on edges are synonymous (but differently presented) views of the same concept?
> By default, if you don't specify an edge weight, it is usually assumed to be 1 - typically this would be a virtual value
I think the piece I was missing is that all of the downstream code already handles this defaulting (given I didn't notice any changes along these lines), e.g. to_scipy_sparse_matrix seems to ignore edges that don't have the specified weight property, and BiasedRandomWalk seems to just branch on if weighted: at a very early point.
> If we can come up with some simple use-cases for different sorts of edge information, then I am happy to have a (group) rethink about the design. I've been purely guessing so far.
One example: the feature engineering done on bitcoin uses edge attributes; you can see some of this done in PySpark in https://github.com/stellargraph/neon/blob/develop/scenarios/scenario-2/Bitcoin_1_Features_Engineering.ipynb in "Add transaction features" where it aggregates the transactions (edges) per day, using the timestamp associated with those edges.
seed=seed,
weighted="unknown",
edge_weight_label="weight",
nodes=nodes, n=n, p=p, q=q, length=length, seed=seed, weighted="unknown"
)

with pytest.raises(ValueError):
# edge weight labels are by default called weight as is in networkx but they can be any string value if user specified
I think this part of the test is no longer relevant, now that edge weight labels aren't specified in run.
(Also, I think it was never testing what this comment implies it was testing: ValueError will be thrown because of the weighted="unknown", not because of edge_weight_label=None.)
Probably. SEP if you read HHGTTG.
It's definitely the problem of a PR that deletes/moves some functionality to remove outdated tests, otherwise there's just an endless accrual of weird and wacky tests that aren't doing anything useful. Fortunately this particular one isn't too entangled and we can just delete the lines.
Ok, if this is a hang-over from me shifting arguments out of methods into the constructor, then I agree it is my problem. Will review.
Still not convinced I made any changes relevant to this code, but I removed the duplicate test and reworded the comment.
👍 (You deleted the edge_weight_label argument, which this second test was attempting to validate.)
See comments in code.
stellargraph/core/graph.py
Outdated

def __call__(cls, *args, **kwargs):
    if cls is StellarGraph:
        return NetworkXStellarGraph(*args, is_directed=False, **kwargs)
If the user specifies is_directed in kwargs then this will fail, e.g.

    def test_directed_graph_from_nx():
        Gnx = nx.karate_club_graph()
    >   sg = StellarGraph(Gnx, is_directed=False)

    cls = <class 'stellargraph.core.graph.StellarGraph'>, args = (<networkx.classes.graph.Graph object at 0x13c65ef28>,)
    kwargs = {'is_directed': False}

        def __call__(cls, *args, **kwargs):
            if cls is StellarGraph:
    >           return NetworkXStellarGraph(*args, is_directed=False, **kwargs)
    E           TypeError: StellarGraphFactory object got multiple values for keyword argument 'is_directed'

    stellargraph/core/graph.py:249: TypeError
I agree, I left it up to the Python infrastructure to complain, since is_directed was left undocumented. The other thing we could do is raise an error saying to not explicitly set is_directed.
I think raising an error if is_directed and the class-specified directedness don't agree would be good.
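A possible shape for that check, as a sketch only: resolve_directedness is a hypothetical helper (not part of this PR) that the factory could call before forwarding keyword arguments, so that a matching is_directed is tolerated and a conflicting one fails with a clear message:

```python
def resolve_directedness(class_is_directed: bool, kwargs: dict) -> dict:
    """Hypothetical helper: merge the class-implied directedness into kwargs.

    Returns a new dict; raises ValueError if the caller passed an
    is_directed value that contradicts the class being instantiated.
    """
    merged = dict(kwargs)
    requested = merged.get("is_directed", class_is_directed)
    if requested != class_is_directed:
        raise ValueError(
            "is_directed={!r} conflicts with the class, which implies "
            "is_directed={!r}".format(requested, class_is_directed)
        )
    merged["is_directed"] = class_is_directed
    return merged


# Passing the value the class already implies is accepted.
ok_kwargs = resolve_directedness(False, {"is_directed": False})

# Passing a contradictory value raises a descriptive error.
try:
    resolve_directedness(False, {"is_directed": True})
    conflict_raised = False
except ValueError:
    conflict_raised = True
```

The factory branch would then forward **resolve_directedness(False, kwargs) instead of passing is_directed=False alongside **kwargs, which avoids the "multiple values for keyword argument" TypeError.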
stellargraph/core/graph.py
Outdated
def node_degrees(self) -> Mapping[Any, int]:
    return self._graph.degree()

def adjacency_weights(self):
I think this would be better called to_adjacency_matrix, and have an option of using edge weights or not.
There is a conceptual problem here in that this ticket is (supposedly) about making minimal changes to stop StellarGraph inheriting from NetworkX, whereas the subsequent ticket (see #520) is about distilling the StellarGraph interface and making a time- and memory-efficient implementation that does not use NetworkX, including encapsulating edge and node data in a standard way. So some of the concerns, and some of my answers, keep crossing between these two tickets.
This is one of the reasons I've been trying to push ahead with crystallising what the StellarGraph interface currently supports (i.e. everything that NetworkXStellarGraph can do, rightly or wrongly, including create_graph_schema()), in advance of trying to optimise what it actually should do. And I wholeheartedly agree that this approach runs the risk of implementing a suboptimal design, but we're working in the dark otherwise. We all need a good think about this, because I don't have any ready answers.
I agree with your points, but as I understand it this function was introduced to the class in this ticket, or am I mistaken?
The actual function adjacency_weights() was created in this ticket, but the functionality itself was already in use in several places. This change just hid away the fact that the code was using NetworkX-specific structures and calls.
OK, so can we change the name to to_adjacency_matrix?
# even when the seed is set.
adj[et][n1] = sorted(neigh_et, key=str)
self.adj_types = adj
self.adj_types = adj = self.graph.adjacency_types(self.graph_schema)
I think that this object should be replaced with functionality inside the StellarGraph class. It is only used for efficiency in sampling, so we can use a typed adjacency list (proposed as MDAL in this document: https://docs.google.com/document/d/1ztdDOmIUFerSVu-R0fgKP3vYTTdZBmY9qbSoIXb3NWQ).
Which it is in #520 (see my previous comment).
Agreed. Let's move that to be decided in the next ticket and in future API discussions.
stellargraph/data/explorer.py
Outdated
self._raise_error("node {} not in graph".format(node))
return list(nx.neighbors(self.graph, node))
return list(self.graph.neighbour_nodes(node))
This involves accessing the graph twice (once to check, once to get neighbours). Better to use a try/except:

    try:
        neighbours = list(self.graph.neighbour_nodes(node))
    except KeyError:
        self._raise_error("node {} not in graph".format(node))
if isinstance(G, StellarGraph):
    nodes = list(G.nodes())
else:
    nodes = list(G)
Doesn't list(G.nodes()) work for NetworkX objects too? I don't think we need the test, just the single line:
nodes = list(G.nodes())
#
# Copyright 2019 Data61, CSIRO
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Nice catch .. how did we miss this?!
stellargraph/core/graph.py
Outdated
def number_of_edges(self) -> int:
    return self._graph.number_of_edges()

def nodes(self) -> Iterable[Any]:
We have list(G.nodes()) everywhere we need to get the nodes, as NetworkX returns an iterator. Can we just convert to a list in this method, rather than in most calls to it? This can be part of the API.
The answer is two-fold: 1. Always converting to a list is potentially a waste for a big graph if all we want to do is iterate over the nodes. 2. If we have a complex data-type (e.g. a node-type to node-data mapping), using itertools.chain() or generators is very convenient (see #520).
This is also an example of the design principle to prefer abstract types (e.g. Iterable) over concrete implementations (e.g. list). Part of the 'D' in 'SOLID', I think.
The user having to do list(g.nodes()) is an indicator that they understand they will be allocating additional memory to store the nodes, rather than that choice being forced on them.
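The two points above (lazy iteration and chaining over a node-type mapping) can be sketched as follows. `TypedNodeStore` and its node-type mapping are hypothetical names used only for illustration; this is not the design from #520, just one way an `Iterable` return type could look.

```python
from itertools import chain
from typing import Any, Dict, Iterable, List


class TypedNodeStore:
    """Hypothetical container keeping nodes grouped by node type."""

    def __init__(self, nodes_by_type: Dict[str, List[Any]]):
        self._nodes_by_type = nodes_by_type

    def nodes(self) -> Iterable[Any]:
        # Lazily walk every per-type list without building a combined list.
        return chain.from_iterable(self._nodes_by_type.values())


store = TypedNodeStore({"paper": [1, 2], "author": ["a", "b"]})

# Plain iteration: no extra list is allocated.
count = sum(1 for _ in store.nodes())

# The caller opts in to the extra memory only when a list is really needed.
all_nodes = list(store.nodes())
```

Here the interface promises only something iterable, so the caller who needs indexing or sampling pays for `list(...)` explicitly.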
Agreed. Although somehow we always end up converting to a list!
Usually because we want to sub-sample the nodes. But I think we occasionally just iterate through the nodes.
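As a small illustration of why sub-sampling tends to force the list conversion: the standard library's `random.sample` requires a sequence, so a lazy iterable has to be materialised first. `sample_nodes` is a hypothetical helper for this sketch, not a function from the codebase.

```python
import random
from itertools import chain


def sample_nodes(node_iterable, k, seed=None):
    # random.sample needs a sequence, so the iterable is materialised first.
    population = list(node_iterable)
    rng = random.Random(seed)
    return rng.sample(population, k)


# A lazy iterable standing in for the result of a nodes() call.
lazy_nodes = chain(range(5), range(5, 10))
sampled = sample_nodes(lazy_nodes, 3, seed=42)
```

Code that only loops over the nodes can skip the conversion entirely.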
Done.
Done.
Done.
Not sure about this one. I created both Okay, so my observation still stands. For an undirected graph, NetworkX
Checked - okay.
Checked - not happy with results.
@adocherty , still not sure about the Set vs Iterable problem (note that a set is also iterable). |
The deltas seem pretty close (within a standard deviation, at least), except for some of the |
My first runs had every timing benchmark and memory allocation higher for drop-networkx than for develop. Which is why I shut down everything else to rerun the tests. Plus I did some twice just in case switching between branches was causing Python recompilation. These comparisons don't seem too bad, though still disappointingly slow in some cases. |
Ah, I see. I'm glad the quiet run was better 👍
Could you clarify what you mean by this? My understanding is that this patch is not expected to make any performance difference, other than potentially introducing fractionally more time overhead due to having to do some extra function calls and member look-ups, and fractionally more memory use due to having the extra wrapper object. |
As far as I recall, this branch mostly just shifted existing functionality from an implementation class called So my concern is I don't know why some tests seem slower on average. |
Sorry, not iterable but iterator. Undirected graphs return an iterator from NetworkX; however, for directed graphs a set is returned (which is iterable but not an iterator). This is done when you collect the set of in and out nodes over the edges for directed graphs. Note that using
import stellargraph as sg
import networkx as nx
g = nx.erdos_renyi_graph(100, 0.1)
Undirected graph:
gs=sg.StellarGraph(g)
gs.neighbors(2)
output:
Directed graph:
gs=sg.StellarDiGraph(g)
print(gs.neighbors(2))
type(gs.neighbors(2))
output:
I think what you want to return to have more consistency in return types is this:
output:
|
Most of the tests seem within the margin of variation expected by the std dev. There are a few that seem slower. I assume this is due to the abstraction overhead caused by the new class performing more tests and then calling NetworkX. I'm not sure they are so much worse that we need to worry. |
In my tests, |
They are still different return types even though they are both iterable. It would be better to return the same object type regardless of whether it's directed or not. Specifically, My point here is: wrap sets returned in |
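The iterator-versus-set distinction being debated can be checked with the standard `collections.abc` types. The sketch below uses plain Python objects as stand-ins for the two kinds of return value; the built-in `iter()` is shown only as one possible way to normalise the result, which is an assumption for illustration and not necessarily what the PR does.

```python
from collections.abc import Iterable, Iterator

undirected_neighbours = iter([1, 5, 9])  # stand-in for an iterator result
directed_neighbours = {1, 5, 9}          # stand-in for a set result


def describe(obj):
    # Returns (is it iterable?, is it an iterator?).
    return (isinstance(obj, Iterable), isinstance(obj, Iterator))


# Wrapping the set yields an iterator, so both cases then share a kind of type.
normalised = iter(directed_neighbours)
```

Both stand-ins work in a `for` loop; the difference only shows up for callers that use `next()` directly or rely on single-pass behaviour.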
I disagree, I'm afraid. If the interface says you will get something that can be iterated over, and you get something back that can be iterated over, then the interface is satisfied; the actual return type is irrelevant as far as functionality is concerned. I don't see that consistency of return type matters. @youph , you're the product owner, so you can make the call.
Not sure if you mean that |
@youph , if you want a check added that a NetworkX graph has been supplied, then this logically should go into the implementation So, I can add a check in |
@youph , @adocherty , I've added a note in |
This was a pretty minor point, but somehow we have ended up arguing about it for far too long.
You are right – it's difficult to return an Iterator for in and out nodes with the current implementation. You have addressed my other issues, and this seems to be an important point for you, so I'm OK to go ahead with it this way. |
Cheers! |
@geoff-d61 @adocherty I just discovered that this PR breaks the interpretability demos, e.g., this one:
How do we fix this? |
Under this revision, StellarGraph is no longer also a NetworkX graph; it now has its own interface. The NetworkXStellarGraph implementation class (formerly StellarGraphBase) now wraps the supplied NetworkX object.
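A minimal sketch of what encapsulation (as opposed to inheritance) means here, using only the standard library. `BackendGraph` is a hypothetical stand-in for the wrapped NetworkX object and `WrappedGraph` a hypothetical stand-in for the wrapper; neither reproduces the real classes, and only the two delegating methods shown in the diff above (`nodes` and `number_of_edges`) are mirrored.

```python
from typing import Any, Iterable


class BackendGraph:
    """Hypothetical stand-in for the wrapped NetworkX object."""

    def __init__(self):
        self._adj = {}

    def add_edge(self, u, v):
        self._adj.setdefault(u, set()).add(v)
        self._adj.setdefault(v, set()).add(u)

    def nodes(self):
        return iter(self._adj)

    def number_of_edges(self):
        # Each undirected edge is stored twice, once per endpoint
        # (self-loops are not handled in this toy example).
        return sum(len(nbrs) for nbrs in self._adj.values()) // 2


class WrappedGraph:
    """Hypothetical wrapper: holds a backend graph instead of subclassing it."""

    def __init__(self, graph):
        self._graph = graph

    def nodes(self) -> Iterable[Any]:
        return self._graph.nodes()

    def number_of_edges(self) -> int:
        return self._graph.number_of_edges()


backend = BackendGraph()
backend.add_edge("a", "b")
backend.add_edge("b", "c")
wrapped = WrappedGraph(backend)
```

The wrapper exposes only the methods it defines, so code that previously called backend-specific methods directly on the graph object would need updating, which may be the kind of breakage reported for the demos above.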