Replace inheritance of NetworkX by encapsulation #501

geoffj-d61 · 2019-10-10T05:32:04Z

Under this revision, StellarGraph is no longer also a NetworkX - it now has its own interface. The NetworkXStellarGraph implementation class (formerly StellarGraphBase) now wraps the supplied NetworkX object.

review-notebook-app · 2019-10-10T05:32:09Z

Check out this pull request on

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

codeclimate · 2019-10-10T05:33:04Z

stellargraph/data/explorer.py

-        node_type_attribute="label",
-        seed=None,
-    ):
+    def run(self, nodes=None, n=None, length=None, metapaths=None, seed=None):


Refactor this function to reduce its Cognitive Complexity from 16 to the 15 allowed.

codeclimate · 2019-10-10T05:33:05Z

stellargraph/data/explorer.py

-        seed=None,
-        weighted=False,
-        edge_weight_label="weight",
+        self, nodes=None, n=None, p=1.0, q=1.0, length=None, seed=None, weighted=False


Method "run" has 8 parameters, which is greater than the 7 authorized.

codeclimate · 2019-10-10T05:33:05Z

stellargraph/core/graph.py

+        )
+
+    # XXX This has not yet been standardised in the interface.
+    def adjacency_types(self, graph_schema: GraphSchema):


Cyclomatic complexity is too high in method adjacency_types. (7)

stellargraph/core/graph.py

+    Implementation based on encapsulating a NetworkX graph.
+    """
+
+    def __init__(self, graph=None, is_directed=False, **attr):


stellargraph/core/graph.py

+    Implementation based on encapsulating a NetworkX graph.
+    """
+
+    def __init__(self, graph=None, is_directed=False, **attr):


codeclimate · 2019-10-10T05:33:07Z

Code Climate has analyzed commit e70e637 and detected 23 issues on this pull request.

Here's the issue category breakdown:

Category	Count
Complexity	18
Security	5

Note: there are 2 critical issues.

View more on Code Climate.

huonw · 2019-11-18T05:32:35Z

stellargraph/core/graph.py

+        Returns:
+             bool: The graph directedness status.
+        """
+        raise NotImplementedError


These are essentially "abstract" methods for the StellarGraph class, right? Is it worth using something more specific to convey this, like the abc module https://docs.python.org/3/library/abc.html?

Purely abstract, yes. I would have to look more closely at ABC classes, but I have no objection.

Yes, I agree. Using the abc module and decorating the functions with @abstractclass would be better.

huonw · 2019-11-18T05:38:18Z

stellargraph/core/graph.py

+            return type.__call__(cls, *args, **kwargs)
+
+
+class StellarGraph(metaclass=StellarGraphFactory):


Instead of having this be an abstract class, and using a metaclass here, did you consider storing the implementation graph internally? This is thinking along the lines of "Composition over inheritance".

This might look like:

class StellarGraph: def __init__(self, *args, **kwargs): if self.__class__ is StellarGraph: self._graph = NetworkXStellarGraph(*args, is_directed=False, **kwargs) elif self.__class__ is StellarDiGraph: self._graph = NetworkXStellarGraph(*args, is_directed=False, **kwargs) else: ... def is_directed(self): return self._graph.is_directed() ...

This seems to simplify the relationship between things, and means users only see StellarGraph (or StellarDiGraph) as the class, rather than being able to observe the NetworkXStellarGraph class values directly.

Any thoughts?

While I like the ability to have different backends to support different underlying data types, I think that it is confusing that instantiating the StellarGraph class gives back a different class e.g. NetworkXStellarGraph.

Options for encapsulation:

Subclassing StellarGraph:

NetworkXStellarGraph

PandasStellarGraph

Use graph class as member
StellarGraph.graph = NetworkXGraphInterface(G)
StellarGraph.graph = PandasGraphInterface(nodes_pd, edges_pd)

Convert all inputs to an underlying format:
StellarGraph.graph = convert_from_nx(G)
StellarGraph.graph = convert_from_pandas(nodes_pd, edges_pd)

I tend to think 2 & 3 are best for users, they see a consistent StellarGraph object. I favour option 3 myself.

I'm happy to go ahead with the MetaClass style for now on the develop branch but I think we should think about this carefully from a user perspective before we release.

I'm not convinced composition would buy us anything here. Firstly, every implemented method would have to delegate to the underlying implementation. Secondly, if the user instantiates StellarGraph with a NetworkX graph, then I don't see a particular problem if they get a NetworkXStellarGraph back.

Don't forget here that we still do not know exactly what methods the StellarGraph interface requires. This is being found by trial and error. Thus, the initial NetworkXStellarGraph keeps accessible implementation details that are not necessarily yet part of StellarGraph. I know this is a hack, but the point of the ticket was simply to make sure NetworkXStellarGraph no longer inherited from NetworkX graph types, without breaking existing functionality (where possible).

I'm not convinced composition would buy us anything here

It buys a simpler relationship between the various classes, because there's no metaclass trying to swap in different implementations.

But fair enough, about the rest of your points.

As I have said before, there have been no firm requirements for this work, and we can treat this code as purely exploratory if we want. If the requirement is that users instantiating StellarGraph get StellarGraph, we can work to that, but it will mean a performance hit due to delegating all calls to the underlying implementation.

On the other hand, Java programmers don't seem to have a problem asking for a class implementing interface X and actually getting Y. Given that that g=StellarGraph() gives back something that satisfies isinstance(g,StellarGraph), I'm not sure that casual users will notice, and expert users shouldn't be surprised.

The only odd thing about my implementation is that g=StellarDiGraph() satisfies isinstance(g,StellarGraph) by design, which might be a bit unexpected.

It's true that it's not that unusual to have classes change, but the ramifications of this logic are more than just the types that get returned; e.g. the user has to be able to work out what arguments get passed to the StellarGraph constructor: documenting this usefully is much harder when there isn't an __init__ that lists them all, and tooling like PyCharm & mypy likely doesn't work that great.

This raises the question once again whether it is better to document the interface in the class or the __init__. In this case, the NetworkX-y interface is documented in the class, leaving the init free. Answers on the web suggest that people are more likely to type help(StellarGraph) than help(StellarGraph.__init__), so class docs seem the best place.

Again, the problem we have is that we do not really have fully-defined arguments, due to organic growth of functionality. Multiply this by the polymorphism inherent in trying to support multiple interfaces, and I don't really see a neat encapsulation to be able to be explicit about all arguments.

This raises the question once again whether it is better to document the interface in the class or the init. In this case, the NetworkX-y interface is documented in the class, leaving the init free. Answers on the web suggest that people are more likely to type help(StellarGraph) than help(StellarGraph.init), so class docs seem the best place.

I think the location of the documentation is an orthogonal issue (also, help(StellarGraph) includes the documentation for __init__, and, in particular, its signature, which is what I care most about).

In particular, I'm getting at a more fundamental issue: having a computer-understandable definition of how a StellarGraph can be constructed. The best case for a library (and, indeed, anything) is no documentation needed, because the interface is "perfectly intuitive", meaning a user is guided down the right path automatically without having to read about how to use it ("the easy path is the right path"). Obviously, doing that entirely is unattainable for anything non-trivial, but it's still something that can be kept as a goal.

IME, a critical part of this is making it easy for computers to understand the easy path, because then the user's computer can tell the user when they've gone wrong, as soon as possible. In this specific case, this means having the expected arguments visible to the user and their editor/IDE/tooling. As this code is written, the only way to even know the names of the arguments is to read the documentation, which is something a computer cannot easily do.

Another part of it is making it easy for library maintainers to keep the documentation up to date, so when a user is forced to consult the docs, they are useful. With highly dynamic approaches, more documentation is required (i.e. more to maintain), and, there's less chance of tooling helping maintain it (e.g. if we remove or rename an argument, remind us to update the docs for it).

For instance, you may already know that IntelliJ/pycharm can be helpful about constructing things and calling functions:

It can even show type annotations, if they exist on the function in question:

However, with the dynamic metaclasses, it can't show anything like this:

There's no hint that I might want to be passing two arguments in there.

(NB. my example of IntelliJ/PyCharm is just a proxy for any tooling that might want to reason about our python code, other examples are mypy, pylint and even documentation generators.)

Again, the problem we have is that we do not really have fully-defined arguments, due to organic growth of functionality. Multiply this by the polymorphism inherent in trying to support multiple interfaces, and I don't really see a neat encapsulation to be able to be explicit about all arguments.

This is a bit unfortunate, but it seems to me we should still think quite carefully about steps that make it harder to have great tooling.

I understand that, and agree with everything you said. BUT in this case we still have the problem of having two incompatible class argument signatures, and will continue to have until major rationalisation of the NetworkX-based version can be done.

That said, the NodeData, EdgeData and EdgeCache interfaces could all be simplified back to standard init arguments.

Moving forward, I think we can drop the NetworkX backend. I don't see it as useful moving forward. This will mean we can control our class arguments more easily. I see this PR is a temporary exploration and test for isolating our API from NetworkX.

huonw · 2019-11-18T21:36:42Z

stellargraph/core/graph.py

+        self._graph = graph
+
+        # Name of optional attribute for edge weights
+        self._edge_weight_label = attr.get("edge_weight_label", "weight")


This seems to require that there's a single notion of "edge weight" for any given graph, but it seems to me that's not quite true. E.g. in a transport network one could weight edges (roads/paths) by:

distance

time while driving

time while cycling

time while walking

cost

elevation change

arbitrary combinations of the above

Do you think having a single one is an appropriate trade-off?

The existing use-cases seem to pick exactly one attribute, typically weight by default, to be the edge weight. I agree that there can be multiple types of edge weights, but if these exist at the same time, I imagine they would become edge features (not yet implemented). Again, I chose the simplest interface to match actual usage. I have no objection to this changing in future if necessary.

I guess I don't understand why this had to move into the StellarGraph class instead of remaining in the algorithms as it was before. It seems more flexible to be part of the algorithm, because different invocations can chose different attributes, if they're available, and, on the surface, it seems like this change isn't related to breaking the networkx-inheritance.

Could you expand?

I think this is an intermediate solution for now. In the future we need to think about something more general that will store arbitrary node & edge features in a memory and performance efficient way.

Also, I think that edge weights should be "just" a special edge feature expected by the random walk algorithms, I don't think that the StellarGraph class needs to know that it is special.

Ok, line 489 and down (edge_weight_label, etc.) were existing code, I think (been a while). What part of this has moved into StellarGraph? Ok, maybe you are talking about why this has to be defined up front instead of passed to each algorithm individually? Future creep, I guess.

A fundamental assertion of the subsequent branch (which includes the new StandardStellarGraph implementation) is that node data and edge data need to be fully encapsulated up front into standardised interfaces. Whilst it would be possible to include a method giving back data for an arbitrary 'column', I instead made it part of the design to pre-designate all necessary columns.

Based on the usages I have seen, it didn't seem problematic to just have different StellarGraph instances if one wanted to experiment with different edge weight variables. We can rethink this if it is or becomes a major decision point.

What part of this has moved into StellarGraph? Ok, maybe you are talking about why this has to be defined up front instead of passed to each algorithm individually?

Yes, this edge_weight_label argument and the self._edge_weight_label processing has moved from each algorithm into the StellarGraph class.

A fundamental assertion of the subsequent branch (which includes the new StandardStellarGraph implementation) is that node data and edge data need to be fully encapsulated up front into standardised interfaces

Nodes support arbitrary attributes in some sort of standardised interface, I imagine?

Whilst it would be possible to include a method giving back data for an arbitrary 'column', I instead made it part of the design to pre-designate all necessary columns.

I don't think edge weight is a necessary column (it doesn't seem to even be necessary for most of the algorithms that support obeying weights). How does this model an unweighted graph? Does it force the user to manually specify "weight": 1 for every edge?

From a neon/platform perspective, the more generic approach of arbitrary edge attributes is the right one: e.g. the bitcoin graph has timestamps on the edges, which don't make sense to pretend to be weights (and, some feature engineering uses these timestamps, although that processing is currently done purely in spark).

My plan is to implement edge features along the lines of node features (see #520). Whether or not this solves the problem of 'arbitrary' columns, I don't know. By default, if you don't specify an edge weight, it is usually assumed to be 1 - typically this would be a virtual value.

If we can come up with some simple use-cases for different sorts of edge information, then I am happy to have a (group) rethink about the design. I've been purely guessing so far.

Whether or not this solves the problem of 'arbitrary' columns, I don't know

I would think so; in my mind, edge features and columns of data on edges are synonymous (but differently presented views) of the same concept?

By default, if you don't specify an edge weight, it is usually assumed to be 1 - typically this would be a virtual value

I think the piece I was missing is that all of the downstream code handles this defaulting already (given I didn't notice any changes along these lines already), e.g. to_scipy_sparse_matrix handles seems to ignore edges that don't have the specified weight property, and BiasedRandomWalk seems to just branch on if weighted: at a very early point.

If we can come up with some simple use-cases for different sorts of edge information, then I am happy to have a (group) rethink about the design. I've been purely guessing so far.

One example: the feature engineering done on bitcoin uses edge attributes; you can see some of this done in PySpark in https://github.com/stellargraph/neon/blob/develop/scenarios/scenario-2/Bitcoin_1_Features_Engineering.ipynb in "Add transaction features" where it aggregates the transactions (edges) per day, using the timestamp associated with those edges.

stellargraph/data/converter.py

huonw · 2019-11-18T21:58:36Z

tests/data/test_biased_random_walker.py

-                seed=seed,
-                weighted="unknown",
-                edge_weight_label="weight",
+                nodes=nodes, n=n, p=p, q=q, length=length, seed=seed, weighted="unknown"
            )

        with pytest.raises(ValueError):
            # edge weight labels are by default called weight as is in networkx but they can be any string value if user specified


I think this part of the test is no longer relevant, now that edge weight labels aren't specified in run.

(Also, I think it was never testing what this comment implies it was testing: ValueError will be thrown because of the weighted="unknown", not because of edge_weight_label=None.)

Probably. SEP if you read HHGTTG.

It's definitely the problem of a PR that deletes/moves some functionality to remove outdated tests, otherwise there's just an endless accrual of weird and wacky tests that aren't doing anything useful. Fortunately this particular one isn't too entangled and we can just delete the lines.

Ok, if this is a hang-over from me shifting arguments out of methods into the constructor, then I agree it is my problem. Will review.

Still not convinced I made any changes relevant to this code, but I removed the duplicate test and reworded the comment.

👍 (You deleted the edge_weight_label argument, which this second test was attempting to validate.)

tests/mapper/test_node_mappers.py

adocherty

See comments in code.

adocherty · 2019-11-18T22:28:21Z

stellargraph/core/graph.py

+            return type.__call__(cls, *args, **kwargs)
+
+
+class StellarGraph(metaclass=StellarGraphFactory):


While I like the ability to have different backends to support different underlying data types, I think that it is confusing that instantiating the StellarGraph class gives back a different class e.g. NetworkXStellarGraph.

Options for encapsulation:

Subclassing StellarGraph:

NetworkXStellarGraph

PandasStellarGraph

Use graph class as member
StellarGraph.graph = NetworkXGraphInterface(G)
StellarGraph.graph = PandasGraphInterface(nodes_pd, edges_pd)

Convert all inputs to an underlying format:
StellarGraph.graph = convert_from_nx(G)
StellarGraph.graph = convert_from_pandas(nodes_pd, edges_pd)

I tend to think 2 & 3 are best for users, they see a consistent StellarGraph object. I favour option 3 myself.

I'm happy to go ahead with the MetaClass style for now on the develop branch but I think we should think about this carefully from a user perspective before we release.

adocherty · 2019-11-18T22:29:13Z

stellargraph/core/graph.py

+        Returns:
+             bool: The graph directedness status.
+        """
+        raise NotImplementedError


Yes, I agree. Using the abc module and decorating the functions with @abstractclass would be better.

adocherty · 2019-11-18T22:30:11Z

stellargraph/core/graph.py

+
+    def __call__(cls, *args, **kwargs):
+        if cls is StellarGraph:
+            return NetworkXStellarGraph(*args, is_directed=False, **kwargs)


If the user specifies is_directed in kwargs then this will fail.

e.g.

def test_directed_graph_from_nx(): Gnx = nx.karate_club_graph() > sg = StellarGraph(Gnx, is_directed=False) cls = <class 'stellargraph.core.graph.StellarGraph'>, args = (<networkx.classes.graph.Graph object at 0x13c65ef28>,) kwargs = {'is_directed': False} def __call__(cls, *args, **kwargs): if cls is StellarGraph: > return NetworkXStellarGraph(*args, is_directed=False, **kwargs) E TypeError: StellarGraphFactory object got multiple values for keyword argument 'is_directed' stellargraph/core/graph.py:249: TypeError

I agree, I left it up to the Python infrastructure to complain, since is_directed was left undocumented. The other thing we could do is raise an error saying to not explicitly set is_directed.

I think raising an error if is_directed and the class-specified directedness don't agree would be good.

adocherty · 2019-11-18T23:05:18Z

stellargraph/core/graph.py

+        self._graph = graph
+
+        # Name of optional attribute for edge weights
+        self._edge_weight_label = attr.get("edge_weight_label", "weight")


I think this is an intermediate solution for now. In the future we need to think about something more general that will store arbitrary node & edge features in a memory and performance efficient way.

Also, I think that edge weights should be "just" a special edge feature expected by the random walk algorithms, I don't think that the StellarGraph class needs to know that it is special.

adocherty · 2019-11-18T23:09:30Z

stellargraph/core/graph.py

+    def node_degrees(self) -> Mapping[Any, int]:
+        return self._graph.degree()
+
+    def adjacency_weights(self):


I think this would be better called to_adjacency_matrix, and have an option of using edge weights or not.

There is a conceptual problem here in that this ticket is (supposedly) about making minimal changes to stop StellarGraph inheriting from NetworkX, whereas the subsequent ticket (see #520) is about distilling the StellarGraph interface and making a time- and memory-efficient implementation that does not use NetworkX, including encapsulating edge and node data in a standard way. So some of the concerns, and some of my answers, keep crossing between these two tickets.

This is one of the reasons I've been trying to push ahead with crystallising what the StellarGraph interface currently supports (i.e. everything that NetworkXStellarGraph can do, rightly or wrongly, including create_graph_schema()), in advance of trying to optimise what it actually should do. And I wholeheartedly agree that this approach runs the risk of implementing a suboptimal design, but we're working in the dark otherwise. We all need a good think about this, because I don't have any ready answers.

I agree with your points, but as I understand it this function was introduced to the class in this ticket, or am I mistaken?

The actual function adjacency_weights() was created in this ticket, but the functionality itself was already in use in several places. This change just hid away the fact that the code was using NetworkX-specific structures and calls.

OK, so can we can change the name to to_adjacency_matrix?

adocherty · 2019-11-19T04:35:27Z

stellargraph/data/explorer.py

-                    # even when the seed is set.
-                    adj[et][n1] = sorted(neigh_et, key=str)
-            self.adj_types = adj
+            self.adj_types = adj = self.graph.adjacency_types(self.graph_schema)


I think that this object should be replaced with functionality inside the stellargraph class, this is just used for efficiency in sampling therefore we can use a typed adjacency list (as proposed as MDAL in this document https://docs.google.com/document/d/1ztdDOmIUFerSVu-R0fgKP3vYTTdZBmY9qbSoIXb3NWQ)

Which it is in #520 (see my previous comment).

Agreed. Let's move that to be decided in the next ticket and in future API discussions.

adocherty · 2019-11-19T04:44:13Z

stellargraph/data/explorer.py

            self._raise_error("node {} not in graph".format(node))
-        return list(nx.neighbors(self.graph, node))
+        return list(self.graph.neighbour_nodes(node))


This involves accessing the graph twice (once to check, once to get neighbours). Better to use a try/except:

try: neighbours = list(self.graph.neighbour_nodes(node)) except KeyError: self._raise_error("node {} not in graph".format(node))

adocherty · 2019-11-19T04:52:28Z

stellargraph/data/node_splitter.py

+        if isinstance(G, StellarGraph):
+            nodes = list(G.nodes())
+        else:
+            nodes = list(G)


Doesn't list(G.nodes()) work for NetworkX objects too? I don't think we need the test, just the single line:

nodes = list(G.nodes())

adocherty · 2019-11-19T05:03:29Z

stellargraph/data/unsupervised_sampler.py

+#
+# Copyright 2019 Data61, CSIRO
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.


Nice catch .. how did we miss this?!

adocherty · 2019-11-19T05:30:05Z

stellargraph/core/graph.py

+    def number_of_edges(self) -> int:
+        return self._graph.number_of_edges()
+
+    def nodes(self) -> Iterable[Any]:


We have list(G.nodes()) everywhere we need to get the nodes, as NetworkX returns an iterator. Can we just convert to a list in this method, rather than in most calls to it? This can be part of the API.

The answer is two-fold: 1. Always converting to a list is potentially a waste for a big graph if all we want to do is iterate over the nodes. 2. If we have a complex data-type (e.g. a node-type to node-data mapping), using itertools.chain() or generators is very convenient (see #520).

This is also an example of the design principle to prefer abstract types (e.g. Iterable) over concrete implementations (e.g. list). Part of the 'D' in 'SOLID', I think.

The user having to do list(g.nodes()) is an indicator that they understand they will be allocating additional memory to store the nodes, rather than that choice being forced on them.

Agreed. Although somehow we always end up converting to a list!

Usually because we want to sub-sample the nodes. But I think we occasionally just iterate through the nodes.

geoffj-d61 · 2019-12-11T03:08:43Z

@geoffj-d61

I think we should merge this ASAP. To wrap this up I think we should do the following:

Rename the graph .neighbour_nodes method as .neighbors.

Done.

Merge with develop and fix merge conflicts (mainly around the time/memory benchmarks @huonw has written)

Done.

Change the name of adjacency_weights to to_adjacency_matrix

Done.

Standardize the return type of the .neighbors method. Currently it seems to return a set if it's directed and an iterator otherwise.

Not sure about this one. I created both StellarGraph(gnx) and StellarDiGraph(gnx), and .nodes() gave me a NodeView (which is Iterable) in each case. Where does the set value come into play? Addendum: Doh! It's neighbors() not nodes() you were talking about - will recheck.

Okay, so my observation still stands. For an undirected graph, NetworkX .neighbors() returns a dict_keyiterator that is Iterable, so NetworkXStellarGraph just returns this. For a directed graph, NetworkX returns another dict_keyiterator that only contains the out-nodes, so NetworkXStellarGraph corrects this by also including the in-nodes in a set (to deduplicate in the case of self edges). Since a set is also Iterable, both outputs obey the Iterable interface defined in StellarGraph - the underlying type of the return value is an implementation issue, and users should neither care nor rely on a specific type.

Check all demos to ensure they are working.

Checked - okay.

Check the benchmarks checked in by @huonw on develop and on this branch to check for any major performance issues.

Checked - not happy with results.
I've run the benchmarks for both develop and this branch, and this branch seems slower (don't know why). I will rerun both tests after lunch, first shutting down everything non-essential, and put the comparison in here.

After this, I'm good with merging.

geoffj-d61 · 2019-12-11T05:43:45Z

         develop                       drop-networkx
       Mean      (StdDev)            Mean      (StdDev)
-------------------------------------------------------------------------------------------------------------------------------
test_benchmark_biasedrandomwalk
           63.2991 (    9.0200)          75.2602 (    14.4769)
test_benchmark_uniformrandomwalk
           70.4323 (   14.1472)          78.3540 (    10.7312)
test_benchmark_uniformrandommetapathwalk
           97.3489 (   20.6781)         128.1268 (    16.6644)
test_benchmark_bfs_walk
          164.7183 (   17.4904)         179.8372 (    17.1532)
test_benchmark_biasedweightedrandomwalk
          304.9791 (   33.4000)         421.5909 (    64.5393)
test_benchmark_sampledheterogeneousbreadthfirstwalk
          406.8566 (   43.5902)         409.1014 (    43.4972)
test_benchmark_bfs_walk
        1,471.4640 (   97.9828)       1,330.2763 (   150.8902)
-------------------------------------------------------------------------------------------------------------------------------
test_allocation_benchmark_creation_from_networkx[None-0-0]
        1,260.8000 (  288.3553)       1,688.4000 (   264.1782)
test_allocation_benchmark_creation_from_networkx[100-0-0]
        1,529.2000 (  112.6978)       1,872.8000 (    75.1319)
test_allocation_benchmark_creation_from_networkx[None-100-200]
      137,684.8000 (4,196.4479)     138,900.8000 ( 4,166.7269)
test_allocation_benchmark_creation_from_networkx[100-100-200]
      227,676.4000 (4,211.7674)     230,750.0000 ( 4,075.8318)
test_allocation_benchmark_creation_from_networkx[None-1000-5000]
    3,171,509.6000 (4,103.9143    3,178,549.6000 ( 4,099.7130)
test_allocation_benchmark_creation_from_networkx[100-1000-5000]
    4,010,263.2000 (4,106.4258)   4,011,485.6000 ( 3,996.6412)
-------------------------------------------------------------------------------------------------------------------------------
test_benchmark_get_neighbours
           21.7508 (    6.3989)          19.0489 (     3.5435)
-------------------------------------------------------------------------------------------------------------------------------
test_benchmark_get_features[specify-4]
           26.0074 (    6.4494)          25.8620 (     3.3190)
test_benchmark_get_features[specify-1]
           28.2127 (    6.8341)          28.5375 (     8.4647)
test_benchmark_get_features[infer-4]
           67.2492 (   17.3558           68.9817 (    16.6034)
test_benchmark_get_features[infer-1]
           67.2902 (   10.9388)          73.6866 (    20.1374)
-------------------------------------------------------------------------------------------------------------------------------
test_benchmark_setup_generator_small
            2.6805 (    0.2293)           2.6566 (     0.1555)
test_benchmark_node_generator_small
           94.3042 (    2.9019           97.3943 (     3.4163)
test_benchmark_link_generator_small
           94.6126 (    0.7385)          98.4902 (     0.5390)
test_benchmark_setup_generator_large
          121.5673 (    2.2757)         130.5395 (    13.5090)
test_benchmark_node_generator_large
          120.6086 (    0.7242)         126.7655 (     1.2720)
test_benchmark_link_generator_large
          124.5168 (    0.1740)         130.4071 (     0.1578)
-------------------------------------------------------------------------------------------------------------------------------

geoffj-d61 · 2019-12-11T06:01:07Z

@adocherty , still not sure about the Set vs Iterable problem (note that a set is also iterable).
Have added the benchmark comparisons - could you see if you are okay with them?

huonw · 2019-12-11T23:26:57Z

The deltas seem like pretty close (within a standard deviation, at least), except for some of the node_generator and link_generator ones. Do you suspect there's something deeper going on?

geoffj-d61 · 2019-12-11T23:37:15Z

The deltas seem like pretty close (within a standard deviation, at least), except for some of the node_generator and link_generator ones. Do you suspect there's something deeper going on?

My first runs had every timing benchmark and memory allocation higher for drop-networkx than for develop. Which is why I shut down everything else to rerun the tests. Plus I did some twice just in case switching between branches was causing Python recompilation.

These comparisons don't seem too bad, though still disappointingly slow in some cases.

huonw · 2019-12-11T23:45:44Z

My first runs had every timing benchmark and memory allocation higher for drop-networkx than for develop

Ah, I see. I'm glad the quiet run was better 👍

These comparisons don't seem too bad, though still disappointingly slow in some cases.

Could you clarify what you mean by this? My understanding is that this patch is not expected to make any performance difference, other than potentially introducing fractionally more time overhead due to having to do some extra function calls and member look-ups, and fractionally more memory use due to having the extra wrapper object.

geoffj-d61 · 2019-12-11T23:51:05Z

As far as I recall, this branch mostly just shifted existing functionality from an implementation class called StellarGraph to a new implementation class called NetworkXStellarGraph. As a consequence, I wouldn't have expected timing benchmarks to change. Allocation I expected to maybe be a little higher due to the new interface, etc.

So my concern is I don't know why some tests seem slower on average.

adocherty · 2019-12-12T02:11:19Z

@adocherty , still not sure about the Set vs Iterable problem (note that a set is also iterable).
Have added the benchmark comparisons - could you see if you are okay with them?

Sorry, not iteratable but iterator. Undirected graphs return an iterator from NetworkX; however, for directed graphs a set is returned (which is iterable but not an iterator). This is done when you collect the set of in and out nodes over the edges for directed graphs. Note that using iter on a set will return an iterator similar to what NetworkX returns:

import stellargraph as sg
import networkx as nx

g = nx.erdos_renyi_graph(100, 0.1)

Undirected graph:

gs=sg.StellarGraph(g)
gs.neighbors(2)

output:

<dict_keyiterator object at 0x137d15638>

Directed graph:

gs=sg.StellarDiGraph(g)
print(gs.neighbors(2))
type(gs.neighbors(2))

output:

{97, 98, 3, 41, 78, 51, 60}
set

I think what you want to return to have more consistency in return types is this:

iter(gs.neighbors(2))

output:

<set_iterator at 0x137c63af8>

adocherty · 2019-12-12T02:16:29Z

As far as I recall, this branch mostly just shifted existing functionality from an implementation class called StellarGraph to a new implementation class called NetworkXStellarGraph. As a consequence, I wouldn't have expected timing benchmarks to change. Allocation I expected to maybe be a little higher due to the new interface, etc.

So my concern is I don't know why some tests seem slower on average.

Most of the tests seem within the margin of variation expected by the std dev. There are a few that seem slower. I assume this is due to the abstraction overhead caused by the new class performing more tests and then calling NetworkX. I'm not sure they are that much worse that we need to worry.

geoffj-d61 · 2019-12-12T02:20:26Z

In my tests, isinstance(g.neighbors(node), Iterable) returns True for both directed and undirected graphs. Specifically, dict_keyiterator is also iterable, as a for n in g.neighbors(node) also verifies.

adocherty · 2019-12-12T03:53:10Z

In my tests, isinstance(g.neighbors(node), Iterable) returns True for both directed and undirected graphs. Specifically, dict_keyiterator is also iterable, as a for n in g.neighbors(node) also verifies.

They are still different return types even though they are both iterable. It would be better to return the same object type regardless of if it's directed or not. Specifically, isinstance(g.neighbors(node), Iterator) will not be True for both. The fact that sets and iterators can both be iterated over is not the point.

My point here is: wrap sets returned in out_nodes and in_nodes with the iter function then we will have the same return type (Iterator) for both directed and undirected graphs.

geoffj-d61 · 2019-12-12T05:05:51Z

They are still different return types even though they are both iterable. It would be better to return the same object type regardless of if it's directed or not. Specifically, isinstance(g.neighbors(node), Iterator) will not be True for both. The fact that sets and iterators can both be iterated over is not the point.

I disagree, I'm afraid. If the interface says you will get something that can be iterated over, and you get something back that can be iterated over, the the interface is satisfied; the actual return type is irrelevant as far as functionality is concerned. I don't see that consistency of return type matters.
If you really want it we can have it, but I really don't want to change the interface from Iterable to Iterator to match - that would be a retrograde step in my book (as it reduces the level of abstraction).

@youph , you're the product owner, so you can make the call.

My point here is: wrap sets returned in out_nodes and in_nodes with the iter function then we will have the same return type (Iterator) for both directed and undirected graphs.

Not sure if you mean that in_nodes() and out_nodes() should return iter(set) or that the union of the sets for a directed graph should be wrapped with the iter? The latter is doable; the former would mess up doing unions.

geoffj-d61 · 2019-12-12T05:11:14Z

@youph , if you want a check added that a NetworkX graph has been supplied, then this logically should go into the implementation NetworkXStellarGraph. It doesn't make sense to be in the StellarGraphFactory, otherwise we have to have an extra dependency in the latter on NetworkX, even if (e.g. in the future) we never supply a NetworkX object for initialisation.

So, I can add a check in NetworkXStellarGraph if you think that is useful, otherwise initialisation will just break on its own.

geoffj-d61 · 2019-12-12T05:18:53Z

@youph , @adocherty , I've added a note in CHANGELOG.md about this being a breaking change - is that the right place to put it?

adocherty · 2019-12-12T21:16:36Z

I disagree, I'm afraid. If the interface says you will get something that can be iterated over, and you get something back that can be iterated over, the the interface is satisfied; the actual return type is irrelevant as far as functionality is concerned. I don't see that consistency of return type matters.
If you really want it we can have it, but I really don't want to change the interface from Iterable to Iterator to match - that would be a retrograde step in my book (as it reduces the level of abstraction).

This was a pretty minor point, but somehow we have ended up arguing about it for far too long.

Not sure if you mean that in_nodes() and out_nodes() should return iter(set) or that the union of the sets for a directed graph should be wrapped with the iter? The latter is doable; the former would mess up doing unions.

You are right – it's difficult to return an Iterator for in and out nodes with the current implementation.

You have addressed my other issues, and this seems to be an important point for you, so I'm OK to go ahead with it this way.

geoffj-d61 · 2019-12-12T22:38:47Z

You have addressed my other issues, and this seems to be an important point for you, so I'm OK to go ahead with it this way.

Cheers!

youph · 2019-12-19T07:16:33Z

@geoff-d61 @adocherty I just discovered that this PR breaks the interpretability demos, e.g., this one: stellargraph/demos/interpretability/gcn/node-link-importance-demo-gcn.ipynb, since they call nx.ego_graph method on the SG object, resulting in this error:
G_ego = nx.ego_graph(G, target_nid, radius=len(gcn.activations))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-3e6b0d6ab545> in <module>
----> 1 G_ego = nx.ego_graph(G, target_nid, radius=len(gcn.activations))

~/anaconda3/envs/stellargraph/lib/python3.6/site-packages/networkx/generators/ego.py in ego_graph(G, n, radius, center, undirected, distance)
     64                                               weight=distance)
     65         else:
---> 66             sp = dict(nx.single_source_shortest_path_length(G, n, cutoff=radius))
     67 
     68     H = G.subgraph(sp).copy()

~/anaconda3/envs/stellargraph/lib/python3.6/site-packages/networkx/algorithms/shortest_paths/unweighted.py in single_source_shortest_path_length(G, source, cutoff)
     59     shortest_path_length
     60     """
---> 61     if source not in G:
     62         raise nx.NodeNotFound('Source {} is not in G'.format(source))
     63     if cutoff is None:

TypeError: argument of type 'NetworkXStellarGraph' is not iterable

How do we fix this?
Currently the interpretability demos don't work.
One way to fix this would be to replace G with Gnx in the above call.

UPDATE: issue #570, PR #571

These were missed in #501 (and #502, after merging with #501) which renamed the `StellarGraphBase` type to `StellarGraph`. See: #632

geoffj-d61 added 8 commits October 1, 2019 13:06

Step 1 - favour composition over inheritance.

1a39104

Step 2, expose interface and hide implementation - mostly.

9e3eaea

Removed explicit dependency on NetworkX.

521175a

Use original NetworkX graph for NetworkX functionality.

2f53991

Use original NetworkX graph for NetworkX functionality.

f904094

Merge branch 'develop' into feature/297-drop-networkx-inheritance

b17d4a4

Fixed tests that were broken after merge from develop.

e4b7d3b

Removed StellarGraph to NetworkX conversion method; use hack for now.

89f4c70

geoffj-d61 added the ml label Oct 10, 2019

geoffj-d61 assigned adocherty and geoffj-d61 Oct 10, 2019

codeclimate bot reviewed Oct 10, 2019

View reviewed changes

geoffj-d61 added 3 commits October 24, 2019 09:36

Merge branch 'develop' into feature/297-drop-networkx-inheritance

3a010e5

Fixed merge conflict

0a5c091

Merge branch 'develop' into feature/297-drop-networkx-inheritance

f62714d

youph requested review from daokunzhang, adocherty and kieranricardo November 4, 2019 03:43

geoffj-d61 added 4 commits November 8, 2019 13:41

Merged from develop with fixes.

ebeba26

Missed one node -> nodes call - fixed.

48c53b1

Merge branch 'develop' into feature/297-drop-networkx-inheritance

3a3390f

Merge branch 'develop' into feature/297-drop-networkx-inheritance

ee12c46

huonw reviewed Nov 18, 2019

View reviewed changes

geoffj-d61 added 4 commits November 19, 2019 11:55

Fixed test comment.

f325c75

Added assertion on special NetworkX test.

52ed545

Use set mutation.

d8afc00

Reblacked code.

172ea05

adocherty approved these changes Nov 19, 2019

View reviewed changes

PyCharm missed one renaming.

41ffc87

geoffj-d61 requested review from adocherty and removed request for adocherty, kieranricardo and daokunzhang December 11, 2019 05:52

Merge branch 'develop' into feature/297-drop-networkx-inheritance

c964dc5

Updated changelog with a breaking change.

e70e637

geoffj-d61 merged commit fba856c into develop Dec 12, 2019

youph mentioned this pull request Dec 19, 2019

Fix broken interpretability demos #570

Closed

5 tasks

huonw deleted the feature/297-drop-networkx-inheritance branch January 3, 2020 06:17

huonw mentioned this pull request Jan 9, 2020

Include StellarGraph in the documentation of the core module #633

Merged

huonw added a commit that referenced this pull request Jan 9, 2020

Include StellarGraph in the documentation of the core module (#633)

51013fb

These were missed in #501 (and #502, after merging with #501) which renamed the `StellarGraphBase` type to `StellarGraph`. See: #632

huonw mentioned this pull request May 1, 2020

FullBatchGenerator still references StellarGraphBase in the Docs #1436

Closed

		return type.__call__(cls, args, *kwargs)


		class StellarGraph(metaclass=StellarGraphFactory):

Replace inheritance of NetworkX by encapsulation #501

Replace inheritance of NetworkX by encapsulation #501

Conversation

geoffj-d61 commented Oct 10, 2019

review-notebook-app bot commented Oct 10, 2019

codeclimate bot Oct 10, 2019

Choose a reason for hiding this comment

codeclimate bot Oct 10, 2019

Choose a reason for hiding this comment

codeclimate bot Oct 10, 2019

Choose a reason for hiding this comment

This comment was marked as off-topic.

This comment was marked as off-topic.

codeclimate bot commented Oct 10, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

huonw Nov 18, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adocherty Nov 22, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adocherty left a comment • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geoffj-d61 Nov 19, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geoffj-d61 Nov 19, 2019 • edited

Choose a reason for hiding this comment

geoffj-d61 Nov 19, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geoffj-d61 commented Dec 11, 2019 • edited

geoffj-d61 commented Dec 11, 2019 • edited

geoffj-d61 commented Dec 11, 2019

huonw commented Dec 11, 2019

geoffj-d61 commented Dec 11, 2019

huonw commented Dec 11, 2019

geoffj-d61 commented Dec 11, 2019

adocherty commented Dec 12, 2019 • edited

adocherty commented Dec 12, 2019

geoffj-d61 commented Dec 12, 2019 • edited

codeclimate bot commented Oct 10, 2019 •

edited

huonw Nov 18, 2019 •

edited

adocherty Nov 22, 2019 •

edited

adocherty left a comment •

edited

geoffj-d61 Nov 19, 2019 •

edited

geoffj-d61 Nov 19, 2019 •

edited

geoffj-d61 Nov 19, 2019 •

edited

geoffj-d61 commented Dec 11, 2019 •

edited

geoffj-d61 commented Dec 11, 2019 •

edited

adocherty commented Dec 12, 2019 •

edited

geoffj-d61 commented Dec 12, 2019 •

edited

adocherty commented Dec 12, 2019 •

edited

adocherty commented Dec 12, 2019 •

edited

youph commented Dec 19, 2019 •

edited