
Refactor connectivity package #1126

Merged: 24 commits merged into networkx:master from refactor-connectivity on May 20, 2014

Conversation

@jtorrents (Member)

Use the new algorithms and the new interfaces provided by the flow package. I think that the code is much clearer using the new interface, and the new algorithms and interface provide a significant speedup in running time. For some problems, this code is one order of magnitude faster (see #1102 for benchmarks). So far the code is still backwards compatible, but we should discuss some possible non-backwards-compatible improvements. I have some doubts about the implementation and documentation that I'd like to discuss.

  1. In this first version I've added the flow_func parameter to all connectivity functions. Being able to use several flow algorithms is great for tests. Also, the two algorithms that make sense in this scenario (edmonds_karp and shortest_augmenting_path) perform better in different scenarios: the former is faster in very sparse networks with power-law-like degree distributions, and the latter in denser networks with edges more evenly distributed among nodes. Thus it makes sense to allow users to pick an algorithm. The implementation takes care of using the optimal parameters for these two algorithms (cutoff for both, and two_phase for shortest_augmenting_path). So far I've set the default flow function to edmonds_karp because it is faster in a wide range of contexts. I'll prepare more detailed benchmarks comparing these two algorithms.
  2. Much of the speedup comes from reusing the data structures needed for the underlying maximum flow computations (the residual network) and for local node|edge connectivity (the auxiliary digraph). I think we should explain this to users and show them how they can do the same in their code. I've tried to do that in the docstrings of local_node_connectivity and local_edge_connectivity. Do you think that we should document how to reuse the data structures?
  3. The connectivity algorithms rely on two data structures (residual and auxiliary) that are conceptually different but could be merged into a single (more complex) data structure. This does not simplify the code much (we get rid of a parameter in function calls and one line of code each time we initialize the data structures), and I think it makes it harder to understand what the code is actually doing. So I'd prefer to keep them separate, but I'm open to merging them.
  4. These changes are, so far, backwards compatible, but I think we can improve the interface: a user who is only interested in computing node|edge cuts|connectivity will be well served by node_connectivity, edge_connectivity, minimum_node_cut and minimum_edge_cut. These functions support computing these measures for the whole graph and also for two nodes, so I think these are the functions we should import into the base NetworkX namespace. The other functions (local_* and minimum_st_*_cuts) are more specialized and should be used by users interested in building connectivity algorithms themselves. These functions accept all parameters of the flow interface, plus some of their own, in order to allow reuse of data structures and achieve a significant speedup. So I think we could keep these functions in the connectivity package and require users who want them to import them explicitly from there.
  5. I was tempted to remove some connectivity functions, such as all_pairs_node_connectivity_matrix, because it seems easy to write the few lines of code required to compute them. However, I'm a bit hesitant because implementations that do not reuse the data structures will be a lot slower than the version we provide here. Maybe this could be solved with better documentation.
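To make points 1 and 2 concrete, here is a minimal sketch of picking a flow algorithm via flow_func and reusing the auxiliary digraph and residual network across many local computations. Function names follow the connectivity and flow packages; the exact signatures shown are an assumption and may differ between NetworkX versions.

```python
import itertools

import networkx as nx
from networkx.algorithms.connectivity import (
    build_auxiliary_node_connectivity,
    local_node_connectivity,
)
from networkx.algorithms.flow import build_residual_network, shortest_augmenting_path

G = nx.icosahedral_graph()  # 5-regular and 5-connected

# Point 1: choose the flow algorithm explicitly.
k = nx.node_connectivity(G, flow_func=shortest_augmenting_path)

# Point 2: build the auxiliary digraph and residual network once,
# then reuse them for every pair instead of rebuilding each time.
H = build_auxiliary_node_connectivity(G)
R = build_residual_network(H, 'capacity')
pairwise = {
    (u, v): local_node_connectivity(G, u, v, auxiliary=H, residual=R)
    for u, v in itertools.combinations(G, 2)
}
```

The reuse pattern is exactly what makes all-pairs computations tractable: the expensive setup happens once rather than once per pair.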

Use the new algorithms and the new interfaces provided by the flow package.
The code is much clearer, and the new algorithms and interface provide
a significant speed up in running time. For some problems, this code is one order
of magnitude faster (see networkx#1102 for benchmarks). So far the code is still
backwards compatible, but we should discuss some possible non-backwards-compatible
improvements.
@jtorrents (Member Author)

Some benchmarks to show the strong and weak points of edmonds_karp and shortest_augmenting_path. The legacy ford_fulkerson implementation is not included because it is too slow, but it implements the same algorithm as edmonds_karp.

Run times in seconds:

Graph               edmonds_karp   SAP          edmonds_karp   SAP
                    (node-conn)    (node-conn)  (edge-conn)    (edge-conn)
-----------------   ------------   -----------  ------------   -----------
Gnp(200, 0.2)             2.858        2.859         0.099        0.139
Gnp(200, 0.5)            97.461       50.318         0.261        0.287
Gnp(200, 0.7)           208.808      115.520         0.327        0.396
-----------------   ------------   -----------  ------------   -----------
Powerlaw(1000, 2)         2.125       13.560         0.653        3.254
Powerlaw(2000, 2)        12.314       80.617         2.907       17.390
Powerlaw(3000, 2)        31.563      233.510         7.429       49.709

As you can see, each algorithm is strong in different contexts. Thus we should allow the user to select one via the flow_func parameter. I think that edmonds_karp is better as a default because the networks that I care about are very sparse with skewed degree distributions, but I'm open to discussing that.

@ysitu (Contributor) commented Apr 29, 2014

I have yet to look at the code. Did you enable cutoff in your tests?

@jtorrents (Member Author)

Yes, in the results above both algorithms use cutoff (in this version of the code both algorithms always use cutoff; at least that was my intent).

It must be said that the run times posted above were measured on a considerably slower machine (Xeon X5650) than those I posted in #1102 (Haswell i7-4700QM). I see that this can be confusing, because the run times in #1102 do not use cutoff but are (only slightly!) faster than the figures presented here.

@jtorrents (Member Author)

The code that I'm using for the benchmarks is:

import time
import networkx as nx
from networkx.utils import powerlaw_sequence, create_degree_sequence

flow_funcs = dict(
    edmonds_karp=nx.edmonds_karp,
    shortest_ap=nx.shortest_augmenting_path,
)

def build_power_law(n, exponent=2.0):
    # Largest connected component of a configuration model graph with a
    # power-law degree sequence and no self-loops (NetworkX 1.x APIs).
    deg_seq = create_degree_sequence(n, powerlaw_sequence, 100, exponent=exponent)
    G = nx.Graph(nx.configuration_model(deg_seq))
    G.remove_edges_from(G.selfloop_edges())
    G = sorted(nx.connected_component_subgraphs(G), key=len, reverse=True)[0]
    G.name = 'Power law configuration model: {0}'.format(n)
    return G

def benchmark_connectivity():
    graphs = []
    for p in [0.2, 0.5, 0.7]:
        G = nx.fast_gnp_random_graph(200, p)
        graphs.append(G)
    for n in [1000, 2000, 3000]:
        G = build_power_law(n)
        graphs.append(G)
    for G in graphs:
        print(nx.info(G))
        print("Computing node connectivity")
        for fname, flow_func in sorted(flow_funcs.items()):
            start = time.time()
            k = nx.node_connectivity(G, flow_func=flow_func)
            end = time.time() - start
            print(" " * 4 + "{0}:\t{1:.3f} seconds".format(fname, end))
        print("Computing edge connectivity")
        for fname, flow_func in sorted(flow_funcs.items()):
            start = time.time()
            k = nx.edge_connectivity(G, flow_func=flow_func)
            end = time.time() - start
            print(" " * 4 + "{0}:\t{1:.3f} seconds".format(fname, end))

if __name__ == '__main__':
    benchmark_connectivity()

@jtorrents (Member Author)

Hmm, it seems that Travis is failing while trying to install coverage. I'll open an issue.

@hagberg added this to the networkx-1.9 milestone May 2, 2014
1. The node mapping needed for node connectivity and minimum node cuts
is now a graph attribute of the auxiliary digraph. Thus there is no
need for a mapping parameter in the local version of these functions.

2. Change the parameter name for the auxiliary digraph from `aux_digraph`
to `auxiliary`. Also be consistent about the auxiliary digraph variable name
in the code. Now it is always `H`.

3. Added a small sanity check for the auxiliary digraph for node connectivity.
If a digraph is passed as a parameter for reuse, check that it has a graph
attribute with the node mapping. If not, we raise instead of rebuilding the
auxiliary digraph.
With the addition of the example of how to compute local node connectivity
among all pairs of nodes reusing the data structures (added in the previous
commit in the docstrings of local_node_connectivity), we can remove this
function (which is the only one that has a numpy dependency in the
connectivity package).
This change significantly improves the speed of node_connectivity in
denser graphs. For very sparse ones it increases speed by ~5%.
This change is backwards incompatible. Updated docstrings for all affected
functions with example usage. The global functions provide a good enough
interface for most uses of connectivity algorithms. More sophisticated uses
require explicit imports from the flow package anyway.
@jtorrents (Member Author)

The failure in Python 3.2 is unrelated to this PR; it seems to be caused by functions that use scipy:
https://travis-ci.org/networkx/networkx/jobs/24348012#L2166

I've implemented some changes (a few backwards incompatible) in the commits above:

  1. Improved auxiliary digraph for connectivity functions: the node mapping needed for node connectivity and minimum node cuts is now a graph attribute of the auxiliary digraph. Thus there is no need for a mapping parameter in the local version of these functions. Also changed the parameter name for the auxiliary digraph from aux_digraph to auxiliary.
  2. Added examples of reusing data structures in the local version of connectivity and cut functions, and improved docstrings for all functions.
  3. Improved cutoff handling in node_connectivity, based on the fact that node connectivity is bounded by the minimum degree. I only did this change for edge connectivity in the first commit. This change significantly improves the speed of node_connectivity in denser graphs (in my tests I see ~20% less time in networks with density 0.7; I should probably update the benchmarks). For very sparse networks the time reduction is ~5% or less.
  4. Removed the all_pairs_node_connectivity_matrix function: with the addition of the example of how to compute local node connectivity among all pairs of nodes reusing the data structures, we can remove this function (which is the only one that has a numpy dependency in the connectivity package).
  5. Removed local connectivity/cut functions from the base namespace: this change is backwards incompatible, but I think it is worth it. Also updated docstrings for all affected functions with example usage. The global functions provide a good enough interface for most uses of connectivity algorithms. More sophisticated uses require explicit imports from the flow package anyway.
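A sketch of change 1: the node mapping now travels with the auxiliary digraph as a graph attribute (the attribute name 'mapping' follows the connectivity utilities), so callers no longer pass a separate mapping parameter. The surrounding example graph is illustrative.

```python
import networkx as nx
from networkx.algorithms.connectivity import build_auxiliary_node_connectivity

G = nx.petersen_graph()
H = build_auxiliary_node_connectivity(G)

# The mapping is stored on the auxiliary digraph itself. Each original
# node is split into an 'A' node and a 'B' node joined by a
# unit-capacity edge, so H has twice as many nodes as G.
mapping = H.graph['mapping']
```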

H.add_edge('%dA' % i, '%dB' % i, capacity=1)

edges = []
for (source, target) in G.edges():
Contributor (inline review comment):

G.edges_iter()

@ysitu (Contributor) commented May 11, 2014

Save for some minor issues, I think that this is okay.

@jtorrents (Member Author)

Thanks for looking at this @ysitu! I've made the changes that you suggested. I'm now using the nice v = min(G, key=G.degree) way of selecting a node with minimum degree. I'm posting updated benchmarks with the latest changes. Using the fact that connectivity is bounded by degree significantly increases performance on dense networks.

I think that this is ready for merging. The interfaces to connectivity algorithms are only slightly backwards incompatible (the mapping parameter is gone and local functions are not imported into the base namespace), but exposing and using the new interfaces to flow algorithms is a big improvement. I did not run detailed benchmarks of this PR against the code in 1.8.1 that uses the legacy ford_fulkerson, but for some problems (sparse networks with skewed degree distributions) it is 10x faster (or more).

Run times in seconds:

Graph               edmonds_karp   SAP          edmonds_karp   SAP
                    (node-conn)    (node-conn)  (edge-conn)    (edge-conn)
-----------------   ------------   -----------  ------------   -----------
Gnp(200, 0.2)             1.946        2.520         0.064        0.097
Gnp(200, 0.5)            65.678       39.659         0.157        0.195
Gnp(200, 0.7)           150.120       94.216         0.215        0.265
-----------------   ------------   -----------  ------------   -----------
Powerlaw(1000, 2)         1.685       12.038         0.470        2.348
Powerlaw(2000, 2)        10.456       79.087         2.185       15.667
Powerlaw(3000, 2)        25.963      212.799         6.667       43.309
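The degree-bound trick can be sketched as follows; the min(G, key=G.degree) idiom is the one adopted in the PR, while the surrounding code is just an illustration on an example graph.

```python
import networkx as nx

G = nx.petersen_graph()  # 3-regular, node connectivity 3

# Node connectivity is bounded above by the minimum degree, so a
# minimum-degree node is a good source node and its degree a natural
# upper bound (cutoff) for the underlying flow computations.
v = min(G, key=G.degree)
bound = G.degree(v)
k = nx.node_connectivity(G)
```

Once a flow computation reaches the bound, further augmentation cannot change the answer, which is why cutting off early pays most on dense graphs.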

nx.set_edge_attributes(H, 'capacity', capacity)
return H
else:
H = G.to_directed()
Contributor (inline review comment):
to_directed/to_undirected will also end up deep-copying user data of unknown size/copyability. The proper fix is to make them and copy accept a data=False argument. But that belongs in another PR.

for (source, target) in G.edges_iter():
H.add_edges_from([(source, target), (target, source)])
capacity = dict((e, 1) for e in H.edges())
nx.set_edge_attributes(H, 'capacity', capacity)
Contributor (inline review comment):

Same as above.

@jtorrents (Member Author)

> Nowhere in the documentation is it mentioned that the graph can be capacitated. The user will be very surprised to see a NetworkXUnbounded due to an uncapacitated edge.

Well, if a user gets a NetworkXUnbounded exception, then it is a bug on our side, because what has to be capacitated is the auxiliary digraph, not the input graph that the user passes as an argument.

All the connectivity and cut functions are supposed to work on graphs with capacity == 1 for all edges. The only exception is minimum_st_edge_cut, which, because it uses the new minimum_cut interface, is able to compute weighted cuts. This is the only function that has a capacity parameter. I'll try to clarify its docstrings.

I'll do the changes that you propose shortly, and will also check the docstrings to make sure that we have no back-ticks missing.

@ysitu (Contributor) commented May 13, 2014

If the user specifies the capacities of some but not all edges, they will likely get a NetworkXUnbounded when they use the edge connectivity/cut functions.

@jtorrents (Member Author)

Oh, I see. You are right; I'll fix that. We should always build the auxiliary network with unit capacities, even if the graph passed to build_auxiliary_edge_connectivity already has a capacity edge attribute.
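A sketch of the fix, based on how build_auxiliary_edge_connectivity behaves in the connectivity utilities: the auxiliary digraph gets unit capacities on every edge regardless of any capacity attribute on the input graph.

```python
import networkx as nx
from networkx.algorithms.connectivity import build_auxiliary_edge_connectivity

G = nx.cycle_graph(4)
G[0][1]['capacity'] = 7  # a partial, user-set capacity on the input graph

# The auxiliary digraph is always built with capacity=1 on every edge
# (two directed edges per undirected edge), so the edge connectivity
# routines cannot run into NetworkXUnbounded.
H = build_auxiliary_edge_connectivity(G)
```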

All connectivity and cut algorithms are supposed to work on unit
capacity networks. minimum_st_edge_cut was the only exception: because
it uses the new interface to flow algorithms, it is potentially able to
compute weighted cutsets. However, this complicated the implementation
and could result in a NetworkXUnbounded exception if a user passed a
graph with some but not all edges carrying an attribute called capacity.
Since minimum_cut computes weighted cuts, there is no need to duplicate
that functionality here.
Always build the auxiliary network and simplify the building of the
auxiliary digraph.
@jtorrents (Member Author)

After looking at the problem that @ysitu pointed out, I think that the cleanest option is not to allow weighted computations in minimum_st_edge_cut. All connectivity and cut algorithms are supposed to work on unit capacity networks. minimum_st_edge_cut was the only exception: it is potentially able to compute weighted cutsets because it uses the new interface to flow algorithms. However, this complicated the implementation and could result in a NetworkXUnbounded exception if a user passed a graph with some but not all edges carrying an attribute called capacity. Since minimum_cut computes weighted cuts, there is no need to duplicate that functionality here.

Also improved the generation of the auxiliary digraph for edge connectivity and cleaned up the docstrings.
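Since weighted cutsets are out of scope for minimum_st_edge_cut after this change, users who need them can go through the flow package's minimum_cut directly. A minimal sketch on a small illustrative graph:

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge('s', 'a', capacity=3)
G.add_edge('a', 't', capacity=2)
G.add_edge('s', 'b', capacity=1)
G.add_edge('b', 't', capacity=4)

# minimum_cut handles capacitated (weighted) graphs and returns the
# cut value together with the node partition induced by the cut.
cut_value, (reachable, non_reachable) = nx.minimum_cut(G, 's', 't')
```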

@chebee7i (Member)

@jtorrents How common is it to want to calculate node connectivity between all pairs?

If it is quite common, I'd lean towards including a simple function that does this for the user. One general complaint about Python libraries (e.g., in comparison to R) is that they sometimes tend to be too low-level. The example you provided is >10 lines, right? It involves the use of itertools, auxiliary digraphs, and residual networks. Users who only want node connectivity between pairs and don't care to learn the NetworkX implementation will thank you for being able to calculate it in one line.

@jtorrents (Member Author)

Good point @chebee7i. We could add this function again. However, I think that computing node connectivity between all pairs is not very common, because it is quite a slow computation; even for moderately sized problems it will not be practical. In fact, flow-based connectivity algorithms are based on clever ways to avoid computing a minimum cut among all pairs of nodes.

However, I agree that if a user needs to compute node connectivity among all pairs, they would otherwise have to dive into implementation details... and that might not be as pleasant for them as it is for us ;). So I'll add it again. I'm not sure which data structure would be best for the result. The previous version returned a 2d numpy array, but I'm thinking that a plain old dict might do. What do you think?

@chebee7i (Member)

I'm fine with either. The NumPy array is more efficient, but it is a dependency. So dict works.

1. Change function name from all_pairs_node_connectivity_matrix to
all_pairs_node_connectivity.

2. The function now returns a dict instead of a numpy 2d array.

3. New parameter nbunch for computing node connectivity only among
pairs of nodes in the container nbunch.

4. Added old and new tests for all_pairs_node_connectivity to
test_connectivity.py.
@jtorrents (Member Author)

I've added the function all_pairs_node_connectivity_matrix again, with some modifications:

  1. Changed the function name from all_pairs_node_connectivity_matrix to
    all_pairs_node_connectivity.
  2. The function now returns a dict instead of a numpy 2d array.
  3. New parameter nbunch for computing node connectivity only among
    pairs of nodes in nbunch.
  4. Added old and new tests for all_pairs_node_connectivity to
    test_connectivity.py.
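Usage of the reinstated function, per the list above (dict-of-dicts result and the nbunch restriction); the example graph is illustrative:

```python
import networkx as nx

G = nx.petersen_graph()

# Dict of dicts: K[u][v] is the local node connectivity between u and v.
K = nx.all_pairs_node_connectivity(G)

# Restrict the computation to pairs drawn from a container of nodes.
K_sub = nx.all_pairs_node_connectivity(G, nbunch=[0, 1, 2])
```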

@jtorrents (Member Author)

Almost forgot to comment that I've added the function all_pairs_node_connectivity to the base NetworkX namespace. Not sure if this is necessary. We could also keep it in the connectivity package and require an explicit import.

@ysitu (Contributor) commented May 15, 2014

Also need to put it in the Sphinx source.

@jtorrents (Member Author)

Added all_pairs_node_connectivity to the Sphinx sources. Also added the functions for building auxiliary digraphs to the package documentation. And made a small fix in the stoer_wagner docstrings: only the first line shows up as the summary of the function in the generated documentation.

@ysitu (Contributor) commented May 16, 2014

How about adding a test or two to check edge_connectivity against stoer_wagner?

@jtorrents (Member Author)

Good idea @ysitu! I've added a test that uses several platonic graphs to check edge connectivity against stoer_wagner.
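A sketch of such a cross-check (the graph choices here are illustrative, not necessarily the ones added in the PR's test suite):

```python
import networkx as nx

# For an undirected graph, edge connectivity equals the global minimum
# cut, so the flow-based result should match stoer_wagner's cut value.
for G in (nx.tetrahedral_graph(), nx.octahedral_graph(), nx.icosahedral_graph()):
    cut_value, partition = nx.stoer_wagner(G)
    assert nx.edge_connectivity(G) == cut_value
```

stoer_wagner treats missing edge weights as 1, so it works directly on these unweighted graphs.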

@jtorrents (Member Author)

Any other comment on this?

ysitu added a commit that referenced this pull request May 20, 2014
@ysitu merged commit 4017b0c into networkx:master May 20, 2014
@jtorrents deleted the refactor-connectivity branch February 5, 2016 16:53