Dinitz correction #6968
Conversation
This Dinitz implementation was largely added via #1978, and the comments there may help in understanding it. The cited paper may also help explain any differences between this implementation and the original Dinitz algorithm, and hopefully it addresses the complexity, though I'm not sure it gives the complexity for the algorithms you are wondering about. There is also a comment there with code used to time the results (there may be better timing tools today; that was 8 years ago). The comments there suggest that Edmonds-Karp was the faster implementation for the cases examined. Does your version of the code produce the same results as the current version? Have you done any timing tests?
Thank you for your reply. I read the thread and the reference. The problem is that the current implementation augments only one path after each BFS, while Dinitz's algorithm should augment all available paths in the level graph (a blocking flow). I also found that the implementation was originally consistent with that design, but the later commit "Refactor Dinitz' algorithm implementation." made it inconsistent. I am sorry that I need more time to present the test results.
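To make the distinction concrete, here is a minimal, self-contained sketch of Dinitz's method (not NetworkX's implementation; all names are illustrative). The key point is the inner `while` loop: after each BFS builds the level graph, the DFS keeps augmenting until no s-t path remains in that level graph, rather than augmenting a single path and re-running BFS.

```python
from collections import deque

def max_flow_dinitz(adj, s, t):
    """Toy Dinitz's algorithm. adj[u][v] = residual capacity
    (dict-of-dicts, mutated in place)."""
    # Make sure every edge has a reverse residual edge.
    for u in list(adj):
        for v in list(adj[u]):
            adj.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Phase step 1: BFS assigns levels in the residual graph.
        level = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, cap in adj[u].items():
                if cap > 0 and v not in level:
                    level[v] = level[u] + 1
                    queue.append(v)
        if t not in level:  # sink unreachable: current flow is maximum
            return flow

        # Phase step 2: DFS augments until the level graph holds no more
        # s-t paths (a blocking flow) -- the step a single-path-per-BFS
        # variant effectively skips.
        def dfs(u, pushed):
            if u == t:
                return pushed
            for v, cap in adj[u].items():
                if cap > 0 and v in level and level[v] == level[u] + 1:
                    d = dfs(v, min(pushed, cap))
                    if d:
                        adj[u][v] -= d
                        adj[v][u] += d
                        return d
            level.pop(u, None)  # dead end: prune u for this phase
            return 0

        while True:
            pushed = dfs(s, float("inf"))
            if not pushed:
                break
            flow += pushed

# Example: two disjoint paths of capacity 10 each, so max flow is 20.
adj = {0: {1: 10, 2: 10}, 1: {3: 10, 2: 1}, 2: {3: 10}}
print(max_flow_dinitz(adj, 0, 3))  # 20
```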
Thanks -- that description is very helpful for me to see where this PR is coming from.
Hi, I have uploaded my test script. It is largely based on the work in this thread. The output values of my implementation are consistent with those of Edmonds-Karp. Here are the timing tests (time measured in seconds). The result on the 1st graph:
The new implementation:
The result on the 2nd graph:
The new implementation:
The result on graphs of different densities.
The new implementation:
I found the overhead somewhat unexpected. I also found the thread and the reference interesting. It turns out that my implementation matches Cherkassky's design.
Thanks for those timings and for checking into the other threads about these algorithms. I'm not sure what you meant by "overhead"; can you explain what surprised you?

```python
import scipy
from time import time
import numpy as np
import networkx as nx
from scipy.sparse import rand
from scipy.sparse.csgraph import maximum_flow
from networkx.algorithms.flow import edmonds_karp
from networkx.algorithms.flow import shortest_augmenting_path
from networkx.algorithms.flow import preflow_push
from networkx.algorithms.flow import dinitz
from scipy.sparse import coo_matrix, csr_matrix

"""
n = 1000
density = 0.1
for k in range(100):
    m = (scipy.sparse.rand(n, n, density=density, format='csr',
                           random_state=k) * 100).astype(np.int32)
    G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())
    Edmonds_Karp_max_flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=edmonds_karp)
    Dinitz_max_flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=dinitz)
    assert Edmonds_Karp_max_flow == Dinitz_max_flow
"""

"""
n = 1000
density = 0.1
m = (scipy.sparse.rand(n, n, density=density, format='csr', random_state=42) * 100).astype(np.int32)
G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())
begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
"""

"""
n = 500
np.random.seed(42)
a = np.zeros((n, n), dtype=np.int32)
for k in range(n - 1):
    for j in range(-50, 50):
        if j != 0 and k + j >= 0 and k + j < n:
            a[k, k + j] = np.random.randint(1, 1000)
m = csr_matrix(a)
G = nx.from_numpy_array(a, create_using=nx.DiGraph())
begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
"""

def make_data(density):
    m = (rand(1000, 1000, density=density, format='coo', random_state=42) * 100).astype(np.int32)
    return np.vstack([m.row, m.col, m.data]).T

data01 = make_data(0.1)
data03 = make_data(0.3)
data05 = make_data(0.5)

def networkx_max_flow(data, primitive):
    m = coo_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())
    return nx.maximum_flow_value(G, 0, 999, capacity='weight', flow_func=primitive)

def scipy_max_flow(data):
    m = csr_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    return maximum_flow(m, 0, 999).flow_value

begin = time()
for itr in range(3):
    networkx_max_flow(data01, nx.algorithms.flow.edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data03, nx.algorithms.flow.edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data05, nx.algorithms.flow.edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data01, nx.algorithms.flow.shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data03, nx.algorithms.flow.shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data05, nx.algorithms.flow.shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data01, nx.algorithms.flow.dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data03, nx.algorithms.flow.dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
begin = time()
for itr in range(3):
    networkx_max_flow(data05, nx.algorithms.flow.dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
```
I think this is slow because of the repeated look-ups into R_pred[u][v]. But there may be other reasons too.
Is there a reason this doesn't just use the NetworkX breadth_first_search and depth_first_search? I could believe these custom loops might be faster than the general case, but I thought I would ask if there is a reason you know of not to use the general one.
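To illustrate the kind of cost being described, here is a generic Python micro-benchmark (not the PR's code; `R_pred` here is a stand-in dict-of-dicts shaped like a residual network's predecessor structure). Hoisting the inner dict out of the hot loop avoids re-hashing `u` and `v` on every access:

```python
from timeit import timeit

# Stand-in for a residual network's dict-of-dicts structure.
R_pred = {u: {v: {"capacity": 1, "flow": 0} for v in range(200)}
          for u in range(200)}

def nested_lookup():
    # Each access re-does two dict look-ups: R_pred[u] then [v].
    total = 0
    for u in R_pred:
        for v in R_pred[u]:
            total += R_pred[u][v]["capacity"] - R_pred[u][v]["flow"]
    return total

def hoisted_lookup():
    # The inner attribute dict is fetched once per edge.
    total = 0
    for u, preds in R_pred.items():
        for attrs in preds.values():
            total += attrs["capacity"] - attrs["flow"]
    return total

assert nested_lookup() == hoisted_lookup()
print(timeit(nested_lookup, number=100), timeit(hoisted_lookup, number=100))
```

On CPython the hoisted version is typically noticeably faster, which is one plausible reason repeated `R_pred[u][v]` indexing would show up in profiles.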
I've got a couple of comments/questions on code.
I am sorry for the confusion. I did not expect the committed file to differ from my local file. The code I used for testing is the version from the commit "Update:". The commit "Resubmit and use tuples" changed the deque part according to the review. I designed another test based on NetworkX's flow tests. Part of the results is presented below. The new implementation used here is the latest version, and the time is measured in seconds. The result on gl1:
The result on gw1:
The result on pyramid:
The result on wlm3:
I was thinking about whether we could speed the code up with some technique, which is why I mentioned "overhead", but after some research I find it fair to compare the implementation with other algorithms in Python. I believe the customized BFS/DFS is coded to make the augmenting operations more convenient. The original implementation uses customized search functions, and my implementation is based on that setting.
I'm coming back around to this PR. It looks like you have done a number of timing tests on different graphs with different algorithms, and in your original post you asked about the documented complexities. Is this code ready to go? Anything else you want to include here?
Nice to hear from you! I still do not quite understand the efficiency question, but I believe this code is complete and ready to go.
Thanks for those answers. I have one more question about the code implementation this time: can pairwise avoid the index tracking currently being done to track u and v in the DFS portion of the code?
I find the use of
This looks good to me.
Thanks!
This LGTM, though I'm no Dinitz expert. For added testing I went ahead and ran this against the nx-guide article on Dinitz and there was no change in behavior, as expected.
It would be great if we could add a test which would demonstrate the buggy behavior and demonstrate the fix, i.e. find a minimal example graph which fails on main due to the Dinitz bug but passes with these changes. That's not a blocker though!
Finally, while reviewing I noticed that this appears to be one of those cases where we provided the ability for users to specify attribute names ("capacity" in this case) but we then hard-code the attribute name in the implementation. Let's go ahead and treat that issue separately though (if it is indeed an issue).
Thanks @YVWX !
The "bug" is mainly about implementation and the performance (efficiency). The previous implementation is a max flow algorithm, and its correctness is also guaranteed, but it is not Dinitz's algorithm. Its time complexity does not match Dinitz's O(n^2m), and it can be much slower than the proper implementation of Dinitz's algorithm on dense graphs. |
Thanks @YVWX |
* Update dinitz_alg.py
* Update:
* Resubmit and use tuples
* Modify the for loop
* Use reversed pairwise in the for loop
[Edited by dschult: text added from email to the NetworkX-discuss list from @YVWX ]
I just made a PR (#6968). I find that the current implementation of Dinitz's algorithm is actually an Edmonds-Karp algorithm, so I am trying to make a change.
The second issue is about the documentation. There is a shortest_augmenting_path algorithm, and as I see it, the core part of that algorithm is the same as Edmonds-Karp. However, its time complexity mentioned in the user guide differs from Edmonds-Karp's (O(n^2 m) versus O(nm^2)). Could you help check this problem?
Thank you in advance!