
Dinitz correction #6968

Merged · 6 commits merged into networkx:main on Dec 6, 2023

Conversation

@YVWX (Contributor) commented Oct 1, 2023

[Edited by dschult: text added from @YVWX's email to the NetworkX-discuss list]
I just made a PR (#6968). I find that the current implementation of Dinitz's algorithm is actually an Edmonds-Karp algorithm, so I am trying to make a change.

The second issue is about the documentation. There is a shortest_augmenting_path algorithm. As far as I can see, its core is the same as edmonds_karp, yet the time complexity stated in the user guide differs from that of edmonds_karp (O(n^2 m) versus O(n m^2)). Could you help check this?

Thank you in advance!

@dschult (Member) commented Oct 1, 2023

This dinitz implementation was largely added via #1978, and there may be comments there that help explain the implementation. The cited paper may also help explain any difference between this implementation and the original Dinitz algorithm, and hopefully it addresses the complexity too, though I'm not sure it covers the algorithms you are wondering about. There is also a comment there with the code used to time the results (there may be better timing tools today; that was 8 years ago). The comments there suggest that Edmonds-Karp was the faster implementation for the cases looked at.

Does your version of the code produce the same results as the current version? Have you done any time tests?
Why do you think the current algorithm implements Edmonds-Karp instead of Dinitz? (How should I be looking for the difference between the algorithms?)

@YVWX (Contributor, Author) commented Oct 2, 2023

Thank you for your reply. I read the thread and the reference. The problem is that the current implementation augments only one path after each BFS, while Dinitz's algorithm should augment all the available paths in the level graph (i.e., find a blocking flow) before the next BFS. I also found that the implementation was originally consistent with that design, but the later commit "Refactor Dinitz' algorithm implementation." broke it.
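
To make the distinction concrete, here is a small self-contained sketch (a toy illustration of the two augmentation strategies, not the NetworkX code; the graph and helper names are made up):

```python
from collections import deque

def bfs_levels(cap, flow, s, t):
    """BFS in the residual graph; return node levels, or None if t is unreachable."""
    level = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in cap[u]:
            if v not in level and cap[u][v] - flow[u][v] > 0:
                level[v] = level[u] + 1
                queue.append(v)
    return level if t in level else None

def dfs_augment(cap, flow, level, u, t, pushed):
    """Push flow along one source-to-sink path of the level graph; return the amount pushed."""
    if u == t:
        return pushed
    for v in cap[u]:
        residual = cap[u][v] - flow[u][v]
        if residual > 0 and level.get(v) == level[u] + 1:
            d = dfs_augment(cap, flow, level, v, t, min(pushed, residual))
            if d > 0:
                flow[u][v] += d
                flow[v][u] -= d
                return d
    return 0

def max_flow(cap, s, t, blocking):
    flow = {u: {v: 0 for v in cap[u]} for u in cap}
    value = phases = 0
    while (level := bfs_levels(cap, flow, s, t)) is not None:
        phases += 1
        # Dinitz style: keep augmenting until the level graph is saturated (a blocking flow).
        # Edmonds-Karp style (what I describe above): stop after a single augmenting path.
        while (d := dfs_augment(cap, flow, level, s, t, float("inf"))) > 0:
            value += d
            if not blocking:
                break
    return value, phases

# Tiny example graph: cap[u][v] holds the capacity, with 0-capacity reverse edges added.
cap = {}
for u, v, c in [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]:
    cap.setdefault(u, {})[v] = c
    cap.setdefault(v, {}).setdefault(u, 0)

print(max_flow(cap, "s", "t", blocking=True))   # blocking flow per BFS: (5, 2)
print(max_flow(cap, "s", "t", blocking=False))  # one path per BFS:      (5, 3)
```

Both variants return the same max-flow value; the difference shows up only in how many BFS phases are needed, which is what the complexity bounds are about.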

I am sorry that I need more time to present the test results.

@dschult (Member) commented Oct 3, 2023

Thanks -- that description is very helpful for me to see where this PR is coming from.

@YVWX (Contributor, Author) commented Oct 3, 2023

Hi, I have uploaded my test script. It is largely based on the work in this thread.

The output values of my implementation are consistent with those of Edmonds-Karp.

Here are the timing tests (times are measured in seconds).

The result on the 1st graph:

Edmonds Karp: 0.5524605909983317
Shortest augmenting path: 0.5986862182617188
Original Dinitz: 9.085975567499796

The new implementation:

New Dinitz: 4.414525429407756

The result on the 2nd graph:

Edmonds Karp: 3.7168155511220298
Shortest augmenting path: 0.2403879165649414
Original Dinitz: 9.335507074991861

The new implementation:

New Dinitz: 1.760817050933838

The results on graphs of different densities (the columns are edge densities):

| Algorithm | 0.1 | 0.3 | 0.5 |
| --- | --- | --- | --- |
| Edmonds Karp | 0.7506591478983561 | 2.2270760536193848 | 4.318867206573486 |
| Shortest augmenting path | 0.9440836906433105 | 2.7835280895233154 | 4.686661005020142 |
| Original Dinitz | 9.547501881917318 | 60.46248229344686 | 165.1355938911438 |

The new implementation:

| Algorithm | 0.1 | 0.3 | 0.5 |
| --- | --- | --- | --- |
| New Dinitz | 4.325359106063843 | 30.983328580856323 | 109.96503067016602 |

I find the overhead somewhat higher than I expected.

I also found the thread and the reference interesting. It turns out that my implementation matches Cherkassky's design.

@networkx deleted a comment from YVWX on Oct 3, 2023
@dschult (Member) commented Oct 3, 2023

Thanks for those timings and for checking into the other threads about these algorithms.
I deleted a duplicate message of those results, and I'll include the testing code below for long-term reference. Now it's time to review the code, the earlier bugs, and the surrounding details.

I'm not sure what you meant by "overhead". Can you explain what surprised you?

import scipy
from time import time
import numpy as np
import networkx as nx
from scipy.sparse import rand
from scipy.sparse.csgraph import maximum_flow
from networkx.algorithms.flow import edmonds_karp
from networkx.algorithms.flow import shortest_augmenting_path
from networkx.algorithms.flow import preflow_push
from networkx.algorithms.flow import dinitz
from scipy.sparse import coo_matrix, csr_matrix
"""
n = 1000
density = 0.1
for k in range(100):
    m = (scipy.sparse.rand(n, n, density=density, format='csr',
                           random_state=k)*100).astype(np.int32)
    G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())  # from_numpy_array replaces the removed from_numpy_matrix
    Edmonds_Karp_max_flow = nx.algorithms.flow.maximum_flow_value(G, 0, n-1, capacity='weight', flow_func=edmonds_karp)
    Dinitz_max_flow = nx.algorithms.flow.maximum_flow_value(G, 0, n-1, capacity='weight', flow_func=dinitz)
    assert Edmonds_Karp_max_flow == Dinitz_max_flow
"""
"""
n = 1000
density = 0.1
m = (scipy.sparse.rand(n, n, density=density, format='csr', random_state=42)*100).astype(np.int32)
G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())  # from_numpy_array replaces the removed from_numpy_matrix

begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")

begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")

begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
"""
"""
n = 500
np.random.seed(42)
a = np.zeros((n, n), dtype=np.int32)
for k in range(n - 1):
    for j in range(-50, 50):
        if j != 0 and k + j >= 0 and k + j < n:
            a[k, k + j] = np.random.randint(1, 1000)
m = csr_matrix(a)
G = nx.from_numpy_array(a, create_using=nx.DiGraph())  # from_numpy_array replaces the removed from_numpy_matrix

begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=edmonds_karp)
end = time()
print(f"Edmonds Karp: {(end - begin) / 3}")

begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=shortest_augmenting_path)
end = time()
print(f"Shortest augmenting path: {(end - begin) / 3}")

begin = time()
for itr in range(3):
    flow = nx.algorithms.flow.maximum_flow_value(G, 0, n - 1, capacity='weight', flow_func=dinitz)
end = time()
print(f"New Dinitz: {(end - begin) / 3}")
"""

def make_data(density):
    # Random sparse 1000x1000 integer-capacity matrix, returned as (row, col, value) triples.
    m = (rand(1000, 1000, density=density, format='coo', random_state=42)*100).astype(np.int32)
    return np.vstack([m.row, m.col, m.data]).T

data01 = make_data(0.1)
data03 = make_data(0.3)
data05 = make_data(0.5)

def networkx_max_flow(data, primitive):
    # Rebuild a DiGraph from the triples and run the given NetworkX flow function.
    m = coo_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    G = nx.from_numpy_array(m.toarray(), create_using=nx.DiGraph())
    return nx.maximum_flow_value(G, 0, 999, capacity='weight', flow_func=primitive)

def scipy_max_flow(data):
    # SciPy's maximum_flow on the same data (defined for comparison; not timed below).
    m = csr_matrix((data[:, 2], (data[:, 0], data[:, 1])))
    return maximum_flow(m, 0, 999).flow_value

# Time each flow function on each density matrix; each measurement is the average of three runs.
for label, primitive in [
    ("Edmonds Karp", nx.algorithms.flow.edmonds_karp),
    ("Shortest augmenting path", nx.algorithms.flow.shortest_augmenting_path),
    ("New Dinitz", nx.algorithms.flow.dinitz),
]:
    for data in (data01, data03, data05):
        begin = time()
        for itr in range(3):
            networkx_max_flow(data, primitive)
        end = time()
        print(f"{label}: {(end - begin) / 3}")

@dschult (Member) left a review comment

I think this is slow because of the repeated look-ups into R_pred[u][v].
But there may be other reasons too.
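
For example, roughly this kind of hoisting (a hypothetical micro-example, not the actual dinitz_alg.py code; the nested-dict shape just mimics a residual network's `capacity`/`flow` attributes):

```python
R_pred = {"u": {"v": {"capacity": 5, "flow": 2}, "w": {"capacity": 3, "flow": 3}}}
u = "u"

# Repeated lookups: R_pred[u] is re-resolved for every neighbor v.
slow = {v: R_pred[u][v]["capacity"] - R_pred[u][v]["flow"] for v in R_pred[u]}

# Hoisted: resolve the inner dict once, then iterate its items.
pred_u = R_pred[u]
fast = {v: attr["capacity"] - attr["flow"] for v, attr in pred_u.items()}

assert slow == fast == {"v": 3, "w": 0}
```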

Is there a reason this doesn't just use the NetworkX breadth_first_search and depth_first_search? I could believe these custom loops might be faster than the general case, but I thought I would ask if there is a reason you know of not to use the general one.

I've got a couple of comments/questions on code.

Review threads on networkx/algorithms/flow/dinitz_alg.py (outdated, resolved)
@YVWX (Contributor, Author) commented Oct 5, 2023

I am sorry for the confusion. I did not realize that the committed file differed from my local file. The code I used for testing is the version from the commit "Update:".

The commit "Resubmit and use tuples" changes the deque part according to the review.

I designed another test based on networkx's flow tests. Part of the results are presented below. The new implementation used here is the latest version, and times are measured in seconds.

The result on gl1:

| Algorithm | Time (s) |
| --- | --- |
| New Dinitz | 0.5481297969818115 |
| Original Dinitz | 2.6284141540527344 |
| Edmonds Karp | 1.9945318698883057 |
| Shortest augmenting path | 0.4946422576904297 |

The result on gw1:

| Algorithm | Time (s) |
| --- | --- |
| New Dinitz | 0.43953585624694824 |
| Original Dinitz | 2.1068835258483887 |
| Edmonds Karp | 0.7500441074371338 |
| Shortest augmenting path | 0.3566248416900635 |

The result on pyramid:

| Algorithm | Time (s) |
| --- | --- |
| New Dinitz | 2.923945903778076 |
| Original Dinitz | 47.78926753997803 |
| Edmonds Karp | 39.295607566833496 |
| Shortest augmenting path | 1.2902932167053223 |

The result on wlm3:

| Algorithm | Time (s) |
| --- | --- |
| New Dinitz | 0.2362527847290039 |
| Original Dinitz | 6.118213653564453 |
| Edmonds Karp | 3.0536985397338867 |
| Shortest augmenting path | 0.1755540370941162 |

When I mentioned "overhead", I was wondering whether we could speed the code up with some technique, but after some research I think it is fair to compare this implementation with the other algorithms implemented in Python. I believe the customized BFS/DFS is there to make the augmenting operations more convenient; the original implementation uses customized search functions, and my implementation keeps that structure.

@dschult (Member) commented Nov 28, 2023

I'm coming back around to this PR. It looks like you have done a number of timing tests on different graphs and different algorithms. It looks like shortest_augmenting_path is consistently the fastest, and the new dinitz is faster than the old.

In your original post you asked about shortest_augmenting_path and how it differs from Edmonds-Karp. Have you figured that out yet?

Is this code ready to go? Anything else you want to include here?

@YVWX (Contributor, Author) commented Nov 28, 2023

Nice to hear from you!

I still do not quite understand the efficiency of shortest_augmenting_path. I am afraid that it needs some further research.

I believe this code is complete and ready to go.

@dschult (Member) left a review comment

Thanks for those answers.
I have one more question about the code implementation this time.
Can pairwise avoid the index tracking currently being done to track u and v in the DFS portion of the code?
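
Something like this, for instance (a minimal generic example, not the actual dinitz_alg.py code):

```python
# pairwise yields the consecutive (u, v) edges of a path without index bookkeeping.
# itertools.pairwise needs Python 3.10+; networkx also ships nx.utils.pairwise.
from itertools import pairwise

path = ["s", "a", "b", "t"]

by_index = [(path[i], path[i + 1]) for i in range(len(path) - 1)]
by_pairwise = list(pairwise(path))

assert by_index == by_pairwise == [("s", "a"), ("a", "b"), ("b", "t")]
```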

Review thread on networkx/algorithms/flow/dinitz_alg.py (outdated, resolved)
@YVWX (Contributor, Author) commented Nov 30, 2023

> Thanks for those answers. I have one more question about the code implementation this time. Can pairwise avoid the index tracking currently being done to track u and v in the DFS portion of the code?

I find the use of pairwise natural and reasonable here. I have pushed a new commit.

@dschult (Member) left a review comment

This looks good to me.
Thanks!

@rossbar (Contributor) left a review comment

This LGTM, though I'm no Dinitz expert. For added testing I went ahead and ran this against the nx-guide article on Dinitz and there was no change in behavior, as expected.

It would be great if we could add a test which would demonstrate the buggy behavior and demonstrate the fix. I.e. find a minimal example graph which fails on main due to the Dinitz bug, but which passes with these changes. That's not a blocker though!

Finally, while reviewing I noticed that this appears to be one of those cases where we provided the ability for users to specify attribute names ("capacity" in this case) but we then hard-code the attribute name in the implementation. Let's go ahead and treat that issue separately though (if it is indeed an issue).
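
For context, the call pattern in question looks something like this (a small hypothetical usage, not code from this PR): the caller's graph stores capacities under "weight", so the flow code should honor the `capacity=` argument rather than hard-coding the attribute name "capacity".

```python
import networkx as nx
from networkx.algorithms.flow import dinitz

G = nx.DiGraph()
G.add_edge("s", "a", weight=3)
G.add_edge("a", "t", weight=2)

# The user-specified attribute name must be respected throughout the implementation.
print(nx.maximum_flow_value(G, "s", "t", capacity="weight", flow_func=dinitz))  # 2
```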

Thanks @YVWX !

@YVWX (Contributor, Author) commented Dec 6, 2023

Thanks @dschult and @rossbar!

The "bug" is mainly about implementation and the performance (efficiency). The previous implementation is a max flow algorithm, and its correctness is also guaranteed, but it is not Dinitz's algorithm. Its time complexity does not match Dinitz's O(n^2m), and it can be much slower than the proper implementation of Dinitz's algorithm on dense graphs.

@dschult (Member) commented Dec 6, 2023

Thanks @YVWX

@dschult merged commit 03b73f8 into networkx:main on Dec 6, 2023
38 checks passed
@jarrodmillman added this to the 3.3 milestone on Dec 6, 2023
cvanelteren pushed a commit to cvanelteren/networkx that referenced this pull request Apr 22, 2024
* Update dinitz_alg.py

* Update:

* Resubmit and use tuples

* Modify the for loop

* Use reversed pairwise in the for loop