
WIP : ENH: Refactored _hungarian.py for speed and added a minimize/maximize… #9800

Closed



@batconjurer commented Feb 10, 2019

… switch

Important Update:

After uncovering and fixing a serious bug, there is no longer a speed improvement. This pull request should not be approved for now unless I can speed it up again, which will take time.

A refactoring of scipy.optimize.linear_sum_assignment in _hungarian.py.
The use of a state machine has been removed, and the algorithm now more
closely follows the standard exposition in the literature (e.g. Wikipedia).
This yields a performance improvement. Benchmarking has also been added.

Also, there is now an optional parameter to specify if minimization or
maximization of the assignment is desired.
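To illustrate the proposed switch: today, maximization can only be expressed by negating the cost matrix by hand. A minimal sketch (the maximize parameter name is an assumption, not the final API; the cost matrix is the one from scipy's docs):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    cost = np.array([[4, 1, 3],
                     [2, 0, 5],
                     [3, 2, 2]])

    # Minimization (the existing behavior).
    row_ind, col_ind = linear_sum_assignment(cost)
    print(cost[row_ind, col_ind].sum())    # 5

    # Maximization currently requires negating the costs by hand;
    # the proposed switch would express this directly, e.g.
    # linear_sum_assignment(cost, maximize=True) (name hypothetical).
    row_ind, col_ind = linear_sum_assignment(-cost)
    print(cost[row_ind, col_ind].sum())    # 11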

Benchmark results for _hungarian.py in v1.2.2:

[ 0.00%] · For scipy commit a3e96ed <v1.2.2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.6-Cython0.27.3-Tempita-numpy-pytest-six
[ 50.00%] ··· Running (hungarian.Hungarian.time_solve--).
[100.00%] ··· hungarian.Hungarian.time_solve ok
[100.00%] ··· ============= ============
               matrix_size
              ------------- ------------
                   100       45.7±0.2ms
                   300        1.31±0.1s
                   500       7.51±0.01s
                   700       25.6±0.01s
              ============= ============

Benchmark results for the refactored _hungarian.py in this PR (branch performance_refactored_hungarian):

[ 0.00%] · For scipy commit 832e219 <performance_refactored_hungarian~1>:
[ 0.00%] ·· Benchmarking virtualenv-py3.6-Cython0.27.3-Tempita-numpy-pytest-six
[ 0.00%] ··· Running (hungarian.Hungarian.time_solve--).
[ 0.01%] ··· hungarian.Hungarian.time_solve ok
[ 0.01%] ··· ============= ============
              matrix_size
             ------------- ------------
                  100       48.8±0.1ms
                  300         565±3ms
                  500       2.58±0.01s
                  700         7.75±0s
             ============= ============
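The benchmark file itself is not shown in this thread; a minimal asv sketch consistent with the hungarian.Hungarian.time_solve name and the matrix sizes above might look like this (the file path and random seed are assumptions):

    # benchmarks/benchmarks/hungarian.py (path assumed)
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    class Hungarian:
        params = [100, 300, 500, 700]
        param_names = ['matrix_size']

        def setup(self, matrix_size):
            rng = np.random.RandomState(42)  # seed is an assumption
            self.cost_matrix = rng.uniform(size=(matrix_size, matrix_size))

        def time_solve(self, matrix_size):
            linear_sum_assignment(self.cost_matrix)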

… switch

A refactoring of scipy.optimize.linear_sum_assignment in _hungarian.py.
The use of a state machine has been removed, and the algorithm now more
closely follows the standard exposition in the literature (e.g. Wikipedia).
This yields a performance improvement. Benchmarking can be found
in my personal development repository:
https://github.com/batconjurer/munkres-algorithm
Also, there is now an optional parameter to specify if minimization or
maximization of the assignment is desired.
@rgommers added the enhancement label Feb 22, 2019
@rgommers (Member) left a comment


Thanks @batconjurer, a performance increase would be nice here. I made a lot of detailed comments; at a high level:

I haven't looked at the algorithm yet; too many other things need fixing first.

 * Corrected header copyright and authorship info
 * Typeset math properly in main docstring
 * Fixed style issues
 * Added clearer references to where the algorithm was taken
 * Removed np.matrix
 * Refactored tests to use public API as much as possible
 * Moved the public docstring inside the main public API function
 * Corrected the copy method for arrays
 * Added benchmarking
…urer/scipy into performance_refactored_hungarian
…my formatting as a section called 'Definitions' which it was not expecting
@rgommers self-requested a review Feb 27, 2019
@rgommers (Member) commented

This looks much better now.

Pinging @larsmans as the original author of this code, in case he's still around and wants to have a look.

@rgommers added this to the 1.3.0 milestone Feb 28, 2019
@tylerjereddy (Contributor) left a comment


  • test coverage looks solid & no old tests modified
  • reasonably positive feedback from Ralf, though detailed review of the algorithm may not have happened
  • rebase might be good given 25 days since last push
  • even better would be a resident optimize expert signing off

for row in np.nonzero(self.row_saturated == False)[0]:
    path_row, path_col = self._aug_path(row)
    if not path_col:
        continue

The continue is never hit in the tests

path_row, path_col = self._aug_path(row)
if not path_col:
    continue
if not len(path_row + path_col) % 2:

The "if not" condition is always true in the tests.

        break

# Recreate augmented path from linked list if possible
if last_col is not None:

always true in tests

@tylerjereddy (Contributor) commented

Bumping milestone, but let's try to get this in soon. I noticed yesterday that attempting to rebase on latest master was a little messy, probably because of all the commits.

@tylerjereddy modified the milestones: 1.3.0, 1.4.0 Apr 25, 2019

** Update **
A logic error was discovered in the Wikipedia entry on the
Hungarian algorithm and had been propagated into the code. This
has now been fixed and covered with a test.
@batconjurer (Author) commented

Bad news. In reviewing the tests, I discovered an error that originated in Wikipedia and that I unfortunately duplicated in the code. It is fixed now, but the speed improvement has evaporated. So this pull request should be deleted or put on hold (not sure of the proper protocol here) while I work on speeding it up, if possible.

@rgommers (Member) commented May 2, 2019

That's too bad. Good you caught it though!

You can just change your PR title to WIP; that's a sign that it's not ready to merge. Then you can just leave it open and come back to it later.

@batconjurer changed the title ENH: Refactored _hungarian.py for speed and added a minimize/maximize… WIP : ENH: Refactored _hungarian.py for speed and added a minimize/maximize… May 6, 2019
@pmla (Contributor) commented Jun 10, 2019

Is there any appetite for a C extension for this algorithm? The Hungarian algorithm is important for many applications, and it appears that we could get orders of magnitude better performance with C code.

py-lapsolver looks like a good candidate. It is MIT licensed and its benchmarks demonstrate good performance: https://github.com/cheind/py-lapsolver
I am happy to adapt the code and do the necessary integration and testing.

@rgommers What do you think?

@rgommers (Member) commented

@pmla those benchmarks do seem to indicate that py-lapsolver is faster than our version. So yes, there's interest. A couple of thoughts:

  • py-lapsolver relies on pybind11. We don't have that build dependency yet, but we are biting that bullet right now in ENH: Add new scipy.fft module using pocketfft #10238, so this is likely fine (it did require a bit of messing with the build system there, which should then probably be moved to scipy/_build_utils).
  • The original author of linear_sum_assignment is not around anymore, which makes it harder for us to review this code at the moment. There are multiple interested people though, so ideally you could collaborate on this - if two people agree that the algorithmic details are right, that makes the job of the person doing the final review/merge a lot easier.
  • The "do we accept C/C++/Cython (or even Fortran)?" question is always a tradeoff between performance and maintenance cost. In this case, the benchmarks are comprehensive, so it seems like a no-brainer to me to accept a C++ version here.
  • It's always nice to ask the original author how they feel about inclusion of their code, so pinging @cheind here :)

@pmla (Contributor) commented Jun 11, 2019

@rgommers It turns out that py-lapsolver does not support rectangular matrices, whereas the current implementation does. Apologies.

Instead, I have written an implementation of the Jonker-Volgenant algorithm in C++. It supports everything the current implementation does, as well as the maximization option proposed in this pull request and the feasible infinities I proposed in #10293.

I will open a separate pull request when I have finished writing tests and documentation.

@cheind commented Jun 11, 2019

Hi, I'm the author of the py-lapsolver package mentioned above.

  • @rgommers pybind11 is a header-only build dependency. With some work, the package should be easily adapted to any C-interfacing approach that scipy already supports.
  • Thanks for putting me in the loop; I have no objections to my code becoming part of scipy. However, I have to say that my spare-time resources are insufficient for maintaining / adapting it to scipy standards.
  • @pmla py-lapsolver supports rectangular matrices out of the box. See this test for example. The package is heavily used by py-motmetrics, a benchmark for multiple object trackers, which has attained significant usage throughout the computer vision community (e.g. Facebook). py-lapsolver is the primary solver (if installed) for huge rectangular matrices when matching object hypotheses and ground-truth objects.
  • py-lapsolver uses partial template specialization in C++, which allows it to run faster when the cost matrix is integer-based rather than double floating point [1]; a quick way to check this yourself is sketched after the footnote below.
  • Please note that the py-lapsolver benchmarks are based on random cost matrices. @kylemcdonald noted [1] that this is a rather unrealistic assumption for certain applications. In [1] he proposes an implementation that performs even faster (at least for the given use case).

[1] https://github.com/cheind/py-lapsolver/issues/1
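For anyone who wants to observe the integer/float cost difference mentioned above on their own machine, here is a quick timing sketch (it assumes py-lapsolver is installed; the matrix size and repeat count are arbitrary choices):

    import timeit
    import numpy as np
    from lapsolver import solve_dense

    n = 1000
    costs_float = np.random.uniform(0, 100, (n, n))   # float64 costs
    costs_int = costs_float.astype(np.int64)          # integer costs

    t_float = timeit.timeit(lambda: solve_dense(costs_float), number=5)
    t_int = timeit.timeit(lambda: solve_dense(costs_int), number=5)
    print(f"float64: {t_float:.3f}s   int64: {t_int:.3f}s   (total over 5 runs)")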

@berhane commented Jun 11, 2019

Hi All,
I benchmarked six different LAP solvers, including scipy, much like @cheind. You may find the results informative or enjoy rerunning them yourself.

Module                                 Python or C/C++/Cython   Algorithm                  Link
scipy.optimize.linear_sum_assignment   Python                   Hungarian                  https://github.com/scipy/scipy/
munkres.Munkres                        Python                   Hungarian                  https://github.com/bmc/munkres
hungarian.lap                          C++                      Hungarian                  https://github.com/Hrldcpr/Hungarian
lap.lapjv                              C++                      Jonker-Volgenant           https://github.com/gatagat/lap
lapjv.lapjv                            C++                      Jonker-Volgenant           https://github.com/src-d/lapjv
lapsolver.solve_dense                  C++                      shortest augmenting path   https://github.com/cheind/py-lapsolver

@pmla (Contributor) commented Jun 11, 2019

@cheind apologies for misrepresenting your package. I was browsing your repository and saw that the Stanford ACM-ICPC code only supports square matrices. I see now that you pad the cost matrix.
@berhane thanks for the list of solvers.

I have written some benchmarks and tests for a selection of solvers. The code is a bit preliminary but you can find it here: https://github.com/pmla/linear-assignment-benchmarks

Any replacement for the current scipy implementation should have the same functionality. I have defined this as:

  1. Support for rectangular matrices, both where m > n and m < n.
  2. Correctly finding the optimal-cost solution for every input.

Other than the scipy implementation itself, none of the modules I tested met these requirements. The C++ module I have written, jonkervolgenant, meets these requirements. It can also be found in the benchmark repository.
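As a sketch of how requirement 2 can be checked on small inputs independently of any particular solver, one can enumerate every possible assignment by brute force and compare totals (the helper below is hypothetical, written against scipy's return convention):

    import itertools
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def brute_force_min_cost(cost):
        # Enumerate all ways to assign min(m, n) pairs and return the
        # lowest total cost; only feasible for small matrices.
        n_rows, n_cols = cost.shape
        best = np.inf
        if n_rows <= n_cols:
            for cols in itertools.permutations(range(n_cols), n_rows):
                best = min(best, cost[np.arange(n_rows), list(cols)].sum())
        else:
            for rows in itertools.permutations(range(n_rows), n_cols):
                best = min(best, cost[list(rows), np.arange(n_cols)].sum())
        return best

    rng = np.random.RandomState(0)
    for shape in [(4, 4), (3, 5), (5, 3)]:   # square, m < n, m > n
        cost = rng.uniform(size=shape)
        r, c = linear_sum_assignment(cost)
        assert np.isclose(cost[r, c].sum(), brute_force_min_cost(cost))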

Here are the results of my tests and benchmarks:

square tests: (400, 400)      time (ms)        error sum    status
==================================================================

     random_uniform
        scipy                   2697.23             zero      pass
        jonkervolgenant            7.05             zero      pass
        lapsolver                  7.53             zero      pass
        gatagat_lapjv              4.21             zero      pass
        srcd_lapjv                 2.39             zero      pass
        hungarian                 16.76             zero      pass

     random_spatial
        scipy                   5332.93             zero      pass
        jonkervolgenant           10.78             zero      pass
        lapsolver                 17.84             zero      pass
        gatagat_lapjv             28.28             zero      pass
        srcd_lapjv                76.55             zero      pass
        hungarian                 35.89             zero      pass

     random_logarithmic
        scipy                   2856.26             zero      pass
        jonkervolgenant            7.54             zero      pass
        lapsolver                  2.02        1.480e-08      fail
        gatagat_lapjv              5.23             zero      pass
        srcd_lapjv                 4.53             zero      pass
        hungarian                 17.40             zero      pass

     random_integer
        scipy                   1374.58             zero      pass
        jonkervolgenant            7.01             zero      pass
        lapsolver                  8.30             zero      pass
        gatagat_lapjv              1.55             zero      pass
        srcd_lapjv                 1.66             zero      pass
        hungarian                 12.07             zero      pass

rectangular tests: (400, 800)  time (ms)        error sum    status
==================================================================

     random_uniform
        scipy                   4247.94             zero      pass
        jonkervolgenant            5.02             zero      pass
        lapsolver                158.46             zero      pass
        gatagat_lapjv            199.59             zero      pass
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

     random_spatial
        scipy                   5256.62             zero      pass
        jonkervolgenant            4.60             zero      pass
        lapsolver                168.17             zero      pass
        gatagat_lapjv            197.78             zero      pass
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

     random_logarithmic
        scipy                   3600.59             zero      pass
        jonkervolgenant            4.55             zero      pass
        lapsolver                  3.11        1.747e-08      fail
        gatagat_lapjv              8.14        6.727e-18      fail
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

     random_integer
        scipy                   2045.29             zero      pass
        jonkervolgenant            6.03             zero      pass
        lapsolver                154.98             zero      pass
        gatagat_lapjv             55.70             zero      pass
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

The fastest package varies from test to test. I think correctness and backwards compatibility are more important than speed though. Any C or C++ replacement is also a lot faster than the current implementation.

@rgommers (Member) commented

> lapsolver    2.02    1.480e-08    fail

The error is about np.sqrt(np.finfo(np.float64).eps) here. Is that really a hard fail? I'm not sure what's expected, but exactly zero in those comparisons is in general never the case for floating-point computations. It looks more like lapsolver is finding the right solution, but perhaps stops a little early.
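For reference, that tolerance evaluates to roughly 1.49e-08, which matches the reported error sum in magnitude:

    import numpy as np
    print(np.sqrt(np.finfo(np.float64).eps))   # 1.4901161193847656e-08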

@cheind commented Jun 14, 2019

I wanted to raise some concerns about the benchmarks from @pmla. My concerns
stem from probably not fully understanding your test methodology.

(1) I'm wondering why the tests stop after 400x800, which I consider a rather moderate matrix size.
(2.1) Regarding runtime: I don't think it's ideal to measure the execution time of a single call, especially when the matrix size is small. AFAIK you don't account for the overhead involved, do you?
(2.2) You don't seem to account for the variance in execution times, do you?

Regarding (2), I recommend using a package such as timeit or pytest-benchmark.

(3) How is correctness measured? I've found the following related snippet:

        # verify result
        a, b = result
        if cost_matrix.shape[1] > cost_matrix.shape[0]:
            b, a = result
        assert len(np.unique(a)) == len(a)
        assert (np.sort(b) == np.arange(len(b))).all()

but this does not account for the true cost, does it?

@pmla (Contributor) commented Jun 14, 2019

@rgommers Regarding the hard fail: if the elements of the cost matrix are sufficiently small, lapsolver returns the identity permutation every time. To illustrate,
lapsolver.solve_dense(np.random.uniform(0, 1E-10, (n, n)))
appears to return the identity permutation for any value of n.
This is probably not an issue for most applications, but the current scipy implementation finds the minimum-cost solution for all inputs (or, presumably, at least to machine precision), and I don't think a replacement package should give different results. The hard-coded tolerance in lapsolver could perhaps be adjusted to correct this effect. I will leave this to @cheind to test.
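A minimal reproduction of the effect described above (assumes py-lapsolver is installed; whether the identity permutation actually comes back depends on lapsolver's internal tolerance):

    import numpy as np
    from lapsolver import solve_dense

    n = 50
    rids, cids = solve_dense(np.random.uniform(0, 1e-10, (n, n)))
    # If the tolerance effect is present, cids is 0, 1, ..., n-1
    # regardless of the matrix contents:
    print(np.array_equal(np.asarray(cids), np.arange(n)))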

@cheind Thanks for your comments. The benchmark sizes are indeed fairly arbitrary. I have included one square and one rectangular size. The times are taken as the mean of 10 runs. I have only included the call to the solver in the timing, not the generation of the cost matrix or the post-processing needed to get the results into the scipy format. Correctness is determined by the result with the lowest sum-of-costs. The error sum is the sum of differences from these values.
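A minimal sketch of that error-sum bookkeeping, under the definitions above (the function and its argument layout are hypothetical; each result is assumed to be a (row_ind, col_ind) pair in scipy's convention):

    import numpy as np

    def error_sums(cost, results):
        # results: dict mapping solver name -> (row_ind, col_ind)
        totals = {name: cost[r, c].sum() for name, (r, c) in results.items()}
        best = min(totals.values())          # lowest sum of costs found
        return {name: total - best for name, total in totals.items()}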

Here are some benchmarks for larger matrices:

square tests: (2000, 2000)        time (ms)        error sum
============================================================

     random_uniform
        jonkervolgenant              326.78             zero
        lapsolver                    418.63             zero
        gatagat_lapjv                166.21             zero
        srcd_lapjv                    88.92             zero
        hungarian                   1815.14             zero

     random_spatial
        jonkervolgenant              512.18             zero
        lapsolver                   1037.70             zero
        gatagat_lapjv               2735.41             zero
        srcd_lapjv                  2387.73             zero
        hungarian                   3799.04             zero

     random_logarithmic
        jonkervolgenant              366.02             zero
        lapsolver                     89.95        8.602e-08
        gatagat_lapjv                207.00             zero
        srcd_lapjv                   149.12             zero
        hungarian                   1862.73             zero

     random_integer
        jonkervolgenant              228.14             zero
        lapsolver                    448.89             zero
        gatagat_lapjv                 48.07             zero
        srcd_lapjv                    86.94             zero
        hungarian                   1223.96             zero

     random_binary
        jonkervolgenant               88.92             zero
        lapsolver                     76.69             zero
        gatagat_lapjv                 47.19             zero
        srcd_lapjv                    74.27             zero
        hungarian                     62.33             zero

rectangular tests: (2000, 4000)    time (ms)        error sum
============================================================

     random_uniform
        jonkervolgenant              156.71             zero
        lapsolver                  21540.89             zero
        gatagat_lapjv              33055.64             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_spatial
        jonkervolgenant              173.52             zero
        lapsolver                  25569.14             zero
        gatagat_lapjv              36219.39             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_logarithmic
        jonkervolgenant              159.40             zero
        lapsolver                    274.15        8.434e-08
        gatagat_lapjv                334.27        4.849e-18
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_integer
        jonkervolgenant              257.10             zero
        lapsolver                  11266.59             zero
        gatagat_lapjv               3528.06             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_binary
        jonkervolgenant              190.70             zero
        lapsolver                    294.99             zero
        gatagat_lapjv                430.82             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

I have not included the current scipy implementation as it is slow.

@rgommers (Member) commented

> Regarding the hard fail: if the elements of the cost matrix are sufficiently small, lapsolver returns the identity permutation every time

Ah okay, it's a scaling issue. We've had that a lot with other parts of optimize, and yes, that's indeed a real issue that's best avoided. Luckily the fix is usually not difficult.

@rgommers (Member) commented

A complete rewrite in C++ that's way faster was merged in gh-10296, so closing this.

Thanks @batconjurer for this PR and everyone for the interesting discussion.

@rgommers closed this Jul 18, 2019
Labels: enhancement, scipy.optimize