
WIP : ENH: Refactored _hungarian.py for speed and added a minimize/maximize… #9800

Closed



@batconjurer commented Feb 10, 2019

… switch

Important Update:

After uncovering and fixing a serious bug, there is no longer a speed improvement. This pull request should not be approved for now unless I can speed it up again, which will take time.

A refactoring of scipy.optimize.linear_sum_assignment in _hungarian.py.
The use of a state machine has been removed, and the algorithm now more
closely follows the standard exposition in the literature (e.g. Wikipedia).
This yields a performance improvement. Benchmarking has also been added.

Also, there is now an optional parameter to specify if minimization or
maximization of the assignment is desired.
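To illustrate the proposed switch: today, maximization can only be expressed by negating the cost matrix by hand. A minimal sketch (the maximize parameter name is an assumption, not the final API; the cost matrix is the one from scipy's docs):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    cost = np.array([[4, 1, 3],
                     [2, 0, 5],
                     [3, 2, 2]])

    # Minimization (the existing behavior).
    row_ind, col_ind = linear_sum_assignment(cost)
    print(cost[row_ind, col_ind].sum())    # 5

    # Maximization currently requires negating the costs by hand;
    # the proposed switch would express this directly, e.g.
    # linear_sum_assignment(cost, maximize=True) (name hypothetical).
    row_ind, col_ind = linear_sum_assignment(-cost)
    print(cost[row_ind, col_ind].sum())    # 11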

Benchmark results for _hungarian.py in v1.2.2:

[ 0.00%] · For scipy commit a3e96ed <v1.2.2>:
[ 0.00%] ·· Benchmarking virtualenv-py3.6-Cython0.27.3-Tempita-numpy-pytest-six
[ 50.00%] ··· Running (hungarian.Hungarian.time_solve--).
[100.00%] ··· hungarian.Hungarian.time_solve ok
[100.00%] ··· ============= ============
               matrix_size
              ------------- ------------
                   100       45.7±0.2ms
                   300        1.31±0.1s
                   500       7.51±0.01s
                   700       25.6±0.01s
              ============= ============

Benchmark results for the refactored _hungarian.py in this PR (branch performance_refactored_hungarian):

[ 0.00%] · For scipy commit 832e219 <performance_refactored_hungarian~1>:
[ 0.00%] ·· Benchmarking virtualenv-py3.6-Cython0.27.3-Tempita-numpy-pytest-six
[ 0.00%] ··· Running (hungarian.Hungarian.time_solve--).
[ 0.01%] ··· hungarian.Hungarian.time_solve ok
[ 0.01%] ··· ============= ============
              matrix_size
             ------------- ------------
                  100       48.8±0.1ms
                  300         565±3ms
                  500       2.58±0.01s
                  700         7.75±0s
             ============= ============
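The benchmark file itself is not shown in this thread; a minimal asv sketch consistent with the hungarian.Hungarian.time_solve name and the matrix sizes above might look like this (the file path and random seed are assumptions):

    # benchmarks/benchmarks/hungarian.py (path assumed)
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    class Hungarian:
        params = [100, 300, 500, 700]
        param_names = ['matrix_size']

        def setup(self, matrix_size):
            rng = np.random.RandomState(42)  # seed is an assumption
            self.cost_matrix = rng.uniform(size=(matrix_size, matrix_size))

        def time_solve(self, matrix_size):
            linear_sum_assignment(self.cost_matrix)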

… switch

A refactoring of scipy.optimize.linear_sum_assignment in _hungarian.py.
The use of a state machine has been removed, and the algorithm now more
closely follows the standard exposition in the literature (e.g. Wikipedia).
This yields a performance improvement. Benchmarking can be found
in my personal development repository:
https://github.com/batconjurer/munkres-algorithm
Also, there is now an optional parameter to specify if minimization or
maximization of the assignment is desired.
@rgommers added the enhancement label Feb 22, 2019
@rgommers (Member) left a comment


Thanks @batconjurer, a performance increase would be nice here. I made a lot of detailed comments; at a high level:

I haven't looked at the algorithm yet; too many other things need fixing first.

 * Corrected header copyright and authorship info
 * Typeset math properly in main docstring
 * Fixed style issues
 * Added clearer references to where the algorithm was taken
 * Removed np.matrix
 * Refactored tests to use public API as much as possible
 * Moved the public docstring inside the main public API function
 * Corrected the copy method for arrays
 * Added benchmarking
…urer/scipy into performance_refactored_hungarian
…my formatting as a section called 'Definitions' which it was not expecting
@rgommers self-requested a review Feb 27, 2019
@rgommers (Member) commented

This looks much better now.

Pinging @larsmans as the original author of this code, in case he's still around and wants to have a look.

@rgommers added this to the 1.3.0 milestone Feb 28, 2019
@tylerjereddy (Contributor) left a comment


  • test coverage looks solid & no old tests modified
  • reasonably positive feedback from Ralf, though detailed review of the algorithm may not have happened
  • rebase might be good given 25 days since last push
  • even better would be a resident optimize expert signing off

for row in np.nonzero(self.row_saturated == False)[0]:
    path_row, path_col = self._aug_path(row)
    if not path_col:
        continue

The continue is never hit in the tests

path_row, path_col = self._aug_path(row)
if not path_col:
    continue
if not len(path_row + path_col) % 2:

The "if not" condition is always true in the tests.

        break

# Recreate augmented path from linked list if possible
if last_col is not None:

always true in tests

@tylerjereddy (Contributor) commented

Bumping milestone, but let's try to get this in soon. I noticed yesterday that attempting to rebase on latest master was a little messy, probably because of all the commits.

@tylerjereddy modified the milestones: 1.3.0, 1.4.0 Apr 25, 2019

** Update **
A logic error was discovered in the Wikipedia entry on the
Hungarian algorithm and had been propagated into the code. This
has now been fixed and covered with a test.
@batconjurer (Author) commented

Bad news. In reviewing the tests, I discovered an error that originated in Wikipedia and that I unfortunately duplicated in the code. It is fixed now, but the speed improvement has evaporated. So this pull request should be deleted or put on hold (not sure of the proper protocol here) while I work on speeding it up, if possible.

@rgommers (Member) commented May 2, 2019

That's too bad. Good you caught it though!

You can just change your PR title to WIP; that's a sign that it's not ready to merge. Then you can just leave it open and come back to it later.

@batconjurer changed the title ENH: Refactored _hungarian.py for speed and added a minimize/maximize… WIP : ENH: Refactored _hungarian.py for speed and added a minimize/maximize… May 6, 2019
@pmla (Contributor) commented Jun 10, 2019

Is there any appetite for a C extension for this algorithm? The Hungarian algorithm is important for many applications, and it appears that we could get orders of magnitude better performance with C code.

py-lapsolver looks like a good candidate. It is MIT licensed and its benchmarks demonstrate good performance: https://github.com/cheind/py-lapsolver
I am happy to adapt the code and do the necessary integration and testing.

@rgommers What do you think?

@rgommers (Member) commented

@pmla those benchmarks do seem to indicate that py-lapsolver is faster than our version. So yes, there's interest. A couple of thoughts:

  • py-lapsolver relies on pybind11. We don't have that build dependency yet, but we are biting that bullet right now in ENH: Add new scipy.fft module using pocketfft #10238, so this is likely fine (it did require a bit of messing with the build system there, which should then probably be moved to scipy/_build_utils).
  • The original author of linear_sum_assignment is not around anymore, which makes it harder for us to review this code at the moment. There are multiple interested people though, so ideally you could collaborate on this - if two people agree that the algorithmic details are right, that makes the job of the person doing the final review/merge a lot easier.
  • The "do we accept C/C++/Cython (or even Fortran)?" question is always a tradeoff between performance and maintenance cost. In this case, the benchmarks are comprehensive, so it seems like a no-brainer to me to accept a C++ version here.
  • It's always nice to ask the original author how they feel about inclusion of their code, so pinging @cheind here :)

@pmla (Contributor) commented Jun 11, 2019

@rgommers It turns out that py-lapsolver does not support rectangular matrices, whereas the current implementation does. Apologies.

Instead, I have written an implementation of the Jonker-Volgenant algorithm in C++. It supports everything the current implementation does, as well as the maximization option proposed in this pull request and the feasible infinities I proposed in #10293.

I will open a separate pull request when I have finished writing tests and documentation.

@cheind commented Jun 11, 2019

Hi, I'm the author of the py-lapsolver package mentioned above.

  • @rgommers pybind11 is a header-only build dependency. With some work, the package should be easily adapted to any C-interfacing approach that scipy already supports.
  • Thanks for putting me in the loop; I have no objections to my code becoming part of scipy. However, I have to say that my spare-time resources are insufficient for maintaining / adapting it to scipy standards.
  • @pmla py-lapsolver supports rectangular matrices out of the box. See this test for example. The package is heavily used by py-motmetrics, a benchmark for multiple object trackers, which has attained significant usage throughout the computer vision community (e.g. Facebook). py-lapsolver is the primary solver (if installed) for huge rectangular matrices when matching object hypotheses and ground-truth objects.
  • py-lapsolver uses partial template specialization in C++, which allows it to run faster when the cost matrix is integer-based rather than double floating point [1]; a quick way to check this yourself is sketched after the footnote below.
  • Please note that the py-lapsolver benchmarks are based on random cost matrices. @kylemcdonald noted [1] that this is a rather unrealistic assumption for certain applications. In [1] he proposes an implementation that performs even faster (at least for the given use case).

[1] https://github.com/cheind/py-lapsolver/issues/1
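For anyone who wants to observe the integer/float cost difference mentioned above on their own machine, here is a quick timing sketch (it assumes py-lapsolver is installed; the matrix size and repeat count are arbitrary choices):

    import timeit
    import numpy as np
    from lapsolver import solve_dense

    n = 1000
    costs_float = np.random.uniform(0, 100, (n, n))   # float64 costs
    costs_int = costs_float.astype(np.int64)          # integer costs

    t_float = timeit.timeit(lambda: solve_dense(costs_float), number=5)
    t_int = timeit.timeit(lambda: solve_dense(costs_int), number=5)
    print(f"float64: {t_float:.3f}s   int64: {t_int:.3f}s   (total over 5 runs)")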

@berhane commented Jun 11, 2019

Hi All,
I benchmarked six different LAP solvers, including scipy, much like @cheind. You may find the results informative or enjoy rerunning them yourself.

Module                                 Python or C/C++/Cython   Algorithm                  Link
scipy.optimize.linear_sum_assignment   Python                   Hungarian                  https://github.com/scipy/scipy/
munkres.Munkres                        Python                   Hungarian                  https://github.com/bmc/munkres
hungarian.lap                          C++                      Hungarian                  https://github.com/Hrldcpr/Hungarian
lap.lapjv                              C++                      Jonker-Volgenant           https://github.com/gatagat/lap
lapjv.lapjv                            C++                      Jonker-Volgenant           https://github.com/src-d/lapjv
lapsolver.solve_dense                  C++                      shortest augmenting path   https://github.com/cheind/py-lapsolver

@pmla (Contributor) commented Jun 11, 2019

@cheind apologies for misrepresenting your package. I was browsing your repository and saw that the Stanford ACM-ICPC code only supports square matrices. I see now that you pad the cost matrix.
@berhane thanks for the list of solvers.

I have written some benchmarks and tests for a selection of solvers. The code is a bit preliminary but you can find it here: https://github.com/pmla/linear-assignment-benchmarks

Any replacement for the current scipy implementation should have the same functionality. I have defined this as:

  1. Support for rectangular matrices, both where m > n and m < n.
  2. Correctly finding the optimal-cost solution for every input.

Other than the scipy implementation itself, none of the modules I tested met these requirements. The C++ module I have written, jonkervolgenant, meets these requirements. It can also be found in the benchmark repository.
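As a sketch of how requirement 2 can be checked on small inputs independently of any particular solver, one can enumerate every possible assignment by brute force and compare totals (the helper below is hypothetical, written against scipy's return convention):

    import itertools
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def brute_force_min_cost(cost):
        # Enumerate all ways to assign min(m, n) pairs and return the
        # lowest total cost; only feasible for small matrices.
        n_rows, n_cols = cost.shape
        best = np.inf
        if n_rows <= n_cols:
            for cols in itertools.permutations(range(n_cols), n_rows):
                best = min(best, cost[np.arange(n_rows), list(cols)].sum())
        else:
            for rows in itertools.permutations(range(n_rows), n_cols):
                best = min(best, cost[list(rows), np.arange(n_cols)].sum())
        return best

    rng = np.random.RandomState(0)
    for shape in [(4, 4), (3, 5), (5, 3)]:   # square, m < n, m > n
        cost = rng.uniform(size=shape)
        r, c = linear_sum_assignment(cost)
        assert np.isclose(cost[r, c].sum(), brute_force_min_cost(cost))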

Here are the results of my tests and benchmarks:

square tests: (400, 400)      time (ms)        error sum    status
==================================================================

     random_uniform
        scipy                   2697.23             zero      pass
        jonkervolgenant            7.05             zero      pass
        lapsolver                  7.53             zero      pass
        gatagat_lapjv              4.21             zero      pass
        srcd_lapjv                 2.39             zero      pass
        hungarian                 16.76             zero      pass

     random_spatial
        scipy                   5332.93             zero      pass
        jonkervolgenant           10.78             zero      pass
        lapsolver                 17.84             zero      pass
        gatagat_lapjv             28.28             zero      pass
        srcd_lapjv                76.55             zero      pass
        hungarian                 35.89             zero      pass

     random_logarithmic
        scipy                   2856.26             zero      pass
        jonkervolgenant            7.54             zero      pass
        lapsolver                  2.02        1.480e-08      fail
        gatagat_lapjv              5.23             zero      pass
        srcd_lapjv                 4.53             zero      pass
        hungarian                 17.40             zero      pass

     random_integer
        scipy                   1374.58             zero      pass
        jonkervolgenant            7.01             zero      pass
        lapsolver                  8.30             zero      pass
        gatagat_lapjv              1.55             zero      pass
        srcd_lapjv                 1.66             zero      pass
        hungarian                 12.07             zero      pass

rectangular tests: (400, 800)  time (ms)        error sum    status
==================================================================

     random_uniform
        scipy                   4247.94             zero      pass
        jonkervolgenant            5.02             zero      pass
        lapsolver                158.46             zero      pass
        gatagat_lapjv            199.59             zero      pass
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

     random_spatial
        scipy                   5256.62             zero      pass
        jonkervolgenant            4.60             zero      pass
        lapsolver                168.17             zero      pass
        gatagat_lapjv            197.78             zero      pass
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

     random_logarithmic
        scipy                   3600.59             zero      pass
        jonkervolgenant            4.55             zero      pass
        lapsolver                  3.11        1.747e-08      fail
        gatagat_lapjv              8.14        6.727e-18      fail
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

     random_integer
        scipy                   2045.29             zero      pass
        jonkervolgenant            6.03             zero      pass
        lapsolver                154.98             zero      pass
        gatagat_lapjv             55.70             zero      pass
        srcd_lapjv            ---------        ---------      fail
        hungarian             ---------        ---------      fail

The fastest package varies from test to test. I think correctness and backwards compatibility are more important than speed though. Any C or C++ replacement is also a lot faster than the current implementation.

@rgommers (Member) commented

> lapsolver    2.02    1.480e-08    fail

The error is about np.sqrt(np.finfo(np.float64).eps) here. Is that really a hard fail? I'm not sure what's expected, but exactly zero in those comparisons is in general never the case for floating-point computations. It looks more like lapsolver is finding the right solution, but perhaps stops a little early.
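For reference, that tolerance evaluates to roughly 1.49e-08, which matches the reported error sum in magnitude:

    import numpy as np
    print(np.sqrt(np.finfo(np.float64).eps))   # 1.4901161193847656e-08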

@cheind commented Jun 14, 2019

I wanted to raise some concerns about the benchmarks from @pmla. My concerns
stem from probably not fully understanding your test methodology.

(1) I'm wondering why the tests stop after 400x800, which I consider a rather moderate matrix size.
(2.1) Regarding runtime: I don't think it's ideal to measure the execution time of a single call, especially when the matrix size is small. AFAIK you don't account for the overhead involved, do you?
(2.2) You don't seem to account for the variance in execution times, do you?

Regarding (2), I recommend using a package such as timeit or pytest-benchmark.

(3) How is correctness measured? I've found the following related snippet:

        # verify result
        a, b = result
        if cost_matrix.shape[1] > cost_matrix.shape[0]:
            b, a = result
        assert len(np.unique(a)) == len(a)
        assert (np.sort(b) == np.arange(len(b))).all()

but this does not account for the true cost, does it?

@pmla (Contributor) commented Jun 14, 2019

@rgommers Regarding the hard fail: if the elements of the cost matrix are sufficiently small, lapsolver returns the identity permutation every time. To illustrate,
lapsolver.solve_dense(np.random.uniform(0, 1E-10, (n, n)))
appears to return the identity permutation for any value of n.
This is probably not an issue for most applications, but the current scipy implementation finds the minimum-cost solution for all inputs (or, presumably, at least to machine precision), and I don't think a replacement package should give different results. The hard-coded tolerance in lapsolver could perhaps be adjusted to correct this effect. I will leave this to @cheind to test.
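A minimal reproduction of the effect described above (assumes py-lapsolver is installed; whether the identity permutation actually comes back depends on lapsolver's internal tolerance):

    import numpy as np
    from lapsolver import solve_dense

    n = 50
    rids, cids = solve_dense(np.random.uniform(0, 1e-10, (n, n)))
    # If the tolerance effect is present, cids is 0, 1, ..., n-1
    # regardless of the matrix contents:
    print(np.array_equal(np.asarray(cids), np.arange(n)))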

@cheind Thanks for your comments. The benchmark sizes are indeed fairly arbitrary. I have included one square and one rectangular size. The times are taken as the mean of 10 runs. I have only included the call to the solver in the timing, not the generation of the cost matrix or the post-processing needed to get the results into the scipy format. Correctness is determined by the result with the lowest sum-of-costs. The error sum is the sum of differences from these values.
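A minimal sketch of that error-sum bookkeeping, under the definitions above (the function and its argument layout are hypothetical; each result is assumed to be a (row_ind, col_ind) pair in scipy's convention):

    import numpy as np

    def error_sums(cost, results):
        # results: dict mapping solver name -> (row_ind, col_ind)
        totals = {name: cost[r, c].sum() for name, (r, c) in results.items()}
        best = min(totals.values())          # lowest sum of costs found
        return {name: total - best for name, total in totals.items()}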

Here are some benchmarks for larger matrices:

square tests: (2000, 2000)        time (ms)        error sum
============================================================

     random_uniform
        jonkervolgenant              326.78             zero
        lapsolver                    418.63             zero
        gatagat_lapjv                166.21             zero
        srcd_lapjv                    88.92             zero
        hungarian                   1815.14             zero

     random_spatial
        jonkervolgenant              512.18             zero
        lapsolver                   1037.70             zero
        gatagat_lapjv               2735.41             zero
        srcd_lapjv                  2387.73             zero
        hungarian                   3799.04             zero

     random_logarithmic
        jonkervolgenant              366.02             zero
        lapsolver                     89.95        8.602e-08
        gatagat_lapjv                207.00             zero
        srcd_lapjv                   149.12             zero
        hungarian                   1862.73             zero

     random_integer
        jonkervolgenant              228.14             zero
        lapsolver                    448.89             zero
        gatagat_lapjv                 48.07             zero
        srcd_lapjv                    86.94             zero
        hungarian                   1223.96             zero

     random_binary
        jonkervolgenant               88.92             zero
        lapsolver                     76.69             zero
        gatagat_lapjv                 47.19             zero
        srcd_lapjv                    74.27             zero
        hungarian                     62.33             zero

rectangular tests: (2000, 4000)    time (ms)        error sum
============================================================

     random_uniform
        jonkervolgenant              156.71             zero
        lapsolver                  21540.89             zero
        gatagat_lapjv              33055.64             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_spatial
        jonkervolgenant              173.52             zero
        lapsolver                  25569.14             zero
        gatagat_lapjv              36219.39             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_logarithmic
        jonkervolgenant              159.40             zero
        lapsolver                    274.15        8.434e-08
        gatagat_lapjv                334.27        4.849e-18
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_integer
        jonkervolgenant              257.10             zero
        lapsolver                  11266.59             zero
        gatagat_lapjv               3528.06             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

     random_binary
        jonkervolgenant              190.70             zero
        lapsolver                    294.99             zero
        gatagat_lapjv                430.82             zero
        srcd_lapjv                ---------        ---------
        hungarian                 ---------        ---------

I have not included the current scipy implementation as it is slow.

@rgommers (Member) commented

> Regarding the hard fail: if the elements of the cost matrix are sufficiently small, lapsolver returns the identity permutation every time

Ah okay, it's a scaling issue. We've had that a lot with other parts of optimize, and yes, that's indeed a real issue that's best avoided. Luckily the fix is usually not difficult.

@rgommers (Member) commented

A complete rewrite in C++ that's way faster was merged in gh-10296, so closing this.

Thanks @batconjurer for this PR and everyone for the interesting discussion.

@rgommers closed this Jul 18, 2019
Labels: enhancement, scipy.optimize