Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] MAINT Remove get_memview_* helpers in neighbours.BinaryTree #19893

Merged
merged 5 commits into from
Apr 20, 2021

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Apr 14, 2021

Reference Issues/PRs

Follow up of and closes #15097
See also #4217

What does this implement/fix? Explain your changes.

As indicated by @rth in #15097, the get_memview_* Cython helpers in neighbours.BinaryTree were used because at the time numpy and Cython didn't fully support memoryviews.

This removes this adaptor which are not longer needed.

Any other comments?

This is a draft as benchmarks have to be run to access any regression in performance from main.

@jjerphan
Copy link
Member Author

Here are results of some benchmarks. It looks like performances generally are better.

I'd like to try on another machine with more cores before concluding.

 → asv continuous -b Binary main rm-get_memview-nn-routines
· Creating environments
· Discovering benchmarks
· Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks)
[  0.00%] · For scikit-learn commit 33c85c85 <rm-get_memview-nn-routines> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 25.00%] ··· binary_tree.BinaryTreeStatsCreation.time_creation                                                                                                                                                                                                                                                              ok
[ 25.00%] ··· ======================================= ======= ============ ============ ============ ============ ============= ============
              --                                                                              d / leaf_size
              ----------------------------------------------- ------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100     100 / 10    100 / 100     1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============ ============ ============= ============
                 sklearn.neighbors._kd_tree.KDTree      100     117±6μs      75.0±3μs     507±20μs     133±8μs      4.52±0.2ms    716±50μs
                 sklearn.neighbors._kd_tree.KDTree      1000   1.26±0.2ms    546±20μs    8.48±0.9ms   4.14±0.1ms     84.1±4ms    42.6±0.9ms
                 sklearn.neighbors._kd_tree.KDTree     10000   13.7±0.6ms    9.44±1ms     127±6ms      86.4±3ms     1.25±0.03s    831±20ms
               sklearn.neighbors._ball_tree.BallTree    100     102±3μs      71.6±2μs     335±7μs      95.9±4μs    2.66±0.09ms    298±6μs
               sklearn.neighbors._ball_tree.BallTree    1000    722±20μs     416±10μs    4.96±0.1ms   2.82±0.2ms     52.6±3ms     28.0±1ms
               sklearn.neighbors._ball_tree.BallTree   10000   9.53±0.2ms   6.76±0.2ms    82.8±3ms    56.6±0.7ms     825±5ms      590±10ms
              ======================================= ======= ============ ============ ============ ============ ============= ============

[ 50.00%] ··· binary_tree.BinaryTreeStatsQuery.time_query                                                                                                                                                                                                                                                           4/36 failed
[ 50.00%] ··· ======================================= ======= ============ ============ ============= ============= ============ ============
              --                                                                               d / leaf_size
              ----------------------------------------------- -------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100      100 / 10     100 / 100    1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============= ============= ============ ============
                 sklearn.neighbors._kd_tree.KDTree      100     480±10μs     208±5μs     3.37±0.08ms   1.29±0.05ms   32.6±0.7ms   13.4±0.7ms
                 sklearn.neighbors._kd_tree.KDTree      1000    27.6±1ms    14.0±0.3ms     317±20ms      150±20ms    3.06±0.1s    1.41±0.01s
                 sklearn.neighbors._kd_tree.KDTree     10000    799±10ms     703±4ms      40.2±0.6s     23.6±0.7s      failed       failed
               sklearn.neighbors._ball_tree.BallTree    100     253±7μs      207±6μs     1.41±0.05ms   1.19±0.04ms   13.1±0.4ms   11.2±0.3ms
               sklearn.neighbors._ball_tree.BallTree    1000   18.0±0.6ms   14.4±0.5ms    129±0.7ms      118±2ms     1.39±0.01s    1.25±0s
               sklearn.neighbors._ball_tree.BallTree   10000   1.11±0.03s   1.79±0.02s    23.9±0.3s     21.1±0.2s      failed       failed
              ======================================= ======= ============ ============ ============= ============= ============ ============

[ 50.00%] · For scikit-learn commit 7fa2e6e2 <main> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl..
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 75.00%] ··· binary_tree.BinaryTreeStatsCreation.time_creation                                                                                                                                                                                                                                                              ok
[ 75.00%] ··· ======================================= ======= ============= ============ ============ ============= ============= ============
              --                                                                               d / leaf_size
              ----------------------------------------------- --------------------------------------------------------------------------------
                             BinaryTree                  n       10 / 10      10 / 100     100 / 10     100 / 100     1000 / 10    1000 / 100
              ======================================= ======= ============= ============ ============ ============= ============= ============
                 sklearn.neighbors._kd_tree.KDTree      100      147±4μs      96.5±3μs     583±10μs      157±5μs      4.74±0.1ms    715±20μs
                 sklearn.neighbors._kd_tree.KDTree      1000   1.07±0.03ms    564±10μs    8.07±0.2ms   4.26±0.09ms     87.7±1ms    45.2±0.5ms
                 sklearn.neighbors._kd_tree.KDTree     10000    13.1±0.3ms   8.87±0.2ms    125±2ms      84.3±0.7ms    1.21±0.02s    839±4ms
               sklearn.neighbors._ball_tree.BallTree    100      119±3μs      87.7±2μs     365±10μs      111±2μs     2.74±0.06ms    316±8μs
               sklearn.neighbors._ball_tree.BallTree    1000     737±20μs     448±10μs    5.45±0.1ms   2.77±0.06ms     57.3±2ms    29.9±0.7ms
               sklearn.neighbors._ball_tree.BallTree   10000    10.1±0.2ms   6.93±0.2ms   85.0±0.7ms    59.9±0.8ms     845±20ms     634±8ms
              ======================================= ======= ============= ============ ============ ============= ============= ============

[100.00%] ··· binary_tree.BinaryTreeStatsQuery.time_query                                                                                                                                                                                                                                                           4/36 failed
[100.00%] ··· ======================================= ======= ============ ============ ============= ============= ============ ============
              --                                                                               d / leaf_size
              ----------------------------------------------- -------------------------------------------------------------------------------
                             BinaryTree                  n      10 / 10      10 / 100      100 / 10     100 / 100    1000 / 10    1000 / 100
              ======================================= ======= ============ ============ ============= ============= ============ ============
                 sklearn.neighbors._kd_tree.KDTree      100     488±20μs     217±7μs     3.41±0.09ms   1.30±0.04ms    33.4±1ms    13.9±0.4ms
                 sklearn.neighbors._kd_tree.KDTree      1000   31.5±0.6ms   16.5±0.5ms     311±5ms       168±30ms    3.49±0.1s    1.45±0.05s
                 sklearn.neighbors._kd_tree.KDTree     10000    808±30ms     734±10ms     38.6±0.1s     24.8±0.3s      failed       failed
               sklearn.neighbors._ball_tree.BallTree    100     268±8μs      209±5μs     1.36±0.04ms   1.22±0.03ms   13.2±0.3ms   11.5±0.2ms
               sklearn.neighbors._ball_tree.BallTree    1000   17.4±0.4ms   14.7±0.6ms     137±3ms       124±6ms     1.41±0.04s   1.22±0.03s
               sklearn.neighbors._ball_tree.BallTree   10000   1.17±0.04s   1.83±0.08s    23.1±0.04s    21.3±0.8s      failed       failed
              ======================================= ======= ============ ============ ============= ============= ============ ============

       before           after         ratio
     [7fa2e6e2]       [33c85c85]
     <main>           <rm-get_memview-nn-routines>
+     1.07±0.03ms       1.26±0.2ms     1.17  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 10, 10)
-      31.5±0.6ms         27.6±1ms     0.87  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 10, 10)
-        583±10μs         507±20μs     0.87  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 100, 10)
-         111±2μs         95.9±4μs     0.87  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 100, 100, 100)
-         119±3μs          102±3μs     0.85  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 100, 10, 10)
-      16.5±0.5ms       14.0±0.3ms     0.85  binary_tree.BinaryTreeStatsQuery.time_query(<class 'sklearn.neighbors._kd_tree.KDTree'>, 1000, 10, 100)
-         157±5μs          133±8μs     0.85  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 100, 100)
-        87.7±2μs         71.6±2μs     0.82  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._ball_tree.BallTree'>, 100, 10, 100)
-         147±4μs          117±6μs     0.80  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 10, 10)
-        96.5±3μs         75.0±3μs     0.78  binary_tree.BinaryTreeStatsCreation.time_creation(<class 'sklearn.neighbors._kd_tree.KDTree'>, 100, 10, 100)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

@jjerphan jjerphan marked this pull request as ready for review April 20, 2021 09:05
@jjerphan jjerphan changed the title MAINT Remove get_memview_* helpers in neighbours.BinaryTree [MRG] MAINT Remove get_memview_* helpers in neighbours.BinaryTree Apr 20, 2021
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the code is simpler, the tests still pass and there is no performance regression (as expected). +1 for merge as it is.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jjerphan !

@rth rth merged commit dd7b7e5 into scikit-learn:main Apr 20, 2021
@jjerphan
Copy link
Member Author

Thanks @rth: this was an easy win thanks to your original work. 🙂

@jjerphan jjerphan deleted the rm-get_memview-nn-routines branch April 20, 2021 09:16
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants