FIX Fixes performance regression in trees #23404
Conversation
I also tried the low cardinality benchmark (results omitted).

So this is almost a 2x performance regression on low cardinality data. I am not sure the small perf gain we get in the high cardinality case is worth it.

EDIT: the 2x slowdown factor stays when I switch `n_samples` from 50k to 200k in the low cardinality benchmark.
I have tried to re-implement the old way to do the partitioning in quicksort as part of thomasjpfan#109. However this degrades the performance even more. The only thing that I have not tried is to reimplement the manual tail-call elimination optimisation of the second recursive call.
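For reference, the manual tail-call elimination mentioned above is the classic trick of recursing only into the smaller partition and looping on the larger one, which bounds the recursion depth at O(log n). A minimal C sketch of the idea (values-only, ignoring the paired indices array the real code permutes in lockstep, and using a hypothetical Lomuto partition):

```c
#include <stddef.h>

/* Hypothetical sketch of manual tail-call elimination in quicksort:
 * recurse into the smaller partition, loop on the larger one, so the
 * stack depth stays O(log n) even on adversarial inputs. */
static void quicksort_tce(double *values, size_t size) {
    while (size > 1) {
        /* Lomuto partition around the last element (illustrative only). */
        double pivot = values[size - 1];
        size_t i = 0;
        for (size_t j = 0; j + 1 < size; j++) {
            if (values[j] < pivot) {
                double t = values[i]; values[i] = values[j]; values[j] = t;
                i++;
            }
        }
        double t = values[i]; values[i] = values[size - 1]; values[size - 1] = t;

        /* Recurse into the smaller side; continue the loop on the larger. */
        if (i < size - i - 1) {
            quicksort_tce(values, i);     /* left side is smaller */
            values += i + 1;
            size -= i + 1;
        } else {
            quicksort_tce(values + i + 1, size - i - 1);
            size = i;
        }
    }
}
```

The scikit-learn `_simultaneous_sort` also reorders an `indices` array alongside `values`; that is omitted here for brevity.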
```rst
:mod:`sklearn.tree`
...................

- |Fix| Fixes performance regression :class:`tree.DecisionTreeClassifier`,
```
Suggested change:

```diff
-- |Fix| Fixes performance regression :class:`tree.DecisionTreeClassifier`,
+- |Fix| Fixes performance regression with low cardinality features for
+  :class:`tree.DecisionTreeClassifier`,
```
I explored the hypothesis of using tail call elimination in thomasjpfan#110 and this is not the cause of the slowdown either. I also tried to use
I think we should try some profiling to try to identify the discrepancy.
We could also try to reimplement this on top of a fork, prior to the switch to typed memory views. Not sure if it's related or not.
In the meantime, here are some nitpicks.
```cython
if use_introsort == 1:
    _simultaneous_sort(values, indices, size, 2 * <int>log2(size), 1)
else:
    _simultaneous_sort(values, indices, size, -1, 0)
```
The return type is `int` but we never return anything (0 is implicit in this case, I guess). Let's switch to `void`?
```cython
                           size - pivot_idx - 1)
    _simultaneous_sort(values + pivot_idx + 1,
                       indices + pivot_idx + 1,
                       size - pivot_idx - 1, max_depth - 1, use_introsort)
    return 0
```
Same comment for the private helper function: the return type should be void.
```cython
cdef inline void heapsort(
    floating* values,
    ITYPE_t* samples,
```
This should be renamed to `indices` to use a consistent notation with `_simultaneous_sort`.
I used the current sorting with pointers (instead of the typed memoryviews) and I get the following bench:

- with 1.0.2: (results omitted)
- with the current PR: (results omitted)
- using pointers instead of typed memoryviews: (results omitted)

So it is not coming from the memoryviews.
I have no idea what can explain the remaining perf difference. I tried to do some profiling with `py-spy record --format=speedscope -o out.speedscope --native -- python bench_script.py` and indeed the difference seems to happen below the sorting calls in `BestSplitter`, but since all the code is inlined I cannot see below that level. Maybe Linux `perf` could help. Alternatively, one could instrument the code with counters to record the number of times the outer quicksort and inner heapsort functions are called in both branches.

However I don't have the time to do it now, so I am fine with merging this PR because it's already fixing most of the regression.
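The counter instrumentation suggested above could be as simple as a pair of global counters bumped on entry to each sort routine and printed once after the benchmark, so the call counts can be compared across branches. A hypothetical sketch (the `*_entry` names and the sort bodies are placeholders, not scikit-learn code):

```c
#include <stdio.h>

/* Hypothetical instrumentation: count how often each sort routine is
 * entered during a benchmark run, then dump the totals once at the end. */
static unsigned long n_quicksort_calls = 0;
static unsigned long n_heapsort_calls = 0;

static void quicksort_entry(void) {
    n_quicksort_calls++;
    /* ... actual quicksort body would go here ... */
}

static void heapsort_entry(void) {
    n_heapsort_calls++;
    /* ... actual heapsort body would go here ... */
}

static void report_counters(void) {
    printf("quicksort calls: %lu\n", n_quicksort_calls);
    printf("heapsort calls:  %lu\n", n_heapsort_calls);
}
```

Running the same benchmark on both branches and diffing the reported counts would show whether one branch is doing structurally more sorting work, without needing `perf` or frame-pointer-level profiling.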
```cython
cdef int simultaneous_sort(
```

```cython
cdef inline void sift_down(
    floating* values,
    ITYPE_t* samples,
```
`samples` => `indices`: to be renamed here as well.
FWIW reverting #22868 goes back to 1.0.2 performance for me, so it does indicate that #22868 is the only thing causing the performance regression (i.e. that there are no other changes at play):

```shell
# reverts https://github.com/scikit-learn/scikit-learn/pull/22868
git revert 4cf932d98
make in
# run your benchmarks here
```
@lesteve can you please do a side-PR that does this on top of the current
I opened #23410. And indeed there were some conflicts to fix; I was probably navigating the history too much, so I reverted from somewhere else than main and the revert did not have any conflicts when I posted my previous message ... I get the same performance as 1.0.2 in #23410.
Closing in favor of #23410, which still requires a custom backport.
Reference Issues/PRs
Fixes #23397
What does this implement/fix? Explain your changes.
This PR adds the heapsort part of introsort back into `simultaneous_sort` as a flag.

Using the benchmark for low cardinality, I get `3.24 s` on `main`, `0.11 s` with this PR, and `0.07 s` on `1.0.X`.

This PR makes the performance about the same compared to `main`, but still much faster compared to `1.0.1`.
Original with high cardinality benchmark:

- This PR: (results omitted)
- `main`: (results omitted)
- `1.0.1`: (results omitted)