TST Speed up slow test linear_model/tests/test_quantile.py::test_asymmetric_error #21546
Conversation
I have the feeling that this test cannot easily be sped up significantly without altering it too much. I think it's an important test, so we can leave it as is. The fact that it is quite sensitive to the seed of the dataset random generation procedure is bad, but I don't see any easy way to improve this.
Unless @lorentzenchr has a better idea.
I consider
I agree.
Not sure. @simonandras would you be interested in timing each part of the test to know their relative durations?
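One way to get those relative durations is a quick `time.perf_counter` sketch. The two lambdas below are hypothetical stand-ins for the parts of the test (data generation, fitting); the helper name `time_section` is made up for illustration:

```python
import time

def time_section(label, func, *args, **kwargs):
    """Run func once and report its wall-clock duration."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Example: timing two hypothetical parts of a test.
_, t_data = time_section("data generation", lambda: [i ** 2 for i in range(100_000)])
_, t_fit = time_section("fit", lambda: sorted(range(100_000), reverse=True))
```

Wrapping each block of the test this way shows at a glance whether the data generation or the solver dominates the runtime.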
You mean skip if scipy < 1.15, right? I think that's an interesting idea.
Using the
So the
It is much faster now, but unfortunately it fails one assertion on line 169. (All 3 quantile cases behave similarly in runtime and fail the same assertion. The timing above was made on the 0.2 quantile case.)
I think we can relax the rtol.
Maybe we can first improve the fit precision via solver_options (have a look at the scipy docs for highs) and then relax rtol by the missing amount. 1e-3 is not really tight.
A cheap solution, but if we set the random seed to 2, then it works without modification. What do you think?
In case of
I correct myself here: actually there is no other seed that works with the current error tolerance in all 3 quantile cases. I think we have to change the tolerance of that one failing assertion if we want to use the faster solver. We already have an assertion like that in the
I think as a general rule, we should use tolerance levels that work with many arbitrary seeds. If the test relies on seed cherry-picking, it will be too brittle and can randomly fail in the future on different platforms or on upgrades of dependencies such as numpy/scipy/openblas.
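A sketch of what seed-robust tolerances mean in practice: rather than tuning the assertion to one cherry-picked seed, check that it holds for a handful of arbitrary seeds. This pure-numpy illustration checks empirical quantile coverage (it is not the actual test code; the quantile, sample size, and tolerance values are made up):

```python
import numpy as np

quantile = 0.2
n_samples, tol = 1000, 0.05

for seed in range(5):  # several arbitrary seeds, no cherry-picking
    rng = np.random.RandomState(seed)
    y = rng.normal(size=n_samples)
    # Stand-in for a model prediction of the conditional quantile.
    threshold = np.quantile(y, quantile)
    coverage = np.mean(y < threshold)
    # A seed-robust assertion: loose enough to pass for any seed.
    assert abs(coverage - quantile) < tol, (seed, coverage)
```

If a tolerance only passes for one specific seed, the assertion is testing the seed, not the estimator.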
I think the approach to use the highs solver is the right way to tackle the performance issue of this test. Here are a few more pieces of feedback to make this test more stable and less likely to fail arbitrarily in the future.
So we should use
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Let me push the suggestions of the code review to see if everything is good now.
I pushed a commit with the switch to
LGTM! Thanks for the contrib @simonandras!
LGTM with one nitpick.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Reference Issues/PRs
Towards #21407
What does this implement/fix? Explain your changes.
Currently the test with the 3 different settings runs in 6-7 seconds on my computer. The goal is to achieve less than 2-3 sec total runtime (less than 1 sec per quantile).
About the test so far:
The test_asymmetric_error function tests the expected behavior of the QuantileRegressor estimator on generated data where the quantile is linear and known. The data is asymmetric, which means that it consists of 2 parts which are different (see the plot).
Ideas so far:
The basic idea is to reduce the sample size of the test data (currently it is 1000). However, if I do that, the assertions fail with the current error tolerance. I noticed that if I increase the data size to 1100 the tests fail as well. At first this looks weird, and the error bounds seem arbitrary to me so far.
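For context, data with a known linear quantile can be generated along these lines (a sketch of the idea, not the actual test fixture; the coefficients and sample size are made up): shift an asymmetric noise distribution so that its q-quantile is exactly zero, which makes the true conditional q-quantile of y linear in X.

```python
import numpy as np

rng = np.random.RandomState(42)
n_samples, quantile = 1000, 0.8

X = rng.uniform(size=(n_samples, 1))
# Exponential noise is asymmetric; subtracting its q-quantile,
# -log(1 - q) for rate 1, makes the q-quantile of the noise zero.
noise = rng.exponential(size=n_samples) - (-np.log(1 - quantile))
y = 1.0 + 2.0 * X[:, 0] + noise

# The true conditional q-quantile of y is then exactly 1 + 2 * x, so
# roughly a fraction `quantile` of observations lies below that line.
coverage = np.mean(y <= 1.0 + 2.0 * X[:, 0])
print(coverage)
```

With such a construction the expected coverage is known exactly, which is what lets the test assert on it; the smaller the sample, the larger the binomial noise around that expectation, which is likely why shrinking the sample size breaks the current tolerances.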
Any other comments?
The work is in progress.