Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TST Speed up slow test linear_model/tests/test_quantile.py::test_asymmetric_error #21546

Merged

Conversation

simonandras
Copy link
Contributor

Reference Issues/PRs

Towards #21407

What does this implement/fix? Explain your changes.

Currently the test with the 3 different settings runs in 6-7 seconds on my computer. The goal is to achieve less then 2-3 sec runtime (less then 1 sec / one quantile).

About the test so far:

The test_asymmetric_error function tests the expected behavior of the QuantileRegressor estimator on a generated data where the quantile is linear and known. The data is assymetric, which means that it consists of 2 parts which are different (see the plot).

Ideas so far:

The basic idea is to reduce the sample size of the test data (currently it is 1000). However if i do that the assertions will fail with the current error tolerance. Noticed that if i increase the data size to 1100 the tests will fail also. For first it looks wierd, and the error bounds seems for me arbitrary so far.

Any other comments?

The work is in progress.

quantile_regression_test_data_plot

@ogrisel
Copy link
Member

ogrisel commented Nov 4, 2021

I have the feeling that this test cannot really be easily and significantly be sped-up without altering it too much.

I think it's an important test so I think we can leave it as is. The fact that it is quite sensitive to the seed of the dataset random generation procedure is bad but I don't see any easy what to improve this.

@ogrisel
Copy link
Member

ogrisel commented Nov 4, 2021

Unless @lorentzenchr has a better idea.

@lorentzenchr
Copy link
Member

I consider test_asymmetric_error as one of the most important tests for the quantile regression. Is it slow because of the fitor because of the minimize(..., method="Nelder-Mead")?
If it is the former, we could use skip_if with scipy >= 1.15 and use solver="highs". That should give a good speedup.

@ogrisel
Copy link
Member

ogrisel commented Nov 5, 2021

I consider test_asymmetric_error as one of the most important tests for the quantile regression.

I agree.

Is it slow because of the fitor because of the minimize(..., method="Nelder-Mead")?

Note sure. @simonandras would you be interested in timing each part of the test to know their relative durations?

If it is the former, we could use skip_if with scipy >= 1.15 and use solver="highs". That should give a good speedup.

You mean skip if scipy < 1.15, right? I think that's an interesting idea.

@simonandras
Copy link
Contributor Author

simonandras commented Nov 5, 2021

Using the interior-point in fit:

  • the first fit (with 0 alpha) runs in 2.08 seconds
  • the second fit with (0.01 alpha) runs in 2.63 seconds
  • the minimize function with the Nelder-Mead method runs in 0.12 seconds

So the fits are the slow part. If we use the highs solver there:

  • the first fit (with 0 alpha) runs in 0.05 seconds
  • the second fit with (0.01 alpha) runs in 0.05 seconds

It is much faster now but unfortunately it fails one assertion in line 169:
assert_allclose(np.mean(model.predict(X) > y), quantile).
The default rtol of assert_allclose is 1e-7, but the difference is 1e-3. The other 6 assertions are ok.

(All of the 3 quantile cases behaves similarily with the runtime and with the failing one assertion. The above timing was made on the 0.2 quantile case.)

@ogrisel
Copy link
Member

ogrisel commented Nov 5, 2021

It is much faster now but unfortunately it fails one assertion in line 169:
assert_allclose(np.mean(model.predict(X) > y), quantile). The default rtol of assert_allclose is 1e-7, but the difference is 1e-3.

I think we can relax the rtol.

@lorentzenchr
Copy link
Member

It is much faster now but unfortunately it fails one assertion in line 169:
assert_allclose(np.mean(model.predict(X) > y), quantile). The default rtol of assert_allclose is 1e-7, but the difference is 1e-3.

I think we can relax the rtol.

Maybe, we can first improve the fit precision via solver_options (have a look at the scipy docs for highs) and then relax rtol the missing amount. 1e-3 is not really tight.

@simonandras
Copy link
Contributor Author

A cheap solution but if we set the random seed to 2 then it works without modification. What do you think?

@simonandras
Copy link
Contributor Author

simonandras commented Nov 5, 2021

It is much faster now but unfortunately it fails one assertion in line 169:
assert_allclose(np.mean(model.predict(X) > y), quantile). The default rtol of assert_allclose is 1e-7, but the difference is 1e-3.

I think we can relax the rtol.

Maybe, we can first improve the fit precision via solver_options (have a look at the scipy docs for highs) and then relax rtol the missing amount. 1e-3 is not really tight.

In case of highs i think it is the ipm_optimality_tolerance parameter. Unfortunately even if i set it to the minimum it gives the same result.

@simonandras
Copy link
Contributor Author

simonandras commented Nov 6, 2021

A cheap solution but if we set the random seed to 2 then it works without modification. What do you think?

I correct myself here: actually there is no other seed that is good with the current error tolerance in both the 3 quantile case. I think we have to change the tolerance of that one failing assertion if we want to use the highs solver.

We have already an assertion like that in the alpha=0.01 case, where the rtol is even larger (we can actually decrease that now and some other rtols too).

@simonandras simonandras force-pushed the quantile_regression_test_speedup branch from d97d11f to bd7a8ab Compare November 6, 2021 16:19
@simonandras simonandras changed the title [WIP] speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error Nov 7, 2021
@simonandras simonandras changed the title Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error [MRG] Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error Nov 12, 2021
@ogrisel
Copy link
Member

ogrisel commented Dec 6, 2021

I think as a general rule, we should use tolerance levels that works with many arbitrary seeds. If the test relies on seed cherry-picking, it will be too brittle and can randomly fail in the future on different platforms or on upgrades of dependencies such as numpy/scipy/openblas.

sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach to use the highs solver is the right way to tackle the performance issue of this test. Here is a few more feedback to make this test more stable and less likely to fail arbitrarily in the future.

sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Show resolved Hide resolved
@simonandras simonandras force-pushed the quantile_regression_test_speedup branch from 0386307 to a279b89 Compare December 8, 2021 19:25
@simonandras simonandras changed the title [MRG] Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error [WIP] Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error Dec 8, 2021
@simonandras
Copy link
Contributor Author

So we should use atol 1e-2 where it is enough and use larger, may seed dependent atol where it is needed? @ogrisel @lorentzenchr

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
@ogrisel
Copy link
Member

ogrisel commented Dec 9, 2021

Let me push the suggestions of the code review to see if everything is good now.

@ogrisel
Copy link
Member

ogrisel commented Dec 9, 2021

So we should use atol 1e-2 where it is enough and use larger, may seed dependent atol where it is needed? @ogrisel @lorentzenchr

I pushed a commit with the switch to atol=1e-2 for the quantile related assertions because 1% tolerance on a quantile values (between 0 and 1) is easy to interpret. For the assertions on coef and intercept I left them the way they were (with rtol) as I have no intuition of what is a good threshold for those quantities.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for the contrib @simonandras!

@simonandras simonandras changed the title [WIP] Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error [MRG] Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error Dec 9, 2021
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nitpick.

sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
@lorentzenchr lorentzenchr changed the title [MRG] Speed up slow test: linear_model/tests/test_quantile.py::test_asymmetric_error TST Speed up slow test linear_model/tests/test_quantile.py::test_asymmetric_error Dec 10, 2021
@lorentzenchr lorentzenchr merged commit d72bd02 into scikit-learn:main Dec 10, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
glemaitre pushed a commit that referenced this pull request Dec 25, 2021
@simonandras simonandras deleted the quantile_regression_test_speedup branch April 3, 2022 14:25
mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants