
TST: fix CI failures in test_all_nograd_minimizers #20699

Merged: 4 commits into scipy:main on May 16, 2024

Conversation

@dschmitz89 (Contributor)

Reference issue

#19994 (comment)

What does this implement/fix?

Increase test timeout to 40 seconds

@dschmitz89 dschmitz89 requested a review from mdhaber May 11, 2024 20:20
@dschmitz89 dschmitz89 requested a review from andyfaff as a code owner May 11, 2024 20:20
@github-actions github-actions bot added the scipy.optimize, CI, and maintenance labels May 11, 2024
@mdhaber (Contributor) commented May 11, 2024

Thanks! This would work, but the test took only 3s before COBYQA was added, and now it takes over 20s. Is it that slow locally? It might be worth looking into what's going on.

```diff
@@ -212,6 +212,7 @@ def test_all_minimizers(self):
                          niter=self.niter, disp=self.disp)
         assert_almost_equal(res.x, self.sol[i], self.tol)

+    @pytest.mark.fail_slow(40)
```
Contributor (review comment)

For reference, the test runs this fast locally on an i9-13900K:

5.13s call     build-install/lib/python3.11/site-packages/scipy/optimize/tests/test__basinhopping.py::TestBasinHopping::test_all_nograd_minimizers

If I run Matt's suggested parametrization experiment, the timings drop off sharply for every method other than the new one:

4.43s call     build-install/lib/python3.11/site-packages/scipy/optimize/tests/test__basinhopping.py::TestBasinHopping::test_all_nograd_minimizers[COBYQA]
0.15s call     build-install/lib/python3.11/site-packages/scipy/optimize/tests/test__basinhopping.py::TestBasinHopping::test_all_nograd_minimizers[COBYLA]
0.13s call     build-install/lib/python3.11/site-packages/scipy/optimize/tests/test__basinhopping.py::TestBasinHopping::test_all_nograd_minimizers[CG]
0.08s call     build-install/lib/python3.11/site-packages/scipy/optimize/tests/test__basinhopping.py::TestBasinHopping::test_all_nograd_minimizers[TNC]
0.07s call     build-install/lib/python3.11/site-packages/scipy/optimize/tests/test__basinhopping.py::TestBasinHopping::test_all_nograd_minimizers[BFGS]

If you did parametrize, I believe you could use pytest.param to apply the timing exception (or another marker) to just that one test case, so that regressions in the other methods don't sneak in. If you apply xslow, it seems it would only make sense to apply it to that one case the same way.
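
A sketch of what that could look like (the test body here is a simplified stand-in with a hypothetical starting point and tolerance; it assumes the pytest-fail-slow plugin is installed):

```python
import numpy as np
import pytest
from scipy.optimize import basinhopping, rosen

@pytest.mark.parametrize(
    "method",
    [
        "CG",
        "BFGS",
        "TNC",
        "COBYLA",
        # Only the slow case carries the relaxed time limit, so a timing
        # regression in any other method would still fail fast.
        pytest.param("COBYQA", marks=pytest.mark.fail_slow(10)),
    ],
)
def test_all_nograd_minimizers(method):
    x0 = np.zeros(2)  # hypothetical starting point for the 2-D Rosenbrock
    res = basinhopping(rosen, x0,
                       minimizer_kwargs={"method": method},
                       niter=10, seed=1234)
    assert np.allclose(res.x, 1.0, atol=1e-2)  # minimum is at (1, 1)
```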

@mdhaber (Contributor) commented May 12, 2024 (review comment)

The tolerance to pass the test is quite coarse, though. Can the termination settings be adjusted to avoid unnecessary iterations? Or why else is the new method so much slower than the others on this problem?

@andyfaff (Contributor)

I wonder why cobyqa is taking so much longer than the others. @ragonneau, is there something about the architecture of the minimiser that leads to longer runtimes?

@mdhaber (Contributor) commented May 14, 2024

To get this passing, how about `niter = 10 if method == 'COBYQA' else self.niter` (and specifying `niter=niter` in the call to `basinhopping`)? Locally, the test still passes, and it takes ~1/10 the time.

But it would also be good to see why the wall clock time is so high.
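
As an illustration of that suggestion, a self-contained sketch; the helper name, starting point, and tolerance here are hypothetical, whereas the real test loops over all methods inside TestBasinHopping:

```python
import numpy as np
from numpy.testing import assert_almost_equal
from scipy.optimize import basinhopping, rosen

def run_nograd_minimizer(method, x0, sol, decimal, default_niter=100):
    # Cap the basin-hopping iterations for COBYQA only: it reaches the
    # coarse test tolerance quickly, and extra iterations just burn time.
    niter = 10 if method == 'COBYQA' else default_niter
    res = basinhopping(rosen, x0,
                       minimizer_kwargs={"method": method},
                       niter=niter, seed=1234)
    assert_almost_equal(res.x, sol, decimal)

# The 2-D Rosenbrock minimum is at (1, 1).
run_nograd_minimizer('COBYQA', x0=np.zeros(2), sol=np.ones(2), decimal=3)
```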

@dschmitz89 (Contributor, Author)

The objective function used in that test might just be a pathological edge case for cobyqa. Limiting the number of iterations sounds good to me.

@andyfaff (Contributor)

Looks like the latest commit didn't improve the situation.

I'm not sure I like having a test fail in CI because it takes too long; it seems like a recipe for flaky CI. Runner performance can be variable, so we might get random failures. If the aim is to reduce the time used by CI, we'd be better off working on other areas.

Lastly, it'd be nice to know why cobyqa takes so long for this specific test.

@andyfaff (Contributor)

I'm not sure that rosen could be considered pathological; I would regard it as well behaved. From some preliminary profiling, the time spent evaluating the objective function is a small percentage; my impression is that there's a lot of overhead/inefficiency in the cobyqa code.

e.g.

  • The merit method calculates everything in this if block even when there are no constraints or bounds. It should short-circuit in that case, and the results should be cached for a given x if they've already been calculated at that position.
  • That merit method is called repeatedly from Framework.set_best_index, inside a loop, which compounds any inefficiency in merit.
  • Elsewhere in Framework.set_best_index there are repeated calls to Problem.maxcv. There are no shortcuts when there are no bounds, no linear constraints, and no nonlinear constraints, and the computation time of Problem.maxcv is non-negligible because various computations are carried out that don't need to be.

If I apply a naive patch to skip the code that has no effect (because I'm running an N=10 rosen with no bounds or constraints), I get a ~25% speedup, and those are just the inefficiencies I found from a quick look.
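
As a generic illustration of those two fixes (hypothetical names, not cobyqa's actual code): short-circuit the merit computation when there are no constraints or bounds, and memoise it per evaluation point.

```python
import numpy as np

class MeritEvaluator:
    """Illustrative sketch: shortcut + per-point caching of a merit value."""

    def __init__(self, fun, constraints=(), bounds=None):
        self.fun = fun
        self.constraints = constraints  # callables: c(x) > 0 means violated
        self.bounds = bounds
        self._cache = {}                # maps x.tobytes() -> merit value

    def merit(self, x, penalty=1.0):
        key = np.asarray(x, dtype=float).tobytes()
        if key in self._cache:          # already evaluated at this point
            return self._cache[key]
        f = self.fun(x)
        if not self.constraints and self.bounds is None:
            # Shortcut: unconstrained problem, the merit is just f(x);
            # skip the constraint-violation machinery entirely.
            self._cache[key] = f
            return f
        violation = sum(max(c(x), 0.0) for c in self.constraints)
        value = f + penalty * violation
        self._cache[key] = value
        return value
```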

@dschmitz89 (Contributor, Author)

Improving COBYQA's performance would be the best option, but that will take longer. Until then, we should still fix this annoying failure.

@mdhaber mdhaber merged commit f2d4775 into scipy:main May 16, 2024
29 of 30 checks passed
@dschmitz89 dschmitz89 deleted the fix_nograd_minimizer_test branch May 16, 2024 04:36
@mdhaber (Contributor) commented May 16, 2024

The iteration reduction brought the duration from over 20s down to just a hair over 5s, which is why the test still failed. I increased the time limit on this test to 10s to fix CI, but yeah, I think it's worth looking deeper into why it's so slow.

Having slow tests fail is an experiment; let's let it run for a while. This failure came up in main only because CI didn't run on the COBYQA PR after the pytest_fail_slow PR merged. In the future, the idea is that these sorts of issues get fixed as part of PRs so they don't show up in main. Personally, I think pytest_fail_slow has already found something valuable here. If CI variability causes too many intermittent failures, we can increase the default time limit or remove it entirely, but please bear with us while we assess whether the cure is worse than the disease.

FWIW, I don't necessarily think that this is the best way to keep CI time down either. I just don't like it when my tests are in the spotlight for being too slow (e.g. gh-13859, gh-20420), so I thought I'd try to fix the problem once and for all for everyone rather than putting out fires every once in a while.
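
For context, a minimal sketch of how the pytest-fail-slow plugin is used (the plugin's documented interface, not SciPy's exact configuration): a per-test fail_slow marker overrides the default limit given on the command line.

```python
# test_example.py -- requires the pytest-fail-slow plugin.
import time
import pytest

@pytest.mark.fail_slow(10)    # this test is allowed up to 10 seconds
def test_known_slow_case():
    time.sleep(6)             # passes: within its own 10 s budget

def test_ordinary_case():     # governed by the command-line default
    pass

# Fail any unmarked test that takes longer than 5 seconds:
#   pytest --fail-slow 5 test_example.py
```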

@ragonneau (Contributor)

Hi everyone,

Sorry for my very late reply; I have recently been very busy with some personal matters. First of all, thank you everyone for looking into this. We designed COBYQA for solving DFO problems, where one function evaluation usually takes several minutes, hours, or days. Hence, we did not attempt to tune COBYQA's linear algebra (although we did everything to avoid doing stupid calculations). I agree that the code can definitely be improved in terms of linear algebra, to speed up execution when objective function evaluations are cheap.

Concerning the Rosenbrock function, it is indeed a pathological case; it was designed to be one. Even with $n = 2$, its valley shape makes it hard for optimisers to find the solution. It is normal that a considerable number of function evaluations is required, especially as $n$ increases (recall that in DFO, high dimension can sometimes mean $n \approx 50$).
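
For reference, the $n$-dimensional Rosenbrock function (implemented by scipy.optimize.rosen) is

$$f(x) = \sum_{i=1}^{n-1} \left[ 100\,\bigl(x_{i+1} - x_i^2\bigr)^2 + (1 - x_i)^2 \right],$$

with global minimum $f = 0$ at $x = (1, \dots, 1)$; its long, curved, nearly flat valley is what slows optimisers down.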

@andyfaff, I agree with what you said. This is a typical example of something that can definitely be improved in the source code of COBYQA. I see you opened an issue on that matter; I will take a look at it this weekend. Again, thank you very much for the help! This is much appreciated.

I will try to be more present for this matter,
Thank you again everyone for the help,
Tom.

@andyfaff (Contributor)

Thanks for getting the suggested PR merged @ragonneau.

@ragonneau (Contributor)

> Thanks for getting the suggested PR merged @ragonneau.

It's my pleasure. Again, thank you very much for your great help, and sorry for the delay in the merge.

Cheers,
Tom.

Labels: CI, maintenance, scipy.optimize