
Fix test stability #249

Merged: 4 commits, Feb 11, 2021

Conversation

rcurtin (Member) commented Feb 7, 2021

Occasionally, the ensmallen tests fail at random, because many of the tests we run are non-deterministic. This can cause a bad user experience (and can confuse reviewers of journal articles for ensmallen too :)), so I spent some time refactoring ensmallen's test suite until I was able to run it 1000 times with no failures at all.

A number of major changes were made during this process:

  1. I added a GetFinalPoint() method to most of the functions in ensmallen_bits/problems/. I didn't do this for functions with multiple global minima. Maybe we could do that another time. I also added GetFinalObjective().

  2. Some tests used different initial points for the functions in ensmallen_bits/problems/ than GetInitialPoint() returned, so I updated the implementations of GetInitialPoint() to return the points that were actually used in the tests.

  3. I added a function called FunctionTest() to test_function_tools.hpp, which takes an input optimizer and the type of a function, then uses GetInitialPoint() and GetFinalPoint() to run the optimization and check convergence (a rough sketch of the idea is given just after this list). It also supports a number of trials---so for instances where the test often fails, you can set trials to, e.g., 3 and it will run the test up to 3 times.

  4. I made a standalone LogisticRegressionFunctionTest() utility in test_function_tools.hpp, since FunctionTest() doesn't work quite right for our logistic regression tests (as we are checking the accuracy, not the objective). This also supports multiple trials.

  5. Wherever possible, I refactored all of the tests to use FunctionTest() and LogisticRegressionFunctionTest(), and set the number of trials to what I observed was necessary for no failures in 1000 runs.

  6. I removed tests for SGDTestFunction as much as possible. The reason is that SGDTestFunction is really not a good test function for SGD: in order to optimize it correctly, an optimizer must consider each of the three different objective functions it contains, but when a batch size of 1 is used, the gradients of the individual functions point in very different directions. This means that lots of oscillation is introduced for most SGD-like optimizers, and as a result these tests were very prone to failure. I found the logistic regression test to be more suitable in these instances. However, there are a couple of tests that still use SGDTestFunction and seem to work okay.
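
To make the new helpers concrete, here is a minimal sketch of the FunctionTest() idea, assuming the problem class provides GetInitialPoint(), GetFinalPoint(), and GetFinalObjective(). The real helper in test_function_tools.hpp is built on the Catch test macros, and its exact signature, defaults, and tolerances may differ; the parameter names, the exception-based failure, and the L_BFGS/RosenbrockFunction pairing shown here are illustrative assumptions only. LogisticRegressionFunctionTest() follows the same multi-trial pattern but checks classification accuracy instead of the final objective.

    // Illustrative sketch only; not the actual test_function_tools.hpp code.
    #include <ensmallen.hpp>
    #include <cmath>
    #include <stdexcept>

    template<typename FunctionType, typename OptimizerType>
    void FunctionTest(OptimizerType& optimizer,
                      const double objectiveTolerance = 0.01,
                      const double coordinateTolerance = 0.001,
                      const size_t trials = 1)
    {
      FunctionType f;
      for (size_t t = 0; t < trials; ++t)
      {
        // Every trial starts from the problem's canonical initial point.
        arma::mat coordinates = f.GetInitialPoint();
        const double objective = optimizer.Optimize(f, coordinates);

        // A trial succeeds if both the objective and the coordinates are
        // close enough to the known optimum of the problem.
        const bool objectiveOk =
            std::abs(objective - f.GetFinalObjective()) <= objectiveTolerance;
        const bool coordinatesOk = arma::approx_equal(
            coordinates, f.GetFinalPoint(), "absdiff", coordinateTolerance);

        if (objectiveOk && coordinatesOk)
          return;  // Converged; no further trials needed.
      }

      // In the real test suite this would be a Catch failure instead.
      throw std::runtime_error("optimization did not converge in any trial");
    }

    int main()
    {
      // Hypothetical usage: allow up to 3 trials on the Rosenbrock problem.
      ens::L_BFGS lbfgs;
      FunctionTest<ens::test::RosenbrockFunction>(lbfgs, 0.01, 0.003, 3);
    }

The multi-trial loop is the key design choice: a genuinely stochastic optimizer is allowed a few attempts before the test is declared a failure, which is how the flaky tests can pass reliably without loosening the convergence tolerances.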

Hopefully after this we will see very few reports of test failures! :) I suppose I should think about how to set up an automatic job that runs the tests 1000 times each month or something like that.

zoq (Member) commented Feb 7, 2021

I'll fix the static analysis job.

Review thread on a GetInitialPoint() change in ensmallen_bits/problems/:

    template<typename MatType = arma::mat>
    MatType GetInitialPoint() const { return MatType("-5.0; 5.0"); }
    MatType GetInitialPoint() const { return MatType("0.02; 0.02"); }
Member:

I can see you changed some of the initial points to be closer to the final point; is that for performance reasons?

Member Author:

Yeah---they did not always seem to converge with the given initial points. So I took the starting points that were actually being used in adam_test.cpp and used those for GetInitialPoint(). I wrote the original values in a comment in case we ever want to change back.

Member:

I'm not sure, but I guess the comment could be confusing for users, since they don't really have a reference for where the old value comes from; we could just browse the commit history to check what the old value was instead?

Member Author:

Fair point---I removed those comments in 0aeab3c. 👍

rcurtin (Member Author) commented Feb 9, 2021

I noticed that the master build job fails because it does not properly run the Catch tests---it was still configured for the Boost tests. I updated that in the Jenkins configuration; hopefully that helps.

rcurtin (Member Author) commented Feb 9, 2021

@mlpack-jenkins test this please

zoq (Member) left a review:

Wow, it looks like this was quite some work, but if the end result is a more stable test suite it was worth the time 👍

mlpack-bot (bot) left a comment:

Second approval provided automatically after 24 hours. 👍

conradsnicta (Contributor) commented Feb 11, 2021

Since this is in effect a large change, I suggest bumping the version to 2.16 (and adjusting the corresponding version number in the JMLR cover letter as well).

The version number doesn't have to be used only for indicating changes/expansions to the optimisers. It can also be used for general marketing, to indicate other useful changes (i.e., people like shiny new things with bigger version numbers).
