-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix test stability #249
Fix test stability #249
Conversation
I'll fix the static analysis job. |
template<typename MatType = arma::mat> | ||
MatType GetInitialPoint() const { return MatType("-5.0; 5.0"); } | ||
MatType GetInitialPoint() const { return MatType("0.02; 0.02"); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see you changed some of the Initial points, to be closer to the final point, is that for performance reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah---they did not always seem to converge with the given initial points. So I took the starting points that were actually being used in adam_test.cpp
and used those for GetInitialPoint()
. I wrote the original values in a comment in case we ever want to change back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure but I guess the comment can be confusing for users, as they don't really have a reference where this comes from, we could just browse the commit history to check what the old value was?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point---I removed those comments in 0aeab3c. 👍
I noticed that the master build test fails because it does not properly run catch tests---it was still configured for Boost tests. I updated that in the Jenkins configuration; hopefully that helps. |
@mlpack-jenkins test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, looks like that this was quite some work, but if the end result is a more-stable test suite it was worth the time 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second approval provided automatically after 24 hours. 👍
Since this is in effect a large change, I suggest to bump the version to 2.16 (and adjust the corresponding version number in the JMLR cover letter as well). The version number doesn't have to be only for indicating changes/expansions to the optimisers. |
Occasionally, ensmallen tests fail randomly, because often the tests we run are non-deterministic. This can cause a bad user experience (and can confuse reviewers of journal articles for ensmallen too :)) so I spent some time refactoring ensmallen's test suite so that I was able to run it 1000 times with no failures at all.
A number of major changes were done during this process:
I added a
GetFinalPoint()
method to most of the functions inensmallen_bits/problems/
. I didn't do this for functions with multiple global minima. Maybe we could do that another time. I also addedGetFinalObjective()
.Some methods in
ensmallen_bits/problems/
used different initial points in the tests thanGetInitialPoint()
returned. So I updated the implementation ofGetInitialPoint()
to use what was used in the tests.I added a function called
FunctionTest()
totest_function_tools.hpp
, that expects an input optimizer and the type of a function, then usesGetInitialPoint()
andGetFinalPoint()
to run the optimization and check convergence. It also supports a number of trials---so for instances where the test often fails, you can set trials to, e.g., 3 and it will run the test up to 3 times.I made a standalone
LogisticRegressionFunctionTest()
utility intest_function_tools.hpp
, sinceFunctionTest()
doesn't work quite right for our logistic regression tests (as we are checking the accuracy, not the objective). This also supports multiple trials.Wherever possible, I refactored all of the tests to use
FunctionTest()
andLogisticRegressionFunctionTest()
, and set the number of trials to what I observed was necessary for no failures in 1000 runs.I removed tests for
SGDTestFunction
as much as possible. The reason for this is thatSGDTestFunction
is actually really not a good test function for SGD. In order to optimize it correctly, an optimizer must consider each of the three different objective functions it contains, but, when a batch size of 1 is used, the gradient of each individual function is extremely different! This means that lots of oscillations will be introduced for most SGD-like optimizers, and as a result these tests were very prone to failure. I found the logistic regression test to be more suitable in these instances. However, there are a couple tests that still useSGDTestFunction
that seem to work okay.Hopefully after this we will see very few reports of test failures! :) I suppose, I should think about how to set up an automatic job that will run the tests 1000 times each month or something like that.