[ML] Early stopping in the line searches to compute initial regulariser values #903
Conversation
Looks good altogether. Just a few minor comments.
BOOST_REQUIRE_CLOSE(c1, prediction, 3.0);
BOOST_REQUIRE_SMALL(c2, 0.25);
BOOST_REQUIRE_SMALL(c3, 0.25);
BOOST_REQUIRE_SMALL(c4, 0.25);
Thank you for catching the error in the BOOST_REQUIRE_CLOSE call. Why did the values c2, ..., c4 become so much larger?
These numbers are slightly unstable. The reason they were 0 before is that none of these features were selected at all for training because of the threshold on MIC. Once they get selected there is scope for them to be non-zero. Since these are still so much smaller than c1, 0.25 vs ~100, I figured it was ok to just allow some leeway.
👍
#include <maths/CSampling.h>
#include <maths/CTools.h>

#include <boost/math/distributions/normal.hpp>
#include <boost/optional/optional_io.hpp>
do you need this include?
This is needed to print out boost::optional on this line.
The header boost/optional.hpp is already included in the header file. Is it still required here? Line 177 doesn't use optional, as far as I can see.
On this line I've added expectedImprovement, which has type boost::optional<double>. This print statement doesn't compile if I don't include optional_io.hpp:
LOG_TRACE(<< "best = " << xmax.cwiseProduct(m_MaxBoundary - m_MinBoundary).transpose()
          << " EI(best) = " << expectedImprovement);
Unfortunately, the print functionality is not included by optional.hpp.
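For illustration, a minimal standalone sketch of the compile-time issue (the snippet and the value of expectedImprovement are hypothetical, not taken from the PR):

#include <boost/optional.hpp>
#include <boost/optional/optional_io.hpp> // provides operator<< for boost::optional
#include <iostream>

int main() {
    boost::optional<double> expectedImprovement{0.42};
    // Without optional_io.hpp the next line fails to compile:
    // boost/optional.hpp alone does not define a stream insertion
    // operator for boost::optional.
    std::cout << "EI(best) = " << expectedImprovement << '\n';
    return 0;
}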
minTestLoss.add(testLoss);
testLosses.emplace_back(regularizer, testLoss);
}
while (testLosses.size() > 0 && testLosses.size() < MAX_LINE_SEARCH_ITERATIONS) {
Do you need to check for the first condition?
Yes. This can happen in the loop which searches for the best downsample factor, i.e. applyRegularizer can return false for all values in the loop:
for (auto regularizer :
     {intervalLeftEnd, (2.0 * intervalLeftEnd + intervalRightEnd) / 3.0,
      (intervalLeftEnd + 2.0 * intervalRightEnd) / 3.0, intervalRightEnd}) {
    ...
In that case, we'll just use a downsample factor of 1, and without this condition we actually SIGSEGV when trying to compute the maximum expected loss.
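To see why the size() > 0 guard matters, here is a rough sketch; the helper, the type alias, and the fallback value are hypothetical, only the failure mode comes from the discussion above:

#include <algorithm>
#include <utility>
#include <vector>

using TDoubleDoublePrVec = std::vector<std::pair<double, double>>;

// Hypothetical helper: find the maximum test loss over the line search
// results. If applyRegularizer returned false for every candidate, the
// vector is empty and dereferencing std::max_element's end() iterator is
// undefined behaviour, hence the size() > 0 check in the loop condition.
double maximumTestLoss(const TDoubleDoublePrVec& testLosses) {
    if (testLosses.empty()) {
        return 0.0; // hypothetical fallback, mirroring "just use downsample factor 1"
    }
    return std::max_element(testLosses.begin(), testLosses.end(),
                            [](const auto& lhs, const auto& rhs) {
                                return lhs.second < rhs.second;
                            })->second;
}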
LGTM
// This has the following steps:
//   1. Coarse search the interval [intervalLeftEnd, intervalRightEnd] using
//      fixed steps,
//   2. Fine tune, via Bayesian Optimisation targeting expected improvement,
//      and stop if the expected improvement is small compared to the current
//      minimum test loss,
//   3. Calculate the parameter interval which gives the lowest test losses,
//   4. Fit an OLS quadratic approximation to the test losses in the interval
//      from step 3 and use it to estimate the best parameter value,
//   5. Compare the size of the residual errors w.r.t. the OLS curve from
//      step 4 with its variation over the interval from step 3 and truncate
//      the returned interval if we can determine there is a low chance of
//      missing the best solution by doing so.
nice! 👍
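To make step 4 above concrete, a standalone sketch of an OLS quadratic fit and vertex estimate might look like the following. Every name, signature, and fallback here is hypothetical illustration, not the PR's implementation; it assumes at least 3 distinct sample points so the normal equations are solvable:

#include <algorithm>
#include <array>
#include <utility>
#include <vector>

// Fit y = a + b*x + c*x^2 by ordinary least squares to the
// (parameter, test loss) pairs gathered during the line search.
std::array<double, 3> fitQuadraticOls(const std::vector<std::pair<double, double>>& points) {
    // Accumulate the moments which appear in the 3x3 normal equations.
    double s0{0}, s1{0}, s2{0}, s3{0}, s4{0}, t0{0}, t1{0}, t2{0};
    for (const auto& [x, y] : points) {
        double x2{x * x};
        s0 += 1.0; s1 += x;     s2 += x2; s3 += x2 * x; s4 += x2 * x2;
        t0 += y;   t1 += x * y; t2 += x2 * y;
    }
    // Solve by Cramer's rule, which is fine for a 3x3 system.
    auto det3 = [](double a, double b, double c,
                   double d, double e, double f,
                   double g, double h, double i) {
        return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
    };
    double det{det3(s0, s1, s2, s1, s2, s3, s2, s3, s4)};
    return {det3(t0, s1, s2, t1, s2, s3, t2, s3, s4) / det,
            det3(s0, t0, s2, s1, t1, s3, s2, t2, s4) / det,
            det3(s0, s1, t0, s1, s2, t1, s2, s3, t2) / det};
}

// The minimiser of a + b*x + c*x^2 is x = -b / (2c) when c > 0; clamp it
// to the searched interval.
double bestParameter(const std::array<double, 3>& abc, double left, double right) {
    double b{abc[1]}, c{abc[2]};
    if (c > 0.0) {
        return std::clamp(-b / (2.0 * c), left, right);
    }
    return left; // hypothetical fallback when the fit isn't convex
}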
This migrates to using Bayesian Optimisation for choosing the points at which to evaluate the regularisers in the line searches we perform to find their initial values. Importantly, it also thresholds the minimum expected improvement needed to continue with a line search at 1% of the current test loss. This saves up to 3 iterations in the loop (since we lower bound the minimum number of iterations we'll use), or around 40% of the cost of the line searches.
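The stopping rule amounts to something like the following standalone sketch; the names and the exact form of the check are hypothetical, only the 1% threshold comes from the description above:

#include <boost/optional.hpp>

// The 1% threshold described above; the surrounding names are illustrative.
const double MINIMUM_RELATIVE_EXPECTED_IMPROVEMENT{0.01};

// Continue the line search only while Bayesian Optimisation's expected
// improvement is worth at least 1% of the best test loss seen so far.
bool continueLineSearch(const boost::optional<double>& expectedImprovement,
                        double currentMinimumTestLoss) {
    if (expectedImprovement == boost::none) {
        // No estimate yet (e.g. too few evaluations): keep searching.
        return true;
    }
    return *expectedImprovement >
           MINIMUM_RELATIVE_EXPECTED_IMPROVEMENT * currentMinimumTestLoss;
}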