Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom early stop callback #377

Closed
Kaltxi opened this issue Sep 2, 2023 · 12 comments
Closed

Custom early stop callback #377

Kaltxi opened this issue Sep 2, 2023 · 12 comments

Comments

@Kaltxi
Copy link

Kaltxi commented Sep 2, 2023

Issue description

I am trying to make a custom callback to be able to stop optimization asynchronously. Documentation says that the callback should return true to end the process, so I made the following callback:

class OptimizationEarlyStop {
public:
  template <typename OptimizerType, typename FunctionType, typename MatType>
  auto Evaluate(OptimizerType& /* optimizer */,
                FunctionType& /* function */,
                const MatType& /* coordinates */,
                const double /* objective */) -> bool {
    return shouldStop_;
  }

  void stop() {
    const std::lock_guard lock(stopMutex_);
    shouldStop_ = true;
  }

private:
  bool shouldStop_{false};
  std::mutex stopMutex_;
};

That said the callback doesn't seem to run. If I make the Evaluate return void the callback runs, but I can't return bool in this case.

Your environment

  • version of ensmallen: 2.19.0
  • operating system: Manjaro, 6.1.44-1-MANJARO kernel
  • compiler: clang 15.0.7
  • version of Armadillo: 12.4.1

Steps to reproduce

Defining the Evaluate() callback with bool return type makes callback unable to be run.

Expected behavior

I expect to be able to define bool Evaluate() and be able to stop optimization with it.

@rcurtin
Copy link
Member

rcurtin commented Sep 2, 2023

Hey there @Kaltxi, can you also provide the code that you are using to run the optimization? If it's a complete example that we can compile and run, it's way easier to debug. Thanks!

@Kaltxi
Copy link
Author

Kaltxi commented Sep 3, 2023

@rcurtin Hello! Here is a minimal example that minimizes a trivial function with a slow down in its Evaluate(). Optimization is launched via std::async and while it is running the callback is called.
ensmallen-early-return.zip

@zoq
Copy link
Member

zoq commented Sep 4, 2023

It looks like you never pass the callback to the Optimize function:

const double minimum = optimizer.Optimize(f, coefs, ens::Report{});

should be something like:

OptimizationEarlyStop early_stop;
const double minimum = optimizer.Optimize(f, coefs, ens::Report{}, early_stop);

@Kaltxi
Copy link
Author

Kaltxi commented Sep 4, 2023

@zoq Yeah, thats a typo, fixing it yields the same behavior. In any case the bool version of OptimizationEarlyStop::Evaluate is never called, when it's supposed to be called on every function evaluation.

@mlpack-bot
Copy link

mlpack-bot bot commented Oct 4, 2023

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! 👍

@mlpack-bot mlpack-bot bot added the s: stale label Oct 4, 2023
@zoq
Copy link
Member

zoq commented Oct 4, 2023

let's keep this open

@mlpack-bot mlpack-bot bot removed the s: stale label Oct 4, 2023
@SuvarshaChennareddy
Copy link
Contributor

Hello @Kaltxi,
I believe this is expected behavior as the Evaluate() callback method does not affect the optimization loop (at least with the DE optimizer). One dirty trick that I thought of (haven't tested it) that you could try incorporating is setting the value of tolerance to infinity (or a very large number) via Tolerance() . The optimization would immediately end once it breaks out of the population loop (in the DE optimizer). If you want to break out immediately, then try setting the populationSize attribute to 0 via the PopulationSize() method. You could then reset the attribute in the EndOptimization() callback method.

Please do correct me if I overlooked something.

@rcurtin
Copy link
Member

rcurtin commented Oct 14, 2023

I spent a little time digging into the situation and I think there is some confusion here. In fact, one example demonstrates the whole situation pretty clearly: https://www.ensmallen.org/docs.html#early-stopping-at-minimum-loss

Here, we have a custom callback example that returns true in EndEpoch() if the optimization should terminate early, but the signature of the function is void.

Throughout the codebase, sometimes we call callbacks and check their return value and terminate if it is true. Other times (DE is one of these cases), we don't. With a quick, trivial, and somewhat wrong grep I see 103 uses where we don't use the return value and 83 where we do.

I believe our original intention was to always check the callback result; see Section 3.2 of the technical report: Each callback may terminate the optimization via its bool return value, where true indicates that the optimization should be stopped.

So, I think what's happened here is just that the code has gotten out of sync with our original intentions. I think the solution here is to do a couple of things:

  • clarify the documentation to indicate that every callback function has the option of returning a bool, which, when true, stops optimization (in addition, the callback functions can return void too, which is just fine, no termination will occur)
  • audit all callback calls (just grep for Callback:: and check the usage) to make sure the return values are being checked
  • (extra credit) add tests for optimizers that ensure that the callback return value is used---this can be done with a custom callback that throws an exception if it is ever called a second time, and returns true always the first time
  • check that @Kaltxi's example works as expected after the changes :)

I'm happy to do this, although I may skip the extra credit 😄 if someone else wants to jump in and do it feel free---it'll be a little while until I get to it.

@Kaltxi I think @SuvarshaChennareddy has provided a usable workaround, but if you want to get your hands directly dirty, you can modify the DE optimizer in ensmallen_bits/de/de_impl.hpp such that any time Callback::Evaluate() is called, its return value is checked. You can replace, e.g., Callback::Evaluate(...) with terminate |= Callback::Evaluate(...) throughout, I think. Thanks again for the report. 👍

@conradsnicta
Copy link
Contributor

clarify the documentation to indicate that every callback function has the option of returning a bool

@rcurtin Rather than having an option, I recommend this to be mandatory. Otherwise inside ensmallen we'd need to explicitly handle two cases (one with void returned, and one with bool returned). This unnecessarily complicates the codebase, which in turn leads to higher maintenance burden, increased risk of bugs, etc, all of which are time wasters.

The mandatory requirement is a good way to enforce not paying the opportunity costs in terms of time. For example, having it mandatory will prevent problems like this issue from occurring again (eg. when someone implements a new optimiser).

The mandatory bool return is not an onerous requirement for user code. If the user doesn't want to take advantage of the functionality, their callback function can simply return false in all cases.

@zoq
Copy link
Member

zoq commented Nov 3, 2023

I agree with @conradsnicta happy to open a PR with the changes.

@DiscreteLogarithm
Copy link
Contributor

I've also encountered this issue (with another callback method). After looking through the code, I found out that the reason is that the return type of bool form of several callback methods is mistakenly specified as void (likely a copy-paste error). PR #382 should fix this.

When the code check if a bool form Evaluate method is defined in the callback, it checks the corresponding trait. When the signature is wrong, it is always assumed that the bool form method is absent, and therefore it's never called.

EvaluateConstraint, Gradient, and GradientConstraint are also affected by this.

@rcurtin
Copy link
Member

rcurtin commented Nov 26, 2023

With #382, #383, and #384 merged, the example here should work, so I'll go ahead and close the issue. @DiscreteLogarithm and @Kaltxi please feel free to reopen or open a new issue if anything is still wrong. Thanks for the report!

@rcurtin rcurtin closed this as completed Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants