[Research idea] Stopping rules #178

Closed
odow opened this issue Oct 26, 2018 · 7 comments · Fixed by #598

odow commented Oct 26, 2018

Without an upper bound, when to terminate SDDP-type algorithms is an open question in the literature. Thus, we need some heuristic for preemptively terminating the algorithm when the solution is "good enough." We call these heuristics stopping rules. Choosing an appropriate stopping rule depends on the motivation for solving the multistage stochastic optimization problem. In our view, there are three possible motivations:

  1. to obtain the (risk-adjusted) expected cost of the policy, e.g., in order to compare the operational cost of different investment decisions;
  2. to obtain a policy in the primal space, e.g., a decision rule at each node for when to release water through a hydro-electric turbine; and
  3. to obtain a policy in the dual space, e.g., marginal water values in the hydro-thermal scheduling problem.

Currently, there are two main stopping rules in the literature:

  1. the statistical stopping rule: perform a Monte Carlo simulation and stop when the lower bound is within the confidence interval for the mean of the simulated objectives; and
  2. the bound-stalling stopping rule: stop once the lower bound fails to improve by more than some epsilon for more than N iterations.

Both of these stopping rules target the first motivation (although the statistical stopping rule implicitly targets the second and third through the simulation).
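
For reference, the bound-stalling rule is trivial to state in code. Here is a generic Julia sketch (not SDDP.jl's implementation; the epsilon and N defaults are arbitrary):

```julia
# Generic bound-stalling check: stop once the lower bound has failed to improve
# by more than `epsilon` in each of the last `N` iterations.
function bound_stalling(lower_bounds::Vector{Float64}; epsilon = 1e-4, N = 10)
    length(lower_bounds) <= N && return false  # need N + 1 bounds to see N changes
    return all(diff(lower_bounds[end-N:end]) .<= epsilon)
end
```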

I propose a different stopping rule based on the "stability" of the primal (or dual) policy.

To test for convergence, periodically perform a Monte Carlo simulation of the policy. Then, for selected primal or dual variables of interest, compute at each stage some quantiles (e.g., the 10th, 25th, 50th, 75th, and 90th) of the distribution of the values of those variables.

Then, using some appropriate norm, compare those quantiles to the quantiles calculated at the previous convergence test. Terminate once this distance stabilizes.
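
As a rough sketch of how such a test might look (generic Julia, not SDDP.jl's API; `values` is assumed to be a stage-by-replication matrix of one simulated variable, and the quantile levels, norm, and tolerance are placeholders):

```julia
using Statistics, LinearAlgebra

# Stage-wise quantiles of a simulated variable; values[t, r] is the value at
# stage t in replication r.
function stage_quantiles(values::Matrix{Float64};
                         probs = [0.10, 0.25, 0.50, 0.75, 0.90])
    T = size(values, 1)
    return [quantile(values[t, :], p) for t in 1:T, p in probs]
end

# Terminate once the quantiles have stabilized relative to the previous
# convergence test, measured here in the max-abs (infinity) norm.
function policy_stabilized(current::Matrix{Float64}, previous::Matrix{Float64};
                           atol = 1e-2)
    return norm(stage_quantiles(current) .- stage_quantiles(previous), Inf) < atol
end
```

A relative tolerance, or a different norm across stages and quantiles, may be more appropriate in practice.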

As an example, here are five simulations from the infinite horizon POWDER model conducted after 100, 200, 300, 400, and 500 iterations. Notice how the policy begins to stabilise.

[Simulation plots after 100, 200, 300, 400, and 500 iterations]

Existing literature

cc @dukduque potential paper?

This has been discussed before by Alexandre Street and co:

[screenshot of the relevant excerpt]

odow transferred this issue from odow/Kokako.jl on Feb 14, 2019

odow commented May 13, 2019

Found this gem of a stopping rule in MSPPy (#208); it's somewhat related.

[screenshots of the MSPPy stopping rule]

aferragu commented

Reading the ideas in MSPPy that you mention, and since I already tweaked the statistical stopping rule, several things come to mind:

  1. The MSPPy approach of using a relative gap makes sense for the statistical stopping rule, i.e., dividing by the current lower (upper) bound.

  2. The one-sided confidence interval is also relevant. With the current default of z_score=1.96, the one-sided test has a 2.5% significance level; one can relax that to z_score=1.64 (5% significance) now that the rule is one-sided in SDDP.jl too (Statistical stopping rule #204).

  3. As for the stopping rule outlined above, I don't know how they compute that. They seem to perform M simulations and compare two policies. For this, one has to save the older policy at the last convergence-testing point in the algorithm and then use it with the new simulations. It seems much more reasonable to store only the mean and standard deviation of the previous simulation step and, when testing, perform a new simulation of M realizations with the current policy. If the confidence intervals overlap (or some similar criterion holds), then stop. This requires storing only two values, not the whole policy, between testing steps.

  4. Thinking about this, maybe a more reasonable approach would be to use the Wasserstein distance (a.k.a. the "earth mover's" distance), which is closer to what you outlined above. Perform M simulations at testing point a and store the values V_1, ..., V_M. Then continue the algorithm and, at testing point b, perform another round of simulations to obtain V'_1, ..., V'_M, and compute the Wasserstein metric between the two samples (a simple LP; see the sketch after this list). I'm using Wasserstein here, but a quantile-related metric may also be useful.
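
For two equally sized samples with uniform weights, that transport LP has a closed-form solution in one dimension: match the order statistics. A small Julia sketch under that assumption (the function name is illustrative):

```julia
using Statistics

# Wasserstein-1 ("earth mover") distance between two empirical samples of equal
# size M with weights 1/M: the optimal plan matches sorted values, so the LP
# collapses to the mean absolute difference of the order statistics.
function wasserstein1(V::Vector{Float64}, Vp::Vector{Float64})
    @assert length(V) == length(Vp)
    return mean(abs.(sort(V) .- sort(Vp)))
end
```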

zidanessf commented

Hi, is it possible to add a deterministic upper bound? The intuition is that, since the lower bound comes from an outer approximation of the value function, an upper bound can naturally be calculated from an inner approximation. This paper discusses the approach: https://doi.org/10.1137/19M1258876


odow commented Sep 30, 2021

> to add a deterministic upper bound?

I have no plans to do so. Note that the formulation SDDP.jl uses is non-standard in the literature: we can handle integer variables and non-convex subproblems, and we have objective and belief state variables.

A much more problematic issue is that the inner approximation requires significantly more computational effort, so it doesn't really make sense to use it. I think the stopping rule outlined above more closely matches what people expect.

p.s. I see you have https://github.com/zidanessf/RDDP.jl. I opened a couple of issues. Happy to chat about anything in more detail

zidanessf commented

I think the inner approximation in this form has little computational effort:

min  f_t(x) + Σ_k λ_k V_k + penalty * (x⁺ + x⁻)
s.t. x ∈ X
     Σ_k λ_k = 1,  λ_k ≥ 0
     Σ_k λ_k x_k + x⁺ - x⁻ = x

where (x_k, V_k) are sample points.

Besides, the inner approximation should converge to the Lagrangian relaxation of a nonconvex function, so it is probably suitable for SDDiP, but I am not sure about the case with objective and belief state variables. However, in my experiment with the upper bound given by the inner approximation and the lower bound generated by Lagrangian cuts, the bounds converge very slowly. Maybe it's the Lagrangian cuts' fault xD.
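
For concreteness, here is a minimal JuMP sketch of that inner-approximation subproblem, assuming a linear stage cost c'x, a simple box for X, and made-up sample points (x_k, V_k); it only illustrates the formulation above and is not how SDDP.jl builds its subproblems:

```julia
using JuMP, HiGHS

xk = [[0.0], [5.0], [10.0]]   # hypothetical sampled states x_k
Vk = [120.0, 80.0, 60.0]      # hypothetical cost-to-go values V_k
c  = [2.0]                    # linear stage-cost vector
penalty = 1e4                 # penalty on the slacks x⁺, x⁻

n, K = length(c), length(Vk)
model = Model(HiGHS.Optimizer)
@variable(model, 0 <= x[1:n] <= 10)   # x ∈ X (a box here)
@variable(model, λ[1:K] >= 0)         # convex-combination weights
@variable(model, xp[1:n] >= 0)        # slack x⁺
@variable(model, xm[1:n] >= 0)        # slack x⁻
@constraint(model, sum(λ) == 1)
@constraint(model, [i = 1:n],
    sum(λ[k] * xk[k][i] for k in 1:K) + xp[i] - xm[i] == x[i])
@objective(model, Min,
    c' * x + sum(λ[k] * Vk[k] for k in 1:K) + penalty * sum(xp .+ xm))
optimize!(model)   # solve the upper-bound subproblem
```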


odow commented Oct 1, 2021

A few points on the computational effort:

  • You're right that we could apply the upper bound in a limited subset of cases. But in my experience that leads to confusion when people implement models and things don't work.
  • There's a trade-off: should I compute more lower-bounding cuts or an upper bound? Every minute spent on the upper bound is a minute that could have gone toward a better policy.
  • If you have a good upper bound, what would you do with it? Most people just want a good policy.
  • Does a small gap translate into a good policy?

The last point is pretty subtle: it often happens that the gap is very small, but if you look at simulations of the policy, the performance is great in 98% of the simulation runs and terrible in the other 2% (i.e., you haven't added cuts along those trajectories).

Because those bad trajectories account for only 2% of the total, and even they might not be that much worse in cost, their total contribution to the objective function may be very small (for example, if 98% of runs cost 100 and the worst 2% cost 200, the expected cost is only 102). Thus, you can have a small gap even with visibly poor trajectories!

The stopping rule above attempts to quantify this: you want to stop when things "look" good, regardless of the exact quantitative performance of the policy.

> Maybe it's the Lagrangian cuts' fault

Oh boy, yes, the Lagrangian cuts are tricky to get right. I've had a few goes with little progress.

Maybe I need to revisit the upper bounds. I know @frapac did the original experiments here: https://github.com/frapac/DualSDDP.jl


odow commented Oct 5, 2021

@zidanessf: I just spoke to some folks who are hoping to have a postdoc work on SDDP.jl. They mentioned the same upper bound :) I now understand how it works much better.

If they get the funding, we should have more to report on this front.

They also had some good ideas for how to use the upper bound in sampling.
