Ideas to improve multistage stochastic integer programs #246
@haoxiangyang89 #265 had some of the ideas we discussed the other day.
@guyichen09 (cc @mortondp) a use-case for Bayesian Optimization/multi-armed bandit stuff: each iteration of SDDP.jl, we have three choices for our duality handler:
- ContinuousConicDuality (a.k.a. continuous Benders)
- StrengthenedConicDuality (a.k.a. strengthened Benders)
- LagrangianDuality (a.k.a. SDDiP)
Each duality handler will lead to cuts that improve our lower bound by some amount, but take a different amount of time. You could normalize each iteration to a reward function of (change in objective / time taken). This is stochastic, and it changes over time. I want to write a BanditDuality that intelligently picks which handler to run over the iterations. Seems like this is pretty do-able?
Hi Oscar,
I have an idea for this BanditDuality. I think that the multi-armed bandit algorithm might be a good method for this. We could use a modified version of Thompson Sampling. We would need to modify the reward function to take into account both the change in objective and the time taken.
What do you think?
Best,
Guyi
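[Editor's note: a minimal sketch of the Thompson-style sampler Guyi describes, assuming a simple Gaussian reward model where the reward is the bound improvement per unit time. All names (`GaussianArm`, `choose_arm`, `update_arm!`) are illustrative; none of this is SDDP.jl API.]

```julia
using Random

# One arm per duality handler; track the number of pulls and the running mean
# of the observed rewards (reward = change in objective / time taken).
mutable struct GaussianArm
    n::Int
    mean::Float64
end

# Thompson-style selection: draw a plausible mean reward for each arm from a
# normal distribution whose spread shrinks as the arm is pulled more often,
# then play the arm with the largest draw.
function choose_arm(arms::Vector{GaussianArm})
    draws = [arm.mean + randn() / sqrt(arm.n + 1) for arm in arms]
    return argmax(draws)
end

# Update the chosen arm with the observed reward.
function update_arm!(arm::GaussianArm, d_objective::Float64, dt::Float64)
    reward = d_objective / dt
    arm.n += 1
    arm.mean += (reward - arm.mean) / arm.n  # incremental running mean
    return arm
end

# One arm each for the continuous, strengthened, and Lagrangian handlers.
arms = [GaussianArm(0, 0.0) for _ in 1:3]
```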
Yeah, my suggestion for the reward would be:

```julia
function reward(log::Vector{Log})
    d_bound = abs(log[end].bound - log[end-1].bound)
    dt = log[end].time - log[end-1].time
    return d_bound / dt
end
```

and the … at line 938 in ad7a418.
There's a little bit of plumbing to do, but not too much work. We should pass … (line 472 in ad7a418), which gets updated here (src/plugins/duality_handlers.jl, lines 64 to 71 in ad7a418), here (src/plugins/duality_handlers.jl, lines 131 to 136 in ad7a418), and here (src/plugins/headers.jl, lines 185 to 192 in ad7a418).
Then we can write a new method for … It should probably start with a minimum of, say, 10 iterations of …
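[Editor's note: to make the plumbing concrete, here is a rough sketch of how the per-iteration selection might look once `reward(log)` is available. The `BanditDualitySketch` type, the warm-up of 10 iterations per handler, and the ε-greedy rule are all illustrative assumptions, not SDDP.jl API.]

```julia
# One slot per duality handler (e.g. continuous, strengthened, Lagrangian).
struct BanditDualitySketch
    handlers::Vector{Any}
    rewards::Vector{Vector{Float64}}  # observed rewards per handler
end

BanditDualitySketch(handlers) =
    BanditDualitySketch(handlers, [Float64[] for _ in handlers])

function pick_handler(b::BanditDualitySketch; warmup::Int = 10, ε::Float64 = 0.1)
    # Warm-up: make sure every handler has at least `warmup` observations.
    for (i, r) in enumerate(b.rewards)
        if length(r) < warmup
            return i
        end
    end
    # ε-greedy: occasionally explore, otherwise exploit the best average reward.
    if rand() < ε
        return rand(1:length(b.handlers))
    end
    return argmax([sum(r) / length(r) for r in b.rewards])
end

# After each iteration, record the reward for the handler `i` that was used:
#     push!(b.rewards[i], reward(log))
```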
We could use this Bayesian learning in a few other contexts:

Trajectory depth in cyclic graphs

The default sampling scheme for the forward pass has some tunable parameters (src/plugins/sampling_schemes.jl, lines 38 to 43 in 493b48e). In particular, for cyclic graphs, you might want to learn between:

```julia
A = InSampleMonteCarlo(
    max_depth = length(model.nodes),
    terminate_on_dummy_leaf = false,
)
B = InSampleMonteCarlo(
    max_depth = 5 * length(model.nodes),
    terminate_on_dummy_leaf = false,
)
C = InSampleMonteCarlo()
```

Revisiting forward passes

Rather than just looping through forward passes (src/plugins/forward_passes.jl, lines 136 to 158 in 493b48e), we should learn which ones to revisit according to some score. So you look at the delta achieved by each forward pass, as well as the expected delta of sampling a new trajectory, and then pick the one with the best score. I think this is both computationally efficient and helpful from a marketing aspect, because we can claim machine learning is being used to drive decision making within the algorithm.
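[Editor's note: a sketch of that scoring rule under hypothetical names; nothing here is SDDP.jl API. It compares the bound improvement each stored trajectory delivered last time against an estimate for a freshly sampled one, and picks whichever looks best.]

```julia
# `deltas[i]` is the bound improvement achieved the last time stored forward
# pass `i` was revisited; `expected_new_delta` is an estimate of the
# improvement from sampling a brand-new trajectory.
function pick_forward_pass(deltas::Vector{Float64}, expected_new_delta::Float64)
    if isempty(deltas)
        return :sample_new_trajectory
    end
    best_delta, best_index = findmax(deltas)
    if expected_new_delta > best_delta
        return :sample_new_trajectory
    end
    return best_index  # revisit the most promising stored trajectory
end
```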
@guyichen09 here's a potentially open question: is there any work on learning when the reward is non-stationary? In our case, we know that the reward for each arm will tend towards 0 as it gets pulled. The history is also action-dependent, which means assumptions about the distribution of the reward being independent of the algorithm are not satisfied. Some papers:
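[Editor's note: one common workaround for this kind of non-stationarity, used by discounted and sliding-window bandit variants, is to down-weight old observations so an arm's estimated reward decays as its bound improvements shrink. A minimal sketch, not tied to any particular paper or to SDDP.jl:]

```julia
# Exponentially discounted reward estimate for one arm.
mutable struct DiscountedArm
    value::Float64   # discounted sum of rewards
    weight::Float64  # discounted count of observations
end

DiscountedArm() = DiscountedArm(0.0, 0.0)

# γ < 1 forgets old pulls, so the estimate tracks a reward drifting to zero.
function observe!(arm::DiscountedArm, reward::Float64; γ::Float64 = 0.9)
    arm.value = γ * arm.value + reward
    arm.weight = γ * arm.weight + 1.0
    return arm.value / arm.weight  # current discounted-average reward
end
```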
I’m with you. Good idea.
When Daniel was looking at DRO / SDDP we looked at sampling forward paths according to the nominal distribution or according to the DRO distribution. In theory, the latter can get you in trouble with respect to not converging due to zero-probability branches. We did a convex combo / mixture. Lower bounds were significantly tighter when putting enough weight on DRO paths, meaning that I think your idea would make sense, at least according to some metrics. We also looked at out-of-sample quality of the resulting policy. That difference was less dramatic, but there may have been something there, too. Can’t remember for sure.
Dave
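[Editor's note: for reference, the convex combination Dave describes can be as simple as mixing the two probability vectors before sampling a child node on the forward pass. A sketch with hypothetical inputs (`p_nominal`, `p_dro`, mixing weight `λ`); this is not how SDDP.jl implements DRO.]

```julia
# Sample an index from a convex combination of the nominal and DRO
# (worst-case) probability distributions over the children of a node.
# Keeping λ < 1 leaves every nominal-probability branch reachable, which
# avoids the zero-probability convergence issue mentioned above.
function sample_mixed_child(p_nominal::Vector{Float64}, p_dro::Vector{Float64}, λ::Float64)
    p = λ .* p_dro .+ (1 - λ) .* p_nominal  # still a probability vector
    r, cumulative = rand(), 0.0
    for (i, p_i) in enumerate(p)
        cumulative += p_i
        if r <= cumulative
            return i
        end
    end
    return length(p)  # guard against floating-point round-off
end
```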
Hi Oscar,
I am not sure if there is any work about non-stationary rewards. During my internship, I found an algorithm using Thompson Sampling for the combinatorial multi-armed bandit problem. In this algorithm, the reward is deterministic in each pull during an iteration; however, from iteration to iteration of the overall algorithm, the reward changes and tends towards zero. They use a Bernoulli random variable to account for that. The specific algorithm is attached below.
[Screenshot: Screen Shot 2021-09-24 at 11.16.09 AM.png]
Lines 9-10 of the algorithm are where they deal with the reward. I think it is a possible option for what we are looking for. I will also research more on this topic.
Best,
Guyi
A few updates for this:
- src/plugins/duality_handlers.jl, lines 310 to 412 in 23fcaa9
- src/plugins/local_improvement_search.jl, lines 23 to 130 in 23fcaa9
Closing because I think this is a dead end. The cost of Lagrangian duality just doesn't improve things. I've seen a few papers recently claiming to "do SDDiP", but they have mixed integer and continuous state and control variables, and then sneakily say "we just use the continuous dual; it might give a worse bound, but it works well." And it does. So we don't need this.
Motivation
The implementation of SDDiP in SDDP.jl is bad. If you use it, you will probably run into convergence issues! No need to open an issue, I'm aware of the problem. (If you have challenging multistage stochastic integer problems, please send them to me!)
At this point it isn't apparent whether SDDiP is a bad algorithm or a poor implementation in SDDP.jl.
My guess is that it is a bad algorithm that almost results in complete enumeration for most problems, but in some cases we can do okay if we implement the following algorithmic improvements.
Clever switching between duality handlers
- Start with ContinuousConicDuality()
- Switch to StrengthenedConicDuality()
- Keep StrengthenedConicDuality() if the improvement was more than could be expected in the same time as ContinuousConicDuality()
This should mean that it can be the default for all cases. LP has no performance loss. MIP can only be better, and we don't waste time with the strengthened solve if the problem is naturally integer, or with the Lagrangian solve if it takes too long to solve.
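[Editor's note: a sketch of that switching rule, assuming we record the bound improvement and wall-clock time of each handler as it is used, and that ContinuousConicDuality() has already been run a few times. The handler symbols and the `stats` container are hypothetical, and the Lagrangian step is omitted for brevity.]

```julia
# stats[h] = (d_bound = ..., time = ...) accumulated for handler `h`,
# e.g. stats = Dict(:continuous => (d_bound = 1.2, time = 0.3)).
rate(stats, h) = stats[h].d_bound / stats[h].time

function choose_next_handler(stats)
    if !haskey(stats, :strengthened)
        return :strengthened  # try StrengthenedConicDuality() at least once
    elseif rate(stats, :strengthened) >= rate(stats, :continuous)
        return :strengthened  # the extra work pays for itself: keep it
    else
        return :continuous    # fall back to the cheap continuous dual
    end
end
```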
Bounded states
We need bounds on x.in to ensure the dual has a bounded solution, but it isn't obvious what bounds to use or how to add them.
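[Editor's note: for illustration only, a sketch using the public SDDP.jl modelling API that declares bounds on the state and copies them to the incoming variable explicitly. Whether these are the right bounds, and whether this is the right place to impose them, is exactly the open question; the assumption that bounds in `@variable` attach only to the outgoing copy is flagged in the comments.]

```julia
using SDDP, JuMP, HiGHS

model = SDDP.LinearPolicyGraph(
    stages = 3, sense = :Min, lower_bound = 0.0, optimizer = HiGHS.Optimizer,
) do sp, t
    # Assumption: bounds given in @variable apply to the outgoing copy x.out,
    # so we also bound the incoming copy x.in to help keep the dual bounded.
    @variable(sp, 0 <= x <= 100, SDDP.State, initial_value = 0)
    JuMP.set_lower_bound(x.in, 0)
    JuMP.set_upper_bound(x.in, 100)
    @variable(sp, 0 <= u <= 50, Int)
    @constraint(sp, x.out == x.in + u - 30)
    @stageobjective(sp, 2.0 * u)
end
```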
Advanced Lagrangian solves
The current implementation of the cutting-plane solve is very naive. From Jim's slides: https://www.ima.umn.edu/materials/2015-2016/ND8.1-12.16/25411/Luedtke-mip-decomp.pdf
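[Editor's note: for context, the naive approach is essentially Kelley's cutting-plane method on the dual: maximise an outer approximation of the concave dual function, query the oracle at the current multiplier, and add a cut. A self-contained sketch with a user-supplied oracle; all names are hypothetical and this is not the SDDP.jl implementation.]

```julia
using JuMP, HiGHS

# Kelley's cutting-plane method for maximising a concave dual function L(λ).
# `oracle(λ)` must return the value L(λ) and a subgradient of L at λ.
function kelley(oracle; n::Int, iteration_limit::Int = 100, atol::Float64 = 1e-6, M::Float64 = 1e6)
    approx = Model(HiGHS.Optimizer)
    set_silent(approx)
    @variable(approx, -M <= λ[1:n] <= M)
    @variable(approx, θ <= M)
    @objective(approx, Max, θ)
    best_value, best_λ = -Inf, zeros(n)
    for _ in 1:iteration_limit
        optimize!(approx)
        λ_k, upper = value.(λ), value(θ)
        L_k, subgrad = oracle(λ_k)
        if L_k > best_value
            best_value, best_λ = L_k, λ_k
        end
        # Outer approximation of the concave function: θ ≤ L(λᵏ) + g'(λ - λᵏ).
        @constraint(approx, θ <= L_k + subgrad' * (λ .- λ_k))
        if upper - best_value < atol
            break  # the outer approximation has converged
        end
    end
    return best_value, best_λ
end
```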
Other
Done
Write a theory tutorial
I should write a tutorial for the documentation.
I find how SDDiP is presented as a concept in the literature confusing.
We have these subproblems:
and we're trying to find a feasible dual solution and corresponding dual objective value for the slope and intercept of the cut respectively.
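[Editor's note: for concreteness, a sketch of the subproblem in roughly the policy-graph notation used in the SDDP.jl documentation; treat the details as an assumption rather than a quote.]

```math
\begin{aligned}
V_i(x, \omega) = \min_{\bar{x},\, x',\, u}\;\; & C_i(\bar{x}, u, \omega) + \mathbb{E}_{j \in i^+,\, \varphi}\bigl[V_j(x', \varphi)\bigr] \\
\text{s.t.}\;\; & \bar{x} = x \quad [\lambda] \\
& x' = T_i(\bar{x}, u, \omega) \\
& u \in U_i(\bar{x}, \omega).
\end{aligned}
```

Any feasible dual solution λ for the copy ("fishing") constraint, together with its dual objective value, gives the slope and intercept of a valid cut.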
You should probably be able to find a feasible solution with the default ContinuousConicDuality option as well. I changed how we handle integer variables in the forward pass during training.