Add RiskAdjustedForwardPass #413

Merged
merged 3 commits into master on Jun 6, 2021

Conversation

@odow (Owner) commented Jun 4, 2021

Adding this so I can experiment with the entropic model.

For risk-averse models, we want to sample bad trajectories more frequently. The main motivation is that if you're using a risk measure, you care about the tails. So we want a good policy in the tails, which means adding more cuts in the tails. But if we sample with the nominal distribution, then the tails aren't going to get many cuts!

The standard way around this is to use some sort of importance sampling on the forward pass (e.g., https://arxiv.org/pdf/1901.01302.pdf, https://arxiv.org/pdf/2001.06026.pdf, probably some others, msppy calls it "biased sampling").
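To make the change of measure behind these biased-sampling schemes concrete, here is a small and entirely illustrative Python sketch of the exponential tilting used by the entropic risk measure: nominal probabilities are reweighted toward high-cost realizations, p_i ∝ q_i · exp(γ · z_i). The function name and the `gamma` parameter are mine, not part of SDDP.jl or the papers above.

```python
import math

def entropic_tilt(probs, costs, gamma):
    """Tilt a nominal distribution toward high-cost realizations.

    Illustrative sketch (not a library API): the exponential reweighting
    behind the entropic risk measure, p_i proportional to q_i * exp(gamma * z_i).
    An importance-sampling forward pass would apply a change of measure
    like this at every node, so expensive realizations are visited more
    often than under the nominal distribution.
    """
    weights = [q * math.exp(gamma * z) for q, z in zip(probs, costs)]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with two equally likely realizations of cost 0 and 10 and `gamma = 0.1`, the tilted distribution puts roughly 73% of the mass on the expensive realization.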

This takes a different approach: we just periodically resample bad trajectories that we have already seen, chosen according to the risk-adjusted probabilities of their cumulative objective values. (I don't know what it will do if you have a cyclic policy graph? Repeatedly sample the longest trajectories?)
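The resampling idea above can be sketched in a few lines of Python. This is a hypothetical mock-up, not the SDDP.jl implementation: the class name mirrors the PR's `RiskAdjustedForwardPass`, but `sample_new`, `gamma`, and `period` are names I've invented for illustration, and the risk adjustment shown is the entropic weight exp(gamma * cost).

```python
import math
import random

class RiskAdjustedForwardPass:
    """Illustrative sketch: keep a pool of previously seen forward
    trajectories with their cumulative objective values; every `period`
    iterations, instead of sampling a fresh trajectory, replay an old
    one chosen with probability proportional to exp(gamma * cost)."""

    def __init__(self, sample_new, gamma=0.1, period=2):
        self.sample_new = sample_new  # callable returning (trajectory, cost)
        self.gamma = gamma            # entropic risk parameter (assumed)
        self.period = period          # resample every `period` iterations
        self.pool = []                # list of (trajectory, cumulative cost)
        self.iteration = 0

    def sample(self):
        self.iteration += 1
        if self.pool and self.iteration % self.period == 0:
            # Risk-adjusted resampling: high-cost trajectories get
            # exponentially more weight, so cuts accumulate in the tails.
            weights = [math.exp(self.gamma * cost) for _, cost in self.pool]
            trajectory, cost = random.choices(self.pool, weights=weights)[0]
        else:
            # Otherwise, sample a fresh trajectory from the nominal
            # distribution and remember it for later resampling.
            trajectory, cost = self.sample_new()
            self.pool.append((trajectory, cost))
        return trajectory, cost
```

The design point is that resampling only ever revisits trajectories the nominal distribution actually produced, rather than distorting the distribution at every stage.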

This is potentially better than importance sampling, because it refines the trajectories we actually care about (the bad ones).

The importance sampling approach could be overly conservative, because it assumes at each time step that a bad thing is more likely to happen, when in reality it's unlikely to go bad-bad-bad (if that were true, you're modeling it wrong; use a Markovian policy graph). There is also evidence that resampling trajectories can be a good thing (http://www.optimization-online.org/DB_HTML/2021/05/8397.html).

The other reason not to do importance sampling is that it's hard to implement because the data structures aren't set up for it :(. We'd need a way for each node to track a single vector of probabilities, instead of node->realization pairs, and a way for the backward pass to communicate the risk-adjusted probabilities to the forward pass.

@odow odow merged commit f3a2d9e into master Jun 6, 2021
@odow odow deleted the od/RiskAdjustedForwardPass branch June 6, 2021 22:13