JAXPlanner: discrete control problems #224
Comments
Your mileage may vary when using JaxPlanner on discrete control problems, because JaxPlanner's gradient-based approach to planning requires approximate continuous relaxations of the discrete decisions and transitions. In some domains and instances, these relaxations simply do not provide useful gradient information for effective planning. My best guess is that the behavior you've observed is due to poor performance by JaxPlanner on this particular problem; however, if you believe you've found a bug in the JaxPlanner code, please let us know. FYI, we have a Gurobi-based planner in development that is not as scalable as JaxPlanner, but which works better for some discrete action domains because it uses mixed-integer nonlinear programming. GurobiPlan documentation is not yet released, but is forthcoming.
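To make the point about uninformative gradients concrete, here is a minimal sketch in plain Python. It is not JaxPlanner's actual relaxation: the sigmoid relaxation, the toy reward, and the names `hard_reward`, `relaxed_reward` and `central_diff` are all assumptions made for illustration, and finite differences stand in for JAX autodiff so that the snippet has no dependencies.

```python
import math

def hard_reward(logit: float) -> float:
    """Toy reward when the boolean action is a hard threshold of the logit."""
    action = 1.0 if logit > 0 else 0.0
    return -(action - 1.0) ** 2  # reward is best (0) when the action is taken

def relaxed_reward(logit: float, sharpness: float = 1.0) -> float:
    """Same reward with the threshold replaced by a sigmoid relaxation (assumed form)."""
    action = 1.0 / (1.0 + math.exp(-sharpness * logit))
    return -(action - 1.0) ** 2

def central_diff(f, x: float, eps: float = 1e-4) -> float:
    """Finite-difference estimate of df/dx, standing in for autodiff."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

start = -2.0  # the planner currently prefers "do not act"

# Hard threshold: the reward is piecewise constant, so the gradient is zero.
grad_hard = central_diff(hard_reward, start)

# Mild relaxation: a usable gradient pointing towards taking the action.
grad_soft = central_diff(relaxed_reward, start)

# Sharp relaxation (closer to the true discrete model): the sigmoid saturates
# and the gradient vanishes again.
grad_sharp = central_diff(lambda z: relaxed_reward(z, sharpness=20.0), start)

print(grad_hard, grad_soft, grad_sharp)
```

The trade-off in this sketch is that a mild relaxation gives gradients but models the discrete problem less faithfully, while a sharp one is faithful but nearly flat away from the threshold. A gradient-based planner started in the flat region can then stay on the do-nothing action.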
In case you haven't found it already, I believe we merged the initial version of the Gurobi planner a while ago, but it is not yet documented. You can find a working example here (some bugs are to be expected at this early stage). The overall workflow is now the same as for JaxPlanner and is available in two flavors, straight-line and replanning. In initial tests, it seems to be efficient only for relatively short time horizons, so replanning is likely your only option. You could also consider increasing the MIPGap parameter to terminate earlier, unless you need strict optimality at each time step. If stochasticity is an issue, there is a bilevel version that we hope to integrate into the main codebase at some point, but it currently handles only Uniform and Gaussian random variables (it is not difficult to add the ones you need) and scales poorly in the number of ground fluents. About the deficits of JaxPlanner, I concur with Scott here. In some problems, the no-op action specified in the RDDL document could in fact be optimal at some time steps, but IIRC it might not be present in the returned action dictionary at that time step (e.g. Wildfire).
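The difference between the straight-line and replanning flavors can be sketched on a toy problem. This is not the Gurobi planner's API: the inventory dynamics, the cost weights, the brute-force `plan` function and every other name here are hypothetical, chosen only so that the example runs with the standard library.

```python
from itertools import product

def step(stock, produce, demand):
    """Toy dynamics: producing adds 2 units; unmet demand and leftover stock are penalised."""
    available = stock + 2 * produce
    sold = min(available, demand)
    next_stock = available - sold
    reward = -(3 * (demand - sold) + next_stock)
    return next_stock, reward

def rollout(stock, actions, demands):
    """Total reward of an action sequence against a demand sequence."""
    total = 0
    for action, demand in zip(actions, demands):
        stock, reward = step(stock, action, demand)
        total += reward
    return total

def plan(stock, forecast):
    """Brute-force stand-in for an exact planner: best sequence under the forecast."""
    candidates = product((0, 1), repeat=len(forecast))
    return max(candidates, key=lambda acts: rollout(stock, acts, forecast))

def straight_line(stock, forecast, actual):
    """Plan once over the whole horizon, then execute the plan open-loop."""
    actions = plan(stock, forecast)
    return list(actions), rollout(stock, actions, actual)

def replanning(stock, forecast, actual, lookahead):
    """Re-solve a short-horizon problem at every step and execute only the first action."""
    actions, total = [], 0
    for t, demand in enumerate(actual):
        first = plan(stock, forecast[t:t + lookahead])[0]
        stock, reward = step(stock, first, demand)
        actions.append(first)
        total += reward
    return actions, total

forecast = [1, 1, 1, 1]  # demand assumed by the planner
actual = [1, 0, 1, 1]    # demand that actually occurs
open_loop = straight_line(0, forecast, actual)
closed_loop = replanning(0, forecast, actual, lookahead=2)
print(open_loop, closed_loop)
```

In this toy instance each replanning call solves a problem of at most two steps, which is the property that matters when the solver is only efficient for short horizons. It also reacts to the observed stock level, whereas the straight-line plan is fixed in advance.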
It turns out that the main problem with the supply chain was the choice of non-fluents. This domain is difficult to code because the authors of the original paper never stated the values they used for the costs and the capacities. Please take a look at the updated version as well as the hyper-parameters which you could use to run the experiment. I will be closing this issue.
Is it possible to use the JAXPlanner for discrete control problems?
When applying the introductory JAX tutorial to the SupplyChain problem, the empty dict is (almost) always output as the action during evaluation:
However, the action produce is never part of the actions at any step.
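One way to check whether an empty dict simply means "all actions at their default values" is to expand it explicitly before inspecting it. The sketch below assumes that omitted action fluents take their defaults, which one of the comments suggests but does not confirm. The helper `complete_action` and the default value `False` for produce are hypothetical and not part of any library.

```python
def complete_action(planner_action: dict, defaults: dict) -> dict:
    """Return a full action assignment, using defaults for omitted action fluents."""
    unknown = set(planner_action) - set(defaults)
    if unknown:
        raise KeyError(f"unknown action fluents: {sorted(unknown)}")
    # Values returned by the planner override the defaults.
    return {**defaults, **planner_action}

# Assumed default for illustration only; the real default comes from the RDDL domain.
defaults = {"produce": False}

empty = complete_action({}, defaults)
explicit = complete_action({"produce": True}, defaults)
print(empty, explicit)
```

Logging the completed dict at each step makes it easier to tell a planner that chooses the no-op apart from one that returns nothing at all.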