This package is a minimal implementation of Continuous Value Iteration and Imaginary Experience Replay for a simple grid world with continuous states and actions.
To run it for multiple seeds and configurations:
python train.py \
train_cycles=100 \
seed=0,1,2 \
augment.HER=false,true \
augment.IER=false,true \
--multirun
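The comma-separated values and the `--multirun` flag look like a Hydra sweep, which runs one job per combination of the listed values. Assuming that is how `train.py` is configured (this README does not say so explicitly), the sketch below lists the combinations the command above would launch. The `sweep` dictionary only mirrors the overrides in the command; it is not read from the package.

```python
from itertools import product

# Values copied from the command above; the dictionary itself is illustrative.
sweep = {
    "seed": [0, 1, 2],
    "augment.HER": [False, True],
    "augment.IER": [False, True],
}

# Cartesian product of all swept values: 3 seeds x 2 HER x 2 IER settings.
runs = [dict(zip(sweep, values)) for values in product(*sweep.values())]

for i, run in enumerate(runs):
    overrides = " ".join(f"{key}={str(value).lower()}" for key, value in run.items())
    print(f"job {i}: train_cycles=100 {overrides}")

print(f"total jobs: {len(runs)}")
```

Under that assumption the sweep covers 12 jobs, so every HER/IER setting is evaluated on all three seeds.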
I am interested in investigating the role of experience expansion using model-based RL. Can we learn a transition model to generate rollouts and then apply imaginary/hindsight experience replay? Just as hindsight helps to generalize across goals, imagination rollouts help to visit states that no real experience has reached. As more real experience is gathered, the quality of the transition model should improve.
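As a rough illustration of the idea, here is a minimal sketch that fits a toy transition model on real transitions, rolls it forward to imagine a trajectory, and relabels that trajectory with the state it ended in as the goal. Everything in it is an assumption made for illustration: the function names (`fit_gain`, `imagine`, `relabel_with_final_goal`), the unit-square world, the single-gain model `s' = s + k * a`, and the sparse 0/-1 reward are hypothetical and are not taken from this package.

```python
import random

def fit_gain(transitions):
    """Least-squares fit of s' - s = k * a with one shared scalar gain k."""
    num = sum(a_i * (n_i - s_i)
              for s, a, n in transitions
              for s_i, a_i, n_i in zip(s, a, n))
    den = sum(a_i * a_i for _, a, _ in transitions for a_i in a)
    return num / den if den > 0 else 0.0

def imagine(start, gain, horizon, rng, max_step=0.1):
    """Roll the learned model forward with random actions (no real env steps)."""
    traj, s = [], tuple(start)
    for _ in range(horizon):
        a = tuple(rng.uniform(-max_step, max_step) for _ in s)
        # Predicted next state, clipped to the assumed unit-square world.
        n = tuple(min(1.0, max(0.0, s_i + gain * a_i)) for s_i, a_i in zip(s, a))
        traj.append((s, a, n))
        s = n
    return traj

def relabel_with_final_goal(traj, tol=0.05):
    """Hindsight relabeling: treat the last reached state as the goal."""
    goal = traj[-1][2]
    relabeled = []
    for s, a, n in traj:
        dist = sum((n_i - g_i) ** 2 for n_i, g_i in zip(n, goal)) ** 0.5
        reward = 0.0 if dist <= tol else -1.0
        relabeled.append((s, a, n, goal, reward))
    return relabeled

rng = random.Random(0)

# "Real" experience from a toy environment whose true gain is 1.
real = []
for _ in range(200):
    s = (rng.uniform(0.2, 0.8), rng.uniform(0.2, 0.8))
    a = (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
    real.append((s, a, (s[0] + a[0], s[1] + a[1])))

gain = fit_gain(real)
imagined = imagine((0.5, 0.5), gain, horizon=10, rng=rng)
replay = relabel_with_final_goal(imagined)
```

The relabeled tuples `(state, action, next_state, goal, reward)` could be appended to the same buffer as real experience. A neural-network model would replace `fit_gain`, but the rollout and relabeling steps would keep the same shape.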
To-do list:
- Include a module for rollouts with a learned neural-network transition model.
- Compare with Soft Actor-Critic (SAC).
- Learn the value function with a neural network. Then use the learned differentiable value function to learn an explicit policy from gradients. This should be helpful when learning from high-dimensional input spaces.
- Learn the transition model using the Value Equivalence principle.
- Use experience replay with reward shaping.
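For the last item, one common option is potential-based reward shaping, which adds `gamma * phi(s') - phi(s)` to the stored reward. The sketch below applies it to replay tuples of the form `(state, action, next_state, goal, reward)`. The tuple layout, the function names, and the choice of negative Euclidean distance to the goal as the potential are assumptions for illustration, not something this package defines.

```python
import math

def potential(state, goal):
    """Hypothetical potential: states closer to the goal get a higher value."""
    return -math.dist(state, goal)

def shape(transition, gamma=0.99):
    """Return a copy of the transition with a potential-based shaped reward."""
    s, a, n, goal, reward = transition
    shaped = reward + gamma * potential(n, goal) - potential(s, goal)
    return (s, a, n, goal, shaped)

goal = (1.0, 1.0)
# Two stored transitions from the same state: one moves toward the goal, one away.
toward = ((0.5, 0.5), (0.1, 0.1), (0.6, 0.6), goal, -1.0)
away = ((0.5, 0.5), (-0.1, -0.1), (0.4, 0.4), goal, -1.0)

shaped_toward = shape(toward)
shaped_away = shape(away)
print(shaped_toward[4], shaped_away[4])
```

Both transitions start with the same sparse reward, but after shaping the step toward the goal scores higher, which gives the value update a denser signal.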