Hindsight policy gradients
This software supplements the paper "Hindsight policy gradients".
The implementation focuses on clarity and flexibility rather than computational efficiency.
Training an agent in a bit flipping environment (k = 8) using a weighted per-decision hindsight policy gradient estimator (HPG):
python3 hpg/scripts/run.py hpg/examples/flipbit8/flipbit8_bs2_hpg
Training an agent in a bit flipping environment (k = 8) using a goal-conditional policy gradient estimator (GCPG):
python3 hpg/scripts/run.py hpg/examples/flipbit8/flipbit8_bs2_gcpg
Combining the corresponding results into a single plot (see folder "results/flipbit8_bs2"):
mkdir -p results/flipbit8_bs2 cp -r hpg/examples/flipbit8/flipbit8_bs2_hpg hpg/examples/flipbit8/flipbit8_bs2_gcpg results/flipbit8_bs2 python3 hpg/scripts/analysis.py results/flipbit8_bs2