A Temporal-Difference Approach to Policy Gradient Estimation

Samuele Tosatto, Andrew Patterson, Martha White, A. Rupam Mahmood
International Conference on Machine Learning (ICML) 2022, Baltimore

This repository contains the code to run the experiments in our ICML 2022 submission. The code runs with Python 3.6.

We suggest creating a conda environment and installing HeRL in editable mode:

cd herl
pip install -e .
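A minimal setup sketch, assuming conda is on your PATH and the HeRL sources sit in a herl directory below the current one (as the cd herl step implies). The environment name td-gradient and the DRY_RUN switch are hypothetical conventions introduced here, not part of the repository; Python 3.6 follows the version stated above.

```shell
# Minimal environment setup sketch.
# Assumptions: conda is on PATH, HeRL sources are in ./herl,
# and "td-gradient" is a hypothetical environment name.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the environment and install HeRL into it.
setup_env() {
  local env_name="${1:-td-gradient}"
  # Python 3.6 matches the version the experiments run with.
  run conda create -y -n "$env_name" python=3.6 || return 1
  # Editable install, equivalent to "cd herl && pip install -e .".
  run conda run -n "$env_name" pip install -e ./herl
}
```

Source the file (for example . ./setup_env.sh, a hypothetical filename) and call setup_env, or DRY_RUN=1 setup_env to print the commands without running them. conda run is used instead of conda activate because activating an environment inside a script requires conda's shell hook to be loaded first; afterwards, activate it interactively with conda activate td-gradient.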

The repository is organized as follows:

algorithms contains simple versions of the algorithms described in the paper
environments contains the environments used in the experiments
experiments contains the scripts that run the experiments
plots contains TikZ figures (almost) ready to be inserted in the paper, together with the NumPy files used to generate them
readme.md contains a description of this project.

How to replicate the results in the paper

Figure 1b

cd experiments
python3 lambda_lstd_gamma_imani_bias.py
cd ../plots/imani/lambda_lstd_gamma_bias
python3 plot.py

Figure 1c, 1b (and Figure 4 in the Appendix)

cd experiments
python3 lambda_lstd_gamma_imani_gradient.py
cd ../plots/imani/lambda_lstd_gamma_gradient
python3 plot.py
python3 scatter.py

Figure 2a

cd experiments
python3 lambda_lstd_gamma_imani_last_performance.py
cd ../plots/imani/lambda_lstd_gamma_last
python3 plot.py

Figure 2b

cd experiments
python3 lambda_rc_gamma_learning.py
cd ../plots/imani/lambda_lstd_gamma_learning
python3 plot.py

Figures 2c and 2d

cd parallel_experiments
sh mdps_learning.sh
cd ../plots/mdps/learning
python3 plot.py

For Figure 3 in the Appendix, refer to Andrew Patterson's repository.
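The per-figure recipes above can be chained into one script. This is a hypothetical convenience driver, not part of the repository: it assumes it is sourced from the repository root and that the script and directory names are exactly those listed in this README. Figure 3 is not covered, since it lives in a separate repository.

```shell
# Hypothetical driver for the per-figure recipes in this README.
# Assumption: sourced from the repository root.

# Run a command inside a directory without changing the caller's cwd.
# With DRY_RUN=1, print the equivalent command instead of running it.
in_dir() {
  local dir="$1"
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "(cd $dir && $*)"
  else
    (cd "$dir" && "$@")
  fi
}

# Run the experiment, then the plotting script(s), for one figure.
reproduce_figure() {
  case "$1" in
    1b)
      in_dir experiments python3 lambda_lstd_gamma_imani_bias.py &&
      in_dir plots/imani/lambda_lstd_gamma_bias python3 plot.py ;;
    1c)
      in_dir experiments python3 lambda_lstd_gamma_imani_gradient.py &&
      in_dir plots/imani/lambda_lstd_gamma_gradient python3 plot.py &&
      in_dir plots/imani/lambda_lstd_gamma_gradient python3 scatter.py ;;
    2a)
      in_dir experiments python3 lambda_lstd_gamma_imani_last_performance.py &&
      in_dir plots/imani/lambda_lstd_gamma_last python3 plot.py ;;
    2b)
      in_dir experiments python3 lambda_rc_gamma_learning.py &&
      in_dir plots/imani/lambda_lstd_gamma_learning python3 plot.py ;;
    2cd)
      in_dir parallel_experiments sh mdps_learning.sh &&
      in_dir plots/mdps/learning python3 plot.py ;;
    *)
      echo "unknown figure: $1" >&2
      return 1 ;;
  esac
}
```

For example, after . ./reproduce.sh (a hypothetical filename), DRY_RUN=1 reproduce_figure 2a previews the commands and reproduce_figure 2a runs them. Each step runs in a subshell, so the working directory never drifts between steps and every path stays relative to the repository root.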

Tip: if you use PyCharm, enable "Emulate terminal in output console" in the run configuration so that progress bars render correctly.

If you find our research interesting, please cite us:

@inproceedings{tosatto_temporal_2022,
        title = {A {Temporal}-{Difference} {Approach} to {Policy} {Gradient} {Estimation}},
        author = {Tosatto, Samuele and Patterson, Andrew and White, Martha and Mahmood, Rupam},
        booktitle = {Proceedings of the 39th {International} {Conference} on {Machine} {Learning}},
        year = {2022},
        note = {event-place: Baltimore, Maryland, United States of America},
        organization = {PMLR}
}

Thanks!
