Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States (ICML 2024)
Official implementation for the experiments in Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States (ICML 2024), based on the PyTorch and Torchdyn libraries.
Tested with Python 3.9.
- Install PyTorch from the official website (tested with version 1.11.0).
- The requirements.txt file includes additional dependencies, which can be installed via:
pip install -r requirements.txt
The following command runs experiments, in which a linear controller is trained via policy gradient over an underdetermined LQR problem, based on the given configuration file.
python lqr_experiments_plan_runner.py --plan_config_path <path_to_config_file>
To reproduce the experiments in the paper, run the above command, replacing <path_to_config_file> with each of the configuration files below.
Experiment | Configuration File Path |
---|---|
Identity, random A, shift systems with horizon 5 (Figure 1) | control/experiment_plans/lqr_experiments_plan.json |
Identity, random A, shift systems with horizon 8 (Figure 3) | control/experiment_plans/lqr_longer_horizon_experiments_plan.json |
Random A,B,Q system with horizon 5 (Figure 4) | control/experiment_plans/lqr_rnd_Q_B_experiments_plan.json |
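Running the three configurations one after another can be scripted. The following is a minimal sketch, not part of the repository: the helper names build_command and run_all are hypothetical, and it assumes the script is launched from the repository root so that the runner and the configuration paths from the table resolve. By default it only prints the commands.

```python
import subprocess
import sys

# Configuration paths copied from the table above.
LQR_PLAN_CONFIGS = [
    "control/experiment_plans/lqr_experiments_plan.json",
    "control/experiment_plans/lqr_longer_horizon_experiments_plan.json",
    "control/experiment_plans/lqr_rnd_Q_B_experiments_plan.json",
]


def build_command(config_path, runner="lqr_experiments_plan_runner.py"):
    """Return the argument list for one invocation of the plan runner."""
    return [sys.executable, runner, "--plan_config_path", config_path]


def run_all(config_paths, dry_run=True):
    """Print the command for each configuration file and, unless dry_run, execute it."""
    commands = [build_command(path) for path in config_paths]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing run.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical usage: prints the three commands without running them.
    run_all(LQR_PLAN_CONFIGS)
```

Passing dry_run=False executes the runs sequentially. The same pattern applies to the other plan runners by changing the runner argument and the list of paths.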
Additional Notes:
- A folder with log files and metrics will be automatically created under the directory specified by outputs_dir in the configuration file (default is outputs/lqr). In particular, at the end of each run a summary.json file describing the results of the experiment will be created.
- The different configuration options are documented in common/experiment/fit_experiment_base.py and control/experiment/lqr_co_experiment.py.
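To take a quick look at finished runs, the summary.json files can be collected with a few lines of Python. This is a sketch under assumptions: the function name load_summaries is hypothetical, the exact folder layout under outputs_dir is not documented here (so the search is recursive), and no particular schema is assumed for summary.json beyond it being valid JSON.

```python
import json
from pathlib import Path


def load_summaries(outputs_dir="outputs/lqr"):
    """Return {file path: parsed JSON} for every summary.json found under outputs_dir."""
    summaries = {}
    for summary_path in sorted(Path(outputs_dir).rglob("summary.json")):
        with open(summary_path, "r", encoding="utf-8") as f:
            summaries[str(summary_path)] = json.load(f)
    return summaries


if __name__ == "__main__":
    for path, summary in load_summaries().items():
        # Only top-level keys are shown, since the file's schema is not assumed.
        keys = sorted(summary) if isinstance(summary, dict) else type(summary).__name__
        print(path, keys)
```

For aggregated metrics and plots, the results helper script described in this README remains the intended route; the snippet is only a convenience for inspecting individual runs.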
After running experiments using each of the configuration files specified above, the following command will print metrics aggregated across the different runs and produce plots similar to those included in the paper.
python lqr_experiments_experiments_results_helper.py
Plots will be saved by default in the directory outputs/plots.
The following command runs experiments, in which a neural network controller is trained via policy gradient for solving the pendulum control problem, based on the given configuration file.
python pendulum_experiments_plan_runner.py --plan_config_path <path_to_config_file>
To reproduce the experiments in the paper, run the above command, replacing <path_to_config_file> with each of the configuration files below.
Experiment | Configuration File Path |
---|---|
Policy gradient (Figure 1) | control/experiment_plans/pendulum_experiments_plan.json |
Non-extrapolating baseline (Figure 1) | control/experiment_plans/pendulum_no_ext_experiments_plan.json |
Fitting unseen initial states for estimating minimal cost over them (Figure 1) | control/experiment_plans/pendulum_fit_test_experiments_plan.json |
Additional Notes:
- A folder with log files and metrics will be automatically created under the directory specified by outputs_dir in the configuration file. In particular, at the end of each run a summary.json file describing the results of the experiment will be created.
- It is possible to use a GPU by adding an available GPU id to the gpu_ids field in the configuration file.
- The different configuration options are documented in common/experiment/fit_experiment_base.py and control/experiment/pendulum_co_experiment.py.
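The gpu_ids field can be edited by hand, or set programmatically when preparing several configuration files. The sketch below is an illustration only: the helper names set_gpu_ids and write_gpu_config are hypothetical, the nesting of gpu_ids inside the plan file is not assumed (every key with that name is updated wherever it appears), and it assumes the field holds a list of integer ids.

```python
import json


def set_gpu_ids(node, gpu_ids):
    """Set every "gpu_ids" key in a parsed JSON structure; return how many were set."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "gpu_ids":
                # Assumption: the field is a list of integer GPU ids.
                node[key] = list(gpu_ids)
                changed += 1
            else:
                changed += set_gpu_ids(value, gpu_ids)
    elif isinstance(node, list):
        for item in node:
            changed += set_gpu_ids(item, gpu_ids)
    return changed


def write_gpu_config(src_path, dst_path, gpu_ids):
    """Copy a configuration file to dst_path with its gpu_ids fields replaced."""
    with open(src_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    changed = set_gpu_ids(config, gpu_ids)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return changed
```

Writing to a separate destination file keeps the original plan unchanged; a return value of 0 signals that no gpu_ids key was found and the file should be checked by hand.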
The following command runs experiments, in which a neural network controller is trained via policy gradient for solving the quadcopter control problem, based on the given configuration file.
python quadcopter_experiments_plan_runner.py --plan_config_path <path_to_config_file>
To reproduce the experiments in the paper, run the above command, replacing <path_to_config_file> with each of the configuration files below.
Experiment | Configuration File Path |
---|---|
Policy gradient (Figure 1) | control/experiment_plans/quadcopter_below_experiments_plan.json |
Non-extrapolating baseline (Figure 1) | control/experiment_plans/quadcopter_below_no_ext_experiments_plan.json |
Fitting unseen initial states for estimating minimal cost over them (Figure 1) | control/experiment_plans/quadcopter_below_fit_test_experiments_plan.json |
Policy gradient (Figure 7) | control/experiment_plans/quadcopter_below_add_experiments_plan.json |
Non-extrapolating baseline (Figure 7) | control/experiment_plans/quadcopter_below_add_no_ext_experiments_plan.json |
Fitting unseen initial states for estimating minimal cost over them (Figure 7) | control/experiment_plans/quadcopter_below_add_fit_test_experiments_plan.json |
Policy gradient (Figure 9) | control/experiment_plans/quadcopter_dist_experiments_plan.json |
Non-extrapolating baseline (Figure 9) | control/experiment_plans/quadcopter_dist_no_ext_experiments_plan.json |
Fitting unseen initial states for estimating minimal cost over them (Figure 9) | control/experiment_plans/quadcopter_dist_fit_test_experiments_plan.json |
Additional Notes:
- A folder with log files and metrics will be automatically created under the directory specified by outputs_dir in the configuration file. In particular, at the end of each run a summary.json file describing the results of the experiment will be created.
- It is possible to use a GPU by adding an available GPU id to the gpu_ids field in the configuration file.
- The different configuration options are documented in common/experiment/fit_experiment_base.py and control/experiment/quadcopter_co_experiment.py.
After running experiments for the pendulum and quadcopter control problems, using each of the configuration files specified above, the following command will print metrics aggregated across the different runs and produce plots similar to those included in the paper.
python nonlinear_control_experiments_results_helper.py
Plots will be saved by default in the directory outputs/plots.
To cite the paper, you can use:
@inproceedings{razin2024implicit,
title={Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States},
author={Razin, Noam and Alexander, Yotam and Cohen-Karlik, Edo and Giryes, Raja and Globerson, Amir and Cohen, Nadav},
booktitle={International Conference on Machine Learning},
year={2024}
}