Twin Delayed Deep Deterministic (TD3)

+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| **Paper**         | Addressing Function Approximation Error in Actor-Critic Methods :cite:`Fujimoto2018AddressingFA`                                                                                                                                      |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Framework(s)**  | .. figure:: ./images/tf.png                                                                                              | .. figure:: ./images/pytorch.png                                                                           |
|                   |    :scale: 20%                                                                                                           |    :scale: 10%                                                                                             |
|                   |    :class: no-scaled-link                                                                                                |    :class: no-scaled-link                                                                                  |
|                   |                                                                                                                          |                                                                                                            |
|                   |    TensorFlow                                                                                                            |    PyTorch                                                                                                 |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **API Reference** | `garage.tf.algos.TD3 <https://garage.readthedocs.io/en/latest/_autoapi/garage/tf/algos/index.html#garage.tf.algos.TD3>`_ |                                                                                                            |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Code**          | `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_                 |                                                                                                            |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Examples**      | :ref:`td3_pendulum_tf`                                                                                                   |                                                                                                            |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Benchmarks**    | :ref:`td3_garage_tf`                                                                                                     |                                                                                                            |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+

Twin Delayed Deep Deterministic (TD3) is an algorithm motivated by Double Q-learning. It takes the minimum of the value estimates from two critic networks to limit overestimation of the value function. Garage's implementation follows the paper's approach, which combines clipped Double Q-learning, delayed updates of the target and policy networks, and target policy smoothing.
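To make clipped Double Q-learning and target policy smoothing concrete, here is a minimal pure-Python sketch of how a critic target could be computed for a single transition with a scalar action. It is not garage's code: the function name `td3_target`, its arguments, and the action bounds are hypothetical, and the noise values simply reuse the 0.2 and 0.5 figures listed under Default Parameters on the assumption that they play this role.

```python
import random


def td3_target(reward, done, next_action, q1_target, q2_target,
               discount=0.99, sigma=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Compute a TD3-style critic target for one scalar-action transition.

    All names here are illustrative assumptions, not garage's API.
    q1_target and q2_target are callables mapping an action to a Q-value.
    """
    # Target policy smoothing: perturb the target action with clipped
    # Gaussian noise, then keep the result inside the action bounds.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
    smoothed = max(action_low, min(action_high, next_action + noise))

    # Clipped Double Q-learning: bootstrap from the smaller of the two
    # target critic estimates to counter overestimation.
    q_min = min(q1_target(smoothed), q2_target(smoothed))

    # One-step target; terminal transitions (done == 1) do not bootstrap.
    return reward + discount * (1.0 - done) * q_min
```

With two constant critics that return 10.0 and 8.0, the target bootstraps from 8.0, the lower estimate, regardless of the sampled noise.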

Default Parameters

target_update_tau=0.01,
policy_lr=1e-4,
qf_lr=1e-3,
discount=0.99,
exploration_policy_sigma=0.2,
exploration_policy_clip=0.5,
actor_update_period=2,
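The sketch below illustrates how two of these defaults are commonly interpreted: `target_update_tau` as the interpolation weight of a soft (Polyak) target update, and `actor_update_period` as the number of critic updates per actor update. The helper names `soft_update` and `should_update_actor` are hypothetical, parameters are plain lists of floats for simplicity, and the choice to count steps from zero is an assumption rather than a statement about garage's internals.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Return target parameters moved a fraction tau toward the online ones.

    Hypothetical helper: parameters are flat lists of floats.
    """
    return [(1.0 - tau) * t + tau * p
            for t, p in zip(target_params, online_params)]


def should_update_actor(step, actor_update_period=2):
    """Return True on the critic steps where the actor is also updated.

    Assumes steps are counted from zero (an illustrative choice).
    """
    return step % actor_update_period == 0


# Usage: the critic trains on every step, while the actor and the target
# networks are refreshed only on every second step.
target = [0.0, 1.0]
online = [1.0, 1.0]
for step in range(4):
    if should_update_actor(step):
        target = soft_update(target, online)
```

With `tau=0.01` the target parameters move only 1% of the way toward the online parameters on each update, which keeps the bootstrapped targets slowly varying.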

Examples

td3_pendulum_tf

.. literalinclude:: ../../examples/tf/td3_pendulum.py

Benchmarks

Benchmark Results

[Benchmark plots for TD3 (TensorFlow): HalfCheetah-v2, Hopper-v2, InvertedDoublePendulum-v2, InvertedPendulum-v2, Swimmer-v2]

td3_garage_tf

.. literalinclude:: ../../benchmarks/src/garage_benchmarks/experiments/algos/td3_garage_tf.py

References

.. bibliography:: references.bib
   :style: unsrt
   :filter: docname in docnames

This page was authored by Iris Liu (@irisliucy).