+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| **Paper** | Addressing Function Approximation Error in Actor-Critic Methods :cite:`Fujimoto2018AddressingFA` |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Framework(s)** | .. figure:: ./images/tf.png | .. figure:: ./images/pytorch.png |
| | :scale: 20% | :scale: 10% |
| | :class: no-scaled-link | :class: no-scaled-link |
| | | |
| | TensorFlow | PyTorch |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **API Reference** | `garage.tf.algos.TD3 <https://garage.readthedocs.io/en/latest/_autoapi/garage/tf/algos/index.html#garage.tf.algos.TD3>`_ | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Code** | `garage/tf/algos/td3.py <https://github.com/rlworkgroup/garage/blob/master/src/garage/tf/algos/td3.py>`_ | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Examples** | :ref:`td3_pendulum_tf` | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| **Benchmarks** | :ref:`td3_garage_tf` | |
+-------------------+--------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+

Twin Delayed Deep Deterministic Policy Gradient (TD3) is an algorithm motivated by Double Q-learning. It takes the minimum of the value estimates from two critic networks to counter overestimation of the value function. Garage's implementation follows the paper's approach, which includes clipped Double Q-learning, delayed updates of the target and policy networks, and target policy smoothing.
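
The critic target described above can be sketched in a few lines of NumPy. This is an illustrative sketch of the technique from the paper, not garage's TensorFlow implementation. The function ``td3_target``, its argument names, the ``max_action`` bound, and the toy policy and critics are all hypothetical stand-ins.

.. code-block:: python

    import numpy as np


    def td3_target(reward, done, next_obs, target_policy, target_qf1,
                   target_qf2, rng, discount=0.99, sigma=0.2,
                   noise_clip=0.5, max_action=1.0):
        """Compute the TD3 critic target for a batch of transitions."""
        next_action = target_policy(next_obs)
        # Target policy smoothing: perturb the target action with clipped
        # Gaussian noise, then keep it inside the action bounds (assumed
        # symmetric here).
        noise = np.clip(rng.normal(0.0, sigma, size=next_action.shape),
                        -noise_clip, noise_clip)
        next_action = np.clip(next_action + noise, -max_action, max_action)
        # Clipped Double Q-learning: element-wise minimum of both critics.
        q_min = np.minimum(target_qf1(next_obs, next_action),
                           target_qf2(next_obs, next_action))
        # Terminal transitions (done == 1) do not bootstrap.
        return reward + discount * (1.0 - done) * q_min


    # Toy stand-ins for the target networks (hypothetical, illustration only).
    rng = np.random.default_rng(0)
    target_policy = lambda obs: np.tanh(obs[:, :1])
    target_qf1 = lambda obs, act: obs.sum(axis=1) + act[:, 0]
    target_qf2 = lambda obs, act: obs.sum(axis=1) - act[:, 0]

    next_obs = np.array([[0.5, -0.2], [1.0, 0.3]])
    reward = np.array([1.0, 0.0])
    done = np.array([0.0, 1.0])
    y = td3_target(reward, done, next_obs, target_policy, target_qf1,
                   target_qf2, rng)

Both critics are regressed toward the same target ``y``. Because the target uses the smaller of the two estimates, an optimistic error in one critic does not propagate into the bootstrap value.
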

Key hyperparameters, given as keyword arguments to ``garage.tf.algos.TD3``:

.. code-block:: python

    target_update_tau=0.01,
    policy_lr=1e-4,
    qf_lr=1e-3,
    discount=0.99,
    exploration_policy_sigma=0.2,
    exploration_policy_clip=0.5,
    actor_update_period=2,

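
To give a feel for what ``target_update_tau`` and ``actor_update_period`` control, here is a small NumPy sketch of a soft (Polyak) target update and a delayed actor update schedule. The helpers ``soft_update`` and ``update_schedule`` are hypothetical and are not part of garage. The sketch assumes, following the paper, that the policy and target networks are updated together once every ``actor_update_period`` critic updates.

.. code-block:: python

    import numpy as np


    def soft_update(target_params, online_params, tau=0.01):
        """Move each target parameter a fraction tau toward its online value."""
        return [(1.0 - tau) * t + tau * o
                for t, o in zip(target_params, online_params)]


    def update_schedule(n_steps, actor_update_period=2):
        """List the training steps on which the actor and targets update."""
        return [step for step in range(1, n_steps + 1)
                if step % actor_update_period == 0]


    online = [np.ones(3)]
    target = [np.zeros(3)]
    target = soft_update(target, online, tau=0.01)

    # The critics train on every step; the actor only on the listed steps.
    actor_steps = update_schedule(6, actor_update_period=2)

A small ``tau`` makes the target networks change slowly, which keeps the regression target for the critics stable between updates.
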

.. literalinclude:: ../../examples/tf/td3_pendulum.py

.. literalinclude:: ../../benchmarks/src/garage_benchmarks/experiments/algos/td3_garage_tf.py

.. bibliography:: references.bib
   :style: unsrt
   :filter: docname in docnames

This page was authored by Iris Liu (@irisliucy).