[doc][hackathon] To add Adadelta Optimizer to the documentation #63255

Closed
33 changes: 29 additions & 4 deletions torch/optim/adadelta.py
@@ -5,9 +5,33 @@
 
 
 class Adadelta(Optimizer):
-    """Implements Adadelta algorithm.
-
-    It has been proposed in `ADADELTA: An Adaptive Learning Rate Method`__.
+    r"""Implements Adadelta algorithm.
+
+    .. math::
+       \begin{aligned}
+            &\rule{110mm}{0.4pt} \\
+            &\textbf{input} : \gamma \text{ (lr)}, \: \theta_0 \text{ (params)},
+                \: f(\theta) \text{ (objective)}, \: \rho \text{ (decay)},
+                \: \lambda \text{ (weight decay)} \\
+            &\textbf{initialize} : v_0 \leftarrow 0 \: \text{ (square avg)},
+                \: u_0 \leftarrow 0 \: \text{ (accumulate variables)} \\[-1.ex]
+            &\rule{110mm}{0.4pt} \\
+            &\textbf{for} \: t=1 \: \textbf{to} \: \ldots \: \textbf{do} \\
+            &\hspace{5mm}g_t \leftarrow \nabla_{\theta} f_t (\theta_{t-1}) \\
+            &\hspace{5mm}\textbf{if} \: \lambda \neq 0 \\
+            &\hspace{10mm} g_t \leftarrow g_t + \lambda \theta_{t-1} \\
+            &\hspace{5mm} v_t \leftarrow v_{t-1} \rho + g^2_t (1 - \rho) \\
+            &\hspace{5mm}\Delta x_t \leftarrow \frac{\sqrt{u_{t-1} +
+                \epsilon }}{ \sqrt{v_t + \epsilon} }g_t \hspace{21mm} \\
+            &\hspace{5mm} u_t \leftarrow u_{t-1} \rho +
+                \Delta x^2_t (1 - \rho) \\
+            &\hspace{5mm}\theta_t \leftarrow \theta_{t-1} - \gamma \Delta x_t \\
+            &\rule{110mm}{0.4pt} \\[-1.ex]
+            &\textbf{return} \: \theta_t \\[-1.ex]
+            &\rule{110mm}{0.4pt} \\[-1.ex]
+       \end{aligned}
+
+    For further details regarding the algorithm we refer to `ADADELTA: An Adaptive Learning Rate Method`_.
 
     Args:
         params (iterable): iterable of parameters to optimize or dicts defining
@@ -20,7 +44,8 @@ class Adadelta(Optimizer):
             to the parameters (default: 1.0)
         weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
 
-    __ https://arxiv.org/abs/1212.5701
+    .. _ADADELTA\: An Adaptive Learning Rate Method:
+        https://arxiv.org/abs/1212.5701
     """
 
     def __init__(self, params, lr=1.0, rho=0.9, eps=1e-6, weight_decay=0):
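
For readers who want to trace the update rule in the new docstring by hand, here is a minimal sketch of one step in plain Python. It is not the code in torch/optim/adadelta.py: the function name `adadelta_step` and the list-of-floats representation are assumptions made for illustration. The defaults mirror the `__init__` signature shown in the diff, and the variables map to the docstring symbols (`square_avg` is v, `acc_delta` is u).

```python
import math


def adadelta_step(theta, grad, square_avg, acc_delta,
                  lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
    """Apply one Adadelta update to a list of floats (hypothetical helper).

    Follows the docstring pseudocode line by line and returns the new
    (theta, square_avg, acc_delta) lists without mutating the inputs.
    """
    new_theta, new_square_avg, new_acc_delta = [], [], []
    for p, g, v, u in zip(theta, grad, square_avg, acc_delta):
        if weight_decay != 0:
            g = g + weight_decay * p           # g_t <- g_t + lambda * theta_{t-1}
        v = rho * v + (1 - rho) * g * g        # v_t: running average of g^2
        delta = math.sqrt(u + eps) / math.sqrt(v + eps) * g  # Delta x_t
        u = rho * u + (1 - rho) * delta * delta  # u_t: running average of Delta x^2
        new_theta.append(p - lr * delta)       # theta_t <- theta_{t-1} - gamma * Delta x_t
        new_square_avg.append(v)
        new_acc_delta.append(u)
    return new_theta, new_square_avg, new_acc_delta


# Usage: a few steps on f(theta) = theta^2, whose gradient is 2 * theta.
theta, v, u = [1.0], [0.0], [0.0]
for _ in range(200):
    grad = [2.0 * p for p in theta]
    theta, v, u = adadelta_step(theta, grad, v, u)
```

Because u starts at zero, the first step size is governed by `eps`: the numerator is sqrt(eps) while the denominator already reflects the first squared gradient, so early updates are small and grow as u accumulates.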