Fix lbfgs notebook errors
PiperOrigin-RevId: 650634215
vroulet authored and OptaxDev committed Jul 9, 2024
1 parent 068569a commit 3b4a482
Showing 2 changed files with 5 additions and 5 deletions.
3 changes: 0 additions & 3 deletions docs/gallery.rst
@@ -167,9 +167,6 @@
<div class="sphx-glr-thumbnail-title">Character-level Transformer on Tiny Shakespeare.</div>
</div>

.. raw:: html

</div>

.. raw:: html

7 changes: 5 additions & 2 deletions examples/lbfgs.ipynb
@@ -11,8 +11,7 @@
"L-BFGS is a classical optimization method that uses past gradients and parameters informations to iteratively refine a solution to a minimization problem. In this notebook, we illustrate\n",
"1. how to use L-BFGS as a simple gradient transformation,\n",
"2. how to wrap L-BFGS in a solver, and how linesearches are incorporated,\n",
"3. how to debug the solver if needed,\n",
"3. how to use L-BFGS to train a medium scale network on CIFAR10."
"3. how to debug the solver if needed,\n"
]
},
{
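A minimal sketch (not part of this diff) of item 1 above, using `optax.scale_by_lbfgs` as a plain gradient transformation; the toy quadratic objective and the fixed stepsize of 0.1 (in place of a linesearch) are assumptions for illustration:

```python
import jax
import jax.numpy as jnp
import optax

def f(w):
    # Toy quadratic objective (illustrative assumption), minimized at w = 1.
    return jnp.sum((w - 1.0) ** 2)

# scale_by_lbfgs outputs the preconditioned direction P_k g_k;
# chaining with scale(-0.1) applies a fixed stepsize instead of a linesearch.
opt = optax.chain(optax.scale_by_lbfgs(), optax.scale(-0.1))

params = jnp.zeros(3)
state = opt.init(params)
for _ in range(50):
    grad = jax.grad(f)(params)
    updates, state = opt.update(grad, state, params)
    params = optax.apply_updates(params, updates)

print(params)  # approaches [1. 1. 1.]
```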
@@ -52,13 +51,17 @@
"### What is L-BFGS?\n",
"\n",
"To solve a problem of the form\n",
"\n",
"$$\n",
"\\min_w f(w),\n",
"$$\n",
"\n",
"L-BFGS ([Limited memory Broyden–Fletcher–Goldfarb–Shanno algorithm](https://en.wikipedia.org/wiki/Limited-memory_BFGS)) makes steps of the form\n",
"\n",
"$$\n",
"w_{k+1} = w_k - \\eta_k P_k g_k,\n",
"$$\n",
"\n",
"where, at iteration $k$, $w_k$ are the parameters, $g_k = \\nabla f_k$ are the gradients, $\\eta_k$ is the stepsize, and $P_k$ is a *preconditioning* matrix, that is, a matrix that transforms the gradients to ease the optimization process.\n",
"\n",
"L-BFGS builds the preconditioning matrix $P_k$ as an approximation of the Hessian inverse $P_k \\approx \\nabla^2 f(w_k)^{-1}$ using past gradient and parameters information. Briefly, at iteration $k$, the previous preconditioning matrix $P_{k-1}$ is updated such that $P_k$ satisfies the secant condition $P_k(w_k-w_{k-1}) = g_k -g_{k-1}$. The original BFGS algorithm updates $P_k$ using all past information, the limited-memory variant only uses a fixed number of past parameters and gradients to build $P_k$. See [Nocedal and Wright, Numerical Optimization, 1999](https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf) or the [documentation](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_lbfgs) for more details on the implementation.\n"
