diff --git a/docs/gallery.rst b/docs/gallery.rst
index e14bf99c..58a6d508 100644
--- a/docs/gallery.rst
+++ b/docs/gallery.rst
@@ -167,9 +167,6 @@
 Character-level Transformer on Tiny Shakespeare.
-.. raw:: html
-
-
 .. raw:: html
diff --git a/examples/lbfgs.ipynb b/examples/lbfgs.ipynb
index ccb4ee2d..76073de7 100644
--- a/examples/lbfgs.ipynb
+++ b/examples/lbfgs.ipynb
@@ -11,8 +11,7 @@
     "L-BFGS is a classical optimization method that uses past gradient and parameter information to iteratively refine a solution to a minimization problem. In this notebook, we illustrate\n",
     "1. how to use L-BFGS as a simple gradient transformation,\n",
     "2. how to wrap L-BFGS in a solver, and how linesearches are incorporated,\n",
-    "3. how to debug the solver if needed,\n",
-    "3. how to use L-BFGS to train a medium scale network on CIFAR10."
+    "3. how to debug the solver if needed.\n"
    ]
   },
   {
@@ -52,13 +51,17 @@
     "### What is L-BFGS?\n",
     "\n",
     "To solve a problem of the form\n",
+    "\n",
     "$$\n",
     "\\min_w f(w),\n",
     "$$\n",
+    "\n",
     "L-BFGS ([Limited memory Broyden–Fletcher–Goldfarb–Shanno algorithm](https://en.wikipedia.org/wiki/Limited-memory_BFGS)) makes steps of the form\n",
+    "\n",
     "$$\n",
     "w_{k+1} = w_k - \\eta_k P_k g_k,\n",
     "$$\n",
+    "\n",
     "where, at iteration $k$, $w_k$ are the parameters, $g_k = \\nabla f(w_k)$ are the gradients, $\\eta_k$ is the stepsize, and $P_k$ is a *preconditioning* matrix, that is, a matrix that transforms the gradients to ease the optimization process.\n",
     "\n",
     "L-BFGS builds the preconditioning matrix $P_k$ as an approximation of the inverse Hessian, $P_k \\approx \\nabla^2 f(w_k)^{-1}$, using past gradient and parameter information. Briefly, at iteration $k$, the previous preconditioning matrix $P_{k-1}$ is updated so that $P_k$ satisfies the secant condition $P_k (g_k - g_{k-1}) = w_k - w_{k-1}$. The original BFGS algorithm builds $P_k$ using all past information; the limited-memory variant only uses a fixed number of past parameters and gradients. See [Nocedal and Wright, Numerical Optimization, 1999](https://www.math.uci.edu/~qnie/Publications/NumericalOptimization.pdf) or the [documentation](https://optax.readthedocs.io/en/latest/api/transformations.html#optax.scale_by_lbfgs) for more details on the implementation.\n"
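
For context beyond the patch itself, here is a minimal sketch of the usage pattern the notebook introduces: taking L-BFGS steps $w_{k+1} = w_k - \eta_k P_k g_k$ with `optax.lbfgs()`, whose built-in linesearch picks $\eta_k$. It assumes the optax API where `update` receives the current value, gradient, and a `value_fn` to re-evaluate the objective; the quadratic objective `f`, initial point, and iteration count are illustrative only and not part of the patch.

```python
import jax
import jax.numpy as jnp
import optax

# Toy objective: minimized at w = 1.
def f(w):
    return jnp.sum((w - 1.0) ** 2)

w = jnp.zeros(5)
opt = optax.lbfgs()  # scale_by_lbfgs chained with a linesearch
state = opt.init(w)

for _ in range(10):
    value, grad = jax.value_and_grad(f)(w)
    # The linesearch needs the current value/gradient and a way to
    # re-evaluate the objective at candidate stepsizes.
    updates, state = opt.update(
        grad, state, w, value=value, grad=grad, value_fn=f
    )
    w = optax.apply_updates(w, updates)

print(w)  # should be close to jnp.ones(5)
```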