
[Feature Pitch] Full-batch optimization toolkit #55279

Open
rfeinman opened this issue Apr 3, 2021 · 6 comments
Comments


@rfeinman rfeinman commented Apr 3, 2021

🚀 Feature

It would be great to have a library of optimization routines for the deterministic setting (a la scipy.optimize) using PyTorch autograd mechanics. I have written a prototype library of PyTorch-based function minimizers and I'd like to gauge the interest of the PyTorch community. This could perhaps be a submodule of torch.optim, or it could be included elsewhere.

Motivation

For a variety of optimization problems, deterministic (or "full-batch") routines such as Newton and quasi-Newton methods are preferable to vanilla gradient descent. Although PyTorch offers many routines for stochastic optimization, utilities for deterministic optimization are scarce; only L-BFGS is included in the torch.optim package, and it's modified for mini-batch training.

MATLAB and SciPy are the industry standards for deterministic optimization. These libraries have a comprehensive set of routines; however, automatic differentiation is not supported. The user must therefore provide explicit first- and second-order derivative functions (which must be known analytically) or resort to finite-difference approximations. This limits the applications considerably.

It would be wonderful to have a library of deterministic optimization routines in PyTorch. As a standalone entity, the library would be preferable to SciPy/MATLAB: the user only has to provide a function (not jacobian/hessian functions), and analytical derivatives are always used. But perhaps more importantly, it would be great to have these resources available inside of PyTorch to use alongside all of its other great tools.

Pitch

See complete pitch at pytorch-minimize.

My current library offers 8 different methods for unconstrained minimization. A complete description of these algorithms is provided in the readme. Constrained minimization is not currently implemented, but could perhaps be included in the future.

import torch
from torchmin import minimize  # <- perhaps ``from torch.optim import minimize``

def rosen(x):
    return torch.sum(100*(x[..., 1:] - x[..., :-1]**2)**2 
                     + (1 - x[..., :-1])**2)

# initial point
x0 = torch.tensor([1., 8.])

# BFGS
result = minimize(rosen, x0, method='bfgs')

# Newton Conjugate Gradient
result = minimize(rosen, x0, method='newton-cg')

methods: BFGS, L-BFGS, Conjugate Gradient (CG), Newton Conjugate Gradient (NCG), Newton Exact, Dogleg, Trust-Region Exact, Trust-Region NCG

Additional context

torch.optim is tailored to training neural networks with stochastic (mini-batch) gradient descent. I envision a minimize package that is tailored to a different set of problems, namely those of the deterministic or "full-batch" setting. For example, we may want to do various manipulations with a trained neural net such as: search for adversarial examples, traverse a latent space (e.g. styleGAN), or find a MAP estimate of latent variables. In addition, we may want to solve an optimization sub-problem inside the loop of our neural network optimization, as for example in EM algorithms like sparse dictionary learning.
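For comparison, the closest tool available today is torch.optim.LBFGS driven in full-batch mode with a closure. A minimal sketch on the Rosenbrock function from the pitch above (hyperparameters here are illustrative, not a recommendation):

```python
import torch

def rosen(x):
    return torch.sum(100*(x[..., 1:] - x[..., :-1]**2)**2
                     + (1 - x[..., :-1])**2)

x = torch.tensor([1., 8.], requires_grad=True)

# A single .step(closure) call runs up to max_iter internal iterations,
# re-evaluating the closure as needed.
opt = torch.optim.LBFGS([x], lr=1., max_iter=200,
                        line_search_fn='strong_wolfe')

def closure():
    opt.zero_grad()
    loss = rosen(x)
    loss.backward()
    return loss

opt.step(closure)
# x should now be close to the global minimum at (1, 1)
```

Even so, this covers only one of the methods listed in the pitch, which is part of the gap the proposed toolkit would fill.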

cc @mruberry @kurtamohler @vincentqb

@rfeinman rfeinman changed the title Deterministic optimization routines (fminunc / fmincon) [Feature Pitch] Deterministic optimization routines Apr 4, 2021
@rfeinman rfeinman changed the title [Feature Pitch] Deterministic optimization routines [Feature Pitch] Deterministic optimization toolkit Apr 4, 2021

@rfeinman rfeinman commented Apr 11, 2021

Update: I've added a preliminary constrained optimizer to the proposed library. I included a tutorial showing how to find an optimal adversarial perturbation subject to a perturbation norm constraint.
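To give a feel for the problem class (this is not the tutorial's method, just a toy illustration): maximize a model's score change under an L2 norm budget via plain projected gradient ascent, with a fixed linear scorer standing in for a trained network.

```python
import torch

# Toy stand-in for a trained model: a fixed linear scorer.
torch.manual_seed(0)
w = torch.randn(10)
x0 = torch.randn(10)          # clean input
eps = 0.5                     # perturbation norm budget

# Find delta maximizing the score subject to ||delta||_2 <= eps,
# via projected gradient ascent (a simple baseline for this kind of
# constrained problem).
delta = torch.zeros(10, requires_grad=True)
for _ in range(100):
    loss = -(w @ (x0 + delta))      # negate: we ascend the score
    grad, = torch.autograd.grad(loss, delta)
    with torch.no_grad():
        delta -= 0.1 * grad
        norm = delta.norm()
        if norm > eps:
            delta *= eps / norm     # project back onto the norm ball
```

A dedicated constrained solver would replace this hand-rolled projection loop with a principled method and convergence criteria.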


@rfeinman rfeinman commented Apr 13, 2021

@mruberry @mrshenli - I had an alternative idea that I wanted to share.

I realize that adding a new "minimize" module could be difficult to coordinate since the schematic differs considerably from existing torch optimizers. As an alternative, I wrote a new "Minimize" class that inherits from optim.Optimizer and supports parameter specifications just like existing optimizers. At the moment, the optimizer wraps scipy.optimize.minimize and supports all of its various solver methods. Like optim.LBFGS, Minimize needs to re-evaluate the objective multiple times, so a closure callable must be provided.

Eventually it would be ideal to use custom torch implementations of the underlying solver algorithms and remove the scipy dependency. Currently, Minimize interacts with scipy behind the scenes, so the front-end interface is exactly like that of other optimizers (parameters are torch.Tensor or nn.Parameter, objectives use torch, etc.). CUDA is supported but will only be used for evaluating the objective and gradient; other numerical computations performed by the underlying solvers are on CPU (with numpy arrays).


@mruberry mruberry commented Apr 14, 2021

Hey @rfeinman! This is a cool idea and I'd like to discuss it with more of the PyTorch team.

One question we always have, however, with possible extensions is: could the desired functionality be provided as a library on top of PyTorch, or does it need to be included in PyTorch itself? Are there advantages to including this in PyTorch that may not be obvious?


@rfeinman rfeinman commented Apr 14, 2021

@mruberry valid question. Deterministic optimization is a well-established area of machine learning, and I feel that PyTorch users could get a lot out of having these tools in the core library. I've seen a number of forum posts and tickets asking about the Conjugate Gradient optimizer, which is mentioned in the torch.optim documentation page but not actually implemented (I think it was previously available in Torch, but never ported to PyTorch?). There clearly seems to be interest in this family of optimization routines.

Although PyTorch is primarily a neural network library, people now use it for a range of scientific computing applications, especially those that can be accelerated by GPU. In addition to supporting the neural network applications that I mentioned in my original ticket, I think deterministic optimizers can help expand PyTorch's horizons by offering high-performance implementations of widely-used tools from libraries like SciPy and MATLAB. It would seem to fit well with other ongoing PyTorch projects, such as the linalg module.


@mruberry mruberry commented Apr 14, 2021

Makes sense, @rfeinman. @vincentqb, what are your thoughts?

@jbschlosser jbschlosser changed the title [Feature Pitch] Deterministic optimization toolkit [Feature Pitch] Full-batch optimization toolkit Apr 23, 2021

@rfeinman rfeinman commented May 10, 2021

@mruberry @vincentqb I put together a write-up of the proposed API here. It includes a summary of both the functional and object-oriented APIs that I've mentioned. Would be curious to hear your thoughts.
