A Hessian Free Neural Networks Training Algorithm with Curvature Scaled Adaptive Momentum

This repository provides HF-CSAM, a Hessian-free neural network training algorithm with curvature-scaled adaptive momentum.

Installation

Install via

git clone https://github.com/flo3003/HF-CSAM.git
cd HF-CSAM/hfcsam
python setup.py install

hfcsam requires a TensorFlow and Keras installation (the current code has been tested with releases 1.6-1.8), but this is not enforced in setup.py, so that either the CPU or the GPU version of TensorFlow can be used.
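For example, TensorFlow and Keras can be installed separately via pip; the version pins below are only an illustration within the tested 1.6-1.8 range:

pip install tensorflow==1.8.0   # or tensorflow-gpu==1.8.0 for the GPU version
pip install keras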

Usage

The hfcsam module contains the class HFCSAM, which inherits from the Keras optimizer base class and can be used as a direct drop-in replacement for Keras's built-in optimizers.

import tensorflow as tf
from hfcsam import HFCSAM

loss = ...  # any scalar TensorFlow loss tensor
opt = HFCSAM(dP=0.07, xi=0.99)
step = opt.minimize(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([loss, step])

HF-CSAM has two hyper-parameters: dP and xi. The dP parameter controls the step size and can vary depending on the problem; for the MNIST and CIFAR datasets, values in the range 0.05 < dP < 0.5 work well. The xi parameter should lie in 0.5 < xi < 0.99 (the default value xi=0.99 should work for most problems).
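Since HFCSAM follows the Keras optimizer interface, it can also be passed directly to model.compile. Below is a minimal sketch assuming a standard Keras classification model; the model itself is illustrative and not taken from this repository:

from keras.models import Sequential
from keras.layers import Dense
from hfcsam import HFCSAM

# Any Keras model works the same way; this two-layer classifier is just an example.
model = Sequential([Dense(128, activation='relu', input_shape=(784,)),
                    Dense(10, activation='softmax')])

# HFCSAM replaces a built-in optimizer such as SGD or Adam.
model.compile(loss='categorical_crossentropy',
              optimizer=HFCSAM(dP=0.07, xi=0.99),
              metrics=['accuracy'])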

Short Description of HF-CSAM

We give a short description of the algorithm, ignoring various details. Please refer to the [paper][1] for a complete description.

The algorithm's weight update rule is similar to SGD with momentum but with two main differences arising from the formulation of the training task as a constrained optimization problem: (i) the momentum term is scaled with curvature information (in the form of the Hessian); (ii) the coefficients for the learning rate and the scaled momentum term are adaptively determined.

The objective is to reach a minimum of the cost function $L_t$ with respect to the synaptic weights, and simultaneously to maximize incrementally at each epoch the following quantity:

$$\Phi_t = \Delta w_t^\top H_t \,\Delta w_{t-1}$$

where $\Delta w_t$ are the weight updates at the current time step, $\Delta w_{t-1}$ are the weight updates at the previous time step and $H_t$ is the Hessian of the cost function $L_t$.

At each epoch $t$ of the learning process, the weight vector will be incremented by $\Delta w_t$, so that:

$$\|\Delta w_t\|^2 = (\delta P)^2$$

And the objective function must be decremented by a quantity $\delta Q$, so that:

$$\Delta L_t = \nabla L_t^\top \Delta w_t = -\,\delta Q$$

The learning rule can be derived by solving the following constrained optimization problem:

Maximize $\Phi_t = \Delta w_t^\top H_t \,\Delta w_{t-1}$

subject to the constraints

$$\|\Delta w_t\|^2 = (\delta P)^2$$

and

$$\nabla L_t^\top \Delta w_t = -\,\delta Q$$
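For intuition, the update rule below follows from the Lagrangian of this problem (the multiplier sign convention here is our assumption; see the paper for the exact derivation):

$$\mathcal{L} = \Delta w_t^\top H_t \,\Delta w_{t-1} - \lambda_1\left(\nabla L_t^\top \Delta w_t + \delta Q\right) - \lambda_2\left(\|\Delta w_t\|^2 - (\delta P)^2\right)$$

Setting the derivative with respect to $\Delta w_t$ to zero gives $H_t \Delta w_{t-1} - \lambda_1 \nabla L_t - 2\lambda_2 \Delta w_t = 0$, which rearranges to the update rule below.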

Hence, by solving this constrained optimization problem analytically, we get the following update rule:

$$\Delta w_t = -\frac{\lambda_1}{2\lambda_2}\,\nabla L_t + \frac{1}{2\lambda_2}\,H_t\,\Delta w_{t-1}$$

where $\nabla L_t$ is the gradient of the network's loss/cost function $L_t$, and $\lambda_1$, $\lambda_2$ are the adaptively determined Lagrange multipliers.
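Note that the curvature enters the update only through the product $H_t \Delta w_{t-1}$, never through the full Hessian, which is what makes the algorithm Hessian-free. As a rough sketch, such a Hessian-vector product can be obtained with double backpropagation in TensorFlow 1.x (loss, weights and prev_updates are illustrative names, not identifiers from this repository's code):

import tensorflow as tf

# Gradient of the loss with respect to the weights: the "nabla L_t" term
grads = tf.gradients(loss, weights)

# Scalar inner product between the gradient and the previous update direction
grad_dot_v = tf.add_n([tf.reduce_sum(g * v)
                       for g, v in zip(grads, prev_updates)])

# Differentiating the scalar inner product again yields H_t * dw_{t-1}
# without ever materialising the full Hessian matrix.
hessian_vector_product = tf.gradients(grad_dot_v, weights)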

Feedback

If you have any questions or suggestions regarding this implementation, please open an issue in flo3003/HF-CSAM. Apart from that, we welcome any feedback regarding the performance of HF-CSAM on your training problems (mail to flwra.sakketoy@gmail.com).

Citation

If you use HF-CSAM for your research, please cite the [paper][1].

[1]: A Hessian Free Neural Networks Training Algorithm with Curvature Scaled Adaptive Momentum (under review)
