
[proposed feature] Eve: Improving Stochastic Gradient Descent with Feedback #1329

Open
snowyday opened this issue Apr 22, 2017 · 14 comments
Labels
feature A request for a proper, new feature. module: optimizer Related to torch.optim triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


snowyday commented Apr 22, 2017

Hi all,

I implemented the optimizer from the paper "Improving Stochastic Gradient Descent with Feedback", called Eve.
Eve is a modified version of Adam, and it outperforms other SGD algorithms on several benchmark tasks, including image classification.
Please give me any advice and a code review.
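
For reference, the update rule is roughly the following (my paraphrase of Algorithm 1 in the paper; see arXiv:1611.01505 for the exact form). It is Adam's bias-corrected update, with the step scaled by a feedback coefficient d_t built from the relative change of the loss:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad \hat{m}_t = m_t / (1 - \beta_1^t)
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2, \qquad \hat{v}_t = v_t / (1 - \beta_2^t)
d_t = \beta_3 d_{t-1} + (1 - \beta_3) r_t, \qquad r_t = |\hat{f}_t - \hat{f}_{t-1}| / \min(\hat{f}_t, \hat{f}_{t-1})
\theta_t = \theta_{t-1} - \frac{\alpha}{d_t} \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

where \hat{f}_t is the current loss clipped so that its ratio to \hat{f}_{t-1} stays within the thresholds, and d_1 = 1.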

The code is uploaded in a gist and included below:

import math
# from .optimizer import Optimizer
from torch.optim import Optimizer


class Eve(Optimizer):
    """Implements Eve (Adam with feedback) algorithm.
    
    It has been proposed in `Improving Stochastic Gradient Descent with Feedback`_.
    
    Arguments:
        params (iterable): iterable of parameters to optimize or dicts defining
            parameter groups
        lr (float, optional): learning rate (default: 1e-2)
        betas (Tuple[float, float, float], optional): coefficients used for computing
            running averages of gradient and its square (default: (0.9, 0.999, 0.999))
        thr (Tuple[float, float], optional): lower and upper thresholds for the relative change
            (default: (0.1, 10))
        eps (float, optional): term added to the denominator to improve
            numerical stability (default: 1e-8)
        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
        
    .. _Improving Stochastic Gradient Descent with Feedback:
        https://arxiv.org/abs/1611.01505
    """

    def __init__(self, params, lr=1e-2, betas=(0.9, 0.999, 0.999), eps=1e-8, thr=(0.1, 10), weight_decay=0):
        defaults = dict(lr=lr, betas=betas, eps=eps, thr=thr, weight_decay=weight_decay)
        super(Eve, self).__init__(params, defaults)

    def step(self, closure=None):
        """Performs a single optimization step.
        
        Arguments:
            closure (callable, optional): A closure that reevaluates the model
                and returns the loss.
        """

        if closure is not None:
            loss = closure()
            loss_val = loss.data[0]
        else:
            raise ValueError("Eve requires a value of the loss function.")

        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue

                grad = p.grad.data
                state = self.state[p]

                # State initialization
                if len(state) == 0:
                    state['step'] = 0
                    # Exponential moving average of gradient values
                    state['exp_avg'] = grad.new().resize_as_(grad).zero_()
                    # Exponential moving average of squared gradient values
                    state['exp_avg_sq'] = grad.new().resize_as_(grad).zero_()
                    # Previous loss value
                    state['loss_hat_prev'] = loss_val
                    # Feed-back from the loss function
                    state['decay_rate'] = 1

                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']
                beta1, beta2, beta3 = group['betas']
                thl, thu = group['thr']
                loss_hat_prev = state['loss_hat_prev']

                state['step'] += 1

                if group['weight_decay'] != 0:
                    grad = grad.add(group['weight_decay'], p.data)

                # Decay the first and second moment running average coefficient
                exp_avg.mul_(beta1).add_(1 - beta1, grad)
                exp_avg_sq.mul_(beta2).addcmul_(1 - beta2, grad, grad)

                bias_correction1 = 1 - beta1 ** state['step']
                bias_correction2 = 1 - beta2 ** state['step']
                step_size = group['lr'] * math.sqrt(bias_correction2) / bias_correction1

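                # Feedback step: clip the current loss relative to the previous
                # smoothed loss estimate, then update the decay rate from the
                # relative change.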
                if state['step'] > 1:
                    if loss_val >= loss_hat_prev:
                        lower_bound = thl + 1
                        upper_bound = thu + 1
                    else:
                        lower_bound = 1 / (thu + 1)
                        upper_bound = 1 / (thl + 1)

                    clip = min(max(lower_bound, loss_val / loss_hat_prev), upper_bound)
                    loss_hat = clip * loss_hat_prev
                    relative_change = abs(loss_hat - loss_hat_prev) / min(loss_hat, loss_hat_prev)
                    state['decay_rate'] = beta3 * state['decay_rate'] + (1 - beta3) * relative_change
                    state['loss_hat_prev'] = loss_hat

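                # Denominator: second-moment estimate scaled by the feedback
                # coefficient, plus eps for numerical stability.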
                denom = exp_avg_sq.sqrt().mul_(state['decay_rate']).add_(group['eps'])

                p.data.addcdiv_(-step_size, exp_avg, denom)

        return loss
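
For anyone who wants to try it, here is a minimal usage sketch (the model and data are just placeholders, and it assumes the Variable-based API current at the time of writing). Note that step() must be given a closure, since Eve reads the returned loss for its feedback term:

import torch
import torch.nn as nn
from torch.autograd import Variable

# Toy regression setup, purely illustrative.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = Eve(model.parameters(), lr=1e-2)

inputs = Variable(torch.randn(32, 10))
targets = Variable(torch.randn(32, 1))

for _ in range(100):
    def closure():
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        return loss

    # Unlike Adam, the closure is mandatory here: step() uses the returned
    # loss value to update the feedback coefficient.
    optimizer.step(closure)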
@snowyday snowyday changed the title [proposed feature] Improving Stochastic Gradient Descent with Feedback [proposed feature] Eve: Improving Stochastic Gradient Descent with Feedback Apr 22, 2017
@jsuarez5341

Awesome! Will try this out on my language model as soon as a GPU frees up. Am interested to see how this does on large tasks that take several days of training. Will report back soon; this may be something that really needs to be added in.

@soumith soumith added this to Uncategorized in Issue Status Aug 23, 2017
@soumith soumith added this to nn / autograd / torch in Issue Categories Sep 13, 2017
@ezyang ezyang added feature A request for a proper, new feature. module: optimizer Related to torch.optim triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Apr 3, 2019
eqy pushed a commit to eqy/pytorch that referenced this issue Jan 20, 2022
hubertlu-tw pushed a commit to hubertlu-tw/pytorch that referenced this issue Nov 1, 2022