
Sharpness Aware Minimization (SAM) requires closure #64

Closed
manza-ari opened this issue Jun 13, 2022 · 21 comments
Labels: question (Further information is requested)

manza-ari commented Jun 13, 2022

Hi, thank you so much for your repo. I am using the SAM optimizer, but I am facing this error; how can I fix it?

RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

@kozistr kozistr self-assigned this Jun 13, 2022
@kozistr kozistr added the question Further information is requested label Jun 13, 2022
kozistr (Owner) commented Jun 13, 2022

Hello!

First of all, thanks for your interest in the repo!

You can find the usage in the docstring here!
The closure function should be passed into the step() function.
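
For reference, here is a rough sketch of what I mean by passing a closure (model, criterion, inputs, and labels are placeholder names, not from your code). The closure re-runs the full forward pass and calls backward() so the second SAM step has fresh gradients; without the closure argument, step() raises the "requires closure" error:

    def closure():
        # re-run the full forward pass and backward; SAM calls this for the second step
        loss = criterion(model(inputs), labels)
        loss.backward()
        return loss

    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step(closure)  # SAM needs the closure here
    optimizer.zero_grad()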

If possible, please upload your code so that I can debug it more accurately :)

For now, the docs are lacking, but someday I'm going to build documentation that is easy to use (I can't say when it will be done).

If you have more questions, feel free to comment here.

Best regards

manza-ari (Author) commented:

Thank you for your reply. I have gone through this documentation, but I still don't understand how to fix it. Here is the code:

    if method == 'lloss':
        models = {'backbone': resnet18, 'module': loss_module}

        # Loss, criterion and scheduler (re)initialization
        criterion      = nn.CrossEntropyLoss(reduction='none')
        base_optimizer = torch.optim.SGD
        optim_backbone = SAM(models['backbone'].parameters(), base_optimizer, lr=LR,
                             momentum=MOMENTUM, weight_decay=WDECAY)
        sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)
        optimizers = {'backbone': optim_backbone}
        schedulers = {'backbone': sched_backbone}

kozistr (Owner) commented Jun 13, 2022

I think the definition part (your code) is fine; the problem is in the training part.

To use the SAM optimizer, you should call the optimizer in the training loop like below:

    # use this loss for any training statistics
    loss = criterion(output, model(input))
    loss.backward()
    optimizer.first_step(zero_grad=True)

    # second forward-backward pass
    # make sure to do a full forward pass
    criterion(output, model(input)).backward()
    optimizer.second_step(zero_grad=True)

Here, optimizer corresponds to optimizers['backbone'], and model corresponds to models['backbone'].
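
With your variable names, the same two-step pattern would look roughly like this (just a sketch; the [0] indexing of the backbone output and the .mean() reduction over the per-sample losses are assumptions based on your snippet):

    # first forward-backward pass on the current weights
    scores, _, features = models['backbone'](inputs)
    loss = criterion(scores, labels).mean()  # criterion uses reduction='none', so reduce to a scalar
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)

    # second full forward-backward pass at the perturbed weights
    criterion(models['backbone'](inputs)[0], labels).mean().backward()
    optimizers['backbone'].second_step(zero_grad=True)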

manza-ari (Author) commented:

This is the training part

    def train(models, method, criterion, optimizers, schedulers, dataloaders, num_epochs, epoch_loss):
        print('>> Train a Model.')
        best_acc = 0.

        for epoch in range(num_epochs):

            best_loss = torch.tensor([0.5]).cuda()
            loss = train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss)

            schedulers['backbone'].step()
            if method == 'lloss':
                schedulers['module'].step()

            if False and epoch % 20 == 7:
                acc = test(models, epoch, method, dataloaders, mode='test')
                # acc = test(models, dataloaders, mc, 'test')
                if best_acc < acc:
                    best_acc = acc
                    print('Val Acc: {:.3f} \t Best Acc: {:.3f}'.format(acc, best_acc))
        print('>> Finished.')

kozistr (Owner) commented Jun 13, 2022

Maybe the loss backward part (loss.backward()) is in the train_epoch function.

manza-ari (Author) commented Jun 13, 2022

Thank you so much for your help. I wrote something like this

    def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
        models['backbone'].train()
        if method == 'lloss':
            models['module'].train()
        global iters

        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()

            iters += 1

            optimizers['backbone'].zero_grad()
            if method == 'lloss':
                optimizers['module'].zero_grad()

            scores, _, features = models['backbone'](inputs)
            target_loss = criterion(scores, labels)

            if method == 'lloss':
                if epoch > epoch_loss:
                    features[0] = features[0].detach()
                    features[1] = features[1].detach()
                    features[2] = features[2].detach()
                    features[3] = features[3].detach()

                pred_loss = models['module'](features)
                pred_loss = pred_loss.view(pred_loss.size(0))
                m_module_loss   = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
                m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
                loss            = m_backbone_loss + WEIGHT * m_module_loss
            else:
                m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
                loss            = m_backbone_loss

            # ----------------- SAM Optimizer -----------------
            loss.backward()
            optimizers['backbone'].first_step(zero_grad=True)

            criterion(scores, method(input)).backward()   # I have an error here
            optimizers['backbone'].second_step(zero_grad=True)

            if method == 'lloss':
                optimizers['module'].step()

            return loss

kozistr (Owner) commented Jun 13, 2022

    criterion(scores, method(input)).backward()

Maybe it should be changed to

    criterion(models['backbone'](inputs)[0], labels).backward()

i.e., the same scheme as criterion(scores, labels).

manza-ari (Author) commented Jun 13, 2022

    criterion(models['backbone'](inputs)[0], labels).backward()
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
        grad_tensors_ = _make_grads(tensors, grad_tensors, is_grads_batched=False)
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
        raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs

kozistr (Owner) commented Jun 13, 2022

Maybe the criterion in your code doesn't return scalar output(s).

I think the whole code block (below) is effectively your criterion (loss) function.

    target_loss = criterion(scores, labels)

    if method == 'lloss':
        if epoch > epoch_loss:
            features[0] = features[0].detach()
            features[1] = features[1].detach()
            features[2] = features[2].detach()
            features[3] = features[3].detach()

        pred_loss = models['module'](features)
        pred_loss = pred_loss.view(pred_loss.size(0))
        m_module_loss   = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)        
        loss            = m_backbone_loss + WEIGHT * m_module_loss 
    else:
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)        
        loss            = m_backbone_loss
    # `loss` here is the final loss.
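
One way to make the second SAM pass recompute exactly this scalar is to factor the block above into a small helper and call it once per step (just a sketch; compute_loss is a hypothetical name, and the argument list is an assumption):

    def compute_loss(models, criterion, inputs, labels, method, epoch, epoch_loss):
        # full forward pass, reduced to a scalar loss
        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)

        if method == 'lloss':
            if epoch > epoch_loss:
                features = [f.detach() for f in features]
            pred_loss = models['module'](features).view(-1)
            return m_backbone_loss + WEIGHT * LossPredLoss(pred_loss, target_loss, margin=MARGIN)

        return m_backbone_loss

    # hypothetical usage in the training loop:
    compute_loss(models, criterion, inputs, labels, method, epoch, epoch_loss).backward()
    optimizers['backbone'].first_step(zero_grad=True)

    compute_loss(models, criterion, inputs, labels, method, epoch, epoch_loss).backward()
    optimizers['backbone'].second_step(zero_grad=True)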

manza-ari (Author) commented:

No, actually this repo uses multiple methods, such as Random or 'lloss'.
I have removed that method's module for the sake of simplicity; can you suggest something now?

    def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
        models['backbone'].train()
        if method == 'lloss':
            models['module'].train()
        global iters

        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers['backbone'].zero_grad()

            if method == 'lloss':
                optimizers['module'].zero_grad()

            scores, _, features = models['backbone'](inputs)
            target_loss = criterion(scores, labels)

            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss            = m_backbone_loss

            # ----------------- SAM Optimizer -----------------
            # loss = criterion(models['backbone'](inputs)[0], labels)
            loss.backward()
            optimizers['backbone'].first_step(zero_grad=True)
            optimizers['backbone'].second_step(zero_grad=True)
            criterion(models['backbone'](inputs)[0], labels).backward()
            return loss

kozistr (Owner) commented Jun 14, 2022

I guess this should work:

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['backbone'].second_step(zero_grad=True)

If there's still an error, please check my test code! (The linked code below runs with no errors and shows the correct usage.)

  1. https://github.com/kozistr/pytorch_optimizer/blob/main/tests/test_optimizers.py#L187
  2. https://github.com/kozistr/pytorch_optimizer/blob/main/tests/test_optimizers.py#L213

manza-ari (Author) commented Jun 15, 2022

Thank you so much for your help and recommendations. I cannot thank you enough.
I fixed the error by adding loss.backward()

    # ----------------- SAM Optimizer -----------------
    loss.backward()
    criterion(models['backbone'](inputs)[0], labels)
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    optimizers['backbone'].second_step(zero_grad=True)
I hope I am using SAM correctly

kozistr (Owner) commented Jun 15, 2022

Maybe you should call loss.backward() twice.

Only calling criterion(models['backbone'](inputs)[0], labels) doesn't do backward(); it just calculates the loss.

The code below is the correct usage!

    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)
  
    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['backbone'].second_step(zero_grad=True)

manza-ari (Author) commented:

When I call loss.backward() twice, it gives me the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

kozistr (Owner) commented Jun 15, 2022

Following the error message, you can specify retain_graph=True; then maybe the error will be gone.

manza-ari (Author) commented Jun 15, 2022

None of them are working

    # ----------------- SAM Optimizer -----------------
    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['backbone'].second_step(zero_grad=True)

    # ----------------- SAM Optimizer for LLOSS Method -----------------
    if method == 'lloss':
        # optimizers['module'].step()
        loss1 = criterion(models['backbone'](inputs)[0], labels)
        loss1.backward()
        optimizers['module'].first_step(zero_grad=True)

        loss2 = criterion(models['backbone'](inputs)[0], labels)
        loss2.backward()
        optimizers['module'].second_step(zero_grad=True)

        loss = torch.tensor([loss1, loss2])
        loss.backward(gradient=torch.tensor([1.0, 1.0]))
Error:

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

manza-ari (Author) commented:

As per the sample given here (https://github.com/davda54/sam), loss.backward() is not required for second_step. The first one is working fine for me, but it is not working for my LLOSS method.

kozistr (Owner) commented Jun 15, 2022

Actually, it does (backward is done twice)!

In the example code (taken from https://github.com/davda54/sam), backward() is called twice. And by the concept of the SAM optimizer, the forward-backward pass must be done twice!

    # first forward-backward pass
    loss = loss_function(output, model(input))  # use this loss for any training statistics
    loss.backward()
    optimizer.first_step(zero_grad=True)

    # second forward-backward pass
    loss_function(output, model(input)).backward()  # make sure to do a full forward pass
    # it is equal to
    # loss = loss_function(output, model(input))
    # loss.backward()
    optimizer.second_step(zero_grad=True)

manza-ari (Author) commented:

Okay, right, but there are some errors when using backward() the second time. I don't know how to resolve it.

    raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs

kozistr (Owner) commented Jun 16, 2022


The grad can be implicitly created only for scalar outputs error means that the loss is not a scalar but a vector. You need to check whether the loss is a scalar.

It depends on the output(s) of the model and the loss function in your code, so take a look at that part!
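
For example, nn.CrossEntropyLoss(reduction='none') returns one loss value per sample, so calling backward() on it directly raises exactly that error; reducing it to a scalar first fixes it. A small self-contained sketch (the shapes are made up):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss(reduction='none')
    logits = torch.randn(8, 10, requires_grad=True)  # batch of 8 samples, 10 classes
    labels = torch.randint(0, 10, (8,))

    per_sample = criterion(logits, labels)  # shape [8]; per_sample.backward() would raise the error
    loss = per_sample.mean()                # reduce to a scalar
    loss.backward()                         # works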

kozistr (Owner) commented Aug 21, 2022

#66 (comment)

@kozistr kozistr closed this as completed Aug 21, 2022