
Sharpness Aware Minimization (SAM) requires closure #64

Closed
manza-ari opened this issue Jun 13, 2022 · 21 comments
Labels: question (Further information is requested)

manza-ari commented Jun 13, 2022

Hi, thank you so much for your repo. I am using the SAM optimizer, but I am facing this error; how can I fix it?

RuntimeError: [-] Sharpness Aware Minimization (SAM) requires closure

@kozistr kozistr self-assigned this Jun 13, 2022
@kozistr kozistr added the question Further information is requested label Jun 13, 2022
kozistr (Owner) commented Jun 13, 2022

Hello!

First of all, thanks for your interest in the repo!

You can find the usage in the docstring here!
The closure function should be passed into the step() function.
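
For reference, here is a rough sketch of what I mean by passing a closure (model, criterion, inputs, and labels are placeholder names, not from your code). The closure re-runs the full forward pass and calls backward() so the second SAM step has fresh gradients; without the closure argument, step() raises the "requires closure" error:

    def closure():
        # re-run the full forward pass and backward; SAM calls this for the second step
        loss = criterion(model(inputs), labels)
        loss.backward()
        return loss

    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step(closure)  # SAM needs the closure here
    optimizer.zero_grad()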

If possible, please upload your code so that I can debug it more accurately :)

For now, the docs are lacking, but someday I'm going to build documentation that is easy to use (I can't say when it will be done).

If you have more questions, feel free to comment here.

Best regards

manza-ari (Author) commented:

Thank you for your reply. I have gone through this documentation, but I still don't understand how to fix it. Here is the code:

    if method == 'lloss':
        models = {'backbone': resnet18, 'module': loss_module}

        # Loss, criterion and scheduler (re)initialization
        criterion      = nn.CrossEntropyLoss(reduction='none')
        base_optimizer = torch.optim.SGD
        optim_backbone = SAM(models['backbone'].parameters(), base_optimizer, lr=LR,
                             momentum=MOMENTUM, weight_decay=WDECAY)
        sched_backbone = lr_scheduler.MultiStepLR(optim_backbone, milestones=MILESTONES)
        optimizers = {'backbone': optim_backbone}
        schedulers = {'backbone': sched_backbone}

kozistr (Owner) commented Jun 13, 2022

I think the definition part (your code) is fine; the problem is in the training part.

To use the SAM optimizer, you should call the optimizer in the training loop like below:

    # use this loss for any training statistics
    loss = criterion(output, model(input))
    loss.backward()
    optimizer.first_step(zero_grad=True)

    # second forward-backward pass
    # make sure to do a full forward pass
    criterion(output, model(input)).backward()
    optimizer.second_step(zero_grad=True)

Here, optimizer corresponds to optimizers['backbone'], and model corresponds to models['backbone'].
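
With your variable names, the same two-step pattern would look roughly like this (just a sketch; the [0] indexing of the backbone output and the .mean() reduction over the per-sample losses are assumptions based on your snippet):

    # first forward-backward pass on the current weights
    scores, _, features = models['backbone'](inputs)
    loss = criterion(scores, labels).mean()  # criterion uses reduction='none', so reduce to a scalar
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)

    # second full forward-backward pass at the perturbed weights
    criterion(models['backbone'](inputs)[0], labels).mean().backward()
    optimizers['backbone'].second_step(zero_grad=True)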

manza-ari (Author) commented:

This is the training part

    def train(models, method, criterion, optimizers, schedulers, dataloaders, num_epochs, epoch_loss):
        print('>> Train a Model.')
        best_acc = 0.

        for epoch in range(num_epochs):

            best_loss = torch.tensor([0.5]).cuda()
            loss = train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss)

            schedulers['backbone'].step()
            if method == 'lloss':
                schedulers['module'].step()

            if False and epoch % 20 == 7:
                acc = test(models, epoch, method, dataloaders, mode='test')
                # acc = test(models, dataloaders, mc, 'test')
                if best_acc < acc:
                    best_acc = acc
                    print('Val Acc: {:.3f} \t Best Acc: {:.3f}'.format(acc, best_acc))
        print('>> Finished.')

kozistr (Owner) commented Jun 13, 2022

Maybe the loss backward part (loss.backward()) is in the train_epoch function.

manza-ari (Author) commented Jun 13, 2022

Thank you so much for your help. I wrote something like this

    def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
        models['backbone'].train()
        if method == 'lloss':
            models['module'].train()
        global iters

        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()

            iters += 1

            optimizers['backbone'].zero_grad()
            if method == 'lloss':
                optimizers['module'].zero_grad()

            scores, _, features = models['backbone'](inputs)
            target_loss = criterion(scores, labels)

            if method == 'lloss':
                if epoch > epoch_loss:
                    features[0] = features[0].detach()
                    features[1] = features[1].detach()
                    features[2] = features[2].detach()
                    features[3] = features[3].detach()

                pred_loss = models['module'](features)
                pred_loss = pred_loss.view(pred_loss.size(0))
                m_module_loss   = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
                m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
                loss            = m_backbone_loss + WEIGHT * m_module_loss
            else:
                m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
                loss            = m_backbone_loss

            # ----------------- SAM Optimizer -----------------
            loss.backward()
            optimizers['backbone'].first_step(zero_grad=True)

            criterion(scores, method(input)).backward()   # I have an error here
            optimizers['backbone'].second_step(zero_grad=True)

            if method == 'lloss':
                optimizers['module'].step()

            return loss

kozistr (Owner) commented Jun 13, 2022

    criterion(scores, method(input)).backward()

Maybe it should be changed to

    criterion(models['backbone'](inputs)[0], labels).backward()

i.e., the same scheme as criterion(scores, labels).

manza-ari (Author) commented Jun 13, 2022

    criterion(models['backbone'](inputs)[0], labels).backward()
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/tensor.py", line 363, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 166, in backward
        grad_tensors_ = _make_grads(tensors, grad_tensors, is_grads_batched=False)
      File "/home/kanza/anaconda3/envs/optuna/lib/python3.8/site-packages/torch/autograd/__init__.py", line 67, in _make_grads
        raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs

kozistr (Owner) commented Jun 13, 2022

Maybe the criterion in your code doesn't return scalar output(s).

I think the whole code block (below) is effectively your criterion (loss) function.

    target_loss = criterion(scores, labels)

    if method == 'lloss':
        if epoch > epoch_loss:
            features[0] = features[0].detach()
            features[1] = features[1].detach()
            features[2] = features[2].detach()
            features[3] = features[3].detach()

        pred_loss = models['module'](features)
        pred_loss = pred_loss.view(pred_loss.size(0))
        m_module_loss   = LossPredLoss(pred_loss, target_loss, margin=MARGIN)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)        
        loss            = m_backbone_loss + WEIGHT * m_module_loss 
    else:
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)        
        loss            = m_backbone_loss
    # `loss` here is the final loss.
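
One way to make the second SAM pass recompute exactly this scalar is to factor the block above into a small helper and call it once per step (just a sketch; compute_loss is a hypothetical name, and the argument list is an assumption):

    def compute_loss(models, criterion, inputs, labels, method, epoch, epoch_loss):
        # full forward pass, reduced to a scalar loss
        scores, _, features = models['backbone'](inputs)
        target_loss = criterion(scores, labels)
        m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)

        if method == 'lloss':
            if epoch > epoch_loss:
                features = [f.detach() for f in features]
            pred_loss = models['module'](features).view(-1)
            return m_backbone_loss + WEIGHT * LossPredLoss(pred_loss, target_loss, margin=MARGIN)

        return m_backbone_loss

    # hypothetical usage in the training loop:
    compute_loss(models, criterion, inputs, labels, method, epoch, epoch_loss).backward()
    optimizers['backbone'].first_step(zero_grad=True)

    compute_loss(models, criterion, inputs, labels, method, epoch, epoch_loss).backward()
    optimizers['backbone'].second_step(zero_grad=True)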

manza-ari (Author) commented:

No, actually this repo uses multiple methods, such as Random or 'lloss'.
I have removed that method's module for the sake of simplicity; can you suggest something now?

    def train_epoch(models, method, criterion, optimizers, dataloaders, epoch, epoch_loss):
        models['backbone'].train()
        if method == 'lloss':
            models['module'].train()
        global iters

        for data in tqdm(dataloaders['train'], leave=False, total=len(dataloaders['train'])):
            with torch.cuda.device(CUDA_VISIBLE_DEVICES):
                inputs = data[0].cuda()
                labels = data[1].cuda()
            iters += 1
            optimizers['backbone'].zero_grad()

            if method == 'lloss':
                optimizers['module'].zero_grad()

            scores, _, features = models['backbone'](inputs)
            target_loss = criterion(scores, labels)

            m_backbone_loss = torch.sum(target_loss) / target_loss.size(0)
            loss            = m_backbone_loss

            # ----------------- SAM Optimizer -----------------
            # loss = criterion(models['backbone'](inputs)[0], labels)
            loss.backward()
            optimizers['backbone'].first_step(zero_grad=True)
            optimizers['backbone'].second_step(zero_grad=True)
            criterion(models['backbone'](inputs)[0], labels).backward()
            return loss

kozistr (Owner) commented Jun 14, 2022

I guess this should work:

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels).backward()
    optimizers['backbone'].second_step(zero_grad=True)

If there's still an error, please check my test code! (The linked code below runs with no errors and shows the correct usage.)

  1. https://github.com/kozistr/pytorch_optimizer/blob/main/tests/test_optimizers.py#L187
  2. https://github.com/kozistr/pytorch_optimizer/blob/main/tests/test_optimizers.py#L213

manza-ari (Author) commented Jun 15, 2022

Thank you so much for your help and recommendations. I cannot thank you enough.
I fixed the error by adding loss.backward()

    # ----------------- SAM Optimizer -----------------
    loss.backward()
    criterion(models['backbone'](inputs)[0], labels)
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    optimizers['backbone'].second_step(zero_grad=True)
I hope I am using SAM correctly

kozistr (Owner) commented Jun 15, 2022

Maybe you should call loss.backward() twice.

Only calling criterion(models['backbone'](inputs)[0], labels) doesn't do backward(); it just calculates the loss.

The code below is the correct usage!

    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['backbone'].first_step(zero_grad=True)
  
    loss = criterion(models['backbone'](inputs)[0], labels)
    loss.backward()
    optimizers['backbone'].second_step(zero_grad=True)

manza-ari (Author) commented:

When I call loss.backward() twice, it gives me the following error:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

kozistr (Owner) commented Jun 15, 2022

Following the error message, you can specify retain_graph=True; then maybe the error will be gone.

manza-ari (Author) commented Jun 15, 2022

None of them are working

    # ----------------- SAM Optimizer -----------------
    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['backbone'].first_step(zero_grad=True)

    criterion(models['backbone'](inputs)[0], labels)
    loss.backward(retain_graph=True)
    optimizers['backbone'].second_step(zero_grad=True)

    # ----------------- SAM Optimizer for LLOSS Method -----------------
    if method == 'lloss':
        # optimizers['module'].step()
        loss1 = criterion(models['backbone'](inputs)[0], labels)
        loss1.backward()
        optimizers['module'].first_step(zero_grad=True)

        loss2 = criterion(models['backbone'](inputs)[0], labels)
        loss2.backward()
        optimizers['module'].second_step(zero_grad=True)

        loss = torch.tensor([loss1, loss2])
        loss.backward(gradient=torch.tensor([1.0, 1.0]))
Error:

    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 100]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

manza-ari (Author) commented:

As per the sample given here (https://github.com/davda54/sam), loss.backward() is not required for second_step. The first one is working fine for me, but it is not working for my LLOSS method.

kozistr (Owner) commented Jun 15, 2022

Actually, it does (backward is done twice)!

In the example code (taken from https://github.com/davda54/sam), backward() is called twice. And by the concept of the SAM optimizer, the forward-backward pass must be done twice!

    # first forward-backward pass
    loss = loss_function(output, model(input))  # use this loss for any training statistics
    loss.backward()
    optimizer.first_step(zero_grad=True)

    # second forward-backward pass
    loss_function(output, model(input)).backward()  # make sure to do a full forward pass
    # it is equal to
    # loss = loss_function(output, model(input))
    # loss.backward()
    optimizer.second_step(zero_grad=True)

manza-ari (Author) commented:

Okay, right, but there are some errors when using backward() the second time. I don't know how to resolve it.

    raise RuntimeError("grad can be implicitly created only for scalar outputs")
    RuntimeError: grad can be implicitly created only for scalar outputs

kozistr (Owner) commented Jun 16, 2022


The grad can be implicitly created only for scalar outputs error means that the loss is not a scalar but a vector. You need to check whether the loss is a scalar.

It depends on the output(s) of the model and the loss function in your code, so take a look at that part!
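
For example, nn.CrossEntropyLoss(reduction='none') returns one loss value per sample, so calling backward() on it directly raises exactly that error; reducing it to a scalar first fixes it. A small self-contained sketch (the shapes are made up):

    import torch
    import torch.nn as nn

    criterion = nn.CrossEntropyLoss(reduction='none')
    logits = torch.randn(8, 10, requires_grad=True)  # batch of 8 samples, 10 classes
    labels = torch.randint(0, 10, (8,))

    per_sample = criterion(logits, labels)  # shape [8]; per_sample.backward() would raise the error
    loss = per_sample.mean()                # reduce to a scalar
    loss.backward()                         # works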

kozistr (Owner) commented Aug 21, 2022

#66 (comment)

@kozistr kozistr closed this as completed Aug 21, 2022