
Dice Loss PR #1249

Open
rogertrullo opened this issue Apr 12, 2017 · 39 comments

@rogertrullo commented Apr 12, 2017

Hi, I have implemented a Dice loss function which is used in segmentation tasks, and sometimes even preferred over cross_entropy. More info in this paper:
http://campar.in.tum.de/pub/milletari2016Vnet/milletari2016Vnet.pdf
Here's a link to it:
https://github.com/rogertrullo/pytorch/blob/rogertrullo-dice_loss/torch/nn/functional.py#L708
How could I submit a PR?
thanks!

@IssamLaradji commented May 31, 2017

Is your code doing the same thing as this?

def dice_loss(input, target):
    smooth = 1.

    iflat = input.view(-1)
    tflat = target.view(-1)
    intersection = (iflat * tflat).sum()
    
    return 1 - ((2. * intersection + smooth) /
              (iflat.sum() + tflat.sum() + smooth))
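For anyone who wants to sanity-check the snippet, here it is made self-contained with a toy call; the tensor shapes and the sigmoid are illustrative assumptions, not part of the original:

```python
import torch

def dice_loss(input, target):
    smooth = 1.

    iflat = input.view(-1)
    tflat = target.view(-1)
    intersection = (iflat * tflat).sum()

    return 1 - ((2. * intersection + smooth) /
                (iflat.sum() + tflat.sum() + smooth))

# Toy binary-segmentation call: one channel of probabilities.
pred = torch.sigmoid(torch.randn(2, 1, 8, 8))   # predicted foreground probabilities
mask = (torch.rand(2, 1, 8, 8) > 0.5).float()   # ground-truth binary mask
loss = dice_loss(pred, mask)                    # scalar tensor
```

For probabilities and targets in [0, 1], the returned value stays in [0, 1), reaching 0 only on a perfect match.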

@soumith soumith added this to Uncategorized in Issue Status Aug 23, 2017

@tommy-qichang commented Sep 6, 2017

Hi @IssamLaradji
I have a few questions about the code.

  1. Is smooth similar to an eps that avoids division by zero?
  2. Like the cross-entropy loss, the result should be a positive value, so I'm wondering if this is correct:

return 1 - ((2. * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth))

Thanks

@soumith soumith added this to nn / autograd / torch in Issue Categories Sep 13, 2017

@IssamLaradji commented Oct 19, 2017

@tommy-qichang

  1. smooth does more than that. You can set smooth to zero and add an eps to the denominator to prevent division by zero. However, a larger smooth value (also known as Laplace smoothing, or additive smoothing) can help avoid overfitting. The larger the smooth value, the closer the following term is to 1 (if everything else is fixed):
((2. * intersection + smooth) /  (iflat.sum() + tflat.sum() + smooth))

This decreases the penalty obtained when 2 * intersection differs from iflat.sum() + tflat.sum(). A similar approach is commonly used in naive Bayes; see equation (119) in these notes.

  2. Yeah, that should be the case. Good catch!
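To make the smoothing effect concrete, here is a quick numeric illustration (the overlap statistics below are made-up values, not from any real model):

```python
def dice_ratio(intersection, psum, tsum, smooth):
    # The term discussed above: (2*I + smooth) / (P + T + smooth).
    return (2. * intersection + smooth) / (psum + tsum + smooth)

# Fixed overlap statistics; only smooth varies.
intersection, psum, tsum = 10., 40., 30.
print(dice_ratio(intersection, psum, tsum, smooth=0.))    # ~0.286
print(dice_ratio(intersection, psum, tsum, smooth=100.))  # ~0.706, pulled toward 1
```

With everything else fixed, increasing smooth drags the ratio toward 1, which is exactly the reduced penalty described above.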

@tommy-qichang commented Oct 19, 2017

Thanks @IssamLaradji for your reply.
As for 2, even if the cost value is negative, I think it doesn't affect the backpropagation.

@IssamLaradji commented Oct 19, 2017

That's true. It shouldn't affect the optimization.

@rogertrullo (Author) commented Oct 24, 2017

Hi @IssamLaradji, for some reason I never got around to replying. I am sorry.
The two snippets are actually doing something very similar, but there are small differences:

  1. The Dice ratio in my code follows the definition presented in the paper I mentioned (the difference is in the denominator, where you define the union as the sum, whereas I use the sum of the squares).

  2. Additionally, my code was intended for two channels in the last layer, whereas yours takes only one channel (possibly the output of a sigmoid layer).

I just tested your code and mine, and there is a difference on the order of 1e-3. I am not really sure why; I think it is related to the fact that I compute the Dice independently for each element in the batch and then divide by the batch size, but I'm not certain.

@varghesealex90 commented Oct 27, 2017

@IssamLaradji

  1. Does your implementation need the target and predictions to be one-hot encoded? I think it does not.

  2. Let's say I have 3 classes in an image; is it possible to get the Dice score associated with each class? If so, one could assign more weight to under-represented classes.

Regards

Varghese

@rogertrullo (Author) commented Oct 27, 2017

@varghesealex90 read my previous post; as I mentioned, @IssamLaradji's code takes as input only one channel representing the probability of a pixel being foreground, so it is for binary problems.

@varghesealex90 commented Oct 27, 2017

I was browsing through your code. I have a multi-class problem; however, my labels are one-hot encoded. Is there a quick and neat way of doing this in PyTorch?

@IssamLaradji commented Oct 27, 2017

@varghesealex90 A naive quick way is to apply the dice loss on each channel with a different weight. Note that this does not tie the classes together in the output layer, since it treats each class independently as a binary problem; but earlier layers would still try to learn features that differentiate between the classes.

Here is an inefficient way of doing it:

def dice_loss(input, target):
    smooth = 1.
    loss = 0.
    for c in range(n_classes):
        iflat = input[:, c].view(-1)
        tflat = target[:, c].view(-1)
        intersection = (iflat * tflat).sum()

        w = class_weights[c]
        loss += w * (1 - ((2. * intersection + smooth) /
                          (iflat.sum() + tflat.sum() + smooth)))
    return loss

Here class_weights is a list containing the weight for each class, and input and target are shaped (n_batches, n_classes, height, width); target is assumed to be one-hot encoded.

With proper vectorization, you can make this run much faster.
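One possible vectorized version, assuming the same (n_batches, n_classes, height, width) layout and class_weights list as above (a sketch for reasonably recent PyTorch, where sum accepts a tuple of dims; dice_loss_vectorized is a hypothetical name):

```python
import torch

def dice_loss_vectorized(input, target, class_weights, smooth=1.):
    # input, target: (N, C, H, W); target is one-hot encoded.
    intersection = (input * target).sum(dim=(0, 2, 3))            # (C,)
    denominator = input.sum(dim=(0, 2, 3)) + target.sum(dim=(0, 2, 3))
    dice = (2. * intersection + smooth) / (denominator + smooth)  # (C,)
    w = torch.as_tensor(class_weights, dtype=input.dtype)
    return (w * (1. - dice)).sum()
```

This reduces over the batch and spatial dimensions per class in one shot, matching the per-class loop above without the Python-level iteration.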

@rogertrullo (Author) commented Oct 27, 2017

@varghesealex90 here's a way to convert a Tensor to one hot:
y_onehot.scatter_(1, GT.view(GT.size(0), 1, GT.size(1), GT.size(2)), 1)

This assumes that GT is a tensor of shape (N, H, W) and y_onehot is a tensor of shape (N, C, H, W), where C is the number of classes.

@varghesealex90 commented Oct 27, 2017

@rogertrullo

I think there is a bug in the one-hot encoding code you shared here.

batch_size = 3
gt = torch.ones(batch_size, 4, 4).long()  # N, H, W

# make dummy 3 classes
gt[0] = 0
gt[2] = 2

y_one_hot = torch.FloatTensor(batch_size, 4, 4)

y_one_hot.scatter_(1, gt.view(gt.size(0), 1, gt.size(1), gt.size(2)), 1)

error:

File "", line 1, in
RuntimeError: invalid argument 3: Index tensor must have same dimensions as output tensor at /Users/soumith/code/builder/wheel/pytorch-src/torch/lib/TH/generic/THTensorMath.c:50

I know I am missing something here. Any ideas?

@rogertrullo (Author) commented Oct 27, 2017

@varghesealex90 the error is in y_one_hot = torch.FloatTensor(batch_size, 4, 4); it should be y_one_hot = torch.FloatTensor(batch_size, 3, 4, 4). Notice the 3, indicating the number of classes.
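Putting the fix together, here is a complete runnable sketch of the scatter_-based one-hot encoding with the toy sizes from above:

```python
import torch

batch_size, n_classes, H, W = 3, 3, 4, 4
gt = torch.ones(batch_size, H, W).long()  # N, H, W label map

# make dummy 3 classes
gt[0] = 0
gt[2] = 2

# The class dimension must be present and sized n_classes.
y_one_hot = torch.zeros(batch_size, n_classes, H, W)
y_one_hot.scatter_(1, gt.view(batch_size, 1, H, W), 1)
```

Each pixel now has exactly one channel set to 1, the channel matching its label.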

@varghesealex90 commented Oct 27, 2017

@rogertrullo I am using your Dice loss function; however, I am getting the following error:
num=torch.sum(num,dim=3)#####b,c
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 476, in sum
return Sum.apply(self, dim, keepdim)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/reduce.py", line 21, in forward
return input.sum(dim)
RuntimeError: dimension out of range (expected to be in range of [-3, 2], but got 3)

Dimension of input and target are 10,3,240,240
They have been converted to cuda float

@varghesealex90 commented Oct 27, 2017

the size of input, target and num are
(10L, 3L, 240L, 240L)
(10L, 3L, 240L, 240L)
(10L, 3L, 240L, 240L)
which seems correct

@varghesealex90 commented Oct 27, 2017

@rogertrullo Is the fix to subtract 1 from each dim?

num=torch.sum(num,dim=2)
num=torch.sum(num,dim=3)#b,c

den1=probs*probs#--p^2
den1=torch.sum(den1,dim=2)
den1=torch.sum(den1,dim=3)#b,c,1,1

den2=target*target#--g^2
den2=torch.sum(den2,dim=2)
den2=torch.sum(den2,dim=3)#b,c,1,1

The fix?

num=torch.sum(num,dim=1)
num=torch.sum(num,dim=2)#b,c

den1=probs*probs#--p^2
den1=torch.sum(den1,dim=1)
den1=torch.sum(den1,dim=2)#b,c,1,1

den2=target*target#--g^2
den2=torch.sum(den2,dim=1)
den2=torch.sum(den2,dim=2)#b,c,1,1

@rogertrullo (Author) commented Oct 27, 2017

@varghesealex90 the problem is that you are summing across channels from the beginning, so I am not sure whether that is right (nor whether it is wrong). But it should work without modification. Did you print the size of num before num=torch.sum(num,dim=3)?

@varghesealex90 commented Oct 27, 2017

('input', (10L, 3L, 240L, 240L))
('target', (10L, 3L, 240L, 240L))
('num', (10L, 3L, 240L, 240L))

File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 476, in sum
return Sum.apply(self, dim, keepdim)
File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/reduce.py", line 21, in forward
return input.sum(dim)
RuntimeError: dimension out of range (expected to be in range of [-3, 2], but got 3)

PS: I commented out these lines in the code

from . import _functions

from .modules import utils

from ._functions.padding import ConstantPad2d

from .modules.utils import _single, _pair, _triple

@rogertrullo (Author) commented Oct 27, 2017

@varghesealex90 sorry, I don't see why it would fail. It is weird, because num has 4 dimensions, so it should be fine to sum across the 4th one; it should produce a tensor of shape (10, 3, 240, 1).

@varghesealex90 commented Oct 27, 2017

@rogertrullo I think I got it

a = torch.ones(2, 3, 4, 4)
print(a.size())  # (2L, 3L, 4L, 4L)
b = torch.sum(a, dim=2)
print(b.size())  # (2L, 3L, 4L)
c = torch.sum(b, dim=3)  # error!

The error pops up because there is no 4th dimension any more. I think we should keep dim=2 throughout. Your thoughts?
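For what it's worth, newer PyTorch versions drop the reduced dimension by default; passing keepdim=True restores the singleton dimension the original code relied on (a sketch):

```python
import torch

a = torch.ones(2, 3, 4, 4)
b = torch.sum(a, dim=2, keepdim=True)  # shape (2, 3, 1, 4)
c = torch.sum(b, dim=3, keepdim=True)  # shape (2, 3, 1, 1), no error
```

With keepdim=True the original dim=2 then dim=3 order works unchanged on either PyTorch version.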

@rogertrullo (Author) commented Oct 27, 2017

@varghesealex90 oh, on my computer it keeps a singleton dimension. In that case, I guess you can swap the lines that sum over the 3rd and the 2nd dimensions, so that we first sum over the 3rd dimension and then the 2nd (currently the 2nd comes first and then the 3rd).

@varghesealex90 commented Oct 27, 2017

Yes. That would be perfect.

@trypag commented Nov 12, 2017

@rogertrullo Hi, I am having issues optimizing the Dice loss you specified; the loss term should be something like loss = criterion(output, target_var) + (1 - dice_loss(output, target_var)). That formulation is not possible because I noticed the upper bound of the function I implemented is not 1, as opposed to the real Dice similarity. I rewrote the dice loss so that I can filter out the unwanted label -1; I might have introduced a bug while rewriting the code.

def dice_loss(output, target, weights=1, ignore_index=None):
    output = output.exp()
    encoded_target = output.data.clone().zero_()
    if ignore_index is not None:
        # mask of invalid label
        mask = target == ignore_index
        # clone target to not affect the variable ?
        filtered_target = target.clone()
        # replace invalid label with whatever legal index value
        filtered_target[mask] = 0
        # one hot encoding
        encoded_target.scatter_(1, filtered_target.unsqueeze(1), 1)
        # expand the mask for the encoded target array
        mask = mask.unsqueeze(1).expand(output.data.size())
        # apply 0 to masked pixels
        encoded_target[mask] = 0
    else:
        encoded_target.scatter_(1, target.unsqueeze(1), 1)
    encoded_target = Variable(encoded_target)

    assert output.size() == encoded_target.size(), "Input sizes must be equal."
    assert output.dim() == 4, "Input must be a 4D Tensor."

    numerator = (output * encoded_target).sum(dim=3).sum(dim=2)
    denominator = output.pow(2) + encoded_target
    if ignore_index is not None:
        # exclude masked values from den1
        denominator[mask] = 0

    dice = 2 * (numerator / denominator.sum(dim=3).sum(dim=2)) * weights
    return dice.sum() / dice.size(0)

Would you mind giving me your point of view on this code and on the formulation of your loss, please?
Thanks

@rogertrullo (Author) commented Nov 12, 2017

Hi @trypag, there is no need to compute 1 - dice_val; instead I just multiply dice_val by -1. We do that since we want to maximize the Dice, and optimizers try to minimize the function. I haven't checked your code (I will do it later), but you probably want to multiply by -1.

@trypag commented Nov 13, 2017

@rogertrullo thanks for answering, I have a few questions:

  • Do you agree this loss is not bounded between 0 and 1? I noticed it can go higher than 1.
  • I understand that min(-dice) = max(dice), but in this configuration it seems strange: for example, if you only optimize with the dice loss, you first start with a negative loss. I have no idea whether it will get to 0 eventually; at least for me, optimizing only with the dice does not converge.
  • With the cross_entropy loss, having loss = ce(output, target) - dice(output, target), we might have a negative loss at some point as well.

Thanks :)

@rogertrullo (Author) commented Nov 16, 2017

@trypag, the way I did it, the loss should actually start with a value close to 0 and then decrease to a negative value. The "more negative", the better. It doesn't matter that the loss is negative; I have already trained systems with this loss and they work quite well.
How many classes do you have?

@trypag commented Nov 16, 2017

Alright, I have 134 classes.

@rogertrullo (Author) commented Nov 16, 2017

@trypag, then the loss should go below -1. Basically, what I do is add up the individual Dice scores, so the perfect score would be -134. If you want it to be between 0 and -1, you should divide it by the number of classes.

@faustomilletari commented Apr 28, 2018

Regarding the squares in the denominator: have a look at the proof (somewhere in) here: https://mediatum.ub.tum.de/doc/1395260/1395260.pdf

@PeterXiaoGuo commented Jun 21, 2018

@IssamLaradji Hi, I read your post and I still have a question: what are the expected shapes of input and target?

I tried feeding it tensors of shape (batch size, channel = 1, width, height) and found that dice_loss can be larger than 1. Should I set batch_size = 1 for each dice loss calculation?

Besides, when I calculate the dice loss, should I divide it by 2, since @rogertrullo mentioned dividing by the number of classes?

I use the prostate3T dataset; some of the labels contain {0, 1, 2} values, while most labels contain only {0, 1}.

Thank you very much!

Peter

@JingLi-0131 commented Oct 24, 2018

As for the dice loss, do I have to write a backward() function to compute the gradient myself?
I looked at the dice loss in V-Net, https://github.com/mattmacy/vnet.pytorch, and it includes a backward() function written from scratch.

@trypag commented Oct 24, 2018

No, you don't, since autograd will backprop everything for you.

@CodeR57 commented Jan 2, 2019

Hey! I wanted to use the dice loss for training my network, but I can't find it in torch.nn.loss. Is the dice loss commit merged yet? How can I use it?

@StuckinPhD commented Jan 2, 2019

> Hey! I wanted to use the dice loss for training my network, but I can't find it in torch.nn.loss. Is the dice loss commit merged yet? How can I use it?

Dice loss has not been merged yet, but you can use it as given above. I've tried the code and it works. (However, for binary segmentation I seem to be getting better results with BCE.) Make sure to change your label tensor so that each channel represents a class: even for a binary segmentation problem, the network output should be 2 channels and the label tensor should be 2 channels, one for background and one for foreground.

@CodeR57 commented Jan 3, 2019

@farazkhan86 Thanks a lot for the reply. Which of the above options did you use? My use case is in multi-class (3 classes) segmentation.

@StuckinPhD commented Jan 7, 2019

@CodeR57

I used the one provided by @rogertrullo, but the one given by @IssamLaradji is also fine; both seem to give similar results.

You have to arrange your target tensor so that each channel represents a class. I'm using the following function to expand my target classes into channels:

def _expand_target(input, C, device):
    """
    Converts an NxHxW label image to NxCxHxW, where each label is stored in a separate channel
    :param input: 3D input image (NxHxW)
    :param C: number of channels/labels
    :param device: device to place the result on
    :return: 4D output image (NxCxHxW)
    """
    assert input.dim() == 3
    shape = list(input.size())
    shape.insert(1, C)

    result = torch.zeros(tuple(shape))
    # for each batch instance
    for i in range(input.size()[0]):
        # iterate over the channel axis and create the corresponding binary mask in the target
        for c in range(C):
            mask = result[i, c]
            mask[input[i] == c] = 1
    return result.to(device)
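As an aside, on newer PyTorch versions (1.1+, if I recall correctly) the double loop can be replaced by torch.nn.functional.one_hot plus a permute; a sketch (expand_target_fast is a hypothetical name, and the permute assumes NxHxW input):

```python
import torch
import torch.nn.functional as F

def expand_target_fast(input, C):
    # input: (N, H, W) long tensor of labels -> (N, C, H, W) float one-hot
    return F.one_hot(input, num_classes=C).permute(0, 3, 1, 2).float()
```

F.one_hot appends the class dimension last, so the permute moves it into the channel position expected by the losses above.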

All the best

@JingLiRaysightmed commented Jan 10, 2019

I defined the dice loss function shown below, but it doesn't work.

def dice_loss(y_pred, y):  
    # y_pred.shape = torch.Size([1, 2, 128, 128, 128]) 
    # y.shape = torch.Size([1, 128, 128, 128])
    smooth = 1.
    print(y_pred.grad_fn)  # <CudnnConvolutionBackward object at 0x7f5ab3afdc50>
    y_pred = F.softmax(y_pred, dim=1)
    print(y_pred.grad_fn)  # <SoftmaxBackward object at 0x7f5ab3afdc50>
    y_pred = torch.argmax(y_pred, dim=1)
    print(y_pred.grad_fn)  # None

    iflat = y_pred.view(-1)
    tflat = y.view(-1)
    intersection = (iflat * tflat).sum()

    loss = 1 - ((2. * intersection + smooth) / (iflat.sum() + tflat.sum() + smooth))

    return loss

The error is

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I don't know why grad_fn changed. And how can I make the dice loss work?
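For anyone hitting the same error: torch.argmax is not differentiable, which is why grad_fn becomes None after that line. One common workaround (a sketch, not the only option; soft_dice_loss is a hypothetical name) is to keep the soft softmax probabilities and compare them against a one-hot target:

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(y_pred, y, smooth=1.):
    # y_pred: (N, C, H, W) raw network outputs; y: (N, H, W) integer labels
    probs = F.softmax(y_pred, dim=1)                     # soft and differentiable
    y_onehot = F.one_hot(y, num_classes=y_pred.size(1))  # (N, H, W, C)
    y_onehot = y_onehot.permute(0, 3, 1, 2).float()      # (N, C, H, W)

    iflat = probs.reshape(-1)
    tflat = y_onehot.reshape(-1)
    intersection = (iflat * tflat).sum()

    return 1 - ((2. * intersection + smooth) /
                (iflat.sum() + tflat.sum() + smooth))
```

Since the argmax is dropped, the loss keeps a grad_fn and backpropagates through the softmax.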

@emergencyd commented Sep 8, 2019

> loss = ce(output, target) - dice(output, target)

I thought the dice loss could be used directly, like loss = -dice(output, target), rather than loss = ce(output, target) - dice(output, target).

I am quite curious about it... which one is right? @rogertrullo

@rogertrullo (Author) commented Sep 8, 2019

@emergencyd -dice... should be fine; that was just a test trying to combine the regular cross-entropy with the dice loss.
