
Dice score function #3611

Closed
hadim opened this issue Aug 28, 2016 · 18 comments

@hadim commented Aug 28, 2016

I am using the following score function:

from keras import backend as K

def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

# ...
model.compile(optimizer=optimizer, loss=dice_coef_loss, metrics=[dice_coef])
# ...
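
For reference, this is the standard Dice coefficient 2|X ∩ Y| / (|X| + |Y|) computed on the flattened masks, with a smoothing term in both numerator and denominator so the score stays defined (and equals 1) when both masks are empty:

dice = (2 * sum(y_true * y_pred) + smooth) / (sum(y_true) + sum(y_pred) + smooth)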

It works pretty well for me when training a fully convolutional DCNN to segment images.

Would you be interested in a PR to implement this in Keras?

Note that the original implementation comes from the Kaggle post https://www.kaggle.com/c/ultrasound-nerve-segmentation/forums/t/21358/0-57-deep-learning-keras-tutorial

@alexander-rakhlin (Contributor) commented Aug 29, 2016

I suggest averaging across the batch axis (dimension 0):

def dice_coef(y_true, y_pred, smooth=1):
    intersection = K.sum(y_true * y_pred, axis=[1,2,3])
    union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
    return K.mean( (2. * intersection + smooth) / (union + smooth), axis=0)
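
A quick sanity check along those lines, assuming 4D tensors of shape (batch, height, width, channels) and the Keras backend imported as K (the masks below are made up purely for illustration):

import numpy as np
from keras import backend as K

# two identical all-foreground 4x4 single-channel masks: a perfect match should give ~1.0
y_true = K.variable(np.ones((2, 4, 4, 1)))
y_pred = K.variable(np.ones((2, 4, 4, 1)))
print(K.eval(dice_coef(y_true, y_pred)))  # ~1.0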

@wassname (Contributor)

Don't you think it should be?

def dice_coef_loss(y_true, y_pred):
    return 1-dice_coef(y_true, y_pred)

With your code a correct prediction gets -1 and a wrong one gets -0.25; I think this is the opposite of what a loss function should be.

# not matched
dice_coef_loss(
    K.theano.shared(np.array([[0,0,0]])),
    K.theano.shared(np.array([[1,1,1]]))
).eval() # -0.25
# match
dice_coef_loss(
    K.theano.shared(np.array([[0,0,0]])),
    K.theano.shared(np.array([[0,0,0]]))
).eval() # -1.0

Here's a suggestion that uses vector operations and averages across the batch axis.

def dice_coef(y_true, y_pred, smooth=1):
    intersection = K.dot(y_true, K.transpose(y_pred))
    union = K.dot(y_true, K.transpose(y_true)) + K.dot(y_pred, K.transpose(y_pred))
    return (2. * intersection + smooth) / (union + smooth)

def dice_coef_loss(y_true, y_pred):
    return K.mean(1 - dice_coef(y_true, y_pred), axis=-1)

# test
dice_coef_loss(
    K.theano.shared(np.array([[0,0,0],[0,0,0]])),
    K.theano.shared(np.array([[1,1,1],[1,1,1]]))
).eval() 
# array([ 0.99999997,  0.99999997])

dice_coef_loss(
    K.theano.shared(np.array([[0,0,0],[0,0,0]])),
    K.theano.shared(np.array([[0,0,0],[0,0,0]]))
).eval() # array([ 0.,  0.])

@cicobalico

Hi @wassname, could you clarify your statement?

> With your code a correct prediction gets -1 and a wrong one gets -0.25; I think this is the opposite of what a loss function should be.

I'm quite new to ML, but isn't a loss function supposed to output a lower value for a correct prediction and a higher value for a wrong one? Isn't that exactly what @hadim's version of the function is doing?

@wassname (Contributor) commented Sep 12, 2016

@cicobalico yeah sure

EDIT: I was wrong about that, sorry.

When I used the OP's loss function my CNN converged on the exact opposite answer and produced an inverse mask instead of a mask. That makes sense if it was working backwards towards -0.25.

(It's not just hadim; it's written that way in [a](https://github.com/jocicmarko/ultrasound-nerve-segmentation/blob/master/train.py#L26) [couple](https://github.com/EdwardTyantov/ultrasound-nerve-segmentation/blob/master/metric.py#L19) of repos, which makes me think I'm missing something.)

@alexander-rakhlin (Contributor) commented Sep 12, 2016

Using 1-dice_coef or -dice_coef makes no difference for convergence; I have used both. 1-dice_coef is just more convenient for monitoring, as its value belongs to [0, 1] rather than [-1, 0].
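
Since 1-dice_coef and -dice_coef differ only by an additive constant, their gradients with respect to the weights are identical,

d(1 - dice_coef)/dw = d(-dice_coef)/dw

so the optimizer takes exactly the same steps either way; only the monitored value shifts from [-1, 0] to [0, 1].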

@wassname (Contributor)

But for -dice_coef it converges on y_pred != y_true, doesn't it? I gave specific examples above.

I think the ranges [0,1] and [0,-1] would be interchangeable but not [0,1] and [-1,0] as in this case.

@alexander-rakhlin (Contributor)

It shouldn't. Backpropagation drives the loss as low as it can, which is -1 in the case of the -dice_coef loss.

@wassname (Contributor)

Ah that makes sense then, thanks for clarifying that!


@karhunenloeve

Hello everybody,
I need to use the dice coefficient for some computation on biomedical image data. My question is, shouldn't there be a K.abs() expression?
Aren't intersection and union only valid measures for absolute values?

Thanks for answering in advance!

@JadBatmobile

If you are using the Dice coefficient as a loss, should you not specify the derivative of the Dice coefficient w.r.t. the output layer so that backpropagation can work?

@jizhang02

Hi,
I use dice loss in U-Net, but the predicted images are all white.
Could someone explain that?

@ankurshukla03

> Hi,
> I use dice loss in U-Net, but the predicted images are all white.
> Could someone explain that?

I suppose white means it is considering all the images as foreground. Can you post more about how you made the training set, and is it binary-level segmentation or multi-label segmentation?

@jizhang02

> Hi,
> I use dice loss in U-Net, but the predicted images are all white.
> Could someone explain that?

> I suppose white means it is considering all the images as foreground. Can you post more about how you made the training set, and is it binary-level segmentation or multi-label segmentation?

I use the code from the first comment:

def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

# ...
model.compile(optimizer=optimizer, loss=dice_coef_loss, metrics=[dice_coef])
# ...

Yes, it is binary-level segmentation. I use a U-Net network based on Keras.
Because the output of the last layer is the probability value of a sigmoid function: if the probability is > 0.5 the color is 1, and if the probability is < 0.5 the color is 0. I watched the output, and most of the probabilities are 0.49xxxxx or 0.50xxxxxx, so the predicted image becomes all white or all black.
I think the ideal two (binary) outputs should be 0.9xxxx and 0.1xxxx, that is to say, they should be very close to 1.0 and 0.0.
Do you have some pointers for getting the output probability close to the ground truth value? Thank you.

@tinalegre commented May 14, 2019

> I suggest averaging across the batch axis (dimension 0):
>
>     def dice_coef(y_true, y_pred, smooth=1):
>         intersection = K.sum(y_true * y_pred, axis=[1,2,3])
>         union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
>         return K.mean( (2. * intersection + smooth) / (union + smooth), axis=0)

@alexander-rakhlin I've seen that some implementations of the Dice coefficient use smooth=1; where does this value come from? From what I understand, this value is used to avoid division by zero, so why not use a very small value close to zero (e.g. smooth=1e-9)? In addition, by suggesting axis=[1,2,3], I guess you're assuming a 4D TensorFlow tensor of size (Batch, Height, Width, Channels), right?

@alexander-rakhlin (Contributor)

@tinalegre this was 3 years ago and I can't remember where this smooth=1 comes from; 1e-7 is a better idea. Yes, we are speaking of a 4D tensor. At that time I was using Theano's (Batch, Channels, Height, Width) ordering, but this makes no difference.
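
A minimal sketch along those lines, assuming the Keras backend is imported as K (K.epsilon() defaults to about 1e-7):

def dice_coef(y_true, y_pred, smooth=K.epsilon()):
    # per-image Dice on 4D tensors of shape (batch, height, width, channels)
    intersection = K.sum(y_true * y_pred, axis=[1, 2, 3])
    union = K.sum(y_true, axis=[1, 2, 3]) + K.sum(y_pred, axis=[1, 2, 3])
    # smooth only needs to be large enough to avoid 0/0 when both masks are empty
    return K.mean((2. * intersection + smooth) / (union + smooth), axis=0)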

@tinalegre

@alexander-rakhlin thank you! You mean it doesn't make any difference because, for both the channels_first (batch, channels, height, width) and channels_last (batch, height, width, channels) representations, the batch dimension is at axis=0, and thus return K.mean(iou, axis=0) would work for both, right?

@Tombery1 commented Jun 2, 2022

Please, what is the correct implementation of the Dice coefficient?

def dice_coef1(y_true, y_pred, smooth=1):
  intersection = K.sum(y_true * y_pred, axis=[1,2,3])
  union = K.sum(y_true, axis=[1,2,3]) + K.sum(y_pred, axis=[1,2,3])
  dice = K.mean((2. * intersection + smooth)/(union + smooth), axis=0)
  return dice

Gives me the following result: 0.85

or

def dice_coef2(target, prediction, smooth=1):
    numerator = 2.0 * K.sum(target * prediction) + smooth
    denominator = K.sum(target) + K.sum(prediction) + smooth
    coef = numerator / denominator

    return coef

Gives me the following result: 0.94
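
For what it's worth, the two are not equivalent, so different numbers are expected: dice_coef1 computes a Dice score per image and averages over the batch, while dice_coef2 pools all pixels of the batch into one global score. A small check with made-up masks (assuming the Keras backend imported as K) shows the difference:

import numpy as np
from keras import backend as K

# batch of two 2x2 single-channel masks: one perfect match, one partial match
y_true = K.variable(np.array([[[[1.], [1.]], [[1.], [1.]]],
                              [[[1.], [1.]], [[0.], [0.]]]]))
y_pred = K.variable(np.ones((2, 2, 2, 1)))

print(K.eval(dice_coef1(y_true, y_pred)))  # per-image Dice, then averaged over the batch
print(K.eval(dice_coef2(y_true, y_pred)))  # one global Dice over all pixels pooled together
# the two values differ because larger masks dominate the pooled version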
