Using binary_crossentropy loss (Tensorflow backend) #7678

Open
FREDMINGLI opened this Issue Aug 17, 2017 · 2 comments


FREDMINGLI commented Aug 17, 2017

In the training example in the Keras documentation,

https://keras.io/getting-started/sequential-model-guide/#training

binary_crossentropy is used and a sigmoid activation is added to the network's last layer, but is it necessary to add a sigmoid in the last layer? Looking at the source code:

def binary_crossentropy(output, target, from_logits=False):
  """Binary crossentropy between an output tensor and a target tensor.
  Arguments:
      output: A tensor.
      target: A tensor with the same shape as `output`.
      from_logits: Whether `output` is expected to be a logits tensor.
          By default, we consider that `output`
          encodes a probability distribution.
  Returns:
      A tensor.
  """
  # Note: nn.softmax_cross_entropy_with_logits
  # expects logits, Keras expects probabilities.
  if not from_logits:
    # transform back to logits
    epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
    output = clip_ops.clip_by_value(output, epsilon, 1 - epsilon)
    output = math_ops.log(output / (1 - output))
  return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)

Keras invokes sigmoid_cross_entropy_with_logits from TensorFlow, but inside the sigmoid_cross_entropy_with_logits function, sigmoid(logits) is computed again.

https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits
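
For concreteness, here is a minimal sketch (assuming TF 2.x eager execution purely for readability; this code is not part of the Keras source above): the clip-and-log step is just the inverse of the sigmoid, so the sigmoid applied inside sigmoid_cross_entropy_with_logits recovers the clipped probability, and the resulting loss matches the textbook binary cross-entropy computed directly on the probabilities.

import tensorflow as tf

eps = 1e-7
target = tf.constant([1.0, 0.0, 1.0])
prob = tf.constant([0.9, 0.2, 0.6])  # what a sigmoid last layer would output

# Same transform as in the Keras backend snippet above
clipped = tf.clip_by_value(prob, eps, 1 - eps)
logit = tf.math.log(clipped / (1 - clipped))  # inverse of the sigmoid

loss_via_logits = tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=logit)
loss_direct = -(target * tf.math.log(clipped) + (1 - target) * tf.math.log(1 - clipped))

print(loss_via_logits.numpy())  # ~[0.105, 0.223, 0.511]
print(loss_direct.numpy())      # same values up to floating-point error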

So I don't think it makes sense to add a sigmoid at the end, yet seemingly all the binary/multi-label classification examples and tutorials for Keras that I found online do add a sigmoid as the last layer. Besides, I don't understand the meaning of

# Note: nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.

Why does Keras expect probabilities? Doesn't it use the nn.softmax_cross_entropy_with_logits function? Does that make sense?

Thanks.


stale bot commented Nov 15, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot added the stale label Nov 15, 2017


yesufeng commented Jan 2, 2018

The last sigmoid activation layer is there to generate the probability output, as also mentioned in the doc linked above. However, TensorFlow's nn.sigmoid_cross_entropy_with_logits expects logits as input, so to conform to that interface this function converts the probability back to a logit for the TensorFlow backend. That keeps the whole interface consistent. So yes, the last sigmoid activation layer is necessary.
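
For reference, the setup being described is essentially the one in the linked guide; a minimal sketch (the layer sizes and the random data below are only for illustration):

from keras.models import Sequential
from keras.layers import Dense
import numpy as np

# The last layer uses a sigmoid so the model outputs probabilities;
# binary_crossentropy then converts them back to logits internally
# for the TensorFlow backend, as discussed above.
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Dummy binary-classification data
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))
model.fit(data, labels, epochs=2, batch_size=32)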
