# Using binary_crossentropy loss (Tensorflow backend)

opened this Issue Aug 17, 2017

### FREDMINGLI commented Aug 17, 2017

In the training example in the Keras documentation (https://keras.io/getting-started/sequential-model-guide/#training), `binary_crossentropy` is used and a sigmoid activation is added to the network's last layer. Is it actually necessary to add a sigmoid in the last layer? This is what I found in the source code:

```
def binary_crossentropy(output, target, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.

    Arguments:
        output: A tensor.
        target: A tensor with the same shape as `output`.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.

    Returns:
        A tensor.
    """
    # Note: nn.softmax_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        epsilon = _to_tensor(_EPSILON, output.dtype.base_dtype)
        output = clip_ops.clip_by_value(output, epsilon, 1 - epsilon)
        output = math_ops.log(output / (1 - output))
    return nn.sigmoid_cross_entropy_with_logits(labels=target, logits=output)
```

Keras invokes `sigmoid_cross_entropy_with_logits` in TensorFlow, but inside `sigmoid_cross_entropy_with_logits`, sigmoid(logits) is calculated again: https://www.tensorflow.org/versions/master/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits

So I don't think it makes sense to add a sigmoid at the end, but seemingly all the binary/multi-label classification examples and tutorials for Keras that I found online add a sigmoid at the end.

Besides, I don't understand the meaning of this comment:

```
# Note: nn.softmax_cross_entropy_with_logits
# expects logits, Keras expects probabilities.
```

Why does Keras expect probabilities? Doesn't it use the `nn.softmax_cross_entropy_with_logits` function? Does it make sense? Thanks.

### yesufeng commented Jan 2, 2018

The final sigmoid activation layer produces a probability output, as also mentioned in the doc above. However, TensorFlow's `nn.sigmoid_cross_entropy_with_logits` expects logits, so to conform to that interface, `binary_crossentropy` converts the probabilities back to logits for the TensorFlow backend. Thus the whole interface is consistent. So yes, with the default `from_logits=False`, the sigmoid activation in the last layer is necessary.
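
To make the round trip concrete, here is a minimal NumPy sketch of the logic in the quoted function. It is not the Keras or TensorFlow implementation: `sigmoid_cross_entropy_with_logits` below is a stand-in written from the stable formula given in the TensorFlow documentation, `EPSILON = 1e-7` is an assumed value for `_EPSILON`, and the sample arrays are made up for illustration.

```python
import numpy as np

EPSILON = 1e-7  # assumption: stands in for Keras' _EPSILON fuzz factor


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_cross_entropy_with_logits(labels, logits):
    # NumPy stand-in using the stable form from the TensorFlow docs:
    # max(x, 0) - x * z + log(1 + exp(-|x|))
    return (np.maximum(logits, 0) - logits * labels
            + np.log1p(np.exp(-np.abs(logits))))


def binary_crossentropy(output, target, from_logits=False):
    # Mirrors the structure of the quoted Keras backend function.
    if not from_logits:
        # Treat `output` as probabilities and transform back to logits.
        output = np.clip(output, EPSILON, 1 - EPSILON)
        output = np.log(output / (1 - output))
    return sigmoid_cross_entropy_with_logits(labels=target, logits=output)


# Illustrative data (hypothetical): raw scores from a last layer without activation.
logits = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
targets = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
probs = sigmoid(logits)  # what a sigmoid last layer would output

# Sigmoid in the model, default from_logits=False: the logit transform undoes it.
loss_from_probs = binary_crossentropy(probs, targets)

# No sigmoid in the model, from_logits=True: logits are passed straight through.
loss_from_logits = binary_crossentropy(logits, targets, from_logits=True)

# Textbook binary cross-entropy computed directly on the probabilities.
loss_textbook = -(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

# No sigmoid in the model but default from_logits=False: raw scores get clipped
# into (0, 1) as if they were probabilities, which distorts the loss.
loss_missing_sigmoid = binary_crossentropy(logits, targets)

print(np.allclose(loss_from_probs, loss_from_logits))
print(np.allclose(loss_from_probs, loss_textbook))
print(np.allclose(loss_missing_sigmoid, loss_from_logits))
```

In this sketch the sigmoid is not applied twice in effect: `log(p / (1 - p))` inverts the model's sigmoid before the loss applies its own. Dropping the last-layer sigmoid while leaving `from_logits=False` is the case that goes wrong, because the raw scores are clipped as though they were probabilities.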