Using binary_crossentropy loss (Tensorflow backend) #7678
In the training example in the Keras documentation, binary_crossentropy is used and a sigmoid activation is added as the network's last layer, but is it necessary to add the sigmoid in the last layer? As I found in the source code, Keras invokes sigmoid_cross_entropy_with_logits in TensorFlow, but inside sigmoid_cross_entropy_with_logits, sigmoid(logits) is computed again.
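Roughly, the backend seems to do something like the following (a simplified, self-contained sketch written against the current TF API, not the verbatim Keras source; the function name `binary_crossentropy_from_probs` is mine):

```python
import tensorflow as tf

# Sketch of the TensorFlow-backend binary_crossentropy behavior:
# the sigmoid probability coming out of the last layer is converted
# back into a logit before tf.nn.sigmoid_cross_entropy_with_logits
# is called, because that TF op applies the sigmoid internally.
def binary_crossentropy_from_probs(target, prob, epsilon=1e-7):
    # Clip away exact 0/1 so the logit inversion stays finite.
    prob = tf.clip_by_value(prob, epsilon, 1.0 - epsilon)
    # Invert the sigmoid: logit = log(p / (1 - p)).
    logits = tf.math.log(prob / (1.0 - prob))
    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target, logits=logits)
```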
So I don't think it makes sense to add a sigmoid at the end, yet all the binary/multi-label classification examples and tutorials for Keras that I found online do add a sigmoid at the end. Besides, I don't understand the meaning of the note in the source that "Keras expects probabilities".
Why does Keras expect probabilities? Doesn't it use the nn.sigmoid_cross_entropy_with_logits function? Does that make sense?
The last sigmoid activation layer is there to generate a probability output, as also mentioned in the doc above. However, TensorFlow's nn.sigmoid_cross_entropy_with_logits expects logits, so to conform to that interface this function converts the probability back into a logit for the TensorFlow backend. That keeps the whole interface consistent. So yes, the last sigmoid activation layer is necessary.
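You can check the consistency numerically with a small script (hypothetical values, TF2 eager execution assumed): inverting the sigmoid recovers the original logits, so the loss computed from probabilities matches the loss computed directly from the raw logits, up to the epsilon clipping.

```python
import numpy as np
import tensorflow as tf

# Hypothetical logits and labels, purely for illustration.
raw_logits = tf.constant([[2.0, -1.0, 0.5]])
labels = tf.constant([[1.0, 0.0, 1.0]])

# Path 1: compute the loss directly on the raw logits.
loss_direct = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels, logits=raw_logits)

# Path 2: what effectively happens in Keras -- a sigmoid last layer
# produces probabilities, and the backend inverts them back to logits.
probs = tf.sigmoid(raw_logits)
probs = tf.clip_by_value(probs, 1e-7, 1.0 - 1e-7)
recovered_logits = tf.math.log(probs / (1.0 - probs))
loss_via_probs = tf.nn.sigmoid_cross_entropy_with_logits(
    labels=labels, logits=recovered_logits)

# The two paths agree up to the epsilon clipping.
print(np.allclose(loss_direct.numpy(), loss_via_probs.numpy()))  # True
```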