# <center> Keras </center>
## <center>1.6.1 Loss functions</center>

# Loss functions

The loss function (or objective function) is the quantity that will be minimized during training. It represents a measure of success on the task at hand.
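To make "the quantity that will be minimized" concrete, here is a minimal sketch of categorical cross-entropy for a single sample, written in plain Python. The helper name `categorical_crossentropy`, the `eps` clipping value, and the example probabilities are assumptions made for illustration. Keras computes its losses on batches of tensors, so this is not its actual implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(t * log(p)) over the classes."""
    # Clip predictions away from 0 so that log() is always defined.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One-hot target: the true class is index 2 (out of 3 classes).
y_true = [0.0, 0.0, 1.0]

good_prediction = [0.05, 0.05, 0.90]  # most probability on the true class
bad_prediction = [0.90, 0.05, 0.05]   # most probability on a wrong class

loss_good = categorical_crossentropy(y_true, good_prediction)
loss_bad = categorical_crossentropy(y_true, bad_prediction)

print(f"good prediction: {loss_good:.3f}")
print(f"bad prediction:  {loss_bad:.3f}")
```

The loss is small when the network puts most of its probability on the correct class and grows quickly when it does not, which is why driving it down during training improves the predictions.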

<img src="img/structure_loss.PNG" width="60%" />
Different kinds of problems call for different loss functions:
<img src="img/LossFunction.PNG" width = "70%" />
Some of these choices are backed by mathematical arguments. Others were only observed in practice to work best for a certain kind of problem.<br>
Other loss functions available in Keras:
 - mean_absolute_error
 - mean_absolute_percentage_error
 - mean_squared_logarithmic_error
 - squared_hinge
 - hinge
 - categorical_hinge
 - logcosh
 - sparse_categorical_crossentropy
 - kullback_leibler_divergence
 - poisson
 - cosine_proximity
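As a rough illustration of what two of these names compute, the sketch below implements `mean_absolute_error` and `hinge` for a single sample in plain Python. The function bodies follow the usual textbook definitions and the sample values are made up. Keras applies the same idea to whole batches of tensors, so treat this as an approximation of its behaviour and not as its source code.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def hinge(y_true, y_pred):
    """Average of max(1 - t * p, 0); targets are assumed to be -1 or +1."""
    return sum(max(1.0 - t * p, 0.0) for t, p in zip(y_true, y_pred)) / len(y_true)

# Regression-style example for mean absolute error (illustrative values).
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])

# Classification-style example for hinge, with labels encoded as -1 / +1.
hinge_loss = hinge([1.0, -1.0, 1.0], [0.8, -0.3, -0.5])

print(f"mean_absolute_error: {mae:.3f}")
print(f"hinge:               {hinge_loss:.3f}")
```

Note that the two losses expect different target encodings, which is one reason a loss function cannot always be swapped without also adjusting the labels.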

# Code

Let us run the code from the previous section first.

In [None]:
# Importing the MNIST dataset
from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Processing the input data
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

# Processing the output data
from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Build a network
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(units=512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(units=10, activation='softmax'))

Let us now examine the code related to compiling a network. 

In [None]:
# Compile the network
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

Here we are using `rmsprop` as the optimizer and `categorical_crossentropy` as the loss function, which fits the one-hot encoded labels and the softmax output layer. The performance metric is `accuracy`.

In [None]:
import matplotlib.pyplot as plt

def plot_training_history(history):
    """Plot accuracy and loss per epoch for the training and test data."""
    # Older Keras versions log accuracy as 'acc', newer ones as 'accuracy'.
    acc_key = 'acc' if 'acc' in history.history else 'accuracy'

    # Accuracy
    plt.plot(history.history[acc_key])
    plt.plot(history.history['val_' + acc_key])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

    # Loss
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

In [None]:
# Train the network
history = network.fit(train_images, train_labels, epochs=5, batch_size=128, 
                      verbose=1, validation_data=(test_images, test_labels))

# Plot the training results
plot_training_history(history)

# Task
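Try out four different loss functions from the list below and see which one performs best. Rebuild the network before each run so that every loss function starts from freshly initialized weights.<br>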

- mean_absolute_error
- mean_absolute_percentage_error
- mean_squared_logarithmic_error
- squared_hinge
- hinge
- categorical_hinge
- logcosh
- sparse_categorical_crossentropy
- kullback_leibler_divergence
- poisson
- cosine_proximity
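One entry on this list needs differently formatted labels: `sparse_categorical_crossentropy` expects integer class indices, not the one-hot vectors produced by `to_categorical`. A minimal sketch of the conversion with NumPy is shown below. The small `one_hot_labels` array is a made-up stand-in for `train_labels`, and the variable names are illustrative.

```python
import numpy as np

# Stand-in for one-hot labels such as those returned by to_categorical
# (three samples, three classes).
one_hot_labels = np.array([[0, 0, 1],
                           [1, 0, 0],
                           [0, 1, 0]], dtype='float32')

# The position of the 1 in each row is the integer class index.
sparse_labels = np.argmax(one_hot_labels, axis=1)

print(sparse_labels)
```

Alternatively, the call to `to_categorical` can be skipped for that run so that the labels stay in the integer form in which `mnist.load_data()` returns them.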

# Summary
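In this section we learned what a loss function is, how to choose one when compiling a Keras network, and how to compare several loss functions on the MNIST task.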

# Feedback
<a href = "http://goto/ml101_doc/Keras09">Feedback: Loss functions</a> <br>

# Navigation

<div>
<span> <h3 style="display:inline">&lt;&lt; Prev: <a href = "keras08.ipynb">Compile a network</a></h3> </span>
<span style="float: right"><h3 style="display:inline">Next: <a href = "keras10.ipynb">Optimizers</a> &gt;&gt; </h3></span>
</div>