
In [1]:
import tensorflow as tf

#### Note: tf.nn.softmax can be baked in as the activation function of the last layer of the network. Although this makes the model output more directly interpretable, the approach is discouraged because it is impossible to provide an exact and numerically stable loss calculation for all models when using a softmax output.

In [12]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
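  tf.keras.layers.Dense(10)])  # no softmax: the model returns a vector of "logits" or "log-odds" scores, one per class

# A 10-unit logit output needs a multi-class loss that is told it receives logits.
# 'binary_crossentropy' would treat the raw logits as probabilities. Assumes integer class labels (0-9).
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), optimizer='adam', metrics=['accuracy'])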
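
To illustrate the note on numerical stability, here is a minimal plain-Python sketch (not Keras's actual implementation) that compares a cross-entropy computed from softmax probabilities with one computed directly from the logits using the log-sum-exp trick. The helper names and the example logits are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Naive softmax: exponentiate every logit, then normalize."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, label):
    """Cross-entropy computed from logits with the log-sum-exp trick."""
    m = max(logits)  # subtracting the max keeps every exponent <= 0
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# For moderate logits the two routes agree.
logits = [2.0, 1.0, 0.1]
label = 0
naive = -math.log(softmax(logits)[label])
stable = cross_entropy_from_logits(logits, label)

# For large logits the naive softmax overflows; the logit-based form does not.
big_logits = [1000.0, 0.0, -1000.0]
try:
    softmax(big_logits)
    naive_overflowed = False
except OverflowError:
    naive_overflowed = True
stable_big = cross_entropy_from_logits(big_logits, 0)
```

In Keras, a loss constructed with from_logits=True is meant to take the logit-based route, which is why the final Dense layer above has no softmax activation.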

#### Difference between loss and metrics
* The loss function is the parameter passed to Keras model.compile that is actually optimized while training the model.
* A metric is used to judge the performance of the model. It is reported during training and evaluation but does not influence the weight updates.
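
The distinction can be sketched in plain Python: two sets of predictions can score the same on a metric (accuracy) while differing in the loss (binary cross-entropy) that training would minimize. The function names and the toy predictions below are hypothetical, and this is not how Keras computes these values internally.

```python
import math

def binary_cross_entropy(y_true, y_prob):
    """Average negative log-likelihood of the true labels (the loss)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of thresholded predictions that match the label (the metric)."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]    # correct and far from the threshold
hesitant = [0.6, 0.4, 0.55, 0.45]   # correct but close to the threshold

acc_confident = accuracy(y_true, confident)
acc_hesitant = accuracy(y_true, hesitant)
loss_confident = binary_cross_entropy(y_true, confident)
loss_hesitant = binary_cross_entropy(y_true, hesitant)

print(acc_confident, acc_hesitant)    # same metric value
print(loss_confident, loss_hesitant)  # different loss values
```

Both prediction sets classify every example correctly, yet the hesitant one has the higher loss, so the optimizer would still have something to improve.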

In [None]:
### loss functions for Regression

# Last layer of the model should be a single unit with a linear (identity) activation
model.add(tf.keras.layers.Dense(1, activation='linear'))

# Mean Squared Error Loss - the usual default; under maximum likelihood it is the preferred loss when the target variable is Gaussian/Normal
model.compile(loss='mean_squared_error', metrics=['mse'])

# Mean Squared Logarithmic Error Loss - use when the target has a wide spread of values and, when predicting a large value, you do not want to punish the model as heavily as mean squared error would.
# It first takes the natural logarithm of (1 + value) for both the actual and the predicted values, then calculates the mean squared error of those logarithms
model.compile(loss='mean_squared_logarithmic_error')

# Mean Absolute Error Loss - use when the distribution of the target variable is mostly Gaussian but has outliers, e.g. large or small values far from the mean. It is more robust to outliers than mean squared error.
# It is calculated as the average of the absolute differences between the actual and predicted values
model.compile(loss='mean_absolute_error')
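
The three regression losses described above can be written out in a few lines of plain Python. This is a sketch of the formulas only, with hypothetical function names and toy data, not the Keras implementation; it assumes MSLE uses log(1 + value), as Keras documents.

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_logarithmic_error(y_true, y_pred):
    # log1p(x) = log(1 + x); requires values greater than -1
    return sum((math.log1p(t) - math.log1p(p)) ** 2
               for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Three exact predictions and one large target that is badly under-predicted.
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 10.0]

mse_value = mean_squared_error(y_true, y_pred)
msle_value = mean_squared_logarithmic_error(y_true, y_pred)
mae_value = mean_absolute_error(y_true, y_pred)

print("MSE :", mse_value)
print("MSLE:", msle_value)
print("MAE :", mae_value)
```

On this toy data the single large miss dominates MSE, counts linearly in MAE, and is damped the most by MSLE, which matches the guidance in the comments above.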

In [None]:
### loss functions for Classification

# Binary Cross-Entropy Loss - intended for binary classification where the target values are in the set {0, 1}
# Last layer of the model should be
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))  # sigmoid outputs a probability in (0, 1)
model.compile(loss='binary_crossentropy', metrics=['accuracy'])

# Hinge Loss - intended for binary classification where the target values are in the set {-1, 1}. It was primarily developed for Support Vector Machine (SVM) models.
# The hinge loss encourages predictions to have the correct sign, assigning more error when the signs of the actual and predicted class values differ.
# Last layer of the model should be
model.add(tf.keras.layers.Dense(1, activation='tanh'))  # tanh because its output range is (-1, 1)
model.compile(loss='hinge')


# Squared Hinge Loss - intended for binary classification where the target values are in the set {-1, 1}.
# If hinge loss results in better performance on a given binary classification problem, a squared hinge loss may also be appropriate.
# It smooths the surface of the error function and makes it numerically easier to work with.
# Last layer of the model should be
model.add(tf.keras.layers.Dense(1, activation='tanh'))  # tanh because its output range is (-1, 1)
model.compile(loss='squared_hinge')
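
As a rough illustration of the two hinge variants, the sketch below computes mean(max(0, 1 - y * y_hat)) and its squared form in plain Python. The function names and the toy values are hypothetical, and the targets are assumed to already be in {-1, 1}.

```python
def hinge(y_true, y_pred):
    """Mean of max(0, 1 - y * y_hat)."""
    return sum(max(0.0, 1.0 - t * p) for t, p in zip(y_true, y_pred)) / len(y_true)

def squared_hinge(y_true, y_pred):
    """Mean of max(0, 1 - y * y_hat) squared."""
    return sum(max(0.0, 1.0 - t * p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1, -1, 1, -1]
y_pred = [0.8, -0.9, -0.3, 0.5]  # the last two predictions have the wrong sign

hinge_value = hinge(y_true, y_pred)
squared_hinge_value = squared_hinge(y_true, y_pred)

# Per-example penalties: wrong-sign predictions are penalized far more.
penalties = [max(0.0, 1.0 - t * p) for t, p in zip(y_true, y_pred)]
print(penalties)
print(hinge_value, squared_hinge_value)
```

The two wrong-sign predictions contribute penalties above 1, while the correct ones contribute only the small remaining margin.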


In [None]:
### Multi-Class Classification Loss Functions

# Last layer of the model should be
model.add(tf.keras.layers.Dense(n, activation='softmax'))  # n (number of nodes) is the number of classes

# Categorical cross-entropy is the default loss for multi-class classification problems. Each class is typically assigned a unique integer value, but this loss expects the target variable to be one hot encoded before fitting the model
model.compile(loss='categorical_crossentropy')

# Sparse Multiclass Cross-Entropy Loss - one hot encoding becomes costly with many classes. For example, predicting words in a vocabulary may involve tens or hundreds of thousands of categories, one for each label, so the target of each training example would need a one hot encoded vector with tens or hundreds of thousands of zero values, requiring significant memory.
# Sparse cross-entropy addresses this by performing the same cross-entropy calculation of error on integer class labels, without requiring that the target variable be one hot encoded prior to training.
model.compile(loss='sparse_categorical_crossentropy')

# Kullback Leibler Divergence Loss
# Kullback Leibler Divergence, or KL divergence for short, is a measure of how one probability distribution differs from a baseline distribution
# It is more commonly used when a model learns to approximate a more complex function than simple multi-class classification, for example an autoencoder that learns a dense feature representation while having to reconstruct the original input.

model.compile(loss='kullback_leibler_divergence')
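
The relationship between these three losses can be sketched in plain Python for a single example. The function names and probabilities below are hypothetical, and the sketch omits the clipping Keras applies for numerical safety.

```python
import math

def categorical_cross_entropy(one_hot, probs):
    """Cross-entropy against a one hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

def sparse_categorical_cross_entropy(label, probs):
    """Same quantity, but the target is just the integer class index."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q) = sum of p * log(p / q), skipping terms where p is 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

probs = [0.7, 0.2, 0.1]   # softmax output for one example
label = 0                 # integer class label
one_hot = [1, 0, 0]       # the same label, one hot encoded

cce = categorical_cross_entropy(one_hot, probs)
sparse_cce = sparse_categorical_cross_entropy(label, probs)
kl_one_hot = kl_divergence(one_hot, probs)
kl_identical = kl_divergence([0.5, 0.5], [0.5, 0.5])

print(cce, sparse_cce, kl_one_hot, kl_identical)
```

For a one hot target the three values coincide, because the entropy of a one hot distribution is zero; KL divergence only differs from cross-entropy when the target is itself a non-degenerate distribution.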

