# Mean squared error loss (MSE)
The mean squared error (MSE), also called the mean squared deviation (MSD), of an estimator measures the average of the squared errors, i.e. the average squared difference between the estimated values and the true values. It is a risk function, corresponding to the expected value of the squared error loss.

**It is always non-negative, and values closer to zero are better.**

The equation of the function is 


![MSE](https://miro.medium.com/max/808/1*-e1QGatrODWpJkEwqP4Jyg.png)

In [14]:
from sklearn.metrics import mean_squared_error 

# Given values 
Y_true = [1,1,2,2,4] # Y_true = Y (original values) 

# calculated values 
Y_pred = [0.6,1.29,1.99,2.69,3.4] # Y_pred = Y' 

# Calculation of Mean Squared Error (MSE) 
mean_squared_error(Y_true,Y_pred) 


0.21606

In [20]:
#Manual Calculation of the MSE for above code
(1/5)*((-0.4*-0.4)+(0.29*0.29)+(-0.01*-0.01)+(0.69*0.69)+(-0.6*-0.6))

0.21605999999999997
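
The manual calculation above can also be written as a small NumPy function. This is only a sketch of the definition (mean of the squared residuals); the helper name `mse` is introduced here for illustration and is not part of scikit-learn.

```python
import numpy as np

# Same data as in the cells above.
y_true = np.array([1, 1, 2, 2, 4], dtype=float)
y_pred = np.array([0.6, 1.29, 1.99, 2.69, 3.4])

def mse(y_true, y_pred):
    """Mean of the squared residuals (illustrative helper)."""
    residuals = y_true - y_pred
    return np.mean(residuals ** 2)

print(round(mse(y_true, y_pred), 5))  # 0.21606
```

Because the residuals are squared, a single large error contributes far more to the result than several small ones.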

# Mean Absolute error (MAE)

It is the average of the absolute differences between the predictions and the actual observations.

The equation of the function is 

![MAE](https://miro.medium.com/max/780/1*fYNhlncTwLYqUl-_H6YElA.png)

In [22]:
from sklearn.metrics import mean_absolute_error 

# Given values 
Y_true = [1,1,2,2,4] # Y_true = Y (original values) 

# calculated values 
Y_pred = [0.6,1.29,1.99,2.69,3.4] # Y_pred = Y' 

# Calculation of mean absolute error (MAE) 
mean_absolute_error(Y_true,Y_pred) 



0.398

In [26]:
#Manual Calculation
(1/5)*(0.4+0.29+0.01+0.69+0.6)

0.39799999999999996
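
As with MSE, the manual calculation can be sketched as a short NumPy function. The helper name `mae` is an assumption made for this example; it simply averages the absolute residuals.

```python
import numpy as np

# Same data as in the cells above.
y_true = np.array([1, 1, 2, 2, 4], dtype=float)
y_pred = np.array([0.6, 1.29, 1.99, 2.69, 3.4])

def mae(y_true, y_pred):
    """Mean of the absolute residuals (illustrative helper)."""
    return np.mean(np.abs(y_true - y_pred))

print(round(mae(y_true, y_pred), 3))  # 0.398
```

Unlike MSE, each error contributes in proportion to its size, so MAE is expressed in the same units as the target.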

# Mean squared logarithmic error (MSLE)
Use MSLE for regression when you believe that your target, conditioned on the input, is normally distributed and, because the range of the target values is large, you don't want large errors to be penalized significantly more than small ones.

Example: You want to predict future house prices, and your dataset includes homes that are orders of magnitude different in price. The price is a continuous value, and therefore, we want to do regression. MSLE can here be used as the loss function.

The expression is as follows:

![MSLE](https://raw.githubusercontent.com/imsajeev/deep-learning-using-python/master/MSLE.jpg)


In [36]:
from sklearn.metrics import mean_squared_log_error 

# Given values 
Y_true = [1,1,2,2,4] # Y_true = Y (original values) 

# calculated values 
Y_pred = [0.6,1.29,1.99,2.69,3.4] # Y_pred = Y' 

# Calculation of mean_squared_log_error  
mean_squared_log_error(Y_true,Y_pred) 


0.025466969135005298
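
The value above can be reproduced by hand. The sketch below assumes the `log(1 + y)` form of MSLE, which is how scikit-learn documents `mean_squared_log_error`; the helper name `msle` and the price figures in the second part are made up for illustration.

```python
import numpy as np

def msle(y_true, y_pred):
    """Mean squared difference of log(1 + y) values (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

y_true = [1, 1, 2, 2, 4]
y_pred = [0.6, 1.29, 1.99, 2.69, 3.4]
print(msle(y_true, y_pred))  # approximately 0.025467

# Hypothetical prices: a 10% overestimate at two very different scales.
small = msle([100], [110])
large = msle([100_000], [110_000])
print(small, large)  # both close to 0.009
```

The squared errors for the two hypothetical prices differ by a factor of a million, while the MSLE values are nearly identical. This is why MSLE suits targets that span several orders of magnitude.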

# Mean absolute percentage error (MAPE)
The mean absolute percentage error (MAPE), also known as the mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics, for example in trend estimation. It is also used as a loss function for regression problems in machine learning. It usually expresses the accuracy as a ratio defined by the formula:

![MAPE](https://i.imgur.com/OBBvmIH.jpg)

In [48]:
#from sklearn.metrics import mean_absolute_percentage_error
import numpy as np

# Given values 
Y_true = [1,1,2,2,4] # Y_true = Y (original values) 

# calculated values 
Y_pred = [0.6,1.29,1.99,2.69,3.4] # Y_pred = Y' 

# Calculation of mean absolute percentage error (MAPE)
#mean_absolute_percentage_error(Y_true,Y_pred) 
def mean_absolute_percentage_error(y_true, y_pred): 
  y_true, y_pred = np.array(y_true), np.array(y_pred)
  # np.mean already divides by the number of samples, so only scale by 100
  mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
  return mape


MAPE=mean_absolute_percentage_error(Y_true,Y_pred)
print(round(MAPE, 2))

23.8
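
A short sketch of why MAPE is convenient for comparing forecasts across scales: it averages relative errors, so rescaling the data leaves it unchanged. The helper name `mape_percent` and the price figures are hypothetical, chosen only for this illustration, and the definition assumed is the mean of `|y - y'| / |y|` multiplied by 100.

```python
import numpy as np

def mape_percent(y_true, y_pred):
    """Mean absolute relative error, in percent (illustrative helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Hypothetical prices: errors of 10%, 10% and 0%.
prices_true = np.array([100.0, 200.0, 400.0])
prices_pred = np.array([110.0, 180.0, 400.0])

print(round(mape_percent(prices_true, prices_pred), 2))                # 6.67
print(round(mape_percent(prices_true * 1000, prices_pred * 1000), 2))  # 6.67
```

Note that the definition divides by the true values, so it is undefined whenever a true value is zero.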

# Classification Loss
## 1. Hinge Loss
Hinge loss, also known in its multiclass form as multi-class SVM loss, is used for maximum-margin classification, most prominently in support vector machines. It is a convex function, so it can be minimized with convex optimizers.

Hinge loss is a commonly used loss function when the network must be
optimized for a hard classification. For example, 0 = no fraud and 1 = fraud,
which by convention is called a 0-1 classifier. The 0, 1 choice is somewhat
arbitrary, and -1, 1 is also seen in lieu of 0-1; the standard hinge loss
formula, max(0, 1 - y * f(x)), assumes the -1, 1 encoding. Hinge loss is also
seen in a class of models called maximum-margin classification models (e.g.,
support vector machines, a somewhat distant cousin to neural networks).

![Hinge loss](https://raw.githubusercontent.com/imsajeev/deep-learning-using-python/master/hingeloss.jpg)

## Hinge Loss for two-class prediction

In [54]:
#Create a Sample SVM Model
from sklearn import svm
from sklearn.metrics import hinge_loss
X = [[0], [1],[2],[-5]]
y = [-1, 1,1,-1]
est = svm.LinearSVC(random_state=0)
est.fit(X, y)
#LinearSVC(random_state=0)
pred_decision = est.decision_function([[-2], [3], [0.5]])
pred_decision


array([-2.18181059,  2.36363311,  0.09091126])

In [55]:
#finding Hinge loss
hinge_loss([-1, 1, 1], pred_decision)


0.3030295801509197
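
The same number can be recovered by hand from the decision values printed above. The sketch below assumes the standard binary hinge loss, `max(0, 1 - y * f(x))`, with labels encoded as -1 and 1; the decision values are copied (rounded) from the earlier output.

```python
import numpy as np

y = np.array([-1, 1, 1])
# Decision values copied from the LinearSVC output above (rounded).
decision = np.array([-2.18181059, 2.36363311, 0.09091126])

# A sample incurs no loss once it is on the correct side with margin >= 1.
per_sample = np.maximum(0.0, 1.0 - y * decision)
mean_hinge = per_sample.mean()

print(per_sample)
print(mean_hinge)  # approximately 0.30303
```

Only the third sample contributes: it is classified correctly (positive decision value) but sits inside the margin, so it is still penalized.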

## Hinge Loss for multiclass prediction

In [56]:
import numpy as np
X = np.array([[0], [1], [2], [3]])
Y = np.array([0, 1, 2, 3])
labels = np.array([0, 1, 2, 3])
est = svm.LinearSVC()
est.fit(X, Y)
#LinearSVC()
pred_decision = est.decision_function([[-1], [2], [3]])
y_true = [0, 2, 3]
hinge_loss(y_true, pred_decision, labels=labels)

0.5641164913813278
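
For the multiclass case, the scikit-learn documentation describes the margin as following Crammer and Singer's method: the score of the true class minus the highest score among the other classes. The sketch below implements that idea under this assumption; the helper name `multiclass_hinge` and the score matrix are hypothetical, not the output of the model above.

```python
import numpy as np

def multiclass_hinge(y_true, scores):
    """Mean of max(0, 1 - (true-class score - best other score))."""
    scores = np.asarray(scores, dtype=float)
    y_true = np.asarray(y_true)
    rows = np.arange(len(y_true))
    true_scores = scores[rows, y_true]
    # Hide the true class so the row-wise max picks the best competitor.
    masked = scores.copy()
    masked[rows, y_true] = -np.inf
    best_other = masked.max(axis=1)
    return np.mean(np.maximum(0.0, 1.0 - (true_scores - best_other)))

# Hypothetical decision scores for 3 samples and 3 classes.
scores = [[2.0, 0.5, -1.0],
          [0.2, 0.1, 0.9],
          [0.3, 0.8, 0.6]]
y_true = [0, 2, 1]

loss = multiclass_hinge(y_true, scores)
print(loss)  # approximately 0.3667
```

Here the per-sample margins are 1.5, 0.7 and 0.2, giving losses of 0, 0.3 and 0.8.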

## 2.Logistic loss

### Introduction
Log loss is one of the most important classification metrics based on probabilities.

It's hard to interpret raw log-loss values, but log-loss is still a good metric for comparing models. For any given problem, a lower log-loss value means better predictions.

Log Loss is a slight twist on something called the Likelihood Function. In fact, Log Loss is -1 * the log of the likelihood function. So, we will start by understanding the likelihood function.

The likelihood function answers the question "How likely did the model think the actually observed set of outcomes was." If that sounds confusing, an example should help.

### Example
A model predicts probabilities of [0.8, 0.4, 0.1] for three houses. The first two houses were sold, and the last one was not sold. So the actual outcomes could be represented numerically as [1, 1, 0].

Let's step through these predictions one at a time to iteratively calculate the likelihood function.

The first house sold, and the model said that was 80% likely. So, the likelihood function after looking at one prediction is 0.8.

The second house sold, and the model said that was 40% likely. There is a rule of probability that the probability of multiple independent events is the product of their individual probabilities. So, we get the combined likelihood from the first two predictions by multiplying their associated probabilities. That is 0.8 * 0.4, which happens to be 0.32.

Now we get to our third prediction. That home did not sell. The model said it was 10% likely to sell. That means it was 90% likely to not sell. So, the observed outcome of not selling was 90% likely according to the model. So, we multiply the previous result of 0.32 by 0.9, which gives 0.288.

We could step through all of our predictions. Each time we'd find the probability associated with the outcome that actually occurred, and we'd multiply that by the previous result. That's the likelihood.
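
The step-by-step multiplication can be sketched in a few lines of NumPy. The variable names are chosen for this example only; the numbers are the ones from the house example.

```python
import numpy as np

predicted = np.array([0.8, 0.4, 0.1])  # model's probability that each house sells
actual = np.array([1, 1, 0])           # 1 = sold, 0 = not sold

# Probability the model assigned to the outcome that actually occurred.
prob_of_outcome = np.where(actual == 1, predicted, 1 - predicted)

# Independent events: the likelihood is the product of those probabilities.
likelihood = np.prod(prob_of_outcome)
print(round(likelihood, 3))  # 0.288
```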

### From Likelihood to Log Loss
Each prediction is between 0 and 1. If you multiply enough numbers in this range, the result gets so small that computers can't keep track of it (floating-point underflow). So, as a clever computational trick, we instead keep track of the log of the likelihood, which is the sum of the logs of the individual probabilities. This is in a range that's easy to keep track of. We multiply this by negative 1 to maintain a common convention that lower loss scores are better.
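
A small sketch of the numerical issue described above. The 2000 probabilities of 0.5 are an arbitrary choice made for this illustration; the second part reuses the probabilities from the house example.

```python
import numpy as np

# 2000 predictions, each assigning probability 0.5 to the observed outcome.
probs = np.full(2000, 0.5)
product = np.prod(probs)             # underflows: too small for a 64-bit float
log_likelihood = np.sum(np.log(probs))

print(product)         # 0.0
print(log_likelihood)  # about -1386.29

# House example: probabilities assigned to the observed outcomes.
p = np.array([0.8, 0.4, 0.9])
neg_log_likelihood = -np.sum(np.log(p))
print(neg_log_likelihood)           # about 1.2448, i.e. -log(0.288)
print(neg_log_likelihood / len(p))  # about 0.4149 per prediction
```

Dividing by the number of predictions gives a per-sample average, which makes scores comparable across datasets of different sizes.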

In [58]:
from sklearn.metrics import log_loss
log_loss(["spam", "ham", "ham", "spam"],[[.1, .9], [.9, .1], [.8, .2], [.35, .65]])

0.21616187468057912
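
The result above can be checked by hand. The sketch assumes two documented behaviours of scikit-learn's `log_loss`: with string labels, the probability columns follow the sorted label order (here `ham`, then `spam`), and by default the negative log-likelihood is averaged over the samples.

```python
import numpy as np

labels = ["spam", "ham", "ham", "spam"]
probs = np.array([[.1, .9], [.9, .1], [.8, .2], [.35, .65]])

# Columns are assumed to follow the sorted label order: ['ham', 'spam'].
classes = sorted(set(labels))
col = np.array([classes.index(label) for label in labels])

# Probability assigned to the true class of each sample.
p_true = probs[np.arange(len(labels)), col]

manual_log_loss = -np.mean(np.log(p_true))
print(manual_log_loss)  # approximately 0.21616
```

The probabilities picked out are 0.9, 0.9, 0.8 and 0.65, so the last prediction, the least confident one, contributes the most to the loss.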