[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/frank77/Python-Data-Analysis/AAAMLP/Chapter4_Evaluation_Metrics/Evaluation Metrics.ipynb)

# Evaluation Metrics
## Classification
- Accuracy
- Precision (P)
- Recall (R)
- F1 Score (F1)
- AUC
- Log Loss
- Precision at k (P@K)
- Average Precision at k (AP@K)
- Mean Average Precision at k (MAP@K)
## Regression
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Root Mean Squared Logarithmic Error (RMSLE)
- Mean Percentage Error (MPE)
- Mean Absolute Percentage Error (MAPE)
- R2

## Classification Metrics

**When a binary classification problem has an equal number of positive and negative samples, we generally use *accuracy, precision, recall and F1*.**

### Accuracy

In [None]:
def accuracy(y_true, y_pred):
    count_val = 0
    for i, j in zip(y_true, y_pred):
        if i == j:
            count_val += 1
    
    return count_val / len(y_true)

In [None]:
from sklearn import metrics

l1 = [0,1,1,1,0,0,0,1]
l2 = [0,1,0,1,0,1,0,0]


print(accuracy(l1,l2))

# scikit-learn function to calculate accuracy
metrics.accuracy_score(l1,l2)

**If the dataset is skewed, accuracy is not a good choice. We need to look at other metrics such as precision.**
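
A minimal sketch of why accuracy misleads on skewed data. The dataset (90 negatives, 10 positives) and the always-negative "model" are hypothetical and exist only for illustration:

```python
# Hypothetical skewed dataset: 90 negative samples, 10 positive samples
y_true = [0] * 90 + [1] * 10

# Hypothetical "model" that always predicts the majority (negative) class
y_pred = [0] * len(y_true)

# Accuracy: fraction of predictions that match the target
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
acc = correct / len(y_true)

# Number of positive samples the model actually found
found_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(acc)              # 0.9
print(found_positives)  # 0
```

Accuracy looks high even though the model never identifies a single positive sample.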

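### Precision
Precision is built from four counts, taking 1 as the positive class:
- **True Positive (TP)**: the target is positive and the prediction is positive
- **True Negative (TN)**: the target is negative and the prediction is negative
- **False Positive (FP)**: the target is negative but the prediction is positive
- **False Negative (FN)**: the target is positive but the prediction is negative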

In [None]:
def TP(y_true, y_pred):
    tp = 0
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 1:
            tp += 1
    
    return tp 

def TN(y_true, y_pred):
    tn = 0
    for y, p in zip(y_true, y_pred):
        if y == 0 and p == 0:
            tn += 1
    
    return tn 

def FP(y_true, y_pred):
    fp = 0
    for y, p in zip(y_true, y_pred):
        if y == 0 and p == 1:
            fp += 1
    
    return fp

def FN(y_true, y_pred):
    fn = 0
    for y, p in zip(y_true, y_pred):
        if y == 1 and p == 0:
            fn += 1
    
    return fn 

In [None]:
l1 = [0,1,1,1,0,0,0,1]
l2 = [0,1,0,1,0,1,0,0]

print(TP(l1, l2), FP(l1, l2), FN(l1, l2), TN(l1, l2))
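
The four counts can be cross-checked against scikit-learn. This sketch assumes binary labels 0 and 1, for which the flattened output of `metrics.confusion_matrix` is ordered TN, FP, FN, TP:

```python
from sklearn import metrics

l1 = [0, 1, 1, 1, 0, 0, 0, 1]  # true labels
l2 = [0, 1, 0, 1, 0, 1, 0, 0]  # predicted labels

# Rows are true classes, columns are predicted classes,
# so ravel() yields TN, FP, FN, TP for 0/1 labels
tn, fp, fn, tp = metrics.confusion_matrix(l1, l2).ravel()

print(tp, fp, fn, tn)  # 2 1 2 3
```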

$$
\begin{gather}
\text{Accuracy} = \frac{(TP + TN)}{TP+TN+FP+FN}, \quad
\text{Precision} = \frac{TP}{TP+FP}, \quad
\text{Recall} = \frac{TP}{TP+FN}
\end{gather}
$$

In [None]:
def accuracy(y_true, y_pred):
    tp = TP(y_true, y_pred)
    tn = TN(y_true, y_pred)
    fp = FP(y_true, y_pred)
    fn = FN(y_true, y_pred)
    
    return (tp + tn) / (tp + tn + fp + fn)

def precision(y_true, y_pred):
    tp = TP(y_true, y_pred)
    fp = FP(y_true, y_pred)
    
    return tp / (tp + fp)

def recall(y_true, y_pred):
    tp = TP(y_true, y_pred)
    fn = FN(y_true, y_pred)
    
    return tp / (tp + fn)

In [None]:
l1 = [0,1,1,1,0,0,0,1]
l2 = [0,1,0,1,0,1,0,0]

print(accuracy(l1, l2), precision(l1, l2), recall(l1, l2))

### Precision Recall Curve

In [None]:
y_true = [0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0]

y_pred = [.02638412, 0.11114267, 0.31620708, 
0.0490937, 0.0191491, 0.17554844, 
0.15952202, 0.03819563, 0.11639273, 
0.079377, 0.08584789, 0.39095342, 
0.27259048, 0.03447096, 0.04644807, 
0.03543574, 0.18521942, 0.05934905, 
0.61977213, 0.33056815]

precisions = []
recalls = []
# how these thresholds were chosen is a long story

thresholds = [0.0490937, 0.05934905, 0.079377,
              0.08584789, 0.11114267, 0.11639273, 
              0.15952202, 0.17554844, 0.18521942, 
              0.27259048, 0.31620708, 0.33056815, 
              0.39095342, 0.61977213] 


#for every threshold, calculate predictions in binary
#and append their calculated precision and recalls
#to their respective lists
for i in thresholds:
    temp_prediction = [1 if x >= i else 0 for x in y_pred]
    p = precision(y_true,temp_prediction)
    r = recall(y_true , temp_prediction)
    precisions.append(p)
    recalls.append(r)

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize = (7,7))
plt.plot(recalls,precisions)
plt.xlabel("Recall" ,fontsize = 15)
plt.ylabel("Precision",  fontsize = 15)

**Choosing a threshold can be quite challenging.**
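
One common heuristic is to pick the threshold that maximizes F1, the harmonic mean of precision and recall covered in the next section. This is an assumption made for illustration, not the only reasonable criterion. A sketch using scikit-learn's `precision_recall_curve` on the same targets and probabilities:

```python
import numpy as np
from sklearn import metrics

y_true = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0.02638412, 0.11114267, 0.31620708, 0.0490937, 0.0191491,
          0.17554844, 0.15952202, 0.03819563, 0.11639273, 0.079377,
          0.08584789, 0.39095342, 0.27259048, 0.03447096, 0.04644807,
          0.03543574, 0.18521942, 0.05934905, 0.61977213, 0.33056815]

# precision and recall have one more entry than thresholds:
# the final (precision=1, recall=0) point has no threshold attached
precision, recall, thresholds = metrics.precision_recall_curve(y_true, y_pred)
p, r = precision[:-1], recall[:-1]

# F1 at every candidate threshold, guarding against a zero denominator
f1_scores = 2 * p * r / np.maximum(p + r, 1e-12)

best = int(np.argmax(f1_scores))
print(thresholds[best], f1_scores[best])
```

Other criteria, such as a minimum acceptable recall, may suit a given problem better.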
### F1 Score
$$
\begin{gather}
\text{F1} = \frac{2PR}{P + R}, \quad  
\text{F1} = \frac{2TP}{2TP + FP +FN} 
\end{gather}
$$


In [None]:
def f1(y_true, y_pred):
    p = precision(y_true, y_pred)
    r = recall(y_true, y_pred)

    score = 2 * p * r / (p + r)
    return score

In [None]:
y_true = [0,0,0,1,0,0,0,0,0,0,
         1,0,0,0,0,0,0,0,1,0]


y_pred = [0,0,1,0,0,0,1,0,0,0,
         1,0,0,0,0,0,0,0,1,0]

f1(y_true,y_pred)


In [None]:
#Using scikit learn 
metrics.f1_score(y_true, y_pred)
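
As a quick check that the two F1 formulas agree, this sketch computes both forms from the same counts. The variable names are illustrative:

```python
y_true = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
          1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0,
          1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

# Count the outcomes needed by both formulas
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision_value = tp / (tp + fp)
recall_value = tp / (tp + fn)

# Form 1: harmonic mean of precision and recall
f1_from_pr = 2 * precision_value * recall_value / (precision_value + recall_value)
# Form 2: directly from the counts
f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f1_from_pr, f1_from_counts)  # both are 4/7, about 0.571
```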

### TPR - True Positive Rate = Recall
$$
TPR = \frac{TP}{TP + FN}
$$

In [None]:
def tpr(y_true, y_pred):
    tp = TP(y_true, y_pred)
    fn = FN(y_true, y_pred)
    
    return tp / (tp + fn)

### FPR - False Positive Rate
$$
FPR = \frac{FP}{FP + TN}
$$

In [None]:
def fpr(y_true, y_pred):
    fp = FP(y_true, y_pred)
    tn = TN(y_true, y_pred)

    return fp / (fp + tn)

In [None]:
#empty lists to store tpr
#and fpr values

tpr_list = []
fpr_list = []

#actual targets
y_true = [0,0,0,0,1,0,1,0,0,1,0,1,0,0,1]


#predicted probabilities of the sample being 1
y_pred = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05, 0.9, 0.5, 0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99] 

#handmade thresholds 
thresholds = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.99, 1.0] 


#loop over all thresholds 
for thresh in thresholds:
    #calculate predictions for a given threshold
    temp_pred = [1 if x >= thresh else 0 for x in y_pred]
    #calculate tpr
    temp_tpr = tpr(y_true, temp_pred)
    #calculate fpr 
    temp_fpr = fpr(y_true, temp_pred) 
    #append tpr and fpr to lists 
    tpr_list.append(temp_tpr) 
    fpr_list.append(temp_fpr) 

In [None]:
import pandas as pd

values = {"threshold":thresholds,"tpr":tpr_list,"fpr":fpr_list}
values  = pd.DataFrame(values)
values

In [None]:
plt.figure(figsize = (7,7))
plt.plot(fpr_list, tpr_list, lw = 3)
plt.fill_between(fpr_list, tpr_list, alpha = 0.4)
plt.xlabel("FPR", fontsize = 15)
plt.ylabel("TPR", fontsize = 15)

- This curve is known as the Receiver Operating Characteristic (**ROC**) curve.
- The area under the ROC curve is known as **AUC**.
- Both are often used when we have a **skewed binary target**.
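
Instead of handmade thresholds, the curve and its area can be obtained from scikit-learn. This sketch uses `metrics.roc_curve` to get the points and `metrics.auc` to integrate them with the trapezoidal rule, then compares the result with `metrics.roc_auc_score`:

```python
from sklearn import metrics

y_true = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05, 0.9, 0.5,
          0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99]

# roc_curve derives its thresholds from the distinct predicted scores
fpr_values, tpr_values, roc_thresholds = metrics.roc_curve(y_true, y_pred)

# Area under the (FPR, TPR) points via the trapezoidal rule
auc_trapezoid = metrics.auc(fpr_values, tpr_values)

# Same quantity computed in a single call
auc_direct = metrics.roc_auc_score(y_true, y_pred)

print(auc_trapezoid, auc_direct)  # both about 0.83
```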

In [None]:
from sklearn import metrics

y_true = [0, 0, 0, 0, 1, 0, 1,0, 0, 1, 0, 1, 0, 0, 1] 

y_pred = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05, 0.9, 0.5, 0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99] 

metrics.roc_auc_score(y_true, y_pred)  

### Log Loss
$$
\text{Log Loss} = -1.0 \times (target \times \log(pred) + (1 - target) \times \log(1 - pred))
$$

In [None]:
import numpy as np

def log_loss(y_true, y_pred):
    epsilon = 1e-15 
    loss = 0
    n = len(y_true)
    for yt, yp in zip(y_true, y_pred):
        yp = np.clip(yp, epsilon, 1 - epsilon) 
        loss += yt * np.log(yp) + (1 - yt) * np.log(1 - yp)
    
    loss = - loss / n
    return loss

In [None]:
y_true = [0, 0, 0, 0, 1, 0, 1,0, 0, 1, 0, 1, 0, 0, 1]
y_proba = [0.1, 0.3, 0.2, 0.6, 0.8, 0.05,0.9, 0.5, 0.3, 0.66, 0.3, 0.2, 0.85, 0.15, 0.99]

log_loss(y_true,y_proba)

In [None]:
from sklearn import metrics

metrics.log_loss(y_true,y_proba)
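
Log loss penalizes confident wrong predictions heavily. A small sketch for a single sample whose true label is 1, using a hypothetical helper `sample_log_loss`:

```python
import math

def sample_log_loss(target, pred, eps=1e-15):
    # Clip the probability so that log(0) never occurs
    pred = min(max(pred, eps), 1 - eps)
    return -(target * math.log(pred) + (1 - target) * math.log(1 - pred))

# The true label is 1 in all three cases; only the predicted probability changes
for pred in (0.9, 0.5, 0.1):
    print(pred, round(sample_log_loss(1, pred), 3))
# roughly 0.105, 0.693 and 2.303
```

The loss grows quickly as the predicted probability for the true class approaches 0.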

### Precision For Multi-Class Classification
- **Macro Average Precision** $$ \text{Macro Precision}=\frac{1}{N}\sum_{i=1}^N Precision_i $$
- **Micro Average Precision** $$ \text{Micro Precision}=\frac{\sum_{i=1}^N TP_i}{\sum_{i=1}^N (TP_i + FP_i)} $$
- **Weighted Precision** $$ \text{Weighted Precision}=\frac{\sum_{i=1}^N (n_i \times Precision_i)}{\sum_{i=1}^N n_i} $$ where $n_i$ is the number of true samples in class $i$

In [None]:
def macro_precision(y_true, y_pred):
    num_classes = len(np.unique(y_true))
    precisions = []
    for cls in range(num_classes):
        tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == cls and yp == cls)
        fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt != cls and yp == cls)
        precision_cls = tp / (tp + fp) if (tp + fp) > 0 else 0
        precisions.append(precision_cls)
    
    return sum(precisions) / num_classes

def micro_precision(y_true, y_pred):
    num_classes = len(np.unique(y_true))
    tp = 0
    fp = 0
    for cls in range(num_classes):
        tp += sum(1 for yt, yp in zip(y_true, y_pred) if yt == cls and yp == cls)
        fp += sum(1 for yt, yp in zip(y_true, y_pred) if yt != cls and yp == cls)
    
    return tp / (tp + fp)

def weighted_precision(y_true, y_pred):
    num_classes = len(np.unique(y_true))
    precisions = []
    weights = []
    for cls in range(num_classes):
        tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == cls and yp == cls)
        fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt != cls and yp == cls)
        precision_cls = tp / (tp + fp) if (tp + fp) > 0 else 0
        weight_cls = sum(1 for yt in y_true if yt == cls)
        precisions.append(precision_cls * weight_cls)
        weights.append(weight_cls)
    
    return sum(precisions) / sum(weights)

In [None]:
from sklearn import metrics 
y_true = [0, 1, 2, 0, 1, 2, 0, 2, 2]
y_pred = [0, 2, 1, 0, 2, 1, 0, 0, 2] 

print(macro_precision(y_true, y_pred)) 
print(metrics.precision_score(y_true, y_pred, average="macro"))

print(micro_precision(y_true, y_pred))
print(metrics.precision_score(y_true, y_pred, average="micro")) 

print(weighted_precision(y_true, y_pred))
print(metrics.precision_score(y_true, y_pred, average="weighted")) 
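
The averaged scores hide how each class behaves. This sketch prints the per-class precision that the macro average is built from. It iterates over the label values actually present, which is a design choice made here so that labels do not have to be 0 to N-1:

```python
y_true = [0, 1, 2, 0, 1, 2, 0, 2, 2]
y_pred = [0, 2, 1, 0, 2, 1, 0, 0, 2]

per_class = {}
for cls in sorted(set(y_true)):
    # One-vs-rest counts for the current class
    tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == cls and yp == cls)
    fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt != cls and yp == cls)
    per_class[cls] = tp / (tp + fp) if (tp + fp) > 0 else 0.0

print(per_class)  # class 0: 0.75, class 1: 0.0, class 2: about 0.333

# Macro precision is the unweighted mean of the per-class values
macro = sum(per_class.values()) / len(per_class)
print(macro)
```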

## MultiLabel Classification problem
In multilabel classification, each sample can have one or more classes associated with it.

Metrics for this type of problem:
- Precision at k **(P@k)**
- Average Precision at k **(AP@k)**
- Mean Average Precision at k **(MAP@k)**
- Log Loss

In [None]:
def pk(y_true, y_pred, k):
    if k == 0:
        return 0
    y_pred = y_pred[:k]
    pred_set = set(y_pred)
    true_set = set(y_true)
    intersection = pred_set.intersection(true_set)
    return len(intersection) / k
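
A short trace of P@k for one sample. The helper `precision_at_k` is a hypothetical name that repeats the logic of `pk` above so the snippet runs on its own:

```python
def precision_at_k(y_true, y_pred, k):
    # Same logic as pk: hits among the top-k predictions, divided by k
    if k == 0:
        return 0
    top_k = set(y_pred[:k])
    return len(top_k.intersection(set(y_true))) / k

y_true = [1, 2, 3]  # actual classes of the sample
y_pred = [0, 1, 2]  # predicted classes, in the order they are given

for k in range(1, 4):
    print(k, precision_at_k(y_true, y_pred, k))
# k=1 gives 0.0, k=2 gives 0.5, k=3 gives about 0.667
```

Because only the first k predictions are considered, the order of `y_pred` affects the result.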

In [None]:
def apk(y_true, y_pred, k):
    total_pk = 0
    for i in range(1, k + 1):
        total_pk += pk(y_true, y_pred, i)
    return total_pk / k

In [None]:
def mapk(y_true_list, y_pred_list, k):
    total_apk = 0
    n = len(y_true_list)
    for y_true, y_pred in zip(y_true_list, y_pred_list):
        total_apk += apk(y_true, y_pred, k)
    return total_apk / n

In [None]:
y_true = [
    [1, 2, 3],
    [0, 2],
    [1],
    [2, 3],
    [1, 0],
    []
]
y_pred = [
    [0, 1, 2],
    [1],
    [0, 2, 3],
    [2, 3, 4, 0],
    [0, 1, 2],
    [0]
]
for i in range(len(y_true)):
    for j in range(1, 4):
        print(
            f"""
            y_true={y_true[i]},
            y_pred={y_pred[i]},
            AP@{j}={apk(y_true[i], y_pred[i], k=j)}
            """
        )

In [None]:
y_true = [
    [1, 2, 3],
    [0, 2],
    [1],
    [2, 3],
    [1, 0],
    []
]

y_pred = [
    [0, 1, 2],
    [1],
    [0, 2, 3],
    [2, 3, 4, 0],
    [0, 1, 2],
    [0]
]


print("Mean Average Precision @ k:", mapk(y_true, y_pred, k=1))
print("Mean Average Precision @ k:", mapk(y_true, y_pred, k=2))
print("Mean Average Precision @ k:", mapk(y_true, y_pred, k=3))
print("Mean Average Precision @ k:", mapk(y_true, y_pred, k=4))

**Log loss** for multilabel classification is straightforward: convert the targets to binary format, compute the log loss for each column, and take the average across columns.
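
A minimal sketch of that column-wise procedure. The function name `multilabel_log_loss`, the `num_classes` parameter and the sample data are assumptions made for illustration:

```python
import math

def multilabel_log_loss(y_true, y_proba, num_classes, eps=1e-15):
    # y_true: list of label lists, y_proba: one probability per class per sample
    n = len(y_true)
    column_losses = []
    for cls in range(num_classes):
        loss = 0.0
        for labels, probs in zip(y_true, y_proba):
            # Binary target for this column: 1 if the sample carries the label
            target = 1 if cls in labels else 0
            p = min(max(probs[cls], eps), 1 - eps)
            loss += target * math.log(p) + (1 - target) * math.log(1 - p)
        column_losses.append(-loss / n)
    # Average the per-column log losses
    return sum(column_losses) / num_classes

# Hypothetical targets and predicted probabilities for three classes
y_true = [[0, 2], [1], [0, 1]]
y_proba = [[0.9, 0.2, 0.7],
           [0.1, 0.8, 0.3],
           [0.6, 0.6, 0.1]]

print(multilabel_log_loss(y_true, y_proba, num_classes=3))
```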

## Metrics for Regression
- ### Error
- ### Absolute Error
- ### Mean Absolute Error (MAE)
- ### Mean Squared Error (MSE)
- ### Root Mean Squared Error (RMSE)
- ### Squared Logarithmic Error (SLE)
- ### Mean Squared Logarithmic Error (MSLE)
- ### Root Mean Squared Logarithmic Error (RMSLE)
- ### Percent Error
- ### Mean Absolute Percentage Error (MAPE)
- ### R2, also called R-squared or the coefficient of determination
$$
R^2 = 1 - \frac{\sum^N_{i=1}(y_{t_i} - y_{p_i})^2}{\sum^N_{i=1}(y_{t_i} - y_{t_{mean}})^2}
$$




In [None]:
def mse(y_true, y_pred):
    n = len(y_true)
    total_error = 0
    for yt, yp in zip(y_true, y_pred):
        total_error += (yt - yp) ** 2
    return total_error / n

def mae(y_true, y_pred):
    n = len(y_true)
    total_error = 0
    for yt, yp in zip(y_true, y_pred):
        total_error += abs(yt - yp)
    return total_error / n

def msle(y_true, y_pred):
    n = len(y_true)
    total_error = 0
    for yt, yp in zip(y_true, y_pred):
        total_error += (np.log(1 + yt) - np.log(1 + yp)) ** 2
    return total_error / n

def r2(y_true, y_pred):
    mean = np.mean(y_true)
    numerator = 0
    denominator = 0
    for yt, yp in zip(y_true, y_pred):
        numerator += (yt - yp) ** 2
        denominator += (yt - mean) ** 2
    
    return 1 - (numerator / denominator)
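
The list above also names RMSE, RMSLE and MAPE, which are not implemented in the cell. A hedged sketch of the three, written to run on its own. Returning MAPE as a fraction rather than a percentage is an assumption made here:

```python
import math

def rmse(y_true, y_pred):
    # Square root of the mean squared error
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)

def rmsle(y_true, y_pred):
    # Square root of the mean squared logarithmic error;
    # log1p(x) computes log(1 + x), so values must be greater than -1
    n = len(y_true)
    total = sum((math.log1p(yt) - math.log1p(yp)) ** 2
                for yt, yp in zip(y_true, y_pred))
    return math.sqrt(total / n)

def mape(y_true, y_pred):
    # Mean absolute percentage error as a fraction (multiply by 100 for percent);
    # undefined when any true value is 0
    n = len(y_true)
    return sum(abs(yt - yp) / abs(yt) for yt, yp in zip(y_true, y_pred)) / n

print(rmse([0, 0], [3, 4]))
print(rmsle([1, 2, 3], [1, 2, 3]))  # 0.0
print(mape([100, 200], [110, 180]))
```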

## Other Advanced Metrics
- ### Quadratic Weighted Kappa (QWK), which is Cohen’s kappa with quadratic weights
**QWK measures the “agreement” between two “ratings”.**
- The ratings are discrete values in the range 0 to N, and the predictions are in the same range.
- An agreement can be defined as how close these ratings are to each other. So, it’s suitable for a classification problem with N different categories/classes.
- If the agreement is high, the score is close to 1.0. In the case of low agreement, the score is close to 0.
- ### Matthews Correlation Coefficient (MCC)
$$
MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TN + FN) \times (FP + TN) \times (TP + FN)}}
$$
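
The MCC formula can be sketched directly from the four counts. Returning 0 when the denominator is zero is a common convention and an assumption here. The function name `mcc` is illustrative:

```python
import math

def mcc(y_true, y_pred):
    # Confusion-matrix counts for binary 0/1 labels
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tn + fn) * (fp + tn) * (tp + fn))
    # Assumed convention: report 0 when any marginal count is empty
    return numerator / denominator if denominator > 0 else 0.0

l1 = [0, 1, 1, 1, 0, 0, 0, 1]  # true labels
l2 = [0, 1, 0, 1, 0, 1, 0, 0]  # predicted labels

print(mcc(l1, l2))  # 4 / sqrt(240), about 0.258
```

scikit-learn provides `metrics.matthews_corrcoef(y_true, y_pred)` for the same quantity.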

In [None]:
from sklearn import metrics 

y_true = [1, 2, 3, 1, 2, 3, 1, 2, 3] 
y_pred = [2, 1, 3, 1, 2, 3, 3, 1, 2] 

metrics.cohen_kappa_score(y_true, y_pred, weights="quadratic") 