# Mean Absolute Error (MAE): A Detailed Guide

# How Does MAE Work?
1. MAE calculates the absolute difference between predicted and actual values for each data point.
2. It then averages these absolute differences to compute the final error value.
3. Since absolute values are used, MAE treats all errors equally, regardless of their direction (positive or negative).
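
The steps above correspond to the usual formula. The notation is chosen here for illustration: $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of data points.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```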


# When to Use MAE
✅ When Outliers Are Expected: Since MAE isn’t drastically affected by large errors, it’s a great choice if your data has potential outliers.
✅ When Interpretability Matters: Because MAE is in the same unit as the target variable, it’s easier for stakeholders to understand.
✅ For Balanced Error Sensitivity: MAE penalizes each error in direct proportion to its size, making it suitable for scenarios where large errors should not receive extra weight over small ones.
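
To make the outlier point concrete, here is a minimal sketch comparing how one badly wrong prediction changes MAE and MSE (MSE is covered later in this guide). The data and the helper names `mae` and `mse` are made up for illustration.

```python
import numpy as np

# Hypothetical data: the two prediction sets differ only in the last value.
y_true = np.array([100, 200, 300, 400, 500])
y_pred_clean = np.array([110, 190, 320, 390, 510])
y_pred_outlier = np.array([110, 190, 320, 390, 800])  # last prediction misses by 300

def mae(y, y_hat):
    """Mean absolute error (illustrative helper)."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean squared error (illustrative helper)."""
    return np.mean((y - y_hat) ** 2)

mae_clean, mae_out = mae(y_true, y_pred_clean), mae(y_true, y_pred_outlier)
mse_clean, mse_out = mse(y_true, y_pred_clean), mse(y_true, y_pred_outlier)

print("MAE without / with outlier:", mae_clean, mae_out)
print("MSE without / with outlier:", mse_clean, mse_out)
print("MAE grows by a factor of", round(mae_out / mae_clean, 1))
print("MSE grows by a factor of", round(mse_out / mse_clean, 1))
```

In this toy example the single large miss inflates MSE far more than MAE, which is the behavior the first point describes.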

# When Not to Use MAE
❌ When Emphasizing Large Errors Is Important: Unlike MSE, MAE doesn’t punish larger errors more heavily. If you want to heavily penalize large deviations, MSE may be better.
❌ In Gradient Descent (Rarely): MAE is not differentiable at zero error, and its gradient has a constant magnitude (±1) no matter how large the error is, so MSE often converges faster in gradient-based optimizers.
❌ For Highly Noisy Data: MAE can become unstable if your dataset has extremely noisy features or fluctuating patterns.
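
The gradient point can be illustrated with a short sketch. The residuals below are hypothetical, and the gradients are taken per sample with respect to the prediction, ignoring the 1/n averaging factor.

```python
import numpy as np

# Hypothetical residuals (prediction minus actual value).
errors = np.array([-50.0, -1.0, 0.5, 20.0])

# Gradient of |e| with respect to the prediction: sign(e), always -1 or +1 here.
mae_grad = np.sign(errors)

# Gradient of e**2 with respect to the prediction: 2 * e, proportional to the error.
mse_grad = 2 * errors

print("MAE gradients:", mae_grad)
print("MSE gradients:", mse_grad)
```

The MAE gradient carries no information about how far off a prediction is, while the MSE gradient shrinks as the prediction approaches the target.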

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Actual and predicted values
y_true = np.array([100, 200, 300, 400])
y_pred = np.array([110, 190, 320, 390])

# Manual calculation
mae_manual = np.mean(np.abs(y_true - y_pred))
print("Manual MAE calculation:", mae_manual)
```

```
Manual MAE calculation: 12.5
```


```python
# Using scikit-learn (recommended); reuses y_true and y_pred from the cell above
mae_sklearn = mean_absolute_error(y_true, y_pred)
print("sklearn MAE calculation:", mae_sklearn)
```

```
sklearn MAE calculation: 12.5
```


# Key Takeaway
MAE is best suited for regression tasks where outliers are expected, or when you need an easily interpretable error metric. However, it’s not ideal when large errors must be penalized more heavily.

# Mean Squared Error (MSE)

Mean Squared Error (MSE) is a widely used loss function for regression tasks. It measures the average of the squared differences between the actual values and the predicted values.

# How Does MSE Work?
1. MSE calculates the squared difference between the predicted and actual values for each data point.
2. It then averages these squared differences to compute the final error value.
3. Since errors are squared, MSE heavily penalizes larger errors, making it more sensitive to outliers than MAE.
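
Written as a formula, the steps above look like this. The notation is chosen here for illustration: $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of data points.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```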

# Key Characteristics of MSE
1. ✅ Heavily Penalizes Large Errors: Because MSE squares the errors, larger deviations have a disproportionate impact on the final error.
2. ✅ Continuous and Differentiable: The MSE loss function has a smooth gradient, making it ideal for gradient descent optimizers.
3. ✅ Mathematical Simplicity: MSE's quadratic form is easy to compute and implement.



#  When to Use MSE
1. ✅ When Large Errors Need Strong Penalization: MSE is ideal when you want your model to prioritize reducing large errors.
2. ✅ In Models Using Gradient Descent: Since MSE has a smooth, continuous gradient (derivative), it behaves predictably in gradient-based optimizers like SGD or Adam.
3. ✅ For Normally Distributed Errors: Minimizing MSE is equivalent to maximum likelihood estimation when the errors are normally distributed with constant variance, so under that assumption MSE gives statistically optimal estimates.

# When NOT to Use MSE
1. ❌ When Outliers Are Present: MSE’s sensitivity to large errors can make your model unstable if your dataset contains outliers. In this case, MAE or Huber Loss may be better.
2. ❌ When You Need Interpretable Results: Since MSE’s output is squared, the resulting error is no longer in the same unit as the original data, making it harder to explain.
3. ❌ For Highly Imbalanced Datasets: If the target distribution is skewed, errors on the largest values tend to dominate MSE, so the model may fit those values at the expense of typical cases.
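
Since Huber Loss is mentioned above as an alternative, here is a minimal NumPy sketch of it. The function name `huber_loss` and the threshold `delta=15.0` are illustrative assumptions, not recommended values; the loss is quadratic for small errors and linear for large ones.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=15.0):
    """Mean Huber loss; `delta` is an illustrative threshold (an assumption)."""
    error = np.abs(y_true - y_pred)
    quadratic = 0.5 * error ** 2             # applied where |error| <= delta
    linear = delta * (error - 0.5 * delta)   # applied where |error| > delta
    return np.mean(np.where(error <= delta, quadratic, linear))

# Same values as the other examples in this guide
y_true = np.array([100, 200, 300, 400])
y_pred = np.array([110, 190, 320, 390])

huber_value = huber_loss(y_true, y_pred)
print("Huber loss (delta=15):", huber_value)
```

With this threshold, only the error of 20 falls in the linear region, so it contributes less than it would under a purely squared loss.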

# Code

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Actual and predicted values
y_true = np.array([100, 200, 300, 400])
y_pred = np.array([110, 190, 320, 390])

# Manual calculation
mse_manual = np.mean((y_pred - y_true) ** 2)
print("Manual MSE calculation:", mse_manual)

# Using scikit-learn
mse_sklearn = mean_squared_error(y_true, y_pred)
print("sklearn MSE calculation:", mse_sklearn)
```

```
Manual MSE calculation: 175.0
sklearn MSE calculation: 175.0
```


# Log Loss

# What Is Log Loss?
Log Loss, also known as Logarithmic Loss or Cross-Entropy Loss (Binary Cross-Entropy for two classes, Categorical Cross-Entropy for more than two), is a loss function used for classification tasks. It measures how far the predicted probabilities are from the actual labels.

In essence, Log Loss penalizes confident but wrong predictions more heavily than less confident wrong predictions. It’s the go-to loss function for probabilistic models like logistic regression and neural networks in classification tasks.

# How Does Log Loss Work?

Log Loss behaves like this:

1. Correct and Confident Prediction: If the model predicts a probability close to 1 when the actual class is 1, or close to 0 when the actual class is 0, the Log Loss is low.
2. Wrong and Confident Prediction: If the model predicts a probability close to 1 when the actual class is 0, or close to 0 when the actual class is 1, the Log Loss is very high, and it grows without bound as the prediction approaches the wrong extreme.
3. Uncertain Predictions: Predictions close to 0.5 are treated as less confident, resulting in a moderate penalty of about 0.693 (ln 2) per sample.
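
For the binary case, this behavior follows from the formula below. The notation is chosen here for illustration: $y_i \in \{0, 1\}$ is the actual label, $p_i$ the predicted probability of class 1, and $n$ the number of samples.

```latex
\mathrm{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n} \Big[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \Big]
```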

# When to Use Log Loss
✅ For Binary Classification Tasks: Log Loss is the default choice for models like logistic regression and binary neural networks. Support vector machines (SVM), by contrast, are usually trained with hinge loss.
✅ For Probabilistic Models: If your model outputs probabilities, Log Loss effectively measures the confidence of those predictions.
✅ For Imbalanced Datasets: Because Log Loss evaluates predicted probabilities rather than hard labels, it stays informative on imbalanced datasets where accuracy can be misleading, although the majority class still dominates the average unless classes are weighted.

# When Not to Use Log Loss
❌ When Predictions Are Not Probabilistic: Models that output hard labels (e.g., 0 or 1 directly) are unsuitable for Log Loss since it expects probability values.
❌ When Outliers Are Severe: Like MSE, Log Loss heavily penalizes extreme misclassifications, so a few confidently wrong predictions can dominate the average.
❌ For Multi-Class Problems Without Modification: The binary formula only covers two classes. Use Categorical Cross-Entropy for multi-class classification.
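
The hard-label point can be seen in a small sketch. The helper `binary_log_loss` and the clipping constant `eps=1e-15` are assumptions made for this illustration; clipping keeps `log(0)` from producing infinity.

```python
import numpy as np

def binary_log_loss(y_true, p, eps=1e-15):
    """Manual binary log loss; `eps` is an assumed clipping constant."""
    p = np.clip(p, eps, 1 - eps)  # keep probabilities strictly inside (0, 1)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1, 0, 1, 0])

# Hypothetical predictions: both make the same mistake on the third sample.
hard_labels = np.array([1.0, 0.0, 0.0, 0.0])  # hard 0/1 outputs
soft_probs = np.array([0.9, 0.1, 0.3, 0.1])   # hedged probabilities

loss_hard = binary_log_loss(y_true, hard_labels)
loss_soft = binary_log_loss(y_true, soft_probs)
print("Log Loss with hard labels:", loss_hard)
print("Log Loss with probabilities:", loss_soft)
```

A single wrong hard label is treated as a fully confident mistake, so it dominates the average even though the other three predictions are correct.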

# Visualizing Log Loss Behavior
Prediction close to 1 for class 1: Low loss (good prediction)
Prediction close to 0 for class 0: Low loss (good prediction)
Prediction close to 0 for class 1: Very high loss (bad prediction)
Prediction close to 1 for class 0: Very high loss (bad prediction)
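
The four cases above can be checked numerically. This is a minimal sketch; the probabilities 0.95 and 0.05 and the helper `sample_log_loss` are chosen only for illustration.

```python
import numpy as np

def sample_log_loss(y, p):
    """Log loss of a single prediction p (probability of class 1) for label y."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# (actual class, predicted probability of class 1), illustrative values
cases = {
    "close to 1 for class 1": (1, 0.95),
    "close to 0 for class 0": (0, 0.05),
    "close to 0 for class 1": (1, 0.05),
    "close to 1 for class 0": (0, 0.95),
}

losses = {name: sample_log_loss(y, p) for name, (y, p) in cases.items()}
for name, loss in losses.items():
    print(f"Prediction {name}: loss = {loss:.3f}")
```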

```python
import numpy as np
from sklearn.metrics import log_loss

# Actual labels (binary classification)
y_true = np.array([1, 0, 1, 0])

# Predicted probabilities
y_pred = np.array([0.9, 0.2, 0.1, 0.8])

# Manual Calculation
log_loss_manual = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
print("Manual Log Loss Calculation:", log_loss_manual)

# Using Scikit-learn (recommended)
log_loss_sklearn = log_loss(y_true, y_pred)
print("Sklearn Log Loss Calculation:", log_loss_sklearn)
```

```
Manual Log Loss Calculation: 1.0601317681000455
Sklearn Log Loss Calculation: 1.0601317681000455
```


#  Log Loss for Multi-Class Classification

```python
from sklearn.metrics import log_loss

# Multi-class example
y_true = [0, 2, 1, 2]  # Actual class labels
y_pred = [
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
    [0.05, 0.05, 0.9],
]

print("Multi-class Log Loss:", log_loss(y_true, y_pred))
```

```
Multi-class Log Loss: 0.23617255159896322
```
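
As a cross-check, the same multi-class value can be reproduced by hand. This sketch assumes each row of `y_pred` already sums to 1 and that the columns are ordered by class label 0, 1, 2.

```python
import numpy as np

y_true = np.array([0, 2, 1, 2])  # Actual class labels
y_pred = np.array([
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
    [0.05, 0.05, 0.9],
])

# For each row, pick the probability assigned to the true class.
true_class_probs = y_pred[np.arange(len(y_true)), y_true]

# Multi-class log loss: mean negative log-probability of the true class.
multiclass_manual = -np.mean(np.log(true_class_probs))
print("Manual multi-class Log Loss:", multiclass_manual)
```

Only the probability given to the correct class enters the loss; how the remaining probability is split across the wrong classes does not matter.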
