# Statistical Metrics and Concepts in Machine Learning & Forecasting: Master Lab Notebook

This notebook is your **single source of truth** for key statistical metrics and concepts used in Machine Learning (ML) and forecasting. It covers error/loss metrics, classification metrics, foundational statistical concepts, and other useful metrics. Each concept follows a consistent structure for clarity and ease of reference.

We'll use synthetic data for demonstrations. For regression: linear data with noise. For classification: binary labels.

**Prerequisites:** Run the following cell to import necessary libraries.

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import MeanSquaredError, MeanAbsoluteError
from tensorflow.keras.metrics import Accuracy, Precision, Recall, AUC, BinaryCrossentropy
from sklearn.metrics import confusion_matrix, roc_curve, auc, roc_auc_score
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

### Synthetic Data Generation

In [None]:
# Regression data
np.random.seed(42)
X_reg = np.random.rand(100, 1) * 10
y_reg_true = 2 * X_reg.squeeze() + 1 + np.random.randn(100) * 0.5  # True: y = 2x + 1 + noise
y_reg_pred = 2 * X_reg.squeeze() + 1 + np.random.randn(100) * 0.8  # Predictions with more noise

# Classification data (binary)
X_class = np.random.rand(100, 1)
y_class_true = (X_class.squeeze() > 0.5).astype(int)
y_class_pred_prob = np.random.rand(100)  # Probabilities (random, independent of the labels)
y_class_pred_binary = (y_class_pred_prob > 0.5).astype(int)  # Binary predictions

## Section 1: Error & Loss Metrics

### 1.1 Mean Squared Error (MSE)

#### Definition

Mean Squared Error (MSE) is the average squared difference between predicted and actual values. Squaring gives larger errors disproportionately more weight.

#### Why It Is Used in ML / Forecasting

In ML regression and forecasting (e.g., time series prediction), MSE serves as the loss function minimized during training. Minimizing it is equivalent to maximum-likelihood estimation when the errors are Gaussian, which is why it suits Gaussian-distributed errors.

#### Pros & Cons

**Pros**: Differentiable for optimization; penalizes large errors heavily, useful in high-precision forecasting.

**Cons**: Sensitive to outliers; results in squared units, reducing interpretability.

#### Equation

\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

#### Code Implementation

In [None]:
# Manual Implementation
def mse_manual(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

mse_val_manual = mse_manual(y_reg_true, y_reg_pred)
print(f"Manual MSE: {mse_val_manual}")

# Keras Implementation
mse_keras = MeanSquaredError()
mse_val_keras = mse_keras(y_reg_true, y_reg_pred).numpy()
print(f"Keras MSE: {mse_val_keras}")

#### Comparison / Interpretation

Both yield similar values (~0.74). Lower MSE indicates a better fit; compare across models for selection. In forecasting, interpret it as the average squared deviation from actuals.

### 1.2 Root Mean Squared Error (RMSE)

#### Definition

Root Mean Squared Error (RMSE) is the square root of MSE, providing error magnitude in the same units as the target variable.

#### Why It Is Used in ML / Forecasting

RMSE is popular in forecasting (e.g., demand prediction) for its interpretability in original units, while still penalizing large errors like MSE.

#### Pros & Cons

**Pros**: Same scale as data; good for comparing models.

**Cons**: Still outlier-sensitive; not normalized for scale differences.

#### Equation

\[ \text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \]

#### Code Implementation

In [None]:
# Manual Implementation
def rmse_manual(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

rmse_val_manual = rmse_manual(y_reg_true, y_reg_pred)
print(f"Manual RMSE: {rmse_val_manual}")

# Keras Implementation (custom function: Keras has no built-in RMSE loss,
# although tf.keras.metrics.RootMeanSquaredError exists as a metric)
def rmse_keras(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true)))

rmse_val_keras = rmse_keras(y_reg_true, y_reg_pred).numpy()
print(f"Keras RMSE: {rmse_val_keras}")

#### Comparison / Interpretation

Both ~0.86. Read it as the typical error magnitude in units of y (the root of the mean squared error, not the mean error); useful for judging error size in forecasting.

### 1.3 Mean Absolute Error (MAE)

#### Definition

Mean Absolute Error (MAE) is the average of the absolute differences between predicted and actual values.

#### Why It Is Used in ML / Forecasting

Used in regression and forecasting when robustness to outliers is needed, as it weights every error in proportion to its size (e.g., in inventory forecasting).

#### Pros & Cons

**Pros**: Robust to outliers; interpretable in original units.

**Cons**: Not differentiable at zero; less emphasis on large errors.

#### Equation

\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \]

#### Code Implementation

In [None]:
# Manual Implementation
def mae_manual(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

mae_val_manual = mae_manual(y_reg_true, y_reg_pred)
print(f"Manual MAE: {mae_val_manual}")

# Keras Implementation
mae_keras = MeanAbsoluteError()
mae_val_keras = mae_keras(y_reg_true, y_reg_pred).numpy()
print(f"Keras MAE: {mae_val_keras}")

#### Comparison / Interpretation

Both ~0.66. MAE is the average absolute error (not the median). RMSE is never smaller than MAE, so a large gap between the two signals that a few large errors dominate.
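
To make the RMSE-versus-MAE comparison concrete, here is a minimal sketch on made-up residuals. The arrays `errors_clean` and `errors_outlier` are illustrative assumptions, not notebook data. A single large error moves RMSE far more than MAE, so the ratio RMSE/MAE works as a rough outlier indicator.

```python
import numpy as np

# Hypothetical residuals: identical except for one large error
errors_clean = np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
errors_outlier = errors_clean.copy()
errors_outlier[-1] = -10.0  # inject a single outlier

def mae(e):
    return np.mean(np.abs(e))

def rmse(e):
    return np.sqrt(np.mean(e ** 2))

for name, e in [("clean", errors_clean), ("with outlier", errors_outlier)]:
    print(f"{name}: MAE={mae(e):.3f}  RMSE={rmse(e):.3f}  RMSE/MAE={rmse(e) / mae(e):.2f}")
```

When all errors have the same magnitude the two metrics coincide; the ratio grows as the error distribution becomes heavier-tailed.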

### 1.4 Mean Absolute Percentage Error (MAPE)

#### Definition

MAPE measures the average absolute percentage difference between predicted and actual values.

#### Why It Is Used in ML / Forecasting

Common in forecasting for scale-independent error (e.g., sales prediction), allowing comparison across datasets.

#### Pros & Cons

**Pros**: Scale-independent; intuitive as percentage.

**Cons**: Undefined for zero actuals and unstable for actuals near zero; penalizes over-forecasts more than under-forecasts, so it favors models that predict low.

#### Equation

\[ \text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100 \]

#### Code Implementation

In [None]:
# Reference Implementation (sklearn returns a fraction, so multiply by 100)
mape_val_manual = mean_absolute_percentage_error(y_reg_true, y_reg_pred) * 100
print(f"Manual MAPE: {mape_val_manual}")

# Keras Implementation (Custom); the clip guards against division by zero
def mape_keras(y_true, y_pred):
    return K.mean(K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))) * 100

mape_val_keras = mape_keras(y_reg_true, y_reg_pred).numpy()
print(f"Keras MAPE: {mape_val_keras}")

#### Comparison / Interpretation

Both ~10-15%. Lower is better; useful for relative error in forecasting.
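
The two weaknesses of MAPE (asymmetry and instability near zero) are easy to see with a few hand-picked numbers. The sketch below uses a small helper `mape` and made-up values; both are assumptions for illustration only.

```python
import numpy as np

def mape(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Same absolute miss of 50 units, in opposite directions
over_forecast = mape([100], [150])   # actual 100, predicted 150
under_forecast = mape([150], [100])  # actual 150, predicted 100
print(f"Over-forecast MAPE:  {over_forecast:.1f}%")
print(f"Under-forecast MAPE: {under_forecast:.1f}%")

# One near-zero actual dominates the average even though the other point is exact
tiny_actual = mape([0.01, 100], [1.0, 100])
print(f"With a tiny actual:  {tiny_actual:.1f}%")
```

Because the actual value sits in the denominator, the same 50-unit miss costs 50% when the actual is 100 but only about 33% when the actual is 150.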

### 1.5 R² (Coefficient of Determination)

#### Definition

R² indicates the proportion of variance in the dependent variable explained by the model.

#### Why It Is Used in ML / Forecasting

Evaluates model fit in regression/forecasting; higher R² means better explanation of data variability.

#### Pros & Cons

**Pros**: Unitless with an upper bound of 1; easy comparison across models on the same data.

**Cons**: Not bounded below (it turns negative when the model is worse than predicting the mean); can mislead in non-linear models; doesn't imply causation.

#### Equation

\[ R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \]

#### Code Implementation

In [None]:
# Manual Implementation
def r2_manual(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1 - (ss_res / ss_tot)

r2_val_manual = r2_manual(y_reg_true, y_reg_pred)
print(f"Manual R²: {r2_val_manual}")

# Keras Implementation (Custom)
def r2_keras(y_true, y_pred):
    ss_res = K.sum(K.square(y_true - y_pred))
    ss_tot = K.sum(K.square(y_true - K.mean(y_true)))
    return 1 - ss_res / (ss_tot + K.epsilon())

r2_val_keras = r2_keras(y_reg_true, y_reg_pred).numpy()
print(f"Keras R²: {r2_val_keras}")

#### Comparison / Interpretation

Both ~0.97. A value of 1 is a perfect fit, 0 matches a predictor that always outputs the mean, and a negative value means the model is worse than that mean predictor.
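
The three reference points for R² (1, 0, and negative) can be checked by hand on a tiny example. The helper `r2` and the five-point target below are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # total sum of squares = 10

r2_perfect = r2(y_true, y_true)                             # exact predictions
r2_mean = r2(y_true, np.full_like(y_true, y_true.mean()))   # always predict the mean
r2_bad = r2(y_true, y_true[::-1])                           # reversed trend, residual SS = 40

print(f"perfect: {r2_perfect}, mean predictor: {r2_mean}, reversed: {r2_bad}")
```

The reversed predictions give a residual sum of squares of 40 against a total of 10, so R² = 1 - 40/10 = -3.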

## Section 2: Classification Metrics

### 2.1 Accuracy

#### Definition

Accuracy is the ratio of correct predictions to total predictions.

#### Why It Is Used in ML / Forecasting

Basic metric for classification; appropriate when classes are balanced (e.g., sentiment analysis).

#### Pros & Cons

**Pros**: Simple and intuitive.

**Cons**: Misleading in imbalanced classes.

#### Equation

\[ \text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}} \]

#### Code Implementation

In [None]:
# Manual Implementation
def accuracy_manual(y_true, y_pred):
    return np.mean(y_true == y_pred)

acc_val_manual = accuracy_manual(y_class_true, y_class_pred_binary)
print(f"Manual Accuracy: {acc_val_manual}")

# Keras Implementation
acc_keras = Accuracy()
acc_keras.update_state(y_class_true, y_class_pred_binary)
acc_val_keras = acc_keras.result().numpy()
print(f"Keras Accuracy: {acc_val_keras}")

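#### Comparison / Interpretation

Both ~0.5, as expected because the predicted probabilities are random and independent of the labels. High accuracy is only a reliable signal when the classes are balanced.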
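
A short sketch shows why accuracy misleads on imbalanced data. The 95/5 class split and the always-negative "model" are assumptions for illustration.

```python
import numpy as np

# Hypothetical imbalanced labels: 95 negatives, 5 positives
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros_like(y_true)  # trivial model: always predict the majority class

accuracy = np.mean(y_true == y_pred)
tp = np.sum((y_true == 1) & (y_pred == 1))
fn = np.sum((y_true == 1) & (y_pred == 0))
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # looks strong
print(f"Recall:   {recall:.2f}")    # but no positive is ever found
```

The model scores 95% accuracy while detecting none of the positives, which is why precision, recall, and F1 are reported alongside accuracy for skewed classes.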

### 2.2 Precision

#### Definition

Precision is the ratio of true positives to predicted positives.

#### Why It Is Used in ML / Forecasting

Prioritized when false positives are costly (e.g., fraud detection alerts).

#### Pros & Cons

**Pros**: Focuses on positive prediction quality.

**Cons**: Ignores false negatives.

#### Equation

\[ \text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}} \]

#### Code Implementation

In [None]:
# Manual Implementation
def precision_manual(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fp) if (tp + fp) > 0 else 0

prec_val_manual = precision_manual(y_class_true, y_class_pred_binary)
print(f"Manual Precision: {prec_val_manual}")

# Keras Implementation
prec_keras = Precision()
prec_keras.update_state(y_class_true, y_class_pred_binary)
prec_val_keras = prec_keras.result().numpy()
print(f"Keras Precision: {prec_val_keras}")

#### Comparison / Interpretation

Values match. High precision means few false alarms among the predicted positives.

### 2.3 Recall (Sensitivity)

#### Definition

Recall is the ratio of true positives to actual positives.

#### Why It Is Used in ML / Forecasting

Prioritized when missing a positive is costly (e.g., disease screening).

#### Pros & Cons

**Pros**: Minimizes missed positives.

**Cons**: Optimizing for it alone may increase false positives.

#### Equation

\[ \text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}} \]

#### Code Implementation

In [None]:
# Manual Implementation
def recall_manual(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return tp / (tp + fn) if (tp + fn) > 0 else 0

rec_val_manual = recall_manual(y_class_true, y_class_pred_binary)
print(f"Manual Recall: {rec_val_manual}")

# Keras Implementation
rec_keras = Recall()
rec_keras.update_state(y_class_true, y_class_pred_binary)
rec_val_keras = rec_keras.result().numpy()
print(f"Keras Recall: {rec_val_keras}")

#### Comparison / Interpretation

Matching values. Recall trades off against precision: lowering the decision threshold raises recall but usually lowers precision.
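
The precision/recall trade-off can be illustrated by sweeping the decision threshold over a small set of scores. The labels, scores, and the helper `precision_recall` below are hypothetical, picked so the counts are easy to verify by hand.

```python
import numpy as np

# Hypothetical scores: positives tend to score higher than negatives
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95])

def precision_recall(y_true, scores, threshold):
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

for t in (0.2, 0.5, 0.9):
    p, r = precision_recall(y_true, scores, t)
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold from 0.2 to 0.9 lifts precision from 0.50 to 1.00 while recall drops from 1.00 to 0.25.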

### 2.4 Specificity

#### Definition

Specificity is the ratio of true negatives to actual negatives (the true negative rate).

#### Why It Is Used in ML / Forecasting

Complements recall by focusing on the negative class (e.g., spam filtering, where legitimate mail should not be flagged).

#### Pros & Cons

**Pros**: Shows how well the negative class is protected from false alarms.

**Cons**: Ignores positives.

#### Equation

\[ \text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}} \]

#### Code Implementation

In [None]:
# Manual Implementation
def specificity_manual(y_true, y_pred):
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tn / (tn + fp) if (tn + fp) > 0 else 0

spec_val_manual = specificity_manual(y_class_true, y_class_pred_binary)
print(f"Manual Specificity: {spec_val_manual}")

# Keras Implementation (custom: Keras offers SpecificityAtSensitivity,
# which answers a different question, so plain specificity is computed here)
def specificity_keras(y_true, y_pred):
    y_true = K.cast(y_true, 'float32')
    y_pred = K.round(K.cast(y_pred, 'float32'))
    tn = K.sum((1 - y_true) * (1 - y_pred))  # true negatives
    fp = K.sum((1 - y_true) * y_pred)        # false positives
    return tn / (tn + fp + K.epsilon())

spec_val_keras = specificity_keras(y_class_true, y_class_pred_binary).numpy()
print(f"Keras Specificity: {spec_val_keras}")

#### Comparison / Interpretation

Both match. High specificity means few false positives.
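
All four counts behind precision, recall, and specificity can also be read from scikit-learn's `confusion_matrix`. The sketch below uses ten hand-made labels (an assumption, not notebook data) so the counts can be checked by eye.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels and predictions
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

# With labels=[0, 1], ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

specificity = tn / (tn + fp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"specificity={specificity:.3f}  recall={recall:.3f}  precision={precision:.3f}")
```

Passing `labels=[0, 1]` explicitly keeps the matrix 2x2 even if one class happens to be absent from a batch.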

### 2.5 F1 Score

#### Definition

F1 Score is the harmonic mean of precision and recall.

#### Why It Is Used in ML / Forecasting

Balances precision and recall in imbalanced datasets.

#### Pros & Cons

**Pros**: Single metric for trade-off.

**Cons**: Assumes precision and recall deserve equal weight; less intuitive.

#### Equation

\[ F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]

#### Code Implementation

In [None]:
# Manual Implementation
def f1_manual(y_true, y_pred):
    prec = precision_manual(y_true, y_pred)
    rec = recall_manual(y_true, y_pred)
    return 2 * (prec * rec) / (prec + rec) if (prec + rec) > 0 else 0

f1_val_manual = f1_manual(y_class_true, y_class_pred_binary)
print(f"Manual F1: {f1_val_manual}")

# Keras Implementation (Custom)
def f1_keras(y_true, y_pred):
    # Cast first so integer labels and float predictions share one dtype
    y_true = K.cast(y_true, 'float32')
    y_pred = K.round(K.cast(y_pred, 'float32'))
    tp = K.sum(y_true * y_pred)
    fp = K.sum((1 - y_true) * y_pred)
    fn = K.sum(y_true * (1 - y_pred))
    prec = tp / (tp + fp + K.epsilon())
    rec = tp / (tp + fn + K.epsilon())
    return 2 * (prec * rec) / (prec + rec + K.epsilon())

f1_val_keras = f1_keras(y_class_true, y_class_pred_binary).numpy()
print(f"Keras F1: {f1_val_keras}")

#### Comparison / Interpretation

Matching. Well suited to uneven classes when the positive class is the one that matters.

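### 2.6 ROC Curve & AUC

#### Definition

The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) across all decision thresholds; AUC is the area under that curve and measures how well the scores separate the two classes.

#### Why It Is Used in ML / Forecasting

Threshold-independent evaluation for binary classification.

#### Pros & Cons

**Pros**: Does not depend on the class proportions or on a single threshold; works directly on predicted probabilities.

**Cons**: Needs all scores sorted, which costs more than a single-threshold metric on large data; under heavy imbalance it can look optimistic.

#### Equation

\[ \text{AUC} = \int_0^1 \text{TPR} \, d(\text{FPR}) \]

In practice the integral is computed with the trapezoidal rule over the observed (FPR, TPR) points.

#### Code Implementation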

In [None]:
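# Manual Implementation (sklearn)
auc_manual = roc_auc_score(y_class_true, y_class_pred_prob)
fpr, tpr, _ = roc_curve(y_class_true, y_class_pred_prob)
print(f"Manual AUC: {auc_manual}")

# Plot ROC with the chance diagonal for reference
plt.plot(fpr, tpr, label=f"ROC (AUC = {auc_manual:.2f})")
plt.plot([0, 1], [0, 1], linestyle='--', label='Chance')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend()
plt.show()

# Keras Implementation (approximates the curve with a fixed set of thresholds)
auc_keras = AUC()
auc_keras.update_state(y_class_true, y_class_pred_prob)
auc_val_keras = auc_keras.result().numpy()
print(f"Keras AUC: {auc_val_keras}")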

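#### Comparison / Interpretation

Both ~0.5 (random); the Keras value can differ slightly because it approximates the curve with a fixed set of thresholds. 1 is perfect separation.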
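
The trapezoidal rule mentioned above is short enough to write out. This sketch also checks the rank interpretation of AUC: the probability that a randomly chosen positive scores higher than a randomly chosen negative. The synthetic scores (positives shifted up by one standard deviation) are an assumption for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Hypothetical scores: positives are shifted upward so the classifier is informative
scores = rng.normal(loc=y_true * 1.0, scale=1.0)

fpr, tpr, _ = roc_curve(y_true, scores)

# Trapezoidal rule: sum of width * average height over consecutive ROC points
auc_trapezoid = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)

# Rank interpretation: fraction of (positive, negative) pairs ranked correctly
pos = scores[y_true == 1]
neg = scores[y_true == 0]
auc_pairwise = np.mean(pos[:, None] > neg[None, :])

auc_sklearn = roc_auc_score(y_true, scores)
print(f"trapezoid={auc_trapezoid:.4f}  pairwise={auc_pairwise:.4f}  sklearn={auc_sklearn:.4f}")
```

The pairwise version is quadratic in the number of samples, so it is only practical for small data; it is shown here because it makes the meaning of AUC explicit.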

## Section 3: Statistical Concepts

### 3.1 Bias vs Variance

#### Definition

Bias is error from overly simplistic assumptions in the model; variance is error from sensitivity to the particular training sample.

#### Why It Is Used in ML / Forecasting

Underpins the trade-off in model complexity; guides regularization.

#### Pros & Cons

**Pros**: Conceptual framework for optimization.

**Cons**: Hard to measure separately on real data, because the true function is unknown.

#### Equation

For squared-error loss, the expected prediction error decomposes as:

\[ \text{Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error} \]

#### Code Implementation

In [None]:
# No direct code, but demonstration via overfitting/underfitting below

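#### Comparison / Interpretation

Balance the two for generalization: a more flexible model lowers bias but usually raises variance, and a simpler model does the reverse.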
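
On synthetic data, where the true function is known, bias and variance can be estimated directly by refitting a model on many noisy copies of the training set. The following is a minimal sketch under stated assumptions: a sine-shaped ground truth `true_fn`, fixed design points, Gaussian noise, and polynomial fits of degree 1 and 7. None of these come from the notebook's own data.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)  # assumed ground-truth function for this sketch

x_train = np.linspace(0, 1, 30)       # fixed design points
x_eval = np.linspace(0.05, 0.95, 50)  # points where bias and variance are measured
n_repeats, noise_sd = 200, 0.3

results = {}
for degree in (1, 7):
    preds = np.empty((n_repeats, x_eval.size))
    for r in range(n_repeats):
        # Redraw the noise, refit the polynomial, and record its predictions
        y_train = true_fn(x_train) + rng.normal(0, noise_sd, x_train.size)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[r] = np.polyval(coeffs, x_eval)
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_eval)) ** 2)
    variance = np.mean(preds.var(axis=0))
    results[degree] = (bias_sq, variance)
    print(f"degree={degree}: bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

The straight line cannot follow the sine curve, so its squared bias is large while its variance is small; the degree-7 polynomial shows the opposite pattern.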

### 3.2 Overfitting vs Underfitting (with visualization)

#### Definition

Overfitting: the model is too complex and fits noise. Underfitting: the model is too simple and misses patterns.

#### Why It Is Used in ML / Forecasting

Identifies poor generalization; use cross-validation to detect.

#### Pros & Cons

**Pros**: Guides model selection.

**Cons**: Subjective without metrics.

#### Equation

No specific equation; monitored via the gap between train and test error.

#### Code Implementation

In [None]:
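# Simple visualization
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

X_train, X_test, y_train, y_test = train_test_split(
    X_reg, y_reg_true, test_size=0.2, random_state=42)

# Degree 1: the simplest model. The synthetic data are linear, so this fit
# is adequate here; it would underfit a curved relationship.
model_under = LinearRegression().fit(X_train, y_train)

# Degree 15: flexible enough to chase noise (overfit)
model_over = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X_train, y_train)

# Predict on a sorted grid so each fit is drawn as a smooth curve
X_grid = np.linspace(X_reg.min(), X_reg.max(), 200).reshape(-1, 1)
y_pred_under = model_under.predict(X_grid)
y_pred_over = model_over.predict(X_grid)

# Plot
plt.scatter(X_reg, y_reg_true, alpha=0.5)
plt.plot(X_grid, y_pred_under, label='Degree 1 (simple)')
plt.plot(X_grid, y_pred_over, label='Degree 15 (overfit)')
plt.legend()
plt.show()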

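#### Comparison / Interpretation

Compare train and test error and aim for balance: an overfit model has low train error but noticeably higher test error, while an underfit model has high error on both. Because the synthetic regression data are linear, the degree-1 fit is adequate here and acts as the simple baseline.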
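
To see genuine underfitting as well as overfitting, the target has to be curved. The sketch below is an illustration under assumptions not taken from the notebook: a noisy sine target and polynomial degrees 1, 4, and 15. It reports train and test MSE for each degree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 1))
y = np.sin(2 * np.pi * X.squeeze()) + rng.normal(0, 0.3, size=120)  # assumed curved target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

errors = {}
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    errors[degree] = (
        mean_squared_error(y_tr, model.predict(X_tr)),  # train MSE
        mean_squared_error(y_te, model.predict(X_te)),  # test MSE
    )
    print(f"degree={degree:2d}  train MSE={errors[degree][0]:.3f}  test MSE={errors[degree][1]:.3f}")
```

The degree-1 model has high error on both splits (underfitting). For the high-degree model, the quantity to watch is the gap between train and test MSE, which typically widens as flexibility grows.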

### 3.3 Cross-Entropy Loss (Log Loss)

#### Definition

Measures the difference between predicted probabilities and true labels.

#### Why It Is Used in ML / Forecasting

Standard loss for classification; encourages calibrated probabilities.

#### Pros & Cons

**Pros**: Probabilistic interpretation.

**Cons**: Sensitive to extremes: a confident but wrong prediction incurs a very large loss.

#### Equation

\[ \text{CE} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \]

#### Code Implementation

In [None]:
# Manual Implementation
def ce_manual(y_true, y_pred):
    y_pred = np.clip(y_pred, 1e-15, 1 - 1e-15)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

ce_val_manual = ce_manual(y_class_true, y_class_pred_prob)
print(f"Manual CE: {ce_val_manual}")

# Keras Implementation
ce_keras = BinaryCrossentropy()
ce_val_keras = ce_keras(y_class_true, y_class_pred_prob).numpy()
print(f"Keras CE: {ce_val_keras}")

#### Comparison / Interpretation

Lower is better; this is the loss that logistic regression minimizes.
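
The sensitivity to extremes is easiest to see on a single positive example scored with different predicted probabilities. The helper `binary_cross_entropy` and the probabilities below are illustrative assumptions.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-15):
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0])  # a single positive example
losses = {}
for p in (0.9, 0.5, 0.1, 0.01):
    losses[p] = binary_cross_entropy(y_true, np.array([p]))
    print(f"predicted P(y=1)={p:<4}  loss={losses[p]:.3f}")
```

The loss for this example is simply -log(p), so it grows without bound as the predicted probability of the true class approaches zero.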

### 3.4 KL Divergence

#### Definition

Measures how one probability distribution P diverges from a reference distribution Q.

#### Why It Is Used in ML / Forecasting

Used in variational autoencoders and other generative models for distribution matching.

#### Pros & Cons

**Pros**: Quantifies the information lost when Q is used to approximate P.

**Cons**: Not a true metric (it is asymmetric); infinite when Q(x) = 0 where P(x) > 0.

#### Equation

\[ \text{KL}(P || Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)} \]

#### Code Implementation

In [None]:
# Manual Implementation (for discrete distributions)
def kl_manual(p, q):
    p = np.clip(p, 1e-15, 1)
    q = np.clip(q, 1e-15, 1)
    return np.sum(p * np.log(p / q))

# Example distributions
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])
kl_val_manual = kl_manual(p, q)
print(f"Manual KL: {kl_val_manual}")

# Keras Implementation
def kl_keras(p, q):
    p = K.clip(p, K.epsilon(), 1)
    q = K.clip(q, K.epsilon(), 1)
    return K.sum(p * K.log(p / q), axis=-1)

kl_val_keras = kl_keras(p, q).numpy()
print(f"Keras KL: {kl_val_keras}")

#### Comparison / Interpretation

0 means identical distributions; larger values mean Q is a poorer approximation of P.
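
The asymmetry of KL divergence can be verified on the same two example distributions by swapping the arguments. The helper `kl_divergence` below is a sketch written for this check.

```python
import numpy as np

def kl_divergence(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with P(x) = 0 contribute nothing to the sum
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.2, 0.3, 0.5])

kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
kl_pp = kl_divergence(p, p)
print(f"KL(P||Q)={kl_pq:.4f}  KL(Q||P)={kl_qp:.4f}  KL(P||P)={kl_pp:.4f}")
```

Because the two directions give different values, the choice of which distribution plays the role of P matters when KL is used as a training objective.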

## Section 4: Other Useful ML Metrics

### 4.1 Huber Loss (robust regression)

#### Definition

A hybrid of MSE and MAE: quadratic for small errors and linear for large ones, which makes it less sensitive to outliers than MSE.

#### Why It Is Used in ML / Forecasting

Robust loss for regression with outliers (e.g., financial data).

#### Pros & Cons

**Pros**: Balances sensitivity.

**Cons**: Requires tuning the threshold \(\delta\).

#### Equation

\[ \text{Huber} = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & |y - \hat{y}| \leq \delta \\ \delta |y - \hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases} \]

#### Code Implementation

In [None]:
# Manual Implementation
def huber_manual(y_true, y_pred, delta=1.0):
    error = y_true - y_pred
    is_small = np.abs(error) <= delta
    squared = 0.5 * (error ** 2)                          # quadratic branch
    linear = delta * np.abs(error) - 0.5 * (delta ** 2)   # linear branch
    return np.mean(np.where(is_small, squared, linear))

huber_val_manual = huber_manual(y_reg_true, y_reg_pred)
print(f"Manual Huber: {huber_val_manual}")

# Keras Implementation
from tensorflow.keras.losses import Huber
huber_keras = Huber()
huber_val_keras = huber_keras(y_reg_true, y_reg_pred).numpy()
print(f"Keras Huber: {huber_val_keras}")

#### Comparison / Interpretation

Behaves like MAE for large errors (linear growth) and like MSE for small errors (quadratic growth); \(\delta\) sets the switch point.
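
Evaluating the per-sample Huber loss at a few error sizes shows where the quadratic branch hands over to the linear one. The helper `huber` and the chosen errors are illustrative, with `delta=1.0` assumed.

```python
import numpy as np

def huber(error, delta=1.0):
    abs_err = np.abs(error)
    # Quadratic inside [-delta, delta], linear outside
    return np.where(abs_err <= delta, 0.5 * error ** 2, delta * abs_err - 0.5 * delta ** 2)

errors = np.array([0.5, 1.0, 3.0, 10.0])
huber_vals = huber(errors)
for e, h in zip(errors, huber_vals):
    print(f"error={e:5.1f}  huber={h:6.3f}  0.5*error^2={0.5 * e ** 2:7.3f}  |error|={abs(e):5.1f}")
```

For an error of 10 the quadratic loss would be 50, while Huber charges only 9.5, which is why a single outlier pulls the fit much less.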

### 4.2 Cosine Similarity (NLP / embeddings)

#### Definition

Measures the cosine of the angle between two vectors as a similarity score.

#### Why It Is Used in ML / Forecasting

Used in NLP for semantic similarity and for comparing embeddings.

#### Pros & Cons

**Pros**: Independent of vector magnitude, so vectors of different lengths are comparable.

**Cons**: Discards magnitude, which is a drawback when vector length carries meaning.

#### Equation

\[ \text{Cosine} = \frac{A \cdot B}{\|A\| \|B\|} \]

#### Code Implementation

In [None]:
# Manual Implementation
def cosine_manual(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Example vectors
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
cos_val_manual = cosine_manual(a, b)
print(f"Manual Cosine: {cos_val_manual}")

# TensorFlow Implementation (the Keras backend has no linalg module,
# and K.dot does not accept 1-D vectors, so use tf ops directly)
def cosine_keras(a, b):
    return tf.reduce_sum(a * b) / (tf.norm(a) * tf.norm(b))

cos_val_keras = cosine_keras(tf.constant(a, dtype='float32'), tf.constant(b, dtype='float32')).numpy()
print(f"Keras Cosine: {cos_val_keras}")

#### Comparison / Interpretation

1 means identical direction, 0 means orthogonal, and -1 means opposite direction.
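
The three reference values, and the fact that scaling a vector leaves the score unchanged, can be checked directly. The vectors below are made up for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])

cos_scaled = cosine_similarity(a, 3 * a)   # same direction, different length
cos_opposite = cosine_similarity(a, -a)    # opposite direction
cos_orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))

print(f"scaled: {cos_scaled:.3f}  opposite: {cos_opposite:.3f}  orthogonal: {cos_orthogonal:.3f}")
```

Tripling the vector does not change the score, which is the magnitude independence listed under Pros & Cons.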