In [11]:
%use dataframe
%use lets-plot
%use kotlin-dl

# MSE - Mean Squared Error

## What is MSE?
MSE is a commonly used metric that measures the average of the squared errors, that is, the squared differences between the actual values and the values predicted by a model.

### 📌 Formula:

$$
MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2
$$

Where:
* $y_i$ is the actual (true) value.
* $\hat{y}_i$ is the predicted value.
* $n$ is the total number of data points.

### 📊 Example:

| Actual ($y$) | Predicted ($\hat{y}$) |
|--------------|-----------------------|
| 3            | 2.5                   |
| 5            | 4.8                   |
| 2            | 2.2                   |


Calculate MSE:
$MSE = \frac{(3-2.5)^2 + (5-4.8)^2 + (2-2.2)^2}{3} = \frac{0.25 + 0.04 + 0.04}{3} = \frac{0.33}{3} = 0.11$
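
The arithmetic above can be double-checked with a few lines of Kotlin. This is only a minimal sketch: `squaredErrors` and the `example…` names are hypothetical, introduced here for illustration, and the values come from the example table.

```kotlin
// Hypothetical helper: per-point squared errors (y_i - ŷ_i)^2.
fun squaredErrors(actual: List<Double>, predicted: List<Double>): List<Double> {
    require(actual.size == predicted.size) { "Both lists must have the same size" }
    return actual.zip(predicted) { y, yHat -> (y - yHat) * (y - yHat) }
}

// Values taken from the example table.
val exampleActual = listOf(3.0, 5.0, 2.0)
val examplePredicted = listOf(2.5, 4.8, 2.2)

// Roughly 0.25, 0.04 and 0.04, up to floating-point rounding.
val exampleTerms = squaredErrors(exampleActual, examplePredicted)

// The mean of the squared errors is the MSE, about 0.11.
val exampleMse = exampleTerms.average()
```

Keeping the per-point terms in a list makes it easy to see which data point contributes most to the final value.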

### How does it look on a graph?
If you keep computing the MSE for the input values you have, you'll see a graph similar to this:
![MSE](./images/mse.loss.function.behaviour.png)


### Let's do some practice

Let's create the function: $MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$

In [8]:
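import kotlin.math.abs
import kotlin.math.pow

// Mean squared error: the average of the squared differences between paired values.
fun meanSquaredError(actual: List<Double>, predicted: List<Double>): Double {
    require(actual.size == predicted.size) { "Both lists must have the same size" }
    // abs() is not strictly needed here because squaring already removes the sign.
    return actual.zip(predicted).sumOf { (a, b) -> abs(a - b).pow(2.0) } / actual.size
}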

The values below are the simulated inputs:

In [14]:
// Your actual and predicted values
val actual = listOf(-0.3, -0.2, 5.1, 50.0)
val predicted = listOf(0.2, -0.8, 4.4, 35.0)
val index = actual.indices.map { it.toDouble() }.toList() // Create an index for plotting

val output = meanSquaredError(actual, predicted) // the output should be 56.525
val outputFormatted = "MSE: %.3f".format(output)

The output should be: 56.525

Now let's plot it on a graph:

In [13]:
val df = dataFrameOf(
    "Index" to index,
    "Actual" to actual,
    "Predicted" to predicted,
    // Per-point squared error; the mean of this column is the MSE
    "SquaredError" to actual.zip(predicted).map { (a, b) -> abs(a - b).pow(2.0) }
)
df

Index,Actual,Predicted,SquaredError
0.0,-0.3,0.2,0.25
1.0,-0.2,-0.8,0.36
2.0,5.1,4.4,0.49
3.0,50.0,35.0,225.0


In [15]:
val plot = letsPlot(df.toMap()) +
        geomPoint(color = "blue", size = 3) {
            x = "Index"
            y = "Actual"
        } +
        geomLine(color = "blue", size = 1.0, linetype = "dashed") { // Optional: connect actual points
            x = "Index"
            y = "Actual"
        } +
        geomPoint(color = "red", size = 3, shape = 4) { // Shape 4 is 'x'
            x = "Index"
            y = "Predicted"
        } +
        geomLine(color = "red", size = 1.0) { // Optional: connect predicted points
            x = "Index"
            y = "Predicted"
        } +
        geomSegment(color = "orange", alpha = 0.6, size = 0.8) {
            x = "Index"
            y = "Actual"
            xend = "Index"
            yend = "Predicted"
        } +
        labs(
            title = "Actual vs. Predicted Values",
            x = "Data Point Index",
            y = "Value"
        ) +
        // Add text annotation for MSE
        geomText(
            x = actual.size * 0.8, // Position the text towards the right
            y = (actual.maxOrNull() ?: 0.0) + 1.0, // Position slightly above max value
            label = outputFormatted,
            size = 4.0,
            color = "darkgreen"
        ) +
        theme().legendPositionNone() // Hide legend if not needed

plot.show()

MSE is useful when you want to penalize large errors more heavily, since each error is squared. Use it when large errors and outliers are important to detect and avoid, because the metric is sensitive to them.
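
To make the effect of squaring concrete, the sketch below compares MSE with MAE (Mean Absolute Error) on the simulated values used earlier, with and without the last point, whose error is 15. The helpers `mseOf` and `maeOf` and the variable names are hypothetical, defined here only so the block is self-contained.

```kotlin
import kotlin.math.abs

// Hypothetical helper: mean of the squared differences.
fun mseOf(actual: List<Double>, predicted: List<Double>): Double {
    require(actual.size == predicted.size) { "Both lists must have the same size" }
    return actual.zip(predicted) { y, yHat -> (y - yHat) * (y - yHat) }.average()
}

// Hypothetical helper: mean of the absolute differences.
fun maeOf(actual: List<Double>, predicted: List<Double>): Double {
    require(actual.size == predicted.size) { "Both lists must have the same size" }
    return actual.zip(predicted) { y, yHat -> abs(y - yHat) }.average()
}

// Same values as the simulation; the last pair is off by 15.0.
val ys = listOf(-0.3, -0.2, 5.1, 50.0)
val yHats = listOf(0.2, -0.8, 4.4, 35.0)

val maeAll = maeOf(ys, yHats)   // about 4.2
val mseAll = mseOf(ys, yHats)   // about 56.525

// Drop the last (large-error) pair and recompute.
val maeTrimmed = maeOf(ys.dropLast(1), yHats.dropLast(1))   // about 0.6
val mseTrimmed = mseOf(ys.dropLast(1), yHats.dropLast(1))   // about 0.367
```

Adding the single large-error point raises MAE by a factor of about 7, while MSE grows by a factor of roughly 150, which is the quadratic penalty in action.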

The table below summarizes the differences between MSE and MAE (Mean Absolute Error):

| Metric | Sensitive to Outliers | Penalizes Large Errors | Easy to Interpret | Smooth for Training |
| ------ | --------------------- | ---------------------- | ----------------- | ------------------- |
| MAE    | No                    | No                     | Yes               | No (non-smooth)     |
| MSE    | Yes                   | Yes (quadratic)        | Somewhat          | Yes (smooth loss)   |
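
The "Smooth for Training" column can be read off the derivatives of the two metrics with respect to a single prediction. As a short sketch, assuming the usual definition $MAE = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$ (which is not spelled out above):

$$
\frac{\partial MSE}{\partial \hat{y}_i} = -\frac{2}{n}(y_i - \hat{y}_i)
\qquad
\frac{\partial MAE}{\partial \hat{y}_i} = -\frac{1}{n}\,\mathrm{sign}(y_i - \hat{y}_i)
$$

The MSE gradient shrinks continuously to zero as the prediction approaches the actual value. The MAE gradient keeps a constant magnitude of $\frac{1}{n}$ and is undefined at $y_i = \hat{y}_i$, which is the non-smooth point the table refers to.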
