# Model Bias versus Model Variance

| Signs of Bias | Signs of Variance |
| --- | --- |
| Poor or inconsistent intuition | Poor intuition with new & testing data |
| Poor intuition with Training data | Noise in data set |
| Poor intuition compared to similar models | Overfitting |
| Underfitting | Complexity |
| Simplicity | High MSE |

Model Bias ≠ ML/AI Bias. They are different concepts: Model Bias is about generalisation error.

| Term | Definition |
| --- | --- |
| Generalisation | The ability of a model to make accurate predictions or classifications on data it wasn't trained on |
| Bias | Bias represents the difference between the model's average prediction (averaged over different training sets) and the true value |
| Variance | Variance measures the variability or inconsistency of predictions made by a model when trained on different subsets of the same data. It reflects how much the model's predictions change if the training data is shuffled or split differently. |
| Noise | Random or irrelevant data points or variations that obscure meaningful patterns and can negatively impact the accuracy of analysis or machine learning models |

The bias-variance tradeoff in machine learning refers to the relationship between two sources of error: bias, where the model is too simple to fit even the training data, and variance, where the model is so sensitive to the particular training set that it fails to generalise to new, unseen data. It's a balancing act: reducing bias often increases variance, and vice versa. The goal is to find the model complexity that minimises overall error.
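The tradeoff can be made concrete by estimating both quantities numerically. The sketch below is illustrative only. It assumes a known quadratic ground truth (the same curve plotted elsewhere in this notebook), Gaussian noise, and polynomial models fitted with `numpy.polynomial.Polynomial.fit`. The helper names `true_f` and `bias_variance` and every parameter value are hypothetical choices and are not part of `utils_common`.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_f(x):
    # Assumed ground truth for this sketch
    return -x**2 + 4*x + 2


def bias_variance(degree, n_trials=300, noise_sd=10.0, seed=0):
    """Estimate squared bias and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(-10, 10, 20)
    x_test = np.linspace(-10, 10, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Each trial is a fresh noisy training set drawn from the same process
        y_train = true_f(x_train) + rng.normal(0, noise_sd, x_train.size)
        model = Polynomial.fit(x_train, y_train, degree)
        preds[t] = model(x_test)
    mean_pred = preds.mean(axis=0)
    # Bias: how far the average prediction is from the truth
    bias_sq = float(np.mean((mean_pred - true_f(x_test)) ** 2))
    # Variance: how much predictions move from one training set to the next
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance


results = {d: bias_variance(d) for d in (1, 2, 8)}
for d, (b2, var) in results.items():
    print(f"degree {d}: bias^2 = {b2:.2f}, variance = {var:.2f}")
```

Under these assumptions the straight line (degree 1) should show by far the largest squared bias, while the degree-8 polynomial should show the largest variance.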

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from utils_common import compute_model_output, generate_data
plt.style.use('ggplot')

In [None]:
# Three panels side by side: overfitting, underfitting and a good fit
o = np.sort(generate_data(-10, 10, -10, 10, 8, 0.2))

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Panel 0 (high variance): the red line passes through every training point
axes[0].scatter(o[0], o[1], color='blue', s=100)
axes[0].plot(o[0], o[1], color='red')

# Underlying quadratic curve, drawn on panel 2 as the good fit
x = np.linspace(10, -10, 100)
y = -x**2 + 4*x + 2
axes[2].plot(x, y, color='red')

# Noisy samples scattered around the curve, shown on panels 1 and 2
x = x + np.random.uniform(-5, 5, size=x.shape)
y = y + np.random.uniform(-10, 10, size=y.shape)
axes[1].scatter(x, y, c='b')
axes[2].scatter(x, y, c='b')

# Panel 1 (high bias): a straight line cannot follow the curved data
x_lin = np.array([-10, 10])
tmp_f_mb = compute_model_output(x_lin, 10, -10)
axes[1].plot(x_lin, tmp_f_mb, c='r')

axes[0].set_title("High Variance/Overfitting")
axes[1].set_title("High Bias/Underfitting")
axes[2].set_title("Low Bias and Low Variance/Good Fit")

# Hide ticks; only the shape of each fit matters here.
# A separate loop variable avoids overwriting the axes array.
for axis in axes.flat:
    axis.set_xticks([])
    axis.set_yticks([])

plt.show()

#### Relationship between Bias, Variance and Generalisation (Fit/Intuition)

##### 1. High Variance (Overfitting):

A model with high variance is too complex and learns the training data too well, including the noise.

__Example__: Using a very high-degree polynomial regression model on a relatively small dataset.

__Consequences__: Excellent performance on the training data but poor performance on new data, as the model has memorized the training set instead of learning the underlying patterns.
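The gap between training and test error can be reproduced with a small experiment. This is a minimal sketch under assumptions not in the original: a quadratic ground truth with Gaussian noise, ten evenly spaced training points, and a degree-9 polynomial fitted with `numpy.polynomial.Polynomial.fit`. The names `sample_targets`, `train_mse` and `test_mse` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
noise_sd = 10.0


def sample_targets(x):
    # Assumed data-generating process: quadratic signal plus Gaussian noise
    return -x**2 + 4*x + 2 + rng.normal(0, noise_sd, x.size)


# Small training set, larger test set drawn from the same range
x_train = np.linspace(-10, 10, 10)
y_train = sample_targets(x_train)
x_test = rng.uniform(-10, 10, 200)
y_test = sample_targets(x_test)

# Degree 9 with 10 points: enough freedom to pass through every training point
overfit = Polynomial.fit(x_train, y_train, deg=9)

train_mse = float(np.mean((overfit(x_train) - y_train) ** 2))
test_mse = float(np.mean((overfit(x_test) - y_test) ** 2))
print(f"train MSE = {train_mse:.3g}, test MSE = {test_mse:.3g}")
```

With as many coefficients as training points, the polynomial interpolates the training set, noise included, so the training error is close to zero while the test error is not.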

##### 2. High Bias (Underfitting):

A model with high bias is too simple and doesn't capture the underlying patterns in the data.

__Example__: Using a linear regression model to fit highly non-linear data.

__Consequences__: Poor performance on both training and test data, as the model is unable to represent the true relationship.
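Underfitting shows up as an error that stays high on both splits. The sketch below assumes a quadratic ground truth with Gaussian noise and fits a straight line with `np.polyfit`. The names `sample`, `noise_floor`, `train_mse` and `test_mse` are hypothetical and chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_sd = 10.0


def sample(n):
    # Assumed data-generating process: quadratic signal plus Gaussian noise
    x = rng.uniform(-10, 10, n)
    y = -x**2 + 4*x + 2 + rng.normal(0, noise_sd, n)
    return x, y


x_train, y_train = sample(50)
x_test, y_test = sample(50)

# Degree-1 fit: np.polyfit returns coefficients from highest power down
w, b = np.polyfit(x_train, y_train, 1)

train_mse = float(np.mean((w * x_train + b - y_train) ** 2))
test_mse = float(np.mean((w * x_test + b - y_test) ** 2))

# The noise variance is the error a perfect model would still incur
noise_floor = noise_sd ** 2
print(f"train MSE = {train_mse:.1f}, test MSE = {test_mse:.1f}, noise floor = {noise_floor:.1f}")
```

Both errors sit well above the noise floor, and collecting more data would not help, because a line cannot represent the curvature.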


##### 3. Low Variance / Low Bias (Good Fit or Intuition):

The model is complex enough to capture the underlying patterns but not so complex that it overfits to the noise in the training data. The model should perform equally well with the training data and new/unseen data such as testing data or real world data.
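One common way to look for that level of complexity is to compare candidate models on held-out data. The following sketch is an assumption-laden illustration rather than a prescribed method: it uses a quadratic ground truth with Gaussian noise, sweeps polynomial degrees 1 to 8, and keeps the degree with the lowest validation error. The names `val_mse` and `best_degree` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
noise_sd = 10.0


def true_f(x):
    # Assumed ground truth for this sketch
    return -x**2 + 4*x + 2


# Training set on an even grid; validation set drawn inside the same range
x_train = np.linspace(-10, 10, 30)
y_train = true_f(x_train) + rng.normal(0, noise_sd, x_train.size)
x_val = rng.uniform(-10, 10, 30)
y_val = true_f(x_val) + rng.normal(0, noise_sd, x_val.size)

val_mse = {}
for degree in range(1, 9):
    model = Polynomial.fit(x_train, y_train, degree)
    # Score each candidate on data it was not fitted to
    val_mse[degree] = float(np.mean((model(x_val) - y_val) ** 2))

best_degree = min(val_mse, key=val_mse.get)
print(f"validation MSE by degree: {val_mse}")
print(f"selected degree: {best_degree}")
```

The selected degree depends on the random draw, so this is a way of choosing between candidates on one dataset, not a guarantee of recovering the true degree.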

## Inconsistent Fit/Intuition

Solutions ???

In [None]:
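# Two groups of generated points that share one plot
m = generate_data(25, 50, 25, 50, 100, 0.9)
n = generate_data(0, 50, 0, 50, 10, 0.01)
w = 1
b = -1

# Evaluate a single straight-line model over the features of both groups
x_all = np.concatenate([m[0], n[0]])
tmp_f_wb = compute_model_output(x_all, w, b)
plt.plot(x_all, tmp_f_wb, c='r', label=f'Prediction: y = {w}x + {b}')
plt.xlabel("Feature")
plt.ylabel("Target")
plt.scatter(m[0], m[1], color='blue')
plt.scatter(n[0], n[1], color='blue')
plt.legend()  # without this call the label on the prediction line is never shown
plt.show()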