Short Version (just read this part)

Given some datapoints, we can model the data with a regression model.
However, when choosing a regression model, one may under- or overfit the data.
Two key quantities that determine the error of a model, alongside any intrinsic error or noise in the data itself, are the variance and the bias of the model.
Variance measures the spread of the fit function, while bias measures the deviation of the model from the true function that describes the data.
Variance is typically small for low-order polynomial fits, while bias is often larger for very simple models, e.g. low-order polynomials.

Bias and Variance

Given some datapoints, we can model the data with a regression model.
However, when choosing a regression model, one may under- or overfit the data.
Two key quantities that determine the error of a model, alongside any intrinsic error or noise in the data itself, are the variance and the bias of the model.

Variance:
Let $<f(D)>$ denote the expectation value of a model f over a dataset D.
The variance of the model for a given dataset is then ${\rm Var}(f)=<f^2(D)> - <f(D)>^2$.
A high variance indicates overfitting, since the model adapts strongly to variations in the dataset.
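
As a quick check of the variance formula, the following minimal sketch evaluates $<f^2> - <f>^2$ directly and compares it with NumPy's built-in population variance. The array `f_values` is an assumed stand-in for the model outputs $f(D)$; it is not taken from the text above.

```python
import numpy as np

# Assumed stand-in for the model outputs f(D): exp(x) for x = 1..7.
f_values = np.exp(np.arange(1, 8, dtype=float))

# Var(f) = <f^2> - <f>^2, with <.> taken as the mean over the data points.
var_moments = np.mean(f_values ** 2) - np.mean(f_values) ** 2

# np.var uses ddof=0 by default, i.e. the same population variance.
var_numpy = np.var(f_values)

print('Var (moments) ', var_moments)
print('Var (np.var)  ', var_numpy)
```

Both numbers should agree up to floating-point rounding, since they are two ways of writing the same quantity.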

Bias:
Bias measures how far our modeling assumptions lie from the best possible description of the dataset.
Let us assume we know the perfect model $\hat{f}$.
Then the bias of the model corresponds to ${\rm Bias}(f, \hat{f}) = <\hat{f}> - <f>$.

Then the total error of a model can be approximated as intrinsic error + Var + Bias$^2$.
In general we want to keep said error as low as possible.
The intrinsic error reflects the fact that no perfect model exists in general, as well as the incompleteness and variability of the dataset in question.
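
The error estimate can be wrapped in a small helper. This is a minimal sketch under stated assumptions: the function name `error_terms` is hypothetical, expectation values are taken as plain means over the data points, and the intrinsic error is set to zero because it cannot be computed from the model alone.

```python
import numpy as np

def error_terms(f_true, f_model):
    """Return (bias, variance, total error) for arrays of model values.

    Hypothetical helper: bias = <f_true> - <f_model>,
    variance = <f_model^2> - <f_model>^2, total = variance + bias^2.
    The intrinsic error is assumed to be zero here.
    """
    f_true = np.asarray(f_true, dtype=float)
    f_model = np.asarray(f_model, dtype=float)
    bias = np.mean(f_true) - np.mean(f_model)
    variance = np.mean(f_model ** 2) - np.mean(f_model) ** 2
    return bias, variance, variance + bias ** 2

# Usage: if the model equals the true function, the bias is exactly zero.
x = np.arange(1, 8, dtype=float)
y_true = np.exp(x)
bias, variance, total = error_terms(y_true, y_true)
print('Bias ', bias, ' Variance ', variance, ' Err ', total)
```

Passing two different arrays, for example the true values and the values of a fitted polynomial, gives a nonzero bias term.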

Example:
Data points

In [34]:
import numpy as np
x = [1, 2, 3, 4, 5, 6, 7]
y = np.exp(x)

Model exp and true model exp:
In that case the bias obviously vanishes for this dataset, but the variance does not.


In [35]:
# Assumes both \hat{f} and f are exponential
Erx = 0   # accumulates <f>
Erx2 = 0  # accumulates <f^2>
for v in x:
    Erx2 += np.exp(v) ** 2
    Erx += np.exp(v)
Erx2 /= len(x)
Erx /= len(x)
# Var(f) = <f^2> - <f>^2
T1 = Erx2 - Erx ** 2
print('Var ', T1)
# Bias vanishes, so Err = Var

Var  137379.83489128752


For other models the bias will not vanish, but the variance may be smaller. Example: $a x^2$

In [36]:
# Assumes \hat{f} = exp(x) and f = a * x^2
# Least-squares coefficient: a = sum(x^2 * exp(x)) / sum(x^4)
s1 = 0
s2 = 0
for v in x:
    s1 += v ** 2 * np.exp(v)
    s2 += v ** 4
a = s1 / s2
Er = 0    # accumulates <\hat{f}> = <exp(x)>
Erx2 = 0  # accumulates <x^2>, scaled to <f> below
Erx4 = 0  # accumulates <x^4>, scaled to <f^2> below
for v in x:
    Er += np.exp(v)
    Erx2 += v ** 2
    Erx4 += v ** 4
Er /= len(x)
Erx2 /= len(x)
Erx2 *= a
Erx4 /= len(x)
Erx4 *= a * a
# Err = Bias^2 + Var (intrinsic error not included)
T2 = (Er - Erx2) ** 2 + Erx4 - Erx2 ** 2
print('Bias ', Er - Erx2)
print('Variance ', Erx4 - Erx2 ** 2)
print('Err ', T2)
print('Exp error over x^2 error: ', T1/T2)

Bias  -64.86020575727244
Variance  65416.98062232368
Err  69623.8269131994
Exp error over x^2 error:  1.973172705122344


Bad example!

Works for small x_i, but no longer works once several large x_i are included, and becomes progressively worse.
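
One way to probe this remark is to repeat the comparison for growing ranges x = 1, ..., n. The following is a minimal sketch, not part of the original notebook: the helper name `error_ratio` and the chosen values of n are assumptions for illustration, and the computation is a vectorized form of the loops used above.

```python
import numpy as np

def error_ratio(n):
    """Ratio of the exp-model error to the a*x^2-model error for x = 1..n.

    Hypothetical helper mirroring the cells above: the true function is
    exp(x), and a is the least-squares coefficient of the fit a * x^2.
    """
    x = np.arange(1, n + 1, dtype=float)
    y = np.exp(x)
    # Exponential model: bias vanishes, so the error is the variance alone.
    t1 = np.mean(y ** 2) - np.mean(y) ** 2
    # Quadratic model: a = sum(x^2 * exp(x)) / sum(x^4)
    a = np.sum(x ** 2 * y) / np.sum(x ** 4)
    f = a * x ** 2
    bias = np.mean(y) - np.mean(f)
    var = np.mean(f ** 2) - np.mean(f) ** 2
    return t1 / (bias ** 2 + var)

# Print the ratio for a few dataset sizes to see how it changes with n.
for n in (3, 5, 7, 10, 15):
    print(n, error_ratio(n))
```

For n = 7 this reproduces the ratio printed above, which serves as a sanity check before looking at larger n.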

Pitfall