# Methods to fit an overparameterized linear model

Basic algebra teaches us that a system of N equations in N unknown parameters can be solved, with a unique solution, as long as the equations are independent. But perfect solvability isn't the only situation. Define p as the number of unknown parameters, and suppose there are N equations. Then there are two other cases:
* underparameterized: When p < N, there are generally no exact solutions (unless some equations are redundant, in which case the system can be rewritten with fewer equations).
* overparameterized: When p > N, there are many possible solutions.

Here I focus on the overparameterized case.  How do we choose a single solution?

# Setting up the example

But first, consider the simplest case of N equations with N unknowns, which is one equation with one unknown.  Here is an example:

`10 = 2 x`

That has one solution, namely `x = 5`.  (Note that we solve this by multiplying 10 by the inverse of 2, which is 1/2.)
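
As a minimal sketch of that inverse step, the same one-equation system can be written as a 1x1 matrix problem in NumPy. The names `A`, `b`, `x_via_inverse`, and `x_via_solve` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# The single equation 10 = 2 x, written as A @ [x] = b with a 1x1 matrix.
A = np.array([[2.0]])
b = np.array([10.0])

# Multiply b by the inverse of A, mirroring "10 times 1/2".
x_via_inverse = np.linalg.inv(A) @ b

# np.linalg.solve reaches the same answer without forming the inverse explicitly.
x_via_solve = np.linalg.solve(A, b)

print(x_via_inverse, x_via_solve)
```

Both routes should agree on `x = 5`. The matrix form matters only because it carries over to the case below, where the coefficient matrix is no longer square and an ordinary inverse does not exist.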

Now consider the overparameterized case, which means there are more unknowns than equations, i.e. p > N. In the simplest case, there is one equation with two unknowns.  Let's add one more parameter to the above equation:

`10 = 2 x + 4 y`

Clearly, there are many possible solutions: for any value of x, setting `y = (10 - 2 x) / 4` satisfies the equation.
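
One way to see why the solutions never run out is to start from any single solution and move along a direction that leaves `2 x + 4 y` unchanged. The sketch below assumes the starting solution `(5, 0)` and the direction `(-2, 1)`. Both are choices made here for illustration, and all variable names are hypothetical.

```python
import numpy as np

coeffs = np.array([2.0, 4.0])        # coefficients of x and y in 10 = 2 x + 4 y
particular = np.array([5.0, 0.0])    # one solution: 2*5 + 4*0 = 10
direction = np.array([-2.0, 1.0])    # 2*(-2) + 4*1 = 0, so steps along it keep the sum at 10

# Every value of t gives another (x, y) pair that satisfies the equation.
ts = [0.0, 1.0, 2.0, 2.5]
solutions = [particular + t * direction for t in ts]

for t, sol in zip(ts, solutions):
    print(f"t={t}: x={sol[0]}, y={sol[1]}, 2x+4y={coeffs @ sol}")
```

Since `t` can be any real number, the solutions form a whole line in the (x, y) plane, which is why some extra criterion is needed to single one out.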


# Libraries

In [1]:
import pandas as pd
import numpy as np

pd.set_option('display.max_rows', 14)

# Example solutions

We pick some possible values of x and compute the y that satisfies the equation for each.

In [2]:
x = np.arange(0.5, 1.3, 0.1)

In [3]:
df = pd.DataFrame.from_dict({'x':x, 'y':(10-2*x)/4})

In [4]:
print(df.to_string(index=False))

  x    y
0.5 2.25
0.6 2.20
0.7 2.15
0.8 2.10
0.9 2.05
1.0 2.00
1.1 1.95
1.2 1.90


# Choosing a solution with the min L2 norm

One possible way to choose a solution is to pick the one that minimizes the L2 norm of the parameters, which is equivalent to minimizing the squared norm `x^2 + y^2`.  It is possible to solve for this in closed form, and that would lead to the Moore-Penrose inverse.  But here let's just look at the L2 values (the squared norms) for our candidate solutions and choose the minimum.

In [5]:
df["L2"] = df["x"]**2 + df["y"]**2
print(df.to_string(index=False))

  x    y     L2
0.5 2.25 5.3125
0.6 2.20 5.2000
0.7 2.15 5.1125
0.8 2.10 5.0500
0.9 2.05 5.0125
1.0 2.00 5.0000
1.1 1.95 5.0125
1.2 1.90 5.0500


Among these candidates, the minimum L2 value, 5.00, is attained at:
`(x=1.0, y=2.0)`

So let's choose that solution.
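
As a cross-check on the table, the closed-form minimum-norm solution mentioned above can be computed with NumPy's Moore-Penrose pseudoinverse, `np.linalg.pinv`. This is a sketch that is not part of the original notebook, and the names `A`, `b`, and `min_norm` are assumptions for illustration.

```python
import numpy as np

# The equation 10 = 2 x + 4 y as A @ [x, y] = b, with one row and two columns.
A = np.array([[2.0, 4.0]])
b = np.array([10.0])

# The pseudoinverse picks, among all exact solutions, the one with the smallest L2 norm.
min_norm = np.linalg.pinv(A) @ b

print(min_norm)      # the (x, y) pair
print(A @ min_norm)  # should reproduce b, confirming it solves the equation
```

The result should agree with the `(x=1.0, y=2.0)` row found by scanning the table. The scan only worked exactly because the grid in `In [2]` happened to include `x = 1.0`; the pseudoinverse gives the minimizer directly, without depending on the grid.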