# Week 3: Least Squares Fitting (cont.)

## Goals
- Nonlinear fitting
- Computing $r^2$ values

## Nonlinear fitting

We will take the example we looked at in class and find a parabola of best fit. 

Thus, we want to find coefficients $b_0$, $b_1$, and $b_2$ for 
$$
    y = b_0 + b_1x_1 + b_2x_2,
$$
where $x_1 = x$ and $x_2 = x^2$.

The data for this example is found in `data/nonlinear_ex.csv`.

In [None]:
import pandas as pd 

df = pd.read_csv("data/nonlinear_ex.csv")
print(df)

Alternatively, you can run the following code to get the same data. To turn it "on" remove all of the `#` symbols. 

In [None]:
# df = pd.DataFrame({
#     "x_i" : [2.27, 5.06, 1.45, 5.89, 0.48, -0.22, 1.44, -1.77, 2.45, -1.54, 7.55, 1.76, 5.16, 3.26, 3.23, 0.85],
#     "y_i" : [2.5, -16.13, 4.23, -22.46, 1.37, 0.86, 11.85, -14.71, 9.42, -14.07, -55.62, 4.45, -19.56, -2.79, 5.2, 8.09],
# })

Let's plot our data and verify it is what we expect. 

In [None]:
import matplotlib.pyplot as plt 

fig, ax = plt.subplots()
ax.scatter(df["x_i"], df["y_i"])
ax.grid()

Good. It appears that a parabola might be good enough to describe the general trend of the data. 

Let's try to fit the parabola:
$$
    y = b_0 + b_1x + b_2x^2
$$

From the lecture, we want to *instead* look for the plane of best fit: 
$$
    y = b_0 + b_1x + b_2z,
$$

where $z=x^2$.

This means we need a new column for $z$, and we know how to get it from $x_i$.

We build a new data frame.

In [None]:
df2 = pd.DataFrame({
    "x_i" : df["x_i"],
    "x_i^2" : [x**2 for x in df["x_i"]],
    "y_i" : df["y_i"],
})
print(df2)

Now we compute the plane of best fit.

First we build the matrix $X$.

In [None]:
import numpy as np 

X = np.array([
    [1]*len(df2),
    df2["x_i"],
    df2["x_i^2"],
]).T
print(X)
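
As a cross-check on the matrix above, here is a minimal sketch that builds the same three columns directly from a `numpy` array. The names `X_stack` and `X_vander` are illustrative and not part of the notebook; the $x$-values are copied from the inline data frame given earlier.

```python
import numpy as np

# x-values copied from the inline data frame earlier in this notebook.
x = np.array([2.27, 5.06, 1.45, 5.89, 0.48, -0.22, 1.44, -1.77,
              2.45, -1.54, 7.55, 1.76, 5.16, 3.26, 3.23, 0.85])

# Columns: a constant 1, then x, then x^2.
X_stack = np.column_stack([np.ones_like(x), x, x**2])

# np.vander with increasing=True produces the columns x^0, x^1, x^2.
X_vander = np.vander(x, 3, increasing=True)

print(X_stack.shape)
print(np.allclose(X_stack, X_vander))
```

Either construction avoids the transpose at the end, since the columns are stacked side by side from the start.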

And then we build the matrix (or column vector) $Y$.

In [None]:
Y = np.array([df2["y_i"]]).T
print(Y)

Recall the formula for the $b_i$ values: 
$$
    X^{\mathrm{t}}XB = X^{\mathrm{t}} Y.
$$
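
Assuming $X^{\mathrm{t}}X$ is invertible, which holds when the columns of $X$ are linearly independent, we can multiply both sides on the left by its inverse to isolate $B$:
$$
    B = (X^{\mathrm{t}}X)^{-1}X^{\mathrm{t}}Y.
$$
This is the expression evaluated in the next cell.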

In [None]:
B = np.linalg.inv(X.T @ X) @ X.T @ Y 
print(B)
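
The same system can be solved without forming the inverse explicitly. The sketch below is an alternative, not the method used in this notebook: it calls `np.linalg.solve` on $X^{\mathrm{t}}XB = X^{\mathrm{t}}Y$ and compares the result with the inverse-based formula. The data are copied from the inline data frame, and the names `B_inv` and `B_solve` are illustrative.

```python
import numpy as np

# Data copied from the inline data frame earlier in this notebook.
x = np.array([2.27, 5.06, 1.45, 5.89, 0.48, -0.22, 1.44, -1.77,
              2.45, -1.54, 7.55, 1.76, 5.16, 3.26, 3.23, 0.85])
y = np.array([2.5, -16.13, 4.23, -22.46, 1.37, 0.86, 11.85, -14.71,
              9.42, -14.07, -55.62, 4.45, -19.56, -2.79, 5.2, 8.09])

X = np.column_stack([np.ones_like(x), x, x**2])
Y = y.reshape(-1, 1)  # column vector

# Inverse-based formula, as in the cell above.
B_inv = np.linalg.inv(X.T @ X) @ X.T @ Y

# Solve the linear system (X^t X) B = X^t Y directly.
B_solve = np.linalg.solve(X.T @ X, X.T @ Y)

print(B_solve)
print(np.allclose(B_inv, B_solve))
```

Solving the system directly is generally preferred numerically over computing an inverse, although for a small problem like this one both should agree closely.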

Therefore the *plane* of best fit for the data frame `df2` is 
$$
    y = 1.30 + 6.03x - 1.81z.
$$

Since $z=x^2$, the *parabola* of best fit for the data frame `df` is 
$$
    y = 1.30 + 6.03x - 1.81x^2. 
$$

**Note.** We treated the variables $x$ and $z$ as independent; that is how we obtained the plane of best fit. They are not actually independent, since $z = x^2$, and that is fine. The data in `df2` have a higher redundancy than the data in `df`.
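
One way to sanity-check the coefficients is to compare them with `np.polyfit`, which fits a polynomial by least squares. This is a sketch of a cross-check rather than part of the lecture's method. Note that `np.polyfit` returns the coefficients from highest degree to lowest, so we reverse them to match the order $b_0, b_1, b_2$. The names `b_normal` and `b_polyfit` are illustrative.

```python
import numpy as np

# Data copied from the inline data frame earlier in this notebook.
x = np.array([2.27, 5.06, 1.45, 5.89, 0.48, -0.22, 1.44, -1.77,
              2.45, -1.54, 7.55, 1.76, 5.16, 3.26, 3.23, 0.85])
y = np.array([2.5, -16.13, 4.23, -22.46, 1.37, 0.86, 11.85, -14.71,
              9.42, -14.07, -55.62, 4.45, -19.56, -2.79, 5.2, 8.09])

# Coefficients from the normal equations, in the order b_0, b_1, b_2.
X = np.column_stack([np.ones_like(x), x, x**2])
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# np.polyfit returns [b_2, b_1, b_0], so reverse it.
b_polyfit = np.polyfit(x, y, 2)[::-1]

print(b_normal)
print(b_polyfit)
print(np.allclose(b_normal, b_polyfit))
```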

Now let's plot the scatter plot together with our parabola.

First, we get all of our data in order to plot the parabola.

In [None]:
xs = np.linspace(-2, 8, 100)
ys = B[0,0] + B[1,0]*xs + B[2,0]*xs**2

In [None]:
fig, ax = plt.subplots()
ax.scatter(df["x_i"], df["y_i"])
ax.plot(xs, ys, c="orange")
ax.grid()

Not bad! 😃

## Computing $r^2$ values

We need to compute distances in order to get the $r^2$ value, so we will define an *anonymous function*.

#### Anonymous functions

In Python, we can use the word `lambda` to build such functions. 

Let's start off small.

In [None]:
f = lambda x: x + 1

We have defined a function, called `f`, that takes input `x` and returns `x + 1`. 

If we evaluate `f` on $3$, we get $4$:

In [None]:
f(3)

We can evaluate `f` on any object we want, but we may get an error.

In [None]:
f(3.1415)
# f("hello world")
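
To see the error without stopping the notebook, we can wrap the call in `try`/`except`. This is a small sketch: adding an integer to a string is not defined in Python, so the call raises a `TypeError`.

```python
f = lambda x: x + 1

# Floats support addition with an integer, so this works.
print(f(3.1415))

try:
    f("hello world")
except TypeError as err:
    # A string plus an integer is not defined, so Python raises TypeError.
    print("TypeError:", err)
```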

We can define different kinds of functions this way.

For example, the following function takes a string and returns a set of words.

In [None]:
sen_to_words = lambda s: set(s.replace('.', '').replace('!', '').split(' '))

print(sen_to_words("Nitwit! Blubber! Oddment! Tweak!"))
print(sen_to_words("Repetition legitimizes. Repetition legitimizes."))

Now let's create anonymous functions, so that we may easily determine the $r^2$ values.

Recall that 
$$
    r^2 = \dfrac{SSR}{SST} = \dfrac{\|\widehat{Y} - \overline{Y}\|^2}{\| Y - \overline{Y}\|^2}. 
$$

First we will address $\overline{Y}$.

We have that 
$$
    \overline{Y} = \overline{y} \begin{pmatrix} 
        1 \\ 1 \\ \vdots \\ 1 
    \end{pmatrix} ,
$$

where 
$$
    \overline{y} = \dfrac{1}{n}\sum_{i=1}^n y_i. 
$$

We will create a function to turn a column from our `pandas` data frame to a `numpy` column vector. 

We will use this more than once, so we are "factoring it out" from each individual instance.

In [None]:
np_col_vec = lambda d: np.array([d]).T 
print(df["x_i"])
print(np_col_vec(df["x_i"]))

In [None]:
mean = lambda d: sum(d)/len(d)                          # y bar
mean_col = lambda d: np_col_vec([mean(d)]*len(d))        # Y bar 
Y_bar = mean_col(df["y_i"])
print(Y_bar)
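
For comparison, here is a sketch of the same computation using built-in `numpy` tools instead of anonymous functions. The names `y_bar` and `Y_bar_np` are illustrative; the $y$-values are copied from the inline data frame.

```python
import numpy as np

# y-values copied from the inline data frame earlier in this notebook.
y = np.array([2.5, -16.13, 4.23, -22.46, 1.37, 0.86, 11.85, -14.71,
              9.42, -14.07, -55.62, 4.45, -19.56, -2.79, 5.2, 8.09])

y_bar = y.mean()                        # the scalar mean
Y_bar_np = np.full((len(y), 1), y_bar)  # n-by-1 column filled with the mean

print(y_bar)
print(Y_bar_np.shape)
```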


Next we need to compute $\widehat{Y}$, which is given by 
$$
    \widehat{y}_i = b_0 + b_1x_{i1} + b_2 x_{i2} + \cdots + b_{p-1}x_{i,p-1}, 
$$
provided we are fitting a hyperplane.

In our above example, we are fitting a parabola, so let's use that for $\widehat{Y}$.

We'll set up an anonymous function to get $\widehat{Y}$ using `map`.

In [None]:
fit_para = lambda x: B[0,0] + B[1,0]*x + B[2,0]*x**2
print(list(map(fit_para, df["x_i"])))
fit_col = lambda fit, d: np_col_vec(list(map(fit, d)))
Y_hat = fit_col(fit_para, df["x_i"])
print(Y_hat)

And finally, our vector $Y$.

In [None]:
Y = np_col_vec(df["y_i"])
print(Y)

In order to compute $r^2$, we need to compute the square of the (Euclidean) distance in $\mathbb{R}^n$. 

Let's write another function for that.

In [None]:
dist2 = lambda u, v: sum((w[0])**2 for w in u - v)       # squared distance

**Note:** In the above code we could write just `w` instead of `w[0]`. Iterating over `u - v` yields its rows, each with a single entry. With `w`, we square and add the row vectors themselves; with `w[0]`, we square and add the zeroth entry of each row. Thus, the output of the former is a vector with one entry, and the output of the latter is a scalar. 
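
The difference is easy to see on a small example. The vectors `u` and `v` below are made up for illustration.

```python
import numpy as np

# Two small column vectors, made up for this illustration.
u = np.array([[1.0], [2.0], [3.0]])
v = np.array([[0.0], [0.0], [1.0]])

# Each w is a row of u - v, i.e. an array with one entry.
as_scalar = sum(w[0]**2 for w in u - v)  # adds numbers, giving a scalar
as_vector = sum(w**2 for w in u - v)     # adds one-entry arrays

print(as_scalar, np.ndim(as_scalar))
print(as_vector, np.ndim(as_vector))
```

Both hold the same squared distance, but the second is wrapped in a one-entry array.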

Now we compute the $r^2$ value.

In [None]:
r2 = dist2(Y_hat, Y_bar)/dist2(Y, Y_bar)
print(r2)

Since $r^2 \approx 0.97$ is close to $1$, our curve fits *very well*.
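
As a cross-check, here is a vectorized sketch of the same computation. It also uses the standard fact that, for a least squares fit with an intercept, $SST = SSR + SSE$, where $SSE = \|Y - \widehat{Y}\|^2$, so that $r^2 = 1 - SSE/SST$ as well. The fit is recomputed with `np.linalg.lstsq` so that the block is self-contained; the variable names are illustrative.

```python
import numpy as np

# Data copied from the inline data frame earlier in this notebook.
x = np.array([2.27, 5.06, 1.45, 5.89, 0.48, -0.22, 1.44, -1.77,
              2.45, -1.54, 7.55, 1.76, 5.16, 3.26, 3.23, 0.85])
y = np.array([2.5, -16.13, 4.23, -22.46, 1.37, 0.86, 11.85, -14.71,
              9.42, -14.07, -55.62, 4.45, -19.56, -2.79, 5.2, 8.09])

# Least squares fit of the parabola.
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b      # fitted values
y_bar = y.mean()   # mean of the observations

ssr = np.sum((y_hat - y_bar)**2)  # ||Y_hat - Y_bar||^2
sse = np.sum((y - y_hat)**2)      # ||Y - Y_hat||^2
sst = np.sum((y - y_bar)**2)      # ||Y - Y_bar||^2

r2 = ssr / sst
print(r2)
print(1 - sse / sst)  # should agree with r2 up to rounding
```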

## Bonus Fun :)

With the same data, let's fit a line and see the $r^2$ value. 

Let $r^2_c$ be the $r^2$ value for the curve above, so $r^2_c \approx 0.97$, and let $r^2_{\ell}$ be the $r^2$ of the line we will compute. 

**Question.** What do we expect for the value of $r^2_{\ell}$ compared to $r^2_c$? 
- Similar or quite different? 
- Which should be greater? 

We know all the equations by now, but instead of $b_i$ and the matrix `B`, we will use $\ell_i$ and a matrix `L`.

In [None]:
ones = np_col_vec([1]*len(df))
x_vec = np_col_vec(df["x_i"])
# XX has two columns (1 and x_i); use it, not the three-column X from before
XX = np.concatenate((ones, x_vec), axis=1)
print(XX)
L = np.linalg.inv(XX.T @ XX) @ XX.T @ Y
print(L)

We can quickly plot our new line.

In [None]:
xxs = np.linspace(-2, 8, 2)
yys = L[0,0] + L[1,0]*xxs
fig, ax = plt.subplots()
ax.scatter(df["x_i"], df["y_i"])
ax.plot(xxs, yys, c="orange")
ax.grid()

We can use the above functions to compute the $r^2$ values now.

In [None]:
fit_line = lambda x: L[0,0] + L[1,0]*x
YY_hat = fit_col(fit_line, df["x_i"])
r2_l = dist2(YY_hat, Y_bar)/dist2(Y, Y_bar)
print(r2_l)

So $r^2_{\ell} \approx 0.32$, which is much worse than the previous $r^2$ value. 
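
To compare the two fits side by side, here is a sketch with a small helper, `r_squared`, that is not part of the notebook above. It takes a design matrix, fits by least squares with `np.linalg.lstsq`, and returns $SSR/SST$. Because the parabola model contains the line model as the special case $b_2 = 0$, its $r^2$ can never be smaller than that of the line.

```python
import numpy as np

# Data copied from the inline data frame earlier in this notebook.
x = np.array([2.27, 5.06, 1.45, 5.89, 0.48, -0.22, 1.44, -1.77,
              2.45, -1.54, 7.55, 1.76, 5.16, 3.26, 3.23, 0.85])
y = np.array([2.5, -16.13, 4.23, -22.46, 1.37, 0.86, 11.85, -14.71,
              9.42, -14.07, -55.62, 4.45, -19.56, -2.79, 5.2, 8.09])


def r_squared(X, y):
    """Fit y against the columns of X by least squares; return SSR/SST."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ b
    y_bar = y.mean()
    return np.sum((y_hat - y_bar)**2) / np.sum((y - y_bar)**2)


ones = np.ones_like(x)
X_line = np.column_stack([ones, x])        # columns: 1, x
X_para = np.column_stack([ones, x, x**2])  # columns: 1, x, x^2

r2_line = r_squared(X_line, y)
r2_para = r_squared(X_para, y)
print(r2_line, r2_para)
```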