### CS4102 - Geometric Foundations of Data Analysis I
Prof. Götz Pfeiffer<br />
School of Mathematical and Statistical Sciences<br />
University of Galway

# Week 2: More Least Squares Fitting

## Answers to last week's Questions

1. Python: see tutorial ...
2. Data Input: DIY
3. Visualization: `matplotlib`
4. Matrix Arithmetic: `numpy`

## 1. Python

* A short **Python Refresher Tutorial** is now in `python.ipynb`.
* Work through it in your own time, write down any comments, suggestions, questions and bring them to next week's class.

## 2. Data

* The Jupyter **magic** command `%cat` lists the contents of a (text) file.

In [None]:
%cat production.csv

* Today, we use **basic python** file handling, string manipulation and list processing.

In [None]:
xs, ys = [], []  # parallel assignment
with open('production.csv') as textfile:
    textfile.readline()  # ignore header line
    for line in textfile:  # loop over remaining lines
        line = line.strip()  # remove whitespace
        i, x, y = line.split(',')  # break line into data
        xs.append(int(x))  # convert to int ...
        ys.append(int(y))  # ... and add to list

* Now the $x$- and $y$-values are separate python **lists**.

In [None]:
print(xs)
print(ys)
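
* As an alternative sketch, the standard library's `csv` module can do the splitting for us. The sample text below is a hypothetical stand-in for `production.csv` (made-up header names and values), and the names `sample`, `xs_demo` and `ys_demo` are introduced only for this illustration.

```python
import csv
import io

# Hypothetical stand-in for production.csv: a header line, then rows i,x,y.
sample = io.StringIO("i,x,y\n1,30,73\n2,20,50\n3,60,128\n")

xs_demo, ys_demo = [], []
reader = csv.reader(sample)
next(reader)  # skip the header row
for i, x, y in reader:  # each row arrives as a list of strings
    xs_demo.append(int(x))
    ys_demo.append(int(y))

print(xs_demo)
print(ys_demo)
```

* With a real file, `open('production.csv', newline='')` would take the place of the `io.StringIO` object.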

## 3. Plotting

* We plot with `pyplot`, the plotting interface of `matplotlib`, conventionally imported as `plt`.

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(xs, ys, 'o')

* How to add the straight line?
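
* One possible approach, as a minimal sketch: a straight line $y = b_0 + b_1 x$ is determined by two of its points, so it suffices to compute the line at the smallest and the largest $x$-value and to plot those two points joined by a line. The data and coefficients below are made up for illustration (they are not claimed to be a least squares fit), and the `_demo` names are hypothetical.

```python
import matplotlib.pyplot as plt

# Hypothetical data and coefficients, chosen only for illustration.
xs_demo = [1, 2, 3, 4, 5]
ys_demo = [3, 5, 6, 9, 11]
b0, b1 = 1.0, 2.0  # intercept and slope of the line y = b0 + b1 * x

# Two points determine the line: evaluate it at both ends of the x-range.
x_line = [min(xs_demo), max(xs_demo)]
y_line = [b0 + b1 * x for x in x_line]

plt.plot(xs_demo, ys_demo, 'o')  # data points as markers
plt.plot(x_line, y_line, '-')    # the line, drawn on the same axes
```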

### 3D Plotting

* For a 3D plot, we import `mplot3d` from the `mpl_toolkits` package.

In [None]:
from mpl_toolkits import mplot3d

* We set up a **figure** and **axes** objects.
* `%matplotlib notebook` magic provides an interactive 3D space inside this notebook.

In [None]:
%matplotlib notebook
fig = plt.figure()
ax = plt.axes(projection="3d")

* The skin care data are contained in the file `cream.csv`.

In [None]:
%cat cream.csv

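* Process the file as before. Here `line.strip()` isn't needed, because `int` ignores leading and trailing whitespace (including the newline at the end of each line).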

In [None]:
x1s, x2s, yys = [], [], []
with open('cream.csv') as textfile:
    textfile.readline()
    for line in textfile:
        i, y, x1, x2 = line.split(',')
        x1s.append(int(x1))
        x2s.append(int(x2))
        yys.append(int(y))
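
* A quick check of why skipping `strip()` is harmless here: the last field of each line keeps its trailing newline, but `int` accepts surrounding whitespace. The row below is a made-up example in the column order `i, y, x1, x2` used above.

```python
line = "1,12,3,40\n"  # hypothetical row: i, y, x1, x2
i, y, x1, x2 = line.split(',')

print(repr(x2))  # the last field still carries the newline
print(int(x2))   # int() ignores the surrounding whitespace
```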

* The data are now contained in three python lists.

In [None]:
print(x1s)
print(x2s)
print(yys)

* A 3D plot of the data reveals that ...

In [None]:
ax.scatter(x1s, x2s, yys)

## 4. Matrix Algebra

* We use `numpy` for matrix arithmetic, conventionally imported as `np`.

In [None]:
import numpy as np

* Recall the normal equation(s): $B = (X^t X)^{-1} (X^t Y)$
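* As a brief reminder of where this formula comes from (a sketch, assuming that $X^t X$ is invertible): the least squares fit minimizes the sum of squared residuals $S(B)$, and setting its gradient to zero gives a linear system for $B$.

$$
\begin{aligned}
S(B) &= (Y - XB)^t (Y - XB), \\
\nabla S(B) &= -2\, X^t (Y - XB) = 0, \\
X^t X\, B &= X^t Y, \\
B &= (X^t X)^{-1} (X^t Y).
\end{aligned}
$$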

In [None]:
Y = np.array(ys)
Y

In [None]:
[[1, x] for x in xs]

In [None]:
X = np.array([[1,x] for x in xs])
X

* `@` is the **matrix multiplication** operator for numpy arrays, and the attribute `.T` gives the **transpose** of an array.

In [None]:
XtX = X.T @ X
XtX

In [None]:
XtY = X.T @ Y
XtY
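
* To see how `@` and `.T` treat arrays of different dimensions, it can help to inspect the `shape` attribute. The small arrays below are made up for illustration, and the `_demo` names are hypothetical.

```python
import numpy as np

# Hypothetical small example.
X_demo = np.array([[1, 2], [1, 3], [1, 5]])  # 3 rows, 2 columns
Y_demo = np.array([4, 6, 10])                # a one-dimensional array

print(X_demo.shape)                # shape of the matrix
print(X_demo.T.shape)              # rows and columns are swapped
print(Y_demo.shape)                # only one axis
print((X_demo.T @ Y_demo).shape)   # the product has one axis as well
```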

* Matrix **inversion** is provided by the `inv` function in the `np.linalg` submodule.

In [None]:
XtX1 = np.linalg.inv(XtX)
XtX1

In [None]:
XtX1 @ XtY  # matrix multiplication
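
* As an alternative sketch, `np.linalg.solve` solves the system $X^t X \, B = X^t Y$ directly, without forming the inverse explicitly. The data below are made up so that the points lie exactly on the line $y = 1 + 2x$, and the `_demo` names are hypothetical.

```python
import numpy as np

# Hypothetical data lying exactly on the line y = 1 + 2x.
X_demo = np.array([[1, 1], [1, 2], [1, 3], [1, 4]])
Y_demo = np.array([3, 5, 7, 9])

XtX_demo = X_demo.T @ X_demo
XtY_demo = X_demo.T @ Y_demo

B_inv = np.linalg.inv(XtX_demo) @ XtY_demo     # via the explicit inverse
B_solve = np.linalg.solve(XtX_demo, XtY_demo)  # via solving the linear system

print(B_inv)
print(B_solve)
```

* Both computations should agree up to rounding, recovering the intercept 1 and the slope 2.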

### 3D: Skin Care

In [None]:
Y = np.array(yys)
Y

In [None]:
X = np.array([[1, x1, x2] for x1, x2 in zip(x1s, x2s)])
X

In [None]:
XtX = X.T @ X
XtX

In [None]:
XtY = X.T @ Y
XtY

In [None]:
XtX1 = np.linalg.inv(XtX)
XtX1

In [None]:
B = XtX1 @ XtY
B
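
* A possible sanity check for such a fit, sketched on made-up data with two predictors: the residuals $Y - XB$ of a least squares solution are orthogonal to every column of $X$, so $X^t (Y - XB)$ should vanish up to rounding. The `_demo` names and all values are hypothetical.

```python
import numpy as np

# Hypothetical data with two predictors.
x1_demo = [1, 2, 3, 4, 5]
x2_demo = [2, 1, 4, 3, 6]
y_demo = [5, 4, 11, 9, 16]

X_demo = np.array([[1, a, b] for a, b in zip(x1_demo, x2_demo)])
Y_demo = np.array(y_demo)

# Solve the normal equations as in the cells above.
B_demo = np.linalg.inv(X_demo.T @ X_demo) @ (X_demo.T @ Y_demo)

fitted = X_demo @ B_demo      # values predicted by the fitted plane
residuals = Y_demo - fitted   # what the fit leaves unexplained

print(X_demo.T @ residuals)   # approximately the zero vector
```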

## Exercises

* Work through the python code in the `python.ipynb` notebook, write down any comments, suggestions, questions and bring them to next week's class.

* Find and study the **documentation** for those elements of python in this notebook which are new to you.

* In the calculations above, is `Y` a row vector or a column vector?

* Use the solution ($b_0 = 10$, $b_1 = 2$) of the spare parts example to draw the least squares fit as a straight line on top of the plot of the data points.

* Following the examples in this worksheet, solve the problems from last week's lecture notes.