# Exercise 1: Linear Regression

The *bodyfat* data set contains several body measurements that can be taken with a scale and a tape measure. These can be used to predict the body fat percentage (`body.fat` column). Measuring body fat directly requires special apparatus, so a model that fits well would give us a low-cost alternative. The measurements are age, weight, height, BMI, neck, chest, abdomen, hip, thigh, knee, ankle, bicep, forearm, and wrist.

In [None]:
from pathlib import Path
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
bodyfat = pd.read_csv(Path('../datasets/bodyfat.csv'))

bodyfat['body.fat'].hist()
bodyfat.head()

## Tasks

The objective in ordinary least squares regression is to minimise the squared error:
$$
\arg\min_{\beta_0, \beta} \quad \frac{1}{2} \sum_{i=1}^n (y_i - \beta_0 - \mathbf{x}_i^\top \beta)^2
$$
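
As a short sketch of where the closed-form solution comes from: the notation $\tilde{\mathbf{X}}$ and $\tilde{\beta}$ below is introduced here for illustration only. Let $\tilde{\mathbf{X}} = [\mathbf{1}, \mathbf{X}] \in \mathbb{R}^{n \times (m+1)}$ be the data matrix with a leading column of ones and $\tilde{\beta} = (\beta_0, \beta^\top)^\top$. Setting the gradient of the objective to zero gives the normal equations:
$$
\frac{1}{2} \sum_{i=1}^n (y_i - \beta_0 - \mathbf{x}_i^\top \beta)^2
= \frac{1}{2} \lVert \mathbf{y} - \tilde{\mathbf{X}} \tilde{\beta} \rVert_2^2,
\qquad
-\tilde{\mathbf{X}}^\top (\mathbf{y} - \tilde{\mathbf{X}} \tilde{\beta}) = \mathbf{0}
\;\Longrightarrow\;
\tilde{\mathbf{X}}^\top \tilde{\mathbf{X}} \, \tilde{\beta} = \tilde{\mathbf{X}}^\top \mathbf{y}
$$
This is a linear system of the form $Ax=b$ with $A = \tilde{\mathbf{X}}^\top \tilde{\mathbf{X}}$ and $b = \tilde{\mathbf{X}}^\top \mathbf{y}$. It has a unique solution when the columns of $\tilde{\mathbf{X}}$ are linearly independent.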

1. Create a function that takes a data matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$ of measurements and a vector $\mathbf{y} \in \mathbb{R}^n$ of body fat content. The function should return the **ordinary least squares (OLS)** estimate of the coefficients $\beta$ (including the intercept).

2. Create 14 single-feature models, each predicting the amount of body fat from one of the features mentioned above. For each model, create a scatter plot that shows the data and the fitted model.

3. Create a single model that contains all of the 14 features mentioned above. Which features have the highest/lowest coefficients? You can use [np.linalg.solve](https://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.solve.html#numpy.linalg.solve) to solve a system of linear equations of the form $Ax=b$.
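
One possible starting point for the tasks above is sketched below. It is not the reference solution. The function name `fit_ols` and the synthetic demo data are assumptions made for illustration. The sketch prepends a column of ones for the intercept and solves the normal equations with `np.linalg.solve`, which assumes the columns are linearly independent.

```python
import numpy as np


def fit_ols(X, y):
    """Hypothetical helper: return OLS coefficients, intercept first."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # Allow a single feature to be passed as a 1-D array
        X = X.reshape(-1, 1)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept
    X1 = np.column_stack([np.ones(X.shape[0]), X])
    # Solve the normal equations (X1^T X1) beta = X1^T y
    return np.linalg.solve(X1.T @ X1, X1.T @ y)


# Sanity check on synthetic, noise-free data (not the bodyfat data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 1.5 + X_demo @ np.array([2.0, -1.0, 0.5])
coef_demo = fit_ols(X_demo, y_demo)
```

On the real data, a single-feature model could then be fitted with something like `fit_ols(bodyfat[['weight']].values, bodyfat['body.fat'].values)`, assuming the CSV has a column named `weight`.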

**Note**: For numerical stability, it is recommended to standardise the features in $\mathbf{X}$ such that
$$
\begin{aligned}
\sum_{i=1}^n x_{i,j} &= 0 \quad \forall j \in \{1,\ldots,m\} \\
\frac{1}{n} \sum_{i=1}^n x_{i,j}^2 &= 1 \quad \forall j \in \{1,\ldots,m\}
\end{aligned}
$$
To standardise the $j$-th feature, subtract the mean and divide by the standard deviation of the $j$-th column of $\mathbf{X}$. The same standardisation used during training must be applied for prediction.
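
A minimal sketch of this standardisation follows. The function name `standardise` and the synthetic arrays are assumptions for illustration. The helper returns the column means and standard deviations so that the same values can be reused at prediction time. NumPy's default `ddof=0` matches the $\frac{1}{n}$ normalisation in the condition above.

```python
import numpy as np


def standardise(X, mean=None, std=None):
    """Hypothetical helper: centre and scale the columns of X.

    If mean/std are given (e.g. from the training data), they are reused.
    """
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)
    if std is None:
        std = X.std(axis=0)  # ddof=0, i.e. divide by n
    return (X - mean) / std, mean, std


# Synthetic demo data (not the bodyfat data)
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=3.0, size=(40, 2))
X_new = rng.normal(loc=5.0, scale=3.0, size=(5, 2))

# Estimate the statistics on the training data only ...
X_train_std, mu, sigma = standardise(X_train)
# ... and reuse them when transforming new data for prediction
X_new_std, _, _ = standardise(X_new, mu, sigma)
```

Note that the new data will generally not have exactly zero mean and unit variance after the transformation, because the statistics come from the training set.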