In [1]:
%matplotlib inline

# Linear Regression task - Diamond price training

### Imports

In [2]:
print(__doc__)

import csv
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error

Automatically created module for IPython interactive environment


### CSV reading

In [3]:
# columns 1, 5, 6, 8, 9, 10 have numerical variables
# column 7 contains the target: diamond price
diamonds_data = np.genfromtxt('diamonds.csv', delimiter=",", skip_header=1,
                       usecols=(1, 5, 6, 8, 9, 10, 7))

print(diamonds_data.shape)
print(diamonds_data[:, np.newaxis, 6]) # target

(53940, 7)
[[  326.]
 [  326.]
 [  327.]
 ..., 
 [ 2757.]
 [ 2757.]
 [ 2757.]]


### Feature and target selection

In [4]:
features_df = pd.DataFrame(diamonds_data[:, 0:6])
target_df = pd.DataFrame(diamonds_data[:, np.newaxis, 6])

#print(features_df)
#print(target_df)

features = diamonds_data[:, 0:6]
target = diamonds_data[:, np.newaxis, 6]

m, n = features.shape
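
The slicing used above can be illustrated on a small array. This is a minimal sketch with made-up values (the array `data` is hypothetical, not the diamond data): indexing with `np.newaxis` keeps the selected column two-dimensional, while a plain integer index returns a flat vector.

```python
import numpy as np

# Hypothetical 3x4 array standing in for the loaded CSV data.
data = np.arange(12.0).reshape(3, 4)

first_columns = data[:, 0:3]        # feature-style slice, shape (3, 3)
col_2d = data[:, np.newaxis, 3]     # last column as a column vector, shape (3, 1)
col_1d = data[:, 3]                 # last column as a flat vector, shape (3,)

print(first_columns.shape, col_2d.shape, col_1d.shape)
```

The column-vector form is why the target is later read with `target[i, 0]` rather than `target[i]`.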

## Linear Regression implementation

### Hypothesis function

The general form of the hypothesis function is

$$
h_\theta\left(x\right) = \theta_0 x_0 + \theta_1 x_1 +
    \theta_2 x_2 + \cdots + \theta_n x_n 
$$

This can also be interpreted as the dot product between the
parameter vector and the feature vector

$$
x =
\begin{bmatrix}
x_0 && x_1 && x_2 && \cdots && x_n
\end{bmatrix},~~
\theta =
\begin{bmatrix}
\theta_0 && \theta_1 && \theta_2&& \cdots && \theta_n
\end{bmatrix},
\\
h_\theta\left(x\right) = \theta \cdot x
$$

This dot product can be computed in two ways in
NumPy:

```python
>>> numpy.dot(parameters, features)
...
>>> parameters @ features
```

The latter form will be used, since it is more concise
than the former.
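
As a quick check that both forms agree, here is a minimal sketch with made-up values (`theta`, `x` and `X` are illustrative names, not part of the notebook). It also shows that the same operator evaluates the hypothesis for several examples at once when the features are stacked as rows of a matrix.

```python
import numpy as np

# Hypothetical parameter and feature vectors.
theta = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0])

h_dot = np.dot(theta, x)   # 1*4 + 2*5 + 3*6
h_matmul = theta @ x       # same value, operator form
print(h_dot, h_matmul)     # 32.0 32.0

# Two examples stacked as rows: one product gives both predictions.
X = np.array([[4.0, 5.0, 6.0],
              [1.0, 0.0, 1.0]])
predictions = X @ theta
print(predictions)         # [32.  4.]
```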

### Gradient Descent

The main formula for the iteration steps in the
Gradient Descent algorithm is

$$
\theta_j = \theta_j - \alpha \frac{1}{m}
    \sum_{i=1}^{m}{\left(h_\theta\left(x^{\left(i\right)}\right) -
    y^{\left(i\right)}\right) x_j^{\left(i\right)}}
$$

for all $j = 0, 1, \ldots, n$, where

* $i$ is the index of the training example,
* $j$ is the index of the parameter,
* $m$ is the number of training examples and
* $n$ is the largest parameter index, so the model has $n + 1$
  parameters, $\theta_0$ through $\theta_n$.
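
The update rule can be written out literally with two loops. The sketch below uses a tiny made-up data set (`X`, `y`, `alpha` and the starting `theta` are all illustrative assumptions) and performs a single step. All parameters are updated from the same old `theta`, which is why the new values go into a separate array first.

```python
import numpy as np

# Hypothetical toy data: m = 3 examples, 2 parameters.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])

theta = np.zeros(2)   # starting guess
alpha = 0.1           # learning rate
m, n_params = X.shape

new_theta = theta.copy()
for j in range(n_params):
    total = 0.0
    for i in range(m):
        prediction = theta @ X[i]                 # h_theta(x^(i))
        total += (prediction - y[i]) * X[i, j]    # one term of the sum
    new_theta[j] = theta[j] - alpha / m * total
theta = new_theta

print(theta)
```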

We define the difference $k$ between the prediction and the target,
such that

$$
k^{\left(i\right)} = h_\theta\left(x^{\left(i\right)}\right) -
    y^{\left(i\right)} ~
\implies
\theta_j = \theta_j - \alpha \frac{1}{m}
    \sum_{i=1}^{m}{k^{\left(i\right)} ~ x_j^{\left(i\right)}}
$$

Note that the sum
$\sum_{i=1}^{m}{k^{\left(i\right)} ~ x_j^{\left(i\right)}}$
is also a dot product, between the vector $k$ of all $m$ differences
and the $j$-th feature column $x_j$, so the formula simplifies further to

$$
\theta_j = \theta_j - \alpha \frac{1}{m} \left(
    k \cdot x_j\right)
$$
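
A minimal sketch of this simplification, again on made-up data (all names and values below are illustrative assumptions): the per-column dot products $k \cdot x_j$ match a single matrix product `X.T @ k`, which yields the sums for every $j$ at once.

```python
import numpy as np

# Hypothetical toy data: m = 3 examples, 2 parameters.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([5.0, 7.0, 9.0])
theta = np.zeros(2)
alpha = 0.1
m = X.shape[0]

# Vector of differences k^(i) = h_theta(x^(i)) - y^(i).
k = X @ theta - y

# One dot product per feature column ...
sums_per_column = np.array([k @ X[:, j] for j in range(X.shape[1])])
# ... or all of them in a single product.
sums_all = X.T @ k

theta = theta - alpha / m * sums_all
print(sums_per_column, sums_all, theta)
```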


In [5]:
# initial guess for the model parameters; gradient descent refines them
params = np.array([10.0, 10.0, 10.0, 10.0, 10.0, 10.0])

lrate = 0.1  # learning rate (alpha)

for _ in range(10):

    # k_diff[i] = h(x_i) - y_i for every example, in a single product
    k_diff = features @ params - target[:, 0]

    # k_diff is fixed during this loop, so the update is simultaneous
    for j in range(n):
        params[j] = params[j] - lrate / m * (k_diff @ features[:, j])

    print(params)
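
To see the same loop converge, it can be run on synthetic data whose true parameters are known. The sketch below is only an illustration under stated assumptions: `gradient_descent` is a hypothetical helper, the data is noise-free, and the features are drawn with unit variance. The raw diamond columns are not scaled this way, so the same learning rate may behave differently there.

```python
import numpy as np


def gradient_descent(features, target, lrate, n_iter):
    """Batch gradient descent for linear regression (hypothetical helper)."""
    m, n = features.shape
    params = np.zeros(n)
    for _ in range(n_iter):
        # Difference between predictions and targets for all examples.
        k_diff = features @ params - target
        # Simultaneous update of every parameter.
        params = params - lrate / m * (features.T @ k_diff)
    return params


# Synthetic, noise-free data generated from known parameters (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_params = np.array([2.0, -1.0, 0.5])
y = X @ true_params

estimated = gradient_descent(X, y, lrate=0.1, n_iter=500)
print(estimated)  # should be very close to true_params
```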