# Module 02: Multivariate Linear Regression

*Summary: Building on what you did in the previous modules, you will extend linear
regression to handle more than one feature. Then you will see how to build polynomial
models and how to detect overfitting.*

## Notions of the module
Multivariate linear hypothesis, multivariate linear gradient descent, polynomial models. 
Training and test sets, overfitting.

## Useful Resources

You are strongly advised to use the following resource:
[Machine Learning MOOC - Stanford](https://www.coursera.org/learn/machine-learning/home/week/2)  
Here are the sections of the MOOC that are relevant for today's exercises: 

### Week 2: 

**Multivariate Linear Regression:**
* Multiple Features (Video + Reading)
* Gradient Descent for Multiple Variables (Video + Reading)
* Gradient Descent in Practice I - Feature Scaling (Video + Reading)
* Gradient Descent in Practice II - Learning Rate (Video + Reading)
* Features and Polynomial Regression (Video + Reading)
* Review (Reading + Quiz)

## General rules
Refer directly to the 42Paris intra subject or to the corresponding repository on the [GitHub of 42AI](https://github.com/42-AI/bootcamp_machine-learning).


## Helper

Ensure that you have the right Python interpreter (at least 3.7).

## Note:
Use `git push --quiet github` to push to GitHub when working from the vogsphere repository.

## Exercise 00 - Multivariate Hypothesis - Iterative Version

```python
from ex00.prediction import simple_predict
import numpy as np

x = np.arange(1, 13).reshape((4, 3))
print("# Example 0:")
theta1 = np.array([[5], [0], [0], [0]])
pred = simple_predict(x, theta1)
# Output:
# array([[5.], [5.], [5.], [5.]])
# Do you understand why y_hat contains only 5's here?
expected_pred = np.array([[5.], [5.], [5.], [5.]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))

print("\n# Example 1:")
theta2 = np.array([[0], [1], [0], [0]])
pred = simple_predict(x, theta2)
# Output:
# array([[1.], [4.], [7.], [10.]])
# Do you understand why y_hat == x[:,0] here?
expected_pred = np.array([[1.], [4.], [7.], [10.]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))

print("\n# Example 2:")
theta3 = np.array([[-1.5], [0.6], [2.3], [1.98]])
pred = simple_predict(x, theta3)
# Output:
# array([[9.64], [24.28], [38.92], [53.56]])
expected_pred = np.array([[9.64], [24.28], [38.92], [53.56]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))

print("\n# Example 3:")
theta4 = np.array([[-3], [1], [2], [3.5]])
pred = simple_predict(x, theta4)
# Output:
# array([[12.5], [32.], [51.5], [71.]])
expected_pred = np.array([[12.5], [32.], [51.5], [71.]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))
```
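The cell above imports `simple_predict` from `ex00/prediction.py` without showing it. A minimal sketch of what that function could look like, assuming `x` has shape `(m, n)` and `theta` has shape `(n + 1, 1)`, computes each prediction with explicit loops: `y_hat[i] = theta[0] + sum_j theta[j + 1] * x[i, j]`. Returning `None` on malformed input is an assumed convention, not something this notebook specifies.

```python
import numpy as np


def simple_predict(x, theta):
    """Iterative multivariate hypothesis (sketch).

    Assumes x has shape (m, n) and theta has shape (n + 1, 1).
    Returns None on invalid input (assumed convention).
    """
    if not isinstance(x, np.ndarray) or not isinstance(theta, np.ndarray):
        return None
    if x.ndim != 2 or theta.shape != (x.shape[1] + 1, 1):
        return None
    m, n = x.shape
    y_hat = np.zeros((m, 1))
    for i in range(m):
        # Start from the intercept theta_0, then add each weighted feature.
        value = float(theta[0, 0])
        for j in range(n):
            value += float(theta[j + 1, 0]) * float(x[i, j])
        y_hat[i, 0] = value
    return y_hat
```

Because the loops mirror the formula term by term, this version is a convenient reference for checking the vectorized version of the next exercise.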

## Exercise 01 - Multivariate Hypothesis - Vectorized Version

```python
import numpy as np
from ex01.prediction import predict_

x = np.arange(1, 13).reshape((4, 3))
# Example 0:
theta1 = np.array([[5], [0], [0], [0]])
pred = predict_(x, theta1)
# Output:
# array([[5.], [5.], [5.], [5.]])
# Do you understand why y_hat contains only 5's here?
expected_pred = np.array([[5.], [5.], [5.], [5.]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))

# Example 1:
theta2 = np.array([[0], [1], [0], [0]])
pred = predict_(x, theta2)
# Output:
# array([[1.], [4.], [7.], [10.]])
# Do you understand why y_hat == x[:,0] here?
expected_pred = np.array([[1.], [4.], [7.], [10.]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))

# Example 2:
theta3 = np.array([[-1.5], [0.6], [2.3], [1.98]])
pred = predict_(x, theta3)
# Output:
# array([[9.64], [24.28], [38.92], [53.56]])
expected_pred = np.array([[9.64], [24.28], [38.92], [53.56]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))

# Example 3:
theta4 = np.array([[-3], [1], [2], [3.5]])
pred = predict_(x, theta4)
# Output:
# array([[12.5], [32.], [51.5], [71.]])
expected_pred = np.array([[12.5], [32.], [51.5], [71.]])
print("my prediction:".ljust(25), pred.reshape(1, -1))
print("expected prediction:".ljust(25), expected_pred.reshape(1, -1))
```

```
my prediction:            [[5. 5. 5. 5.]]
expected prediction:      [[5. 5. 5. 5.]]
my prediction:            [[ 1.  4.  7. 10.]]
expected prediction:      [[ 1.  4.  7. 10.]]
my prediction:            [[ 9.64 24.28 38.92 53.56]]
expected prediction:      [[ 9.64 24.28 38.92 53.56]]
my prediction:            [[12.5 32.  51.5 71. ]]
expected prediction:      [[12.5 32.  51.5 71. ]]
```
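A vectorized sketch of `predict_` replaces both loops with a single matrix product: prepend a column of ones to `x` to obtain `X'` of shape `(m, n + 1)`, then compute `X' · theta`. The helper name `add_intercept` is hypothetical, and the shape checks assume the same `(m, n)` / `(n + 1, 1)` convention as the iterative version.

```python
import numpy as np


def add_intercept(x):
    """Prepend a column of ones so that X' = [1 | x] (hypothetical helper name)."""
    return np.hstack((np.ones((x.shape[0], 1)), x))


def predict_(x, theta):
    """Vectorized hypothesis y_hat = X' . theta (sketch); None on invalid input."""
    if not isinstance(x, np.ndarray) or not isinstance(theta, np.ndarray):
        return None
    if x.ndim != 2 or theta.shape != (x.shape[1] + 1, 1):
        return None
    # One matrix product computes all m predictions at once.
    return add_intercept(x) @ theta.astype(float)
```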


## Exercise 02 - Vectorized Loss Function

```python
import numpy as np
from ex02.loss import loss_

X = np.array([[0], [15], [-9], [7], [12], [3], [-21]])
Y = np.array([[2], [14], [-13], [5], [12], [4], [-19]])

print("# Example 0:")
loss = loss_(X, Y)
expected_loss = 2.1428571428571436
print("my loss: ".ljust(20), loss)
print("expected loss: ".ljust(20), expected_loss)

print("# Example 1:")
loss = loss_(X, X)
expected_loss = 0.0
print("my loss: ".ljust(20), loss)
print("expected loss: ".ljust(20), expected_loss)
```

```
# Example 0:
my loss:             2.142857142857143
expected loss:       2.1428571428571437
# Example 1:
my loss:             0.0
expected loss:       0.0
```
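The expected values above are consistent with the half mean squared error, `J = (1 / (2m)) * sum_i (y_hat[i] - y[i])^2`: for Example 0 the squared differences sum to 30, and 30 / (2 * 7) ≈ 2.142857. A vectorized sketch computes that sum as the dot product of the residual vector with itself. The parameter names `y, y_hat` are an assumption; since the difference is squared, swapping the two arguments does not change the result.

```python
import numpy as np


def loss_(y, y_hat):
    """Half mean squared error J = (1 / (2m)) * (y_hat - y) . (y_hat - y) (sketch)."""
    if not isinstance(y, np.ndarray) or not isinstance(y_hat, np.ndarray):
        return None
    if y.shape != y_hat.shape or y.size == 0:
        return None
    # Flatten the (m, 1) residuals so the dot product yields a scalar.
    diff = (y_hat - y).reshape(-1).astype(float)
    return float(diff @ diff) / (2 * y.shape[0])
```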


## Exercise 03 - Multivariate Gradient

```python
import numpy as np
from ex03.gradient import gradient

x = np.array([[ -6, -7, -9],
              [ 13, -2, 14],
              [ -7, 14, -1],
              [ -8, -4, 6],
              [ -5, -9, 6],
              [ 1, -5, 11],
              [ 9, -11, 8]])
y = np.array([[2], [14], [-13], [5], [12], [4], [-19]])

print("# Example 0:")
theta1 = np.array([[0], [3], [0.5], [-6]])
grad = gradient(x, y, theta1)
# Output:
expected_grad = np.array([[-33.71428571], [-37.35714286], [183.14285714], [-393.]])
print("my gradient:".ljust(20), grad.reshape(1, -1))
print("expected gradient:".ljust(20), expected_grad.reshape(1, -1))

print("\n# Example 1:")
theta2 = np.array([[0], [0], [0], [0]])
grad = gradient(x, y, theta2)
# Output:
expected_grad = np.array([[-0.71428571], [0.85714286], [23.28571429], [-26.42857143]])
print("my gradient:".ljust(20), grad.reshape(1, -1))
print("expected gradient:".ljust(20), expected_grad.reshape(1, -1))
```

```
# Example 0:
my gradient:         [[ -33.71428571  -37.35714286  183.14285714 -393.        ]]
expected gradient:   [[ -33.71428571  -37.35714286  183.14285714 -393.        ]]

# Example 1:
my gradient:         [[ -0.71428571   0.85714286  23.28571429 -26.42857143]]
expected gradient:   [[ -0.71428571   0.85714286  23.28571429 -26.42857143]]
```
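The gradient of the loss with respect to `theta` is `(1 / m) * X'^T (X' · theta - y)`, where `X'` is `x` with a leading column of ones. With `theta` all zeros, its first component reduces to `-mean(y) = -5 / 7 ≈ -0.714`, which matches Example 1. A minimal sketch; the input checks and the `None` return value are assumptions:

```python
import numpy as np


def gradient(x, y, theta):
    """Vectorized gradient (1 / m) * X'^T (X' theta - y) (sketch); None on invalid input."""
    if not all(isinstance(a, np.ndarray) for a in (x, y, theta)):
        return None
    if x.ndim != 2 or y.shape != (x.shape[0], 1) or theta.shape != (x.shape[1] + 1, 1):
        return None
    m = x.shape[0]
    x_prime = np.hstack((np.ones((m, 1)), x))  # X' = [1 | x]
    residual = x_prime @ theta - y             # shape (m, 1)
    return x_prime.T @ residual / m            # shape (n + 1, 1)
```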


## Exercise 04 - Multivariate Gradient Descent

```python
import numpy as np
from ex04.fit import fit_
from utils.prediction import predict_

x = np.array([[0.2, 2., 20.], [0.4, 4., 40.], [0.6, 6., 60.], [0.8, 8., 80.]])
y = np.array([[19.6], [-2.8], [-25.2], [-47.6]])
theta = np.array([[42.], [1.], [1.], [1.]])

print("# Example 0:")
nw_theta = fit_(x, y, theta, alpha=0.0005, max_iter=42000)
# Output:
expected_theta = np.array([[41.99], [0.97], [0.77], [-1.20]])
print("initial value of theta:\n", theta)
print("After training value of theta:".ljust(40), nw_theta.reshape(1, -1))
print("After training expected value of theta:".ljust(40), expected_theta.reshape(1, -1))

print("\n# Example 1:")
pred = predict_(x, nw_theta)
# Output:
# np.array([[19.5992..], [-2.8003..], [-25.1999..], [-47.5996..]])
expected_pred = np.array([[19.5992], [-2.8003], [-25.1999], [-47.5996]])
print("my prediction:".ljust(20), pred.reshape(1, -1))
print("expected prediction:".ljust(20), expected_pred.reshape(1, -1))
```

```
# Example 0:
initial value of theta:
 [[42.]
 [ 1.]
 [ 1.]
 [ 1.]]
After training value of theta:           [[41.99888822  0.97792316  0.77923161 -1.20768386]]
After training expected value of theta:  [[41.99  0.97  0.77 -1.2 ]]

# Example 1:
my prediction:       [[ 19.59925884  -2.80037055 -25.19999994 -47.59962933]]
expected prediction: [[ 19.5992  -2.8003 -25.1999 -47.5996]]
```
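`fit_` can be read as batch gradient descent: repeat `theta <- theta - alpha * gradient` for `max_iter` steps. The sketch below assumes the same `1 / m` gradient as Exercise 03, and it returns a new float array rather than modifying `theta` in place (a design choice, not a requirement stated in this notebook).

```python
import numpy as np


def fit_(x, y, theta, alpha, max_iter):
    """Batch gradient descent for multivariate linear regression (sketch)."""
    if not all(isinstance(a, np.ndarray) for a in (x, y, theta)):
        return None
    if x.ndim != 2 or y.shape != (x.shape[0], 1) or theta.shape != (x.shape[1] + 1, 1):
        return None
    m = x.shape[0]
    x_prime = np.hstack((np.ones((m, 1)), x))  # X' = [1 | x]
    theta = theta.astype(float)  # float copy; the caller's array is left untouched
    for _ in range(max_iter):
        grad = x_prime.T @ (x_prime @ theta - y) / m
        theta = theta - alpha * grad
    return theta
```

The three columns of `x` in this example are proportional to one another, so many `theta` vectors fit the data equally well. Each update is a combination of the rows of `X'`, so `theta[1:]` can only move away from its initial value in the ratio 0.2 : 2 : 20, which is what the trained values above show.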


## Exercise 05 - Multivariate Linear Regression With Class

```python
import numpy as np
from ex05.mylinearregression import MyLinearRegression as MyLR

X = np.array([[1., 1., 2., 3.], [5., 8., 13., 21.], [34., 55., 89., 144.]])
Y = np.array([[23.], [48.], [218.]])
mylr = MyLR([[1.], [1.], [1.], [1.], [1.]])

print("# Example 0:")
pred = mylr.predict_(X)
# Output:
expected_pred = np.array([[8.], [48.], [323.]])
print("my prediction:".ljust(20), pred.reshape(1, -1))
print("expected prediction:".ljust(20), expected_pred.reshape(1, -1))

print("\n# Example 1:")
loss_e = mylr.loss_elem_(X, Y)
# Output:
expected_loss_e = np.array([[225.], [0.], [11025.]])
print("my loss elem:".ljust(20), loss_e.reshape(1, -1))
print("expected loss elem:".ljust(20), expected_loss_e.reshape(1, -1))

print("\n# Example 2:")
loss = mylr.loss_(X, Y)
# Output:
expected_loss = 1875.0
print("my loss:".ljust(15), loss)
print("expected loss:".ljust(15), expected_loss)

print("\n# Example 3:")
mylr.alpha = 1.6e-4
mylr.max_iter = 200000
mylr.fit_(X, Y)
# Output:
expected_thetas = np.array([[18.188], [2.767], [-0.374], [1.392], [0.017]])
print("my theta after training:\n", mylr.thetas.reshape(1, -1))
print("expected theta after training:\n", expected_thetas.reshape(1, -1))

print("\n# Example 4:")
pred = mylr.predict_(X)
# Output:
expected_pred = np.array([[23.417], [47.489], [218.065]])
print("my prediction:\n", pred.reshape(1, -1))
print("expected prediction:\n", expected_pred.reshape(1, -1))

print("\n# Example 5:")
loss_e = mylr.loss_elem_(X, Y)
# Output:
expected_loss_e = np.array([[0.174], [0.260], [0.004]])
print("my loss elem:\n", loss_e.reshape(1, -1))
print("expected loss elem:\n", expected_loss_e.reshape(1, -1))

print("\n# Example 6:")
loss = mylr.loss_(X, Y)
# Output:
expected_loss = 0.0732
print("my loss:".ljust(15), loss)
print("expected loss:".ljust(15), expected_loss)
```

```
# Example 0:
my prediction:       [[  8.  48. 323.]]
expected prediction: [[  8.  48. 323.]]

# Example 1:
my loss elem:        [[  225.     0. 11025.]]
expected loss elem:  [[  225.     0. 11025.]]

# Example 2:
my loss:        1875.0
expected loss:  1875.0

# Example 3:
my theta after training:
 [[ 1.81883792e+01  2.76697788e+00 -3.74782024e-01  1.39219585e+00
   1.74138279e-02]]
expected theta after training:
 [[ 1.8188e+01  2.7670e+00 -3.7400e-01  1.3920e+00  1.7000e-02]]

# Example 4:
my prediction:
 [[ 23.41720822  47.48924883 218.06563769]]
expected prediction:
 [[ 23.417  47.489 218.065]]

# Example 5:
my loss elem:
 [[0.1740627  0.26086676 0.00430831]]
expected loss elem:
 [[0.174 0.26  0.004]]

# Example 6:
my loss:        0.07320629376956732
expected loss:  0.0732
```
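The class bundles the previous functions behind one object. The following is a sketch under several assumptions: the constructor signature `MyLinearRegression(thetas, alpha=0.001, max_iter=1000)` (the default values are illustrative), `loss_elem_` and `loss_` taking `(x, y)` and predicting internally, as the calls above suggest, and no input validation, to keep it short.

```python
import numpy as np


class MyLinearRegression:
    """Multivariate linear regression trained by batch gradient descent (sketch)."""

    def __init__(self, thetas, alpha=0.001, max_iter=1000):
        # Store thetas as a float column vector of shape (n + 1, 1).
        self.thetas = np.array(thetas, dtype=float).reshape(-1, 1)
        self.alpha = alpha
        self.max_iter = max_iter

    @staticmethod
    def _add_intercept(x):
        return np.hstack((np.ones((x.shape[0], 1)), x))

    def predict_(self, x):
        return self._add_intercept(x) @ self.thetas

    def loss_elem_(self, x, y):
        # Per-sample squared error (y_hat_i - y_i)^2, shape (m, 1).
        return (self.predict_(x) - y) ** 2

    def loss_(self, x, y):
        # Half mean squared error: sum of loss_elem_ divided by 2m.
        return float(np.sum(self.loss_elem_(x, y))) / (2 * y.shape[0])

    def fit_(self, x, y):
        x_prime = self._add_intercept(x)
        m = x.shape[0]
        for _ in range(self.max_iter):
            grad = x_prime.T @ (x_prime @ self.thetas - y) / m
            self.thetas = self.thetas - self.alpha * grad
        return self.thetas
```

Keeping `alpha` and `max_iter` as plain attributes is what allows the notebook to adjust them between calls, as in Example 3.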


## Exercise 06 - Practicing Multivariate Linear Regression

```python
import pandas as pd
import numpy as np

data = pd.read_csv('ex06/spacecraft_data.csv')

# Check that every column was parsed as float64
[dt == np.float64 for dt in data.dtypes]
```

```
[True, True, True, True]
```

## Exercise 07 - Polynomial Models
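A polynomial model of degree d on a single feature x is still linear in `theta`: it is ordinary multivariate linear regression on the features `x, x^2, ..., x^d`. The only new piece needed is therefore a function that builds those columns. The name `add_polynomial_features` and its signature below are assumptions made for this sketch.

```python
import numpy as np


def add_polynomial_features(x, power):
    """Return the columns [x, x^2, ..., x^power] for x of shape (m, 1) (sketch)."""
    if not isinstance(x, np.ndarray) or x.ndim != 2 or x.shape[1] != 1:
        return None
    if not isinstance(power, int) or power < 1:
        return None
    # Stack each power of x as a new column: result has shape (m, power).
    return np.hstack([x ** p for p in range(1, power + 1)])
```

The resulting matrix can be passed directly to the multivariate `fit_` and `predict_` functions. Because high powers grow quickly, the columns typically end up on very different scales, which is where the feature-scaling material from Week 2 becomes relevant.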

## Exercise 08 - Let's Train Polynomial Models!

## Exercise 09 - DataSpliter
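Detecting overfitting requires evaluating the model on data it was not trained on. The following is a sketch of a data splitter, assuming it shuffles the rows of `x` and `y` together and returns `(x_train, x_test, y_train, y_test)` with a fraction `proportion` of the rows in the training set. The function name `data_spliter`, that return order, and the optional `seed` parameter are assumptions.

```python
import numpy as np


def data_spliter(x, y, proportion, seed=None):
    """Shuffle (x, y) jointly and split them into training and test sets (sketch)."""
    if x.shape[0] != y.shape[0] or not 0.0 < proportion < 1.0:
        return None
    rng = np.random.default_rng(seed)
    # One permutation for both arrays keeps each row paired with its target.
    indices = rng.permutation(x.shape[0])
    cut = int(x.shape[0] * proportion)
    train_idx, test_idx = indices[:cut], indices[cut:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]
```

Shuffling before splitting avoids a biased test set when the rows of the original file are sorted, for example by target value.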

## Exercise 10 - Machine Learning for Grown-ups: Training and Test Sets