# Linear regression

This notebook uses **sklearn.linear_model.LinearRegression** to show the difference between **multiple linear regression** and **multivariate linear regression**.

## - Definitions
**1. Multiple linear regression:** Multiple inputs & ONE output **(y is a scalar)**, $$ y=f(x_1,x_2,...,x_n) $$<br>
**2. Multivariate linear regression:** Multiple inputs & multiple outputs **(y is a vector)**, $$ y_1,y_2,...,y_m=f(x_1,x_2,...,x_n)$$

#### - Reference
[matlab](https://nl.mathworks.com/help/stats/linear-regression.html?s_tid=CRUX_lftnav) <br>
[stackexchange](https://stats.stackexchange.com/a/224234)<br>

In [1]:
from random import random

import pandas as pd
import numpy as np

from sklearn import linear_model

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

## Dummy Dataset

In [2]:
# helper: returns a list of 100 uniform random numbers in [0, 1)
lr = lambda: [random() for _ in range(100)]
x = pd.DataFrame({'x1': lr(), 'x2':lr(), 'x3':lr()})

y = x.x1 + x.x2 * 2 + x.x3 * 3 + 4

# Expected regression results: 
# R2 = 1, 
# coefficient=[1,2,3]
# bias=4
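
The expected values above can be cross-checked without scikit-learn. The following is a minimal sketch, assuming NumPy's least-squares solver and a freshly generated dataset of the same form (the seed and the names `X_aug`, `coef` and `bias` are illustrative, not part of the notebook). Appending a column of ones lets the intercept be estimated as a fourth weight.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
X = rng.random((100, 3))                # 100 samples, 3 features in [0, 1)
y = X[:, 0] + 2 * X[:, 1] + 3 * X[:, 2] + 4

# Append a column of ones so the intercept is estimated as a fourth weight.
X_aug = np.column_stack([X, np.ones(len(X))])

# Solve min ||X_aug @ w - y||^2; w holds [slope1, slope2, slope3, bias].
w, residuals, rank, _ = np.linalg.lstsq(X_aug, y, rcond=None)

coef, bias = w[:3], w[3]
print(np.round(coef, 6), round(float(bias), 6))
```

Because the data contain no noise, the recovered slopes and bias should match the generating values up to floating-point error.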

## Let's begin with Multiple linear regression

In [3]:
# multiple linear regression
model = linear_model.LinearRegression()
model.fit( x[["x1", "x2", "x3"]], y)

LinearRegression()

In [4]:
# check results
model.score(x[["x1", "x2", "x3"]], y) # R2
model.coef_ # slopes
model.intercept_ # bias

1.0

array([1., 2., 3.])

4.000000000000001

## What happens if we pass multiple y's to LinearRegression()?
Let's use y to create two targets, y1 & y2, and see how LinearRegression handles them.

In [5]:
# multivariate linear regression
y_multiple = pd.DataFrame({"y1":y, "y2":y})
model.fit( x[["x1", "x2", "x3"]], y_multiple)

LinearRegression()

In [6]:
# check results
model.score(x[["x1", "x2", "x3"]], y_multiple) # R2, averaged uniformly over the two targets
model.coef_ # a (2,3) matrix
model.intercept_ # a (2,) vector

1.0

array([[1., 2., 3.],
       [1., 2., 3.]])

array([4., 4.])

## Ans: It becomes multivariate linear regression, meaning that LinearRegression fits two y's at once.


Suppose you have k samples. Each sample maps its three inputs to both targets, y1 & y2, and a single fit covers both:
$$(x_{11}, x_{21}, x_{31}) \rightarrow (y_{11}, y_{21})$$ <br>
$$(x_{12}, x_{22}, x_{32}) \rightarrow (y_{12}, y_{22})$$ <br>
$$ \vdots $$ <br>
$$(x_{1k}, x_{2k}, x_{3k}) \rightarrow (y_{1k}, y_{2k})$$ <br>
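
For ordinary least squares, the squared-error objective separates by target column, so one fit on both targets should give the same coefficients as fitting each target on its own. The sketch below checks this on two *different* noisy targets; the second set of coefficients, the noise level and the seed are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 3))

# Two different, noisy targets (hypothetical coefficients for illustration).
y1 = X @ np.array([1.0, 2.0, 3.0]) + 4 + rng.normal(scale=0.1, size=100)
y2 = X @ np.array([-1.0, 0.5, 2.0]) + 1 + rng.normal(scale=0.1, size=100)
Y = np.column_stack([y1, y2])           # shape (100, 2)

joint = LinearRegression().fit(X, Y)    # one call, both targets
sep1 = LinearRegression().fit(X, y1)    # one model per target
sep2 = LinearRegression().fit(X, y2)

# Row i of joint.coef_ should match the model fitted on target i alone.
same_coef = np.allclose(joint.coef_, np.vstack([sep1.coef_, sep2.coef_]))
same_bias = np.allclose(joint.intercept_, [sep1.intercept_, sep2.intercept_])
print(joint.coef_.shape, same_coef, same_bias)
```

Each row of `joint.coef_` corresponds to one target, which is why the coefficient matrix has shape (2, 3).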

### What if I want one regressor for each y? Ans: MultiOutputRegressor

In [7]:
from sklearn.multioutput import MultiOutputRegressor

wrapper = MultiOutputRegressor(model).fit(x[["x1", "x2", "x3"]], y_multiple)

In [8]:
# check results
wrapper.estimators_
wrapper.estimators_[0].score(x[["x1", "x2", "x3"]], y_multiple['y1']) # only ONE y
wrapper.estimators_[0].coef_

[LinearRegression(), LinearRegression()]

1.0

array([1., 2., 3.])
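
To compare the wrapper with the single multi-target fit, the per-target coefficients can be stacked back into a (2, 3) matrix. This is a minimal sketch on a freshly generated dataset of the same form as the dummy data; the seed and the names `coef`, `bias` and `pred` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X[:, 0] + 2 * X[:, 1] + 3 * X[:, 2] + 4
Y = np.column_stack([y, y])  # two identical targets, as in the notebook

wrapper = MultiOutputRegressor(LinearRegression()).fit(X, Y)

# Coefficients live on the per-target models; stack them to get one row per target.
coef = np.vstack([est.coef_ for est in wrapper.estimators_])      # (2, 3)
bias = np.array([est.intercept_ for est in wrapper.estimators_])  # (2,)

pred = wrapper.predict(X[:5])  # one column per target, shape (5, 2)
print(coef.shape, bias.shape, pred.shape)
```

For a plain linear model the wrapper and the direct multi-target fit should agree; the wrapper is mainly useful when each target needs its own independently fitted estimator.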