# Multiple Linear Regression

In multiple linear regression, the model is a linear function of several input features: $\hat{y}=\boldsymbol{w}^{T}\boldsymbol{x}+b$.

### data preparation
Here we prepare a dataset $\{ (\boldsymbol{x}_{i},y_{i})\},i=1,\dots,m$, where $\boldsymbol{x}_{i}=(x_{i1},\dots,x_{id})^{\top}\in\mathbb{R}^{d}$ and $y_{i}\in\mathbb{R}$.<br>
In the training set, we prepare $m=100$ samples with feature dimension $d=2$ and stack them row-wise into the matrix $\boldsymbol{X}_{train}=(\boldsymbol{x}_{1}^{T},\dots,\boldsymbol{x}_{m}^{T})^{T}\in\mathbb{R}^{m\times d}$.<br>
We set the real parameters $\boldsymbol{w}_{real}=(1,2)^{T}$ and $b=4$, with $\boldsymbol{b}=(b,\dots,b)^{T}\in\mathbb{R}^{m}$, so that the fitted result can be checked against them.<br>
The labels in the dataset are then $\boldsymbol{Y} = \boldsymbol{X}_{train}\boldsymbol{w}_{real}+\boldsymbol{b}+\text{Noise}$, where the noise is Gaussian with standard deviation $0.01$.

In [3]:
import numpy as np
import os
def create_folder(folder_path):
    if not os.path.exists(folder_path):
        # Create the folder if it does not exist yet
        os.makedirs(folder_path)
        print("Folder created")
    else:
        print("Folder already exists, nothing to create")

In [4]:
create_folder("./data")

Folder created


In [29]:
feature_dim = 2
train_m = 100
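X_train = np.random.rand(train_m, feature_dim)  # shape (m, d), uniform on [0, 1)
w_real = np.array([[1], [2]])  # real weights, shape (d, 1)
b = np.array([[4]])  # real bias, broadcast over all samples
Y = X_train @ w_real + b + np.random.randn(train_m, 1) * 0.01  # labels with small Gaussian noise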
X_train.shape, Y.shape, w_real.shape, b.shape

((100, 2), (100, 1), (2, 1), (1, 1))

### the rule of vector derivative
Before deriving the optimal parameter $w$, we first define the rules for vector derivatives.<br>
If we have a scalar function $F=F(\boldsymbol{x})$ of the independent variable $\boldsymbol{x}\in\mathbb{R}^{d}$, then the derivative of $F$ with respect to $\boldsymbol{x}$ is:<br>
$$ \frac{\partial F}{\partial \boldsymbol{x}}= [\frac{\partial F}{\partial x_{1}},\dots,\frac{\partial F}{\partial x_{d}}]^{T}\in\mathbb{R}^{d} $$
If we have a vector-valued function $\boldsymbol{F}=\boldsymbol{F}(\boldsymbol{x})=[F_{1}(\boldsymbol{x}),\dots,F_{m}(\boldsymbol{x})]^{T}\in\mathbb{R}^{m}$ of the independent variable $\boldsymbol{x}\in\mathbb{R}^{d}$, then the derivative of $\boldsymbol{F}$ with respect to $\boldsymbol{x}$ is the matrix whose $k$-th column is the gradient of $F_{k}$:<br>
$$ \frac{\partial \boldsymbol{F}}{\partial \boldsymbol{x}}= [\frac{\partial F_{1}}{\partial \boldsymbol{x}},\dots,\frac{\partial F_{m}}{\partial \boldsymbol{x}}]\in\mathbb{R}^{d\times m}$$
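
The two rules above can be checked numerically. The following is a minimal sketch using central finite differences; the helper names `numerical_gradient` and `numerical_jacobian`, the random matrix `A`, and the seed are illustrative assumptions, not part of the original notebook. It uses the known results $\partial \|A\boldsymbol{x}\|_{2}^{2}/\partial \boldsymbol{x}=2A^{T}A\boldsymbol{x}$ and, under the $d\times m$ layout defined here, $\partial (A\boldsymbol{x})/\partial \boldsymbol{x}=A^{T}$.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of dF/dx for a scalar-valued f; shape (d,)."""
    grad = np.zeros(x.size)
    for j in range(x.size):
        step = np.zeros(x.size)
        step[j] = eps
        grad[j] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def numerical_jacobian(F, x, eps=1e-6):
    """Derivative of a vector-valued F in the d x m layout: column k is dF_k/dx."""
    m = F(x).size
    J = np.zeros((x.size, m))
    for k in range(m):
        J[:, k] = numerical_gradient(lambda v: F(v)[k], x, eps)
    return J

rng = np.random.default_rng(0)      # fixed seed (assumption, for reproducibility)
A = rng.standard_normal((3, 2))     # m = 3 outputs, d = 2 inputs
x0 = rng.standard_normal(2)

grad = numerical_gradient(lambda v: np.sum((A @ v) ** 2), x0)
jac = numerical_jacobian(lambda v: A @ v, x0)

print(np.allclose(grad, 2 * A.T @ A @ x0, atol=1e-5))  # scalar rule
print(jac.shape, np.allclose(jac, A.T, atol=1e-5))     # vector rule, d x m layout
```

Note that the $d\times m$ layout is the transpose of the $m\times d$ Jacobian that many libraries return, which is why the derivative of $A\boldsymbol{x}$ comes out as $A^{T}$ here.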

### derivation of optimal parameter $w$
Now we can derive the optimal parameter $w$ for multiple linear regression.<br>
We use the least-squares principle: choose the parameters that minimize the residual sum of squares (RSS).<br>
$$ RSS(\bar{\boldsymbol{w}})=||\hat{\boldsymbol{y}}-\boldsymbol{y}||^{2}_{2}=||\boldsymbol{y}-\bar{\boldsymbol{X}}\bar{\boldsymbol{w}}||^{2}_{2}$$
$$ \frac{\partial RSS(\bar{\boldsymbol{w}})}{\partial \bar{\boldsymbol{w}}}= \frac{\partial \{ (\boldsymbol{y}-\bar{\boldsymbol{X}}\bar{\boldsymbol{w}})^{T}(\boldsymbol{y}-\bar{\boldsymbol{X}}\bar{\boldsymbol{w}})\} }{\partial \bar{\boldsymbol{w}}} = 2\bar{\boldsymbol{X}}^{T}(\bar{\boldsymbol{X}}\bar{\boldsymbol{w}}-\boldsymbol{y}) $$
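Here, $\bar{\boldsymbol{w}}=(\boldsymbol{w}^{T},b)^{T}\in\mathbb{R}^{d+1}$ is the expanded parameter vector, and $\bar{\boldsymbol{X}}\in\mathbb{R}^{m \times (d+1)}$ is the expanded matrix with a column of ones as the **last column**, so that $\bar{\boldsymbol{X}}\bar{\boldsymbol{w}}=\boldsymbol{X}\boldsymbol{w}+\boldsymbol{b}$.<br>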

$$\frac{\partial RSS(\bar{\boldsymbol{w}})}{\partial \bar{\boldsymbol{w}}}= 0 \Rightarrow \bar{\boldsymbol{w}}=(\bar{\boldsymbol{X}}^{T} \bar{\boldsymbol{X}})^{-1}\bar{\boldsymbol{X}}^{T}\boldsymbol{y} $$
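
The intermediate steps behind the gradient and the closed-form solution can be written out as follows. This is a sketch that assumes $\bar{\boldsymbol{X}}^{T}\bar{\boldsymbol{X}}$ is invertible, i.e. $\bar{\boldsymbol{X}}$ has full column rank, which the text above does not state explicitly:

$$\begin{aligned}
RSS(\bar{\boldsymbol{w}}) &= \boldsymbol{y}^{T}\boldsymbol{y}-2\bar{\boldsymbol{w}}^{T}\bar{\boldsymbol{X}}^{T}\boldsymbol{y}+\bar{\boldsymbol{w}}^{T}\bar{\boldsymbol{X}}^{T}\bar{\boldsymbol{X}}\bar{\boldsymbol{w}} \\
\frac{\partial RSS(\bar{\boldsymbol{w}})}{\partial \bar{\boldsymbol{w}}} &= -2\bar{\boldsymbol{X}}^{T}\boldsymbol{y}+2\bar{\boldsymbol{X}}^{T}\bar{\boldsymbol{X}}\bar{\boldsymbol{w}} = 2\bar{\boldsymbol{X}}^{T}(\bar{\boldsymbol{X}}\bar{\boldsymbol{w}}-\boldsymbol{y}) \\
2\bar{\boldsymbol{X}}^{T}(\bar{\boldsymbol{X}}\bar{\boldsymbol{w}}-\boldsymbol{y}) = 0 &\iff \bar{\boldsymbol{X}}^{T}\bar{\boldsymbol{X}}\bar{\boldsymbol{w}} = \bar{\boldsymbol{X}}^{T}\boldsymbol{y} \\
&\Rightarrow \bar{\boldsymbol{w}} = (\bar{\boldsymbol{X}}^{T}\bar{\boldsymbol{X}})^{-1}\bar{\boldsymbol{X}}^{T}\boldsymbol{y}
\end{aligned}$$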

In [30]:
X_bar = np.hstack((X_train, np.ones((train_m, 1))))  # append a column of ones for the bias term

In [36]:
w = (np.linalg.inv(X_bar.T@X_bar))@X_bar.T@Y
w

array([[0.99906013],
       [2.00311505],
       [3.99771095]])

From this result we can see that the estimated parameter $\bar{\boldsymbol{w}}$ is very close to the real parameter $\bar{\boldsymbol{w}}_{real}=(\boldsymbol{w}_{real}^{T},b)^{T}=(1,2,4)^{T}$.
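
As a cross-check, the same estimate can be obtained without forming an explicit inverse. The sketch below is self-contained and is not the notebook's own method: it regenerates data with the same settings using a fixed seed (an assumption made for reproducibility, so the numbers will differ slightly from the output above), then compares the explicit-inverse formula with `np.linalg.solve` on the normal equations and with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, not in the original notebook
m, d = 100, 2
X = rng.random((m, d))                     # features, uniform on [0, 1)
w_true = np.array([[1.0], [2.0]])
b_true = 4.0
y = X @ w_true + b_true + 0.01 * rng.standard_normal((m, 1))

X_bar = np.hstack((X, np.ones((m, 1))))    # ones as the last column

# 1) Closed form with an explicit inverse, as in the text
w_inv = np.linalg.inv(X_bar.T @ X_bar) @ X_bar.T @ y
# 2) Solve the normal equations X_bar^T X_bar w = X_bar^T y directly
w_solve = np.linalg.solve(X_bar.T @ X_bar, X_bar.T @ y)
# 3) Least-squares solver applied to X_bar w ~ y
w_lstsq, *_ = np.linalg.lstsq(X_bar, y, rcond=None)

# The RSS gradient should vanish (up to rounding) at the minimizer
grad = 2 * X_bar.T @ (X_bar @ w_lstsq - y)

print(w_lstsq.ravel())
print(np.allclose(w_inv, w_solve), np.allclose(w_solve, w_lstsq))
print(np.abs(grad).max())
```

For a small, well-conditioned problem like this one the three approaches agree to numerical precision. When $\bar{\boldsymbol{X}}^{T}\bar{\boldsymbol{X}}$ is close to singular, solving the system or using a least-squares routine is generally more numerically stable than computing the inverse.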