# Multiple Linear Regression

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

%matplotlib notebook

Here is a data set of sales figures from different stores.

In [11]:
data = pd.read_csv('sales.csv')
data

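|   | net_sales | sq_ft | inventory | advertising | district_size | competing_stores |
|---|-----------|-------|-----------|-------------|---------------|------------------|
| 0 | 231.0 | 3.0 | 294 | 8.2 | 8.2 | 11 |
| 1 | 156.0 | 2.2 | 232 | 6.9 | 4.1 | 12 |
| 2 | 10.0 | 0.5 | 149 | 3.0 | 4.3 | 15 |
| 3 | 519.0 | 5.5 | 600 | 12.0 | 16.1 | 1 |
| 4 | 437.0 | 4.4 | 567 | 10.6 | 14.1 | 5 |
| 5 | 487.0 | 4.8 | 571 | 11.8 | 12.7 | 4 |
| 6 | 299.0 | 3.1 | 512 | 8.1 | 10.1 | 10 |
| 7 | 195.0 | 2.5 | 347 | 7.7 | 8.4 | 12 |
| 8 | 20.0 | 1.2 | 212 | 3.3 | 2.1 | 15 |
| 9 | 68.0 | 0.6 | 102 | 4.9 | 4.7 | 8 |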


## Two features

Let's try to predict net sales from two variables: the square footage (size) of the store, and the number of competing stores in the area. Our model will be:

$$
\text{net sales} \approx w_0 + w_1 \times \text{sqft} + w_2 \times \text{competitors}
$$

Do you expect $w_1$ to be positive or negative? What about $w_2$?

Let's plot the data.

In [18]:
data.plot(kind='scatter', x='sq_ft', y='net_sales')

<matplotlib.axes._subplots.AxesSubplot at 0x1110fe6a0>

In [19]:
data.plot(kind='scatter', x='competing_stores', y='net_sales')

<matplotlib.axes._subplots.AxesSubplot at 0x1112937b8>
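
As a rough numerical companion to the two scatter plots, the sketch below computes the correlation of each feature with net sales. It uses only the ten rows shown in the table above (the full `sales.csv` may contain more), and the names `r_size` and `r_comp` are introduced here for illustration. The sign of a correlation is only a hint: in a model with several features, a weight's sign need not match it.

```python
import numpy as np

# The ten rows displayed above (the full sales.csv may contain more).
sq_ft = np.array([3.0, 2.2, 0.5, 5.5, 4.4, 4.8, 3.1, 2.5, 1.2, 0.6])
competing = np.array([11, 12, 15, 1, 5, 4, 10, 12, 15, 8])
net_sales = np.array([231.0, 156.0, 10.0, 519.0, 437.0,
                      487.0, 299.0, 195.0, 20.0, 68.0])

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation.
r_size = np.corrcoef(sq_ft, net_sales)[0, 1]
r_comp = np.corrcoef(competing, net_sales)[0, 1]

print(f'sq_ft vs net_sales:            {r_size:+.2f}')
print(f'competing_stores vs net_sales: {r_comp:+.2f}')
```

On these rows the first correlation is positive and the second is negative, which suggests what to expect for the signs of $w_1$ and $w_2$.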

**Note**: the plot below is interactive. Try clicking and dragging to move the camera.

In [20]:
sq_ft = np.asarray(data['sq_ft'])
competing = np.asarray(data['competing_stores'])
net_sales = np.asarray(data['net_sales'])

%matplotlib notebook
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(sq_ft, competing, net_sales)
plt.xlabel('sq_ft')
plt.ylabel('competing_stores')

Text(0.5, 0, 'competing_stores')

Our design matrix is:
    
$$
\begin{pmatrix}
 1 & s_1 & c_1\\
 1 & s_2 & c_2\\
 \vdots & \vdots & \vdots\\
 1 & s_n & c_n
\end{pmatrix}
$$

where $s_i$ is the size of the $i$th store and $c_i$ is the number of stores competing with it. In code:

In [21]:
X = np.column_stack((
    np.ones_like(sq_ft),
    sq_ft,
    competing
))

The least-squares weights solve the normal equations $X^\intercal X \vec w = X^\intercal \vec y$, where $\vec y$ is the vector of net sales:

In [22]:
w = np.linalg.solve(X.T @ X, X.T @ net_sales)
w

array([303.49073761,  45.15092186, -21.5851804 ])
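
To make the fitted weights concrete, here is a minimal sketch that plugs them into the model $w_0 + w_1 \times \text{sqft} + w_2 \times \text{competitors}$. The weights are copied from the output above; the helper `predict` is a name introduced here for illustration, not part of the notebook.

```python
import numpy as np

# Weights reported by the fit above: intercept, sq_ft, competing_stores.
w = np.array([303.49073761, 45.15092186, -21.5851804])

def predict(sq_ft, competing_stores, w=w):
    """Evaluate w_0 + w_1 * sq_ft + w_2 * competing_stores."""
    return w[0] + w[1] * sq_ft + w[2] * competing_stores

# Feature values of row 0 in the table: sq_ft = 3.0, 11 competing stores.
prediction = predict(3.0, 11)
print(f'{prediction:.1f}')  # prints 201.5
```

The actual net sales for that row are 231.0, so the plane is off by about 30 there. The signs are also easy to read: holding size fixed, each additional competing store lowers the prediction by about 21.6, and holding competition fixed, each additional unit of `sq_ft` raises it by about 45.2.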

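With two features, the fitted prediction rule $H(\text{sqft}, \text{competitors}) = w_0 + w_1 \times \text{sqft} + w_2 \times \text{competitors}$ is not a line; it is a plane: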

In [23]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(sq_ft, competing, net_sales)
plt.xlabel('sq_ft')
plt.ylabel('competing_stores')

XX, YY = np.mgrid[1:10:2, 0:16:2]
Z = w[0] + w[1]*XX + w[2]*YY
ax.plot_wireframe(XX, YY, Z, color='black', alpha=.5)

<mpl_toolkits.mplot3d.art3d.Line3DCollection at 0x112086668>

## All features

Let's fit a prediction rule using all five features. The first column of `data` is `net_sales`, so `data.iloc[:, 1:]` selects every feature column.

In [30]:
X = np.column_stack([
    np.ones(data.shape[0]),
    data.iloc[:, 1:].values
])

In [36]:
w = np.linalg.solve(X.T @ X, X.T @ net_sales)
w

array([-18.85941416,  16.20157356,   0.17463515,  11.52626903,
        13.5803129 ,  -5.31097141])

In [37]:
feature_names = list(data.columns[1:])

In [49]:
for name, weight in zip(feature_names, w[1:]):
    print(f'{name}:\t{weight:0.2f}')

sq_ft:	16.20
inventory:	0.17
advertising:	11.53
district_size:	13.58
competing_stores:	-5.31
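
As more features are added, $X^\intercal X$ can become poorly conditioned, so it may be worth cross-checking the normal-equations solution against `np.linalg.lstsq`, which solves the least-squares problem directly. The sketch below does this on a synthetic data set (made-up values, not `sales.csv`); `true_w` and the data shape are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the sales data: 30 rows, 5 features (made-up values).
n, d = 30, 5
features = rng.normal(size=(n, d))
true_w = np.array([1.0, 2.0, -1.0, 0.5, 3.0, -2.0])  # hypothetical weights
X = np.column_stack([np.ones(n), features])
y = X @ true_w + rng.normal(scale=0.1, size=n)

# Normal equations, as in the cells above.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least squares solved directly; rcond=None selects NumPy's default cutoff.
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))
print(rank)  # number of linearly independent columns of X
```

When the design matrix has full column rank, as it does here, the two approaches agree up to floating-point error.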


## Which feature is most "important"?

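The weights above are not directly comparable: each one is measured in net sales per unit of its own feature, and the features have different units and scales. To compare them, we standardize each feature by subtracting its mean and dividing by its standard deviation, so that every weight becomes the change in predicted net sales per one standard deviation of its feature.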

In [62]:
features = data.iloc[:, 1:].values

In [63]:
standardized_features = (features - features.mean(axis=0))/features.std(axis=0)

In [64]:
X = np.column_stack([
    np.ones(data.shape[0]),
    standardized_features
])

In [65]:
w = np.linalg.solve(X.T @ X, X.T @ net_sales)
w

array([286.57407407,  31.97302867,  32.76054166,  42.69274551,
        68.49841225, -25.51529781])

In [66]:
for name, weight in zip(feature_names, w[1:]):
    print(f'{name}:\t{weight:0.2f}')

sq_ft:	31.97
inventory:	32.76
advertising:	42.69
district_size:	68.50
competing_stores:	-25.52


District size has the largest standardized weight in magnitude (68.50), so it appears to have the largest effect on net sales.
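
To see what a standardized weight measures, the sketch below fits the same model to raw and standardized versions of a synthetic data set (made-up values, not `sales.csv`). The helper `fit` and the chosen scales are assumptions introduced for illustration. It checks two facts: each standardized weight equals the raw weight times that feature's standard deviation, and the standardized intercept equals the mean of the targets.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic features on different scales (made-up values, not sales.csv).
n = 40
scale = np.array([2.0, 150.0, 3.0])
shift = np.array([3.0, 350.0, 8.0])
features = rng.normal(size=(n, 3)) * scale + shift
y = 20.0 + features @ np.array([16.0, 0.2, 11.0]) + rng.normal(scale=0.5, size=n)

def fit(F, y):
    """Least-squares weights for a design matrix with a leading column of ones."""
    X = np.column_stack([np.ones(len(F)), F])
    return np.linalg.solve(X.T @ X, X.T @ y)

sigma = features.std(axis=0)
standardized = (features - features.mean(axis=0)) / sigma

w_raw = fit(features, y)
w_std = fit(standardized, y)

# Each standardized weight is the raw weight times the feature's standard deviation.
print(np.allclose(w_std[1:], w_raw[1:] * sigma))
# The standardized intercept is the mean of the targets.
print(np.isclose(w_std[0], y.mean()))
```

This is why a feature with a tiny raw weight, such as `inventory` (0.17), can have a sizable standardized weight (32.76): its values vary over a much wider range than those of the other features.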

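Here is standardization by hand on a small example: subtract the mean, then divide by the population standard deviation.

In [71]:
x = np.array([10, 20, -30, 5, 15])

In [78]:
(x - x.mean()) / np.sqrt(np.mean((x - x.mean())**2))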

array([ 0.33859959,  0.90293224, -1.918731  ,  0.05643326,  0.62076591])
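
The by-hand calculation above can be cross-checked against NumPy's built-in standard deviation. A minimal sketch, assuming the default `ddof=0` of `np.std` (the population standard deviation, which is also what `features.std(axis=0)` computed earlier):

```python
import numpy as np

x = np.array([10, 20, -30, 5, 15])

# ndarray.std defaults to ddof=0: divide by n, not n - 1.
z = (x - x.mean()) / x.std()

print(z)
# After standardizing, the mean is (numerically) 0 and the standard deviation is 1.
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(), 1.0))
```

Passing `ddof=1` would instead divide by $n - 1$ and give slightly smaller standardized values.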