# Multiple Linear Regression

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

%matplotlib notebook

Here is a data set of sales figures from different stores.

In [None]:
data = pd.read_csv('sales.csv')
data

## Two features

Let's try to predict net sales from two variables: the square footage (size) of the store, and the number of competing stores in the area. Our model will be:

$$
\text{net sales} \approx w_0 + w_1 \times \text{sqft} + w_2 \times \text{competitors}
$$

Do you expect $w_1$ to be positive or negative? What about $w_2$?

Let's plot the data.

In [None]:
data.plot(kind='scatter', x='sq_ft', y='net_sales')

In [None]:
data.plot(kind='scatter', x='competing_stores', y='net_sales')

**Note**: the plot below is interactive. Try clicking and dragging to move the camera.

In [None]:
sq_ft = np.asarray(data['sq_ft'])
competing = np.asarray(data['competing_stores'])
net_sales = np.asarray(data['net_sales'])

%matplotlib notebook
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(sq_ft, competing, net_sales)
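# Label all three axes so the vertical axis is not left anonymous.
ax.set_xlabel('sq_ft')
ax.set_ylabel('competing_stores')
ax.set_zlabel('net_sales')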

Our design matrix is:
    
$$
\begin{pmatrix}
 1 & s_1 & c_1\\
 1 & s_2 & c_2\\
 \vdots & \vdots & \vdots\\
 1 & s_n & c_n
\end{pmatrix}
$$

where $s_i$ is the size of the $i$th store, and $c_i$ is the number of stores competing with it. In code:

In [None]:
X = np.column_stack((
    np.ones_like(sq_ft),
    sq_ft,
    competing
))

To find the least-squares weights, we solve the normal equations $X^\intercal X \vec w = X^\intercal \vec y$, where $\vec y$ is the vector of net sales:

In [None]:
w = np.linalg.solve(X.T @ X, X.T @ net_sales)
w
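
As a sanity check on the normal equations, the sketch below solves them on a small synthetic data set and compares the answer with `np.linalg.lstsq`, which minimizes the same squared error. The data, the "true" weights `3.0, 2.0, -1.5`, and all names ending in `_demo` are made up for illustration; they are not taken from `sales.csv`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_demo = 50

# Hypothetical stores: sizes, competitor counts, and noisy sales.
size_demo = rng.uniform(1, 10, n_demo)
rivals_demo = rng.integers(0, 16, n_demo)
y_demo = 3.0 + 2.0 * size_demo - 1.5 * rivals_demo + rng.normal(0, 0.1, n_demo)

# Design matrix: a column of ones, then one column per feature.
X_demo = np.column_stack((np.ones(n_demo), size_demo, rivals_demo))

# Solve the normal equations directly ...
w_normal = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ y_demo)

# ... and compare with NumPy's least-squares solver.
w_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

print(w_normal)
print(w_lstsq)
```

Both approaches should agree up to floating-point error, and both should land close to the weights used to generate the data. When the columns of the design matrix are nearly collinear, `np.linalg.lstsq` is usually the numerically safer choice.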

The function $H$ that we have fit is not a line; it is a plane:

In [None]:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(sq_ft, competing, net_sales)
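# Label all three axes so the vertical axis is not left anonymous.
ax.set_xlabel('sq_ft')
ax.set_ylabel('competing_stores')
ax.set_zlabel('net_sales')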

XX, YY = np.mgrid[1:10:2, 0:16:2]
Z = w[0] + w[1]*XX + w[2]*YY
ax.plot_wireframe(XX, YY, Z, color='black', alpha=.5)
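
Making a prediction amounts to reading off the height of this plane at a given point. A minimal sketch, assuming a weight vector ordered as (intercept, `sq_ft`, `competing_stores`); the helper `predict` and the numbers in `w_example` are hypothetical and are not the fitted values from above.

```python
import numpy as np

def predict(w, sq_ft, competing):
    """Evaluate H(sq_ft, competing) = w[0] + w[1]*sq_ft + w[2]*competing."""
    return w[0] + w[1] * np.asarray(sq_ft) + w[2] * np.asarray(competing)

# Made-up weights, chosen only to make the arithmetic easy to follow.
w_example = np.array([100.0, 20.0, -5.0])

# One store: 100 + 20*4 - 5*3
single = predict(w_example, 4.0, 3)

# Several stores at once, thanks to NumPy broadcasting.
batch = predict(w_example, [2.0, 6.0], [0, 10])

print(single)
print(batch)
```

In the notebook itself, passing the fitted `w` in place of `w_example` gives predictions in whatever units `net_sales` is recorded in.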

## All features

Let's fit a prediction rule using all of the features.

In [None]:
X = np.column_stack([
    np.ones(data.shape[0]),
    data.iloc[:, 1:].values
])

In [None]:
w = np.linalg.solve(X.T @ X, X.T @ net_sales)
w

In [None]:
feature_names = list(data.columns[1:])

In [None]:
for name, weight in zip(feature_names, w[1:]):
    print(f'{name}:\t{weight:0.2f}')

## Which feature is most "important"?

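The weights above cannot be compared directly, because each one is measured in units of net sales per unit of its own feature. To account for the difference in units and scale between the features, we standardize each feature by subtracting its mean and dividing by its standard deviation.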

In [None]:
features = data.iloc[:, 1:].values

In [None]:
standardized_features = (features - features.mean(axis=0))/features.std(axis=0)

In [None]:
X = np.column_stack([
    np.ones(data.shape[0]),
    standardized_features
])

In [None]:
w = np.linalg.solve(X.T @ X, X.T @ net_sales)
w

In [None]:
for name, weight in zip(feature_names, w[1:]):
    print(f'{name}:\t{weight:0.2f}')
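
To see why standardizing changes the ranking of the weights, note that each standardized weight equals the raw weight multiplied by that feature's standard deviation, and the intercept becomes the mean of the targets. The sketch below checks this on synthetic data; the two features, the helper `fit`, and every number in it are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_demo = 200

# Two hypothetical features on very different scales.
F = np.column_stack((
    rng.uniform(1, 10, n_demo),     # small-scale feature
    rng.uniform(0, 1000, n_demo),   # large-scale feature
))
y_demo = 5.0 + 2.0 * F[:, 0] + 0.05 * F[:, 1] + rng.normal(0, 0.1, n_demo)

def fit(features, targets):
    """Least-squares weights (intercept first) via the normal equations."""
    X = np.column_stack((np.ones(len(features)), features))
    return np.linalg.solve(X.T @ X, X.T @ targets)

w_raw = fit(F, y_demo)

# Standardize each column: subtract its mean, divide by its standard deviation.
Z = (F - F.mean(axis=0)) / F.std(axis=0)
w_std = fit(Z, y_demo)

print('raw weights:         ', w_raw[1:])
print('standardized weights:', w_std[1:])
print('raw * std:           ', w_raw[1:] * F.std(axis=0))
```

Here the second feature has the smaller raw weight but the larger standardized weight, because it varies over a much wider range. This is the reason raw weights alone are a poor guide to which feature matters most.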

After standardizing, the district size has the largest weight in magnitude, so it appears to have the largest effect on net sales.