# Multiple Linear Regression Backward Elimination

As explained in the previous tutorial, Backward Elimination is not needed to train this model in Python: Scikit-Learn's `LinearRegression` fits all features at once by least squares and makes accurate predictions without manual feature selection. It does not, however, test features for statistical significance, which is what Backward Elimination is for.

However, if you do want to learn how to implement Backward Elimination manually in Python and identify the most statistically significant features, the older videos at the link below walk through the implementation:

<https://www.dropbox.com/sh/pknk0g9yu4z06u7/AADSTzieYEMfs1HHxKHt9j1ba?dl=0>

These older videos were recorded in Spyder, but the dataset and the code are the same as in the previous video lectures of the Multiple Linear Regression section, except that the lecturer manually removed the first column to avoid the Dummy Variable Trap, using the single line of code shown under "Avoiding the Dummy Variable Trap" below.

## Import Libraries

In [7]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Import Dataset

In [8]:
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

In [9]:
print(*X[:25], sep='\n')

[165349.2 136897.8 471784.1 'New York']
[162597.7 151377.59 443898.53 'California']
[153441.51 101145.55 407934.54 'Florida']
[144372.41 118671.85 383199.62 'New York']
[142107.34 91391.77 366168.42 'Florida']
[131876.9 99814.71 362861.36 'New York']
[134615.46 147198.87 127716.82 'California']
[130298.13 145530.06 323876.68 'Florida']
[120542.52 148718.95 311613.29 'New York']
[123334.88 108679.17 304981.62 'California']
[101913.08 110594.11 229160.95 'Florida']
[100671.96 91790.61 249744.55 'California']
[93863.75 127320.38 249839.44 'Florida']
[91992.39 135495.07 252664.93 'California']
[119943.24 156547.42 256512.92 'Florida']
[114523.61 122616.84 261776.23 'New York']
[78013.11 121597.55 264346.06 'California']
[94657.16 145077.58 282574.31 'New York']
[91749.16 114175.79 294919.57 'Florida']
[86419.7 153514.11 0.0 'New York']
[76253.86 113867.3 298664.47 'California']
[78389.47 153773.43 299737.29 'New York']
[73994.56 122782.75 303319.26 'Florida']
[67532.53 105751.03 304768.73 'Florida']
[77044.01 99281.34 140574.81 'New York']

In [10]:
print(*y[:25], sep='\n')

192261.83
191792.06
191050.39
182901.99
166187.94
156991.12
156122.51
155752.6
152211.77
149759.96
146121.95
144259.4
141585.52
134307.35
132602.65
129917.04
126992.93
125370.37
124266.9
122776.86
118474.03
111313.02
110352.25
108733.99
108552.04


## Encode Categorical Data

* Apply one-hot encoding (convert each category into a binary indicator column) to the state column; the remaining columns are left unchanged because `remainder` is set to `'passthrough'`.
* Transform the state column with the `ColumnTransformer` class and convert the result to a NumPy array.

In [11]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
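
To make the column layout after encoding easier to follow, here is a minimal sketch on a toy array. The array `X_demo` and its values are made up for illustration, and `sparse_threshold=0` is an added assumption that forces a dense result on such a small input. It shows that the encoder orders the categories alphabetically and that the encoded columns are placed before the passthrough columns.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric column and one state column.
X_demo = np.array([[1.0, 'New York'],
                   [2.0, 'California'],
                   [3.0, 'Florida']], dtype=object)

# sparse_threshold=0 (an assumption for this demo) always returns a dense array.
ct_demo = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [1])],
    remainder='passthrough',
    sparse_threshold=0,
)
encoded = np.array(ct_demo.fit_transform(X_demo))

# Categories are sorted, so the dummy columns are California, Florida, New York.
categories = ct_demo.named_transformers_['encoder'].categories_[0]
print(categories)
print(encoded)
```

Because California comes first alphabetically, it is the category whose column is dropped when the first column is removed in the next section.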

## Avoiding the Dummy Variable Trap

In [12]:
X = X[:, 1:]

Keep this line for the Backward Elimination implementation. In general, though, you do not have to remove a dummy variable column manually when using Scikit-Learn, because `LinearRegression` handles the redundant column itself.
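
The reason a dummy column is redundant can be checked numerically. The sketch below uses a small made-up list of states (not the actual dataset) and builds the dummy columns with plain NumPy: with an intercept column present, the full set of dummies adds a column without adding rank, because the dummies always sum to the intercept column.

```python
import numpy as np

# Hypothetical sample of states, for illustration only.
states = np.array(['New York', 'California', 'Florida', 'New York', 'Florida'])
categories = np.unique(states)  # sorted: California, Florida, New York

# One indicator column per category.
dummies = (states[:, None] == categories[None, :]).astype(float)
intercept = np.ones((len(states), 1))

# Intercept plus all three dummies: 4 columns, but they are linearly dependent.
full = np.hstack([intercept, dummies])
# Intercept plus two dummies (first dummy dropped): 3 independent columns.
reduced = np.hstack([intercept, dummies[:, 1:]])

rank_full = np.linalg.matrix_rank(full)
rank_reduced = np.linalg.matrix_rank(reduced)
print(full.shape[1], rank_full)
print(reduced.shape[1], rank_reduced)
```

Dropping one dummy column removes the dependency, which matters for the statsmodels fit later on, where an intercept column is added explicitly.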

The full implementation of Backward Elimination with Multiple Linear Regression follows below.

In [13]:
print(*X[:25], sep='\n')

[0.0 1.0 165349.2 136897.8 471784.1]
[0.0 0.0 162597.7 151377.59 443898.53]
[1.0 0.0 153441.51 101145.55 407934.54]
[0.0 1.0 144372.41 118671.85 383199.62]
[1.0 0.0 142107.34 91391.77 366168.42]
[0.0 1.0 131876.9 99814.71 362861.36]
[0.0 0.0 134615.46 147198.87 127716.82]
[1.0 0.0 130298.13 145530.06 323876.68]
[0.0 1.0 120542.52 148718.95 311613.29]
[0.0 0.0 123334.88 108679.17 304981.62]
[1.0 0.0 101913.08 110594.11 229160.95]
[0.0 0.0 100671.96 91790.61 249744.55]
[1.0 0.0 93863.75 127320.38 249839.44]
[0.0 0.0 91992.39 135495.07 252664.93]
[1.0 0.0 119943.24 156547.42 256512.92]
[0.0 1.0 114523.61 122616.84 261776.23]
[0.0 0.0 78013.11 121597.55 264346.06]
[0.0 1.0 94657.16 145077.58 282574.31]
[1.0 0.0 91749.16 114175.79 294919.57]
[0.0 1.0 86419.7 153514.11 0.0]
[0.0 0.0 76253.86 113867.3 298664.47]
[0.0 1.0 78389.47 153773.43 299737.29]
[1.0 0.0 73994.56 122782.75 303319.26]
[1.0 0.0 67532.53 105751.03 304768.73]
[0.0 1.0 77044.01 99281.34 140574.81]


## Split Dataset into Training Set and Test Set

In [14]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

## Train Multiple Linear Regression Model on Training Set

In [15]:
from sklearn.linear_model import LinearRegression

regressor = LinearRegression()
regressor.fit(X_train, y_train)

## Predict Test Set Results

In [16]:
y_pred = regressor.predict(X_test)

In [17]:
np.set_printoptions(precision=2)
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))

[[103015.2  103282.38]
 [132582.28 144259.4 ]
 [132447.74 146121.95]
 [ 71976.1   77798.83]
 [178537.48 191050.39]
 [116161.24 105008.31]
 [ 67851.69  81229.06]
 [ 98791.73  97483.56]
 [113969.44 110352.25]
 [167921.07 166187.94]]


## Build Optimal Model using Backward Elimination

In [18]:
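import statsmodels.api as sm

# statsmodels does not add an intercept, so prepend a column of ones (column 0).
# Columns afterwards: 0 const, 1 Florida, 2 New York, 3 R&D Spend, 4 Administration, 5 Marketing Spend.
X = np.append(arr=np.ones((X.shape[0], 1)).astype(int), values=X, axis=1)

# Step 1: fit with all predictors, then inspect the p-values in the summary.
X_opt = X[:, [0, 1, 2, 3, 4, 5]]
X_opt = X_opt.astype(np.float64)
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

# Step 2: remove column 2 (New York dummy) and refit.
X_opt = X[:, [0, 1, 3, 4, 5]]
X_opt = X_opt.astype(np.float64)
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

# Step 3: remove column 1 (Florida dummy) and refit.
X_opt = X[:, [0, 3, 4, 5]]
X_opt = X_opt.astype(np.float64)
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

# Step 4: remove column 4 (Administration) and refit.
X_opt = X[:, [0, 3, 5]]
X_opt = X_opt.astype(np.float64)
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()

# Step 5: remove column 5 (Marketing Spend); only the intercept and R&D Spend remain.
X_opt = X[:, [0, 3]]
X_opt = X_opt.astype(np.float64)
regressor_OLS = sm.OLS(endog=y, exog=X_opt).fit()
regressor_OLS.summary()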

| | | | |
|---|---|---|---|
| Dep. Variable: | y | R-squared: | 0.947 |
| Model: | OLS | Adj. R-squared: | 0.945 |
| Method: | Least Squares | F-statistic: | 849.8 |
| Date: | Thu, 12 Jun 2025 | Prob (F-statistic): | 3.50e-32 |
| Time: | 20:22:27 | Log-Likelihood: | -527.44 |
| No. Observations: | 50 | AIC: | 1059. |
| Df Residuals: | 48 | BIC: | 1063. |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 4.903e+04 | 2537.897 | 19.320 | 0.000 | 4.39e+04 | 5.41e+04 |
| x1 | 0.8543 | 0.029 | 29.151 | 0.000 | 0.795 | 0.913 |

| | | | |
|---|---|---|---|
| Omnibus: | 13.727 | Durbin-Watson: | 1.116 |
| Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 18.536 |
| Skew: | -0.911 | Prob(JB): | 9.44e-05 |
| Kurtosis: | 5.361 | Cond. No. | 1.65e+05 |
