## Multiple Linear Regression

$
y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
$

- Anscombe's quartet
- Multiple Linear Regression assumptions
- Dummy Variable Trap
- P-value
- Backward Elimination
- Significance level ($\alpha$), abbreviated SL
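
The p-value, significance level, and backward elimination topics above fit together as one procedure: fit the model, find the predictor with the highest p-value, drop it if that p-value exceeds SL, and repeat. The following is a minimal sketch of that idea, not a definitive implementation. It runs on synthetic data rather than the startup dataset, the helper names `ols_p_values` and `backward_elimination` are hypothetical, and it assumes SciPy is available for the t distribution.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Two-sided p-values for each column of X in an ordinary least squares fit."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - k                                # residual degrees of freedom
    sigma2 = residuals @ residuals / dof       # unbiased noise variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the coefficients
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), dof)


def backward_elimination(X, y, sl=0.05):
    """Return the indices of the columns kept after backward elimination.

    Assumes column 0 of X is a column of ones (the intercept), which is
    always kept. That is a choice made for this sketch.
    """
    kept = list(range(X.shape[1]))
    while len(kept) > 1:
        p = ols_p_values(X[:, kept], y)
        worst = int(np.argmax(p[1:])) + 1      # skip the intercept at position 0
        if p[worst] <= sl:
            break                              # every remaining predictor is significant
        kept.pop(worst)
    return kept


# Synthetic data (an assumption): y depends on features 0 and 2, not on feature 1.
rng = np.random.default_rng(0)
n = 200
features = rng.normal(size=(n, 3))
y_demo = 1.0 + 2.0 * features[:, 0] - 3.0 * features[:, 2] + rng.normal(scale=0.5, size=n)
X_demo = np.column_stack([np.ones(n), features])

selected = backward_elimination(X_demo, y_demo, sl=0.05)
print(selected)
```

Columns 1 and 3 of `X_demo` carry strong effects and should survive. The unrelated column 2 is usually removed, although with SL = 0.05 an irrelevant predictor can still pass by chance about 5% of the time.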


### Importing the libraries


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### Importing the dataset


In [None]:
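# Load the data; the last column is the target, all other columns are features
dataset = pd.read_csv('50_Startups.csv')
X = dataset.iloc[:, :-1].values  # feature matrix
y = dataset.iloc[:, -1].values   # target vector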

In [None]:
print(X)

### Encoding categorical data


In [None]:
# One-hot encode the categorical column at index 3; pass the other columns through unchanged
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(
    transformers=[('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
print(X)
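
The encoder above keeps one dummy column per category. Together with an intercept, those columns are linearly dependent, which is the dummy variable trap. The sketch below illustrates this on a small made-up category column (an assumption, not the startup data) and shows that `OneHotEncoder(drop='first')` removes the redundancy.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column with three levels (illustrative data only)
categories = np.array([['A'], ['B'], ['C'], ['A'], ['B'], ['C']])

# One dummy column per category vs. one fewer (first category dropped)
full = OneHotEncoder().fit_transform(categories).toarray()
reduced = OneHotEncoder(drop='first').fit_transform(categories).toarray()

# Prepend the intercept column of ones to form each design matrix
ones = np.ones((len(categories), 1))
design_full = np.hstack([ones, full])
design_reduced = np.hstack([ones, reduced])

# Full encoding: 4 columns but rank 3, so one column is redundant
print(design_full.shape[1], np.linalg.matrix_rank(design_full))        # 4 3
# Reduced encoding: 3 columns and rank 3, so the columns are independent
print(design_reduced.shape[1], np.linalg.matrix_rank(design_reduced))  # 3 3
```

scikit-learn's `LinearRegression` solves a least-squares problem and still returns a fit when the columns are redundant, so predictions are typically unaffected. The individual coefficients are then not uniquely determined, which matters when interpreting them or computing p-values.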

### Splitting the dataset into the Training set and Test set


In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

### Training the Multiple Linear Regression model on the Training set


In [None]:
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
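
After fitting, the estimated $\beta_0$ and $\beta_1, \dots, \beta_n$ from the equation at the top are available as the `intercept_` and `coef_` attributes. A small sketch on noise-free synthetic data (the data and the names `X_demo`, `y_demo`, `model` are assumptions for illustration) shows the fit recovering known coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3 + 2*x1 - 1*x2, with no noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

model = LinearRegression().fit(X_demo, y_demo)

print(model.intercept_)  # estimate of beta_0, close to 3
print(model.coef_)       # estimates of beta_1 and beta_2, close to [2, -1]
```

The same attributes can be read from `regressor` once it has been trained on the real data.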


### Predicting the Test set results


In [None]:
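# try other models and compare the performance
y_pred = regressor.predict(X_test)
np.set_printoptions(precision=2)  # print floats with two decimal places

# Reshape both vectors into columns so they can be shown side by side
array_pred = y_pred.reshape(len(y_pred), 1)
array_test = y_test.reshape(len(y_test), 1)

# Left column: predictions; right column: actual test values
print(np.concatenate((array_pred, array_test), 1))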
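
Comparing predictions and actual values by eye does not scale, and comparing models needs a single number per model. One possible approach is to compute $R^2$ and the mean absolute error with `sklearn.metrics`. The sketch below uses synthetic data and illustrative variable names (both assumptions) so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: a linear signal plus a little Gaussian noise
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Same split settings as the notebook: 20% held out, fixed seed
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
y_hat = model.predict(X_te)

r2 = r2_score(y_te, y_hat)              # 1.0 would be a perfect fit
mae = mean_absolute_error(y_te, y_hat)  # average absolute error, in units of y
print(f"R^2: {r2:.3f}, MAE: {mae:.3f}")
```

With the notebook's own variables, the equivalent calls would be `r2_score(y_test, y_pred)` and `mean_absolute_error(y_test, y_pred)`.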