### 🔹 Multiple Linear Regression  

Multiple Linear Regression is an extension of Simple Linear Regression, where instead of using a single input feature, we use **two or more features** to predict the target variable.  

The mathematical model is written as:  

$$
Y = a_1X_1 + a_2X_2 + \dots + a_nX_n + b
$$  

where:  
- \(a_1, a_2, \dots, a_n\) are the coefficients (how much the prediction changes per unit change in each feature, with the other features held fixed),  
- \(b\) is the intercept (the baseline value of Y when all features are zero).  
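
As a quick illustration of the formula, the prediction is just a dot product between the feature values and the coefficients, plus the intercept. The sketch below uses made-up numbers and a hypothetical helper named `predict_linear`; it is not part of the notebook's pipeline.

```python
import numpy as np

def predict_linear(X, coefs, intercept):
    """Evaluate Y = a_1*X_1 + ... + a_n*X_n + b for every row of X."""
    X = np.asarray(X, dtype=float)
    return X @ np.asarray(coefs, dtype=float) + intercept

# two samples with two features each (illustrative values only)
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y_demo = predict_linear(X_demo, coefs=[2.0, 0.5], intercept=1.0)
print(y_demo)  # [4. 9.]
```

For the first row this is 2.0·1 + 0.5·2 + 1 = 4, which matches the equation term by term.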

This model allows us to:  
- Capture the combined influence of multiple factors on the target.  
- Often make more accurate predictions than a single-feature model when the target depends on several factors.  
- Serve as a foundation for advanced regression and machine learning techniques.  

In this notebook, we will implement Multiple Linear Regression and explore how multiple features together can improve prediction power 🚀.

# --------------------------------------------------------------------------

# import dataset

In [1]:
# from google.colab import files, drive

# up = files.upload()
# drive.mount('/content/drive')

In [2]:
import pandas as pd

df = pd.read_csv('dataset.csv')
df.head()

,f1,f2,f3,T
0,2.0,4,8.5,196.0
1,2.4,4,9.6,221.0
2,1.5,4,5.9,136.0
3,3.5,6,11.1,255.0
4,3.5,6,10.6,244.0


In [3]:
df.describe()

,f1,f2,f3,T
count,1067.0,1067.0,1064.0,1065.0
mean,3.346298,5.794752,11.583271,256.295775
std,1.415895,1.797447,3.489784,63.410924
min,1.0,3.0,4.7,108.0
25%,2.0,4.0,9.0,207.0
50%,3.4,6.0,10.9,251.0
75%,4.3,8.0,13.4,294.0
max,8.4,12.0,25.8,488.0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1067 entries, 0 to 1066
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   f1      1067 non-null   float64
 1   f2      1067 non-null   int64  
 2   f3      1064 non-null   float64
 3   T       1065 non-null   float64
dtypes: float64(3), int64(1)
memory usage: 33.5 KB


In [5]:
df.isna().sum()

f1    0
f2    0
f3    3
T     2
dtype: int64

# cleaning

In [6]:
df[df.isna().any(axis=1)]

,f1,f2,f3,T
5,3.5,6,,230.0
6,3.5,6,10.1,
23,2.0,4,,230.0
61,4.0,8,,281.0
65,2.0,4,9.1,


### ffill, interpolate

In [7]:
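# clean the data

# forward-fill the target: each missing T takes the value from the previous row
df['T'] = df['T'].ffill()

# fill missing f3 by second-order polynomial interpolation (this method requires SciPy)
df['f3'] = df['f3'].interpolate(method='polynomial', order=2)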
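
To see what the two filling strategies do, here is a small sketch on a toy Series (the values are invented). It uses `method="linear"` instead of the polynomial method from the cell above, so it runs without SciPy; the idea is the same, only the curve drawn between known points differs.

```python
import numpy as np
import pandas as pd

# toy series with gaps (illustrative values, not from dataset.csv)
s = pd.Series([10.0, np.nan, 30.0, np.nan, np.nan, 60.0])

# ffill: copy the last known value forward into each gap
filled_forward = s.ffill()

# linear interpolation: place missing values on a straight line
# between the neighbouring known values
filled_linear = s.interpolate(method="linear")

print(filled_forward.tolist())  # [10.0, 10.0, 30.0, 30.0, 30.0, 60.0]
print(filled_linear.tolist())   # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```

Forward fill only makes sense when neighbouring rows are related (for example, ordered data); interpolation additionally assumes the values change smoothly from row to row.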

# encoding

In [8]:
# encode the data

# define x , y

In [9]:
import numpy as np

x = df[['f1', 'f2', 'f3']].values    # 2D
y = df['T'].values                   # 1D

# x = df.values[: , :-1]
# y = df.values[: , -1]

In [10]:
x[:5]

array([[ 2. ,  4. ,  8.5],
       [ 2.4,  4. ,  9.6],
       [ 1.5,  4. ,  5.9],
       [ 3.5,  6. , 11.1],
       [ 3.5,  6. , 10.6]])

In [11]:
y[:5]

array([196., 221., 136., 255., 244.])

# splitting

In [12]:
# # finding best random state 

# from sklearn.model_selection import train_test_split
# from sklearn.linear_model import LinearRegression
# from sklearn.metrics import r2_score
# from sklearn.preprocessing import MinMaxScaler

# import time
# t1 = time.time()
# lst = []
# for i in range(1,10):
#     x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=i)
#     mms = MinMaxScaler().fit(x_train)
#     x_train = mms.transform(x_train)
#     x_test = mms.transform(x_test)
#     mlr = LinearRegression()
#     mlr.fit(x_train, y_train)
#     yhat_test = mlr.predict(x_test)
#     r2 = r2_score(y_test, yhat_test)
#     lst.append(r2)
# t2 = time.time()
# print(f"run time: {round((t2 - t1)/60, 2)} min")
# print(f"r2_score: {round(max(lst), 2)}")
# rs = np.argmax(lst) + 1
# print(f"random_state: {rs}")

In [13]:
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=42)
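
Conceptually, a random split shuffles the row indices and cuts them into two groups, keeping each feature row paired with its target. The NumPy sketch below illustrates that idea with a hypothetical helper `simple_split`; it is not how scikit-learn implements `train_test_split`, and it will not reproduce the same rows for the same seed.

```python
import numpy as np

def simple_split(X, y, test_size=0.25, seed=42):
    """Shuffle row indices once, then cut them into test and train parts."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(np.ceil(test_size * len(X)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# toy data: 10 rows, 2 features; row i is [2i, 2i+1] and its target is i
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (7, 2) (3, 2)
```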

# scaling

In [14]:
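from sklearn.preprocessing import MinMaxScaler, StandardScaler

# note: the variable is named `mms`, but it holds a StandardScaler (fitted on the training data only)
mms = StandardScaler().fit(x_train)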

In [15]:
x_train = mms.transform(x_train)
x_test = mms.transform(x_test)
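
The reason the scaler is fitted on the training data only is that the test set should be transformed with statistics it did not contribute to. The sketch below reproduces standardization by hand with NumPy on made-up numbers; `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np

# illustrative data: 3 training rows and 1 test row, 2 features
train = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

# statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)  # ddof=0

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean and std

print(train_scaled.mean(axis=0))  # approximately [0. 0.]
print(train_scaled.std(axis=0))   # approximately [1. 1.]
print(test_scaled)
```

The scaled test row does not need to have zero mean or unit variance; it is simply expressed in the training set's units.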

# fit train data

In [16]:
### K-fold cross validation

# from sklearn.linear_model import LinearRegression
# from sklearn.model_selection import GridSearchCV

# parameters = {
#     'fit_intercept': [True, False],
#     'copy_X': [True, False],
#     'n_jobs': [None],
#     'positive': [True, False]
# }

# lr = LinearRegression()
# gs = GridSearchCV(estimator=lr, param_grid=parameters, cv=5)

# gs.fit(x_train, y_train)

# best_params = gs.best_params_
# print(best_params)

In [17]:
# default parameters of LinearRegression
# fit_intercept=True, copy_X=True, n_jobs=None, positive=False
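
If the goal is only to estimate how well the default model generalizes (rather than to search over parameters), K-fold cross-validation can be run directly with `cross_val_score`. The sketch below uses synthetic data invented for illustration, not `dataset.csv`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# synthetic data: a known linear relation plus a little noise (assumption for the demo)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([4.0, -2.0, 1.0]) + 10.0 + rng.normal(scale=0.1, size=100)

# 5 folds: train on 4/5 of the rows, score R^2 on the remaining 1/5, five times
scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=5, scoring="r2")

print(scores.round(3))  # one R^2 value per fold
print(scores.mean())    # average R^2 across folds
```

Averaging over folds gives a steadier estimate than a single train/test split, because it does not depend on one particular choice of `random_state`.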

In [18]:
from sklearn.linear_model import LinearRegression

mlr = LinearRegression()
mlr.fit(x_train, y_train)

$$
Y = a_1X_1 + a_2X_2 + \dots + a_nX_n + b
$$

In [19]:
print(mlr.intercept_)   # intercept (b)
print(mlr.coef_)        # coefficients (a1, a2, a3)

256.52625000000006
[16.3466259  12.12520904 33.27585656]
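
Because the features were standardized before fitting, each coefficient is the change in the prediction per one standard deviation of that feature, and the intercept equals the mean of the training target. That is likely why the intercept printed above is close to the mean of `T` reported by `df.describe()`. The sketch below checks both facts on synthetic data using NumPy's least-squares solver instead of scikit-learn; the data and names are made up for illustration.

```python
import numpy as np

# synthetic, noise-free data with a known relation: y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# standardize the features (zero mean, unit variance)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# ordinary least squares with an explicit column of ones for the intercept
A = np.column_stack([X_std, np.ones(len(X_std))])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
coefs, intercept = params[:-1], params[-1]

print(intercept, y.mean())                   # the two values agree
print(coefs, np.array([3.0, -2.0]) * X.std(axis=0))  # original coefficients times feature std
```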


# predict test data

In [20]:
yhat_test = mlr.predict(x_test)

# evaluate the model

In [21]:
from sklearn.metrics import r2_score

print("r2-score (train data): %0.4f" % r2_score(y_train, mlr.predict(x_train)))
print("r2-score (test data): %0.4f" % r2_score(y_test, yhat_test))

r2-score (train data): 0.8608
r2-score (test data): 0.8724
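
The R² score compares the model's squared error with that of a baseline that always predicts the mean of the true values. A hand computation on made-up numbers shows the definition:

```python
import numpy as np

# illustrative values only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

ss_res = np.sum((y_true - y_pred) ** 2)          # squared error of the model
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # squared error of the mean baseline
r2 = 1.0 - ss_res / ss_tot
print(r2)  # 0.9

# predicting the mean everywhere gives R^2 = 0 by construction
baseline = np.full_like(y_true, y_true.mean())
r2_baseline = 1.0 - np.sum((y_true - baseline) ** 2) / ss_tot
print(r2_baseline)  # 0.0
```

Read this way, the test score of 0.8724 above means the model removes about 87% of the squared error that the mean baseline would make on the test set.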


In [22]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

print(f"MSE (train data): {mean_squared_error(y_train, mlr.predict(x_train))}")
print(f"RMSE (train data): {np.sqrt(mean_squared_error(y_train, mlr.predict(x_train)))}")
print(f"MAE (train data): {mean_absolute_error(y_train, mlr.predict(x_train))}")
print('------------')
print(f"MSE (test data): {mean_squared_error(y_test, yhat_test)}")
print(f"RMSE (test data): {np.sqrt(mean_squared_error(y_test, yhat_test))}")
print(f"MAE (test data): {mean_absolute_error(y_test, yhat_test)}")

MSE (train data): 552.2105906572357
RMSE (train data): 23.49916148838583
MAE (train data): 16.862139417246617
------------
MSE (test data): 529.2825497681079
RMSE (test data): 23.00614156628851
MAE (test data): 17.139100260461838
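
For reference, the three error metrics can be computed by hand from the residuals. The numbers below are invented for illustration:

```python
import numpy as np

# illustrative values only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

residuals = y_true - y_pred          # [ 1.  0. -1.  0.]
mse = np.mean(residuals ** 2)        # mean squared error
rmse = np.sqrt(mse)                  # root of MSE, same units as the target
mae = np.mean(np.abs(residuals))     # mean absolute error

print(mse)   # 0.5
print(rmse)  # about 0.7071
print(mae)   # 0.5
```

RMSE and MAE are in the units of `T`, so they are easier to interpret than MSE; RMSE penalizes large errors more heavily than MAE does.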


# predict new data

In [23]:
mlr.predict(mms.transform([[2, 4, 8.5]]))

array([199.52255188])
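
New samples must go through the same scaler that was fitted on the training data before they reach the model, which is easy to forget. One way to make this automatic is scikit-learn's `make_pipeline`, sketched below on synthetic, noise-free data (an assumption for the demo, so the expected prediction is known exactly).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic data with a known relation: y = 2*x1 + 0.5*x2 - 1*x3 + 4
rng = np.random.default_rng(0)
X_demo = rng.uniform(1.0, 10.0, size=(40, 3))
y_demo = 2.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] - 1.0 * X_demo[:, 2] + 4.0

# the pipeline fits the scaler and the regression together
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_demo, y_demo)

# raw (unscaled) input; the pipeline applies the scaling internally
pred = pipe.predict([[2.0, 4.0, 8.5]])
print(pred)  # approximately [1.5], since 2*2 + 0.5*4 - 8.5 + 4 = 1.5
```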

# save the model

In [24]:
# import joblib
# joblib.dump(mlr, 'mlr_model.pkl')

# load the model

In [25]:
# import joblib
# mlr = joblib.load('mlr_model.pkl')
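
Since predictions depend on the fitted scaler as well as the model, it can be convenient to save both together. The sketch below stores them in one dictionary with `joblib`; the file name `mlr_bundle.pkl`, the dictionary keys, and the toy data are assumptions for illustration, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# toy, noise-free data following y = x1 + 2*x2
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]])
y_demo = np.array([5.0, 4.0, 13.0, 10.0])

scaler = StandardScaler().fit(X_demo)
model = LinearRegression().fit(scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mlr_bundle.pkl")
    # save scaler and model side by side so they cannot get out of sync
    joblib.dump({"scaler": scaler, "model": model}, path)
    bundle = joblib.load(path)

# the restored pair predicts on raw input after scaling it
restored_pred = bundle["model"].predict(bundle["scaler"].transform([[2.0, 3.0]]))
print(restored_pred)  # approximately [8.], since 2 + 2*3 = 8
```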