# Multiple Linear Regression
- The main goal is to find the linear function that expresses the relationship between the dependent variable and the independent variables.

![](./img/05-ss.png)
![](./img/06-ss.png)
![](./img/07-ss.png)
![](./img/08-ss.png)
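As a reference point, the least-squares coefficients of a multiple linear regression $\hat{y} = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p$ solve the normal equations $\hat{\beta} = (X^\top X)^{-1} X^\top y$, where $X$ has a leading column of ones for the intercept. The sketch below is only an illustration on synthetic data (the "true" coefficients are made up) and uses `numpy.linalg.lstsq`, which solves the same least-squares problem more stably than inverting $X^\top X$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 rows, 3 features (made-up values for illustration only)
n, p = 200, 3
X = rng.uniform(0, 100, size=(n, p))
true_beta = np.array([3.0, 0.045, 0.19, 0.003])  # intercept first
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.1, size=n)

# Design matrix with a leading column of ones for the intercept
X_design = np.column_stack([np.ones(n), X])

# Least-squares solution of X_design @ beta ≈ y
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Same result from the normal equations (X^T X) beta = X^T y
beta_normal = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)

print(beta_hat.round(3))
```

Both routes give the same coefficients; `lstsq` is usually preferred because it avoids forming and solving with $X^\top X$ when the features are nearly collinear.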

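```python
import pandas as pd

# Alternative: read only columns 1-4 and skip the unnamed index column
# ad = pd.read_csv("./data/Advertising.csv", usecols=[1, 2, 3, 4])
ad = pd.read_csv("./data/Advertising.csv")
ad = ad.iloc[:, 1:]  # drop the first (unnamed index) column
df = ad.copy()
df.head()
```

| | TV | radio | newspaper | sales |
|---|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 | 22.1 |
| 1 | 44.5 | 39.3 | 45.1 | 10.4 |
| 2 | 17.2 | 45.9 | 69.3 | 9.3 |
| 3 | 151.5 | 41.3 | 58.5 | 18.5 |
| 4 | 180.8 | 10.8 | 58.4 | 12.9 |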

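If, as the commented-out `usecols=[1, 2, 3, 4]` line suggests, the first column of `Advertising.csv` is an unnamed row index, `index_col=0` is an alternative to dropping it with `iloc`. The sketch below uses a small in-memory CSV built from the first rows shown above; the leading index values (1, 2, 3) and the empty first header cell are assumptions about the real file's layout.

```python
import io

import pandas as pd

# In-memory stand-in for ./data/Advertising.csv (assumed layout:
# an unnamed leading index column, then the four data columns)
csv_text = """,TV,radio,newspaper,sales
1,230.1,37.8,69.2,22.1
2,44.5,39.3,45.1,10.4
3,17.2,45.9,69.3,9.3
"""

# Approach used in the notebook: read everything, then drop column 0
via_iloc = pd.read_csv(io.StringIO(csv_text)).iloc[:, 1:]

# Alternative: use the first column as the index while reading
via_index_col = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(via_iloc.columns))
print(list(via_index_col.columns))
```

Both frames have the columns `TV`, `radio`, `newspaper`, `sales` and identical values; they differ only in the index (a fresh `RangeIndex` versus the file's own index values).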

```python
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict
```

```python
X = df.drop("sales", axis=1)
X.head()
```

| | TV | radio | newspaper |
|---|---|---|---|
| 0 | 230.1 | 37.8 | 69.2 |
| 1 | 44.5 | 39.3 | 45.1 |
| 2 | 17.2 | 45.9 | 69.3 |
| 3 | 151.5 | 41.3 | 58.5 |
| 4 | 180.8 | 10.8 | 58.4 |


```python
y = df.sales
y.head()
```

```
0    22.1
1    10.4
2     9.3
3    18.5
4    12.9
Name: sales, dtype: float64
```

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
```

```python
print(f"X_train -> {X_train.shape}")
print(f"X_test -> {X_test.shape}")
print(f"y_train -> {y_train.shape}")
print(f"y_test -> {y_test.shape}")
```

```
X_train -> (160, 3)
X_test -> (40, 3)
y_train -> (160,)
y_test -> (40,)
```

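With `test_size=0.20`, `train_test_split` puts `ceil(0.20 * 200) = 40` rows in the test set and the remaining 160 in the training set, which matches the shapes printed above. A quick check on a synthetic frame with the same number of rows and columns (the values themselves are made up):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as the Advertising features/target
X_demo = pd.DataFrame(np.arange(600).reshape(200, 3),
                      columns=["TV", "radio", "newspaper"])
y_demo = pd.Series(np.arange(200), name="sales")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)

print(X_tr.shape, X_te.shape)
# The split shuffles rows but keeps X and y aligned by index
print((X_tr.index == y_tr.index).all())
```

Because the same shuffled positions are applied to every array passed in, features and target stay paired; fixing `random_state` makes the split reproducible across runs.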

```python
training = df.copy()
training.shape
```

```
(200, 4)
```

### statsmodels

```python
import statsmodels.api as sm
```

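```python
# Note: sm.OLS does not add an intercept automatically. Without a constant
# column the model is fit through the origin, which is why the summary
# reports an "uncentered" R-squared and why these coefficients differ from
# the scikit-learn model below (which fits an intercept by default).
lm = sm.OLS(y_train, X_train)
model = lm.fit()
model.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | sales | R-squared (uncentered): | 0.982 |
| Model: | OLS | Adj. R-squared (uncentered): | 0.982 |
| Method: | Least Squares | F-statistic: | 2935.0 |
| Date: | Mon, 20 Mar 2023 | Prob (F-statistic): | 1.28e-137 |
| Time: | 20:39:34 | Log-Likelihood: | -336.65 |
| No. Observations: | 160 | AIC: | 679.3 |
| Df Residuals: | 157 | BIC: | 688.5 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| TV | 0.0531 | 0.001 | 36.467 | 0.000 | 0.050 | 0.056 |
| radio | 0.2188 | 0.011 | 20.138 | 0.000 | 0.197 | 0.240 |
| newspaper | 0.0239 | 0.008 | 3.011 | 0.003 | 0.008 | 0.040 |

| | | | |
|---|---|---|---|
| Omnibus: | 11.405 | Durbin-Watson: | 1.895 |
| Prob(Omnibus): | 0.003 | Jarque-Bera (JB): | 15.574 |
| Skew: | -0.432 | Prob(JB): | 0.000415 |
| Kurtosis: | 4.261 | Cond. No. | 13.5 |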


```python
model.summary().tables[1]
```

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| TV | 0.0531 | 0.001 | 36.467 | 0.000 | 0.050 | 0.056 |
| radio | 0.2188 | 0.011 | 20.138 | 0.000 | 0.197 | 0.240 |
| newspaper | 0.0239 | 0.008 | 3.011 | 0.003 | 0.008 | 0.040 |


### Scikit-Learn

```python
from sklearn.linear_model import LinearRegression
```

```python
lm = LinearRegression()
model = lm.fit(X_train, y_train)
```

```python
model.intercept_
```

```
2.979067338122629
```

```python
model.coef_  # one coefficient per column: TV, radio, newspaper
```

```
array([0.04472952, 0.18919505, 0.00276111])
```

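`intercept_` and `coef_` fully determine the fitted model: for each row, `predict` returns `intercept_ + X @ coef_`, with `coef_` ordered like the training columns. A small check on synthetic data (made-up values, not the Advertising set):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)

# Synthetic features and target (illustrative values only)
X_demo = rng.uniform(0, 100, size=(50, 3))
y_demo = 3.0 + X_demo @ np.array([0.045, 0.19, 0.003]) + rng.normal(0, 1, 50)

reg = LinearRegression().fit(X_demo, y_demo)

# Manual prediction from the learned parameters
manual = reg.intercept_ + X_demo @ reg.coef_
print(np.allclose(manual, reg.predict(X_demo)))
```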
### Prediction
- Sales ≈ 2.98 + TV * 0.0447 + Radio * 0.1892 + Newspaper * 0.0028

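Plugging the unrounded `intercept_` and `coef_` values printed above into this equation for TV = 30, radio = 10, newspaper = 40 reproduces, up to the display precision of `coef_`, the prediction `model.predict` gives for the same input:

```python
# Parameters copied from the intercept_ / coef_ outputs above
intercept = 2.979067338122629
coef = {"TV": 0.04472952, "radio": 0.18919505, "newspaper": 0.00276111}
x_new = {"TV": 30, "radio": 10, "newspaper": 40}

# sales_hat = intercept + sum of coefficient * feature value
sales_hat = intercept + sum(coef[k] * x_new[k] for k in coef)
print(round(sales_hat, 4))  # 6.3233
```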
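```python
# Build a one-row DataFrame with the same column names used for training,
# so scikit-learn does not warn about missing feature names
data = pd.DataFrame([[30, 10, 40]], columns=X_train.columns)
data
```

| | TV | radio | newspaper |
|---|---|---|---|
| 0 | 30 | 10 | 40 |

```python
model.predict(data)
```

```
array([6.32334798])
```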

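```python
# Take one existing training row (position 155) to compare with its real value
data_to_predict = X_train.iloc[155:156]
data_to_predict
```

| | TV | radio | newspaper |
|---|---|---|---|
| 106 | 25.0 | 11.0 | 29.7 |

```python
model.predict(data_to_predict)
```

```
array([6.26045597])
```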

```python
# Use .iloc so the slice is positional, matching X_train.iloc[155:156]
real_value = y_train.iloc[155:156]
real_value
```

```
106    7.2
Name: sales, dtype: float64
```

```python
import numpy as np
from sklearn.metrics import mean_squared_error
```

```python
rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
rmse_train  # training error
```

```
1.6447277656443373
```

```python
rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
rmse_test  # test error
```

```
1.7815996615334506
```

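RMSE is the square root of the mean squared residual, $\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$, expressed in the same units as `sales`. A minimal sketch computing it by hand and with `mean_squared_error` on a few made-up values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([22.1, 10.4, 9.3, 18.5])   # illustrative actual values
y_pred = np.array([20.5, 12.3, 12.0, 17.6])  # made-up predictions

# Square root of the mean of the squared residuals
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(np.isclose(rmse_manual, rmse_sklearn))
```

In the notebook, the test RMSE (about 1.78) is only slightly higher than the training RMSE (about 1.64), which suggests the model is not overfitting much.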
```python
# R² (coefficient of determination) on the training set
round(lm.score(X_train, y_train), 3)
```

```
0.896
```

```python
# R² on the test set
round(lm.score(X_test, y_test), 3)
```

```
0.899
```

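For a regressor, `score` returns the coefficient of determination $R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$, the share of the target's variance explained by the model. A sketch checking the formula against `r2_score` on made-up values:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([22.1, 10.4, 9.3, 18.5, 12.9])  # illustrative values
y_pred = np.array([20.5, 12.3, 12.0, 17.6, 13.4])  # made-up predictions

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot
print(np.isclose(r2_manual, r2_score(y_true, y_pred)))
```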
### Model Tuning (Model Validation)