### Day 50 – Linear Regression (Part 2)
* Multiple Linear Regression (more than 1 feature)
* Evaluation metrics: R², Adjusted R², MSE, RMSE
* Python practice with dataset (house prices, salaries, etc.)

### What Is Multiple Linear Regression (MLR)?
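In Simple Linear Regression, we had:

$$y = mx + c$$

where a single independent variable ($x$) predicts the dependent variable ($y$).

In Multiple Linear Regression, we use two or more independent variables to predict the outcome:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

* $y$ = dependent variable (what we want to predict, e.g., house price)
* $x_1, x_2, \dots, x_n$ = independent variables (the features, e.g., size, bedrooms, age)
* $b_0$ = intercept (the predicted $y$ when every feature is 0)
* $b_1, b_2, \dots, b_n$ = coefficients (the change in $y$ for a one-unit increase in that feature, holding the others fixed)
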
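
The MLR equation can be evaluated by hand for a single observation. The following is a minimal sketch, assuming NumPy is available; the intercept and coefficient values are made up purely for illustration and are not fitted from any data.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formula
b0 = 50_000                          # intercept
b = np.array([150, 10_000, -2_000])  # coefficients for size, bedrooms, age
x = np.array([2200, 4, 3])           # one house: size, bedrooms, age

# y = b0 + b1*x1 + b2*x2 + b3*x3
y_hat = b0 + np.dot(b, x)
print(y_hat)
```

Each coefficient multiplies its own feature and the intercept is added once, which is the same arithmetic a fitted model performs when it predicts.
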

### Assumptions of Multiple Linear Regression

* Linearity – Relationship between predictors and target is linear.
* No multicollinearity – Independent variables should not be highly correlated with each other.
* Homoscedasticity – Constant variance of residuals (errors).
* Normality of residuals – Errors should be normally distributed.
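
One rough way to screen for multicollinearity is to look at pairwise correlations between the features. The sketch below assumes NumPy and uses small made-up housing columns; the 0.8 threshold is an assumed cutoff for illustration, not a universal rule, and pairwise correlation will not catch every form of multicollinearity.

```python
import numpy as np

# Example feature columns (hypothetical values for illustration)
size = np.array([1000, 1500, 2000, 2500, 3000])
bedrooms = np.array([2, 3, 3, 4, 5])
age = np.array([10, 5, 8, 4, 2])

# Each row is one variable, so corr[i, j] is the Pearson correlation
# between feature i and feature j
corr = np.corrcoef([size, bedrooms, age])
print(np.round(corr, 2))

# Flag feature pairs whose absolute correlation exceeds the threshold
threshold = 0.8  # assumed cutoff, not a universal rule
names = ['size', 'bedrooms', 'age']
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
print(flagged)
```

When a pair is flagged, the individual coefficients for those features become hard to interpret, even if the model's predictions remain good.
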



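```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy housing data: three features and the target (price)
data = {
    'size': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [2, 3, 3, 4, 5],
    'age': [10, 5, 8, 4, 2],
    'price': [300000, 400000, 500000, 600000, 700000]
}
df = pd.DataFrame(data)

X = df[['size', 'bedrooms', 'age']]  # feature matrix
y = df['price']                      # target vector

# Fit y = b0 + b1*size + b2*bedrooms + b3*age by least squares
model = LinearRegression()
model.fit(X, y)

print("coefficient is :", model.coef_)
print("Intercept is :", model.intercept_)

# Predict for a new house; pass a DataFrame with the same column names
# used in fit() so scikit-learn does not warn about missing feature names
new_house = pd.DataFrame([[2200, 4, 3]], columns=X.columns)
predicted_price = model.predict(new_house)
print("Predicted Price is : ", predicted_price)
```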


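Output:

```
coefficient is : [ 2.00000000e+02 -9.46891927e-11 -1.67096763e-11]
Intercept is : 100000.00000000017
Predicted Price is :  [540000.]
```

The coefficients for `bedrooms` and `age` are effectively zero (on the order of 1e-11, which is floating-point noise) because in this toy dataset the price is exactly 200 × size + 100000. The prediction follows directly: 200 × 2200 + 100000 = 540000.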




### 1. R² (Coefficient of Determination)

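$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

* $SS_{res}$ = sum of squared residuals (errors): $\sum (y_i - \hat{y}_i)^2$
* $SS_{tot}$ = total sum of squares, i.e., the variation of $y$ around its mean: $\sum (y_i - \bar{y})^2$

Meaning: “What fraction of the variation in y is explained by the model?”

* R² = 0: The model does no better than always predicting the mean of y.
* R² = 1: The model explains all of the variation (perfect fit).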
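
The definition of R² as 1 minus the ratio of residual to total sum of squares can be computed directly. This is a minimal sketch assuming NumPy; the actual and predicted values are hypothetical.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.5, 7.0, 9.5])   # hypothetical predictions

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(r2)
```

Here the residual sum of squares is 0.75 and the total sum of squares is 20, so R² works out to 0.9625.
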

### 2. Adjusted R²

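Problem: R² never decreases when you add more features, even if they are useless, so it can make an overloaded model look better than it is.

Solution: Adjusted R² penalizes unnecessary features by accounting for the number of predictors p relative to the number of observations n:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$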
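
To see the penalty at work, the sketch below applies the common formula 1 - (1 - R²)(n - 1)/(n - p - 1) to the same R² with different numbers of predictors. The helper name `adjusted_r2` and the input values are hypothetical, chosen only for illustration.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² for n observations and p predictors (requires n > p + 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R² and sample size, different number of predictors
few = adjusted_r2(0.90, n=50, p=5)
many = adjusted_r2(0.90, n=50, p=10)
print(few, many)
```

With R² held at 0.90, the model using 10 predictors gets a lower adjusted score than the one using 5, so extra features must earn their place by raising R² enough to offset the penalty.
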

### 3. MSE (Mean Squared Error)
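Meaning: Average of squared differences between actual and predicted values:

$$MSE = \frac{1}{n}\sum (y_i - \hat{y}_i)^2$$

Lower is better. Because the errors are squared, large mistakes are penalized heavily, and the result is in squared units of y.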

### 4. RMSE (Root Mean Squared Error)
The square root of MSE, which puts the error back in the same units as y (e.g., rupees for house price).
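
Both error metrics can be computed in a couple of lines. This is a minimal sketch assuming NumPy, with hypothetical actual and predicted values.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # hypothetical actual values
y_pred = np.array([2.5, 5.5, 7.0, 9.5])   # hypothetical predictions

mse = np.mean((y_true - y_pred) ** 2)   # average squared error
rmse = np.sqrt(mse)                     # back in the units of y
print(mse, rmse)
```

The MSE here is 0.1875, while the RMSE is roughly 0.43, which reads more naturally as "predictions are off by about 0.43 units on average".
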

```python
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

# Predictions on the training data (model, X, y come from the cell above)
y_pred = model.predict(X)

# R²
r2 = r2_score(y, y_pred)

# Adjusted R²
n = X.shape[0]   # number of rows (observations)
p = X.shape[1]   # number of predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

# MSE & RMSE
mse = mean_squared_error(y, y_pred)
rmse = np.sqrt(mse)

print("R²:", r2)
print("Adjusted R²:", adj_r2)
print("MSE:", mse)
print("RMSE:", rmse)
```


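Output:

```
R²: 1.0
Adjusted R²: 1.0
MSE: 1.2874900798265365e-20
RMSE: 1.1346762004318837e-10
```

R² and Adjusted R² are both 1.0 and the errors are effectively zero (floating-point noise) because the toy prices are an exact linear function of size. These scores are also computed on the same rows the model was trained on, so they say nothing about how the model would perform on unseen data.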
