# Linear Regression 
    Linear regression is a foundational supervised learning algorithm, rooted in statistics, that models the relationship between a dependent variable (y) and one or more independent variables (x).
    The goal is to find the best-fitting straight line (or hyperplane, when there are several independent variables) that minimizes the difference between the observed data points and the model's predicted values. It assumes that the relationship between the variables is linear.
    
    -> Why is it called “Linear”?
    Because the output is modeled as a linear function of the inputs: each one-unit change in an input changes the predicted output by a constant amount.
    With a single input, the model's graph is a straight line.
<div style="display:flex; gap:15px;">
  <img src="image1.png" width="550">
  <img src="image2.png" width="500">
</div>
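
“Best-fitting” is usually made precise with ordinary least squares (OLS), the criterion that scikit-learn's `LinearRegression` implements: choose the coefficients that minimize the sum of squared residuals. For n observations and a single input, writing the intercept as \beta_0 and the slope as \beta_1, the objective can be written as:

$$
(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0,\,\beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
$$

Each term in the sum is the squared vertical distance between an observed point and the line.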


### Types of Linear Regression
    1. Simple Linear Regression -> Only one independent variable
    Example -> Predict salary based on years of experience.
    y = β₀ + β₁x
    Where:
    y → predicted output
    x → input feature
    β₀ → intercept (the c in y = mx + c)
    β₁ → slope or weight (the m in y = mx + c)

    2. Multiple Linear Regression -> Two or more independent variables
    Example -> Predict house price based on area, location, number of rooms, and age of the building.
    y = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
    Where:
    x₁, x₂, …, xₙ → input features
    β₁, β₂, …, βₙ → coefficients (one per feature)

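    - Slope (β₁) -> How much the predicted y changes when x increases by 1 unit (with the other features held constant in multiple regression)
    Its sign gives the direction of the relationship; its magnitude gives the size of the effect, in units of y per unit of x
    - Intercept (β₀) -> Predicted value of y when x = 0 (all features equal to 0 in multiple regression)
    The point where the regression line crosses the y-axis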
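
As a minimal sketch of how the slope and intercept are obtained in simple linear regression, the block below applies the closed-form least-squares formulas: the slope is the sum of co-deviations of x and y divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The function name `fit_simple_linear_regression` and the salary figures are made up for illustration; they are not taken from the dataset used later.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of co-deviations divided by sum of squared x deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: forces the line through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data: years of experience vs. salary (arbitrary units)
years = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]  # exactly 30 + 5 * years

b0, b1 = fit_simple_linear_regression(years, salary)
print("Intercept:", b0)
print("Slope:", b1)
```

In this toy data each extra year of experience adds 5 units to the predicted salary (the slope), and the predicted salary at zero years is 30 units (the intercept).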

### Import Libraries

In [93]:
import numpy as np
import pandas as pd
import seaborn as sns 
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import r2_score

### Load dataset

In [95]:
data_1 = pd.read_csv("bangalore house price prediction OHE-data.csv")
data_1.sample(3)

| | bath | balcony | price | total_sqft_int | bhk | price_per_sqft | area_typeSuper built-up Area | area_typeBuilt-up Area | area_typePlot Area | availability_Ready To Move | ... | location_Kalena Agrahara | location_Horamavu Agara | location_Vidyaranyapura | location_BTM 2nd Stage | location_Hebbal Kempapura | location_Hosur Road | location_Horamavu Banaswadi | location_Domlur | location_Mahadevpura | location_Tumkur Road |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1607 | 2.0 | 1.0 | 45.48 | 1070.0 | 2 | 4250.46729 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2329 | 3.0 | 2.0 | 63.0 | 1500.0 | 3 | 4200.0 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3642 | 2.0 | 2.0 | 50.0 | 1175.0 | 2 | 4255.319149 | 1 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |


### Select a Subset of Features for a Working Sample

In [97]:
data_1_sample = data_1[['bath','balcony','bhk','price']]
data_1_sample.sample(3)

| | bath | balcony | bhk | price |
|---|---|---|---|---|
| 2661 | 1.0 | 1.0 | 1 | 20.0 |
| 1208 | 2.0 | 1.0 | 2 | 68.32 |
| 2095 | 3.0 | 2.0 | 3 | 52.47 |


### Define X (Input Variables) and y (Output Variable), Then Split Them Into Train and Test Sets

In [99]:
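# Features (inputs) and target (output); y stays a one-column DataFrame
X = data_1_sample[['bath', 'balcony', 'bhk']]
y = data_1_sample[['price']]
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)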
print('Shape of X_train : ',X_train.shape)
print('Shape of X_test : ',X_test.shape)
print('Shape of y_train : ',y_train.shape)
print('Shape of y_test : ',y_test.shape)

Shape of X_train :  (5696, 3)
Shape of X_test :  (1424, 3)
Shape of y_train :  (5696, 1)
Shape of y_test :  (1424, 1)
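
To make the shapes above concrete: the feature table has 5696 + 1424 = 7120 rows, and `test_size=0.2` reserves 20% of them (1424) for testing. The block below is a rough, standard-library-only illustration of what a shuffled holdout split does. `holdout_split` is a hypothetical helper, not scikit-learn's implementation, and it will not reproduce the same row assignment as `train_test_split`.

```python
import random

def holdout_split(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle reproducible across runs
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_size))
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = holdout_split(7120, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

Every row lands in exactly one of the two sets, so the model is later scored on rows it never saw during fitting.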


### Fitting Data Into a LinearRegression Model

In [101]:
lr = LinearRegression()
lr.fit(X_train,y_train)

### Look at the Coefficients and Intercept

In [103]:
print('Coefficients ::',lr.coef_)
print("Intercept ::",lr.intercept_)

Coefficients :: [[51.38853109 -2.70092015 21.56725622]]
Intercept :: [-75.77757235]
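
The fitted values above define the prediction equation price ≈ -75.78 + 51.39·bath - 2.70·balcony + 21.57·bhk, with the coefficients in the same order as the columns of X. As a small worked example, the block below plugs a hypothetical home (2 bathrooms, 1 balcony, 2 BHK, chosen only for illustration) into that equation. `lr.predict` performs the same computation for every row passed to it.

```python
# Coefficients and intercept copied from the output above
# (column order matches X: bath, balcony, bhk)
coefficients = [51.38853109, -2.70092015, 21.56725622]
intercept = -75.77757235

# Hypothetical input, chosen only for illustration
home = [2, 1, 2]  # bath, balcony, bhk

predicted_price = intercept + sum(c * v for c, v in zip(coefficients, home))
print(round(predicted_price, 2))
```

This evaluates to about 67.43, in the same units as the `price` column.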


### Ridge Regression and Lasso Regression
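    Ridge and Lasso Regression are regularized versions of Linear Regression, used to reduce overfitting and to cope with multicollinearity.
    -> Why is regularization needed?
        Plain Linear Regression runs into problems when:
        - The dataset has many features
        - Features are highly correlated
        - The model fits the training data too closely → overfitting
        Regularization adds a penalty on the size of the coefficients to the loss function, which controls model complexity. Ridge penalizes the sum of squared coefficients (L2) and shrinks them toward zero; Lasso penalizes the sum of absolute coefficients (L1) and can set some of them exactly to zero.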
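
For reference, the penalized objectives can be written as follows, using the forms given in scikit-learn's documentation for `Ridge` and `Lasso`. Here w is the coefficient vector (the β's above, excluding the intercept), X the feature matrix, n the number of training samples, and α the regularization strength (the `alpha` parameter):

$$
\text{Ridge:}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

With α = 0 both reduce to ordinary least squares. The two objectives scale the squared-error term differently, so the same α does not imply the same amount of regularization in Ridge and Lasso.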

### Fitting Data Into Lasso and Ridge Regression Models

In [106]:
lasso = Lasso(alpha=1)
ridge = Ridge(alpha=1)
lasso.fit(X_train,y_train),ridge.fit(X_train,y_train)

(Lasso(alpha=1), Ridge(alpha=1))
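
To see what `alpha` does to a fitted coefficient, here is a minimal one-feature sketch in plain Python. It assumes centered data with no intercept, in which case both problems have closed-form solutions under the objectives scikit-learn documents: Ridge minimizes ‖y - xw‖² + α·w², and Lasso minimizes (1/(2n))·‖y - xw‖² + α·|w|. The helper names and the toy data are hypothetical; the block illustrates the shrinkage behaviour, not how scikit-learn solves the problem internally.

```python
def ridge_slope(x, y, alpha):
    """Minimizer of sum((y - w*x)^2) + alpha * w^2 for a single feature."""
    s_xy = sum(xi * yi for xi, yi in zip(x, y))
    s_xx = sum(xi * xi for xi in x)
    return s_xy / (s_xx + alpha)

def lasso_slope(x, y, alpha):
    """Minimizer of sum((y - w*x)^2) / (2n) + alpha * |w| (soft-thresholding)."""
    n = len(x)
    rho = sum(xi * yi for xi, yi in zip(x, y)) / n
    z = sum(xi * xi for xi in x) / n
    shrunk = max(abs(rho) - alpha, 0.0)  # pull toward zero, clip at zero
    return (shrunk if rho >= 0 else -shrunk) / z

# Toy centered data where the unregularized slope is exactly 2
x = [-2, -1, 0, 1, 2]
y = [-4, -2, 0, 2, 4]

print("OLS  :", ridge_slope(x, y, alpha=0))
for alpha in (1, 4):
    print(f"alpha={alpha}  ridge:", ridge_slope(x, y, alpha),
          " lasso:", lasso_slope(x, y, alpha))
```

In this toy case Ridge shrinks the slope smoothly (2, then about 1.82, then about 1.43) without ever reaching zero, whereas Lasso gives 1.5 at alpha = 1 and exactly 0 at alpha = 4. That behaviour is why Lasso can act as a feature selector.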

#### Check the R² Score for Linear, Lasso, and Ridge Regression

In [108]:
print("Linear Regression Score ::",lr.score(X_test,y_test))
print("Lasso Regression Score  :: ",lasso.score(X_test,y_test))
print("Ridge Regression Score  ::",ridge.score(X_test,y_test))

Linear Regression Score :: 0.3294892815488314
Lasso Regression Score  ::  0.329358034006472
Ridge Regression Score  :: 0.32948978102808757
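
For regressors such as these, `score` returns the coefficient of determination R² = 1 - SS_res / SS_tot, not a classification accuracy. As a sketch of what that number measures, the block below computes R² by hand in plain Python. `r_squared` is a hypothetical helper and the numbers are toy values, not taken from the dataset. On real predictions, `r2_score(y_test, lr.predict(X_test))` from `sklearn.metrics` gives the same value as `lr.score(X_test, y_test)`.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total variation
    return 1 - ss_res / ss_tot

y_true = [10, 20, 30, 40]

print(r_squared(y_true, y_true))            # perfect predictions
print(r_squared(y_true, [25, 25, 25, 25]))  # always predicting the mean
print(r_squared(y_true, [12, 18, 33, 37]))  # imperfect predictions
```

A perfect model scores 1.0 and a model that always predicts the mean scores 0.0. Scores near 0.33, as reported above, mean that the three features (bath, balcony, bhk) account for roughly a third of the variation in price on the test set. The near-identical values across the three models suggest that regularization at alpha = 1 changes little for this small feature set.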
