# Predicting Housing Prices with Regularized Regression
You work for a real estate analytics firm, and your task is to build a predictive model that estimates house prices from 
various features. You have a dataset containing information about houses, such as square footage, number of bedrooms, 
number of bathrooms, and other relevant attributes. In this case study, you'll explore how Lasso and Ridge regression 
can improve the predictive performance of the model.

# 1. Data Preparation

In [18]:
import pandas as pd
data = pd.read_csv('data.csv')

In [2]:
data.head()

Unnamed: 0,date,price,bedrooms,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,sqft_above,sqft_basement,yr_built,yr_renovated,street,city,statezip,country
0,2014-05-02 00:00:00,313000.0,3.0,1.5,1340,7912,1.5,0,0,3,1340,0,1955,2005,18810 Densmore Ave N,Shoreline,WA 98133,USA
1,2014-05-02 00:00:00,2384000.0,5.0,2.5,3650,9050,2.0,0,4,5,3370,280,1921,0,709 W Blaine St,Seattle,WA 98119,USA
2,2014-05-02 00:00:00,342000.0,3.0,2.0,1930,11947,1.0,0,0,4,1930,0,1966,0,26206-26214 143rd Ave SE,Kent,WA 98042,USA
3,2014-05-02 00:00:00,420000.0,3.0,2.25,2000,8030,1.0,0,0,4,1000,1000,1963,0,857 170th Pl NE,Bellevue,WA 98008,USA
4,2014-05-02 00:00:00,550000.0,4.0,2.5,1940,10500,1.0,0,0,4,1140,800,1976,1992,9105 170th Ave NE,Redmond,WA 98052,USA


In [3]:
from sklearn.model_selection import train_test_split
X = data[['sqft_lot']]  # double brackets keep X two-dimensional, as scikit-learn estimators expect
y = data['price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=17)

In [4]:
data.shape

(4600, 18)
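
Before modelling, it can help to check the loaded table for missing values, implausible targets, and non-numeric columns. The sketch below is only an illustration on a small made-up frame: the `sample` DataFrame and its values are hypothetical stand-ins, not rows from `data.csv`, and the rule of dropping non-positive prices is an assumption you may or may not want for your own data.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of housing data; the values are made up.
sample = pd.DataFrame({
    "price": [313000.0, 0.0, 420000.0, None],
    "bedrooms": [3.0, 4.0, 3.0, 2.0],
    "sqft_living": [1340, 1940, 2000, 900],
    "city": ["Shoreline", "Kent", "Bellevue", "Seattle"],
})

# Count missing values per column.
missing_per_column = sample.isna().sum()

# Keep only rows with a recorded, strictly positive price (an assumed cleaning rule).
cleaned = sample[sample["price"].notna() & (sample["price"] > 0)]

# Numeric columns can be passed to a linear model without further encoding.
numeric_columns = cleaned.select_dtypes(include="number").columns.tolist()

print(missing_per_column)
print(cleaned.shape)
print(numeric_columns)
```

The same three checks could be run on `data` itself; text columns such as `city` would need encoding before they could be used as features.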

In [5]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Load your dataset (replace 'data.csv' with the path to your data file)
data = pd.read_csv('data.csv')

# Select features and target variable
X = data[['condition', 'bathrooms', 'floors']]  # Replace with your selected features
y = data['price']  # Replace with your target variable


In [6]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [7]:
data.shape

(4600, 18)

In [8]:
# Create a Lasso regression model
lasso = Lasso(alpha=0.1)  # You can adjust the alpha (regularization strength) as needed



In [9]:
# Fit the model on the training data
lasso.fit(X_train, y_train)
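
After fitting, the coefficients show what the L1 penalty actually did. The following is a minimal sketch on synthetic data, not on the housing features: `X_demo`, `y_demo`, and `lasso_demo` are illustrative names, and `alpha=0.5` is chosen only to make the effect visible. It shows a feature with no influence on the target being driven to exactly zero while the informative ones are shrunk.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples = 500

# Three synthetic, roughly unit-scale features; the third has no effect on y.
X_demo = rng.normal(size=(n_samples, 3))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(size=n_samples)

# A deliberately strong penalty, used here only for illustration.
lasso_demo = Lasso(alpha=0.5)
lasso_demo.fit(X_demo, y_demo)

# Informative features keep shrunken, non-zero coefficients;
# the noise feature's coefficient is set to zero.
for name, coef in zip(["x0", "x1", "x2 (noise)"], lasso_demo.coef_):
    print(f"{name}: {coef:.3f}")
```

On the notebook's own model, the equivalent check would be to look at `lasso.coef_` next to `X_train.columns`. Because price is on a scale of hundreds of thousands, a penalty as small as `alpha=0.1` may leave the coefficients almost unchanged.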

In [10]:
# Make predictions on the test data
y_pred = lasso.predict(X_test)
print("The predicted values are\n ", y_pred)

The predicted values are
  [ 613285.04369251  585097.42100904  887309.05474508  708015.61519359
  581130.65671228  613285.04369251  466145.99112125  613285.04369251
  617251.80798927  617251.80798927  672760.7586364   224276.3670489
  660650.32944305  287718.84628956  434857.89886082  910663.61841199
  700082.08660007  613285.04369251  529588.4703619   613285.04369251
  736203.23787706  613285.04369251  466145.99112125  676727.52293316
  375382.18391692  287718.84628956  613285.04369251  740170.00217382
  402703.51188059  541698.89955526 1232052.9152151   529588.4703619
 1267517.90807536  732236.4735803   466145.99112125  640606.37165617
  613285.04369251 1386469.33796315  287718.84628956  613285.04369251
  613285.04369251  402703.51188059  525621.70606514  434857.89886082
  406670.27617735  343227.79693669  525621.70606514  605141.37879592
  918597.14700551  656683.56514629  883342.29044832  823866.57550442
  553809.32874861  613285.04369251  192988.27478847  224276.3670489
  851187.

In [11]:
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")


Mean Squared Error: 993444305590.9279
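
An MSE of this size is hard to judge on its own: its square root is roughly 997,000, in the same units as price. One way to put such a number in context is to compare it with a baseline that always predicts the training mean. The sketch below does this on synthetic data; the `sqft` and `price` arrays are made-up stand-ins, and the comparison, not the specific numbers, is the point.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in data: price depends linearly on one feature plus noise.
sqft = rng.uniform(500, 4000, size=(400, 1))
price = 150.0 * sqft[:, 0] + rng.normal(scale=50_000, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(sqft, price, test_size=0.2, random_state=42)

# Baseline: ignore the features and always predict the mean training price.
baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
model = Lasso(alpha=0.1).fit(X_tr, y_tr)

# RMSE is in the same units as the target, which makes it easier to read than MSE.
rmse_baseline = np.sqrt(mean_squared_error(y_te, baseline.predict(X_te)))
rmse_model = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))

print(f"Baseline RMSE: {rmse_baseline:,.0f}")
print(f"Lasso RMSE:    {rmse_model:,.0f}")
```

If a fitted model's RMSE is not clearly below the baseline's on the real data, the chosen features are probably explaining little of the variation in price.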


# 4. Implement Ridge Regression

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Load your dataset
data = pd.read_csv('data.csv')

# Selecting features and target variable
X = data[['bedrooms', 'bathrooms', 'floors']]  # Replace with your actual feature names
y = data['price']

In [13]:
# 4b. Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Implementing Ridge regression
alpha = 1.0  # Regularization strength (hyperparameter to be tuned)
ridge_model = Ridge(alpha=alpha)
ridge_model.fit(X_train, y_train)

# Making predictions on the test set
predictions = ridge_model.predict(X_test)

# Calculating mean squared error to evaluate the model
mse = mean_squared_error(y_test, predictions)
print("Mean Squared Error (Ridge Regression):", mse)


Mean Squared Error (Ridge Regression): 996086397466.3423
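
The comment in the cell above notes that `alpha` is a hyperparameter to be tuned. One possible way to do that is scikit-learn's `RidgeCV`, which picks the best value from a list by cross-validation. The sketch below is an assumption-laden illustration: the data is synthetic, the `candidate_alphas` grid is arbitrary, and standardizing the features first is a common practice for penalized models rather than something the notebook itself does.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Synthetic features on very different scales, as is typical for housing data.
X_demo = rng.normal(size=(300, 3)) * np.array([1.0, 10.0, 100.0])
y_demo = X_demo @ np.array([5.0, 0.5, 0.05]) + rng.normal(size=300)

# Arbitrary grid of regularization strengths to compare.
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# Standardize first so the penalty treats every coefficient on a comparable scale.
ridge_cv = make_pipeline(StandardScaler(), RidgeCV(alphas=candidate_alphas))
ridge_cv.fit(X_demo, y_demo)

best_alpha = ridge_cv.named_steps["ridgecv"].alpha_
print("Selected alpha:", best_alpha)
```

With the default settings, `RidgeCV` uses an efficient form of leave-one-out cross-validation, so the search adds little cost on a dataset of this size.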


# 5. Evaluate Ridge Regression Model

In [16]:
import numpy as np  # needed for np.sqrt below
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Calculating MAE, MSE, and RMSE
mae = mean_absolute_error(y_test, predictions)
mse = mean_squared_error(y_test, predictions)
rmse = np.sqrt(mse)

print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
print("Root Mean Squared Error (RMSE):", rmse)

Mean Absolute Error (MAE): 249838.3296032353
Mean Squared Error (MSE): 996086397466.3423
Root Mean Squared Error (RMSE): 998041.2804420177
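
To make the three metrics concrete, here is a small worked example that computes them by hand with NumPy. The four prices and predictions are invented for illustration only. Because squaring gives large errors more weight, RMSE is never smaller than MAE, so a wide gap between the two (as in the output above) suggests that a few very large errors dominate.

```python
import numpy as np

# Invented true prices and predictions, for illustration only.
y_true = np.array([300_000.0, 450_000.0, 500_000.0, 750_000.0])
y_hat = np.array([320_000.0, 400_000.0, 530_000.0, 650_000.0])

errors = y_true - y_hat  # [-20000, 50000, -30000, 100000]

mae_manual = np.mean(np.abs(errors))  # average absolute error
mse_manual = np.mean(errors ** 2)     # average squared error
rmse_manual = np.sqrt(mse_manual)     # back in the units of price

print("MAE: ", mae_manual)   # 50000.0
print("MSE: ", mse_manual)   # 3450000000.0
print("RMSE:", round(rmse_manual, 1))
```

Here the single 100,000 miss pulls the RMSE (about 58,737) above the MAE (50,000).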


# Diagnosing and Remedying Heteroscedasticity and Multicollinearity

1. Initial Linear Regression Model

In [31]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
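
As a rough sketch of the two diagnostics this section is named after, the code below computes variance inflation factors (for multicollinearity) and a Breusch-Pagan style statistic (for heteroscedasticity) using only NumPy. Everything here is an assumption for illustration: the data is synthetic, the helper names `r_squared`, `variance_inflation_factors`, and `breusch_pagan_lm` are hypothetical, and the statistic is Koenker's studentized variant, n times the R² of the squared residuals regressed on the features.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X with an intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    return np.array([
        1.0 / (1.0 - r_squared(np.delete(X, j, axis=1), X[:, j]))
        for j in range(X.shape[1])
    ])

def breusch_pagan_lm(X, y):
    """Studentized LM statistic: n * R^2 of squared OLS residuals on X."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    squared_residuals = (y - design @ coef) ** 2
    return len(y) * r_squared(X, squared_residuals)

rng = np.random.default_rng(3)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1
x3 = rng.uniform(1.0, 5.0, size=n)       # independent of x1 and x2
X_demo = np.column_stack([x1, x2, x3])

# The noise standard deviation grows with x3, so the errors are heteroscedastic.
y_demo = 2.0 * x1 + 1.0 * x3 + rng.normal(size=n) * x3

vifs = variance_inflation_factors(X_demo)
lm_stat = breusch_pagan_lm(X_demo, y_demo)

print("VIFs:", np.round(vifs, 1))
print("Breusch-Pagan LM statistic:", round(lm_stat, 1))
```

A common rule of thumb treats a VIF above about 10 as a sign of problematic collinearity. Under homoscedasticity the LM statistic is approximately chi-square distributed with one degree of freedom per feature, so with three features a value above about 7.81 would reject constant variance at the 5% level.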


In [44]:
# Load your dataset
data = pd.read_csv('Employee.csv')

In [45]:
data

Unnamed: 0,Education,JoiningYear,City,PaymentTier,Age,Gender,EverBenched,ExperienceInCurrentDomain,LeaveOrNot
0,Bachelors,2017,Bangalore,3,34,Male,No,0,0
1,Bachelors,2013,Pune,1,28,Female,No,3,1
2,Bachelors,2014,New Delhi,3,38,Female,No,2,0
3,Masters,2016,Bangalore,3,27,Male,No,5,1
4,Masters,2017,Pune,3,24,Male,Yes,2,1
...,...,...,...,...,...,...,...,...,...
4648,Bachelors,2013,Bangalore,3,26,Female,No,4,0
4649,Masters,2013,Pune,2,37,Male,No,2,1
4650,Masters,2018,New Delhi,3,27,Male,No,5,1
4651,Bachelors,2012,Bangalore,3,30,Male,Yes,2,0
