# Linear Regression Exercise

---
## Complete the tasks in bold

**TASK: Run the cells under the Imports and Data section to make sure you have imported the correct general libraries as well as the correct datasets. Later on you may need to run further imports from scikit-learn.**

### Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Data

In [2]:
df = pd.read_csv("C:/Users/DELL/Desktop/Semester10/MachineLearning/RegressionTask/AMES_Final_DF.csv")

In [3]:
df.head()

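| | Lot Frontage | Lot Area | Overall Qual | Overall Cond | Year Built | Year Remod/Add | Mas Vnr Area | BsmtFin SF 1 | BsmtFin SF 2 | Bsmt Unf SF | ... | Sale Type_ConLw | Sale Type_New | Sale Type_Oth | Sale Type_VWD | Sale Type_WD | Sale Condition_AdjLand | Sale Condition_Alloca | Sale Condition_Family | Sale Condition_Normal | Sale Condition_Partial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 141.0 | 31770 | 6 | 5 | 1960 | 1960 | 112.0 | 639.0 | 0.0 | 441.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 80.0 | 11622 | 5 | 6 | 1961 | 1961 | 0.0 | 468.0 | 144.0 | 270.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 2 | 81.0 | 14267 | 6 | 6 | 1958 | 1958 | 108.0 | 923.0 | 0.0 | 406.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 3 | 93.0 | 11160 | 7 | 5 | 1968 | 1968 | 0.0 | 1065.0 | 0.0 | 1045.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 4 | 74.0 | 13830 | 5 | 5 | 1997 | 1998 | 0.0 | 791.0 | 0.0 | 137.0 | ... | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |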


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2925 entries, 0 to 2924
Columns: 274 entries, Lot Frontage to Sale Condition_Partial
dtypes: float64(11), int64(263)
memory usage: 6.1 MB


**TASK: The label we are trying to predict is the SalePrice column. Separate out the data into X features and y labels**

In [5]:
X = df.drop('SalePrice',axis=1)
y = df['SalePrice']

**TASK: Use scikit-learn to split up X and y into a training set and test set. Since we will later be using a Grid Search strategy, set your test proportion to 10%. To get the same data split as the solutions notebook, you can specify random_state = 101**

In [6]:
from sklearn.model_selection import train_test_split

In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=101)
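
As a quick check on what a 10% split means for a dataset of this size, the sketch below runs the same call on synthetic data. The arrays `X_demo` and `y_demo` are hypothetical stand-ins (random numbers, not the Ames data); only the row count of 2925 is taken from the `df.info()` output above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data with the same number of rows as df (2925).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2925, 3))
y_demo = rng.normal(size=2925)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.10, random_state=101
)

# The test count is rounded up (ceil(0.10 * 2925) = 293);
# the remaining 2632 rows form the training set.
print(X_tr.shape, X_te.shape)
```

Fixing `random_state` makes the shuffle repeatable, which is why the task suggests a specific value for matching the solutions notebook.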

**TASK: The dataset features have a variety of scales and units. For optimal regression performance, scale the X features. Take careful note of what to use for .fit() vs. what to use for .transform()**

In [8]:
from sklearn.preprocessing import StandardScaler

In [9]:
scaler = StandardScaler()

In [10]:
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test)
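
The `.fit()` vs. `.transform()` distinction can be illustrated on a small synthetic example. This is a minimal sketch: `X_tr_demo`, `X_te_demo`, and `demo_scaler` are hypothetical names, and the data are random numbers rather than the Ames features. It shows that the scaler's statistics come from the training rows only and are then reused, unchanged, on the test rows.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 200 training rows, 20 test rows, 2 features.
rng = np.random.default_rng(0)
X_tr_demo = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
X_te_demo = rng.normal(loc=50.0, scale=10.0, size=(20, 2))

demo_scaler = StandardScaler()
scaled_tr = demo_scaler.fit_transform(X_tr_demo)  # learns mean/std from train only
scaled_te = demo_scaler.transform(X_te_demo)      # reuses the training statistics

# The stored mean is the training mean; the test rows never influence it.
print(np.allclose(demo_scaler.mean_, X_tr_demo.mean(axis=0)))

# Training columns are centred at 0; test columns are only close to 0.
print(scaled_tr.mean(axis=0).round(6))
print(scaled_te.mean(axis=0).round(3))
```

Calling `fit_transform` on the test set instead would let test-set statistics leak into preprocessing, so the reported test error would no longer reflect truly unseen data.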

**TASK: Fit the data using a Linear Regression Model**

In [11]:
from sklearn.linear_model import LinearRegression

In [12]:
model = LinearRegression()

In [13]:
model.fit(scaled_X_train, y_train)


In [15]:
test_predictions = model.predict(scaled_X_test)


**TASK: Evaluate your model's performance on the unseen 10% scaled test set.**

In [16]:
from sklearn.metrics import mean_absolute_error,mean_squared_error

In [17]:
# Predictions on the scaled test set
test_predictions = model.predict(scaled_X_test)


# Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, test_predictions)
print(f"Mean Absolute Error: {mae}")

# Mean Squared Error (MSE)
mse = mean_squared_error(y_test, test_predictions)
print(f"Mean Squared Error: {mse}")

# Root Mean Squared Error (RMSE): the square root of the MSE.
# np.sqrt avoids the `squared=False` argument, which newer
# scikit-learn releases no longer accept.
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error: {rmse}")


Mean Absolute Error: 14583.186226552378
Mean Squared Error: 434963354.8468295
Root Mean Squared Error: 20855.775095805704
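
To make the three metrics concrete, here is a hand computation on four made-up prices. The values in `y_true_demo` and `y_hat_demo` are illustrative only and are not taken from the Ames data; the formulas are the standard definitions that `mean_absolute_error` and `mean_squared_error` implement.

```python
import math

# Hypothetical true and predicted sale prices.
y_true_demo = [200000.0, 150000.0, 310000.0, 125000.0]
y_hat_demo = [190000.0, 165000.0, 300000.0, 130000.0]

# Residuals: 10000, -15000, 10000, -5000
errors = [t - p for t, p in zip(y_true_demo, y_hat_demo)]

demo_mae = sum(abs(e) for e in errors) / len(errors)  # mean of |error|
demo_mse = sum(e ** 2 for e in errors) / len(errors)  # mean of error^2
demo_rmse = math.sqrt(demo_mse)                       # back in price units

print(f"MAE:  {demo_mae:.2f}")   # 10000.00
print(f"MSE:  {demo_mse:.2f}")   # 112500000.00
print(f"RMSE: {demo_rmse:.2f}")  # 10606.60
```

RMSE is never smaller than MAE because squaring weights large errors more heavily, so a gap between the two, as in the results above, suggests that some predictions miss by considerably more than the typical error.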



**TASK: Repeat the above steps using Polynomial Regression and Regularization.**
**Note: Only try one polynomial degree and one regularization technique.**

In [18]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge


In [19]:
# Polynomial transformation with degree 2
poly = PolynomialFeatures(degree=2)
X_poly_train = poly.fit_transform(scaled_X_train)
X_poly_test = poly.transform(scaled_X_test)


In [20]:
# Create Ridge regression model
ridge_model = Ridge(alpha=1.0)  # alpha controls the regularization strength


In [22]:
ridge_model.fit(X_poly_train, y_train)
y_poly_pred = ridge_model.predict(X_poly_test)


In [23]:
# Evaluate using the same metrics

mae_poly = mean_absolute_error(y_test, y_poly_pred)
mse_poly = mean_squared_error(y_test, y_poly_pred)
# RMSE as the square root of the MSE; this avoids the `squared=False`
# argument, which newer scikit-learn releases no longer accept
rmse_poly = np.sqrt(mse_poly)

print("Polynomial Regression with Ridge Regularization:")
print(f"Mean Absolute Error: {mae_poly}")
print(f"Mean Squared Error: {mse_poly}")
print(f"Root Mean Squared Error: {rmse_poly}")


Polynomial Regression with Ridge Regularization:
Mean Absolute Error: 25815.475921212565
Mean Squared Error: 1331052700.1774943
Root Mean Squared Error: 36483.594945913625
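
One likely contributor to the higher error here is the size of the degree-2 feature space, although the notebook itself does not investigate this. The sketch below counts the columns `PolynomialFeatures` produces. The helper `n_poly_terms` is hypothetical, and the figure of 273 input features is an assumption derived from the 274 columns reported by `df.info()` minus `SalePrice`.

```python
from math import comb

import numpy as np
from sklearn.preprocessing import PolynomialFeatures


def n_poly_terms(n_features, degree):
    """Number of polynomial terms up to `degree`, including the bias column."""
    return comb(n_features + degree, degree)


# Check the formula against scikit-learn on a small 5-feature input.
small = PolynomialFeatures(degree=2).fit(np.zeros((2, 5)))
print(small.n_output_features_, n_poly_terms(5, 2))  # 21 21

# Assumed 273 input features -> columns after a degree-2 expansion.
print(n_poly_terms(273, 2))  # 37675
```

With roughly 2,600 training rows and tens of thousands of columns, the model has far more parameters than observations, so a single untuned `alpha=1.0` may not regularize strongly enough. Searching over `alpha` would be the natural next step.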


In [26]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
# Polynomial transformation with degree 1
poly = PolynomialFeatures(degree=1)
X_poly_train = poly.fit_transform(scaled_X_train)
X_poly_test = poly.transform(scaled_X_test)
# Create Ridge regression model
ridge_model = Ridge(alpha=1.0)  # alpha controls the regularization strength
ridge_model.fit(X_poly_train, y_train)
y_poly_pred = ridge_model.predict(X_poly_test)
# Evaluate using the same metrics

mae_poly = mean_absolute_error(y_test, y_poly_pred)
mse_poly = mean_squared_error(y_test, y_poly_pred)
# RMSE as the square root of the MSE; this avoids the `squared=False`
# argument, which newer scikit-learn releases no longer accept
rmse_poly = np.sqrt(mse_poly)

print("Polynomial Regression with Ridge Regularization:")
print(f"Mean Absolute Error: {mae_poly}")
print(f"Mean Squared Error: {mse_poly}")
print(f"Root Mean Squared Error: {rmse_poly}")


Polynomial Regression with Ridge Regularization:
Mean Absolute Error: 14565.777104897914
Mean Squared Error: 434466555.15881336
Root Mean Squared Error: 20843.86133034888


## Great work!

----