<h1 style="color: red;">House Prices Modeling</h1>

#### Data Science in Production - Practical Work 1

## Goal

To predict the sales price for each house. For each Id in the test set, there must be a prediction of value SalePrice. 

## 1.Data setup 

<h3 style="color: blue;">Loading the data:</h3>
<br>

In [77]:
# Import the necessary libraries

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error,mean_squared_log_error,r2_score

In [44]:
# let's load the data set.
train_data = pd.read_csv('../data/house-prices-advanced-regression-techniques/train.csv')

test_data = pd.read_csv('../data/house-prices-advanced-regression-techniques/test.csv')


# now let's do some basic data analysis 

train_data.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition,SalePrice
0,1,60,RL,65.0,8450,Pave,,Reg,Lvl,AllPub,...,0,,,,0,2,2008,WD,Normal,208500
1,2,20,RL,80.0,9600,Pave,,Reg,Lvl,AllPub,...,0,,,,0,5,2007,WD,Normal,181500
2,3,60,RL,68.0,11250,Pave,,IR1,Lvl,AllPub,...,0,,,,0,9,2008,WD,Normal,223500
3,4,70,RL,60.0,9550,Pave,,IR1,Lvl,AllPub,...,0,,,,0,2,2006,WD,Abnorml,140000
4,5,60,RL,84.0,14260,Pave,,IR1,Lvl,AllPub,...,0,,,,0,12,2008,WD,Normal,250000


In [45]:
test_data.head()

Unnamed: 0,Id,MSSubClass,MSZoning,LotFrontage,LotArea,Street,Alley,LotShape,LandContour,Utilities,...,ScreenPorch,PoolArea,PoolQC,Fence,MiscFeature,MiscVal,MoSold,YrSold,SaleType,SaleCondition
0,1461,20,RH,80.0,11622,Pave,,Reg,Lvl,AllPub,...,120,0,,MnPrv,,0,6,2010,WD,Normal
1,1462,20,RL,81.0,14267,Pave,,IR1,Lvl,AllPub,...,0,0,,,Gar2,12500,6,2010,WD,Normal
2,1463,60,RL,74.0,13830,Pave,,IR1,Lvl,AllPub,...,0,0,,MnPrv,,0,3,2010,WD,Normal
3,1464,60,RL,78.0,9978,Pave,,IR1,Lvl,AllPub,...,0,0,,,,0,6,2010,WD,Normal
4,1465,120,RL,43.0,5005,Pave,,IR1,HLS,AllPub,...,144,0,,,,0,1,2010,WD,Normal


In [46]:
# let's check the shape of the data

print(f"shape of train data {train_data.shape} and test data {test_data.shape}")

shape of train data (1460, 81) and test data (1459, 80)


In [47]:
# let's check the coloumn names 

train_data.columns

Index(['Id', 'MSSubClass', 'MSZoning', 'LotFrontage', 'LotArea', 'Street',
       'Alley', 'LotShape', 'LandContour', 'Utilities', 'LotConfig',
       'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType',
       'HouseStyle', 'OverallQual', 'OverallCond', 'YearBuilt', 'YearRemodAdd',
       'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType',
       'MasVnrArea', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual',
       'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinSF1',
       'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', 'Heating',
       'HeatingQC', 'CentralAir', 'Electrical', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'KitchenQual',
       'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'FireplaceQu', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive

In [48]:
# let's check if there are any missing values in the data set

train_data.isnull().sum()

Id                 0
MSSubClass         0
MSZoning           0
LotFrontage      259
LotArea            0
                ... 
MoSold             0
YrSold             0
SaleType           0
SaleCondition      0
SalePrice          0
Length: 81, dtype: int64

In [49]:
# let's check the data types

train_data.dtypes

Id                 int64
MSSubClass         int64
MSZoning          object
LotFrontage      float64
LotArea            int64
                  ...   
MoSold             int64
YrSold             int64
SaleType          object
SaleCondition     object
SalePrice          int64
Length: 81, dtype: object

<h3 style="color: blue;">Splitting the dataset:</h3>
<br>

In [50]:
'''
now let's split the train_data set into train and test so that we can analyse 
the performance of the model in an unbiased way
'''

# taking features into one dataframe and target into another dataframe

X = train_data.drop('SalePrice', axis=1)
y = train_data['SalePrice']

# let's split the data - 80 % for training and 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"shape after splitting : \n train data {X_train.shape} {y_train.shape} \n test data {X_test.shape} {y_test.shape}")

shape after splitting : 
 train data (1168, 80) (1168,) 
 test data (292, 80) (292,)


## 2.Feature selection
<br>

In [51]:
# let's check which features are categorical and which are continuous
print("continuous features :\n",train_data.select_dtypes(include=['int64']).dtypes.head())
print("categorical features :\n",train_data.select_dtypes(include=['object']).dtypes.head())

continuous features :
 Id             int64
MSSubClass     int64
LotArea        int64
OverallQual    int64
OverallCond    int64
dtype: object
categorical features :
 MSZoning       object
Street         object
Alley          object
LotShape       object
LandContour    object
dtype: object


The following features are picked

#### categorical features - HouseStyle  , Neighborhood , BldgType ,KitchenQual , ExterQual    

#### continuous features -  OverallQual, YearBuilt , GrLivArea, TotalBsmtSF , GarageArea

<br>
    
let's learn more about the picked features

In [52]:
cate_feat = ['HouseStyle','Neighborhood','BldgType','KitchenQual','ExterQual'] 
cont_feat = ['OverallQual','YearBuilt','GrLivArea','TotalBsmtSF','GarageArea']

def display(feat):
    for features in feat:
        print(train_data[features].head())

display(cate_feat)

0    2Story
1    1Story
2    2Story
3    2Story
4    2Story
Name: HouseStyle, dtype: object
0    CollgCr
1    Veenker
2    CollgCr
3    Crawfor
4    NoRidge
Name: Neighborhood, dtype: object
0    1Fam
1    1Fam
2    1Fam
3    1Fam
4    1Fam
Name: BldgType, dtype: object
0    Gd
1    TA
2    Gd
3    Gd
4    Gd
Name: KitchenQual, dtype: object
0    Gd
1    TA
2    Gd
3    TA
4    Gd
Name: ExterQual, dtype: object


In [53]:
display(cont_feat)

0    7
1    6
2    7
3    7
4    8
Name: OverallQual, dtype: int64
0    2003
1    1976
2    2001
3    1915
4    2000
Name: YearBuilt, dtype: int64
0    1710
1    1262
2    1786
3    1717
4    2198
Name: GrLivArea, dtype: int64
0     856
1    1262
2     920
3     756
4    1145
Name: TotalBsmtSF, dtype: int64
0    548
1    460
2    608
3    642
4    836
Name: GarageArea, dtype: int64


## 3.Feature processing
<br>

<h3 style="color: blue;">Handling Missing Values:</h3>
<br>

In [54]:
def checkformissingval(feat):
    for val in feat:
        count = X_train[val].isnull().sum()
        if(count != 0):
            print(f"Null values in feature {val} is {count}")
        else:
            print(f"No null value in feature {val}")
checkformissingval(cont_feat)

No null value in feature OverallQual
No null value in feature YearBuilt
No null value in feature GrLivArea
No null value in feature TotalBsmtSF
No null value in feature GarageArea


In [55]:
checkformissingval(cate_feat)

No null value in feature HouseStyle
No null value in feature Neighborhood
No null value in feature BldgType
No null value in feature KitchenQual
No null value in feature ExterQual


In [57]:
# check for missing values in the final test data
def checkformissingval2(feat):
    for val in feat:
        count = test_data[val].isnull().sum()
        if(count != 0):
            print(f"Null values found in feature {val} is {count}")
        else:
            print(f"No null value in feature {val}")
checkformissingval2(cont_feat)
checkformissingval2(cate_feat)

No null value in feature OverallQual
No null value in feature YearBuilt
No null value in feature GrLivArea
Null values found in feature TotalBsmtSF is 1
Null values found in feature GarageArea is 1
No null value in feature HouseStyle
No null value in feature Neighborhood
No null value in feature BldgType
Null values found in feature KitchenQual is 1
No null value in feature ExterQual


since there are missing values in the test set , we will do some imputation

In [58]:
test_data['TotalBsmtSF'].fillna(test_data['TotalBsmtSF'].median(), inplace=True)
test_data['GarageArea'].fillna(test_data['GarageArea'].median(), inplace=True)
test_data['KitchenQual'].fillna(test_data['KitchenQual'].mode()[0], inplace=True)

In [59]:
checkformissingval2(cont_feat)
checkformissingval2(cate_feat)

No null value in feature OverallQual
No null value in feature YearBuilt
No null value in feature GrLivArea
No null value in feature TotalBsmtSF
No null value in feature GarageArea
No null value in feature HouseStyle
No null value in feature Neighborhood
No null value in feature BldgType
No null value in feature KitchenQual
No null value in feature ExterQual


<h3 style="color: blue;">Scaling Numerical Features:</h3>
<br>

Scaling is a technique used to normalize the range of independent variables or features of data.

In [60]:
# seperate the dataframe  into cont subset
X_train_cont = X_train[cont_feat]
X_test_cont = X_test[cont_feat]
test_data_cont = test_data[cont_feat]

In [61]:
def findrange(data):
    min_values = data.min()
    max_values = data.max()

    feature_ranges = max_values - min_values

    print("Feature Minimums:\n", min_values)
    print("\nFeature Maximums:\n", max_values)
    print("\nFeature Ranges:\n", feature_ranges)

In [62]:
findrange(X_train_cont)

Feature Minimums:
 OverallQual       1
YearBuilt      1872
GrLivArea       334
TotalBsmtSF       0
GarageArea        0
dtype: int64

Feature Maximums:
 OverallQual      10
YearBuilt      2010
GrLivArea      5642
TotalBsmtSF    6110
GarageArea     1418
dtype: int64

Feature Ranges:
 OverallQual       9
YearBuilt       138
GrLivArea      5308
TotalBsmtSF    6110
GarageArea     1418
dtype: int64


In [63]:
findrange(X_test_cont)

Feature Minimums:
 OverallQual       2
YearBuilt      1880
GrLivArea       480
TotalBsmtSF       0
GarageArea        0
dtype: int64

Feature Maximums:
 OverallQual      10
YearBuilt      2009
GrLivArea      4316
TotalBsmtSF    3206
GarageArea     1390
dtype: int64

Feature Ranges:
 OverallQual       8
YearBuilt       129
GrLivArea      3836
TotalBsmtSF    3206
GarageArea     1390
dtype: int64


as we can see there are huge differences in the Feature ranges , let's proceed with scaling

In [64]:
print(X_train_cont.head())

print(f"\nthe shape of X_train_cont is \n {X_train_cont.shape}")

# scaling cont features
scaler = StandardScaler()
X_train_cont_scaled = pd.DataFrame(scaler.fit_transform(X_train_cont), columns=X_train_cont.columns)
X_test_cont_scaled = pd.DataFrame(scaler.fit_transform(X_test_cont), columns=X_test_cont.columns)

      OverallQual  YearBuilt  GrLivArea  TotalBsmtSF  GarageArea
254             5       1957       1314         1314         294
1066            6       1993       1571          799         380
638             5       1910        796          796           0
799             5       1937       1768          731         240
380             5       1924       1691         1026         308

the shape of X_train_cont is 
 (1168, 5)


In [65]:
# let's see how scaled data looks like
print(X_train_cont_scaled.head())
print("\n",X_test_cont_scaled.head())

   OverallQual  YearBuilt  GrLivArea  TotalBsmtSF  GarageArea
0    -0.820445  -0.455469  -0.407093     0.572612   -0.863837
1    -0.088934   0.718609   0.083170    -0.596547   -0.456264
2    -0.820445  -1.988293  -1.395250    -0.603357   -2.257169
3    -0.820445  -1.107734   0.458975    -0.750921   -1.119755
4    -0.820445  -1.531707   0.312087    -0.081209   -0.797488

    OverallQual  YearBuilt  GrLivArea  TotalBsmtSF  GarageArea
0    -0.007138  -0.335994  -0.758540     0.044012   -0.874821
1     1.382350   0.763181   2.190510     0.982892    1.126763
2    -0.701882  -1.612456  -0.834449    -0.074510   -0.445910
3    -0.007138  -0.903310   0.372498    -0.083806   -0.177841
4     2.077094   1.224126   0.288999     1.347753    2.020328


In [66]:
# see if range is reduced or not
findrange(X_train_cont_scaled)

Feature Minimums:
 OverallQual   -3.746488
YearBuilt     -3.227597
GrLivArea     -2.276580
TotalBsmtSF   -2.410445
GarageArea    -2.257169
dtype: float64

Feature Maximums:
 OverallQual     2.837110
YearBuilt       1.273035
GrLivArea       7.849169
TotalBsmtSF    11.460545
GarageArea      4.463051
dtype: float64

Feature Ranges:
 OverallQual     6.583598
YearBuilt       4.500632
GrLivArea      10.125749
TotalBsmtSF    13.870991
GarageArea      6.720220
dtype: float64


<h3 style="color: blue;">Encoding Categorical Features:</h3>
<br>

In [67]:
# Categorical features need to be encoded to numerical values.

# seperate the dataframe  into categorical subset
X_train_cate = X_train[cate_feat]
X_test_cate = X_test[cate_feat]
test_data_cate = test_data[cate_feat]

print(X_train_cate.head())

print(f"\nthe shape of X_train_cont is \n {X_train_cate.shape} and \nX_test_cate is {X_test_cate.shape}")

#encoding the categorical values
encoder = OneHotEncoder(sparse_output=False, drop='first')  # drop='first' to avoid dummy variable trap
encoder.fit(X_train_cate)
X_train_cate_encoded = pd.DataFrame(encoder.transform(X_train_cate),
                                    columns=encoder.get_feature_names_out(X_train_cate.columns))

X_test_cate_encoded = pd.DataFrame(encoder.transform(X_test_cate),
                                   columns=encoder.get_feature_names_out(X_train_cate.columns))

test_cate_encoded = pd.DataFrame(encoder.transform(test_data_cate),
                                   columns=encoder.get_feature_names_out(test_data_cate.columns))

print(f"\nthe shape after encoding X_train_cate_encoded  \n {X_train_cate_encoded.shape} \nX_test_cate_encoded X_test_cate is {X_test_cate_encoded.shape}")

     HouseStyle Neighborhood BldgType KitchenQual ExterQual
254      1Story        NAmes     1Fam          TA        TA
1066     2Story      Gilbert     1Fam          TA        Gd
638      1Story      Edwards     1Fam          TA        TA
799      1.5Fin        SWISU     1Fam          Gd        TA
380      1.5Fin        SWISU     1Fam          Gd        TA

the shape of X_train_cont is 
 (1168, 5) and 
X_test_cate is (292, 5)

the shape after encoding X_train_cate_encoded  
 (1168, 41) 
X_test_cate_encoded X_test_cate is (292, 41)


In [68]:
# see if the data is encoded or not
X_train_cate_encoded.head()

Unnamed: 0,HouseStyle_1.5Unf,HouseStyle_1Story,HouseStyle_2.5Fin,HouseStyle_2.5Unf,HouseStyle_2Story,HouseStyle_SFoyer,HouseStyle_SLvl,Neighborhood_Blueste,Neighborhood_BrDale,Neighborhood_BrkSide,...,BldgType_2fmCon,BldgType_Duplex,BldgType_Twnhs,BldgType_TwnhsE,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,ExterQual_Fa,ExterQual_Gd,ExterQual_TA
0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
2,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0


In [69]:
X_test_cate_encoded.head()

Unnamed: 0,HouseStyle_1.5Unf,HouseStyle_1Story,HouseStyle_2.5Fin,HouseStyle_2.5Unf,HouseStyle_2Story,HouseStyle_SFoyer,HouseStyle_SLvl,Neighborhood_Blueste,Neighborhood_BrDale,Neighborhood_BrkSide,...,BldgType_2fmCon,BldgType_Duplex,BldgType_Twnhs,BldgType_TwnhsE,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,ExterQual_Fa,ExterQual_Gd,ExterQual_TA
0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
1,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
2,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


In [70]:
# Concatenate back the scaled and encoded features

X_train_preprocessed = pd.concat([X_train_cont_scaled, X_train_cate_encoded], axis=1)
X_test_preprocessed = pd.concat([X_test_cont_scaled, X_test_cate_encoded], axis=1)
test_preprocessed = pd.concat([X_test_cont_scaled, X_test_cate_encoded], axis=1)
X_train_preprocessed.head()

Unnamed: 0,OverallQual,YearBuilt,GrLivArea,TotalBsmtSF,GarageArea,HouseStyle_1.5Unf,HouseStyle_1Story,HouseStyle_2.5Fin,HouseStyle_2.5Unf,HouseStyle_2Story,...,BldgType_2fmCon,BldgType_Duplex,BldgType_Twnhs,BldgType_TwnhsE,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,ExterQual_Fa,ExterQual_Gd,ExterQual_TA
0,-0.820445,-0.455469,-0.407093,0.572612,-0.863837,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
1,-0.088934,0.718609,0.08317,-0.596547,-0.456264,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0
2,-0.820445,-1.988293,-1.39525,-0.603357,-2.257169,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
3,-0.820445,-1.107734,0.458975,-0.750921,-1.119755,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0
4,-0.820445,-1.531707,0.312087,-0.081209,-0.797488,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0


In [71]:
X_test_preprocessed.head()

Unnamed: 0,OverallQual,YearBuilt,GrLivArea,TotalBsmtSF,GarageArea,HouseStyle_1.5Unf,HouseStyle_1Story,HouseStyle_2.5Fin,HouseStyle_2.5Unf,HouseStyle_2Story,...,BldgType_2fmCon,BldgType_Duplex,BldgType_Twnhs,BldgType_TwnhsE,KitchenQual_Fa,KitchenQual_Gd,KitchenQual_TA,ExterQual_Fa,ExterQual_Gd,ExterQual_TA
0,-0.007138,-0.335994,-0.75854,0.044012,-0.874821,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
1,1.38235,0.763181,2.19051,0.982892,1.126763,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0
2,-0.701882,-1.612456,-0.834449,-0.07451,-0.44591,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
3,-0.007138,-0.90331,0.372498,-0.083806,-0.177841,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0
4,2.077094,1.224126,0.288999,1.347753,2.020328,0.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0


## 4.Model Training
<br>

In [72]:
model = RandomForestRegressor(random_state=42)

# Fit the model on the training data
model.fit(X_train_preprocessed, y_train)

In [73]:
# Predict on the training set 
y_train_pred = model.predict(X_train_preprocessed)
y_test_pred = model.predict(X_test_preprocessed)

In [78]:
train_mse = mean_squared_error(y_train, y_train_pred)
test_mse = mean_squared_error(y_test, y_test_pred)
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)

In [79]:
print(f"Training MSE: {train_mse}")
print(f"Test MSE: {test_mse}")
print(f"Training R^2: {train_r2}")
print(f"Test R^2: {test_r2}")

Training MSE: 139702130.3650075
Test MSE: 868949106.3762934
Training R^2: 0.9765779276776965
Test R^2: 0.8867128877045563


In [86]:
model.score(X_test_preprocessed,y_test)

0.8867128877045563

In [85]:
# now let's run on the real test data
test_pred = model.predict(test_preprocessed)

In [94]:
print("The first 10 predicted SalePrice values of the test set are :\n",test_pred[:10])

The first 10 predicted SalePrice values of the test set are :
 [140247.   359854.52 112630.   173667.   341441.01  81329.   209614.6
 161952.    81148.5  123214.86]


## 5.Model Evaluation
<br>

In [95]:
def compute_rmsle(y_test: np.ndarray, y_pred: np.ndarray, precision: int = 2) -> float:
    rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred))
    return round(rmsle, precision)

train_rmsle = compute_rmsle(y_train, y_train_pred)
test_rmsle = compute_rmsle(y_test, y_test_pred)

print(f"Train RMSLE: {train_rmsle}")
print(f"Test RMSLE: {test_rmsle}")

Train RMSLE: 0.06
Test RMSLE: 0.17
