In [None]:
The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. The following describes the dataset columns (in the PyCaret copy loaded below, the column names are lowercase and B appears as "black"):

CRIM - per capita crime rate by town
ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS - proportion of non-retail business acres per town
CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX - nitric oxides concentration (parts per 10 million)
RM - average number of rooms per dwelling
AGE - proportion of owner-occupied units built prior to 1940
DIS - weighted distances to five Boston employment centres
RAD - index of accessibility to radial highways
TAX - full-value property-tax rate per $10,000
PTRATIO - pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
LSTAT - % lower status of the population
MEDV - Median value of owner-occupied homes in $1000's
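The B column is a transformed quantity, not a raw proportion. The short sketch below applies the stated formula. For Bk in [0, 1], the largest value the formula can produce is 396.9, reached at Bk = 0. The sketch also shows that one B value can correspond to two valid Bk values, so the transform cannot always be undone. The helper names are hypothetical and not part of PyCaret.

```python
import math

def b_from_bk(bk: float) -> float:
    """Apply the dataset's stated transform B = 1000 * (Bk - 0.63)**2."""
    return 1000 * (bk - 0.63) ** 2

def bk_candidates(b: float) -> list[float]:
    """Hypothetical helper: every Bk in [0, 1] that maps to the given B."""
    root = math.sqrt(b / 1000)
    candidates = {0.63 - root, 0.63 + root}
    # Keep only values that are valid proportions
    return sorted(bk for bk in candidates if 0.0 <= bk <= 1.0)

print(round(b_from_bk(0.0), 2))                     # 396.9
print([round(bk, 3) for bk in bk_candidates(100.0)])  # [0.314, 0.946]
```

Because the transform squares the distance from 0.63, B values up to 1000 * 0.37**2 = 136.9 are ambiguous. Larger values map back to a single Bk below 0.63.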

In [3]:
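from pycaret.datasets import get_data
from pycaret.regression import setup, compare_models, tune_model, evaluate_model, save_model

# Load the dataset (get_data displays the first five rows by default)
dataset = get_data("boston")

# Set up the regression experiment; session_id fixes the random seed
reg_experiment = setup(data=dataset, target='medv', session_id=123)

# Compare the available models with cross-validation and keep the best one
best_model = compare_models()

# Tune the hyperparameters of the best model
tuned_model = tune_model(best_model)

# Evaluate the model (interactive plots)
evaluate_model(tuned_model)

# Save the full pipeline (preprocessing + model) to boston_model.pkl
save_model(tuned_model, 'boston_model')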

First five rows of the dataset (displayed by get_data):
,crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,396.9,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,396.9,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,392.83,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,394.63,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,396.9,5.33,36.2


setup() summary:
,Description,Value
0,Session id,123
1,Target,medv
2,Target type,Regression
3,Original data shape,"(506, 14)"
4,Transformed data shape,"(506, 14)"
5,Transformed train set shape,"(354, 14)"
6,Transformed test set shape,"(152, 14)"
7,Numeric features,13
8,Preprocess,True
9,Imputation type,simple
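The shapes above correspond to a 70/30 train/test split of the 506 rows. The 14 columns are the 13 numeric features plus the target. Here is a small sketch of the arithmetic. It assumes PyCaret's default train_size=0.7 and scikit-learn's rule for a float train_size: the training count is rounded down and the remaining rows go to the test set.

```python
import math

def split_sizes(n_rows: int, train_size: float = 0.7) -> tuple[int, int]:
    """Return (n_train, n_test) for a float train_size, rounding the training share down."""
    n_train = math.floor(train_size * n_rows)  # training share is rounded down
    n_test = n_rows - n_train                   # remaining rows go to the test set
    return n_train, n_test

n_rows, n_columns = 506, 14
n_train, n_test = split_sizes(n_rows)
n_features = n_columns - 1  # every column except the target 'medv'
print(n_train, n_test, n_features)  # 354 152 13
```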


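compare_models() leaderboard (mean scores over 10 cross-validation folds, sorted by R2; TT (Sec) is training time in seconds):
,Model,MAE,MSE,RMSE,R2,RMSLE,MAPE,TT (Sec)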
et,Extra Trees Regressor,2.0378,9.4958,2.9283,0.8884,0.1298,0.1,0.027
gbr,Gradient Boosting Regressor,2.1751,9.694,3.0293,0.8786,0.1434,0.1117,0.018
rf,Random Forest Regressor,2.2316,10.631,3.1448,0.8673,0.1447,0.1129,0.038
lightgbm,Light Gradient Boosting Machine,2.3305,11.5722,3.2969,0.8566,0.1482,0.1157,0.01
ada,AdaBoost Regressor,2.8789,15.9344,3.8667,0.8055,0.1803,0.1517,0.016
dt,Decision Tree Regressor,2.9307,19.615,4.2538,0.7368,0.1909,0.1455,0.007
ridge,Ridge Regression,3.2793,22.8402,4.644,0.73,0.246,0.1664,0.006
lr,Linear Regression,3.3006,22.7209,4.6402,0.7295,0.2526,0.1668,0.823
br,Bayesian Ridge,3.3158,23.2769,4.6862,0.7258,0.2453,0.1673,0.007
lar,Least Angle Regression,3.3327,22.9251,4.6793,0.7236,0.2557,0.1725,0.006
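The leaderboard is ordered by mean R2. Each metric is averaged over the folds separately, so the RMSE column is not simply the square root of the MSE column. Because the square root is concave, the mean of the per-fold RMSEs can never exceed the square root of the mean MSE. The stdlib-only sketch below re-sorts the rows copied from the table and checks that relationship for every model.

```python
import csv
import io
import math

# Rows copied from the compare_models() leaderboard above (subset of columns)
LEADERBOARD = """id,model,mse,rmse,r2
et,Extra Trees Regressor,9.4958,2.9283,0.8884
gbr,Gradient Boosting Regressor,9.694,3.0293,0.8786
rf,Random Forest Regressor,10.631,3.1448,0.8673
lightgbm,Light Gradient Boosting Machine,11.5722,3.2969,0.8566
ada,AdaBoost Regressor,15.9344,3.8667,0.8055
dt,Decision Tree Regressor,19.615,4.2538,0.7368
ridge,Ridge Regression,22.8402,4.644,0.73
lr,Linear Regression,22.7209,4.6402,0.7295
br,Bayesian Ridge,23.2769,4.6862,0.7258
lar,Least Angle Regression,22.9251,4.6793,0.7236
"""

rows = list(csv.DictReader(io.StringIO(LEADERBOARD)))

# Higher R2 is better, so sort in descending order
ranking = sorted(rows, key=lambda row: float(row["r2"]), reverse=True)
print([row["id"] for row in ranking[:3]])  # ['et', 'gbr', 'rf']

# Mean RMSE over folds should not exceed sqrt(mean MSE) over the same folds
for row in rows:
    mse, rmse = float(row["mse"]), float(row["rmse"])
    print(f'{row["id"]:>8}  mean RMSE={rmse:.4f}  sqrt(mean MSE)={math.sqrt(mse):.4f}')
```

For Extra Trees, for example, sqrt(9.4958) is about 3.08, while the reported RMSE is 2.9283.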


tune_model() cross-validation results for the tuned candidate:
Fold,MAE,MSE,RMSE,R2,RMSLE,MAPE
0,2.6595,13.4674,3.6698,0.8493,0.1394,0.0988
1,1.9068,6.7518,2.5984,0.8542,0.1191,0.1022
2,2.5426,9.53,3.0871,0.9048,0.1732,0.1542
3,3.0106,23.9678,4.8957,0.7637,0.2017,0.1597
4,2.5438,10.9426,3.308,0.9072,0.1524,0.1308
5,2.2067,9.133,3.0221,0.8301,0.1552,0.1207
6,1.6454,4.2093,2.0517,0.9376,0.0933,0.0788
7,3.0894,34.0583,5.8359,0.6875,0.2059,0.1324
8,2.0147,6.7065,2.5897,0.8801,0.1588,0.1282
9,2.2593,10.9488,3.3089,0.8733,0.1439,0.0972
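Averaging the ten folds gives the tuned candidate's cross-validated scores. These can be compared with the untuned Extra Trees row of the leaderboard (MAE 2.0378, R2 0.8884). The sketch below uses fold values copied from the table. It treats the leaderboard row as the baseline, which assumes both were scored on the same folds (session_id=123 fixes the seed).

```python
from statistics import mean

# Per-fold scores of the tuned candidate, copied from the table above
fold_mae = [2.6595, 1.9068, 2.5426, 3.0106, 2.5438, 2.2067, 1.6454, 3.0894, 2.0147, 2.2593]
fold_r2 = [0.8493, 0.8542, 0.9048, 0.7637, 0.9072, 0.8301, 0.9376, 0.6875, 0.8801, 0.8733]

# Mean CV scores of the untuned Extra Trees model, from the compare_models() leaderboard
baseline_mae, baseline_r2 = 2.0378, 0.8884

tuned_mae, tuned_r2 = mean(fold_mae), mean(fold_r2)
print(f"tuned:   MAE={tuned_mae:.4f}, R2={tuned_r2:.4f}")
print(f"untuned: MAE={baseline_mae:.4f}, R2={baseline_r2:.4f}")

# Lower MAE and higher R2 are better; here the untuned model wins on both
print(tuned_mae > baseline_mae and tuned_r2 < baseline_r2)  # True
```

This is consistent with the message below, which reports that the original model was kept.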


Fitting 10 folds for each of 10 candidates, totalling 100 fits
Original model was better than the tuned model, hence it will be returned. NOTE: The display metrics are for the tuned model (not the original one).
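The log line counts one fit per (candidate, fold) pair. The message after it describes a keep-the-better-model rule. The sketch below illustrates that rule under a few assumptions: the models are compared on the mean of the optimized metric (R2 by default for PyCaret regression), higher is better, and a tie keeps the original. `pick_better` is a hypothetical helper, not PyCaret's internal code. The Extra Trees leaderboard R2 stands in for the original model's score.

```python
def count_fits(n_candidates: int, n_folds: int) -> int:
    """Each candidate hyperparameter set is fitted once per CV fold."""
    return n_candidates * n_folds

def pick_better(original_score: float, tuned_score: float, higher_is_better: bool = True) -> str:
    """Hypothetical helper: return which model to keep based on its mean CV score."""
    if higher_is_better:
        tuned_wins = tuned_score > original_score
    else:
        tuned_wins = tuned_score < original_score
    # On a tie this sketch keeps the original model
    return "tuned" if tuned_wins else "original"

print(count_fits(10, 10))           # 100
print(pick_better(0.8884, 0.8488))  # original
```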


[evaluate_model output: interactive widget with a "Plot Type" selector (first option: Pipeline Plot); not rendered in this export]

Transformation Pipeline and Model Successfully Saved


(Pipeline(memory=Memory(location=None),
          steps=[('numerical_imputer',
                  TransformerWrapper(include=['crim', 'zn', 'indus', 'chas',
                                              'nox', 'rm', 'age', 'dis', 'rad',
                                              'tax', 'ptratio', 'black',
                                              'lstat'],
                                     transformer=SimpleImputer())),
                 ('categorical_imputer',
                  TransformerWrapper(include=[],
                                     transformer=SimpleImputer(strategy='most_frequent'))),
                 ('trained_model',
                  ExtraTreesRegressor(n_jobs=-1, random_state=123))]),
 'boston_model.pkl')
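The saved pipeline fills missing numeric values with scikit-learn's SimpleImputer, whose default strategy is the column mean. The categorical imputer uses the most frequent value, but it has no columns to act on here. The stdlib sketch below shows mean imputation under that default. Like SimpleImputer, it learns the fill values from the training rows and then applies them to any rows. `fit_means` and `transform` are hypothetical helpers, and the toy rows reuse a few rm and lstat values from the dataset with some entries blanked out for illustration.

```python
from statistics import mean
from typing import Optional

Row = list[Optional[float]]

def fit_means(train_rows: list[Row]) -> list[float]:
    """Learn one fill value per column: the mean of its non-missing training values."""
    n_cols = len(train_rows[0])
    return [mean(row[j] for row in train_rows if row[j] is not None) for j in range(n_cols)]

def transform(rows: list[Row], fill_values: list[float]) -> list[list[float]]:
    """Replace each missing entry (None) with the learned column mean."""
    return [[fill if value is None else value for value, fill in zip(row, fill_values)]
            for row in rows]

# Toy training rows with two numeric columns (rm, lstat); None marks a missing value
train = [[6.575, 4.98], [6.421, None], [None, 4.03]]
means = fit_means(train)
print(transform([[None, None], [7.0, 5.0]], means))
```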