# Rusty Bargain's App to Attract New Customers

## Introduction


Rusty Bargain, a used car sales service, is developing an app to attract new customers. In that app, you can quickly find out the market value of your car. You have access to historical data: technical specifications, trim versions, and prices. You need to build a model to determine the value.

Rusty Bargain is interested in:

- the quality of the prediction;
- the speed of the prediction;
- the time required for training.

## Data Preprocessing

In [1]:
# imported libraries

import pandas as pd
import seaborn as sns
import numpy as np
import re

from matplotlib import pyplot as plt
from scipy import stats as st

from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import RandomForestRegressor

from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor

from sklearn.model_selection import train_test_split, GridSearchCV

from sklearn.metrics import mean_squared_error

from sklearn.preprocessing import StandardScaler

## Data preparation

In [None]:
# dataset - car_data
car_data = pd.read_csv('car_data.csv')
car_data.head()

Unnamed: 0,DateCrawled,Price,VehicleType,RegistrationYear,Gearbox,Power,Model,Mileage,RegistrationMonth,FuelType,Brand,NotRepaired,DateCreated,NumberOfPictures,PostalCode,LastSeen
0,24/03/2016 11:52,480,,1993,manual,0,golf,150000,0,petrol,volkswagen,,24/03/2016 00:00,0,70435,07/04/2016 03:16
1,24/03/2016 10:58,18300,coupe,2011,manual,190,,125000,5,gasoline,audi,yes,24/03/2016 00:00,0,66954,07/04/2016 01:46
2,14/03/2016 12:52,9800,suv,2004,auto,163,grand,125000,8,gasoline,jeep,,14/03/2016 00:00,0,90480,05/04/2016 12:47
3,17/03/2016 16:54,1500,small,2001,manual,75,golf,150000,6,petrol,volkswagen,no,17/03/2016 00:00,0,91074,17/03/2016 17:40
4,31/03/2016 17:25,3600,small,2008,manual,69,fabia,90000,7,gasoline,skoda,no,31/03/2016 00:00,0,60437,06/04/2016 10:17


In [3]:
# column names
column_names = car_data.columns.tolist()
display(column_names)

['DateCrawled',
 'Price',
 'VehicleType',
 'RegistrationYear',
 'Gearbox',
 'Power',
 'Model',
 'Mileage',
 'RegistrationMonth',
 'FuelType',
 'Brand',
 'NotRepaired',
 'DateCreated',
 'NumberOfPictures',
 'PostalCode',
 'LastSeen']

In [4]:
# converted column names into snakecase with _:
# regex pattern
car_data.columns = [re.sub(r'(?<!^)(?=[A-Z])', '_', col).lower() for col in car_data.columns]
print(car_data.columns)

Index(['date_crawled', 'price', 'vehicle_type', 'registration_year', 'gearbox',
       'power', 'model', 'mileage', 'registration_month', 'fuel_type', 'brand',
       'not_repaired', 'date_created', 'number_of_pictures', 'postal_code',
       'last_seen'],
      dtype='object')
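The regex above inserts an underscore at every position that is followed by an uppercase letter, except at the start of the string, and the result is then lowercased. A minimal sketch of that behaviour on a few of the column names follows; the helper name `to_snake_case` is hypothetical and only used for illustration.

```python
import re


def to_snake_case(name: str) -> str:
    """Insert '_' before every uppercase letter except the first, then lowercase."""
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


examples = ['DateCrawled', 'RegistrationYear', 'NumberOfPictures', 'Price']
converted = [to_snake_case(name) for name in examples]
print(converted)
```

One caveat of this pattern: it splits before every capital letter, so a name containing an acronym (for example `HTTPServer`) would be broken into single letters. None of the columns in this dataset has that shape.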


In [5]:
# Know the total number of rows and columns in the dataset
n_rows, n_cols = car_data.shape

print(f"The DataFrame has {n_rows} rows and {n_cols} columns")

The DataFrame has 354369 rows and 16 columns


In [6]:
car_data.describe()

Unnamed: 0,price,registration_year,power,mileage,registration_month,number_of_pictures,postal_code
count,354369.0,354369.0,354369.0,354369.0,354369.0,354369.0,354369.0
mean,4416.656776,2004.234448,110.094337,128211.172535,5.714645,0.0,50508.689087
std,4514.158514,90.227958,189.850405,37905.34153,3.726421,0.0,25783.096248
min,0.0,1000.0,0.0,5000.0,0.0,0.0,1067.0
25%,1050.0,1999.0,69.0,125000.0,3.0,0.0,30165.0
50%,2700.0,2003.0,105.0,150000.0,6.0,0.0,49413.0
75%,6400.0,2008.0,143.0,150000.0,9.0,0.0,71083.0
max,20000.0,9999.0,20000.0,150000.0,12.0,0.0,99998.0


In [7]:
car_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 354369 entries, 0 to 354368
Data columns (total 16 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   date_crawled        354369 non-null  object
 1   price               354369 non-null  int64 
 2   vehicle_type        316879 non-null  object
 3   registration_year   354369 non-null  int64 
 4   gearbox             334536 non-null  object
 5   power               354369 non-null  int64 
 6   model               334664 non-null  object
 7   mileage             354369 non-null  int64 
 8   registration_month  354369 non-null  int64 
 9   fuel_type           321474 non-null  object
 10  brand               354369 non-null  object
 11  not_repaired        283215 non-null  object
 12  date_created        354369 non-null  object
 13  number_of_pictures  354369 non-null  int64 
 14  postal_code         354369 non-null  int64 
 15  last_seen           354369 non-null  object
dtypes: int64(7), object(9)

In [8]:
# total number of duplicates in the dataset
duplicates = car_data.duplicated().sum()
display(duplicates)

262

In [9]:
# drop the duplicates found in the dataset
car_data.drop_duplicates(inplace=True)

In [10]:
# determining number of missing values in this dataset
car_data.isna().sum() 

date_crawled              0
price                     0
vehicle_type          37484
registration_year         0
gearbox               19830
power                     0
model                 19701
mileage                   0
registration_month        0
fuel_type             32889
brand                     0
not_repaired          71145
date_created              0
number_of_pictures        0
postal_code               0
last_seen                 0
dtype: int64

In [11]:
# filling column 'vehicle_type' missing values with 'unknown'
car_data['vehicle_type'] = car_data['vehicle_type'].fillna('unknown')

In [12]:
# filling column 'gearbox' missing values with 'unknown'
car_data['gearbox'] = car_data['gearbox'].fillna('unknown')

In [13]:
# filling column 'model' missing values with 'unknown'
car_data['model'] = car_data['model'].fillna('unknown')

In [14]:
# filling column 'fuel_type' missing values with 'unknown'
car_data['fuel_type'] = car_data['fuel_type'].fillna('unknown')

In [15]:
# filling column 'not_repaired' missing values with 'no info'
car_data['not_repaired'] = car_data['not_repaired'].fillna('no info')

In [16]:
# finding out if any missing values are left after filling them
car_data.isna().sum() 

date_crawled          0
price                 0
vehicle_type          0
registration_year     0
gearbox               0
power                 0
model                 0
mileage               0
registration_month    0
fuel_type             0
brand                 0
not_repaired          0
date_created          0
number_of_pictures    0
postal_code           0
last_seen             0
dtype: int64

The dataset contained 262 duplicate rows and five columns with missing values. The duplicates were dropped, and the missing values were filled with 'unknown' using fillna, except in the 'not_repaired' column, where they were filled with 'no info'.
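The same cleaning steps can be written more compactly by passing one column-to-placeholder mapping to `fillna`. The sketch below runs on a small toy frame with made-up values, not on the car data, and assumes the same placeholders as above.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one exact duplicate row and some gaps.
toy = pd.DataFrame({
    'price': [480, 18300, 18300, 1500],
    'vehicle_type': [np.nan, 'coupe', 'coupe', 'small'],
    'gearbox': ['manual', 'manual', 'manual', np.nan],
    'not_repaired': [np.nan, 'yes', 'yes', 'no'],
})

# Remove exact duplicate rows, keeping the first occurrence.
toy = toy.drop_duplicates()

# One fillna call with a column -> placeholder mapping.
fill_values = {
    'vehicle_type': 'unknown',
    'gearbox': 'unknown',
    'not_repaired': 'no info',
}
toy = toy.fillna(value=fill_values)

print(toy)
print(toy.isna().sum().sum())
```

Keeping the placeholders in one dictionary makes it easier to see at a glance which column gets which value.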

## Model training

In [17]:
# drop the columns listed below, as they do not seem important for the model
car_data = car_data.drop(['date_crawled','date_created', 'last_seen','number_of_pictures', 'postal_code'], axis=1)

In [18]:
car_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 354107 entries, 0 to 354368
Data columns (total 11 columns):
 #   Column              Non-Null Count   Dtype 
---  ------              --------------   ----- 
 0   price               354107 non-null  int64 
 1   vehicle_type        354107 non-null  object
 2   registration_year   354107 non-null  int64 
 3   gearbox             354107 non-null  object
 4   power               354107 non-null  int64 
 5   model               354107 non-null  object
 6   mileage             354107 non-null  int64 
 7   registration_month  354107 non-null  int64 
 8   fuel_type           354107 non-null  object
 9   brand               354107 non-null  object
 10  not_repaired        354107 non-null  object
dtypes: int64(5), object(6)
memory usage: 32.4+ MB


In [19]:
# Use one-hot encoder to convert categorical columns to numerical columns
car_data = pd.get_dummies(car_data, columns=['vehicle_type', 'gearbox', 'model', 'fuel_type', 'brand', 'not_repaired'], drop_first=True)

### Splitting the dataset

In [20]:
car_data_train, car_data_temp = train_test_split(car_data, test_size=0.40, random_state=7)

In [21]:
features_train = car_data_train.drop(['price'], axis=1)
print(features_train.shape)

(212464, 312)


In [22]:
target_train = car_data_train['price']
print(target_train.shape)

(212464,)


In [23]:
car_data_valid, car_data_test = train_test_split(car_data_temp, test_size=0.50, random_state=7)

In [24]:
features_valid = car_data_valid.drop(['price'], axis=1)
print(features_valid.shape)

(70821, 312)


In [25]:
target_valid = car_data_valid['price']
print(target_valid.shape)

(70821,)


In [26]:
features_test = car_data_test.drop(['price'], axis=1)
print(features_test.shape)

(70822, 312)


In [27]:
target_test = car_data_test['price']
print(target_test.shape)

(70822,)
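The two `train_test_split` calls above produce a 60/20/20 split into training, validation, and test sets. A minimal sketch of the same idea on a toy frame is shown below; the helper `split_xy` and the toy values are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for car_data (hypothetical values).
toy = pd.DataFrame({'power': np.arange(100), 'price': np.arange(100) * 10})

# First split: 60% train, 40% held out.
train, temp = train_test_split(toy, test_size=0.40, random_state=7)
# Second split: halve the held-out part into validation and test.
valid, test = train_test_split(temp, test_size=0.50, random_state=7)


def split_xy(frame, target='price'):
    """Return (features, target) for one subset."""
    return frame.drop(columns=[target]), frame[target]


features_train, target_train = split_xy(train)
features_valid, target_valid = split_xy(valid)
features_test, target_test = split_xy(test)

print(len(train), len(valid), len(test))
```

Wrapping the feature/target separation in one small function avoids repeating the same two lines for each subset.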


In [28]:
# Scaling numerical columns
num_cols = ['power', 'mileage', 'registration_month']

scaler = StandardScaler()

# Scale the feature frames that the models are actually trained and evaluated on.
# Explicit copies avoid pandas' SettingWithCopyWarning.
features_train = features_train.copy()
features_valid = features_valid.copy()
features_test = features_test.copy()

# Fit on the training features only, then apply the same transform to the other splits
features_train[num_cols] = scaler.fit_transform(features_train[num_cols])
features_valid[num_cols] = scaler.transform(features_valid[num_cols])
features_test[num_cols] = scaler.transform(features_test[num_cols])
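When scaling, the scaler is normally fitted on the training features only and then reused unchanged on the other splits, so that no information from validation or test data leaks into training. The sketch below shows this on toy frames with made-up values; working on explicit copies also sidesteps pandas' SettingWithCopyWarning.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

num_cols = ['power', 'mileage']

# Toy feature frames (hypothetical values) standing in for the real splits.
features_train = pd.DataFrame({'power': [50.0, 100.0, 150.0],
                               'mileage': [150000.0, 100000.0, 50000.0]})
features_valid = pd.DataFrame({'power': [75.0, 125.0],
                               'mileage': [125000.0, 75000.0]})

scaler = StandardScaler()

# Work on explicit copies so the original frames stay untouched.
train_scaled = features_train.copy()
valid_scaled = features_valid.copy()

# Fit on the training features only, then reuse the same statistics.
train_scaled[num_cols] = scaler.fit_transform(features_train[num_cols])
valid_scaled[num_cols] = scaler.transform(features_valid[num_cols])

print(train_scaled)
print(valid_scaled)
```

The validation columns are not expected to have exactly zero mean after the transform, because they are centred with the training mean rather than their own.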


### Linear Regression

In [29]:
%%time
# Linear Regression model for Sanity Check
model = LinearRegression() 
model.fit(features_train, target_train)

CPU times: user 6.7 s, sys: 997 ms, total: 7.7 s
Wall time: 7.64 s


LinearRegression()

In [30]:
%%time
predictions_valid = model.predict(features_valid)
display(predictions_valid)

array([2695.09580508, 1930.12587265,  276.6884671 , ..., 5633.38128708,
        159.06049967, 7214.68358692])

CPU times: user 44.7 ms, sys: 82.7 ms, total: 127 ms
Wall time: 113 ms


In [31]:
rmse = mean_squared_error(target_valid, predictions_valid) ** 0.5 
print("RMSE of the linear regression model on the validation set:", rmse)

RMSE of the linear regression model on the validation set: 3169.2133274352436


Linear Regression serves as a sanity check for this project. On the validation set its RMSE is 3169.21. RMSE measures how far predictions are from the actual values, so a lower RMSE means better model performance. With a mean price of about 4417, an RMSE of 3169 is roughly 72% of the mean, which means the predictions deviate substantially and are not very reliable. Fitting the model took about 7.7 s of CPU time (7.64 s wall time), and predicting on the validation set took about 127 ms (113 ms wall time). Training being slower than prediction is normal, and both are fast compared with more complex models such as Random Forest or Gradient Boosting.
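To make the comparison between RMSE and mean price concrete, the sketch below computes RMSE by hand on three toy values and then divides the reported validation RMSE by the mean price from `describe()`. The function name `rmse` and the toy numbers are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Tiny hand-checkable example: errors of 100, -200 and 300.
toy_rmse = rmse([1000, 2000, 3000], [900, 2200, 2700])

# Figures reported above: validation RMSE and mean price from describe().
reported_rmse = 3169.21
mean_price = 4416.66
relative_error = reported_rmse / mean_price

print(round(toy_rmse, 2), round(relative_error, 2))
```

Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.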

### Random Forest Regression

In [32]:
param_grid_rf = {
    "n_estimators": [10, 20, 50], 
    "max_depth": [5, 10],  
    "min_samples_split": [2, 5],  
    "min_samples_leaf": [1, 2] 
}

In [33]:
rf = RandomForestRegressor(random_state=7)  

In [34]:
# Perform Grid Search
grid_search_rf = GridSearchCV(rf, param_grid_rf, cv=4, scoring='neg_root_mean_squared_error', verbose=1)

In [35]:
%%time
# Fit the model
grid_search_rf.fit(features_train, target_train)

Fitting 4 folds for each of 24 candidates, totalling 96 fits
CPU times: user 39min 7s, sys: 6.85 s, total: 39min 14s
Wall time: 39min 16s


GridSearchCV(cv=4, estimator=RandomForestRegressor(random_state=7),
             param_grid={'max_depth': [5, 10], 'min_samples_leaf': [1, 2],
                         'min_samples_split': [2, 5],
                         'n_estimators': [10, 20, 50]},
             scoring='neg_root_mean_squared_error', verbose=1)
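The "24 candidates, totalling 96 fits" line in the log follows directly from the grid: the number of candidates is the product of the option counts, multiplied by the number of folds. A small sketch of that arithmetic, using the grid and wall time shown above, follows. The per-fit figure is only a rough average, assuming the default `refit=True` adds one final fit on the full training set.

```python
import math

param_grid_rf = {
    "n_estimators": [10, 20, 50],
    "max_depth": [5, 10],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
}
cv_folds = 4

# Candidates = product of the number of options per hyperparameter.
n_candidates = math.prod(len(values) for values in param_grid_rf.values())
n_fits = n_candidates * cv_folds

# Wall time reported above: 39 min 16 s, spread over the CV fits plus one refit.
wall_seconds = 39 * 60 + 16
seconds_per_fit = wall_seconds / (n_fits + 1)

print(n_candidates, n_fits, round(seconds_per_fit, 1))
```

Since the search time grows multiplicatively with every added option, trimming even one hyperparameter list noticeably shortens the search.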

In [36]:
# Get best parameters and best model
best_params_rf = grid_search_rf.best_params_
best_model_rf = grid_search_rf.best_estimator_

In [49]:
display(best_params_rf)

{'max_depth': 10,
 'min_samples_leaf': 2,
 'min_samples_split': 2,
 'n_estimators': 50}

In [51]:
%%time
rf = RandomForestRegressor(random_state=7, 
                           max_depth=best_params_rf['max_depth'],
                           min_samples_leaf=best_params_rf['min_samples_leaf'],
                           min_samples_split=best_params_rf['min_samples_split'],
                           n_estimators=best_params_rf['n_estimators']
                           ) 
rf.fit(features_train, target_train)

CPU times: user 1min 17s, sys: 64 ms, total: 1min 17s
Wall time: 1min 17s


RandomForestRegressor(max_depth=10, min_samples_leaf=2, n_estimators=50,
                      random_state=7)

In [52]:
%%time
# Predict using the best model
predictions_valid_rf = rf.predict(features_valid)

CPU times: user 236 ms, sys: 20 ms, total: 256 ms
Wall time: 262 ms


In [38]:
# Calculate Root Mean Squared Error
mse_rf = mean_squared_error(target_valid, predictions_valid_rf)
rmse_rf = mse_rf ** 0.5

print("Root Mean Squared Error:", rmse_rf)
print("Best Parameters:", best_params_rf)

Root Mean Squared Error: 2018.2058209696345
Best Parameters: {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 50}


With the best parameters found by the grid search (max_depth: 10, min_samples_leaf: 2, min_samples_split: 2, n_estimators: 50), the Random Forest Regressor reaches an RMSE of 2018.21 on the validation set, meaning its predictions typically deviate from the actual prices by about 2018. Training the final model took 1min 17s, and predicting took 256 ms. Random Forest trains much more slowly than Linear Regression because it grows many decision trees, but it has the best RMSE so far. Prediction is also slower than for Linear Regression (256 ms versus 127 ms) because the results of many trees have to be aggregated.
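Two details of `GridSearchCV` are worth keeping in mind when reading these results. With `scoring='neg_root_mean_squared_error'` the reported score is a negated RMSE, and with the default `refit=True` the `best_estimator_` is already trained on the full training data. Retraining separately, as done above, is still useful for timing a single fit. The sketch below illustrates both points on synthetic data with a deliberately tiny grid; it is not the search run on the car data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Small synthetic regression problem (not the car data).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=7)

grid = GridSearchCV(
    RandomForestRegressor(random_state=7),
    param_grid={"n_estimators": [5, 10], "max_depth": [2, 4]},
    cv=3,
    scoring="neg_root_mean_squared_error",
)
grid.fit(X, y)

# The score is a negated RMSE, so flip the sign to read it as an error.
cv_rmse = -grid.best_score_

# With the default refit=True, best_estimator_ is already trained on all of X, y.
predictions = grid.best_estimator_.predict(X[:5])

# refit_time_ holds the seconds spent on that final refit.
print(grid.best_params_, round(cv_rmse, 2), predictions.shape, grid.refit_time_)
```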

### CatBoost Regression

In [39]:
# Using CatBoostRegressor model to get the best model
param_grid_cbr = {
    'learning_rate': [0.01, 0.05, 0.1],  
    'iterations': [10, 20, 30],         
    'depth': [1, 2, 5, 10]              
}

In [40]:
# Initialize CatBoost model
cbr = CatBoostRegressor(random_seed=7, loss_function="RMSE")  

# Grid search with 4-fold cross-validation
grid_search_cbr = GridSearchCV(cbr, param_grid_cbr, cv=4, verbose=1)
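Unlike the Random Forest search, this grid search does not pass a `scoring` argument. In that case `GridSearchCV` ranks candidates with the estimator's own `score` method, which for scikit-learn regressors is R² (CatBoost's regressor is assumed here to behave the same way). The sketch below contrasts the two settings on synthetic data, using a scikit-learn decision tree purely as a stand-in for CatBoost.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and a scikit-learn tree as a stand-in for CatBoost.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=7)
param_grid = {"max_depth": [2, 4]}

# No scoring argument: the estimator's own score() is used (R^2 for regressors).
grid_default = GridSearchCV(DecisionTreeRegressor(random_state=7), param_grid, cv=4)
grid_default.fit(X, y)

# Explicit scoring: candidates are ranked by negated RMSE instead.
grid_rmse = GridSearchCV(
    DecisionTreeRegressor(random_state=7),
    param_grid,
    cv=4,
    scoring="neg_root_mean_squared_error",
)
grid_rmse.fit(X, y)

print(round(grid_default.best_score_, 3), round(-grid_rmse.best_score_, 3))
```

Using the same `scoring` for every model keeps the searches directly comparable. Separately, constructing the CatBoost model with `verbose=0` should suppress the per-iteration log that follows.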

In [41]:
%%time
# Fit the model
grid_search_cbr.fit(features_train, target_train)

# Get best parameters and best model
best_params_cbr = grid_search_cbr.best_params_
best_model_cbr = grid_search_cbr.best_estimator_

Fitting 4 folds for each of 36 candidates, totalling 144 fits
0:	learn: 4496.7995412	total: 59.6ms	remaining: 536ms
1:	learn: 4483.8656752	total: 70.7ms	remaining: 283ms
2:	learn: 4471.1528770	total: 81ms	remaining: 189ms
3:	learn: 4458.6578797	total: 93.1ms	remaining: 140ms
4:	learn: 4446.4228049	total: 105ms	remaining: 105ms
5:	learn: 4434.3233669	total: 115ms	remaining: 76.7ms
6:	learn: 4422.4325757	total: 125ms	remaining: 53.7ms
7:	learn: 4410.7472953	total: 136ms	remaining: 33.9ms
8:	learn: 4399.2644239	total: 146ms	remaining: 16.2ms
9:	learn: 4387.9396261	total: 156ms	remaining: 0us
[... per-iteration CatBoost output for the remaining fits omitted; the exported log is truncated ...]


In [53]:
display(best_params_cbr)

{'depth': 10, 'iterations': 30, 'learning_rate': 0.1}

In [55]:
%%time
cbr = CatBoostRegressor(random_state=7, 
                           depth=best_params_cbr['depth'],
                           iterations=best_params_cbr['iterations'],
                           learning_rate=best_params_cbr['learning_rate'],
                           ) 
cbr.fit(features_train, target_train)

0:	learn: 4211.7152438	total: 52.4ms	remaining: 1.52s
1:	learn: 3941.7759894	total: 102ms	remaining: 1.43s
2:	learn: 3707.2109597	total: 150ms	remaining: 1.35s
3:	learn: 3497.5854695	total: 199ms	remaining: 1.29s
4:	learn: 3314.0626982	total: 255ms	remaining: 1.28s
5:	learn: 3154.5394502	total: 305ms	remaining: 1.22s
6:	learn: 3015.7884832	total: 354ms	remaining: 1.16s
7:	learn: 2892.9457524	total: 402ms	remaining: 1.1s
8:	learn: 2784.2964570	total: 450ms	remaining: 1.05s
9:	learn: 2693.6994791	total: 500ms	remaining: 999ms
10:	learn: 2615.7797763	total: 549ms	remaining: 947ms
11:	learn: 2545.8229780	total: 597ms	remaining: 896ms
12:	learn: 2486.5895514	total: 646ms	remaining: 845ms
13:	learn: 2434.5090788	total: 695ms	remaining: 794ms
14:	learn: 2380.5856573	total: 744ms	remaining: 744ms
15:	learn: 2338.6535522	total: 798ms	remaining: 698ms
16:	learn: 2300.7303525	total: 846ms	remaining: 647ms
17:	learn: 2268.8852334	total: 895ms	remaining: 597ms
18:	learn: 2237.8347192	total: 945ms	r

<catboost.core.CatBoostRegressor at 0x7f50f07a7a60>

In [56]:
%%time
# Predict using the best model
predictions_valid_cbr = cbr.predict(features_valid)

CPU times: user 9.59 ms, sys: 3.99 ms, total: 13.6 ms
Wall time: 12.5 ms


In [59]:
# Calculate Root Mean Squared Error
mse_cbr = mean_squared_error(target_valid, predictions_valid_cbr)
rmse_cbr = mse_cbr ** 0.5

print("Root Mean Squared Error:", rmse_cbr)
print("Best Parameters:", best_params_cbr)

Root Mean Squared Error: 2034.4336189856476
Best Parameters: {'depth': 10, 'iterations': 30, 'learning_rate': 0.1}


Running the CatBoost Regressor with the best parameters (depth: 10, iterations: 30, learning_rate: 0.1) gives an RMSE of 2034.4336189856476 on the validation set, meaning the typical prediction error is about 2034 in price units. Fitting the model takes 2.33s and making predictions takes 13.6ms. Both steps are fast, which is consistent with CatBoost handling categorical and numerical data efficiently.
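To make the interpretation of RMSE concrete, here is a minimal sketch in plain Python. The prices are made-up illustration values, not rows from the dataset, and the helper names `rmse` and `mae` are hypothetical. It shows that RMSE is not a simple average of the deviations: one large miss pulls it well above the mean absolute error.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square the errors, average them, take the root."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def mae(actual, predicted):
    """Mean absolute error: plain average of the absolute deviations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical prices: three small misses (100) and one large miss (1000)
actual = [1000, 2000, 3000, 4000]
predicted = [1100, 1900, 3100, 3000]

print("MAE: ", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because the errors are squared before averaging, RMSE is always at least as large as MAE, so an RMSE near 2034 is best read as a typical error size that weights large misses more heavily.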

### LightGBM Regression

In [44]:
# Hyperparameter grid used to search for the best LGBMRegressor model
param_grid_lgbm = {
    'learning_rate': [0.01, 0.05, 0.1],  
    'n_estimators': [10, 20, 30],         
    'max_depth': [1, 2, 5, 10]              
}

In [45]:
# Initialize LightGBM model
lgbm = LGBMRegressor(random_state=7)  

# Grid search with 4-fold cross-validation
grid_search_lgbm = GridSearchCV(estimator=lgbm, param_grid=param_grid_lgbm, cv=4, scoring='neg_root_mean_squared_error', verbose=1)

In [60]:
%%time
# Fit the model
grid_search_lgbm.fit(features_train, target_train)

# Get best parameters and best model
best_params_lgbm = grid_search_lgbm.best_params_
best_model_lgbm = grid_search_lgbm.best_estimator_

Fitting 4 folds for each of 36 candidates, totalling 144 fits
CPU times: user 4min 5s, sys: 17 s, total: 4min 22s
Wall time: 4min 22s
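
The "36 candidates, totalling 144 fits" line in the output above follows directly from the size of the grid and the number of folds. A minimal sketch that reproduces the count with the standard library (the variable names `candidates` and `cv_folds` are illustrative; the count excludes the final refit of the best model on the full training set, which `GridSearchCV` performs by default):

```python
from itertools import product

# Same grid as in the notebook
param_grid_lgbm = {
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [10, 20, 30],
    'max_depth': [1, 2, 5, 10],
}
cv_folds = 4

# Every combination of one value per hyperparameter is one candidate
candidates = [
    dict(zip(param_grid_lgbm, values))
    for values in product(*param_grid_lgbm.values())
]

# Each candidate is trained once per cross-validation fold
total_fits = len(candidates) * cv_folds
print(len(candidates), "candidates,", total_fits, "fits")
```

Adding one more value to any list multiplies the number of candidates, which is why the search time grows quickly with the grid size.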


In [65]:
display(best_params_lgbm)

{'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 30}

In [66]:
%%time
lgbm = LGBMRegressor(random_state=7, 
                    learning_rate = best_params_lgbm['learning_rate'],
                    n_estimators = best_params_lgbm['n_estimators'],
                    max_depth = best_params_lgbm['max_depth']
                    )

lgbm.fit(features_train,target_train)

CPU times: user 2.1 s, sys: 91.8 ms, total: 2.19 s
Wall time: 2.14 s


LGBMRegressor(max_depth=10, n_estimators=30, random_state=7)

In [67]:
%%time
# Predict using the best model (best_estimator_, which GridSearchCV refit on the full training set)
predictions_valid_lgbm = best_model_lgbm.predict(features_valid)

CPU times: user 204 ms, sys: 32 ms, total: 236 ms
Wall time: 226 ms


In [68]:
# Calculate Root Mean Squared Error
mse_lgbm = mean_squared_error(target_valid, predictions_valid_lgbm)
rmse_lgbm = mse_lgbm ** 0.5

print("Root Mean Squared Error:", rmse_lgbm)
print("Best Parameters:", best_params_lgbm)

Root Mean Squared Error: 2028.2235056876789
Best Parameters: {'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 30}


Running the LGBMRegressor with the best parameters (max_depth: 10, n_estimators: 30, learning_rate: 0.1) gives an RMSE of 2028.2235056876789 on the validation set, meaning the typical prediction error is about 2028 in price units. Fitting the model takes 2.19s and making predictions takes 236ms. This model is fast: it trains slightly faster than CatBoost (2.19s versus 2.33s) but takes longer to predict (236ms versus 13.6ms).
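
The `%%time` magic reports both CPU and wall time, and the figures quoted above are CPU totals. If the timings need to be stored and compared programmatically, one option is a small wrapper around `time.perf_counter`. This is a sketch only: the helper name `timed` is hypothetical, it measures wall time rather than CPU time, and the workload here is a stand-in for a call such as `lgbm.fit` or `cbr.predict`.

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Stand-in workload; in the notebook this could wrap a fit or predict call
result, seconds = timed(sum, range(1_000_000))
print(f"result={result}, wall time={seconds:.4f}s")
```

Measuring every model with the same helper keeps training and prediction times directly comparable.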

### Test Set

In [69]:
predictions_test = rf.predict(features_test)

# Evaluate model performance
mse_test = mean_squared_error(target_test, predictions_test)
rmse_test = mse_test ** 0.5

print("Root Mean Squared Error on Test Set:", rmse_test)

Root Mean Squared Error on Test Set: 2039.588475157083


Based on validation RMSE, the RandomForestRegressor is the best model, so I chose it for the test set. It gives a test RMSE of 2039.588475157083, meaning the typical prediction error is about 2040 in price units. The test RMSE is similar to the validation RMSE, which suggests no major overfitting and that the model generalizes well to unseen data.
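
The "similar RMSE" statement can be quantified as a relative gap between the two scores. A minimal sketch, assuming the rounded Random Forest validation RMSE of 2018 quoted in the model analysis; the helper name `relative_gap` is hypothetical:

```python
def relative_gap(validation_rmse, test_rmse):
    """Relative increase of the test RMSE over the validation RMSE."""
    return (test_rmse - validation_rmse) / validation_rmse

validation_rmse_rf = 2018              # rounded value quoted in the model analysis
test_rmse_rf = 2039.588475157083       # value printed by the test-set cell

gap = relative_gap(validation_rmse_rf, test_rmse_rf)
print(f"Test RMSE is {gap:.2%} higher than validation RMSE")
```

A gap of roughly one percent is small relative to the error itself, which supports the conclusion that the model is not overfitting badly.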

## Model analysis

The goal of the project was to build a model that determines the market value (price) of a car. In addition, Rusty Bargain wants a model with good prediction quality, fast prediction speed, and short training time.

In this project, after cleaning the dataset, multiple regression models were trained and evaluated to predict the target variable (price) accurately. To obtain good prediction quality, the goal is to minimize the RMSE. A Linear Regression model was used as a sanity check (baseline), and progressively more complex models (Random Forest, CatBoost, and LightGBM) were trained to improve performance.

The Linear Regression model is the fastest but has poor predictive performance, confirming the need for more complex models. The Random Forest Regressor improves the RMSE considerably, but it was still necessary to train other complex models for comparison, so CatBoostRegressor and LGBMRegressor were also trained.

After training all models, the Random Forest Regressor has the lowest validation RMSE, 2018, with a training time of 1min 17s. CatBoost reaches an RMSE of 2034 with a training time of 2.33s, and LightGBM reaches 2028 with a training time of 2.19s. Both accuracy and training speed matter when choosing the best model: the boosted models train far faster, while the difference in RMSE between the models is small. Based on RMSE, the Random Forest Regressor is the best model and was therefore used to determine the RMSE for the test set. The test RMSE is 2039.588475157083, meaning the typical prediction error is about 2040 in price units. Because the test RMSE is similar to the validation RMSE, there is no sign of major overfitting, and the model generalizes well to unseen data.
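
The trade-off between accuracy and training time can be laid out side by side. The sketch below uses the validation RMSE and training times reported in the cell outputs and in this analysis (1min 17s is written as 77 seconds, and the Random Forest RMSE is the rounded value 2018); the structure and variable names are illustrative only.

```python
# Figures as reported in the notebook; Random Forest RMSE is the rounded value
models = [
    {"name": "Random Forest", "rmse": 2018.0, "train_seconds": 77.0},
    {"name": "CatBoost", "rmse": 2034.4336189856476, "train_seconds": 2.33},
    {"name": "LightGBM", "rmse": 2028.2235056876789, "train_seconds": 2.19},
]

best_by_rmse = min(models, key=lambda m: m["rmse"])
fastest_to_train = min(models, key=lambda m: m["train_seconds"])

# Show how much accuracy each model gives up relative to the most accurate one,
# and how much longer it trains than the fastest one
for m in sorted(models, key=lambda m: m["rmse"]):
    extra_rmse = m["rmse"] - best_by_rmse["rmse"]
    slowdown = m["train_seconds"] / fastest_to_train["train_seconds"]
    print(f"{m['name']:<14} RMSE +{extra_rmse:6.1f}   training x{slowdown:5.1f}")
```

Read this way, the choice depends on which requirement the company weights more: the lowest RMSE, or a much shorter training time at a slightly higher RMSE.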

# Checklist

Type 'x' to check. Then press Shift+Enter.

- [x]  Jupyter Notebook is open
- [x]  Code is error free
- [x]  The cells with the code have been arranged in order of execution
- [x]  The data has been downloaded and prepared
- [x]  The models have been trained
- [x]  The analysis of speed and quality of the models has been performed