### Used Car Price Prediction

### About Dataset
The used car market in India is a dynamic and ever-changing landscape. Prices can fluctuate wildly based on a variety of factors including the make and model of the car, its mileage, its condition and the current market conditions. As a result, it can be difficult for sellers to accurately price their cars.

This dataset contains information about used cars.

#### Data Description (Feature Information)

car_name: Car's Full name, which includes brand and specific model name.

brand: Brand Name of the particular car.

model: Exact model name of the car of a particular brand.
    
seller_type: Type of seller listing the used car.

fuel_type: Fuel used by the car put up for sale.

transmission_type: Transmission used by the car put up for sale.

vehicle_age: Number of years since the car was bought.

mileage: Number of kilometres the car runs per litre of fuel.

engine: Engine capacity in cc (cubic centimetres).

max_power: Maximum power the engine produces, in BHP.

seats: Total number of seats in the car.

selling_price: The sale price listed on the website.

The dataset consists of 14 columns and 15411 rows.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline
import plotly.express as px

import warnings

# Suppress all warnings
warnings.filterwarnings("ignore")

In [2]:
df = pd.read_csv("cardekho_dataset.csv")

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,car_name,brand,model,vehicle_age,km_driven,seller_type,fuel_type,transmission_type,mileage,engine,max_power,seats,selling_price
0,0,Maruti Alto,Maruti,Alto,9,120000,Individual,Petrol,Manual,19.7,796,46.3,5,120000
1,1,Hyundai Grand,Hyundai,Grand,5,20000,Individual,Petrol,Manual,18.9,1197,82.0,5,550000
2,2,Hyundai i20,Hyundai,i20,11,60000,Individual,Petrol,Manual,17.0,1197,80.0,5,215000
3,3,Maruti Alto,Maruti,Alto,9,37000,Individual,Petrol,Manual,20.92,998,67.1,5,226000
4,4,Ford Ecosport,Ford,Ecosport,6,30000,Dealer,Diesel,Manual,22.77,1498,98.59,5,570000


### Data Cleaning

In [4]:
# Get the number of columns and rows
print(f"Number of column :{df.shape[1]}\nNumber of rows :{df.shape[0]}")

Number of column :14
Number of rows :15411


In [5]:
### Different type of Column count in the dataset
numerical_feature = [feature for feature in df.columns if df[feature].dtypes != 'O']
discrete_feature=[feature for feature in numerical_feature if len(df[feature].unique())<25]
continuous_feature = [feature for feature in numerical_feature if feature not in discrete_feature]
categorical_feature = [feature for feature in df.columns if feature not in numerical_feature]
print("Numerical Features Count :: {}".format(len(numerical_feature)))
print("Discrete (Numerical) feature Count :: {}".format(len(discrete_feature)))
print("Continuous (Numerical) feature Count ::  {}".format(len(continuous_feature)))
print("Categorical feature Count :: {}".format(len(categorical_feature)))

Numerical Features Count :: 8
Discrete (Numerical) feature Count :: 2
Continuous (Numerical) feature Count ::  6
Categorical feature Count :: 6


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15411 entries, 0 to 15410
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         15411 non-null  int64  
 1   car_name           15411 non-null  object 
 2   brand              15411 non-null  object 
 3   model              15411 non-null  object 
 4   vehicle_age        15411 non-null  int64  
 5   km_driven          15411 non-null  int64  
 6   seller_type        15411 non-null  object 
 7   fuel_type          15411 non-null  object 
 8   transmission_type  15411 non-null  object 
 9   mileage            15411 non-null  float64
 10  engine             15411 non-null  int64  
 11  max_power          15411 non-null  float64
 12  seats              15411 non-null  int64  
 13  selling_price      15411 non-null  int64  
dtypes: float64(2), int64(6), object(6)
memory usage: 1.6+ MB


In [7]:
### Removing unnecessary columns from the DataFrame
df.drop(columns = ['Unnamed: 0','car_name','brand'],inplace = True)

In [8]:
df.columns

Index(['model', 'vehicle_age', 'km_driven', 'seller_type', 'fuel_type',
       'transmission_type', 'mileage', 'engine', 'max_power', 'seats',
       'selling_price'],
      dtype='object')

In [9]:
df.head()

Unnamed: 0,model,vehicle_age,km_driven,seller_type,fuel_type,transmission_type,mileage,engine,max_power,seats,selling_price
0,Alto,9,120000,Individual,Petrol,Manual,19.7,796,46.3,5,120000
1,Grand,5,20000,Individual,Petrol,Manual,18.9,1197,82.0,5,550000
2,i20,11,60000,Individual,Petrol,Manual,17.0,1197,80.0,5,215000
3,Alto,9,37000,Individual,Petrol,Manual,20.92,998,67.1,5,226000
4,Ecosport,6,30000,Dealer,Diesel,Manual,22.77,1498,98.59,5,570000


1. Handling Missing values

2. Handling Duplicates

3. Check datatype

4. Understand the datatypes

In [10]:
pd.DataFrame({'count': df.shape[0], 
              'nulls': df.isnull().sum(), 
              'nulls%': df.isnull().mean() * 100, 
              'cardinality': df.nunique(),
              'duplicated':df.duplicated().sum(),
              'datatype':df.dtypes
             })


Unnamed: 0,count,nulls,nulls%,cardinality,duplicated,datatype
model,15411,0,0.0,120,167,object
vehicle_age,15411,0,0.0,24,167,int64
km_driven,15411,0,0.0,3688,167,int64
seller_type,15411,0,0.0,3,167,object
fuel_type,15411,0,0.0,5,167,object
transmission_type,15411,0,0.0,2,167,object
mileage,15411,0,0.0,411,167,float64
engine,15411,0,0.0,110,167,int64
max_power,15411,0,0.0,342,167,float64
seats,15411,0,0.0,8,167,int64


In [11]:
# duplicate value checking
df.duplicated().sum()

167

In [12]:
#df.drop_duplicates(inplace=True)

In [13]:
df.duplicated().sum()

167
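
The count stays at 167 because the drop_duplicates call in cell 12 is commented out. Whether the repeated rows are accidental copies or genuinely separate listings is not stated, so removing them is a judgment call. A minimal sketch of what the commented-out step would do, on a hypothetical toy frame (the values and the names toy, n_dupes and deduped are illustrative, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy frame: the third row is an exact copy of the first.
toy = pd.DataFrame({
    "model": ["Alto", "Grand", "Alto"],
    "km_driven": [120000, 20000, 120000],
    "selling_price": [120000, 550000, 120000],
})

# duplicated() flags every row that is identical to an earlier row.
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(deduped))
```

If duplicates are dropped before the feature/target split, X and y stay aligned because both are derived from the same frame.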

In [14]:
### Checking Unique Value in Each Column
for col in df.columns:
    unique_values = df[col].unique()
    print(f"Column: {col}")
    print(f"Unique Values: {unique_values}")
    print(f"Number of Unique Values: {len(unique_values)}\n")

Column: model
Unique Values: ['Alto' 'Grand' 'i20' 'Ecosport' 'Wagon R' 'i10' 'Venue' 'Swift' 'Verna'
 'Duster' 'Cooper' 'Ciaz' 'C-Class' 'Innova' 'Baleno' 'Swift Dzire'
 'Vento' 'Creta' 'City' 'Bolero' 'Fortuner' 'KWID' 'Amaze' 'Santro'
 'XUV500' 'KUV100' 'Ignis' 'RediGO' 'Scorpio' 'Marazzo' 'Aspire' 'Figo'
 'Vitara' 'Tiago' 'Polo' 'Seltos' 'Celerio' 'GO' '5' 'CR-V' 'Endeavour'
 'KUV' 'Jazz' '3' 'A4' 'Tigor' 'Ertiga' 'Safari' 'Thar' 'Hexa' 'Rover'
 'Eeco' 'A6' 'E-Class' 'Q7' 'Z4' '6' 'XF' 'X5' 'Hector' 'Civic' 'D-Max'
 'Cayenne' 'X1' 'Rapid' 'Freestyle' 'Superb' 'Nexon' 'XUV300' 'Dzire VXI'
 'S90' 'WR-V' 'XL6' 'Triber' 'ES' 'Wrangler' 'Camry' 'Elantra' 'Yaris'
 'GL-Class' '7' 'S-Presso' 'Dzire LXI' 'Aura' 'XC' 'Ghibli' 'Continental'
 'CR' 'Kicks' 'S-Class' 'Tucson' 'Harrier' 'X3' 'Octavia' 'Compass' 'CLS'
 'redi-GO' 'Glanza' 'Macan' 'X4' 'Dzire ZXI' 'XC90' 'F-PACE' 'A8' 'MUX'
 'GTC4Lusso' 'GLS' 'X-Trail' 'XE' 'XC60' 'Panamera' 'Alturas' 'Altroz'
 'NX' 'Carnival' 'C' 'RX' 'Ghost' 'Quat

In [15]:
## Independent and Dependent features
X = df.drop(['selling_price'],axis=1)
y = df['selling_price']

In [16]:
X.head()

Unnamed: 0,model,vehicle_age,km_driven,seller_type,fuel_type,transmission_type,mileage,engine,max_power,seats
0,Alto,9,120000,Individual,Petrol,Manual,19.7,796,46.3,5
1,Grand,5,20000,Individual,Petrol,Manual,18.9,1197,82.0,5
2,i20,11,60000,Individual,Petrol,Manual,17.0,1197,80.0,5
3,Alto,9,37000,Individual,Petrol,Manual,20.92,998,67.1,5
4,Ecosport,6,30000,Dealer,Diesel,Manual,22.77,1498,98.59,5


In [17]:
y.head()

0    120000
1    550000
2    215000
3    226000
4    570000
Name: selling_price, dtype: int64

### Feature Encoding and Scaling

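#### One-Hot Encoding for columns that have few unique values and are not ordinal

One-hot encoding converts each categorical variable into a set of binary indicator columns, one per category, so that ML algorithms can use the categories without assuming any order among them. Here it is applied to seller_type, fuel_type and transmission_type. The model column, which has 120 unique values, is label-encoded instead.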

In [18]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X['model'] = le.fit_transform(X['model'])

In [19]:
X.head()

Unnamed: 0,model,vehicle_age,km_driven,seller_type,fuel_type,transmission_type,mileage,engine,max_power,seats
0,7,9,120000,Individual,Petrol,Manual,19.7,796,46.3,5
1,54,5,20000,Individual,Petrol,Manual,18.9,1197,82.0,5
2,118,11,60000,Individual,Petrol,Manual,17.0,1197,80.0,5
3,7,9,37000,Individual,Petrol,Manual,20.92,998,67.1,5
4,38,6,30000,Dealer,Diesel,Manual,22.77,1498,98.59,5
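
The integer codes above are positions in the sorted list of model names, not a ranking of the cars, which is why i20 receives a high code (lowercase letters sort after uppercase ones). A small sketch on the five model names from the head of the table shows the mapping; le_demo, codes and classes are illustrative names.

```python
from sklearn.preprocessing import LabelEncoder

# The five model names shown in the first rows of the dataset.
names = ["Alto", "Grand", "i20", "Alto", "Ecosport"]

le_demo = LabelEncoder()
codes = le_demo.fit_transform(names).tolist()

# classes_ holds the sorted unique labels; a code is an index into it.
classes = le_demo.classes_.tolist()

print(classes)
print(codes)
```

Tree-based models can split on such codes freely, while linear and distance-based models will read them as an ordered numeric scale, which may partly explain the gap between the two model families later in the notebook.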


In [20]:
len(df['seller_type'].unique()),len(df['fuel_type'].unique()),len(df['transmission_type'].unique())

(3, 5, 2)

In [21]:
## Create Column Transformer with 2 types of Transformer
num_features = X.select_dtypes(exclude = 'object').columns
onehot_columns = ['seller_type','fuel_type','transmission_type']

from sklearn.preprocessing import OneHotEncoder,StandardScaler
from sklearn.compose import ColumnTransformer

numeric_transformer = StandardScaler()
oh_transformer = OneHotEncoder(drop = 'first')

preprocessor = ColumnTransformer(
    [
        ("OneHotEncoder",oh_transformer,onehot_columns),
        ("StandardScaler",numeric_transformer,num_features)
    ],remainder = 'passthrough'
)

In [22]:
X = preprocessor.fit_transform(X)

In [23]:
pd.DataFrame(X)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13
0,1.0,0.0,0.0,0.0,0.0,1.0,1.0,-1.519714,0.983562,1.247335,-0.000276,-1.324259,-1.263352,-0.403022
1,1.0,0.0,0.0,0.0,0.0,1.0,1.0,-0.225693,-0.343933,-0.690016,-0.192071,-0.554718,-0.432571,-0.403022
2,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.536377,1.647309,0.084924,-0.647583,-0.554718,-0.479113,-0.403022
3,1.0,0.0,0.0,0.0,0.0,1.0,1.0,-1.519714,0.983562,-0.360667,0.292211,-0.936610,-0.779312,-0.403022
4,0.0,0.0,1.0,0.0,0.0,0.0,1.0,-0.666211,-0.012060,-0.496281,0.735736,0.022918,-0.046502,-0.403022
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15406,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.508844,0.983562,-0.869744,0.026096,-0.767733,-0.757204,-0.403022
15407,0.0,0.0,0.0,0.0,0.0,1.0,1.0,-0.556082,-1.339555,-0.728763,-0.527711,-0.216964,-0.220803,2.073444
15408,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.407551,-0.012060,0.220539,0.344954,0.022918,0.068225,-0.403022
15409,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.426247,-0.343933,72.541850,-0.887326,1.329794,0.917158,2.073444
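
In cell 22 the preprocessor is fitted on the full feature matrix before the split in cell 24, so the scaler's means and variances also reflect rows that later land in the test set. A minimal sketch of the alternative order, split first and fit only on the training part, using a hypothetical toy frame (toy_X, toy_y and prep are illustrative names, not from the notebook):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one categorical and one numeric feature.
toy_X = pd.DataFrame({
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol", "Diesel"],
    "km_driven": [10000, 20000, 30000, 40000, 50000, 60000],
})
toy_y = pd.Series([5.0, 4.0, 3.5, 3.0, 2.5, 2.0])

# Split first, so the test rows never influence the fitted statistics.
X_tr, X_te, y_tr, y_te = train_test_split(
    toy_X, toy_y, test_size=0.33, random_state=42
)

prep = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(drop="first"), ["fuel_type"]),
        ("scale", StandardScaler(), ["km_driven"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_tr_t = prep.fit_transform(X_tr)  # learn categories, mean and variance on train only
X_te_t = prep.transform(X_te)      # reuse the training statistics on the test rows

print(X_tr_t.shape, X_te_t.shape)
```

With 15411 rows the effect on the reported scores is probably small, but fitting on the training split keeps the test metrics an honest estimate.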


In [24]:
## Separate dataset into train and test
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.33,random_state = 42)
X_train.shape,X_test.shape,y_train.shape,y_test.shape

((10325, 14), (5086, 14), (10325,), (5086,))

In [25]:
X_train

array([[ 0.        ,  0.        ,  0.        , ..., -0.9366097 ,
        -0.78070786, -0.40302241],
       [ 0.        ,  0.        ,  0.        , ..., -0.5566368 ,
        -0.3208691 , -0.40302241],
       [ 0.        ,  0.        ,  0.        , ..., -0.9366097 ,
        -0.78047514, -0.40302241],
       ...,
       [ 1.        ,  0.        ,  0.        , ..., -0.9366097 ,
        -0.78070786, -0.40302241],
       [ 0.        ,  0.        ,  0.        , ..., -0.55471774,
        -0.43582879, -0.40302241],
       [ 1.        ,  0.        ,  0.        , ..., -0.04616815,
         0.06194201, -0.40302241]])

In [26]:
y_train

12074    450000
14284    430000
7700     295000
13733    310000
12196    800000
          ...  
5191     665000
13418    249000
5390     250000
860      620000
7270     960000
Name: selling_price, Length: 10325, dtype: int64

### Model Training and Model Selection

In [40]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression,Ridge,Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error

In [28]:
### Create a function to evaluate a model
def evaluate_model(true,predicted):
    """Return MAE, MSE, RMSE and R2 for the given true and predicted values."""
    mae = mean_absolute_error(true,predicted)
    mse = mean_squared_error(true,predicted)
    rmse = np.sqrt(mse)
    r2_square = r2_score(true,predicted)
    return mae,mse,rmse,r2_square
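
To make the four metrics concrete, the sketch below recomputes them by hand on four made-up values, without scikit-learn. The numbers in y_true and y_pred are hypothetical and chosen so the arithmetic is easy to follow.

```python
import math

# Hypothetical true and predicted values.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 320.0, 380.0]
n = len(y_true)

# Residuals: -10, 10, -20, 20
residuals = [t - p for t, p in zip(y_true, y_pred)]

mae = sum(abs(r) for r in residuals) / n   # (10 + 10 + 20 + 20) / 4 = 15
mse = sum(r * r for r in residuals) / n    # (100 + 100 + 400 + 400) / 4 = 250
rmse = math.sqrt(mse)                      # square root of the MSE

# R2 compares the residual sum of squares with the spread of y_true around its mean.
mean_true = sum(y_true) / n
ss_res = sum(r * r for r in residuals)              # 1000
ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # 50000
r2 = 1 - ss_res / ss_tot                            # 0.98

print(mae, mse, rmse, r2)
```

MAE and RMSE are in the same unit as selling_price, which makes them easier to interpret than the MSE values printed below.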

In [41]:
## Let's start Model Training
models = {
    "LinearRegression":LinearRegression(),
    "Ridge":Ridge(),
    "Lasso":Lasso(),
    "KNeighborsRegressor":KNeighborsRegressor(),
    "DecisionTreeRegressor":DecisionTreeRegressor(),
    "RandomForestRegressor":RandomForestRegressor(),
    "AdaBoostRegressor":AdaBoostRegressor(),
    "GradientBoostingRegressor":GradientBoostingRegressor(),
    "XGBoost Regressor":XGBRegressor()
}

for name, model in models.items():
    model.fit(X_train,y_train)
    ## Make Prediction
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    ## Evaluate Train and Test dataset
    model_train_mae,model_train_mse,model_train_rmse,model_train_r2_square = evaluate_model(y_train,y_train_pred)
    model_test_mae,model_test_mse,model_test_rmse,model_test_r2_square = evaluate_model(y_test,y_test_pred)

    print(name)
    print("*******************************************************************")

    print("Model Performance for Training Set ::")
    print("-mean_absolute_error :{:.4f}".format(model_train_mae))
    print("-mean_squared_error :{:.4f}".format(model_train_mse))
    print("-Root_mean_squared_error :{:.4f}".format(model_train_rmse))
    print("-r2_score :{:.4f}".format(model_train_r2_square))

    print("---------------------------------------------------------------------")
    
    print("Model Performance for Test Set ::")
    print("-mean_absolute_error :{:.4f}".format(model_test_mae))
    print("-mean_squared_error :{:.4f}".format(model_test_mse))
    print("-Root_mean_squared_error :{:.4f}".format(model_test_rmse))
    print("-r2_score :{:.4f}".format(model_test_r2_square))
    
    print("="*75)


LinearRegression
*******************************************************************
Model Performance for Training Set ::
-mean_absolute_error :269089.2604
-mean_squared_error :317614523214.2133
-Root_mean_squared_error :563572.9972
-r2_score :0.6170
---------------------------------------------------------------------
Model Performance for Test Set ::
-mean_absolute_error :281003.4130
-mean_squared_error :253401740454.3791
-Root_mean_squared_error :503390.2467
-r2_score :0.6569
Ridge
*******************************************************************
Model Performance for Training Set ::
-mean_absolute_error :269042.5654
-mean_squared_error :317615483996.8785
-Root_mean_squared_error :563573.8496
-r2_score :0.6170
---------------------------------------------------------------------
Model Performance for Test Set ::
-mean_absolute_error :280947.9075
-mean_squared_error :253383293702.4095
-Root_mean_squared_error :503371.9238
-r2_score :0.6570
Lasso
***********************************

In [42]:
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Create separate DataFrames to store results for training and testing datasets
train_results_df = pd.DataFrame(columns=['Model', 'R2','MAE', 'MSE', 'RMSE'])
test_results_df = pd.DataFrame(columns=['Model','R2', 'MAE', 'MSE', 'RMSE'])

# Define your models
models = {
    'Linear Regression': LinearRegression(),
    'Lasso': Lasso(),
    'Ridge': Ridge(),
    'KNeighbors': KNeighborsRegressor(),
    'Decision Tree Regressor': DecisionTreeRegressor(),
    'Random Forest Regressor': RandomForestRegressor(),
    'Adaboost Regressor': AdaBoostRegressor(),
    'GradientBoostingRegressor':GradientBoostingRegressor(),
    'XGBoost Regressor':XGBRegressor()
}

# Train and evaluate each model
for name, model in models.items():
    model.fit(X_train, y_train)
    
    # Training predictions and metrics
    # Predict before scoring; otherwise R2 is computed from the previous model's predictions
    y_train_pred = model.predict(X_train)
    train_r2 = r2_score(y_train, y_train_pred)
    train_mae = mean_absolute_error(y_train, y_train_pred)
    train_mse = mean_squared_error(y_train, y_train_pred)
    train_rmse = train_mse ** 0.5


    # Append training results to DataFrame
    train_results_df = pd.concat([train_results_df, pd.DataFrame([{
        'Model': name,
        'R2': train_r2,
        'MAE': train_mae, 
        'MSE': train_mse, 
        'RMSE': train_rmse

    }])], ignore_index=True)

    # Testing predictions and metrics
    y_test_pred = model.predict(X_test)
    test_r2 = r2_score(y_test, y_test_pred)
    test_mae = mean_absolute_error(y_test, y_test_pred)
    test_mse = mean_squared_error(y_test, y_test_pred)
    test_rmse = test_mse ** 0.5


    # Append testing results to DataFrame
    test_results_df = pd.concat([test_results_df, pd.DataFrame([{
        'Model': name, 
        'R2': test_r2,
        'MAE': test_mae, 
        'MSE': test_mse, 
        'RMSE': test_rmse

    }])], ignore_index=True)

# Display the results
print("Training Dataset Results:")
print(train_results_df)

print("\nTesting Dataset Results:")
print(test_results_df)

Training Dataset Results:
                       Model        R2            MAE           MSE  \
0          Linear Regression  0.616983  269089.260434  3.176145e+11   
1                      Lasso  0.616983  269088.212872  3.176145e+11   
2                      Ridge  0.616982  269042.565409  3.176155e+11   
3                 KNeighbors  0.856744   92998.314770  1.187940e+11   
4    Decision Tree Regressor  0.999525    4500.422922  3.935439e+08   
5    Random Forest Regressor  0.970897   40255.640119  2.413308e+10   
6         Adaboost Regressor  0.761768  345883.536407  1.975527e+11   
7  GradientBoostingRegressor  0.953276  110412.898159  3.874568e+10   
8          XGBoost Regressor  0.991866   58826.075594  6.745112e+09   

            RMSE  
0  563572.997237  
1  563573.007082  
2  563573.849639  
3  344665.070390  
4   19837.941041  
5  155348.238487  
6  444469.037335  
7  196839.217360  
8   82128.629530  

Testing Dataset Results:
                       Model        R2         

#### Hyperparameter Tuning of Multiple Regressor Models with RandomizedSearchCV

In [43]:
## Hyperparameter grids for tuning
knn_params ={"n_neighbors":[2,3,5,10,30,40,50]}
rf_params = {
    "max_depth":[5,8,15,None,10],   # None (not the string "None") means unlimited depth
    "max_features":[5,7,1.0,8],     # 1.0 uses all features; "auto" is no longer accepted by recent scikit-learn releases
    "min_samples_split":[2,8,15,20],
    "n_estimators":[100,200,500,1000]
}
adaboost_params ={
    "n_estimators":[50,60,70,80,100],
    "loss":["linear","square","exponential"]
}

gradient_params ={
    'loss':['squared_error','huber','absolute_error'],
    'criterion':['friedman_mse','squared_error'],   # 'mse' was an old alias of 'squared_error'
    'min_samples_split':[2,8,10,15],
    'n_estimators':[100,200,500,1000],
    'max_depth':[5,8,15,10,20],
    'learning_rate':[0.1,0.01,0.001]
}

xgboost_params = {
    'learning_rate':[0.1,0.01],
    'max_depth':[5,8,12,20,30],
    'n_estimators':[100,200,300,400,500],
    'colsample_bytree':[0.5,1,0.3,0.4]
}


In [32]:
knn_params

{'n_neighbors': [2, 3, 5, 10, 30, 40, 50]}

In [33]:
rf_params

{'max_depth': [5, 8, 15, None, 10],
 'max_features': [5, 7, 1.0, 8],
 'min_samples_split': [2, 8, 15, 20],
 'n_estimators': [100, 200, 500, 1000]}

In [34]:
adaboost_params

{'n_estimators': [50, 60, 70, 80, 100],
 'loss': ['linear', 'square', 'exponential']}

In [35]:
gradient_params

{'loss': ['squared_error', 'huber', 'absolute_error'],
 'criterion': ['friedman_mse', 'squared_error'],
 'min_samples_split': [2, 8, 10, 15],
 'n_estimators': [100, 200, 500, 1000],
 'max_depth': [5, 8, 15, 10, 20],
 'learning_rate': [0.1, 0.01, 0.001]}

In [44]:
xgboost_params

{'learning_rate': [0.1, 0.01],
 'max_depth': [5, 8, 12, 20, 30],
 'n_estimators': [100, 200, 300, 400, 500],
 'colsample_bytree': [0.5, 1, 0.3, 0.4]}

In [46]:
## Model List for Hyperparameter Tuning
randomcv_models = [
    #("KNN",KNeighborsRegressor(),knn_params),
    ("RF",RandomForestRegressor(),rf_params),
    #("AB",AdaBoostRegressor(),adaboost_params),
    #("GB",GradientBoostingRegressor(),gradient_params)
    ('XGB',XGBRegressor(),xgboost_params)
]

In [47]:
randomcv_models

[('RF',
  RandomForestRegressor(),
  {'max_depth': [5, 8, 15, None, 10],
   'max_features': [5, 7, 1.0, 8],
   'min_samples_split': [2, 8, 15, 20],
   'n_estimators': [100, 200, 500, 1000]}),
 ('XGB',
  XGBRegressor(base_score=None, booster=None, callbacks=None,
               colsample_bylevel=None, colsample_bynode=None,
               colsample_bytree=None, device=None, early_stopping_rounds=None,
               enable_categorical=False, eval_metric=None, feature_types=None,
               gamma=None, grow_policy=None, importance_type=None,
               interaction_constraints=None, learning_rate=None, max_bin=None,
               max_cat_threshold=None, max_cat_to_onehot=None,
               max_delta_step=None, max_depth=None, max_leaves=None,
               min_child_weight=None, missing=nan, monotone_constraints=None,
               multi_strategy=None, n_estimators=None, n_jobs=None,
               num_parallel_tree=None, random_state=None, ...),
  {'learning_rate': [0.1

In [48]:
from sklearn.model_selection import RandomizedSearchCV
model_param = {}
for name,model,params in randomcv_models:
    random = RandomizedSearchCV(
        estimator = model,
        param_distributions = params,
        n_iter = 10,
        cv = 3,
        verbose = 2,
        n_jobs = -1
    )
    random.fit(X_train,y_train)
    model_param[name] = random.best_params_

for model_name in model_param:
    print(f"-----------Best Params for {model_name}---------------")
    print(model_param[model_name])


Fitting 3 folds for each of 10 candidates, totalling 30 fits
Fitting 3 folds for each of 10 candidates, totalling 30 fits
-----------Best Params for RF---------------
{'n_estimators': 200, 'min_samples_split': 2, 'max_features': 8, 'max_depth': 15}
-----------Best Params for XGB---------------
{'n_estimators': 400, 'max_depth': 8, 'learning_rate': 0.1, 'colsample_bytree': 0.5}
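
The search above is run without a fixed random_state, so the sampled candidates, and therefore the best parameters, can differ between runs. A minimal sketch of a reproducible search on synthetic data follows; X_demo, y_demo, demo_params and run_search are hypothetical names, and the tiny grid is chosen only to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data (not the car dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo[:, 0] * 2.0 + rng.normal(scale=0.1, size=60)

# Deliberately small search space for illustration.
demo_params = {"n_estimators": [5, 10, 20], "max_depth": [2, 4, None]}

def run_search(seed):
    """Run one randomized search with both the estimator and the sampler seeded."""
    search = RandomizedSearchCV(
        estimator=RandomForestRegressor(random_state=seed),
        param_distributions=demo_params,
        n_iter=4,
        cv=3,
        random_state=seed,  # fixes which candidates are sampled
    )
    search.fit(X_demo, y_demo)
    return search.best_params_

first = run_search(42)
second = run_search(42)
print(first == second)
```

Seeding both the estimator and the search is what makes the two runs agree; seeding only one of them would still leave a source of randomness.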


In [50]:
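# Note: the Random Forest settings below use n_estimators = 500 and min_samples_split = 8,
# whereas the search output above reports n_estimators = 200 and min_samples_split = 2.
# RandomizedSearchCV was run without a fixed random_state, so its best parameters can change between runs.
models = {
    "Random Forest Regressor":RandomForestRegressor(n_estimators = 500,
                                          min_samples_split =8, 
                                          max_features = 8, 
                                          max_depth = 15),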
    #"K-Neighbors Regressor":KNeighborsRegressor(n_neighbors =5),
    #"AdaBoost Regressor":AdaBoostRegressor(n_estimators = 50,loss ='linear'),
    #"Gradient Boosting Regressor":GradientBoostingRegressor(n_estimators = 1000, 
                                                            #min_samples_split = 2, 
                                                            #max_depth = 15, 
                                                           # loss ='absolute_error', 
                                                            #learning_rate = 0.01, 
                                                            #criterion = 'squared_error')
    "XGB Regressor":XGBRegressor(n_estimators = 400, 
                                 max_depth = 8, 
                                 learning_rate = 0.1, 
                                 colsample_bytree = 0.5)
}

for name, model in models.items():
    model.fit(X_train,y_train)
    ## Make Prediction
    y_train_pred = model.predict(X_train)
    y_test_pred = model.predict(X_test)
    ## Evaluate Train and Test dataset
    model_train_mae,model_train_mse,model_train_rmse,model_train_r2_square = evaluate_model(y_train,y_train_pred)
    model_test_mae,model_test_mse,model_test_rmse,model_test_r2_square = evaluate_model(y_test,y_test_pred)

    print(name)
    print("***********************************************************************")

    print("Model Performance for Training Set ::")
    print("-mean_absolute_error :{:.4f}".format(model_train_mae))
    print("-mean_squared_error :{:.4f}".format(model_train_mse))
    print("-Root_mean_squared_error :{:.4f}".format(model_train_rmse))
    print("-r2_score :{:.4f}".format(model_train_r2_square))

    print("---------------------------------------------------------------------")
    
    print("Model Performance for Test Set ::")
    print("-mean_absolute_error :{:.4f}".format(model_test_mae))
    print("-mean_squared_error :{:.4f}".format(model_test_mse))
    print("-Root_mean_squared_error :{:.4f}".format(model_test_rmse))
    print("-r2_score :{:.4f}".format(model_test_r2_square))
    
    print("="*75)


Random Forest Regressor
***********************************************************************
Model Performance for Training Set ::
-mean_absolute_error :69809.9247
-mean_squared_error :39819273467.8261
-Root_mean_squared_error :199547.6722
-r2_score :0.9520
---------------------------------------------------------------------
Model Performance for Test Set ::
-mean_absolute_error :102706.8230
-mean_squared_error :55636453107.5321
-Root_mean_squared_error :235873.8076
-r2_score :0.9247
XGB Regressor
***********************************************************************
Model Performance for Training Set ::
-mean_absolute_error :43989.7329
-mean_squared_error :3730806518.7059
-Root_mean_squared_error :61080.3284
-r2_score :0.9955
---------------------------------------------------------------------
Model Performance for Test Set ::
-mean_absolute_error :100260.7040
-mean_squared_error :71282235825.6084
-Root_mean_squared_error :266987.3327
-r2_score :0.9035
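
One way to read the two reports above is to compare the gap between training and test R2, which gives a rough indication of overfitting. The sketch below only redoes that subtraction on the scores printed above; the dictionary layout and the names reported, gaps and best_on_test are illustrative.

```python
# R2 scores copied from the output above.
reported = {
    "Random Forest Regressor": {"train_r2": 0.9520, "test_r2": 0.9247},
    "XGB Regressor": {"train_r2": 0.9955, "test_r2": 0.9035},
}

# Train minus test R2: a larger gap suggests more overfitting.
gaps = {
    name: round(scores["train_r2"] - scores["test_r2"], 4)
    for name, scores in reported.items()
}

# Model with the highest test R2.
best_on_test = max(reported, key=lambda name: reported[name]["test_r2"])

print(gaps)
print(best_on_test)
```

On these numbers the Random Forest has both the smaller gap and the higher test R2, although a single train/test split is limited evidence for preferring one model.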
