## Problem Statement

##### "Trips & Travel.Com" wants to establish a viable business model to expand its customer base. One way to do so is to introduce a new package offering. The company currently offers five types of packages: Basic, Standard, Deluxe, Super Deluxe, and King. Last year's data shows that 18% of customers purchased a package. However, the marketing cost was quite high because customers were contacted at random, without using the available information. The company now plans to launch a new product, the Wellness Tourism Package. Wellness tourism is defined as travel that allows the traveler to maintain, enhance, or kick-start a healthy lifestyle and to support or increase their sense of well-being. This time, the company wants to use the available data on existing and potential customers to make its marketing expenditure more efficient.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import warnings

warnings.filterwarnings('ignore')
%matplotlib inline

In [2]:
df = pd.read_csv('Travel.csv')

In [3]:
df.head()

Unnamed: 0,CustomerID,ProdTaken,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfPersonVisiting,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,NumberOfChildrenVisiting,Designation,MonthlyIncome
0,200000,1,41.0,Self Enquiry,3,6.0,Salaried,Female,3,3.0,Deluxe,3.0,Single,1.0,1,2,1,0.0,Manager,20993.0
1,200001,0,49.0,Company Invited,1,14.0,Salaried,Male,3,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,2.0,Manager,20130.0
2,200002,1,37.0,Self Enquiry,1,8.0,Free Lancer,Male,3,4.0,Basic,3.0,Single,7.0,1,3,0,0.0,Executive,17090.0
3,200003,0,33.0,Company Invited,1,9.0,Salaried,Female,2,3.0,Basic,3.0,Divorced,2.0,1,5,1,1.0,Executive,17909.0
4,200004,0,,Self Enquiry,1,8.0,Small Business,Male,2,3.0,Basic,4.0,Divorced,1.0,0,5,1,0.0,Executive,18468.0


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   CustomerID                4888 non-null   int64  
 1   ProdTaken                 4888 non-null   int64  
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   object 
 4   CityTier                  4888 non-null   int64  
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   object 
 7   Gender                    4888 non-null   object 
 8   NumberOfPersonVisiting    4888 non-null   int64  
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   object 
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   object 
 13  NumberOfTrips             4748 non-null   float64
 14  Passport

In [5]:
df.isnull().sum()


CustomerID                    0
ProdTaken                     0
Age                         226
TypeofContact                25
CityTier                      0
DurationOfPitch             251
Occupation                    0
Gender                        0
NumberOfPersonVisiting        0
NumberOfFollowups            45
ProductPitched                0
PreferredPropertyStar        26
MaritalStatus                 0
NumberOfTrips               140
Passport                      0
PitchSatisfactionScore        0
OwnCar                        0
NumberOfChildrenVisiting     66
Designation                   0
MonthlyIncome               233
dtype: int64

In [6]:
df['Gender'].value_counts()

Gender
Male       2916
Female     1817
Fe Male     155
Name: count, dtype: int64

In [7]:
# 'Fe Male' is a misspelling of 'Female', so merge it into the correct category
df['Gender'] = df['Gender'].replace('Fe Male', 'Female')

In [8]:
df['Gender'].value_counts()

Gender
Male      2916
Female    1972
Name: count, dtype: int64

In [9]:
df['CityTier'].value_counts()

CityTier
1    3190
3    1500
2     198
Name: count, dtype: int64

In [10]:
df['MaritalStatus'].value_counts()

MaritalStatus
Married      2340
Divorced      950
Single        916
Unmarried     682
Name: count, dtype: int64

In [11]:
# 'Unmarried' duplicates 'Single', so merge the two categories
df['MaritalStatus'] = df['MaritalStatus'].replace('Unmarried', 'Single')

In [12]:
df['MaritalStatus'].value_counts()

MaritalStatus
Married     2340
Single      1598
Divorced     950
Name: count, dtype: int64

In [13]:
df['Designation'].value_counts()

Designation
Executive         1842
Manager           1732
Senior Manager     742
AVP                342
VP                 230
Name: count, dtype: int64

In [14]:
df['NumberOfTrips'].value_counts()

NumberOfTrips
2.0     1464
3.0     1079
1.0      620
4.0      478
5.0      458
6.0      322
7.0      218
8.0      105
19.0       1
21.0       1
20.0       1
22.0       1
Name: count, dtype: int64

In [15]:
df['TypeofContact'].value_counts()

TypeofContact
Self Enquiry       3444
Company Invited    1419
Name: count, dtype: int64

In [16]:
df['NumberOfFollowups'].value_counts()

NumberOfFollowups
4.0    2068
3.0    1466
5.0     768
2.0     229
1.0     176
6.0     136
Name: count, dtype: int64

In [17]:
df['ProdTaken'].value_counts()

ProdTaken
0    3968
1     920
Name: count, dtype: int64

In [18]:
df['Passport'].value_counts()

Passport
0    3466
1    1422
Name: count, dtype: int64

In [19]:
df['OwnCar'].value_counts()

OwnCar
1    3032
0    1856
Name: count, dtype: int64

In [20]:
df['ProductPitched'].value_counts()

ProductPitched
Basic           1842
Deluxe          1732
Standard         742
Super Deluxe     342
King             230
Name: count, dtype: int64

In [21]:
df['Occupation'].value_counts()

Occupation
Salaried          2368
Small Business    2084
Large Business     434
Free Lancer          2
Name: count, dtype: int64

In [22]:
df.head()

Unnamed: 0,CustomerID,ProdTaken,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfPersonVisiting,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,NumberOfChildrenVisiting,Designation,MonthlyIncome
0,200000,1,41.0,Self Enquiry,3,6.0,Salaried,Female,3,3.0,Deluxe,3.0,Single,1.0,1,2,1,0.0,Manager,20993.0
1,200001,0,49.0,Company Invited,1,14.0,Salaried,Male,3,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,2.0,Manager,20130.0
2,200002,1,37.0,Self Enquiry,1,8.0,Free Lancer,Male,3,4.0,Basic,3.0,Single,7.0,1,3,0,0.0,Executive,17090.0
3,200003,0,33.0,Company Invited,1,9.0,Salaried,Female,2,3.0,Basic,3.0,Divorced,2.0,1,5,1,1.0,Executive,17909.0
4,200004,0,,Self Enquiry,1,8.0,Small Business,Male,2,3.0,Basic,4.0,Divorced,1.0,0,5,1,0.0,Executive,18468.0


### Checking for missing values

In [23]:
feature_with_nan = [feature for feature in df.columns if df[feature].isnull().sum()>0]

In [24]:
feature_with_nan

['Age',
 'TypeofContact',
 'DurationOfPitch',
 'NumberOfFollowups',
 'PreferredPropertyStar',
 'NumberOfTrips',
 'NumberOfChildrenVisiting',
 'MonthlyIncome']

##### Let's calculate the percentage of missing values in each feature

In [25]:
for feature in feature_with_nan:
    print(feature,np.round(df[feature].isnull().mean()*100,5), '% missing values')

Age 4.62357 % missing values
TypeofContact 0.51146 % missing values
DurationOfPitch 5.13502 % missing values
NumberOfFollowups 0.92062 % missing values
PreferredPropertyStar 0.53191 % missing values
NumberOfTrips 2.86416 % missing values
NumberOfChildrenVisiting 1.35025 % missing values
MonthlyIncome 4.76678 % missing values
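The same percentages can also be computed without an explicit loop. The snippet below is a minimal sketch on a made-up toy frame (the `toy` data and the `missing_pct` name are illustrative assumptions, not part of the original notebook); on the real data, `df` would take the place of `toy`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the travel data (values are made up for illustration).
toy = pd.DataFrame({
    "Age": [41.0, np.nan, 37.0, 33.0],
    "TypeofContact": ["Self Enquiry", None, "Self Enquiry", "Company Invited"],
    "CityTier": [3, 1, 1, 1],
})

# isnull().mean() is the fraction of missing rows per column; scale it to percent.
missing_pct = toy.isnull().mean().mul(100).round(2)

# Keep only the columns that actually contain missing values, largest share first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```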


In [26]:
df[feature_with_nan].select_dtypes(exclude='object').describe()

Unnamed: 0,Age,DurationOfPitch,NumberOfFollowups,PreferredPropertyStar,NumberOfTrips,NumberOfChildrenVisiting,MonthlyIncome
count,4662.0,4637.0,4843.0,4862.0,4748.0,4822.0,4655.0
mean,37.622265,15.490835,3.708445,3.581037,3.236521,1.187267,23619.853491
std,9.316387,8.519643,1.002509,0.798009,1.849019,0.857861,5380.698361
min,18.0,5.0,1.0,3.0,1.0,0.0,1000.0
25%,31.0,9.0,3.0,3.0,2.0,1.0,20346.0
50%,36.0,13.0,4.0,3.0,3.0,1.0,22347.0
75%,44.0,20.0,4.0,4.0,4.0,2.0,25571.0
max,61.0,127.0,6.0,5.0,22.0,3.0,98678.0


### Replacing NaN values

- Age: median
- TypeofContact: mode
- DurationOfPitch: median
- NumberOfFollowups: mode
- NumberOfTrips: median
- PreferredPropertyStar: mode
- NumberOfChildrenVisiting: mode
- MonthlyIncome: median

In [27]:
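# Assign the result back instead of calling fillna(inplace=True) on a column
# accessor, which newer pandas versions warn about and may not apply to df.

# Numeric features: fill with the median (robust to outliers)
df['Age'] = df['Age'].fillna(df['Age'].median())
df['DurationOfPitch'] = df['DurationOfPitch'].fillna(df['DurationOfPitch'].median())
df['NumberOfTrips'] = df['NumberOfTrips'].fillna(df['NumberOfTrips'].median())
df['MonthlyIncome'] = df['MonthlyIncome'].fillna(df['MonthlyIncome'].median())

# Categorical and discrete features: fill with the mode (most frequent value)
df['TypeofContact'] = df['TypeofContact'].fillna(df['TypeofContact'].mode()[0])
df['NumberOfFollowups'] = df['NumberOfFollowups'].fillna(df['NumberOfFollowups'].mode()[0])
df['PreferredPropertyStar'] = df['PreferredPropertyStar'].fillna(df['PreferredPropertyStar'].mode()[0])
df['NumberOfChildrenVisiting'] = df['NumberOfChildrenVisiting'].fillna(df['NumberOfChildrenVisiting'].mode()[0])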


In [28]:
df.isnull().sum()

CustomerID                  0
ProdTaken                   0
Age                         0
TypeofContact               0
CityTier                    0
DurationOfPitch             0
Occupation                  0
Gender                      0
NumberOfPersonVisiting      0
NumberOfFollowups           0
ProductPitched              0
PreferredPropertyStar       0
MaritalStatus               0
NumberOfTrips               0
Passport                    0
PitchSatisfactionScore      0
OwnCar                      0
NumberOfChildrenVisiting    0
Designation                 0
MonthlyIncome               0
dtype: int64
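The imputation above computes medians and modes on the full dataset, so the later test rows contribute to the fill values. One possible alternative, sketched here on made-up toy data, is to learn the fill values from the training rows only with scikit-learn's `SimpleImputer`. The `train`, `test`, and `imputer` names are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer

# Toy training and test rows (made-up values) with gaps in both columns.
train = pd.DataFrame({
    "Age": [30.0, np.nan, 50.0, 40.0],
    "NumberOfFollowups": [3.0, 4.0, 4.0, np.nan],
})
test = pd.DataFrame({"Age": [np.nan], "NumberOfFollowups": [np.nan]})

# Median for the continuous column, most frequent value for the discrete one.
imputer = ColumnTransformer([
    ("median", SimpleImputer(strategy="median"), ["Age"]),
    ("mode", SimpleImputer(strategy="most_frequent"), ["NumberOfFollowups"]),
])

# Fill values are learned from the training rows only, then reused on the test rows.
train_filled = imputer.fit_transform(train)
test_filled = imputer.transform(test)
print(test_filled)
```

In this sketch the test row is filled with the training median of `Age` and the training mode of `NumberOfFollowups`, so no information flows from the test rows into the fill values.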

In [29]:
df.drop(columns=['CustomerID'],inplace=True)

In [30]:
df.head()

Unnamed: 0,ProdTaken,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfPersonVisiting,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,NumberOfChildrenVisiting,Designation,MonthlyIncome
0,1,41.0,Self Enquiry,3,6.0,Salaried,Female,3,3.0,Deluxe,3.0,Single,1.0,1,2,1,0.0,Manager,20993.0
1,0,49.0,Company Invited,1,14.0,Salaried,Male,3,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,2.0,Manager,20130.0
2,1,37.0,Self Enquiry,1,8.0,Free Lancer,Male,3,4.0,Basic,3.0,Single,7.0,1,3,0,0.0,Executive,17090.0
3,0,33.0,Company Invited,1,9.0,Salaried,Female,2,3.0,Basic,3.0,Divorced,2.0,1,5,1,1.0,Executive,17909.0
4,0,36.0,Self Enquiry,1,8.0,Small Business,Male,2,3.0,Basic,4.0,Divorced,1.0,0,5,1,0.0,Executive,18468.0


In [31]:
df['Total visiting']=df['NumberOfChildrenVisiting']+df['NumberOfPersonVisiting']
df.drop(columns=['NumberOfChildrenVisiting','NumberOfPersonVisiting'],inplace=True)

In [32]:
df.head()

Unnamed: 0,ProdTaken,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,Designation,MonthlyIncome,Total visiting
0,1,41.0,Self Enquiry,3,6.0,Salaried,Female,3.0,Deluxe,3.0,Single,1.0,1,2,1,Manager,20993.0,3.0
1,0,49.0,Company Invited,1,14.0,Salaried,Male,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,Manager,20130.0,5.0
2,1,37.0,Self Enquiry,1,8.0,Free Lancer,Male,4.0,Basic,3.0,Single,7.0,1,3,0,Executive,17090.0,3.0
3,0,33.0,Company Invited,1,9.0,Salaried,Female,3.0,Basic,3.0,Divorced,2.0,1,5,1,Executive,17909.0,3.0
4,0,36.0,Self Enquiry,1,8.0,Small Business,Male,3.0,Basic,4.0,Divorced,1.0,0,5,1,Executive,18468.0,2.0


In [33]:
# number of numeric and non-numeric features
num_feature = [feature for feature in df.columns if df[feature].dtype!='O']
obj_feature = [feature for feature in df.columns if df[feature].dtype=='O']
print('numeric feature are' ,len(num_feature))
print('object feature are', len(obj_feature))

numeric feature are 12
object feature are 6


In [34]:
# discrete features
discrete_feature = [feature for feature in num_feature if len(df[feature].unique())<=25]


In [35]:
# continuous features
conti_feature = [feature for feature in num_feature if len(df[feature].unique())>25]


In [36]:
print(len(discrete_feature))
print(len(conti_feature))

9
3


In [37]:
df.head()

Unnamed: 0,ProdTaken,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,Designation,MonthlyIncome,Total visiting
0,1,41.0,Self Enquiry,3,6.0,Salaried,Female,3.0,Deluxe,3.0,Single,1.0,1,2,1,Manager,20993.0,3.0
1,0,49.0,Company Invited,1,14.0,Salaried,Male,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,Manager,20130.0,5.0
2,1,37.0,Self Enquiry,1,8.0,Free Lancer,Male,4.0,Basic,3.0,Single,7.0,1,3,0,Executive,17090.0,3.0
3,0,33.0,Company Invited,1,9.0,Salaried,Female,3.0,Basic,3.0,Divorced,2.0,1,5,1,Executive,17909.0,3.0
4,0,36.0,Self Enquiry,1,8.0,Small Business,Male,3.0,Basic,4.0,Divorced,1.0,0,5,1,Executive,18468.0,2.0


In [38]:
x = df.drop(columns=['ProdTaken'])

In [39]:
x.head()

Unnamed: 0,Age,TypeofContact,CityTier,DurationOfPitch,Occupation,Gender,NumberOfFollowups,ProductPitched,PreferredPropertyStar,MaritalStatus,NumberOfTrips,Passport,PitchSatisfactionScore,OwnCar,Designation,MonthlyIncome,Total visiting
0,41.0,Self Enquiry,3,6.0,Salaried,Female,3.0,Deluxe,3.0,Single,1.0,1,2,1,Manager,20993.0,3.0
1,49.0,Company Invited,1,14.0,Salaried,Male,4.0,Deluxe,4.0,Divorced,2.0,0,3,1,Manager,20130.0,5.0
2,37.0,Self Enquiry,1,8.0,Free Lancer,Male,4.0,Basic,3.0,Single,7.0,1,3,0,Executive,17090.0,3.0
3,33.0,Company Invited,1,9.0,Salaried,Female,3.0,Basic,3.0,Divorced,2.0,1,5,1,Executive,17909.0,3.0
4,36.0,Self Enquiry,1,8.0,Small Business,Male,3.0,Basic,4.0,Divorced,1.0,0,5,1,Executive,18468.0,2.0


In [40]:
y = df['ProdTaken']

In [41]:
y

0       1
1       0
2       1
3       0
4       0
       ..
4883    1
4884    1
4885    1
4886    1
4887    1
Name: ProdTaken, Length: 4888, dtype: int64

In [42]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.3)
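Because only about one in five customers bought a package, a plain random split can give the train and test sets slightly different class ratios, and without a seed the split changes on every run. The snippet below is a hedged sketch on made-up data showing the `stratify` and `random_state` arguments of `train_test_split`. The toy names (`X_toy`, `y_toy`) and the seed value are assumptions, and using them on the real data would produce different scores from the ones reported in this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data with a 20% positive class, loosely mirroring the ProdTaken imbalance.
X_toy = pd.DataFrame({"feature": range(100)})
y_toy = pd.Series([1] * 20 + [0] * 80, name="ProdTaken")

# stratify keeps the class ratio the same in both parts;
# random_state makes the split repeatable between runs.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=42
)
print(y_tr.mean(), y_te.mean())
```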

In [43]:
cat_feature = x.select_dtypes(include='object').columns
num_features1 = x.select_dtypes(exclude='object').columns

from sklearn.preprocessing import OneHotEncoder, StandardScaler
encoder = OneHotEncoder(drop='first')
scaler = StandardScaler()

from sklearn.compose import ColumnTransformer

transformer  = ColumnTransformer(
    [
        ("OneHotEncoder",encoder,cat_feature),
        ("StandardScaler",scaler,num_features1)
    ]
)

In [44]:
transformer

In [45]:
#transform training data
x_train=transformer.fit_transform(x_train)
#transforming test data
x_test = transformer.transform(x_test)

In [46]:
pd.DataFrame(x_train)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,16,17,18,19,20,21,22,23,24,25
0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,...,1.458173,1.015687,-0.712240,1.798453,-1.220041,-0.64360,-0.044742,-1.28768,-1.233129,-0.773300
1,1.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,...,-0.716848,-0.287283,-0.712240,-0.724100,-0.673308,1.55376,0.687621,0.77659,-0.263728,-0.773300
2,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,...,-0.716848,-1.116446,-0.712240,-0.724100,-0.673308,-0.64360,-0.044742,0.77659,-0.224169,-1.481589
3,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,1.0,...,0.370663,-0.524187,2.288466,0.537176,2.607091,-0.64360,1.419984,-1.28768,0.311242,0.643278
4,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,...,1.458173,-0.524187,0.287995,-0.724100,-0.126575,1.55376,1.419984,0.77659,-0.128567,-0.065011
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3416,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,...,0.370663,-0.879542,0.287995,-0.724100,-1.220041,-0.64360,-1.509468,-1.28768,2.005318,-0.773300
3417,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,...,1.458173,-0.287283,0.287995,1.798453,-0.673308,-0.64360,-0.044742,-1.28768,-0.224169,-1.481589
3418,1.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,...,-0.716848,2.081753,-0.712240,-0.724100,2.060358,-0.64360,-0.044742,-1.28768,-0.633338,-0.773300
3419,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,...,-0.716848,-0.168831,1.288230,-0.724100,2.607091,-0.64360,1.419984,0.77659,-0.444655,1.351567


### Training the model

In [47]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score,f1_score,precision_score,recall_score,accuracy_score

In [48]:
models = {
    "Logistic Regression": LogisticRegression(),
    "Support Vector Classifier": SVC(),
    "Decision Tree Classifier": DecisionTreeClassifier(),
    "Random Forest Classifier": RandomForestClassifier(),
    "Gradient Boosting Classifier": GradientBoostingClassifier()
}

In [49]:
# Fit every model and report the same metrics on the training and test sets
for name, model in models.items():
    model.fit(x_train,y_train)

    y_train_pred = model.predict(x_train)
    y_test_pred = model.predict(x_test)

    print('Model is ', name)
    print('Model Performance for training set')
    print('f1_score:',f1_score(y_train,y_train_pred,average='weighted'))
    print('accuracy_score:',accuracy_score(y_train,y_train_pred))
    print('precision_score:',precision_score(y_train,y_train_pred))
    print('recall_score:',recall_score(y_train,y_train_pred))
    print('roc_auc_score:',roc_auc_score(y_train,y_train_pred))
    print('--------------------------------------------------------------')

    print('Model Performance for test set')
    print('f1_score:',f1_score(y_test,y_test_pred,average='weighted'))
    print('accuracy_score:',accuracy_score(y_test,y_test_pred))
    print('precision_score:',precision_score(y_test,y_test_pred))
    print('recall_score:',recall_score(y_test,y_test_pred))
    print('roc_auc_score:',roc_auc_score(y_test,y_test_pred))
    print('='*35)
    print('\n')



Model is  Logistic Regression
Model Performance for training set
f1_score: 0.8241209007020929
accuracy_score: 0.8465361005553932
precision_score: 0.7047619047619048
recall_score: 0.3394495412844037
roc_auc_score: 0.6529195664499358
--------------------------------------------------------------
Model Performance for test set
f1_score: 0.8151593828085208
accuracy_score: 0.8370824812542604
precision_score: 0.5985401459854015
recall_score: 0.3082706766917293
roc_auc_score: 0.6312377530003194


Model is  Support Vector Classifier
Model Performance for training set
f1_score: 0.8785307438415894
accuracy_score: 0.8918444899152295
precision_score: 0.9011299435028248
recall_score: 0.4877675840978593
roc_auc_score: 0.7375592528367865
--------------------------------------------------------------
Model Performance for test set
f1_score: 0.8480117945383076
accuracy_score: 0.8670756646216768
precision_score: 0.7709923664122137
recall_score: 0.37969924812030076
roc_auc_score: 0.6773600320534893


Mo

## HYPERPARAMETER TUNING

In [50]:
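# Search space for RandomForestClassifier
param = {
   'criterion':['gini', 'entropy', 'log_loss'],
   'max_depth':[1,2,3,4,5,8,15,None,10],
   # 'auto' was deprecated and later removed from scikit-learn, so it is left out
   'max_features':['sqrt','log2'],
   'n_estimators':[100,200,500]
}
# Search space for GradientBoostingClassifier
gb_param= {
    # 'deviance' is the old name of 'log_loss' and is no longer accepted in recent releases
    "loss":['log_loss', 'exponential'],
    "n_estimators": [100,200,500,700],
    # 'mse' is the old name of 'squared_error' and is no longer accepted in recent releases
    "criterion":['friedman_mse', 'squared_error'],
    "min_samples_split": [2,8,15,20],
    "max_depth":[5,8,15,None,10],
}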

In [51]:
randomcv_model = [
    ("Random Forest",RandomForestClassifier(),param),
    ("Gradient Boost Tree",GradientBoostingClassifier(),gb_param)
]

In [52]:
from sklearn.model_selection import RandomizedSearchCV
model_params={}
# Run a randomized search per model and keep the best parameter set for each
for name, model, params in randomcv_model:
    search = RandomizedSearchCV(estimator=model,param_distributions=params,cv=3,verbose=3,n_jobs=-1)
    search.fit(x_train,y_train)
    model_params[name]=search.best_params_
for model_name in model_params:
    print(f'---------------Best Params for {model_name}-------------------')
    print(model_params[model_name])

Fitting 3 folds for each of 10 candidates, totalling 30 fits


Fitting 3 folds for each of 10 candidates, totalling 30 fits
---------------Best Params for Random Forest-------------------
{'n_estimators': 200, 'max_features': 'log2', 'max_depth': None, 'criterion': 'entropy'}
---------------Best Params for Gradient Boost Tree-------------------
{'n_estimators': 700, 'min_samples_split': 8, 'max_depth': 15, 'loss': 'log_loss', 'criterion': 'friedman_mse'}
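Copying the printed parameters into a new model by hand is easy to get wrong. A possible alternative, sketched below on synthetic data, is to reuse the refitted `best_estimator_` that `RandomizedSearchCV` exposes, and to fix `random_state` and `scoring` so the search is repeatable and optimizes a metric suited to the imbalanced target. The data, the small grid, the seed, and the choice of `"f1"` as the scoring metric are all assumptions for illustration, not the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the transformed training set.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=0
)

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": [20, 50],
        "max_depth": [3, 5, None],
        "max_features": ["sqrt", "log2"],
    },
    n_iter=4,          # number of sampled parameter combinations
    cv=3,
    scoring="f1",      # assumption: optimize F1 on the positive class
    random_state=0,    # makes the sampled combinations repeatable
)
search.fit(X_demo, y_demo)

# With the default refit=True, the best combination is already refitted on all data.
best_model = search.best_estimator_
print(search.best_params_)
```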


In [55]:
tuned_models={
    "Random Forest":RandomForestClassifier(n_estimators=200, max_features= 'sqrt', max_depth= 15, criterion= 'entropy'),
    "GradientBoostingClassifier":GradientBoostingClassifier(n_estimators= 700, min_samples_split= 8, max_depth= 15, loss= 'log_loss', criterion= 'friedman_mse')
}

In [57]:
# Refit the tuned models and report the same metrics as before
for name, tuned_model in tuned_models.items():
    tuned_model.fit(x_train,y_train)

    y_train_pred = tuned_model.predict(x_train)
    y_test_pred = tuned_model.predict(x_test)

    print('Model is ', name)
    print('Model Performance for training set')
    print('f1_score:',f1_score(y_train,y_train_pred,average='weighted'))
    print('accuracy_score:',accuracy_score(y_train,y_train_pred))
    print('precision_score:',precision_score(y_train,y_train_pred))
    print('recall_score:',recall_score(y_train,y_train_pred))
    print('roc_auc_score:',roc_auc_score(y_train,y_train_pred))
    print('--------------------------------------------------------------')

    print('Model Performance for test set')
    print('f1_score:',f1_score(y_test,y_test_pred,average='weighted'))
    print('accuracy_score:',accuracy_score(y_test,y_test_pred))
    print('precision_score:',precision_score(y_test,y_test_pred))
    print('recall_score:',recall_score(y_test,y_test_pred))
    print('roc_auc_score:',roc_auc_score(y_test,y_test_pred))
    print('='*35)
    print('\n')



Model is  Random Forest
Model Performance for training set
f1_score: 0.9997076023907531
accuracy_score: 0.9997076878105817
precision_score: 1.0
recall_score: 0.9984709480122325
roc_auc_score: 0.9992354740061162
--------------------------------------------------------------
Model Performance for test set
f1_score: 0.9153157943264203
accuracy_score: 0.9216087252897068
precision_score: 0.9217877094972067
recall_score: 0.6203007518796992
roc_auc_score: 0.8043218996700745


Model is  GradientBoostingClassifier
Model Performance for training set
f1_score: 1.0
accuracy_score: 1.0
precision_score: 1.0
recall_score: 1.0
roc_auc_score: 1.0
--------------------------------------------------------------
Model Performance for test set
f1_score: 0.9494147409464719
accuracy_score: 0.950920245398773
precision_score: 0.9254385964912281
recall_score: 0.793233082706767
roc_auc_score: 0.8895391058829422
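The `roc_auc_score` values above are computed from hard 0/1 predictions, which reduces the ROC curve to a single operating point. ROC AUC is usually computed from predicted probabilities instead. The snippet below is a hedged sketch on synthetic data that compares both; the data, seeds, and variable names are assumptions, and the numbers it prints are unrelated to the results reported in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the transformed travel features.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# AUC from hard labels: only one threshold (0.5) is evaluated.
auc_from_labels = roc_auc_score(y_te, clf.predict(X_te))

# AUC from the positive-class probability: all thresholds are evaluated.
auc_from_scores = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

print('roc_auc (labels):', auc_from_labels)
print('roc_auc (probabilities):', auc_from_scores)
```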


