<h1> Credit Card Lead Prediction </h1>
<h4> <strong>Description : </strong> <hr>
<i>The Happy Customer Bank wants to cross-sell its credit cards to its existing customers. The bank has identified a set of customers who are eligible for these credit cards. The task is to identify the customers who show higher intent towards a recommended credit card.</i></h4>

In [None]:
#Importing Libraries

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
plt.style.use('seaborn-bright')
%matplotlib inline
from scipy.stats import chi2_contingency
import category_encoders as ce
from sklearn.model_selection import GridSearchCV,RandomizedSearchCV,train_test_split
from sklearn.preprocessing import LabelEncoder,StandardScaler,OneHotEncoder
import warnings
warnings.filterwarnings("ignore")

<h5> Reading Dataset </h5>

In [None]:
train = pd.read_csv('../input/jobathon-may-2021-credit-card-lead-prediction/train.csv')
test  = pd.read_csv('../input/jobathon-may-2021-credit-card-lead-prediction/test.csv')
sub   = pd.read_csv('../input/jobathon-may-2021-credit-card-lead-prediction/sample_submission.csv')

In [None]:
train.head()

<h4> The Dataset consists of following features: </h4>

<h5>
    
<i>    
   
1. ID : Unique Identifier for a row

2. Gender: Gender of the Customer

3. Age : Age of the Customer (in Years)

4. Region_Code : Code of the Region for the customers

5. Occupation : Occupation Type for the customer

6. Channel_Code : Acquisition Channel Code for the Customer (Encoded)

7. Vintage : Vintage for the Customer (in Months), i.e. how long the Customer has been associated with the company

8. Credit_Product : If the Customer has any active credit product (Home loan, Personal loan, Credit Card etc.)

9. Avg_Account_Balance : Average Account Balance for the Customer in last 12 Months

10. Is_Active : If the Customer is Active in last 3 Months

11. Is_Lead(Target) : If the Customer is interested in the Credit Card, 0 / 1: Customer is not interested / interested.
</i>    
 </h5>

In [None]:
test.head()

<h1> Exploratory Data Analysis </h1>

In [None]:
# Printing number of columns and rows in the dataset
print("There are {} number of rows and {} number of columns in training data".format(train.shape[0],train.shape[1]))
print("There are {} number of rows and {} number of columns in testing data".format(test.shape[0],test.shape[1]))

In [None]:
# Checking for class imbalance, if any
sns.countplot(y=train["Is_Lead"],linewidth=2,edgecolor='black')

<h4>Inference: </h4>
<h5><i>From the above plot, it appears that the dataset is imbalanced: Class 0 contains more than 175,000 training examples, while Class 1 contains about 65,000.</i></h5>
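The imbalance read off the count plot can also be quantified directly. The following is a minimal sketch on a hypothetical toy series (`is_lead` stands in for `train["Is_Lead"]`; the values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy target column standing in for train["Is_Lead"].
is_lead = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="Is_Lead")

# Absolute counts per class.
class_counts = is_lead.value_counts()

# Relative share of each class (the shares sum to 1).
class_share = is_lead.value_counts(normalize=True)

# Ratio of majority to minority class size.
imbalance_ratio = class_counts.max() / class_counts.min()

print(class_counts)
print(class_share.round(3))
print(f"Majority/minority ratio: {imbalance_ratio:.2f}")
```

Running the same three calls on the real target column would give the exact class sizes behind the plot.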

In [None]:
# Checking the type of columns in dataset
train.info()

In [None]:
# Describing columns statistics
train.describe()

In [None]:
# Exploring the missing values in training data.
train.isnull().sum()

In [None]:
# Exploring the missing values in testing data.
test.isnull().sum()

<h4> Inference </h4>

<h5><i>From the above results, it appears that the column "Credit_Product" contains the highest number of missing values in both datasets, i.e. Train (29,325) and Test (12,522).</i> </h5>
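Raw missing counts are easier to compare across datasets of different size when expressed as a percentage. A minimal sketch, assuming a hypothetical toy frame `toy` in place of the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only Credit_Product has gaps, as in the notebook.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female"],
    "Credit_Product": ["Yes", np.nan, "No", np.nan],
})

# Count and percentage of missing values per column, largest first.
missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),
    "pct_missing": toy.isnull().mean() * 100,
}).sort_values("n_missing", ascending=False)

print(missing)
```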

In [None]:
#Summarizing training data using Data Profiling
import pandas_profiling
train.profile_report()

In [None]:
#Correlation Graph (numeric columns only)
plt.figure(figsize=(10,10))
sns.heatmap(train.select_dtypes(include='number').corr(),annot=True)
plt.show()

<h4> Inference </h4>

<h5><i>From the above results, it appears that the columns "Age" and "Vintage" are moderately correlated, with a coefficient of 0.63.</i></h5>

In [None]:
# Phik's correlation Graph
import phik
temp = [feature for feature in train.columns if feature not in ['ID']]
plt.figure(figsize=(10,10))
sns.heatmap(train[temp].phik_matrix(),annot=True)
plt.show()

<h4> Inference </h4>

<h5><i>The above heatmap estimates the correlation between categorical and numerical variables using the phik correlation coefficient. From the graph, it appears that the columns "Age", "Occupation", "Channel_Code" and "Vintage" show some degree of correlation. We will analyse these columns in the next section.</i> </h5>

In [None]:
# Further exploring the above relationships
#Relationship between Occupation and Age
sns.boxplot(x=train['Occupation'],y=train["Age"])

<h4> Inference </h4>

<h5><i>From the above box-plot, it appears that customers with "Other" as their occupation span the widest age range (10-80+ years), followed by "Self-Employed" customers (10-60 years). The remaining occupations show narrower age distributions and contain outliers.</i> </h5>

In [None]:
# Relationship between Channel_Code and Age
sns.boxplot(x="Channel_Code",y='Age',data=train)

<h4> Inference </h4>

<h5><i>From the above box-plot, it appears that the age distributions contain many outliers, and the phik coefficient between the two columns (0.67) indicates at most a moderate relationship.</i> </h5>

In [None]:
# Relationship between "Age" and "Vintage"
sns.scatterplot(x="Vintage",y='Age',data=train)

<h4> Inference </h4>

<h5><i>From the above scatter-plot, it appears that no specific relationship exists between the two columns, even though their phik coefficient is moderate (0.66).</i> </h5>

In [None]:
# Categorical Features and their cardinality
categorical_feature = [feature for feature in train.columns if feature not in ['Age','Vintage','Avg_Account_Balance','ID']]
for feature in categorical_feature:
    print("The feature is {} and its cardinality is {}".format(feature,len(train[feature].unique())))

<h4> Inference </h4>

<h5><i>From the above results, it appears that the column "Region_Code" has the highest cardinality, with 35 distinct values. Let's see if we can reduce its cardinality and make it useful for prediction.</i> </h5>
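One common way to reduce the cardinality of a column like "Region_Code" is to pool infrequent categories into a single bucket. This is only an illustrative sketch, not the method used later in this notebook; the region codes and the `min_count` threshold are hypothetical:

```python
import pandas as pd

# Hypothetical toy column standing in for train["Region_Code"].
region = pd.Series(["R1", "R1", "R1", "R2", "R2", "R3", "R4"])

# Hypothetical threshold: categories seen fewer than min_count times are pooled.
min_count = 2

counts = region.value_counts()
rare = counts[counts < min_count].index

# Replace rare codes with a single "Other" bucket, keep the rest unchanged.
region_reduced = region.where(~region.isin(rare), "Other")

print(region.nunique(), "->", region_reduced.nunique())
```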

In [None]:
# Visualizing relationship between categorical and target values
column_list =  ['Gender','Occupation','Channel_Code','Credit_Product','Is_Active']

plt.figure(figsize=(12,12))
for i, column in enumerate(column_list):
    plt.subplot(4,3,i+1)
    # Mean of the binary target per category = share of leads in that category
    train.groupby(column)['Is_Lead'].mean().plot(kind="bar",linewidth=2,edgecolor="black")
    plt.ylabel('Leads')
plt.tight_layout()


<h4> Inference </h4>

<h5><i>From the above bar-plots, the following observations can be made:
    
* **Gender:** The "Male" category appears to show higher intent towards the recommended credit card.
* **Occupation:** Customers with "Entrepreneur" as their occupation appear to show the most interest, while "Salaried" customers show the least.
* **Channel_Code:** Customers acquired through channel "X3" appear to show the highest intent, while those from "X1" show the lowest.
* **Credit_Product:** Customers with an active credit product appear to show more interest.
* **Is_Active:** Customers with an active account appear to show more interest.

</i> </h5>
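The differences seen in the bar plots can be checked with a chi-square test of independence. `chi2_contingency` is imported at the top of the notebook but not used; the sketch below shows one way it could be applied, on hypothetical toy data rather than the real training set:

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical toy data standing in for train[["Credit_Product", "Is_Lead"]].
toy = pd.DataFrame({
    "Credit_Product": ["Yes"] * 40 + ["No"] * 60,
    "Is_Lead": [1] * 30 + [0] * 10 + [1] * 10 + [0] * 50,
})

# Contingency table of category vs. target.
table = pd.crosstab(toy["Credit_Product"], toy["Is_Lead"])

# Chi-square test of independence; a small p-value suggests an association.
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4g}")
```

A small p-value only indicates that the category and the target are associated; it says nothing about how useful the feature is for prediction.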

<h1> Data Preprocessing </h1>

In [None]:
# Imputing the missing features
train_imputed = train.copy()
test_imputed  = test.copy() 
feature = ['Credit_Product']

def impute_missing(train_data,test_data,feature):
    # Most frequent value, learned from the training data only
    mode_value = train_data[feature].mode()[0]
    # Flag rows that were missing, then fill them with the training mode
    train_data[feature+'_nan'] = np.where(train_data[feature].isnull(),1,0)
    train_data[feature] = train_data[feature].fillna(mode_value)
    test_data[feature+'_nan'] = np.where(test_data[feature].isnull(),1,0)
    test_data[feature] = test_data[feature].fillna(mode_value)
    return train_data,test_data
    
train_imputed,test_imputed = impute_missing(train_imputed,test_imputed,feature[0])

In [None]:
#Checking the presence of null values.
train_imputed.isnull().sum()

In [None]:
#Checking the presence of null values.
test_imputed.isnull().sum()

In [None]:
train_imputed.head(10)

<h1> Feature Engineering </h1>

<h4><i> We will be designing features using both manual and automatic approaches (the latter using the Featuretools library).</i> </h4>

<h4> 1. Designing Features Automatically. </h4>

In [None]:
# Separating target and training data as input for automatic feature designing. 
y_temp = train_imputed[['Is_Lead']]
temp_train =  train_imputed.drop(columns=["Is_Lead"])

In [None]:
#!pip install featuretools  --> Uncomment for installing automatic feature library.

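# Initialising the Featuretools library on training data.
# Note: entity_from_dataframe / target_entity belong to the pre-1.0 Featuretools API;
# from version 1.0 the equivalents are add_dataframe / target_dataframe_name.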
import featuretools as ft
es = ft.EntitySet(id='y_temp')
es.entity_from_dataframe(entity_id='credit_card',dataframe=temp_train,index='ID')
print(es)

In [None]:
# Initialising feature tools library on testing data.

es_2 = ft.EntitySet(id='y_temp')
es_2.entity_from_dataframe(entity_id='credit_card_1',dataframe=test_imputed,index='ID')
print(es_2)

In [None]:
#Generating 18 features for training data. 
feature_matrix,feature_names = ft.dfs(entityset=es,target_entity='credit_card',max_depth=2,verbose=1,
                                      n_jobs=-1,trans_primitives=['percentile','cum_mean'])

In [None]:
#Generating 18 features for testing data. 

feature_matrix_test,feature_names_test = ft.dfs(entityset=es_2,target_entity='credit_card_1',max_depth=2,verbose=1,
                                      n_jobs=-1,trans_primitives=['percentile','cum_mean'])

In [None]:
feature_matrix_test.head()

In [None]:
pd.set_option('display.max_rows', None) 
pd.set_option('display.max_columns', None)

ft.primitives.list_primitives()

In [None]:
# Visualizing features created using  automatic approach.
feature_matrix_test.head(10)

In [None]:
# Filtering out irrelevant features and resetting the index in training and testing data.
feature_matrix = feature_matrix.reindex(index=temp_train['ID'])
train_imputed = feature_matrix.reset_index()
train_imputed = train_imputed.drop(columns=['PERCENTILE(Avg_Account_Balance)','PERCENTILE(Credit_Product_nan)',
                                            'PERCENTILE(Vintage)','CUM_MEAN(Credit_Product_nan)'],axis=1)
feature_matrix_test = feature_matrix_test.reindex(index=test_imputed['ID'])
test_imputed = feature_matrix_test.reset_index()
test_imputed = test_imputed.drop(columns=['PERCENTILE(Avg_Account_Balance)','PERCENTILE(Credit_Product_nan)',
                                            'PERCENTILE(Vintage)','CUM_MEAN(Credit_Product_nan)'],axis=1)

In [None]:
train_imputed.head()

In [None]:
test_imputed.head()

<h4> 2. Designing Features Manually. </h4>

In [None]:
# Feature Eng. numerical variables

def eng_age(train_data,test_data):
    train_data['Age_cat'] = pd.qcut(train_data.Age,  q=4, labels=False)
    test_data['Age_cat'] = pd.qcut(test_data.Age, q=4, labels=False)
    return train_data,test_data


def eng_vintage(train_data,test_data):
    train_data["Vintage_cat"] = pd.qcut(train_data["Vintage"], q=4, labels=False)
    test_data["Vintage_cat"] = pd.qcut(test_data["Vintage"], q=4, labels=False)
    return train_data,test_data

def eng_region_code(train_data,test_data):
    feature_to_encode_freq = ["Region_Code"]
    count_enc = ce.CountEncoder()
    count_encoded_train = count_enc.fit_transform(train_data[feature_to_encode_freq])
    train_data = train_data.join(count_encoded_train.add_suffix("_count"))
    train_data["Region_Code_count"] = np.where(train_data["Region_Code_count"]<10000,0,train_data["Region_Code_count"])
    count_encoded_test = pd.DataFrame(count_enc.transform(test_data[feature_to_encode_freq]))
    test_data = test_data.join(count_encoded_test.add_suffix("_count"))
    test_data["Region_Code_count"] = np.where(test_data["Region_Code_count"]<10000,0,test_data["Region_Code_count"])
    return train_data,test_data


def feature_eng(train_data,test_data):
    '''
    Input: train_data : Training Data
           test_data  : Testing Data
           
    Output: Training and Testing data after applying Feature Eng.
    '''
    
    train_data,test_data = eng_age(train_data,test_data) # Feature Eng. Age column
    train_data,test_data = eng_vintage(train_data,test_data) # Feature Eng. Vintage column
    train_data,test_data = eng_region_code(train_data,test_data) # Feature Eng. Region_Code column
    return train_data,test_data

train_eng,test_eng = feature_eng(train_imputed,test_imputed) # Calling function feature_eng to generate features.
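
Calling `pd.qcut` separately on train and test produces quartile edges that can differ between the two sets, so the same age may land in different bins. A possible alternative is to learn the edges on the training data and reuse them for the test data. This is a minimal sketch on hypothetical toy values, not the code used above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Age column of train and test.
train_age = pd.Series([23, 30, 35, 41, 48, 52, 60, 75])
test_age = pd.Series([20, 33, 50, 85])

# Learn quartile edges on the training data only.
train_cat, edges = pd.qcut(train_age, q=4, labels=False, retbins=True)

# Open the outer edges so test values outside the training range still get a bin.
edges[0], edges[-1] = -np.inf, np.inf

# Apply the same edges to the test data.
test_cat = pd.cut(test_age, bins=edges, labels=False)

print(train_cat.tolist())
print(test_cat.tolist())
```

The same pattern would apply to "Vintage"; whether it changes the model's scores here is not something this sketch can show.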


In [None]:
train_eng.head(10)

In [None]:
test_eng.head()

In [None]:
# Removing the skewness of columns using Power Transformer.
from sklearn.preprocessing import PowerTransformer
feature_to_transform = ['Avg_Account_Balance']
pt = PowerTransformer(method='yeo-johnson',standardize=False)
train_transform = pd.DataFrame(pt.fit_transform(train_eng[feature_to_transform]),columns=["Avg_acct_Bal_Transformed"])
test_transform = pd.DataFrame(pt.transform(test_eng[feature_to_transform]),columns=["Avg_acct_Bal_Transformed"])
train_transform.head()

In [None]:
# Visualizing the histogram of transformed column.
test_transform.hist()

In [None]:
# Merging the transformed column.
train_eng = pd.concat([train_eng,train_transform],axis=1)
test_eng  = pd.concat([test_eng,test_transform],axis=1)
train_eng.head()

In [None]:
test_eng.head()

In [None]:
# Removing redundant columns
train_eng = train_eng.drop(columns=["ID","Age","Vintage","Avg_Account_Balance","Region_Code"])
test_eng = test_eng.drop(columns=["ID","Age","Vintage","Avg_Account_Balance","Region_Code"])

In [None]:
train_eng.head(10)

In [None]:
# Using Feature Encoding Techniques for encoding required features.

le = LabelEncoder()
# One-hot encoding is done with pd.get_dummies below, so no OneHotEncoder instance is needed.
cols_to_le = ["Region_Code_count"] 
cols_to_ohe = ["Gender","Occupation","Channel_Code","Credit_Product","Is_Active"]

def ohe_encoding_columns(train_data,test_data,cols):
    
    train_data = pd.get_dummies(train_data[cols],prefix = cols)
    test_data =  pd.get_dummies(test_data[cols],prefix  = cols)
    return train_data,test_data

def le_encoding_columns(train_data,test_data,col):

    train_data[col] = le.fit_transform(train_data[col])
    test_data[col] =  le.transform(test_data[col])
    return train_data,test_data


train_encoded_ohe,test_encoded_ohe = ohe_encoding_columns(train_eng,test_eng,cols_to_ohe) # One-Hot Encoding
train_encoded_le,test_encoded_le = le_encoding_columns(train_eng,test_eng,cols_to_le[0])  # Label Encoding

In [None]:
# Concatenating columns with both type of encoding and dropping the redundant features.
train_encoded =  pd.concat([train_encoded_ohe,train_encoded_le],axis=1)
test_encoded  =  pd.concat([test_encoded_ohe,test_encoded_le],axis=1)
train_encoded =  train_encoded.drop(columns=["Gender","Occupation","Channel_Code","Credit_Product","Is_Active"],axis=1)
test_encoded =  test_encoded.drop(columns=["Gender","Occupation","Channel_Code","Credit_Product","Is_Active"],axis=1)
train_encoded.head()

In [None]:
test_encoded.head()

In [None]:
# Separating target and training data.
Y = y_temp.copy()
X = train_encoded.copy()
X_test = test_encoded.copy()

In [None]:
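# Standardizing the variables using Standard Scaler.
sc = StandardScaler()

# Fit the scaler on the training data only.
X_train_scaled = pd.DataFrame(sc.fit_transform(X))
X_train_scaled.columns = X.columns

# Reuse the training mean and variance for the test data (transform, not fit_transform).
X_test_scaled = pd.DataFrame(sc.transform(X_test))
X_test_scaled.columns = X_test.columns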

X_train_scaled.head()

In [None]:
X_test_scaled.head()

In [None]:
# Splitting the data using the train_test_split approach
X_train,x_test,Y_train,y_test = train_test_split(X_train_scaled,Y,test_size=0.1,stratify=Y,random_state=0)
x_train,x_valid,y_train,y_valid = train_test_split(X_train,Y_train,test_size=0.1,stratify=Y_train,random_state=0)
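
Applying `test_size=0.1` twice leaves 81% of the rows for training, 9% for validation and 10% for the hold-out test set, with `stratify` keeping the class ratio roughly constant in each part. A minimal sketch on hypothetical toy arrays (`X_toy`, `y_toy` are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 1000 rows, 25% positives.
X_toy = np.arange(1000).reshape(-1, 1)
y_toy = np.array([1] * 250 + [0] * 750)

# First split: hold out 10% as the test set.
X_rest, X_te, y_rest, y_te = train_test_split(
    X_toy, y_toy, test_size=0.1, stratify=y_toy, random_state=0)

# Second split: hold out 10% of the remainder as the validation set.
X_tr, X_va, y_tr, y_va = train_test_split(
    X_rest, y_rest, test_size=0.1, stratify=y_rest, random_state=0)

# Sizes of the three parts and the share of positives in each.
print(len(X_tr), len(X_va), len(X_te))
print(y_tr.mean(), y_va.mean(), y_te.mean())
```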

<h1> Modelling </h1>

In [None]:
# Importing the libraries for different modelling approaches.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score,f1_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from xgboost import XGBClassifier
from catboost import CatBoostClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import roc_auc_score

<h4>  Logistic Regression </h4>

In [None]:
model_1=LogisticRegression(max_iter=500,random_state=0)
model_1.fit(x_train,y_train.values.ravel())
pred = model_1.predict(x_valid)
score_1 = accuracy_score(y_valid,pred)
score_1

In [None]:
predictions_1 = model_1.predict(x_test)
score_1 = accuracy_score(y_test,predictions_1)
score_1

In [None]:
from sklearn.metrics import classification_report
print(classification_report(y_test,predictions_1,target_names=['0','1']))

<h4>  KNN Classifier </h4>

In [None]:
model_2 = KNeighborsClassifier()
model_2.fit(x_train,y_train.values.ravel())
pred = model_2.predict(x_valid)
score_2 = accuracy_score(y_valid,pred)
score_2

In [None]:
predictions_2 = model_2.predict(x_test)
print(classification_report(y_test,predictions_2,target_names=['0','1']))

<h4> Naive Bayes </h4>

In [None]:
model_4=GaussianNB()
model_4.fit(x_train,y_train.values.ravel())
pred = model_4.predict(x_valid)
score_4 = accuracy_score(y_valid,pred)
score_4

In [None]:
predictions_4 = model_4.predict(x_test)
print(classification_report(y_test,predictions_4,target_names=['0','1']))

<h4>  Decision Trees </h4>

In [None]:
model_5=DecisionTreeClassifier(random_state=0)
model_5.fit(x_train,y_train.values.ravel())
pred = model_5.predict(x_valid)
score_5 = accuracy_score(y_valid,pred)
score_5

In [None]:
predictions_5 = model_5.predict(x_test)
print(classification_report(y_test,predictions_5,target_names=['0','1']))

<h4> Random Forest </h4>

In [None]:
model_6=RandomForestClassifier(random_state=0)
model_6.fit(x_train,y_train.values.ravel())
pred = model_6.predict(x_valid)
score_6 = accuracy_score(y_valid,pred)
score_6

In [None]:
predictions_6 = model_6.predict(x_test)
print(classification_report(y_test,predictions_6,target_names=['0','1']))

<h4> XGBOOST </h4>

In [None]:
from xgboost import XGBClassifier
model_7 = XGBClassifier()
model_7.fit(x_train,y_train.values.ravel())
pred = model_7.predict(x_valid)
score_7 = accuracy_score(y_valid,pred)
score_7

In [None]:
predictions_7 = model_7.predict(x_test)
print(classification_report(y_test,predictions_7,target_names=['0','1']))

<h4> CATBOOST </h4>

In [None]:
from catboost import CatBoostClassifier
cat_model = CatBoostClassifier(verbose=2,iterations=500,od_type='Iter')
cat_model.fit(x_train,y_train,eval_set=(x_valid,y_valid))
print(cat_model.best_score_)

In [None]:
predictions_8 = cat_model.predict(x_test)
print(classification_report(y_test,predictions_8,target_names=['0','1']))

<h4> LGBM </h4>

In [None]:
model_9 = LGBMClassifier()
model_9.fit(x_train,y_train.values.ravel())
pred_9 = model_9.predict(x_valid)
score_9 = accuracy_score(y_valid,pred_9)
score_9

In [None]:
predictions_9 = model_9.predict(x_test)
print(classification_report(y_test,predictions_9,target_names=['0','1']))

In [None]:
print(accuracy_score(y_test,predictions_9))

<h4>  Passive Aggressive Classifier </h4>

In [None]:
from sklearn.linear_model import PassiveAggressiveClassifier
model_10 = PassiveAggressiveClassifier()
model_10.fit(x_train,y_train.values.ravel())
pred_10 = model_10.predict(x_valid)
score_10 = accuracy_score(y_valid,pred_10)
score_10

In [None]:
predictions_10 = model_10.predict(x_test)
print(classification_report(y_test,predictions_10,target_names=['0','1']))

In [None]:
print(accuracy_score(y_test,predictions_10))

<h4> Applying Hyper-Parameter Tuning </h4>

<h4> Tuning Random Forest </h4>

In [None]:
param_grid={'max_depth':range(6,9),'n_estimators':range(300,500,100),"max_features": range(9,11)}
rand_search_rf = RandomizedSearchCV(RandomForestClassifier(),param_grid,verbose=1,cv=10,n_jobs=-1)
rand_search_rf.fit(x_train,y_train.values.ravel())
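
To get a feel for the cost of this search, it helps to count the candidate combinations. The sketch below repeats the Random Forest search space and assumes the `RandomizedSearchCV` default of `n_iter=10` together with the `cv=10` used above:

```python
import math

# Same search space as the Random Forest grid above.
param_grid = {
    "max_depth": range(6, 9),
    "n_estimators": range(300, 500, 100),
    "max_features": range(9, 11),
}

# Number of distinct parameter combinations in the grid.
n_combinations = math.prod(len(values) for values in param_grid.values())

# RandomizedSearchCV evaluates n_iter sampled candidates (default 10),
# each fitted once per cross-validation fold, plus one final refit.
n_iter, cv = 10, 10
n_fits = min(n_iter, n_combinations) * cv

print(n_combinations, n_fits)
```

With only 12 combinations, the random search covers most of the grid, so an exhaustive `GridSearchCV` would cost nearly the same.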

In [None]:
rand_search_rf.best_params_

In [None]:
rand_search_predictions = rand_search_rf.predict(x_test)
print(classification_report(y_test,rand_search_predictions,target_names=['0','1']))

In [None]:
print(accuracy_score(y_test,rand_search_predictions))

<h4> Tuning XGBOOST </h4>

In [None]:
param_grid_xg={"learning_rate" : [0.05,0.07] ,
 "max_depth"        : [ 1,3,5],
 "min_child_weight" : [ 3,5],
 "gamma"            : [ 0.0],
 "colsample_bytree" : [0.3,0.5],
 "n_estimators"     : [300,500]}
rand_search_xg = RandomizedSearchCV(XGBClassifier(),param_grid_xg,verbose=1,cv=5,n_jobs=-1)
rand_search_xg.fit(x_train,y_train.values.ravel())

In [None]:
rand_search_xg.best_params_

In [None]:
rand_search_predictions_xg = rand_search_xg.predict(x_test)
print(classification_report(y_test,rand_search_predictions_xg,target_names=['0','1']))

In [None]:
from sklearn.metrics import roc_auc_score
print(accuracy_score(y_test,rand_search_predictions_xg))

<h4> Tuning CatBoost </h4>

In [None]:
# Cat Boosting Tuning
param_grid_cat = {'iterations': [80],
                 'depth': range(8, 9),
                 'learning_rate': [0.1],
                 'bagging_temperature': [0.9],
                 'border_count': range(202, 203),
                 'l2_leaf_reg': range(20, 21),
                 'scale_pos_weight': [1.0]}

In [None]:
rand_search_cat = RandomizedSearchCV(CatBoostClassifier(verbose=2,od_type='Iter'),param_grid_cat,verbose=1,cv=10,n_jobs=-1)
rand_search_cat.fit(x_train,y_train.values.ravel())

In [None]:
rand_search_cat.best_params_

In [None]:
rand_search_predictions_cat = rand_search_cat.predict(x_test)
print(classification_report(y_test,rand_search_predictions_cat,target_names=['0','1']))

In [None]:
from sklearn.metrics import roc_auc_score
print(accuracy_score(y_test,rand_search_predictions_cat))

<h4> Tuning LGBM </h4>

In [None]:
# LGBM Tuning
param_grid_gbm = {
        'n_estimators' : [700,900,1000],
        'learning_rate' : [0.01,0.03],
        'max_depth' : [ 7, 5,6]}
rand_search_gbm = RandomizedSearchCV(LGBMClassifier(),param_grid_gbm,verbose=1,cv=10,n_jobs=-1)
rand_search_gbm.fit(x_train,y_train.values.ravel())

In [None]:
rand_search_gbm.best_params_

In [None]:
rand_search_predictions_gbm = rand_search_gbm.predict(x_test)
print(classification_report(y_test,rand_search_predictions_gbm,target_names=['0','1']))

In [None]:
from sklearn.metrics import roc_auc_score
print(accuracy_score(y_test,rand_search_predictions_gbm))

<h4>  Voting Classifier </h4>
<h5><i> I have implemented a soft-voting classifier over two hyper-parameter-tuned classifiers: XGBOOST and CATBOOST.</i></h5>
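With `voting='soft'`, the ensemble averages the class probabilities predicted by its members and picks the class with the highest average. A minimal NumPy sketch of that rule, using hypothetical probability outputs (`proba_a`, `proba_b`) instead of fitted models:

```python
import numpy as np

# Hypothetical class-probability outputs of two fitted classifiers
# for three samples; columns are P(class 0) and P(class 1).
proba_a = np.array([[0.80, 0.20], [0.40, 0.60], [0.55, 0.45]])
proba_b = np.array([[0.70, 0.30], [0.30, 0.70], [0.35, 0.65]])

# Soft voting: average the probabilities across classifiers...
avg_proba = (proba_a + proba_b) / 2

# ...and predict the class with the highest averaged probability.
soft_vote = avg_proba.argmax(axis=1)

print(avg_proba)
print(soft_vote)
```

For the third sample the two classifiers disagree, and the more confident one decides the averaged vote.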

In [None]:

from sklearn.ensemble import VotingClassifier

clf_1 = XGBClassifier(learning_rate=0.07 ,max_depth=5,min_child_weight=3,gamma=0.0,colsample_bytree=0.3,
         n_estimators=500 )
clf_2 = CatBoostClassifier(iterations=80,depth=8,learning_rate=0.1,bagging_temperature=0.9,border_count=202,
        l2_leaf_reg=20,scale_pos_weight=1)
voting_class = VotingClassifier(estimators=[('xgb',clf_1),('cat',clf_2)],voting='soft')
voting_class.fit(x_train,y_train.values.ravel())
voting_pred = voting_class.predict(x_test)
print(classification_report(y_test,voting_pred,target_names=['0','1']))

In [None]:
from sklearn.metrics import roc_auc_score
print(accuracy_score(y_test,voting_pred))

<h4> Stacking Classifier </h4>
<h5> <i>I have implemented a stacking classifier using three classification techniques, i.e. XGBOOST, CATBOOST and LGBM, with tuned hyper-parameters.</i></h5>
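In stacking, a final estimator is trained on cross-validated predictions of the base learners. The sketch below illustrates the mechanism with scikit-learn stand-ins (a random forest and a decision tree instead of XGBoost, CatBoost and LightGBM) on synthetic data; all names in it are hypothetical:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data standing in for the scaled training features.
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0)

# Stand-in base learners (the notebook uses XGBoost, CatBoost and LightGBM).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
stack_demo = StackingClassifier(
    estimators=base_learners, final_estimator=LogisticRegression(), cv=5)
stack_demo.fit(X_fit, y_fit)

# Probability of the positive class on the held-out rows.
hold_proba = stack_demo.predict_proba(X_hold)[:, 1]
print(hold_proba[:5])
```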

In [None]:
from sklearn.ensemble import StackingClassifier
clf_1 = XGBClassifier(learning_rate=0.07 ,max_depth=5,min_child_weight=3,gamma=0.0,colsample_bytree=0.3,
        n_estimators=500, use_label_encoder=False,eval_metric='logloss' )
clf_2 = CatBoostClassifier(iterations=80,depth=8,learning_rate=0.1,bagging_temperature=0.9,border_count=202,
        l2_leaf_reg=20,scale_pos_weight=1)
clf_3 = LGBMClassifier(n_estimators= 900, max_depth = 5, learning_rate = 0.01)
estimators = [('xgb', clf_1),('cat',clf_2),('lgbm',clf_3)]
stack_model = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
stack_model.fit(x_train,y_train.values.ravel())
stack_pred = stack_model.predict(x_test)
print(classification_report(y_test,stack_pred,target_names=['0','1']))

In [None]:
from sklearn.metrics import roc_auc_score
print(accuracy_score(y_test,stack_pred))

<h4><strong> Since the Stacking Classifier gives more accurate results than the other models, it is used as the final model for prediction.</strong> </h4>
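The models above are compared on accuracy, which can hide differences on an imbalanced target. `roc_auc_score` is imported in this notebook but never called; the sketch below, on hypothetical labels and scores, shows how it can separate two models with identical accuracy:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 8 negatives, 2 positives.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# A trivial model that scores every row identically and predicts class 0.
constant_scores = np.zeros(10)
constant_labels = np.zeros(10, dtype=int)

# A model whose scores rank both positives above every negative,
# although no score reaches the 0.5 threshold.
ranked_scores = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.3, 0.2, 0.4, 0.45])
ranked_labels = (ranked_scores >= 0.5).astype(int)

# Accuracy is the same for both models; ROC-AUC is not.
print(accuracy_score(y_true, constant_labels), roc_auc_score(y_true, constant_scores))
print(accuracy_score(y_true, ranked_labels), roc_auc_score(y_true, ranked_scores))
```

In the notebook, the equivalent check would pass `predict_proba(x_test)[:, 1]` of each candidate model to `roc_auc_score` alongside the accuracy figures.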

<h4> Prediction on Test Set </h4>

In [None]:
# Probability of the positive class (column 1 of predict_proba), named to match the target.
final_predictions = pd.Series(stack_model.predict_proba(X_test_scaled)[:, 1], name="Is_Lead")
final_predictions = pd.concat([test["ID"],final_predictions],axis=1)
final_predictions.head()