**Credit Card Lead Prediction**

Happy Customer Bank is a mid-sized private bank that deals in all kinds of banking products, such as savings accounts, current accounts, investment products, and credit products, among other offerings.



The bank also cross-sells products to its existing customers through different communication channels, such as tele-calling, e-mails, recommendations on net banking, mobile banking, etc.



In this case, Happy Customer Bank wants to cross-sell its credit cards to its existing customers. The bank has identified a set of customers who are eligible for these credit cards.



Now, the bank is looking for your help in identifying customers who could show higher intent towards a recommended credit card, given:

Customer details (gender, age, region, etc.)
Details of the customer's relationship with the bank (Channel_Code, Vintage, Avg_Asset_Value, etc.)

**Loading the dataset**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


In [None]:
df=pd.read_csv('../input/jobathon-may-2021-credit-card-lead-prediction/train.csv')
test=pd.read_csv('../input/jobathon-may-2021-credit-card-lead-prediction/test.csv')

In [None]:
df.isna().sum()

In [None]:
test.isna().sum()

# **Filling the NaN values with 'Others', as the bank is unaware whether the customer has any active credit product**

In [None]:
df.fillna('Others', inplace=True)
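
Note that `df.fillna('Others')` fills every column, not only `Credit_Product`. A narrower variant that targets just that column could look like the sketch below. It runs on a made-up toy frame, so the values are illustrative assumptions rather than rows from the competition data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the training data (values are illustrative only).
toy = pd.DataFrame({
    "Credit_Product": ["Yes", np.nan, "No", np.nan],
    "Vintage": [12, 40, np.nan, 7],
})

# Fill only the categorical column; other columns keep their NaN values.
toy["Credit_Product"] = toy["Credit_Product"].fillna("Others")

missing_after = toy.isna().sum()
print(missing_after)
```

Restricting the fill to one column avoids accidentally writing the string `'Others'` into a numeric column, which would turn it into an object column.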


# As the average account balance is continuous and widely spread, it is combined into 10 quantile bins that are used as categories

In [None]:
df.Avg_Account_Balance=pd.qcut(df.Avg_Account_Balance, q=10, labels=[0,1,2,3,4,5,6,7,8,9])
df.head(10)
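
To make the effect of `pd.qcut` concrete, here is a small sketch on synthetic, right-skewed balances. The log-normal data is an assumption chosen only to mimic a wide spread; it is not the real column.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed balances (hypothetical stand-in for Avg_Account_Balance).
rng = np.random.default_rng(42)
balance = pd.Series(rng.lognormal(mean=13.0, sigma=1.0, size=1_000))

# Cut at the deciles, so each of the 10 bins holds roughly the same number of rows.
deciles = pd.qcut(balance, q=10, labels=list(range(10)))

counts = deciles.value_counts().sort_index()
print(counts)
```

Because the cut points are quantiles, the bins are about equally populated even though the raw values are heavily skewed.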

**Label Encoding of the data**

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['Gender'] = le.fit_transform(df['Gender'])
df['Region_Code'] = le.fit_transform(df['Region_Code'])
df['Occupation'] = le.fit_transform(df['Occupation'])
df['Channel_Code'] = le.fit_transform(df['Channel_Code'])
df['Credit_Product'] = le.fit_transform(df['Credit_Product'])
df['Is_Active'] = le.fit_transform(df['Is_Active'])
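
The encoding above reuses one `LabelEncoder` and refits it for each column, so no fitted mapping is kept for later use. One possible alternative, sketched here on toy frames, is scikit-learn's `OrdinalEncoder`: it is fitted once on the training columns and then applied unchanged to other data. The column values and the `-1` code for unseen categories are assumptions made for this illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

cat_cols = ["Gender", "Occupation"]

# Toy train/test frames (values are illustrative only).
toy_train = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Occupation": ["Salaried", "Other", "Entrepreneur"],
})
toy_test = pd.DataFrame({
    "Gender": ["Female", "Male"],
    "Occupation": ["Other", "Self_Employed"],  # "Self_Employed" is unseen in train
})

# Learn the category-to-code mapping on the training data only;
# categories not seen during fit are encoded as -1.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
train_enc = pd.DataFrame(encoder.fit_transform(toy_train[cat_cols]), columns=cat_cols)

# Apply the same mapping to the test data without refitting.
test_enc = pd.DataFrame(encoder.transform(toy_test[cat_cols]), columns=cat_cols)
print(test_enc)
```

Keeping a single fitted encoder guarantees that a given category receives the same code in both data sets.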

In [None]:
df.isna().sum()

In [None]:
df.to_csv('train_extracted.csv',index=False)

# Data Visualization using AutoViz

In [None]:
!pip install xlrd

In [None]:
!pip install autoviz


In [None]:
from autoviz.AutoViz_Class import AutoViz_Class

AV = AutoViz_Class()

In [None]:
filename = "./train_extracted.csv"
sep = ","
dft = AV.AutoViz(
    filename,
    sep=sep,
    depVar="",
    dfte=None,
    header=0,
    verbose=0,
    lowess=False,
    chart_format="svg",
    max_cols_analyzed=30,
)


In [None]:
train=pd.read_csv('./train_extracted.csv')


# Removing the outliers

In [None]:
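# Note: Avg_Account_Balance already holds the decile labels 0-9 at this point, so the
# 0.99 quantile should evaluate to 9 and this filter drops the whole top decile.
train=train[train.Avg_Account_Balance<train.Avg_Account_Balance.quantile(0.99)]
train.isna().sum()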
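
The filter above runs on a column that was already binned into labels 0 to 9 before `train_extracted.csv` was written. If the intent is to trim only the top 1% of raw balances, one option is to filter before binning. The sketch below shows that order on synthetic data; the log-normal values are an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic raw balances (hypothetical stand-in for the unbinned column).
rng = np.random.default_rng(1)
raw = pd.DataFrame({"Avg_Account_Balance": rng.lognormal(13.0, 1.0, 1_000)})

# Step 1: drop rows at or above the 99th percentile of the raw values.
cutoff = raw["Avg_Account_Balance"].quantile(0.99)
trimmed = raw[raw["Avg_Account_Balance"] < cutoff].copy()

# Step 2: bin the remaining balances into deciles.
trimmed["Avg_Account_Balance"] = pd.qcut(
    trimmed["Avg_Account_Balance"], q=10, labels=False
)
print(len(raw), len(trimmed))
```

Done in this order, roughly 1% of the rows are removed instead of a full decile.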

In [None]:
train.fillna('Others', inplace=True)

In [None]:
del train['ID']

### Test data

In [None]:
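# Assign the result back: calling fillna(..., inplace=True) on a selected column is
# unreliable in recent pandas versions and may leave the DataFrame unchanged.
test['Credit_Product'] = test['Credit_Product'].fillna('Others')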


In [None]:
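# Note: the decile edges are computed from the test set itself, so they may differ
# from the edges used for the training data.
test.Avg_Account_Balance=pd.qcut(test.Avg_Account_Balance, q=10, labels=[0,1,2,3,4,5,6,7,8,9])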
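
Because `pd.qcut` derives its edges from whatever data it receives, binning the test set separately can give deciles that do not line up with the training bins. A possible alternative is to learn the edges once with `retbins=True` and reuse them through `pd.cut`. This is a minimal sketch on synthetic series, where `train_bal` and `test_bal` are hypothetical stand-ins for the raw balance columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
train_bal = pd.Series(rng.lognormal(mean=13.0, sigma=1.0, size=1_000))
test_bal = pd.Series(rng.lognormal(mean=13.0, sigma=1.0, size=200))

# Learn the decile edges on the training values only.
train_bins, edges = pd.qcut(train_bal, q=10, labels=False, retbins=True)

# Open up the outer edges so unseen extremes fall into the first or last bin.
edges[0], edges[-1] = -np.inf, np.inf

# Reuse the training edges for the test values.
test_bins = pd.cut(test_bal, bins=edges, labels=False)
print(test_bins.value_counts().sort_index())
```

With shared edges, bin `3` means the same balance range in both data sets.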

In [None]:
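# LabelEncoder assigns codes in sorted order of the values it sees, so refitting on the
# test set reproduces the training codes only if each column holds the same set of categories.
test['Gender'] = le.fit_transform(test['Gender'])
test['Region_Code'] = le.fit_transform(test['Region_Code'])
test['Occupation'] = le.fit_transform(test['Occupation'])
test['Channel_Code'] = le.fit_transform(test['Channel_Code'])
test['Credit_Product'] = le.fit_transform(test['Credit_Product'])
test['Is_Active'] = le.fit_transform(test['Is_Active'])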

In [None]:
test.to_csv('test_extracted.csv',index=False)

In [None]:
sub_df=pd.read_csv('./test_extracted.csv')

In [None]:
sub_df.isna().sum()

In [None]:
sub_df.fillna('NA', inplace=True)

In [None]:
# Features are every column except the target; the target is Is_Lead
X=train.drop(columns=['Is_Lead'])
y=train.Is_Lead

# Data split and model building

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score


In [None]:
# 70/30 hold-out split; stratification is left disabled here (pass stratify=y to enable it)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
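
For an imbalanced target, enabling `stratify` keeps the class ratio the same in both parts of the split. The sketch below shows this on synthetic data; the 25% positive rate is an assumption for illustration, not a figure from the competition data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and an imbalanced binary target (illustrative only).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(1_000, 3))
y_toy = (rng.random(1_000) < 0.25).astype(int)

# stratify=y_toy preserves the share of positives in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.30, random_state=42, stratify=y_toy
)
print(y_tr.mean(), y_te.mean())
```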

In [None]:
from sklearn.ensemble import BaggingClassifier
from lightgbm import LGBMClassifier

# Base learner: LightGBM with explicit settings.
# `silent=True` is omitted because recent LightGBM releases no longer accept it.
base_model = LGBMClassifier(boosting_type='gbdt', class_weight=None,
                            colsample_bytree=1.0, importance_type='split',
                            learning_rate=0.1, max_depth=-1,
                            min_child_samples=20, min_child_weight=0.001,
                            min_split_gain=0.0, n_estimators=100, n_jobs=-1,
                            num_leaves=31, objective=None, random_state=123,
                            reg_alpha=0.0, reg_lambda=0.0, subsample=1.0,
                            subsample_for_bin=200000, subsample_freq=0)

# Bag 10 copies of the base learner, each fitted on a bootstrap sample of the rows.
# The base learner is passed positionally because scikit-learn 1.2 renamed
# `base_estimator` to `estimator`.
model = BaggingClassifier(base_model,
                          bootstrap=True, bootstrap_features=False, max_features=1.0,
                          max_samples=1.0, n_estimators=10, n_jobs=1, oob_score=False,
                          random_state=123, verbose=0, warm_start=False)

model.fit(X_train, y_train)
model.score(X_test, y_test)

In [None]:
roc_auc_score(y_true=y_test, y_score=model.predict_proba(X_test)[:, 1])
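
A single hold-out split gives one AUC estimate. A cross-validated score is one way to see how stable that estimate is. The sketch below uses synthetic data and a plain `LogisticRegression` as a stand-in model, both assumptions made to keep the example self-contained; any classifier exposing `predict_proba` could be substituted.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary classification data (illustrative only).
X_toy, y_toy = make_classification(n_samples=1_000, n_features=8, random_state=0)

# Five stratified folds, scored with ROC AUC on each held-out fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    LogisticRegression(max_iter=1_000), X_toy, y_toy, cv=cv, scoring="roc_auc"
)
print(scores.mean(), scores.std())
```

The spread of the fold scores indicates how much the hold-out AUC could vary with a different split.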

In [None]:
data = model.predict_proba(sub_df.loc[:, sub_df.columns != 'ID'])
values = pd.DataFrame({'class_0': data[:, 0], 'class_1': data[:, 1]})

In [None]:
sub_df['Is_Lead']=values['class_1']

In [None]:
sub_df

In [None]:
sub_df[['ID','Is_Lead']].to_csv('sub.csv', index=False)