**Description:**

Project Description

**Project Name:** Churn Prediction - Die Zeit

**Team:** Carlotta Ulm, Silas Mederer, Jonas Bechthold

**Date:** 2020-10-26 to ...

# Setting up environment and imports

In [20]:
# data analysis and wrangling
import pandas as pd
import numpy as np
import math
import itertools
from time import time

# visualization
import seaborn as sns
sns.set(style="white")   

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')
from pandas.plotting import scatter_matrix

# warnings handler
import warnings
warnings.filterwarnings("ignore")

random_state = 100           # Ensures modeling results can be replicated
np.random.seed(42)

# Display Options for pandas
pd.set_option('display.max_columns', None) # Sets maximum columns displayed in tables
pd.set_option('display.max_rows', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)  # None = no truncation; -1 is deprecated since pandas 1.0

# Variables for plot sizes
matplotlib.rc('font', size=20)          # controls default text sizes
matplotlib.rc('axes', titlesize=16)     # fontsize of the axes title
matplotlib.rc('axes', labelsize=18)    # fontsize of the x and y labels
matplotlib.rc('xtick', labelsize=18)    # fontsize of the tick labels
matplotlib.rc('ytick', labelsize=18)    # fontsize of the tick labels
matplotlib.rc('legend', fontsize=14)    # legend fontsize
matplotlib.rc('figure', titlesize=20)

####################################################
# Machine Learning Libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import fbeta_score, accuracy_score, f1_score, recall_score, precision_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer 
from sklearn.model_selection import KFold
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score

#Pipeline
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Business Understanding 

## General

## Background


## Key Questions

## Dataset Description

Let's get an idea about the columns and find out what they mean:

In [21]:
df = pd.read_csv('data/f_chtr_churn_traintable_nf.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161760 entries, 0 to 161759
Columns: 170 entries, Unnamed: 0 to date_x
dtypes: float64(31), int64(121), object(18)
memory usage: 209.8+ MB


In [22]:
df.head()

Unnamed: 0.1,Unnamed: 0,auftrag_new_id,liefer_beginn_evt,kanal,objekt_name,aboform_name,zahlung_rhythmus_name,lesedauer,rechnungsmonat,zahlung_weg_name,studentenabo,plz_1,plz_2,plz_3,ort,metropole,land_iso_code,shop_kauf,unterbrechung,anrede,titel,avg_churn,zon_che_opt_in,zon_sit_opt_in,zon_zp_grey,zon_premium,zon_boa,zon_kommentar,zon_sonstige,zon_zp_red,zon_rawr,zon_community,zon_app_sonstige,zon_schach,zon_blog_kommentare,zon_quiz,cnt_abo,cnt_abo_diezeit,cnt_abo_diezeit_digital,cnt_abo_magazin,cnt_umwandlungsstatus2_dkey,abo_registrierung_min,nl_zeitbrief,nl_zeitshop,nl_zeitverlag_hamburg,nl_fdz_organisch,nl_blacklist_sum,nl_bounced_sum,nl_aktivitaet,nl_registrierung_min,nl_sperrliste_sum,nl_opt_in_sum,boa_reg,che_reg,sit_reg,sso_reg,received_anzahl_1w,received_anzahl_1m,received_anzahl_3m,received_anzahl_6m,opened_anzahl_1w,opened_anzahl_1m,opened_anzahl_3m,openedanzahl_6m,clicked_anzahl_1w,clicked_anzahl_1m,clicked_anzahl_3m,clicked_anzahl_6m,unsubscribed_anzahl_1w,unsubscribed_anzahl_1m,unsubscribed_anzahl_3m,unsubscribed_anzahl_6m,openrate_1w,clickrate_1w,openrate_1m,clickrate_1m,openrate_3m,clickrate_3m,received_anzahl_bestandskunden_1w,received_anzahl_bestandskunden_1m,received_anzahl_bestandskunden_3m,received_anzahl_bestandskunden_6m,opened_anzahl_bestandskunden_1w,opened_anzahl_bestandskunden_1m,opened_anzahl_bestandskunden_3m,openedanzahl_bestandskunden_6m,clicked_anzahl_bestandskunden_1w,clicked_anzahl_bestandskunden_1m,clicked_anzahl_bestandskunden_3m,clicked_anzahl_bestandskunden_6m,unsubscribed_anzahl_bestandskunden_1w,unsubscribed_anzahl_bestandskunden_1m,unsubscribed_anzahl_bestandskunden_3m,unsubscribed_anzahl_bestandskunden_6m,openrate_bestandskunden_1w,clickrate_bestandskunden_1w,openrate_bestandskunden_1m,clickrate_bestandskunden_1m,openrate_bestandskunden_3m,clickrate_bestandskunden_3m,received_anzahl_produktnews_1w,received_anzahl_produktnews_1m,received_anzahl_produktnews_3m,received_anzahl_produktnews_6m,opened_anzahl_produktnews_1w,opened
_anzahl_produktnews_1m,opened_anzahl_produktnews_3m,openedanzahl_produktnews_6m,clicked_anzahl_produktnews_1w,clicked_anzahl_produktnews_1m,clicked_anzahl_produktnews_3m,clicked_anzahl_produktnews_6m,unsubscribed_anzahl_produktnews_1w,unsubscribed_anzahl_produktnews_1m,unsubscribed_anzahl_produktnews_3m,unsubscribed_anzahl_produktnews_6m,openrate_produktnews_1w,clickrate_produktnews_1w,openrate_produktnews_1m,clickrate_produktnews_1m,openrate_produktnews_3m,clickrate_produktnews_3m,received_anzahl_hamburg_1w,received_anzahl_hamburg_1m,received_anzahl_hamburg_3m,received_anzahl_hamburg_6m,opened_anzahl_hamburg_1w,opened_anzahl_hamburg_1m,opened_anzahl_hamburg_3m,openedanzahl_hamburg_6m,clicked_anzahl_hamburg_1w,clicked_anzahl_hamburg_1m,clicked_anzahl_hamburg_3m,clicked_anzahl_hamburg_6m,unsubscribed_anzahl_hamburg_1w,unsubscribed_anzahl_hamburg_1m,unsubscribed_anzahl_hamburg_3m,unsubscribed_anzahl_hamburg_6m,openrate_hamburg_1w,clickrate_hamburg_1w,openrate_hamburg_1m,clickrate_hamburg_1m,openrate_hamburg_3m,clickrate_hamburg_3m,received_anzahl_zeitbrief_1w,received_anzahl_zeitbrief_1m,received_anzahl_zeitbrief_3m,received_anzahl_zeitbrief_6m,opened_anzahl_zeitbrief_1w,opened_anzahl_zeitbrief_1m,opened_anzahl_zeitbrief_3m,openedanzahl_zeitbrief_6m,clicked_anzahl_zeitbrief_1w,clicked_anzahl_zeitbrief_1m,clicked_anzahl_zeitbrief_3m,clicked_anzahl_zeitbrief_6m,unsubscribed_anzahl_zeitbrief_1w,unsubscribed_anzahl_zeitbrief_1m,unsubscribed_anzahl_zeitbrief_3m,unsubscribed_anzahl_zeitbrief_6m,openrate_zeitbrief_1w,clickrate_zeitbrief_1w,openrate_zeitbrief_1m,clickrate_zeitbrief_1m,openrate_zeitbrief_3m,clickrate_zeitbrief_3m,training_set,kuendigungs_eingangs_datum,churn,date_x
0,0,8F55996E-22DD-4450-808F-9F2410C65F0C,2018-08-29,E-Mailing,ZEIT Digital,Negative Option,vierteljährlich,19,0,Bankeinzug,0,3,37,372,Neu-Eichenberg,0,DE,0,0,Herr,kein Titel,0.412416,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,2,0,0,2,2015-03-12 12:33:54,0,0,0,0,0,0,10,2013-03-13 14:50:27,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,,0,2020-03-26 00:00:00
1,1,FDC16301-8457-4FCE-A630-172D027B4FA6,2018-08-29,andere,ZEIT Digital,Negative Option,halbjährlich,21,0,Rechnung,0,4,45,451,Essen,1,DE,0,0,Herr,kein Titel,0.373177,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2018-07-30 03:32:59,2,0,0,0,0,0,5,2018-07-29 21:36:44,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,2020-05-11,1,2020-05-11 00:00:00
2,2,1B81B3D5-5D6A-4AE7-9AF2-DB5BE537CD90,2018-10-31,Telefonmarketing,DIE ZEIT,Festabo,vierteljährlich,17,0,Bankeinzug,0,4,40,406,Düsseldorf,1,DE,2,1,Frau,kein Titel,0.472616,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,1986-07-01 12:00:00,0,2,0,0,0,0,4,2009-12-06 00:00:00,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,2020-03-18,1,2020-03-18 00:00:00
3,3,CF5D1C66-D1B5-454E-B3AD-541792B56EBF,2018-08-01,andere,ZEIT Digital,Festabo,vierteljährlich,17,0,Bankeinzug,0,8,82,825,Münsing,0,DE,0,0,Frau,kein Titel,0.472616,0,0,0,1,0,0,0,0,0,0,0,0,0,0,7,3,1,3,3,2012-01-04 14:41:07,2,0,0,0,0,0,7,2018-04-12 08:20:27,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,,0,2020-01-12 00:00:00
4,4,0CE98B1B-1167-4967-B68C-559AEBA1001B,2018-08-29,andere,ZEIT Digital,Negative Option,halbjährlich,11,0,Rechnung,0,7,70,701,Stuttgart,1,DE,0,0,Frau,kein Titel,0.534802,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2018-07-30 08:21:34,2,0,0,0,0,0,5,2018-07-26 06:32:30,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,1,,0,2019-07-11 00:00:00


In [None]:
df.tail()

In [23]:
df.drop(columns="Unnamed: 0", inplace=True)  # leftover index column from the CSV export
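
An `Unnamed: 0` column usually appears when a DataFrame was saved with its row index and then read back without declaring it. A minimal sketch, assuming that is what happened with this file, reads the first column as the index so there is nothing to drop afterwards. The inline CSV is made-up illustration data, not part of the project dataset.

```python
import io
import pandas as pd

# Made-up CSV that mimics a file written by DataFrame.to_csv() including its index
csv_text = ",auftrag_new_id,churn\n0,A1,0\n1,B2,1\n"

# index_col=0 turns the first (unnamed) column into the index,
# so no 'Unnamed: 0' column is created in the first place
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(demo.columns))
```

Whether this applies to `data/f_chtr_churn_traintable_nf.csv` depends on how that file was exported; dropping the column after loading gives the same result.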

In [24]:
columnlist = list(df.columns) 

In [25]:
columnlist

['auftrag_new_id',
 'liefer_beginn_evt',
 'kanal',
 'objekt_name',
 'aboform_name',
 'zahlung_rhythmus_name',
 'lesedauer',
 'rechnungsmonat',
 'zahlung_weg_name',
 'studentenabo',
 'plz_1',
 'plz_2',
 'plz_3',
 'ort',
 'metropole',
 'land_iso_code',
 'shop_kauf',
 'unterbrechung',
 'anrede',
 'titel',
 'avg_churn',
 'zon_che_opt_in',
 'zon_sit_opt_in',
 'zon_zp_grey',
 'zon_premium',
 'zon_boa',
 'zon_kommentar',
 'zon_sonstige',
 'zon_zp_red',
 'zon_rawr',
 'zon_community',
 'zon_app_sonstige',
 'zon_schach',
 'zon_blog_kommentare',
 'zon_quiz',
 'cnt_abo',
 'cnt_abo_diezeit',
 'cnt_abo_diezeit_digital',
 'cnt_abo_magazin',
 'cnt_umwandlungsstatus2_dkey',
 'abo_registrierung_min',
 'nl_zeitbrief',
 'nl_zeitshop',
 'nl_zeitverlag_hamburg',
 'nl_fdz_organisch',
 'nl_blacklist_sum',
 'nl_bounced_sum',
 'nl_aktivitaet',
 'nl_registrierung_min',
 'nl_sperrliste_sum',
 'nl_opt_in_sum',
 'boa_reg',
 'che_reg',
 'sit_reg',
 'sso_reg',
 'received_anzahl_1w',
 'received_anzahl_1m',
 'received_

In [27]:
df.describe().round(2)

Unnamed: 0,lesedauer,rechnungsmonat,studentenabo,metropole,shop_kauf,unterbrechung,avg_churn,zon_che_opt_in,zon_sit_opt_in,zon_zp_grey,zon_premium,zon_boa,zon_kommentar,zon_sonstige,zon_zp_red,zon_rawr,zon_community,zon_app_sonstige,zon_schach,zon_blog_kommentare,zon_quiz,cnt_abo,cnt_abo_diezeit,cnt_abo_diezeit_digital,cnt_abo_magazin,cnt_umwandlungsstatus2_dkey,nl_zeitbrief,nl_zeitshop,nl_zeitverlag_hamburg,nl_fdz_organisch,nl_blacklist_sum,nl_bounced_sum,nl_aktivitaet,nl_sperrliste_sum,nl_opt_in_sum,boa_reg,che_reg,sit_reg,sso_reg,received_anzahl_1w,received_anzahl_1m,received_anzahl_3m,received_anzahl_6m,opened_anzahl_1w,opened_anzahl_1m,opened_anzahl_3m,openedanzahl_6m,clicked_anzahl_1w,clicked_anzahl_1m,clicked_anzahl_3m,clicked_anzahl_6m,unsubscribed_anzahl_1w,unsubscribed_anzahl_1m,unsubscribed_anzahl_3m,unsubscribed_anzahl_6m,openrate_1w,clickrate_1w,openrate_1m,clickrate_1m,openrate_3m,clickrate_3m,received_anzahl_bestandskunden_1w,received_anzahl_bestandskunden_1m,received_anzahl_bestandskunden_3m,received_anzahl_bestandskunden_6m,opened_anzahl_bestandskunden_1w,opened_anzahl_bestandskunden_1m,opened_anzahl_bestandskunden_3m,openedanzahl_bestandskunden_6m,clicked_anzahl_bestandskunden_1w,clicked_anzahl_bestandskunden_1m,clicked_anzahl_bestandskunden_3m,clicked_anzahl_bestandskunden_6m,unsubscribed_anzahl_bestandskunden_1w,unsubscribed_anzahl_bestandskunden_1m,unsubscribed_anzahl_bestandskunden_3m,unsubscribed_anzahl_bestandskunden_6m,openrate_bestandskunden_1w,clickrate_bestandskunden_1w,openrate_bestandskunden_1m,clickrate_bestandskunden_1m,openrate_bestandskunden_3m,clickrate_bestandskunden_3m,received_anzahl_produktnews_1w,received_anzahl_produktnews_1m,received_anzahl_produktnews_3m,received_anzahl_produktnews_6m,opened_anzahl_produktnews_1w,opened_anzahl_produktnews_1m,opened_anzahl_produktnews_3m,openedanzahl_produktnews_6m,clicked_anzahl_produktnews_1w,clicked_anzahl_produktnews_1m,clicked_anzahl_produktnews_3m,clicked_anzahl_produktnews_6m,unsubscr
ibed_anzahl_produktnews_1w,unsubscribed_anzahl_produktnews_1m,unsubscribed_anzahl_produktnews_3m,unsubscribed_anzahl_produktnews_6m,openrate_produktnews_1w,clickrate_produktnews_1w,openrate_produktnews_1m,clickrate_produktnews_1m,openrate_produktnews_3m,clickrate_produktnews_3m,received_anzahl_hamburg_1w,received_anzahl_hamburg_1m,received_anzahl_hamburg_3m,received_anzahl_hamburg_6m,opened_anzahl_hamburg_1w,opened_anzahl_hamburg_1m,opened_anzahl_hamburg_3m,openedanzahl_hamburg_6m,clicked_anzahl_hamburg_1w,clicked_anzahl_hamburg_1m,clicked_anzahl_hamburg_3m,clicked_anzahl_hamburg_6m,unsubscribed_anzahl_hamburg_1w,unsubscribed_anzahl_hamburg_1m,unsubscribed_anzahl_hamburg_3m,unsubscribed_anzahl_hamburg_6m,openrate_hamburg_1w,clickrate_hamburg_1w,openrate_hamburg_1m,clickrate_hamburg_1m,openrate_hamburg_3m,clickrate_hamburg_3m,received_anzahl_zeitbrief_1w,received_anzahl_zeitbrief_1m,received_anzahl_zeitbrief_3m,received_anzahl_zeitbrief_6m,opened_anzahl_zeitbrief_1w,opened_anzahl_zeitbrief_1m,opened_anzahl_zeitbrief_3m,openedanzahl_zeitbrief_6m,clicked_anzahl_zeitbrief_1w,clicked_anzahl_zeitbrief_1m,clicked_anzahl_zeitbrief_3m,clicked_anzahl_zeitbrief_6m,unsubscribed_anzahl_zeitbrief_1w,unsubscribed_anzahl_zeitbrief_1m,unsubscribed_anzahl_zeitbrief_3m,unsubscribed_anzahl_zeitbrief_6m,openrate_zeitbrief_1w,clickrate_zeitbrief_1w,openrate_zeitbrief_1m,clickrate_zeitbrief_1m,openrate_zeitbrief_3m,clickrate_zeitbrief_3m,training_set,churn
count,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0,161760.0
mean,31.5,0.1,0.11,0.29,0.46,0.1,0.32,0.0,0.0,0.16,0.16,0.01,0.0,0.05,0.04,0.0,0.0,0.0,0.0,0.0,0.0,7.11,4.63,0.5,0.59,2.76,1.37,0.29,0.12,0.0,0.05,0.13,6.65,0.08,0.27,0.07,0.09,0.07,0.68,1.86,7.15,20.44,40.4,0.61,2.3,6.47,12.59,0.06,0.23,0.64,1.2,0.0,0.0,0.01,0.02,0.11,0.02,0.12,0.03,0.13,0.05,0.03,0.08,0.18,0.2,0.01,0.03,0.07,0.08,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.03,0.0,0.02,0.07,0.16,0.17,0.01,0.04,0.08,0.08,0.0,0.01,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.04,0.01,0.13,0.51,1.47,2.89,0.06,0.23,0.64,1.26,0.01,0.04,0.11,0.2,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.23,0.89,2.66,5.4,0.09,0.36,1.05,2.1,0.01,0.04,0.1,0.2,0.0,0.0,0.0,0.01,0.08,0.01,0.08,0.01,0.08,0.02,1.0,0.31
std,21.43,0.3,0.31,0.45,2.21,0.3,0.15,0.05,0.03,0.46,0.46,0.1,0.07,0.26,0.25,0.02,0.01,0.06,0.02,0.03,0.02,166.69,96.44,11.09,10.63,71.74,0.88,0.67,0.45,0.02,0.44,0.6,4.36,1.48,0.62,0.26,0.28,0.26,0.47,3.7,13.9,38.84,75.53,2.07,7.29,19.97,38.36,0.38,1.2,3.04,5.62,0.07,0.07,0.14,0.19,0.3,0.12,0.27,0.15,0.27,0.16,0.16,0.29,0.63,0.73,0.1,0.18,0.36,0.4,0.03,0.06,0.09,0.09,0.02,0.02,0.03,0.03,0.1,0.03,0.18,0.06,0.16,0.06,0.15,0.38,0.62,0.66,0.11,0.26,0.42,0.44,0.04,0.1,0.16,0.16,0.01,0.01,0.02,0.02,0.09,0.04,0.15,0.06,0.19,0.09,0.86,3.27,9.35,18.3,0.54,1.99,5.57,10.8,0.17,0.57,1.53,2.84,0.01,0.01,0.02,0.02,0.1,0.04,0.09,0.04,0.09,0.05,0.5,1.82,5.28,10.5,0.34,1.08,2.98,5.82,0.1,0.26,0.62,1.15,0.03,0.03,0.05,0.07,0.29,0.09,0.24,0.1,0.23,0.11,0.0,0.46
min,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
25%,13.0,0.0,0.0,0.0,0.0,0.0,0.19,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
50%,26.0,0.0,0.0,0.0,0.0,0.0,0.29,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
75%,46.0,0.0,0.0,1.0,0.0,0.0,0.41,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,0.0,0.0,1.0,2.0,0.0,0.0,0.0,0.0,0.0,9.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,8.0,26.0,54.0,0.0,1.0,2.0,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0
max,88.0,1.0,1.0,1.0,153.0,1.0,0.7,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,7062.0,3676.0,503.0,426.0,3122.0,2.0,2.0,2.0,1.0,22.0,15.0,36.0,136.0,19.0,1.0,1.0,1.0,1.0,53.0,196.0,578.0,1133.0,89.0,219.0,566.0,1119.0,17.0,60.0,163.0,340.0,4.0,4.0,9.0,11.0,18.0,2.0,8.22,2.0,5.5,2.0,1.0,2.0,3.0,5.0,1.0,2.0,4.0,5.0,1.0,2.0,2.0,3.0,1.0,1.0,1.0,1.0,1.0,1.0,2.0,1.0,2.0,1.0,4.0,9.0,13.0,16.0,4.0,9.0,12.0,13.0,3.0,5.0,6.0,6.0,1.0,1.0,1.0,1.0,3.0,1.0,2.0,3.0,1.0,1.0,8.0,29.0,72.0,126.0,16.0,41.0,73.0,177.0,9.0,25.0,72.0,123.0,1.0,1.0,1.0,1.0,3.2,1.0,1.86,1.1,1.11,1.05,3.0,6.0,15.0,28.0,12.0,35.0,45.0,55.0,3.0,7.0,14.0,29.0,1.0,1.0,1.0,1.0,11.0,2.0,8.25,2.0,3.46,1.5,1.0,1.0


## Get an idea of the column names by sampling

In [None]:
df.sample(2)

## Target Metric


## Business Scenario

# Data Mining

# Data Cleaning
Purpose: Fix inconsistencies in the data and handle missing values.
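
As a starting point for this step, the sketch below reports the share of missing values and the number of distinct values per column. It is an illustration under assumptions, not the team's cleaning procedure: the helper `summarize_cleaning_candidates` is a hypothetical name, and the small frame only mimics patterns visible in the outputs above (`training_set` is constant at 1, `kuendigungs_eingangs_datum` is empty for customers who did not churn).

```python
import numpy as np
import pandas as pd

def summarize_cleaning_candidates(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column missing share and number of distinct values (hypothetical helper)."""
    summary = pd.DataFrame({
        "missing_share": frame.isna().mean(),
        "n_unique": frame.nunique(dropna=True),
    })
    # Columns with at most one distinct non-missing value carry no information
    summary["single_value"] = summary["n_unique"] <= 1
    return summary.sort_values("missing_share", ascending=False)

# Made-up rows for illustration only
demo = pd.DataFrame({
    "training_set": [1, 1, 1, 1],
    "kuendigungs_eingangs_datum": [np.nan, "2020-05-11", "2020-03-18", np.nan],
    "churn": [0, 1, 1, 0],
})
report = summarize_cleaning_candidates(demo)
print(report)
```

Applied to the full table, the same call would be `summarize_cleaning_candidates(df)`; columns flagged as `single_value` are natural candidates for the drop step that follows.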

## Drop empty, useless and single-value columns

## Handling missing values

## Replace missing values

## Conclusion

# Data Exploration - EDA


## Target Variable Analysis

## Feature Analysis

## Observation of histograms for distribution characteristics

## Correlogram for continuous variables - Heatmap

## Overall Skew and Kurtosis of the data

## Final Feature Selection List and Dropping of Features

## Statistical distribution of our target variable
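
The `describe()` output above reports a mean of 0.31 for `churn`, so roughly 31% of the rows are churners and the classes are imbalanced. A minimal sketch of how the class counts and shares could be tabulated is shown here; the label series is made up for illustration, and in the notebook it would be `df["churn"]`.

```python
import pandas as pd

# Made-up labels for illustration; replace with df["churn"] in the notebook
churn = pd.Series([0, 0, 1, 0, 1, 0, 0, 1, 0, 0], name="churn")

# Absolute and relative class frequencies, ordered by class label
counts = churn.value_counts().sort_index()
shares = churn.value_counts(normalize=True).sort_index()
balance = pd.DataFrame({"count": counts, "share": shares})
print(balance)
```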

## Export the dataframe to .csv file

## Conclusion

# Feature Engineering

# Predictive Modelling

## Structure of the notebooks for machine learning

We used separate Jupyter notebooks for the different machine learning methods. These notebooks are linked here:

- ...

- ...

- ...

- ...

# Data Visualization

The results and our main findings can be found in our presentation:

# Future Work

Future work is outlined in our presentation.