Author: Muhammad Zaki Fuadi

**Loan Data 2007-2014 Preparation**

**Project Background**: 

A lending company needs to predict the credit risk of customers who apply for loans. The goal is to reduce unpaid credit, improve credit decision-making, and minimize the company's losses.
To predict credit risk, the company needs to consider various factors, such as the customer's credit history, financial condition, and personal information. The company also needs to communicate with stakeholders, such as the management, risk, and finance teams, to understand their needs and perspectives.
The success criteria for this project are improved accuracy of credit-risk prediction and a reduction in unpaid credit.

**Objective**:

1. Improve the accuracy of credit-risk prediction.
2. Reduce unpaid credit.
3. Enable better credit decision-making.
4. Minimize the company's losses.

**Actions**:

1. Clean and visualize the data to gain business insights.
2. Build models with machine learning algorithms.
3. Predict clients' repayment ability on the test application data using the best machine learning model.
4. Give the company recommendations for improving clients' success when applying for credit.

### Import Packages

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import xgboost as xgb

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             roc_curve, auc, log_loss)
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from imblearn.over_sampling import RandomOverSampler


In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#### Ingest Data

In [3]:
df = pd.read_csv('data/loan_data_2007_2014.csv')



In [4]:
df_copy = df.copy()
df_copy.head()

Unnamed: 0.1,Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,annual_inc_joint,dti_joint,verification_status_joint,acc_now_delinq,tot_coll_amt,tot_cur_bal,open_acc_6m,open_il_6m,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m
0,0,1077501,1296599,5000,5000,4975.0,36 months,10.65,162.87,B,B2,,10+ years,RENT,24000.0,Verified,Dec-11,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/22/11 > I need to upgra...,credit_card,Computer,860xx,AZ,27.65,0.0,Jan-85,1.0,,,3.0,0.0,13648,83.7,9.0,f,0.0,0.0,5861.071414,5831.78,5000.0,861.07,0.0,0.0,0.0,Jan-15,171.62,,Jan-16,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
1,1,1077430,1314167,2500,2500,2500.0,60 months,15.27,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-11,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,0.0,Apr-99,5.0,,,3.0,0.0,1687,9.4,4.0,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,Apr-13,119.66,,Sep-13,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
2,2,1077175,1313524,2400,2400,2400.0,36 months,15.96,84.33,C,C5,,10+ years,RENT,12252.0,Not Verified,Dec-11,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,,small_business,real estate business,606xx,IL,8.72,0.0,Nov-01,2.0,,,2.0,0.0,2956,98.5,10.0,f,0.0,0.0,3003.653644,3003.65,2400.0,603.65,0.0,0.0,0.0,Jun-14,649.91,,Jan-16,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
3,3,1076863,1277178,10000,10000,10000.0,36 months,13.49,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200.0,Source Verified,Dec-11,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/21/11 > to pay for prop...,other,personel,917xx,CA,20.0,0.0,Feb-96,1.0,35.0,,10.0,0.0,5598,21.0,37.0,f,0.0,0.0,12226.30221,12226.3,10000.0,2209.33,16.97,0.0,0.0,Jan-15,357.48,,Jan-15,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
4,4,1075358,1311748,3000,3000,3000.0,60 months,12.69,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-11,Current,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/21/11 > I plan on combi...,other,Personal,972xx,OR,17.94,0.0,Jan-96,0.0,38.0,,15.0,0.0,27783,53.9,38.0,f,766.9,766.9,3242.17,3242.17,2233.1,1009.07,0.0,0.0,0.0,Jan-16,67.79,Feb-16,Jan-16,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,


In [5]:
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 466285 entries, 0 to 466284
Data columns (total 75 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   Unnamed: 0                   466285 non-null  int64  
 1   id                           466285 non-null  int64  
 2   member_id                    466285 non-null  int64  
 3   loan_amnt                    466285 non-null  int64  
 4   funded_amnt                  466285 non-null  int64  
 5   funded_amnt_inv              466285 non-null  float64
 6   term                         466285 non-null  object 
 7   int_rate                     466285 non-null  float64
 8   installment                  466285 non-null  float64
 9   grade                        466285 non-null  object 
 10  sub_grade                    466285 non-null  object 
 11  emp_title                    438697 non-null  object 
 12  emp_length                   445277 non-null  object 
 13 

### Preprocessing Data

In [6]:
# Drop columns that are entirely empty, plus the leftover index column
df_copy = df_copy.dropna(axis=1, how='all')
df_copy = df_copy.drop(columns=['Unnamed: 0'])
df_copy.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 466285 entries, 0 to 466284
Data columns (total 57 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   id                           466285 non-null  int64  
 1   member_id                    466285 non-null  int64  
 2   loan_amnt                    466285 non-null  int64  
 3   funded_amnt                  466285 non-null  int64  
 4   funded_amnt_inv              466285 non-null  float64
 5   term                         466285 non-null  object 
 6   int_rate                     466285 non-null  float64
 7   installment                  466285 non-null  float64
 8   grade                        466285 non-null  object 
 9   sub_grade                    466285 non-null  object 
 10  emp_title                    438697 non-null  object 
 11  emp_length                   445277 non-null  object 
 12  home_ownership               466285 non-null  object 
 13 

In [7]:
# Check missing values
# describe_dataframe: per-column dtype, null count/percentage, unique count, and unique values
def describe_dataframe(df):
    listItem = []

    for col in df.columns:
        listItem.append([
            col,
            df[col].dtype,
            df[col].isnull().sum(),
            round((df[col].isnull().sum() / len(df[col])) * 100, 2),
            df[col].nunique(),
            list(df[col].drop_duplicates().values)
        ])

    df_desc = pd.DataFrame(
        columns=['Column', 'Dtype', 'null count', 'null perc.', 'unique count', 'unique sample'],
        data=listItem
    )
    
    return df_desc

# Missing-value summary per column, sorted by percentage (used later to drop sparse columns)
total_null = df_copy.isnull().sum()
percent_missing = df_copy.isnull().sum() * 100/ len(df_copy)
dtypes = [df_copy[col].dtype for col in df_copy.columns]
df_missing_value = pd.DataFrame({'total_null': total_null,
                                'data_type': dtypes,
                                'percent_missing': percent_missing})
df_missing_value.sort_values('percent_missing', ascending = False,inplace = True)
missing_value = df_missing_value[df_missing_value['percent_missing']>0].reset_index()
describe_dataframe(df_copy)

Unnamed: 0,Column,Dtype,null count,null perc.,unique count,unique sample
0,id,int64,0,0.0,466285,"[1077501, 1077430, 1077175, 1076863, 1075358, ..."
1,member_id,int64,0,0.0,466285,"[1296599, 1314167, 1313524, 1277178, 1311748, ..."
2,loan_amnt,int64,0,0.0,1352,"[5000, 2500, 2400, 10000, 3000, 7000, 5600, 53..."
3,funded_amnt,int64,0,0.0,1354,"[5000, 2500, 2400, 10000, 3000, 7000, 5600, 53..."
4,funded_amnt_inv,float64,0,0.0,9854,"[4975.0, 2500.0, 2400.0, 10000.0, 3000.0, 5000..."
5,term,object,0,0.0,2,"[ 36 months, 60 months]"
6,int_rate,float64,0,0.0,506,"[10.65, 15.27, 15.96, 13.49, 12.69, 7.9, 18.64..."
7,installment,float64,0,0.0,55622,"[162.87, 59.83, 84.33, 339.31, 67.79, 156.46, ..."
8,grade,object,0,0.0,7,"[B, C, A, E, F, D, G]"
9,sub_grade,object,0,0.0,35,"[B2, C4, C5, C1, B5, A4, E1, F2, C3, B1, D1, A..."


In [8]:
# Drop unnecessary columns
df_copy = df_copy.drop(['member_id','url','title','addr_state','zip_code','policy_code','application_type','emp_title'], axis = 1)

In [9]:
### Handling Missing Values
# Drop features with more than 50% missing values
col_full_null = df_missing_value.loc[df_missing_value['percent_missing'] > 50].index.tolist()
df_copy.drop(columns=col_full_null, inplace=True)

# For `tot_coll_amt`, `tot_cur_bal`, and `total_rev_hi_lim`, fill missing values with 0,
# assuming the customer has no further borrowing
for col in ['tot_coll_amt', 'tot_cur_bal', 'total_rev_hi_lim']:
    df_copy[col] = df_copy[col].fillna(0)

# Fill the remaining missing values in numerical columns with the median
for col in df_copy.select_dtypes(exclude='object'):
    df_copy[col] = df_copy[col].fillna(df_copy[col].median())

def fill_null_with_mode(df, columns):
    # Assign back instead of df[col].fillna(..., inplace=True): that is chained
    # assignment and may leave df unchanged under pandas Copy-on-Write
    for col in columns:
        mode_val = df[col].mode()[0]
        df[col] = df[col].fillna(mode_val)
    return df

# Fill missing values in the date columns (still strings at this point) with the mode
fill_null_with_mode(df_copy, ['next_pymnt_d', 'last_credit_pull_d', 'last_pymnt_d', 'earliest_cr_line'])

describe_dataframe(df_copy)

Unnamed: 0,Column,Dtype,null count,null perc.,unique count,unique sample
0,id,int64,0,0.0,466285,"[1077501, 1077430, 1077175, 1076863, 1075358, ..."
1,loan_amnt,int64,0,0.0,1352,"[5000, 2500, 2400, 10000, 3000, 7000, 5600, 53..."
2,funded_amnt,int64,0,0.0,1354,"[5000, 2500, 2400, 10000, 3000, 7000, 5600, 53..."
3,funded_amnt_inv,float64,0,0.0,9854,"[4975.0, 2500.0, 2400.0, 10000.0, 3000.0, 5000..."
4,term,object,0,0.0,2,"[ 36 months, 60 months]"
5,int_rate,float64,0,0.0,506,"[10.65, 15.27, 15.96, 13.49, 12.69, 7.9, 18.64..."
6,installment,float64,0,0.0,55622,"[162.87, 59.83, 84.33, 339.31, 67.79, 156.46, ..."
7,grade,object,0,0.0,7,"[B, C, A, E, F, D, G]"
8,sub_grade,object,0,0.0,35,"[B2, C4, C5, C1, B5, A4, E1, F2, C3, B1, D1, A..."
9,emp_length,object,21008,4.51,11,"[10+ years, < 1 year, 1 year, 3 years, 8 years..."
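The missing-value rules in the cell above (drop columns with more than 50% missing, fill the listed balance columns with 0, use the median for other numeric columns, and use the mode for the listed date columns) can be gathered into one reusable function. This is a minimal sketch: the function name `handle_missing` and the toy DataFrame are hypothetical, chosen only to show each rule firing once.

```python
import numpy as np
import pandas as pd


def handle_missing(df, zero_cols, mode_cols, threshold=50.0):
    """Hypothetical helper mirroring the notebook's missing-value rules."""
    df = df.copy()

    # 1. Drop columns whose missing percentage exceeds the threshold
    percent_missing = df.isnull().mean() * 100
    df = df.drop(columns=percent_missing[percent_missing > threshold].index)

    # 2. Fill the listed balance-type columns with 0
    for col in zero_cols:
        if col in df:
            df[col] = df[col].fillna(0)

    # 3. Fill the remaining numeric columns with their median
    for col in df.select_dtypes(include='number').columns:
        df[col] = df[col].fillna(df[col].median())

    # 4. Fill the listed (string) date columns with their most frequent value
    for col in mode_cols:
        if col in df:
            df[col] = df[col].fillna(df[col].mode()[0])

    return df


toy = pd.DataFrame({
    'mostly_empty': [np.nan, np.nan, np.nan, 1.0],          # 75% missing -> dropped
    'tot_cur_bal': [100.0, np.nan, 300.0, 400.0],           # filled with 0
    'annual_inc': [10.0, 20.0, np.nan, 40.0],               # filled with median 20.0
    'last_pymnt_d': ['Jan-16', None, 'Jan-16', 'Dec-15'],   # filled with mode 'Jan-16'
})
clean = handle_missing(toy, zero_cols=['tot_cur_bal'], mode_cols=['last_pymnt_d'])
print(clean)
```

Returning a copy keeps the raw DataFrame intact, which makes it easier to compare several imputation strategies side by side.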


In [10]:
# Feature engineering: parse the date columns (e.g. 'Dec-11') into datetimes
# Note: '%y' maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999
date_cols = ['earliest_cr_line', 'last_credit_pull_d', 'last_pymnt_d', 'issue_d', 'next_pymnt_d']
for col in date_cols:
    df_copy[col] = pd.to_datetime(df_copy[col], format='%b-%y')

Adding features:
* `pymnt_time` = difference in months from `last_pymnt_d` to `next_pymnt_d`
* `credit_pull_year` = difference in years from `earliest_cr_line` to `last_credit_pull_d`

In [11]:
# Whole-month difference between two datetime columns (vectorised instead of a row-wise apply)
def diff_month(d1, d2):
    return (d1.dt.year - d2.dt.year) * 12 + (d1.dt.month - d2.dt.month)

# Calendar-year difference between two datetime columns
def diff_year(d1, d2):
    return d1.dt.year - d2.dt.year

df_copy['pymnt_time'] = diff_month(df_copy['next_pymnt_d'], df_copy['last_pymnt_d'])
df_copy['credit_pull_year'] = diff_year(df_copy['last_credit_pull_d'], df_copy['earliest_cr_line'])

df_copy[df_copy['pymnt_time']<0][['next_pymnt_d','last_pymnt_d','pymnt_time']]

Unnamed: 0,next_pymnt_d,last_pymnt_d,pymnt_time
40122,2011-02-01,2016-01-01,-59
40481,2010-10-01,2016-01-01,-63
40498,2010-10-01,2016-01-01,-63
40753,2010-06-01,2016-01-01,-67
40769,2010-06-01,2016-01-01,-67
40785,2010-06-01,2016-01-01,-67
40848,2010-05-01,2016-01-01,-68
40914,2010-04-01,2016-01-01,-69
40927,2010-04-01,2016-01-01,-69
41145,2009-11-01,2016-01-01,-74
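To see where these negative values come from, the month-difference formula, (year1 - year2) * 12 + (month1 - month2), can be checked by hand on the first rows above. The sketch below is a scalar copy of the arithmetic in `diff_month`; the name `months_between` is hypothetical and it works on plain `datetime.date` objects.

```python
from datetime import date


def months_between(d1, d2):
    # Same arithmetic as the notebook's diff_month: whole calendar months from d2 to d1
    return (d1.year - d2.year) * 12 + (d1.month - d2.month)


# Row 40122 above: next_pymnt_d = 2011-02-01, last_pymnt_d = 2016-01-01
print(months_between(date(2011, 2, 1), date(2016, 1, 1)))   # -59
# Row 40481 above: next_pymnt_d = 2010-10-01, last_pymnt_d = 2016-01-01
print(months_between(date(2010, 10, 1), date(2016, 1, 1)))  # -63
```

In both rows the next payment date lies years before the last payment date, so the difference is negative.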


In [12]:
df_copy[df_copy['credit_pull_year']<0][['earliest_cr_line','last_credit_pull_d','credit_pull_year']]

Unnamed: 0,earliest_cr_line,last_credit_pull_d,credit_pull_year
1580,2062-09-01,2013-09-01,-49
1770,2068-09-01,2015-09-01,-53
2799,2064-09-01,2016-01-01,-48
3282,2067-09-01,2015-05-01,-52
3359,2065-02-01,2014-11-01,-51
3413,2067-06-01,2013-04-01,-54
3607,2067-08-01,2014-04-01,-53
3989,2063-12-01,2014-11-01,-49
4440,2068-09-01,2016-01-01,-52
4449,2068-09-01,2015-10-01,-53


* `pymnt_time` contains negative values, where `next_pymnt_d` falls before `last_pymnt_d`. These are replaced with 0, on the assumption that the customer has no outstanding bill left to pay.

* `earliest_cr_line` contains dates in the 2060s, which produce negative `credit_pull_year` values. This is a parsing artifact rather than a data-entry error: `%b-%y` maps two-digit years 00 to 68 to 2000 to 2068, so a credit line opened in Sep-1962 is read as 2062-09-01. Negative `credit_pull_year` values are replaced with the maximum value of the `credit_pull_year` feature.
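As an alternative to replacing the affected `credit_pull_year` values with the maximum, one could move any `earliest_cr_line` that falls after its `issue_d` back by 100 years before computing the feature. This is a minimal sketch, not the approach this notebook takes: it assumes a credit line cannot be opened after the loan is issued, `fix_century` is a hypothetical helper, and the toy values are modelled on rows shown above.

```python
import pandas as pd


def fix_century(dates, reference):
    """Hypothetical helper: move dates that fall after `reference` back by 100 years."""
    return dates.where(dates <= reference, dates - pd.DateOffset(years=100))


# Toy rows modelled on the outputs above ('%y' maps 00-68 to 2000-2068)
issue = pd.to_datetime(pd.Series(['Dec-11', 'Dec-11', 'Dec-11']), format='%b-%y')
earliest = pd.to_datetime(pd.Series(['Jan-85', 'Sep-62', 'Jan-96']), format='%b-%y')
last_pull = pd.to_datetime(pd.Series(['Jan-16', 'Sep-13', 'Jan-16']), format='%b-%y')

print(earliest.iloc[1])  # 2062-09-01 00:00:00, the misparsed century

fixed = fix_century(earliest, issue)
credit_pull_year = last_pull.dt.year - fixed.dt.year
print(credit_pull_year.tolist())  # [31, 51, 20]
```

With the correction, the Sep-62 row yields a plausible 51 years instead of -49, so no rows need to be overwritten with the column maximum.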

In [13]:
df_copy.head()

Unnamed: 0,id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,purpose,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,collections_12_mths_ex_med,acc_now_delinq,tot_coll_amt,tot_cur_bal,total_rev_hi_lim,pymnt_time,credit_pull_year
0,1077501,5000,5000,4975.0,36 months,10.65,162.87,B,B2,10+ years,RENT,24000.0,Verified,2011-12-01,Fully Paid,n,credit_card,27.65,0.0,1985-01-01,1.0,3.0,0.0,13648,83.7,9.0,f,0.0,0.0,5861.071414,5831.78,5000.0,861.07,0.0,0.0,0.0,2015-01-01,171.62,2016-02-01,2016-01-01,0.0,0.0,0.0,0.0,0.0,13,31
1,1077430,2500,2500,2500.0,60 months,15.27,59.83,C,C4,< 1 year,RENT,30000.0,Source Verified,2011-12-01,Charged Off,n,car,1.0,0.0,1999-04-01,5.0,3.0,0.0,1687,9.4,4.0,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,2013-04-01,119.66,2016-02-01,2013-09-01,0.0,0.0,0.0,0.0,0.0,34,14
2,1077175,2400,2400,2400.0,36 months,15.96,84.33,C,C5,10+ years,RENT,12252.0,Not Verified,2011-12-01,Fully Paid,n,small_business,8.72,0.0,2001-11-01,2.0,2.0,0.0,2956,98.5,10.0,f,0.0,0.0,3003.653644,3003.65,2400.0,603.65,0.0,0.0,0.0,2014-06-01,649.91,2016-02-01,2016-01-01,0.0,0.0,0.0,0.0,0.0,20,15
3,1076863,10000,10000,10000.0,36 months,13.49,339.31,C,C1,10+ years,RENT,49200.0,Source Verified,2011-12-01,Fully Paid,n,other,20.0,0.0,1996-02-01,1.0,10.0,0.0,5598,21.0,37.0,f,0.0,0.0,12226.30221,12226.3,10000.0,2209.33,16.97,0.0,0.0,2015-01-01,357.48,2016-02-01,2015-01-01,0.0,0.0,0.0,0.0,0.0,13,19
4,1075358,3000,3000,3000.0,60 months,12.69,67.79,B,B5,1 year,RENT,80000.0,Source Verified,2011-12-01,Current,n,other,17.94,0.0,1996-01-01,0.0,15.0,0.0,27783,53.9,38.0,f,766.9,766.9,3242.17,3242.17,2233.1,1009.07,0.0,0.0,0.0,2016-01-01,67.79,2016-02-01,2016-01-01,0.0,0.0,0.0,0.0,0.0,1,20


In [14]:
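# Negative payment gaps -> 0; negative credit_pull_year -> column maximum
df_copy.loc[df_copy['pymnt_time'] < 0, 'pymnt_time'] = 0
df_copy.loc[df_copy['credit_pull_year'] < 0, 'credit_pull_year'] = df_copy['credit_pull_year'].max()
# Drop the raw date columns and `sub_grade` (a finer split of `grade`)
df_copy.drop(columns=['issue_d', 'earliest_cr_line', 'next_pymnt_d', 'last_pymnt_d', 'last_credit_pull_d', 'sub_grade'], inplace=True)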

In [15]:
df_handling_emp_length = df_copy.copy()

In [16]:
df_handling_emp_length['emp_length'].unique()

array(['10+ years', '< 1 year', '1 year', '3 years', '8 years', '9 years',
       '4 years', '5 years', '6 years', '2 years', '7 years', nan],
      dtype=object)

In [17]:
# Map emp_length strings to whole years ('< 1 year' -> 0, '10+ years' -> 10); NaN stays NaN
mapping_tahun_angka = {
    '10+ years': 10,
    '< 1 year': 0,
    '1 year': 1,
    '2 years': 2,
    '3 years': 3,
    '4 years': 4,
    '5 years': 5,
    '6 years': 6,
    '7 years': 7,
    '8 years': 8,
    '9 years': 9
}

df_handling_emp_length["emp_length"] = df_handling_emp_length["emp_length"].map(mapping_tahun_angka)
df_handling_emp_length.head()

Unnamed: 0,id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,emp_length,home_ownership,annual_inc,verification_status,loan_status,pymnt_plan,purpose,dti,delinq_2yrs,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_amnt,collections_12_mths_ex_med,acc_now_delinq,tot_coll_amt,tot_cur_bal,total_rev_hi_lim,pymnt_time,credit_pull_year
0,1077501,5000,5000,4975.0,36 months,10.65,162.87,B,10.0,RENT,24000.0,Verified,Fully Paid,n,credit_card,27.65,0.0,1.0,3.0,0.0,13648,83.7,9.0,f,0.0,0.0,5861.071414,5831.78,5000.0,861.07,0.0,0.0,0.0,171.62,0.0,0.0,0.0,0.0,0.0,13,31
1,1077430,2500,2500,2500.0,60 months,15.27,59.83,C,0.0,RENT,30000.0,Source Verified,Charged Off,n,car,1.0,0.0,5.0,3.0,0.0,1687,9.4,4.0,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,119.66,0.0,0.0,0.0,0.0,0.0,34,14
2,1077175,2400,2400,2400.0,36 months,15.96,84.33,C,10.0,RENT,12252.0,Not Verified,Fully Paid,n,small_business,8.72,0.0,2.0,2.0,0.0,2956,98.5,10.0,f,0.0,0.0,3003.653644,3003.65,2400.0,603.65,0.0,0.0,0.0,649.91,0.0,0.0,0.0,0.0,0.0,20,15
3,1076863,10000,10000,10000.0,36 months,13.49,339.31,C,10.0,RENT,49200.0,Source Verified,Fully Paid,n,other,20.0,0.0,1.0,10.0,0.0,5598,21.0,37.0,f,0.0,0.0,12226.30221,12226.3,10000.0,2209.33,16.97,0.0,0.0,357.48,0.0,0.0,0.0,0.0,0.0,13,19
4,1075358,3000,3000,3000.0,60 months,12.69,67.79,B,1.0,RENT,80000.0,Source Verified,Current,n,other,17.94,0.0,0.0,15.0,0.0,27783,53.9,38.0,f,766.9,766.9,3242.17,3242.17,2233.1,1009.07,0.0,0.0,0.0,67.79,0.0,0.0,0.0,0.0,0.0,1,20


In [18]:
from sklearn.impute import KNNImputer

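# Create the KNN imputer.
# Note: with 'emp_length' as the only column, a row with a missing value has no other
# feature to measure distance on, so KNNImputer falls back to the column mean here.
imputer = KNNImputer(n_neighbors=11)
kolom_imputasi = ['emp_length']  # columns to impute
df_handling_emp_length[kolom_imputasi] = imputer.fit_transform(df_handling_emp_length[kolom_imputasi])
# Round the imputed values back to whole years
df_handling_emp_length['emp_length'] = df_handling_emp_length['emp_length'].round().astype(int)
df_handling_emp_length['emp_length'].unique()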

array([10,  0,  1,  3,  8,  9,  4,  5,  6,  2,  7])
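Because `kolom_imputasi` contains only `emp_length`, a row with a missing value shares no observed feature with any other row, so scikit-learn's `KNNImputer` cannot compute a distance and fills that row with the column mean instead. The cell above is therefore effectively mean imputation followed by rounding. The toy check below illustrates this; the second array adds an income column as a hypothetical example of giving the imputer something to compare neighbours on.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Single column: the missing row has no observed features to compare, so it gets the mean
emp_only = np.array([[1.0], [2.0], [np.nan], [6.0]])
single = KNNImputer(n_neighbors=2).fit_transform(emp_only)
print(single[2, 0])  # 3.0, the mean of 1, 2 and 6

# Two columns: distances use the income column, so the 2 nearest rows are averaged
emp_and_income = np.array([
    [1.0, 10_000.0],
    [2.0, 12_000.0],
    [np.nan, 11_000.0],
    [6.0, 90_000.0],
])
paired = KNNImputer(n_neighbors=2).fit_transform(emp_and_income)
print(paired[2, 0])  # 1.5, the mean of the two rows with the closest income
```

On the real data, additional features would also need scaling first, since raw values such as `annual_inc` would otherwise dominate the Euclidean distances.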

In [19]:
# Target: 1 = good loan (statuses below), 0 = bad loan (any other status, e.g. 'Charged Off')
good_loan = ['Current', 'Fully Paid', 'In Grace Period']
df_handling_emp_length['loan_status'] = np.where(df_handling_emp_length['loan_status'].isin(good_loan), 1, 0)
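The encoding maps the three statuses in `good_loan` to 1 and every other status to 0. The toy check below applies the same `np.where`/`isin` logic to a handful of hypothetical statuses and then uses `value_counts(normalize=True)` to inspect the resulting class balance, which is worth checking on the real column before modelling.

```python
import numpy as np
import pandas as pd

good_loan = ['Current', 'Fully Paid', 'In Grace Period']

# Toy statuses: only 'Charged Off' falls outside good_loan
status = pd.Series(['Current', 'Fully Paid', 'In Grace Period', 'Charged Off', 'Fully Paid'])
target = pd.Series(np.where(status.isin(good_loan), 1, 0), index=status.index)

print(target.tolist())  # [1, 1, 1, 0, 1]
print(target.value_counts(normalize=True))
```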

In [20]:
# Save clean data to csv
path_file = 'data/loan_cleaned.csv'
df_handling_emp_length.to_csv(path_file, index=False)