# Consumer Loan Case Study
Objective: Understand the driving factors (driver variables) behind loan default, thereby reducing the amount of credit loss.

In [47]:
#Import libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [48]:
#Importing the loan.csv file in the dataframe for analysis.
loan_master = pd.read_csv("loan.csv", low_memory=False)

### Data Understanding & Data Cleaning
Initial inspection of the dataframe: the number of rows and columns, and the column names.

In [49]:
#Number of rows and columns.
loan_master.shape

(39717, 111)

In [50]:
#Summary of the loan_master dataframe.
loan_master.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39717 entries, 0 to 39716
Columns: 111 entries, id to total_il_high_credit_limit
dtypes: float64(74), int64(13), object(24)
memory usage: 33.6+ MB


In [51]:
#Column names in the dataset.
pd.set_option('display.max_rows', 60)
pd.set_option('display.max_columns', 60)
loan_master.columns

Index(['id', 'member_id', 'loan_amnt', 'funded_amnt', 'funded_amnt_inv',
       'term', 'int_rate', 'installment', 'grade', 'sub_grade',
       ...
       'num_tl_90g_dpd_24m', 'num_tl_op_past_12m', 'pct_tl_nvr_dlq',
       'percent_bc_gt_75', 'pub_rec_bankruptcies', 'tax_liens',
       'tot_hi_cred_lim', 'total_bal_ex_mort', 'total_bc_limit',
       'total_il_high_credit_limit'],
      dtype='object', length=111)

In [10]:
#Checking unique values in Loan Status.
loan_master.loan_status.unique()

array(['Fully Paid', 'Charged Off', 'Current'], dtype=object)
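Beyond listing the unique statuses, it can help to see how the loans are split across them. The sketch below uses a small made-up series (the values are hypothetical, not taken from loan.csv) to show the share of each status with `value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical toy data standing in for loan_master["loan_status"].
status = pd.Series(
    ["Fully Paid", "Charged Off", "Fully Paid", "Current", "Fully Paid"],
    name="loan_status",
)

# normalize=True returns fractions instead of counts; convert to percentages.
status_share = status.value_counts(normalize=True).mul(100).round(1)
print(status_share)
```

On the real data the same expression would be applied to `loan_master["loan_status"]`.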

In [11]:
#Subsetting loan_status == "Charged Off" for risk analytics.
loan_charged_off = loan_master.loc[loan_master['loan_status'] == 'Charged Off']

In [12]:
#Checking rows and columns of new filtered dataframe.
loan_charged_off.shape

(5627, 111)

In [14]:
#Checking the Missing values.
loan_charged_off.isnull().sum()

id                               0
member_id                        0
loan_amnt                        0
funded_amnt                      0
funded_amnt_inv                  0
                              ... 
tax_liens                        1
tot_hi_cred_lim               5627
total_bal_ex_mort             5627
total_bc_limit                5627
total_il_high_credit_limit    5627
Length: 111, dtype: int64

In [16]:
#Checking missing values percentages of the columns.
round(loan_charged_off.isnull().sum()/len(loan_charged_off.index), 2) * 100

id                              0.0
member_id                       0.0
loan_amnt                       0.0
funded_amnt                     0.0
funded_amnt_inv                 0.0
                              ...  
tax_liens                       0.0
tot_hi_cred_lim               100.0
total_bal_ex_mort             100.0
total_bc_limit                100.0
total_il_high_credit_limit    100.0
Length: 111, dtype: float64
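One detail of the expression above: it rounds the fraction to two decimals before multiplying by 100, so results are whole percentages. That is why tax_liens, with 1 missing value out of 5627 rows, shows as 0.0. A possible alternative, sketched on a toy frame with made-up values, is to take `isnull().mean()` and round after scaling.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for loan_charged_off.
toy = pd.DataFrame({
    "loan_amnt": [2500, 5600, 5375, 9000],
    "emp_title": ["Ryder", np.nan, "Starbucks", "SFMTA"],
    "tot_hi_cred_lim": [np.nan, np.nan, np.nan, np.nan],
})

# isnull().mean() is the fraction of missing values per column;
# scale to a percentage first, then round.
missing_pct = (toy.isnull().mean() * 100).round(2)
print(missing_pct)
```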

In [18]:
#Identifying columns with more than 60% missing values.
missing_value_cols = loan_charged_off.columns[100*(loan_charged_off.isnull().sum()/len(loan_charged_off.index)) > 60]

In [20]:
#Dropping the columns with more than 60% missing values.
loan_charged_off = loan_charged_off.drop(missing_value_cols, axis=1)
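The two steps above (identify sparse columns, then drop them) could be wrapped in a small helper so the threshold is explicit and reusable. This is only a sketch: `drop_sparse_columns`, its `max_missing_pct` parameter, and the toy column `sparse_col` are hypothetical names, and the data is made up.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_missing_pct=60):
    """Return a copy of df without columns whose missing share exceeds max_missing_pct."""
    missing_pct = df.isnull().mean() * 100
    keep = missing_pct[missing_pct <= max_missing_pct].index
    return df[keep].copy()

# Hypothetical toy frame standing in for loan_charged_off.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "desc": ["a", np.nan, "b", np.nan, "c"],              # 40% missing, kept
    "total_bc_limit": [np.nan] * 5,                       # 100% missing, dropped
    "sparse_col": [np.nan, np.nan, np.nan, np.nan, 7.0],  # 80% missing, dropped
})

cleaned = drop_sparse_columns(toy)
print(list(cleaned.columns))
```

Returning a copy keeps the original frame untouched, which makes it easy to compare shapes before and after.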

In [24]:
#Re-checking the missing value percentages after dropping the sparse columns.
round(loan_charged_off.isnull().sum()/len(loan_charged_off.index), 2)*100

id                             0.0
member_id                      0.0
loan_amnt                      0.0
funded_amnt                    0.0
funded_amnt_inv                0.0
term                           0.0
int_rate                       0.0
installment                    0.0
grade                          0.0
sub_grade                      0.0
emp_title                      9.0
emp_length                     4.0
home_ownership                 0.0
annual_inc                     0.0
verification_status            0.0
issue_d                        0.0
loan_status                    0.0
pymnt_plan                     0.0
url                            0.0
desc                          32.0
purpose                        0.0
title                          0.0
zip_code                       0.0
addr_state                     0.0
dti                            0.0
delinq_2yrs                    0.0
earliest_cr_line               0.0
inq_last_6mths                 0.0
open_acc            

In [26]:
#Checking rows and columns after dropping the sparse columns.
loan_charged_off.shape

(5627, 54)

In [27]:
#Checking the data to find columns which would not be required for data analysis.
loan_charged_off.head()

Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,last_credit_pull_d,collections_12_mths_ex_med,policy_code,application_type,acc_now_delinq,chargeoff_within_12_mths,delinq_amnt,pub_rec_bankruptcies,tax_liens
1,1077430,1314167,2500,2500,2500.0,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,0,Apr-99,5,3,0,1687,9.40%,4,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,Apr-13,119.66,Sep-13,0.0,1,INDIVIDUAL,0,0.0,0,0.0,0.0
8,1071795,1306957,5600,5600,5600.0,60 months,21.28%,152.39,F,F2,,4 years,OWN,40000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > I own a small h...,small_business,Expand Business & Buy Debt Portfolio,958xx,CA,5.55,0,Apr-04,2,11,0,5210,32.60%,13,f,0.0,0.0,646.02,646.02,162.02,294.94,0.0,189.06,2.09,Apr-12,152.39,Aug-12,0.0,1,INDIVIDUAL,0,0.0,0,0.0,0.0
9,1071570,1306721,5375,5375,5350.0,60 months,12.69%,121.45,B,B5,Starbucks,< 1 year,RENT,15000.0,Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/16/11 > I'm trying to b...,other,Building my credit history.,774xx,TX,18.08,0,Sep-04,0,2,0,9279,36.50%,3,f,0.0,0.0,1476.19,1469.34,673.48,533.42,0.0,269.29,2.52,Nov-12,121.45,Mar-13,0.0,1,INDIVIDUAL,0,0.0,0,0.0,0.0
12,1064687,1298717,9000,9000,9000.0,36 months,13.49%,305.38,C,C1,Va. Dept of Conservation/Recreation,< 1 year,RENT,30000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/15/11 > Plan to pay off...,debt_consolidation,freedom,245xx,VA,10.08,0,Apr-04,1,4,0,10452,91.70%,9,f,0.0,0.0,2270.7,2270.7,1256.14,570.26,0.0,444.3,4.16,Jul-12,305.38,Nov-12,0.0,1,INDIVIDUAL,0,0.0,0,0.0,0.0
14,1069057,1303503,10000,10000,10000.0,36 months,10.65%,325.74,B,B2,SFMTA,3 years,RENT,100000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,,other,Other Loan,951xx,CA,7.06,0,May-91,2,14,0,11997,55.50%,29,f,0.0,0.0,7471.99,7471.99,5433.47,1393.42,0.0,645.1,6.3145,Oct-13,325.74,Mar-14,0.0,1,INDIVIDUAL,0,0.0,0,0.0,0.0


In [42]:
#Removing columns from the dataframe that are not helpful for analysis.
#Columns with only 0s and NA: "total_rec_late_fee", "collections_12_mths_ex_med", "delinq_2yrs", "pub_rec", "out_prncp",
# "out_prncp_inv", "acc_now_delinq", "chargeoff_within_12_mths", "delinq_amnt", "pub_rec_bankruptcies", "tax_liens"
loan_charged_off = loan_charged_off.drop(['total_rec_late_fee', 
                                         'collections_12_mths_ex_med', 
                                         'delinq_2yrs',
                                         'pub_rec',
                                         'out_prncp',
                                         'out_prncp_inv',
                                         'acc_now_delinq',
                                         'chargeoff_within_12_mths',
                                         'delinq_amnt',
                                         'pub_rec_bankruptcies',
                                         'tax_liens'], axis=1)
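The list above was compiled by inspection. One way to cross-check such a list is to count distinct non-null values per column: a column holding a single value (for example only 0 plus NA) carries no information. The sketch below runs on a toy frame with made-up values and is not the notebook's actual method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are invented for illustration.
toy = pd.DataFrame({
    "loan_amnt": [2500, 5600, 5375],
    "tax_liens": [0.0, np.nan, 0.0],   # only 0 and NA
    "policy_code": [1, 1, 1],          # a single constant value
})

# nunique(dropna=True) ignores NaN, so "0 and NA only" counts as one unique value.
n_unique = toy.nunique(dropna=True)
constant_cols = n_unique[n_unique <= 1].index.tolist()
print(constant_cols)
```

Any column in the hand-written drop list that does not show up in `constant_cols` on the real data would deserve a second look before it is removed.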

In [45]:
#Dropping member_id; the id column is kept as the loan identifier.
loan_charged_off = loan_charged_off.drop('member_id', axis=1)

In [46]:
#Checking the dataframe after removing the columns.
loan_charged_off.head()

Unnamed: 0,id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,earliest_cr_line,inq_last_6mths,open_acc,revol_bal,revol_util,total_acc,initial_list_status,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,last_credit_pull_d,policy_code,application_type
1,1077430,2500,2500,2500.0,60 months,15.27%,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,Apr-99,5,3,1687,9.40%,4,f,1008.71,1008.71,456.46,435.17,117.08,1.11,Apr-13,119.66,Sep-13,1,INDIVIDUAL
8,1071795,5600,5600,5600.0,60 months,21.28%,152.39,F,F2,,4 years,OWN,40000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/21/11 > I own a small h...,small_business,Expand Business & Buy Debt Portfolio,958xx,CA,5.55,Apr-04,2,11,5210,32.60%,13,f,646.02,646.02,162.02,294.94,189.06,2.09,Apr-12,152.39,Aug-12,1,INDIVIDUAL
9,1071570,5375,5375,5350.0,60 months,12.69%,121.45,B,B5,Starbucks,< 1 year,RENT,15000.0,Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/16/11 > I'm trying to b...,other,Building my credit history.,774xx,TX,18.08,Sep-04,0,2,9279,36.50%,3,f,1476.19,1469.34,673.48,533.42,269.29,2.52,Nov-12,121.45,Mar-13,1,INDIVIDUAL
12,1064687,9000,9000,9000.0,36 months,13.49%,305.38,C,C1,Va. Dept of Conservation/Recreation,< 1 year,RENT,30000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,Borrower added on 12/15/11 > Plan to pay off...,debt_consolidation,freedom,245xx,VA,10.08,Apr-04,1,4,10452,91.70%,9,f,2270.7,2270.7,1256.14,570.26,444.3,4.16,Jul-12,305.38,Nov-12,1,INDIVIDUAL
14,1069057,10000,10000,10000.0,36 months,10.65%,325.74,B,B2,SFMTA,3 years,RENT,100000.0,Source Verified,Dec-11,Charged Off,n,https://lendingclub.com/browse/loanDetail.acti...,,other,Other Loan,951xx,CA,7.06,May-91,2,14,11997,55.50%,29,f,7471.99,7471.99,5433.47,1393.42,645.1,6.3145,Oct-13,325.74,Mar-14,1,INDIVIDUAL
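In the rows shown above, `term`, `int_rate` and `revol_util` are stored as text ("60 months", "15.27%", "9.40%"), so they cannot be aggregated or plotted numerically as they stand. A possible cleaning step, sketched here on a toy frame built from the values in the first and last rows above, converts them to numbers. The new column names (`term_months`, `int_rate_pct`, `revol_util_pct`) are assumptions for illustration.

```python
import pandas as pd

# Toy frame using values visible in the head() output above.
toy = pd.DataFrame({
    "term": ["60 months", "36 months"],
    "int_rate": ["15.27%", "10.65%"],
    "revol_util": ["9.40%", "55.50%"],
})

# Pull the digits out of "60 months" and store them as integers.
toy["term_months"] = toy["term"].str.extract(r"(\d+)", expand=False).astype(int)

# Strip the trailing percent sign and convert to float.
for col in ["int_rate", "revol_util"]:
    toy[col + "_pct"] = toy[col].str.rstrip("%").astype(float)

print(toy[["term_months", "int_rate_pct", "revol_util_pct"]])
```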


In [95]:
#Scratch check of value_counts() on a small toy dataframe.
df = pd.DataFrame({'A': ["a", "b", "a", "c", "a"]})
df['A'].value_counts()

a    3
c    1
b    1
Name: A, dtype: int64
