## Analyzing borrowers’ risk of defaulting

Your project is to prepare a report for a bank’s loan division. You’ll need to find out if a customer’s marital status and number of children have an impact on whether they will default on a loan. The bank already has some data on customers’ creditworthiness.

Your report will be considered when building a **credit score** for a potential customer. A **credit score** is used to evaluate the ability of a potential borrower to repay their loan.

### Step 1. Open the data file and have a look at the general information. 

In [None]:
import pandas as pd
credit_scoring = pd.read_csv('credit_scoring_eng.csv')

In [2]:
credit_scoring.info() # check for missing values in each column or variable

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21525 entries, 0 to 21524
Data columns (total 12 columns):
children            21525 non-null int64
days_employed       19351 non-null float64
dob_years           21525 non-null int64
education           21525 non-null object
education_id        21525 non-null int64
family_status       21525 non-null object
family_status_id    21525 non-null int64
gender              21525 non-null object
income_type         21525 non-null object
debt                21525 non-null int64
total_income        19351 non-null float64
purpose             21525 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 2.0+ MB


In [3]:
credit_scoring['income_type'].value_counts()

employee                       11119
business                        5085
retiree                         3856
civil servant                   1459
unemployed                         2
entrepreneur                       2
student                            1
paternity / maternity leave        1
Name: income_type, dtype: int64

In [4]:
credit_scoring['family_status'].value_counts()

married              12380
civil partnership     4177
unmarried             2813
divorced              1195
widow / widower        960
Name: family_status, dtype: int64

In [5]:
credit_scoring.head(10)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-8437.673028,42,bachelor's degree,0,married,0,F,employee,0,40620.102,purchase of the house
1,1,-4024.803754,36,secondary education,1,married,0,F,employee,0,17932.802,car purchase
2,0,-5623.42261,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house
3,3,-4124.747207,32,secondary education,1,married,0,M,employee,0,42820.568,supplementary education
4,0,340266.072047,53,secondary education,1,civil partnership,1,F,retiree,0,25378.572,to have a wedding
5,0,-926.185831,27,bachelor's degree,0,civil partnership,1,M,business,0,40922.17,purchase of the house
6,0,-2879.202052,43,bachelor's degree,0,married,0,F,business,0,38484.156,housing transactions
7,0,-152.779569,50,SECONDARY EDUCATION,1,married,0,M,employee,0,21731.829,education
8,2,-6929.865299,35,BACHELOR'S DEGREE,0,civil partnership,1,F,employee,0,15337.093,having a wedding
9,0,-2188.756445,41,secondary education,1,married,0,M,employee,0,23108.15,purchase of the house for my family


### Conclusion

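In `credit_scoring`, 2174 of the 21525 entries are NaN (null) in both the `days_employed` and `total_income` columns. Both columns have exactly the same number of missing values, which suggests that they are missing for the same customers and that total income could not be determined when the number of days worked was unknown. The `days_employed` column also contains negative values, which are impossible for a duration. The floating-point data type alone would not change the sign of a value, so this is more likely an artifact of how the data was recorded or exported. The first rows also show an implausibly large positive value (340266 days, more than 900 years) and inconsistent letter case in the `education` column.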
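
The checks behind these observations can be sketched as follows. This is a minimal illustration on a tiny hypothetical frame (`sample` is not the bank's data, and its values are invented); the same expressions could be applied to `credit_scoring`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset; the values are illustrative only.
sample = pd.DataFrame({
    "days_employed": [-8437.67, np.nan, 340266.07, np.nan, -926.19],
    "total_income": [40620.10, np.nan, 25378.57, np.nan, 40922.17],
    "education": ["bachelor's degree", "secondary education",
                  "Secondary Education", "SECONDARY EDUCATION",
                  "secondary education"],
})

# Missing values per column.
missing_per_column = sample.isna().sum()

# Rows where both columns are missing at the same time.
both_missing = (sample["days_employed"].isna()
                & sample["total_income"].isna()).sum()

# Negative durations (NaN compares as False, so missing rows are not counted).
negative_days = (sample["days_employed"] < 0).sum()

# Distinct spellings of the education level before and after lowercasing.
spellings_before = sample["education"].nunique()
spellings_after = sample["education"].str.lower().nunique()

print(missing_per_column)
print(both_missing, negative_days, spellings_before, spellings_after)
```

If `both_missing` equals the per-column count of missing values, the two columns are missing for the same rows.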

### Step 2. Data preprocessing

### Processing missing values

In [6]:
# checking all missing values in the dataset
data = credit_scoring
data[data['days_employed'].isnull()].head()

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
12,0,,65,secondary education,1,civil partnership,1,M,retiree,0,,to have a wedding
26,0,,41,secondary education,1,married,0,M,civil servant,0,,education
29,0,,63,secondary education,1,unmarried,4,F,retiree,0,,building a real estate
41,0,,50,secondary education,1,married,0,F,civil servant,0,,second-hand car purchase
55,0,,54,secondary education,1,civil partnership,1,F,retiree,1,,to have a wedding


In [7]:
data['days_employed'] = data['days_employed'].abs() # take absolute values of days_employed variable 


In [8]:
data['days_employed'] = data['days_employed'].fillna(0) # Fill all null values in days_employed column with 0

In [9]:
income_level_value = data['income_type'].unique() # store all unique values of income type in a list

In [10]:
median_age = data.groupby('income_type').agg({'dob_years':['median']}) # find the median age for each income type
age = median_age['dob_years']['median']

# Replace ages recorded as zero (0) with the median age of the same income type
for levels in income_level_value:
    data.loc[(data['income_type'] == levels) & (data['dob_years'] == 0), 'dob_years'] = age[levels]



In [12]:
# find the average income for all persons belonging to a particular income level
mean_income_level = data.groupby('income_type').agg({'total_income':['mean']}) 
income_level = mean_income_level['total_income']['mean']

# fill all null income variables with their respective average incomes from all levels
for var in income_level_value:
    data.loc[(data['income_type'] == var) & (data['total_income'].isnull()),'total_income'] = income_level[var]


In [14]:
# find the mean number of days employed for each income type
# (the zeros filled in for missing values above are included in this mean)
mean_days_worked = data.groupby('income_type').agg({'days_employed':['mean']})
days_worked = mean_days_worked['days_employed']['mean']


# replace all days_employed values of zero (0) with the mean of the same income type
for days in income_level_value:
    data.loc[(data['income_type'] == days) & (data['days_employed'] == 0.0), 'days_employed'] = days_worked[days]




In [15]:
# change all education values to lower case 
data['education'] = data['education'].str.lower()

In [16]:
data['children'] = data['children'].abs() #take the absolute of the children column

In [17]:
# Take the absolute value of days_employed
data['days_employed'] = abs(data['days_employed'])

In [18]:
# Round the number of days employed to the nearest integer to get the number of days worked 
data['days_employed'] = round(data['days_employed'])

In [19]:
# Change number of days to number of months
data['months_employed'] = round(data['days_employed']/30).astype('int16')

### Conclusion

In the `days_employed` column, the absolute value was taken to turn all negative values into positive ones, and all NaN values were replaced with zeros (0). The mean number of days employed was then calculated for each income type and used to replace the zeros within that income type. Finally, the values were rounded to the nearest integer to give the number of days employed. From this, a `months_employed` column was created, which can be used as an indicator when awarding loans to customers based on years of experience or the number of years worked in the current company.

Secondly, the median age of each income type was calculated and used to replace `dob_years` values of zero within that income type. The mean total income of each income type was also calculated and used to replace NaN values in `total_income`.
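
The group-wise filling described above can also be written without an explicit loop. The following is a minimal sketch on hypothetical numbers (only the income type names come from the dataset), using `groupby(...).transform("mean")` to align each group's mean with the original rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; income types come from the dataset, the numbers do not.
sample = pd.DataFrame({
    "income_type": ["employee", "employee", "employee", "retiree", "retiree"],
    "total_income": [20000.0, 30000.0, np.nan, 10000.0, np.nan],
})

# Mean income of each income type, repeated for every row of that type.
# NaN values are skipped when the mean is computed.
group_mean = sample.groupby("income_type")["total_income"].transform("mean")

# Replace only the missing entries with the mean of their own group.
sample["total_income"] = sample["total_income"].fillna(group_mean)

print(sample)
```

Because NaN values are skipped by the mean, no placeholder value is needed. In the cells above, missing `days_employed` values were set to 0 before the group mean was computed, so those zeros take part in the mean; filling directly from NaN, as in this sketch, would avoid that.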

In [23]:
# Describe features
data['days_employed'].describe()

count     21525.000000
mean      66582.933333
std      137868.492155
min          24.000000
25%        1026.000000
50%        2095.000000
75%        5347.000000
max      401755.000000
Name: days_employed, dtype: float64

### Data type replacement

In [24]:
# Convert int64 columns that store small integer values to int16
data['children'] = data['children'].astype('int16')
data['dob_years'] = data['dob_years'].astype('int16')  # convert to integer in case the median fill produced floats
data['education_id'] = data['education_id'].astype('int16')
data['family_status_id'] = data['family_status_id'].astype('int16')
data['debt'] = data['debt'].astype('int16')

# round total_income to the nearest whole number
data['total_income'] = round(data['total_income'])


In [25]:
data['total_income'].min()

3307.0

### Conclusion

All columns that store small integer values were converted to int16, and `dob_years` was converted to int16 as well. In addition, the floating-point values of `total_income` were rounded to the nearest whole number.
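
Before casting to a smaller integer type, it can help to confirm that every value fits into the target range. This is a minimal sketch on a hypothetical column; `np.iinfo` reports the limits of int16 (-32768 to 32767).

```python
import numpy as np
import pandas as pd

# Hypothetical column of small integers stored as int64.
children = pd.Series([0, 1, 3, 20], dtype="int64")

# Range that int16 can represent.
limits = np.iinfo(np.int16)

# Cast only when every value fits, so nothing overflows silently.
if children.between(limits.min, limits.max).all():
    children = children.astype("int16")

print(children.dtype, children.memory_usage(index=False))
```

Each int16 value takes 2 bytes instead of the 8 bytes used by int64.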

### Processing duplicates

In [26]:
# check for duplicates in the dataset
data.duplicated().sum() 

71

In [27]:
# seek to drop all duplicates appearing in all fields of the dataset
data.drop_duplicates(inplace=True)

### Conclusion

The `drop_duplicates()` method was used to drop the 71 fully duplicated entries from the data.
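
As a small illustration of what is counted and removed, here is a sketch on hypothetical rows. Resetting the index afterwards is an optional extra step that the notebook does not perform.

```python
import pandas as pd

# Hypothetical rows; the second and third are identical in every column.
sample = pd.DataFrame({
    "children": [1, 0, 0, 2],
    "education": ["secondary education"] * 4,
    "total_income": [17933.0, 23342.0, 23342.0, 15337.0],
})

# duplicated() marks repeats only, not the first occurrence of a row.
n_duplicates = sample.duplicated().sum()

# Keep the first occurrence of each row and renumber the index from 0.
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(deduplicated))
```

Rows are compared on their exact values, so normalising text first (as was done by lowercasing `education`) affects how many rows count as duplicates.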

### Categorizing Data

In [28]:
def purpose_new(value):
    """Map a free-text loan purpose to one of six purpose groups."""
    # The order of the checks matters: the first matching keyword wins.
    if 'wedding' in value:
        return 'Prepare for wedding'
    elif 'car' in value:
        return 'purchase a car'
    elif 'educat'  in value or 'university' in value:
        return 'getting education'
    elif 'propert' in value:
        return 'build property'
    elif 'real estate' in value:
        return 'real estate'
    elif 'hous' in value:
        return 'purchase a house'
    else:
        return value

In [29]:
data['purpose_group'] = data['purpose'].apply(purpose_new)

In [30]:
data['purpose_group'].value_counts()

real estate            4464
purchase a car         4306
getting education      4013
purchase a house       3809
build property         2538
Prepare for wedding    2324
Name: purpose_group, dtype: int64

In [31]:
data['income_distribution'] = pd.qcut(data['total_income'], q=6,duplicates = 'drop' )

In [32]:
data['income_distribution'].value_counts()

(36620.5, 362497.0]    3576
(24302.0, 28289.0]     3576
(14743.5, 19653.0]     3576
(3306.999, 14743.5]    3576
(28289.0, 36620.5]     3575
(19653.0, 24302.0]     3575
Name: income_distribution, dtype: int64

 https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.qcut.html
 


### Conclusion

The `purpose` column, which describes the purpose of applying for a loan, was grouped into six categories: "Prepare for wedding", "purchase a car", "build property", "getting education", "real estate" and "purchase a house". The `total_income` column was also split into six categories of total income received per month by each borrower: '3306.999 - 14743.5', '14743.5 - 19653.0', '19653.0 - 24302.0', '24302.0 - 28289.0', '28289.0 - 36620.5' and '36620.5 - 362497.0'.
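
The income categories have nearly equal sizes because `pd.qcut` places the bin edges at quantiles of the data rather than at equal distances. A minimal sketch on hypothetical incomes, chosen only to make the result easy to read:

```python
import pandas as pd

# Hypothetical incomes, not taken from the dataset.
income = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120],
                   dtype="float64")

# Six quantile-based bins: the edges are chosen so that each bin holds
# about the same number of rows, not the same width of income.
bins = pd.qcut(income, q=6)

# Count the rows per bin, keeping the bins in ascending order.
counts = bins.value_counts(sort=False)
print(counts)
```

With 12 values and 6 bins, every bin receives 2 values, even though the bin widths need not be equal in general.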

### Step 3. Answer these questions

- Is there a relation between having kids and repaying a loan on time?

In [39]:
kids_loan_relation = pd.pivot_table(data = data , index =['children'],columns='debt',values='income_type', aggfunc='count')

In [40]:
kids_loan_relation=kids_loan_relation.dropna()
kids_loan_relation['total_per_category'] = kids_loan_relation[0] + kids_loan_relation[1]
kids_loan_relation['percent'] =  round(kids_loan_relation[1]/kids_loan_relation['total_per_category'] *100,1)
kids_loan_relation

debt,0,1,total_per_category,percent
children,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
0,13028.0,1063.0,14091.0,7.5
1,4410.0,445.0,4855.0,9.2
2,1858.0,194.0,2052.0,9.5
3,303.0,27.0,330.0,8.2
4,37.0,4.0,41.0,9.8
20,68.0,8.0,76.0,10.5


### Conclusion

Of the 14091 borrowers without kids, 1063 have debt, which means that 7.5% of the group are bad debtors. Of the 4855 borrowers with one kid, 445 (9.2%) are bad debtors. Of the 2052 borrowers with two kids, 194 (9.5%) are bad debtors. Of the 330 borrowers with three kids, 27 (8.2%) are bad debtors. Of the 41 borrowers with four kids, 4 (9.8%) are bad debtors. Finally, 76 borrowers are recorded with 20 kids, and 8 of them (10.5%) are bad debtors.

The statistics show that borrowers with children default more often than borrowers without children, and the share of bad debtors tends to grow with the number of children. The increase is not strictly monotonic (8.2% for three kids), and the groups with 3, 4 and 20 kids are small, so their percentages are less reliable. The value of 20 kids is also implausible and may be a data entry error.
- Is there a relation between marital status and repaying a loan on time?

In [41]:
marital_loan_relation = pd.pivot_table(data = data, index =['family_status'],columns='debt',values='income_type', aggfunc='count')
marital_loan_relation['total_category'] = marital_loan_relation[0] + marital_loan_relation[1]
marital_loan_relation['Percentage'] = round( marital_loan_relation[1]/marital_loan_relation['total_category'] *100,1)
marital_loan_relation

debt,0,1,total_category,Percentage
family_status,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
civil partnership,3763,388,4151,9.3
divorced,1110,85,1195,7.1
married,11408,931,12339,7.5
unmarried,2536,274,2810,9.8
widow / widower,896,63,959,6.6


### Conclusion

Of the 4151 borrowers in a civil partnership, 388 (9.3%) are bad debtors. Of the 1195 divorced borrowers, 85 (7.1%) are bad debtors. Of the 12339 married borrowers, 931 (7.5%) are bad debtors. Of the 2810 unmarried borrowers, 274 (9.8%) are bad debtors. Finally, of the 959 borrowers who are widows or widowers, 63 (6.6%) are bad debtors.

It can be concluded that unmarried borrowers have the highest likelihood of defaulting on their loans, closely followed by borrowers in a civil partnership, while widows and widowers are the least likely to default.

- Is there a relation between income level and repaying a loan on time?

In [45]:
income_loan_relation = pd.pivot_table(data = data, index =['income_distribution'],columns='debt',values='income_type', aggfunc='count',fill_value=0)
income_loan_relation['total_category'] = income_loan_relation[0] + income_loan_relation[1]
income_loan_relation['percentage'] = round(income_loan_relation[1]/income_loan_relation['total_category'] * 100,1)
income_loan_relation

debt,0,1,total_category,percentage
income_distribution,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
"(3306.999, 14743.5]",3291,285,3576,8.0
"(14743.5, 19653.0]",3280,296,3576,8.3
"(19653.0, 24302.0]",3254,321,3575,9.0
"(24302.0, 28289.0]",3263,313,3576,8.8
"(28289.0, 36620.5]",3299,276,3575,7.7
"(36620.5, 362497.0]",3326,250,3576,7.0


### Conclusion

Of the 3576 borrowers with an income of "3306.999 - 14743.5", 285 (8.0%) are bad debtors. Of the 3576 borrowers with an income of "14743.5 - 19653.0", 296 (8.3%) are bad debtors. Of the 3575 borrowers with an income of "19653.0 - 24302.0", 321 (9.0%) are bad debtors. Of the 3576 borrowers with an income of "24302.0 - 28289.0", 313 (8.8%) are bad debtors. Of the 3575 borrowers with an income of "28289.0 - 36620.5", 276 (7.7%) are bad debtors. Of the 3576 borrowers with an income of "36620.5 - 362497.0", 250 (7.0%) are bad debtors.

It can be deduced that borrowers with an income of "19653.0 - 24302.0" have the highest percentage of defaults, while borrowers in the income group "36620.5 - 362497.0" are the least likely to default on a loan.

- How do different loan purposes affect on-time repayment of the loan?

In [46]:
purpose_loan_relation = pd.pivot_table(data = data, index =['purpose_group'],columns='debt',values='income_type', aggfunc='count')
purpose_loan_relation['total_category'] = purpose_loan_relation[0] + purpose_loan_relation[1]
purpose_loan_relation['percentage'] = round(purpose_loan_relation[1]/purpose_loan_relation['total_category'] * 100,1)
purpose_loan_relation

debt,0,1,total_category,percentage
purpose_group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Prepare for wedding,2138,186,2324,8.0
build property,2348,190,2538,7.5
getting education,3643,370,4013,9.2
purchase a car,3903,403,4306,9.4
purchase a house,3553,256,3809,6.7
real estate,4128,336,4464,7.5


In [47]:
data_deliq = data[data['debt'] ==1]

In [48]:
# Find the loan delinquency of bad debtors by income type
data_deliq_count =data_deliq.groupby('income_type').agg({'income_distribution':['count']})
# percentage of all borrowers in the dataset, not the default rate within each income type
data_deliq_count['percentage'] = round(data_deliq_count['income_distribution']/data['income_type'].count().sum() * 100,1)
data_deliq_count


Unnamed: 0_level_0,income_distribution,percentage
Unnamed: 0_level_1,count,Unnamed: 2_level_1
income_type,Unnamed: 1_level_2,Unnamed: 2_level_2
business,376,1.8
civil servant,86,0.4
employee,1061,4.9
paternity / maternity leave,1,0.0
retiree,216,1.0
unemployed,1,0.0


### Conclusion

Of the 2324 borrowers who gave "Prepare for wedding" as the purpose of the loan, 186 (8.0%) are bad debtors. Of the 2538 borrowers in the "build property" category, 190 (7.5%) are bad debtors. Of the 4013 borrowers in the "getting education" category, 370 (9.2%) are bad debtors. Of the 4306 borrowers in the "purchase a car" category, 403 (9.4%) are bad debtors. Of the 3809 borrowers in the "purchase a house" category, 256 (6.7%) are bad debtors. Finally, of the 4464 borrowers in the "real estate" category, 336 (7.5%) are bad debtors.

It can be deduced that borrowers who gave "purchase a car" as the purpose have the highest percentage of defaults, followed by "getting education". Borrowers in the "purchase a house" category are the least likely to default, followed by "build property" and "real estate".
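
The ranking of loan purposes can be reproduced from the counts in the pivot table above. This is a minimal sketch; the frame name `purpose` and its column names are chosen here for illustration.

```python
import pandas as pd

# Counts copied from the purpose_group pivot table above.
purpose = pd.DataFrame(
    {"repaid": [2138, 2348, 3643, 3903, 3553, 4128],
     "defaulted": [186, 190, 370, 403, 256, 336]},
    index=["Prepare for wedding", "build property", "getting education",
           "purchase a car", "purchase a house", "real estate"],
)

# Share of bad debtors within each purpose group, in percent.
total = purpose["repaid"] + purpose["defaulted"]
purpose["percent"] = (purpose["defaulted"] / total * 100).round(1)

# Sort so that the purpose with the highest default rate comes first.
ranked = purpose.sort_values("percent", ascending=False)

print(ranked["percent"])
print(ranked.index[0], "|", ranked.index[-1])
```

Sorting the percentages makes the highest and lowest categories explicit instead of reading them off the table by eye.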

### Step 4. General conclusion

The main objective of this report is to determine the impact of a customer's marital status and number of children on defaulting on a loan. Data on customers' previous creditworthiness was provided, comprising 21525 entries. For each customer it records the number of children, the number of days employed, age, educational status, family status, gender, income type, debt (whether the customer has ever defaulted on a loan), total income per month and the purpose of applying for the loan.

On closer inspection, the number of days employed and the total income per month were missing (NaN) for 2174 of the 21525 customers. Both features are missing in the same number of entries, which suggests that total income could not be determined when the number of days worked was unknown. The number of days employed also had negative values, which is impossible for days worked. The floating-point data type alone would not change the sign of a value, so this is more likely an artifact of how the data was recorded or exported.
    
**Preprocessing**
    
To arrive at a sound analysis, the data had to be preprocessed to eliminate missing (null) entries. To begin with, the absolute value of the number of days employed was taken, which turns all negatives into positives, and all missing entries were replaced with zeros (0). The mean number of days employed was then calculated for each income type and used to replace the zeros within that income type. The values were rounded to the nearest integer to give the number of days employed. From this, a months employed feature was created, which can be used as an indicator when awarding loans to customers based on years of experience or the number of years worked.

Secondly, the median age of each income type was calculated and used to replace ages recorded as zero within that income type. The mean total income of each income type was also calculated and used to replace missing (null) values.

Duplicates found in the dataset were dropped. There were 71 repeated entries of customer information, so the dataset was reduced to 21454 entries.

Furthermore, both the purpose of the loan and the total income were grouped into the categories listed below.
     
__Purpose for a loan category__

* prepare for wedding
* purchase a car
* build property
* getting education
* real estate
* purchase a house

__Total income category__

* 3306.999 - 14743.5
* 14743.5 - 19653.0
* 19653.0 - 24302.0
* 24302.0 - 28289.0
* 28289.0 - 36620.5
* 36620.5 - 362497.0
    
    
    
Among the income types, employees account for the largest number of bad debtors: 1061 borrowers, which is 4.9% of all borrowers in the dataset.

The relationships between certain features and loan repayment were then examined in order to address the objective of the analysis.
       
* Is there a relation between having kids and repaying a loan on time?

From the table below, of the 14091 borrowers without kids, 1063 have debt, which means that 7.5% of the group are bad debtors. Of the 4855 borrowers with one kid, 445 (9.2%) are bad debtors. Of the 2052 borrowers with two kids, 194 (9.5%) are bad debtors. Of the 330 borrowers with three kids, 27 (8.2%) are bad debtors. Of the 41 borrowers with four kids, 4 (9.8%) are bad debtors. Finally, 76 borrowers are recorded with 20 kids, and 8 of them (10.5%) are bad debtors.

From this we can conclude that borrowers with children default more often than borrowers without children, and the share of bad debtors tends to grow with the number of children. The increase is not strictly monotonic (8.2% for three kids), and the groups with 3, 4 and 20 kids are small, so their percentages are less reliable.
        
    
   
   | Number of Children|Pay loan on time|default loan| total debtors per category |Percent|
   | --- | --- | --- |---|---|
   | 0 | 13028 | 1063 |14091|7.5|
   | 1 | 4410 | 445 |4855|9.2|
   | 2 | 1858 | 194 |2052|9.5|
   | 3 | 303 | 27 |330|8.2|
   | 4 | 37 | 4 |41|9.8|
   | 20 | 68 | 8 |76|10.5|
   
   
* Is there a relation between marital status and repaying a loan on time?

With reference to the table below, of the 4151 borrowers in a civil partnership, 388 (9.3%) are bad debtors. Of the 1195 divorced borrowers, 85 (7.1%) are bad debtors. Of the 12339 married borrowers, 931 (7.5%) are bad debtors. Of the 2810 unmarried borrowers, 274 (9.8%) are bad debtors. Finally, of the 959 borrowers who are widows or widowers, 63 (6.6%) are bad debtors.

It can be concluded that unmarried borrowers have the highest likelihood of defaulting on their loans, while widows and widowers are the least likely to default. One possible explanation, which this analysis does not test, is that unmarried customers more often borrow to spend on cars, which generate no income.
        
        
        
  
   

   |family status|Pay loan on time|default on loan|total debtors per family status|Percentage|
   | --- | --- | --- |--- | --- |			
   |civil partnership|3763|388|4151|9.3|
   |divorced|1110|	85|	1195|7.1|
   |married|11408|	931|12339|7.5|
   |unmarried|2536|274	|2810|9.8|
   |widow / widower|896|63|959|6.6|
   

* Is there a relation between income level and repaying a loan on time?

From the table below, of the 3576 borrowers with an income of "3306.999 - 14743.5", 285 (8.0%) are bad debtors. Of the 3576 borrowers with an income of "14743.5 - 19653.0", 296 (8.3%) are bad debtors. Of the 3575 borrowers with an income of "19653.0 - 24302.0", 321 (9.0%) are bad debtors. Of the 3576 borrowers with an income of "24302.0 - 28289.0", 313 (8.8%) are bad debtors. Of the 3575 borrowers with an income of "28289.0 - 36620.5", 276 (7.7%) are bad debtors. Of the 3576 borrowers with an income of "36620.5 - 362497.0", 250 (7.0%) are bad debtors.

It can be deduced that borrowers with an income of "19653.0 - 24302.0" have the highest percentage of defaults, while borrowers in the income group "36620.5 - 362497.0" are the least likely to default on a loan.
       
       
        
        
        
  |distribution of income| Pay loans on time|Default on loans| total income category|percentage|
  |---|---|---|---|---|
  |3306.999 - 14743.5|	3291|	285	|3576|	8.0|
  |14743.5 - 19653.0|	3280|	296|	3576|	8.3|
  |19653.0 - 24302.0|	3254|	321|	3575|	9.0|
  |24302.0 - 28289.0|	3263|	313|	3576|	8.8|
  |28289.0 - 36620.5|	3299|	276|    3575|	7.7|
  |36620.5 - 362497.0|	3326|	250|	3576|	7.0|

   
* How do different loan purposes affect on-time repayment of the loan?

From the table below, of the 2324 borrowers who gave "Prepare for wedding" as the purpose of the loan, 186 (8.0%) are bad debtors. Of the 2538 borrowers in the "build property" category, 190 (7.5%) are bad debtors. Of the 4013 borrowers in the "getting education" category, 370 (9.2%) are bad debtors. Of the 4306 borrowers in the "purchase a car" category, 403 (9.4%) are bad debtors. Of the 3809 borrowers in the "purchase a house" category, 256 (6.7%) are bad debtors. Finally, of the 4464 borrowers in the "real estate" category, 336 (7.5%) are bad debtors.

It can be deduced that borrowers who gave "purchase a car" as the purpose have the highest percentage of defaults, while borrowers in the "purchase a house", "build property" and "real estate" categories are the least likely to default. One possible explanation, which this analysis does not test, is that investment in housing, property and real estate tends to pay for itself, so these borrowers are better able to repay on time than borrowers who purchase a car.
        
        
        
   |purpose of applying for loan|Pay loan on time|Default on loan|Total purpose category|percentage|
   |---|---|---|---|---|			
   |Prepare for wedding|	2138|	186|	2324|	8.0|
   |build property|	2348|	190|	2538|	7.5|
   |getting education|	3643|	370|	4013|	9.2|
   |purchase a car|	3903|	403|	4306|	9.4|
   |purchase a house|	3553|	256|	3809|	6.7|
   |real estate|	4128|	336|	4464|	7.5|
   
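In conclusion, customers with children default on their loans more often than customers without children, unmarried customers default more often than customers with any other marital status, and customers who borrow to purchase a car default more often than customers with any other loan purpose. Hence unmarried borrowers with children can be expected to carry a higher risk of default, although the combination of these factors was not analysed directly.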
