## Analyzing borrowers’ risk of defaulting

Your project is to prepare a report for a bank’s loan division. You’ll need to find out whether a customer’s marital status and number of children have an impact on whether they will default on a loan. The bank already has some data on customers’ creditworthiness.

Your report will be considered when building the **credit scoring** of a potential customer. **Credit scoring** is used to evaluate the ability of a potential borrower to repay their loan.

### Step 1. Open the data file and have a look at the general information. 

In [1]:
import pandas as pd
import numpy as np 
  
logs = pd.read_csv('/datasets/credit_scoring_eng.csv')
logs.head(30)
#logs.info()
#logs.describe(include='all')

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-8437.673028,42,bachelor's degree,0,married,0,F,employee,0,40620.102,purchase of the house
1,1,-4024.803754,36,secondary education,1,married,0,F,employee,0,17932.802,car purchase
2,0,-5623.42261,33,Secondary Education,1,married,0,M,employee,0,23341.752,purchase of the house
3,3,-4124.747207,32,secondary education,1,married,0,M,employee,0,42820.568,supplementary education
4,0,340266.072047,53,secondary education,1,civil partnership,1,F,retiree,0,25378.572,to have a wedding
5,0,-926.185831,27,bachelor's degree,0,civil partnership,1,M,business,0,40922.17,purchase of the house
6,0,-2879.202052,43,bachelor's degree,0,married,0,F,business,0,38484.156,housing transactions
7,0,-152.779569,50,SECONDARY EDUCATION,1,married,0,M,employee,0,21731.829,education
8,2,-6929.865299,35,BACHELOR'S DEGREE,0,civil partnership,1,F,employee,0,15337.093,having a wedding
9,0,-2188.756445,41,secondary education,1,married,0,M,employee,0,23108.15,purchase of the house for my family


### Conclusion

The table shows that there are many missing values in total_income and days_employed. The children column also contains two corrupted values, -1 and 20.

### Step 2. Data preprocessing

### Processing missing values

print('Total Rows:', len(logs))  # total number of rows, to compare against the missing-value counts

print(logs.isnull().sum())
logs.describe(include='all')

The missing-value counts for total_income and days_employed are identical (2174), which suggests that the same rows are missing both values.
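Equal counts alone do not prove that the gaps sit in the same rows. A minimal sketch of a direct check, run here on a small made-up frame (`toy` and its values are illustrative, not taken from the bank data):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for `logs`; the values are made up for illustration.
toy = pd.DataFrame({
    "days_employed": [-8437.7, np.nan, -5623.4, np.nan],
    "total_income": [40620.1, np.nan, 23341.8, np.nan],
})

# Boolean masks: True where the cell is missing.
missing_days = toy["days_employed"].isnull()
missing_income = toy["total_income"].isnull()

# The columns are missing together if the two masks agree on every row.
same_rows = bool((missing_days == missing_income).all())
both_missing = int((missing_days & missing_income).sum())

print("Missing in the same rows:", same_rows)
print("Rows with both values missing:", both_missing)
```

The same two masks could be built from `logs` to confirm the pattern on the real table.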

In [2]:
num_columns = ['total_income','days_employed','children']

print('These are the means:')
print(logs.groupby('family_status')[num_columns].mean())
print() # New line
print('These are the medians:')
print(logs.groupby('family_status')[num_columns].median())

These are the means:
                   total_income  days_employed  children
family_status                                           
civil partnership  26694.428597   54587.019762  0.510175
divorced           27189.354550   64819.140232  0.457741
married            27041.784689   59202.282275  0.638207
unmarried          26934.069805   43834.346331  0.286527
widow / widower    22984.208556  202957.656283  0.223958

These are the medians:
                   total_income  days_employed  children
family_status                                           
civil partnership     23186.534   -1197.176853         0
divorced              23515.096   -1146.122484         0
married               23389.540   -1332.196271         0
unmarried             23149.028   -1015.245549         0
widow / widower       20514.190  337017.713307         0


In [3]:
print(logs.isnull().sum())

children               0
days_employed       2174
dob_years              0
education              0
education_id           0
family_status          0
family_status_id       0
gender                 0
income_type            0
debt                   0
total_income        2174
purpose                0
dtype: int64


In [4]:
print(logs[logs['income_type'] == '0'])

Empty DataFrame
Columns: [children, days_employed, dob_years, education, education_id, family_status, family_status_id, gender, income_type, debt, total_income, purpose]
Index: []


<font color='green'><h4>Transforming by income type</h4></font>

Business owners and employees are unlikely to have the same income, so it is better to compute the median separately for each income type.

In [5]:
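# Note: transform('median') returns the group median for every row, so this
# assignment replaces all total_income values, not only the missing ones.
logs["total_income"] = logs.groupby('income_type')["total_income"].transform('median')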

logs[logs['days_employed'].isnull()].sample(5)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
16519,1,,28,secondary education,1,married,0,M,business,0,27577.272,buying my own car
21391,0,,52,secondary education,1,married,0,F,business,0,27577.272,purchase of the house for my family
9582,0,,49,secondary education,1,divorced,3,F,employee,0,22815.1035,purchase of the house
14907,1,,31,secondary education,1,married,0,F,business,0,27577.272,getting higher education
17908,1,,36,some college,2,married,0,M,employee,0,22815.1035,purchase of the house for my family
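If the intent is to fill only the gaps and keep the observed incomes, one option is to pass the per-group median to `fillna`. This is a sketch on made-up data (`toy` and the column `total_income_filled` are hypothetical names), not the approach used in the cell above:

```python
import numpy as np
import pandas as pd

# Toy data: two income types, each with one missing income.
toy = pd.DataFrame({
    "income_type": ["employee", "employee", "employee", "business", "business"],
    "total_income": [20000.0, 30000.0, np.nan, 40000.0, np.nan],
})

# Median of each income type, aligned to the original rows.
group_median = toy.groupby("income_type")["total_income"].transform("median")

# fillna only touches the missing cells; observed values are left unchanged.
toy["total_income_filled"] = toy["total_income"].fillna(group_median)
print(toy)
```

With this variant the later income statistics would differ from the outputs shown in this notebook, because the original values would be preserved.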


In [6]:
logs[logs['days_employed'].isnull()]['total_income'].value_counts()

22815.1035    1105
27577.2720     508
18962.3180     413
24071.6695     147
79866.1030       1
Name: total_income, dtype: int64

In [7]:
abs(logs["days_employed"]/365).describe()

count    19351.000000
mean       183.328024
std        380.906522
min          0.066141
25%          2.539751
50%          6.011563
75%         15.172281
max       1100.699727
Name: days_employed, dtype: float64

In [8]:
logs[logs['days_employed'] < 0]['days_employed'].count()

15906

Dividing by 365 gives a maximum of more than 1,000 years of employment, which is impossible.
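To see where the impossible values come from, it can help to flag them and group by income type. The sketch below uses three rows copied from the first table of this notebook; the 60-year cutoff (`MAX_PLAUSIBLE_YEARS`) is an assumption chosen for illustration:

```python
import pandas as pd

# Three rows taken from the head of the table shown earlier.
toy = pd.DataFrame({
    "income_type": ["employee", "retiree", "business"],
    "days_employed": [-8437.673028, 340266.072047, -926.185831],
})

# Assumed upper bound on a working life, in years (illustrative only).
MAX_PLAUSIBLE_YEARS = 60

# Convert the absolute number of days to years and flag implausible rows.
toy["years"] = toy["days_employed"].abs() / 365
toy["implausible"] = toy["years"] > MAX_PLAUSIBLE_YEARS

# Share of implausible rows per income type.
print(toy.groupby("income_type")["implausible"].mean())
```

Applied to the full table, a breakdown like this would show whether the oversized values are concentrated in particular income types.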

In [9]:
logs['days_employed'] = logs["days_employed"]/24
logs.head(20)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-351.569709,42,bachelor's degree,0,married,0,F,employee,0,22815.1035,purchase of the house
1,1,-167.700156,36,secondary education,1,married,0,F,employee,0,22815.1035,car purchase
2,0,-234.309275,33,Secondary Education,1,married,0,M,employee,0,22815.1035,purchase of the house
3,3,-171.864467,32,secondary education,1,married,0,M,employee,0,22815.1035,supplementary education
4,0,14177.753002,53,secondary education,1,civil partnership,1,F,retiree,0,18962.318,to have a wedding
5,0,-38.591076,27,bachelor's degree,0,civil partnership,1,M,business,0,27577.272,purchase of the house
6,0,-119.966752,43,bachelor's degree,0,married,0,F,business,0,27577.272,housing transactions
7,0,-6.365815,50,SECONDARY EDUCATION,1,married,0,M,employee,0,22815.1035,education
8,2,-288.744387,35,BACHELOR'S DEGREE,0,civil partnership,1,F,employee,0,22815.1035,having a wedding
9,0,-91.198185,41,secondary education,1,married,0,M,employee,0,22815.1035,purchase of the house for my family


In [10]:
#logs['days_employed'].fillna(0)
#logs.head(14)
#logs['days_employed'].fillna(value=0, inplace =True)
#logs['days_employed'] = logs.groupby('days_employed')['total_income'].fillna(0)
#logs['days_employed'].isnull().fillna(0)

logs['days_employed'] = logs['days_employed'].replace(np.nan, 0)  # replace NaN with 0
  

In [11]:
logs.head(20)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
0,1,-351.569709,42,bachelor's degree,0,married,0,F,employee,0,22815.1035,purchase of the house
1,1,-167.700156,36,secondary education,1,married,0,F,employee,0,22815.1035,car purchase
2,0,-234.309275,33,Secondary Education,1,married,0,M,employee,0,22815.1035,purchase of the house
3,3,-171.864467,32,secondary education,1,married,0,M,employee,0,22815.1035,supplementary education
4,0,14177.753002,53,secondary education,1,civil partnership,1,F,retiree,0,18962.318,to have a wedding
5,0,-38.591076,27,bachelor's degree,0,civil partnership,1,M,business,0,27577.272,purchase of the house
6,0,-119.966752,43,bachelor's degree,0,married,0,F,business,0,27577.272,housing transactions
7,0,-6.365815,50,SECONDARY EDUCATION,1,married,0,M,employee,0,22815.1035,education
8,2,-288.744387,35,BACHELOR'S DEGREE,0,civil partnership,1,F,employee,0,22815.1035,having a wedding
9,0,-91.198185,41,secondary education,1,married,0,M,employee,0,22815.1035,purchase of the house for my family


### Conclusion

There are several details to consider when looking at the data:
  1) Missing values occur in the same rows of total_income and days_employed: 2174 rows, about 10% of the data.
  2) Customers with different income types are unlikely to have the same income, so total_income is replaced by the median of the same income type.
  3) days_employed divided by 365 ranges from less than 1 year to more than 1,000 years, which is not plausible.
  4) days_employed therefore contains incorrect or corrupted data.
  
  

In [12]:
logs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21525 entries, 0 to 21524
Data columns (total 12 columns):
children            21525 non-null int64
days_employed       21525 non-null float64
dob_years           21525 non-null int64
education           21525 non-null object
education_id        21525 non-null int64
family_status       21525 non-null object
family_status_id    21525 non-null int64
gender              21525 non-null object
income_type         21525 non-null object
debt                21525 non-null int64
total_income        21525 non-null float64
purpose             21525 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 2.0+ MB


### Data type replacement

In [13]:

logs['total_income'] = logs['total_income'].astype('int')  # convert total_income from float to int
#logs['days_employed'] = logs['days_employed'].astype('int')#change days employed to int 
logs['days_employed'] = abs(logs['days_employed'])  # convert negative values to positive
logs.loc[logs['children'] == -1,'children'] = 1  # replace the corrupted value -1 with 1
logs.loc[logs['children'] == 20,'children'] = 2  # replace the corrupted value 20 with 2
logs['children'].value_counts()
print(logs.info())
print(logs.head())
try:
    logs['dob_years'] = pd.to_numeric(logs['dob_years'])
    print('converted')
except (ValueError, TypeError):
    print('Error: Numeric conversion. Please check entries in dob years.')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21525 entries, 0 to 21524
Data columns (total 12 columns):
children            21525 non-null int64
days_employed       21525 non-null float64
dob_years           21525 non-null int64
education           21525 non-null object
education_id        21525 non-null int64
family_status       21525 non-null object
family_status_id    21525 non-null int64
gender              21525 non-null object
income_type         21525 non-null object
debt                21525 non-null int64
total_income        21525 non-null int64
purpose             21525 non-null object
dtypes: float64(1), int64(6), object(5)
memory usage: 2.0+ MB
None
   children  days_employed  dob_years            education  education_id  \
0         1     351.569709         42    bachelor's degree             0   
1         1     167.700156         36  secondary education             1   
2         0     234.309275         33  Secondary Education             1   
3         3     171.8

### Conclusion

In the data type replacement step, I first converted total_income to int and then made the negative values in days_employed positive with the absolute-value function.
The children column also had two corrupted entries, which I changed: 20 became 2 and -1 became 1. I used try and except to check the numeric conversion of dob_years.
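The two replacements and the numeric check can also be written with `Series.replace` and `pd.to_numeric(..., errors="coerce")`. A minimal sketch on made-up data; the string ages in `toy` are an assumption for illustration, since dob_years in the real table is already an integer column:

```python
import pandas as pd

# Toy data: two corrupted child counts and one unparseable age.
toy = pd.DataFrame({
    "children": [0, -1, 20, 3],
    "dob_years": ["42", "36", "n/a", "53"],
})

# Map both corrupted values in one call.
toy["children"] = toy["children"].replace({-1: 1, 20: 2})

# errors="coerce" turns unparseable entries into NaN instead of raising.
toy["dob_years"] = pd.to_numeric(toy["dob_years"], errors="coerce")
bad_ages = int(toy["dob_years"].isnull().sum())

print(toy)
print("Entries that could not be converted:", bad_ages)
```

Counting the coerced values shows how many entries need attention, which a bare try/except cannot report.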

### Processing duplicates

In [14]:
#print(logs['days_employed'].drop_duplicates())#duplicate values
#print(logs['days_employed'].duplicated.sum())
print(logs['education'].duplicated().count())  # count() returns the number of rows checked, not the number of duplicates

21525


In [15]:
print(logs['education'].value_counts())

secondary education    13750
bachelor's degree       4718
SECONDARY EDUCATION      772
Secondary Education      711
some college             668
BACHELOR'S DEGREE        274
Bachelor's Degree        268
primary education        250
Some College              47
SOME COLLEGE              29
PRIMARY EDUCATION         17
Primary Education         15
graduate degree            4
Graduate Degree            1
GRADUATE DEGREE            1
Name: education, dtype: int64


<font color='green'><h4>Converting to lower case</h4></font>

In [16]:
def convert_to_lower():
    logs['education'] = logs['education'].str.lower() #converting all the column into lowercase

In [17]:
convert_to_lower()

In [18]:
logs.duplicated().sum()
logs[logs.duplicated()].sort_values('education').head()#sorting values 

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose
8583,0,0.0,58,bachelor's degree,0,unmarried,4,F,retiree,0,18962,supplementary education
4182,1,0.0,34,bachelor's degree,0,civil partnership,1,F,employee,0,22815,wedding ceremony
16904,1,0.0,32,bachelor's degree,0,married,0,F,employee,0,22815,buying a second-hand car
19387,0,0.0,38,bachelor's degree,0,civil partnership,1,F,business,0,27577,having a wedding
17379,0,0.0,54,bachelor's degree,0,married,0,M,business,0,27577,transactions with commercial real estate


In [19]:
logs.duplicated('education')

#logs[logs.duplicated()].sort_values('education').head()

0        False
1        False
2         True
3         True
4         True
         ...  
21520     True
21521     True
21522     True
21523     True
21524     True
Length: 21525, dtype: bool

In [20]:
logs = logs.drop_duplicates().reset_index(drop=True)#removing all duplicates by drop method
logs.duplicated().sum()

0

In the data file, the education column contains the same values written in upper case, title case, and lower case. We first convert them all to lower case and then drop the duplicate rows.
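The effect of case normalization on duplicate detection can be seen on a small made-up frame (`toy` is illustrative; `sum()` on the boolean mask gives the number of duplicated rows):

```python
import pandas as pd

# Toy data: the first three rows differ only in letter case.
toy = pd.DataFrame({
    "education": ["secondary education", "Secondary Education",
                  "SECONDARY EDUCATION", "bachelor's degree"],
    "dob_years": [41, 41, 41, 27],
})

# Before normalization no row is an exact copy of another.
before = int(toy.duplicated().sum())

# Lowercase the column, then count full-row duplicates again.
toy["education"] = toy["education"].str.lower()
after = int(toy.duplicated().sum())

# Drop the duplicates and rebuild a clean index.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(before, after, len(deduped))
```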

### Categorizing Data

<font color='green'><h4>Categorizing purpose by stemming </h4></font>

In [21]:
unique_purpose = logs['purpose'].unique()
display(unique_purpose)
len(unique_purpose)  #Checking length and unique structure of purpose column 

array(['purchase of the house', 'car purchase', 'supplementary education',
       'to have a wedding', 'housing transactions', 'education',
       'having a wedding', 'purchase of the house for my family',
       'buy real estate', 'buy commercial real estate',
       'buy residential real estate', 'construction of own property',
       'property', 'building a property', 'buying a second-hand car',
       'buying my own car', 'transactions with commercial real estate',
       'building a real estate', 'housing',
       'transactions with my real estate', 'cars', 'to become educated',
       'second-hand car purchase', 'getting an education', 'car',
       'wedding ceremony', 'to get a supplementary education',
       'purchase of my own house', 'real estate transactions',
       'getting higher education', 'to own a car', 'purchase of a car',
       'profile education', 'university education',
       'buying property for renting out', 'to buy a car',
       'housing renovation', 'going

38

In [22]:
from nltk.stem import SnowballStemmer #Pass the stemmer the value 'english' so that it's tuned for working with English
english_stemmer = SnowballStemmer('english')

In [23]:
english_stemmer = SnowballStemmer('english')

def purpose_category_eng(queries):
    for word in queries.split(' '):
        stemmed_word = english_stemmer.stem(word)
        if stemmed_word == 'hous':
            return 'housing'
        if stemmed_word == 'car':
            return 'car purchase'
        if stemmed_word == 'educ' or stemmed_word == 'univers':
            return 'education'
        if stemmed_word == 'wed':
            return 'wedding'
        if stemmed_word == 'estat':
            return 'real estate'
        if stemmed_word == 'properti':
            return 'property'
    return 'unknown'

logs['purpose'] = logs['purpose'].apply(purpose_category_eng)
logs['purpose']

0             housing
1        car purchase
2             housing
3           education
4             wedding
             ...     
21449         housing
21450    car purchase
21451        property
21452    car purchase
21453    car purchase
Name: purpose, Length: 21454, dtype: object

In [24]:
# reviewers code
logs['purpose'].value_counts()

real estate     4464
car purchase    4306
education       4013
housing         3809
property        2538
wedding         2324
Name: purpose, dtype: int64
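For comparison, a similar grouping can be sketched without NLTK by matching word prefixes. This is an alternative technique, not the stemming used above; the prefix table and the name `purpose_category_by_prefix` are assumptions, and short prefixes such as `car` would also match unrelated words like "card":

```python
# Hypothetical prefix table mirroring the six categories used above.
PREFIX_TO_CATEGORY = {
    "hous": "housing",
    "car": "car purchase",
    "educ": "education",
    "univers": "education",
    "wed": "wedding",
    "estat": "real estate",
    "propert": "property",
}


def purpose_category_by_prefix(purpose):
    """Return the category of the first word that starts with a known prefix."""
    for word in purpose.lower().split():
        for prefix, category in PREFIX_TO_CATEGORY.items():
            if word.startswith(prefix):
                return category
    return "unknown"


# Purposes copied from the list of unique values shown earlier.
for text in ["purchase of the house", "to have a wedding",
             "buy commercial real estate", "university education"]:
    print(text, "->", purpose_category_by_prefix(text))
```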

In [26]:
logs.drop_duplicates(['family_status_id', 'family_status'])[['family_status_id', 'family_status']]

Unnamed: 0,family_status_id,family_status
0,0,married
4,1,civil partnership
18,2,widow / widower
19,3,divorced
24,4,unmarried


In [27]:
logs.drop_duplicates(['education_id', 'education'])[['education_id', 'education']]

Unnamed: 0,education_id,education
0,0,bachelor's degree
1,1,secondary education
13,2,some college
31,3,primary education
2962,4,graduate degree


<font color='green'><h4>Categorizing income by debt </h4></font>

In [29]:
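def credit_scoring(row):  # label each customer by income band and by whether they defaulted
    # Note: for debt values of 0 or 1, the paired conditions below reduce to
    # total_income <= 17000 and total_income <= 23000, so the 'low middle'
    # labels are never returned and incomes up to 23000 are labeled 'medium'.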
    
    total_income =row['total_income']
    debt = row['debt']
    
    if total_income <= 17000:
        if debt == 0:
            return 'low income non defaulters'
    if total_income <= 17000:
        if debt == 1:
            return 'low income defaulters'
    if ((total_income <= 17000) and (total_income <= 23000)):
        if debt == 0:
            return 'low middle income non defaulters'
    if ((total_income <= 17000) and (total_income <= 23000)):
        if debt == 1:
            return 'low middle income defaulters'
    if ((total_income <= 32000) and (total_income <= 23000)):
        if debt == 0:
            return 'medium income non defaulters'
    if ((total_income <= 32000) and (total_income <= 23000)):
        if debt == 1:
            return 'medium income defaulters'
    if total_income <= 60000:
        if debt == 0:
            return 'high income non defaulters'

    return 'high income defaulters'
logs['final_status'] = logs.apply(credit_scoring, axis=1)
#print(logs['final_status'].value_counts())
print(logs.head(10))

   children  days_employed  dob_years            education  education_id  \
0         1     351.569709         42    bachelor's degree             0   
1         1     167.700156         36  secondary education             1   
2         0     234.309275         33  secondary education             1   
3         3     171.864467         32  secondary education             1   
4         0   14177.753002         53  secondary education             1   
5         0      38.591076         27    bachelor's degree             0   
6         0     119.966752         43    bachelor's degree             0   
7         0       6.365815         50  secondary education             1   
8         2     288.744387         35    bachelor's degree             0   
9         0      91.198185         41  secondary education             1   

       family_status  family_status_id gender income_type  debt  total_income  \
0            married                 0      F    employee     0         22815   
1
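The thresholds in `credit_scoring` suggest four income bands: up to 17000, up to 23000, up to 32000, and above. The sketch below shows one possible reading with non-overlapping bands. The band boundaries and the helper names `income_band` and `credit_status` are assumptions inferred from those thresholds, and the labels it produces differ from the output shown above:

```python
def income_band(total_income):
    """Return the income band; each check is reached only if the previous one failed."""
    if total_income <= 17000:
        return "low"
    if total_income <= 23000:
        return "low middle"
    if total_income <= 32000:
        return "medium"
    return "high"


def credit_status(total_income, debt):
    """Combine the income band with the default flag (debt == 1 means defaulted)."""
    suffix = "defaulters" if debt == 1 else "non defaulters"
    return f"{income_band(total_income)} income {suffix}"


print(credit_status(22815, 0))
print(credit_status(27577, 1))
print(credit_status(79866, 0))
```

Separating the band from the default flag avoids repeating each threshold twice and keeps a non-defaulter above the top threshold from being labeled a defaulter.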

In [30]:
logs[['dob_years', 'total_income', 'children']].describe()

Unnamed: 0,dob_years,total_income,children
count,21454.0,21454.0,21454.0
mean,43.271231,23343.921646,0.480563
std,12.570822,2856.60558,0.756069
min,0.0,8612.0,0.0
25%,33.0,22815.0,0.0
50%,42.0,22815.0,0.0
75%,53.0,24071.0,1.0
max,75.0,79866.0,5.0


In [31]:
def category_total_income(total_income):
    
    if total_income < 17000:
        return 0
    elif 17000 <= total_income < 23000:
        return 1
    elif 23000 <= total_income < 32000:
        return 2
    elif 32000 <= total_income < 60000:
        return 3
    else:
        return 4
    
def category_children(children):
    if children < 1:
        return 0
    elif 1 <= children < 3:
        return 1
    else:
        return 2

def category_dob_years(dob_years):
    if dob_years < 35:
        return 0
    elif 35 <= dob_years < 45:
        return 1
    else:
        return 2
    
def process_categorize():
    logs['children_category'] = logs['children'].apply(category_children)
    logs['total_income_category'] = logs['total_income'].apply(category_total_income)
    logs['dob_years_category'] = logs['dob_years'].apply(category_dob_years)
    

In [32]:
process_categorize()
logs.head(10)

Unnamed: 0,children,days_employed,dob_years,education,education_id,family_status,family_status_id,gender,income_type,debt,total_income,purpose,final_status,children_category,total_income_category,dob_years_category
0,1,351.569709,42,bachelor's degree,0,married,0,F,employee,0,22815,housing,medium income non defaulters,1,1,1
1,1,167.700156,36,secondary education,1,married,0,F,employee,0,22815,car purchase,medium income non defaulters,1,1,1
2,0,234.309275,33,secondary education,1,married,0,M,employee,0,22815,housing,medium income non defaulters,0,1,0
3,3,171.864467,32,secondary education,1,married,0,M,employee,0,22815,education,medium income non defaulters,2,1,0
4,0,14177.753002,53,secondary education,1,civil partnership,1,F,retiree,0,18962,wedding,medium income non defaulters,0,1,2
5,0,38.591076,27,bachelor's degree,0,civil partnership,1,M,business,0,27577,housing,high income non defaulters,0,2,0
6,0,119.966752,43,bachelor's degree,0,married,0,F,business,0,27577,housing,high income non defaulters,0,2,1
7,0,6.365815,50,secondary education,1,married,0,M,employee,0,22815,education,medium income non defaulters,0,1,2
8,2,288.744387,35,bachelor's degree,0,civil partnership,1,F,employee,0,22815,wedding,medium income non defaulters,1,1,1
9,0,91.198185,41,secondary education,1,married,0,M,employee,0,22815,housing,medium income non defaulters,0,1,1
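The same income categories can be produced in one vectorized call with `pd.cut`. A sketch under the assumption that the bin edges match `category_total_income` above; `right=False` makes each interval include its lower edge, as the `elif` chain does. The sample incomes are values that appear in this notebook's outputs:

```python
import numpy as np
import pandas as pd

# Sample incomes taken from outputs shown in this notebook.
incomes = pd.Series([8612, 18962, 22815, 27577, 79866], name="total_income")

# Bin edges mirroring category_total_income: [-inf, 17000), [17000, 23000), ...
bins = [-np.inf, 17000, 23000, 32000, 60000, np.inf]
labels = [0, 1, 2, 3, 4]

# right=False -> left-closed intervals, so 17000 falls in category 1.
category = pd.cut(incomes, bins=bins, labels=labels, right=False).astype(int)
print(category.tolist())
```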


### Conclusion

### Step 3. Answer these questions

- Is there a relation between having kids and repaying a loan on time?

In [30]:
#using pivot table to compare the borrowers with less and more kids
logs.pivot_table(index=['children'], values = 'debt', aggfunc={'sum','mean','count','max','min'})

Unnamed: 0_level_0,count,max,mean,min,sum
children,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
0,14091.0,1.0,0.075438,0.0,1063.0
1,4855.0,1.0,0.091658,0.0,445.0
2,2128.0,1.0,0.094925,0.0,202.0
3,330.0,1.0,0.081818,0.0,27.0
4,41.0,1.0,0.097561,0.0,4.0
5,9.0,0.0,0.0,0.0,0.0


### Conclusion

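Comparing borrowers by number of children, customers with children default more often than those without: the default rate is 7.5% with no children, against 9.2% with one child and 9.5% with two. The groups with three or more children are much smaller (330, 41, and 9 customers), so their rates (8.2%, 9.8%, and 0%) are less reliable.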
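Because some groups are very small, it can be useful to report the group size next to each rate and flag groups below a chosen size. A minimal sketch on made-up data; the helper `default_rate_by` and the `min_count` threshold are hypothetical:

```python
import pandas as pd


def default_rate_by(df, column, min_count=100):
    """Default count and rate per group, with a flag for groups smaller than min_count."""
    summary = df.groupby(column)["debt"].agg(count="count", defaults="sum", rate="mean")
    summary["small_group"] = summary["count"] < min_count
    return summary


# Toy data: debt is 1 for a default and 0 otherwise.
toy = pd.DataFrame({
    "children": [0, 0, 0, 0, 1, 1, 5],
    "debt":     [0, 1, 0, 0, 1, 0, 0],
})

summary = default_rate_by(toy, "children", min_count=2)
print(summary)
```

The same helper could be reused for family_status, total_income_category, and purpose.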

- Is there a relation between marital status and repaying a loan on time?

In [33]:
logs.pivot_table(index=['family_status'], values = 'debt', aggfunc={'sum','mean','count'})

Unnamed: 0_level_0,count,mean,sum
family_status,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
civil partnership,4151.0,0.093471,388.0
divorced,1195.0,0.07113,85.0
married,12339.0,0.075452,931.0
unmarried,2810.0,0.097509,274.0
widow / widower,959.0,0.065693,63.0


### Conclusion

Unmarried customers and those in a civil partnership show the highest default rates, with similar figures (9.8% and 9.3%). Widowed and divorced customers show the lowest rates, also close to each other (6.6% and 7.1%), with married customers at 7.5%.

- Is there a relation between income level and repaying a loan on time?

In [34]:
logs.pivot_table(index=['total_income_category'], values = 'debt', aggfunc={'sum','mean','count'})

Unnamed: 0_level_0,count,mean,sum
total_income_category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,2.0,0.5,1.0
1,14915.0,0.085686,1278.0
2,6535.0,0.070696,462.0
4,2.0,0.0,0.0


### Conclusion

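Overall, customers with income between 23000 and 32000 (category 2) default less often (7.1%) than those with income between 17000 and 23000 (category 1, 8.6%). Categories 0 and 4 contain only two customers each, so no conclusion can be drawn for them.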

- How do different loan purposes affect on-time repayment of the loan?

In [35]:
logs.pivot_table(index=['purpose'], values = 'debt', aggfunc={'sum','mean','count'})

Unnamed: 0_level_0,count,mean,sum
purpose,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
car purchase,4306.0,0.09359,403.0
education,4013.0,0.0922,370.0
housing,3809.0,0.067209,256.0
property,2538.0,0.074862,190.0
real estate,4464.0,0.075269,336.0
wedding,2324.0,0.080034,186.0


### Conclusion

The figures show that car purchase and education loans have the highest default rates (9.4% and 9.2%), while housing has the lowest (6.7%).
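Sorting the purposes by default rate makes the ordering easier to read. The sketch below rebuilds the pivot table above from its count and sum columns; the names `purpose_stats`, `rate_pct`, and `ranked` are chosen here for illustration:

```python
import pandas as pd

# Counts and default sums copied from the pivot table above.
purpose_stats = pd.DataFrame(
    {
        "count": [4306, 4013, 3809, 2538, 4464, 2324],
        "defaults": [403, 370, 256, 190, 336, 186],
    },
    index=["car purchase", "education", "housing",
           "property", "real estate", "wedding"],
)

# Default rate as a percentage, rounded to two decimals.
purpose_stats["rate_pct"] = (
    100 * purpose_stats["defaults"] / purpose_stats["count"]
).round(2)

# Lowest default rate first.
ranked = purpose_stats.sort_values("rate_pct")
print(ranked)
```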

### Step 4. General conclusion

This project provides a report for a bank’s loan division. The goal was to find out whether a customer’s marital status and number of children have an impact on whether they will default on a loan. Credit scoring depends on the borrower’s ability to repay a loan. From the analysis, the following can be concluded:
1) The table contains the columns children, days_employed, dob_years, education, education_id, family_status, family_status_id, gender, income_type, debt, total_income, purpose.
2) Description of the data
     a) children: the number of children in the family
     b) days_employed: how long the customer has been working
     c) dob_years: the customer’s age
     d) education: the customer’s education level
     e) education_id: identifier for the customer’s education
     f) family_status: the customer’s marital status
     g) family_status_id: identifier for the customer’s marital status
     h) gender: the customer’s gender
     i) income_type: the customer’s income type
     j) debt: whether the customer has ever defaulted on a loan
     k) total_income: monthly income
     l) purpose: reason for taking out a loan
3) A detailed look at the data shows a lot of missing values, specifically in days_employed and total_income.
4) After analyzing the data, total_income was filled with the median for each income type, and the missing days_employed values were filled with 0.
5) The total_income column was then converted from float to int, and the days_employed values were made positive, since much of the data in that column is corrupted.
6) Duplicate rows were removed after converting the education column to lower case. The purpose column was then categorized by stemming, and total_income, dob_years (age), and children were grouped into categories.
Overall, the analysis of the relationships shows the following:
1) Customers with children default more often than customers without. With no children the count is 14091 and the mean is 7.5%; with one child the count is 4855 and the mean is 9.2%; with four children the count is 41 and the mean is 9.8%, although that group is small.
2) Marital status also shows a weak relationship with default. For unmarried customers the count is 2810 and the mean is 9.8%, while for married customers the count is 12339 and the mean is 7.5%.
3) For income level, customers with income between 23000 and 32000 have a lower default rate (mean 7.1%) than those with income between 17000 and 23000 (mean 8.6%). The groups below 17000 and above 60000 contain only two customers each, so their means are not informative.
4) For loan purpose, car purchase and education have the highest default rates, with means of 9.4% and 9.2%; car purchase is the highest.


Our goal was to support the credit scoring of a potential customer. **Credit scoring** is used to evaluate the ability of a potential borrower to repay their loan. The analysis shows relationships between several columns and default. Overall, number of children and marital status are both related to default: customers with children, and customers who are unmarried or in a civil partnership, default more often. Total income and loan purpose are also related to default rates.
