## Credit EDA 

This case study aims to give you an idea of applying EDA in a real business scenario. In this case study, apart from applying the techniques that you have learnt in the EDA module, you will also develop a basic understanding of risk analytics in banking and financial services and understand how data is used to minimise the risk of losing money while lending to customers.

### Business Understanding
Loan providers find it hard to give loans to people with insufficient or non-existent credit history, and some consumers take advantage of this by defaulting. Suppose you work for a consumer finance company which specialises in lending various types of loans to urban customers. You have to use EDA to analyse the patterns present in the data, so that applicants who are capable of repaying the loan are not rejected.

 

When the company receives a loan application, it has to decide whether to approve the loan based on the applicant's profile. Two types of risk are associated with this decision:

1. If the applicant is likely to repay the loan, then not approving the loan results in a loss of business to the company.
2. If the applicant is not likely to repay the loan, i.e. he/she is likely to default, then approving the loan may lead to a financial loss for the company.

In [None]:
# Importing the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_rows', None)  # show every row of large outputs instead of truncating them
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Reading the application dataset
application_data = pd.read_csv("../input/loan-defaulter/application_data.csv")

In [None]:
#getting an idea about the shape of the dataframe
application_data.shape

In [None]:
application_data.info()

In [None]:
#getting an idea about the datatypes of the dataframe
application_data.dtypes

# Data Cleaning

In [None]:

#getting the percentage of null values in each column
application_data.isnull().sum()/len(application_data)*100


In [None]:
# Finding the columns having more than 30% null values

emptycol=application_data.isnull().sum()/len(application_data)*100
emptycol=emptycol[emptycol.values>30.0]
print(emptycol)
len(emptycol)


In [None]:
# Removing those 50 columns
emptycol = list(emptycol[emptycol.values>=30.0].index)
application_data.drop(labels=emptycol,axis=1,inplace=True)
print(len(emptycol))
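The 30% rule above can be wrapped in a small reusable helper. This is a minimal sketch: the function name `drop_high_null_columns` and the toy frame are hypothetical, and `isnull().mean() * 100` is just a shorter way to compute the same percentage as `isnull().sum()/len(df)*100`.

```python
import numpy as np
import pandas as pd


def drop_high_null_columns(df: pd.DataFrame, threshold_pct: float = 30.0):
    """Hypothetical helper: drop columns whose share of nulls exceeds threshold_pct.

    Returns the cleaned frame and the list of dropped column names.
    """
    null_pct = df.isnull().mean() * 100            # percentage of nulls per column
    to_drop = null_pct[null_pct > threshold_pct].index.tolist()
    return df.drop(columns=to_drop), to_drop


# Toy data (not from application_data.csv)
toy = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [np.nan, np.nan, 3, 4, 5],   # 40% null -> dropped
    "C": [1, np.nan, 3, 4, 5],        # 20% null -> kept
})
cleaned, dropped = drop_high_null_columns(toy)
print(dropped)                  # ['B']
print(list(cleaned.columns))    # ['A', 'C']
```

On the notebook's data the call would be `drop_high_null_columns(application_data)`, which returns a new frame instead of modifying it in place.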

In [None]:
# Checking the columns having less null percentage

application_data.isnull().sum()/len(application_data)*100

#### Some columns still contain null values, but all of them are below 30%.
# Analysis of a few columns to decide whether null-value replacement is required
##### 1. Starting with AMT_ANNUITY column

In [None]:
#box plotting the values of AMT_ANNUITY
sns.boxplot(y=application_data['AMT_ANNUITY'])
plt.yscale('log')
plt.show()

In [None]:
print(application_data['AMT_ANNUITY'].mean())
print(application_data['AMT_ANNUITY'].median())
print(application_data['AMT_ANNUITY'].describe())

#### Explanation: From the box plot we can see that there are severe outliers and that the gap between the max and min values is very large. Since the median is not pulled by outliers the way the mean is, we use the median to replace the null values.

In [None]:
# Filling missing values with the median (assigning back instead of using inplace=True on a single column)
missingValuesFill=application_data['AMT_ANNUITY'].median()
application_data['AMT_ANNUITY'] = application_data['AMT_ANNUITY'].fillna(missingValuesFill)
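Why the median rather than the mean: a single extreme value pulls the mean far away from typical values while barely moving the median. A toy sketch with made-up numbers (not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical annuity-like values: four typical amounts, one extreme value, two nulls
s = pd.Series([10_000, 12_000, 11_000, 13_000, 250_000, np.nan, np.nan])

print(s.mean())    # 59200.0 -> dragged upwards by the 250000 outlier
print(s.median())  # 12000.0 -> stays close to the typical values

# Assigning the result back avoids relying on inplace=True on a single column
filled = s.fillna(s.median())
print(filled.isnull().sum())  # 0
```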

In [None]:
#application_data.isnull().sum()/len(application_data)*100

#### 2. Analysis of CNT_FAM_MEMBERS

In [None]:
application_data['CNT_FAM_MEMBERS'].value_counts(dropna=False)

In [None]:
# Plotting the CNT_FAM_MEMBERS column in a box plot to detect outliers
sns.boxplot(y=application_data['CNT_FAM_MEMBERS'])
plt.yscale('log')
plt.show()

In [None]:
print(application_data['CNT_FAM_MEMBERS'].mean())
print(application_data['CNT_FAM_MEMBERS'].median())
print(application_data['CNT_FAM_MEMBERS'].describe())

In [None]:
# Filling missing values with the median (assigning back instead of using inplace=True on a single column)
missingValuesFill=application_data['CNT_FAM_MEMBERS'].median()
application_data['CNT_FAM_MEMBERS'] = application_data['CNT_FAM_MEMBERS'].fillna(missingValuesFill)

#### Explanation: From the box plot we can see that there are several outliers and that there is quite a gap between the 75th percentile and the max. So we use the median value to replace the null values.

#### 3. Analysis of CODE_GENDER

In [None]:
application_data['CODE_GENDER'].value_counts(dropna=False)

#### Female (F) is the majority and only 4 rows have the value XNA. So there won't be any major impact on the dataset if we replace those values with 'F', the most frequent gender.

In [None]:
## replace XNA with F
application_data.loc[application_data['CODE_GENDER']=='XNA','CODE_GENDER']='F'
application_data['CODE_GENDER'].value_counts()
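The same idea, replacing a placeholder category with the most frequent real category, can be written as a small helper. A sketch; `replace_placeholder_with_mode` is a hypothetical name, and the mode is computed after excluding the placeholder so that 'XNA' itself can never be chosen:

```python
import pandas as pd


def replace_placeholder_with_mode(series: pd.Series, placeholder: str = "XNA") -> pd.Series:
    """Hypothetical helper: replace `placeholder` with the mode of the remaining values."""
    real_values = series[series != placeholder]
    mode_value = real_values.mode().iloc[0]   # most frequent real category
    return series.replace(placeholder, mode_value)


genders = pd.Series(["F", "M", "F", "XNA", "F", "M"])   # toy data
fixed = replace_placeholder_with_mode(genders)
print(fixed.value_counts().to_dict())   # {'F': 4, 'M': 2}
```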

#### 4. Analysis of ORGANIZATION_TYPE 

In [None]:
application_data['ORGANIZATION_TYPE'].value_counts(dropna=False)

In [None]:
print(application_data['ORGANIZATION_TYPE'].mode())
print(application_data['ORGANIZATION_TYPE'].describe())

#### There are 55374 rows with the value XNA, which is 18% of the total count, so these rows can be discarded.

#### 5. Analysis of AMT_GOODS_PRICE

In [None]:
# Box plotting the values of AMT_GOODS_PRICE
sns.boxplot(y=application_data['AMT_GOODS_PRICE'])
plt.yscale('log')
plt.show()

In [None]:
print(application_data['AMT_GOODS_PRICE'].describe())
print(application_data['AMT_GOODS_PRICE'].median())
print(application_data['AMT_GOODS_PRICE'].mean())
print(application_data['AMT_GOODS_PRICE'].max())
print(application_data['AMT_GOODS_PRICE'].min())

#### The distribution does not give a clear basis for choosing a replacement value, so we keep the null values.

#### 6. Analysis of AMT_REQ_CREDIT_BUREAU_DAY

In [None]:
sns.boxplot(y=application_data['AMT_REQ_CREDIT_BUREAU_DAY'])
plt.show()

In [None]:
print(application_data['AMT_REQ_CREDIT_BUREAU_DAY'].describe())

##### Explanation: The numerical column AMT_REQ_CREDIT_BUREAU_DAY has outliers, so they need to be removed or capped. For filling missing values, the median should be used in this case.

# Handling outliers

In [None]:
##----Capping outliers at the 1st and 99th percentiles for the columns below----##
columns_of_outliers=['AMT_REQ_CREDIT_BUREAU_DAY']
for col in columns_of_outliers:
    lower, upper = application_data[col].quantile([0.01,0.99]).values
    # clip() replaces values below/above the percentiles with the percentiles themselves; nulls are kept
    application_data[col] = application_data[col].clip(lower=lower, upper=upper)


In [None]:
sns.boxplot(y=application_data['AMT_REQ_CREDIT_BUREAU_DAY'])
plt.show()
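Capping values at chosen percentiles (often called winsorizing) can be expressed with `Series.clip`, which leaves nulls untouched. A sketch on synthetic data; `cap_at_percentiles` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def cap_at_percentiles(series: pd.Series, lower_q: float = 0.01, upper_q: float = 0.99) -> pd.Series:
    """Hypothetical helper: cap values below/above the given quantiles at those quantiles."""
    lower, upper = series.quantile([lower_q, upper_q]).values   # nulls are ignored here
    return series.clip(lower=lower, upper=upper)


# Synthetic values 0..99 plus one extreme value and one null
values = pd.Series(list(range(100)) + [1000.0, np.nan])
capped = cap_at_percentiles(values)

print(values.max(), capped.max())   # the 1000.0 outlier is pulled down to the 99th percentile
print(capped.isnull().sum())        # the null is still there
```

Unlike dropping rows, capping keeps every client in the dataset and only limits how far an extreme value can stretch the scale.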

## Changing the datatype for the required columns

In [None]:
# Casting the selected variables to numeric

numeric_columns=['TARGET','CNT_CHILDREN','AMT_INCOME_TOTAL','AMT_CREDIT','AMT_ANNUITY','REGION_POPULATION_RELATIVE','DAYS_BIRTH',
                'DAYS_EMPLOYED','DAYS_REGISTRATION','DAYS_ID_PUBLISH','HOUR_APPR_PROCESS_START','LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY',
       'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY']

application_data[numeric_columns]=application_data[numeric_columns].apply(pd.to_numeric)
application_data.head(5)

In [None]:
# Creating bins for income amount

bins = [0,25000,50000,75000,100000,125000,150000,175000,200000,225000,250000,275000,300000,325000,350000,375000,400000,425000,450000,475000,500000,10000000000]
slot = ['0-25000', '25000-50000','50000-75000','75000-100000','100000-125000', '125000-150000', '150000-175000','175000-200000',
       '200000-225000','225000-250000','250000-275000','275000-300000','300000-325000','325000-350000','350000-375000',
       '375000-400000','400000-425000','425000-450000','450000-475000','475000-500000','500000 and above']

application_data['AMT_INCOME_RANGE']=pd.cut(application_data['AMT_INCOME_TOTAL'],bins,labels=slot)

In [None]:
# Creating bins for Credit amount

bins = [0,150000,200000,250000,300000,350000,400000,450000,500000,550000,600000,650000,700000,750000,800000,850000,900000,1000000000]
slots = ['0-150000', '150000-200000','200000-250000', '250000-300000', '300000-350000', '350000-400000','400000-450000',
        '450000-500000','500000-550000','550000-600000','600000-650000','650000-700000','700000-750000','750000-800000',
        '800000-850000','850000-900000','900000 and above']

application_data['AMT_CREDIT_RANGE']=pd.cut(application_data['AMT_CREDIT'],bins=bins,labels=slots)
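Two details of `pd.cut` matter for the bins above. By default intervals are closed on the right, so a value equal to an edge (e.g. 25000) falls in the lower bin, and a value equal to the first edge (0) falls in no bin unless `include_lowest=True` is passed. Generating edges and labels in a loop also avoids hand-typed label mistakes. A sketch that rebuilds the income bins this way (same edges as above, toy incomes):

```python
import pandas as pd

# Edges 0, 25000, ..., 500000 plus a large upper edge, as in the income bins above
edges = list(range(0, 500_001, 25_000))
bins = edges + [10_000_000_000]
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])] + ["500000 and above"]

incomes = pd.Series([25_000, 25_001, 0, 600_000])   # toy values

# Right-closed intervals: 25000 -> '0-25000', and 0 falls outside every bin (NaN)
print(pd.cut(incomes, bins=bins, labels=labels).tolist())
# With include_lowest=True the first interval also contains 0
print(pd.cut(incomes, bins=bins, labels=labels, include_lowest=True).tolist())
```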

In [None]:
# Dividing the dataset into two datasets: target=1 (clients with payment difficulties) and target=0 (all others)

target0 = application_data.loc[application_data["TARGET"]==0]
target1 = application_data.loc[application_data["TARGET"]==1]


In [None]:
# Calculating the imbalance ratio (number of TARGET=0 rows per TARGET=1 row)

round(len(target0)/len(target1),2)
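The cell above returns a ratio (how many TARGET = 0 rows there are per TARGET = 1 row), not a percentage. Both views can be computed from the label column directly; a sketch on a toy label series:

```python
import pandas as pd

target = pd.Series([0] * 9 + [1] * 1)   # toy labels: 9 non-defaulters, 1 defaulter

counts = target.value_counts()
imbalance_ratio = round(counts[0] / counts[1], 2)   # non-defaulters per defaulter
defaulter_pct = round(target.mean() * 100, 2)        # share of TARGET == 1, in percent

print(imbalance_ratio)  # 9.0
print(defaulter_pct)    # 10.0
```

On the notebook's data the same expressions apply to `application_data['TARGET']`.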

# Univariate analysis for categories

In [None]:
# Reusable plotting function

def plotfunc(df,col,title,hue =None):
    
    sns.set_style('darkgrid')
    sns.set_context('poster')
    plt.rcParams["axes.labelsize"] = 20
    plt.rcParams['axes.titlesize'] = 22
    plt.rcParams['axes.titlepad'] = 30
    
    
    temp = pd.Series(data = hue)
    fig, ax = plt.subplots()
    width = len(df[col].unique()) + 7 + 4*len(temp.unique())
    fig.set_size_inches(width , 8)
    plt.xticks(rotation=45)
    plt.yscale('log')
    plt.title(title)
    ax = sns.countplot(data = df, x= col, order=df[col].value_counts().index,hue = hue,palette='bright') 
        
    plt.show()
    
    
    

In [None]:
# Plotting income range for target 0

plotfunc(target0,col='AMT_INCOME_RANGE',title='Target 0 income range',hue='CODE_GENDER')


Points to be concluded from the above graph for target = 0 (non-defaulters):

1. Female counts are higher than male counts.
2. Across the income ranges, females have more credits than males.


In [None]:
# Plotting income range for target 1

plotfunc(target1,col='AMT_INCOME_RANGE',title='Target 1 income range',hue='CODE_GENDER')

Points to be concluded from the above graph for target = 1 (defaulters):

1. Male counts are higher than female counts.
2. Males have more credits than females in the 100000 to 200000 income range.


In [None]:
# Plotting for Income type for target 0

plotfunc(target0,col='NAME_INCOME_TYPE',title='Target 0 Income type',hue='CODE_GENDER')

Points to be concluded from the above graph for target = 0 (non-defaulters):

1. Females have more credits than males.
2. The number of credits is high for the income types Working, Commercial associate, Pensioner and State servant.
3. The number of credits is low for the income types Student, Unemployed, Businessman and Maternity leave.

In [None]:
# Plotting for Income type for target1

plotfunc(target1,col='NAME_INCOME_TYPE',title='Target 1 Income type',hue='CODE_GENDER')

Points to be concluded from the above graph for target = 1 (defaulters):

1. The number of credits is high for the income types Working, Commercial associate, Pensioner and State servant, the same as for target 0.
2. The number of credits is low for the income types Unemployed and Maternity leave.

In [None]:
fig, ax = plt.subplots(1,2,figsize=(15,5))

# Column passed explicitly as x= (keyword form accepted by current seaborn versions)
sns.countplot(x=target0['CNT_CHILDREN'], ax=ax[0]).set_title('Target 0 (Not a Defaulter)')
sns.countplot(x=target1['CNT_CHILDREN'], ax=ax[1]).set_title('Target 1 (Defaulter)')
plt.show()

Points to be concluded from the above graph:

1. Clients with few or no children make up most of both the defaulter and the non-defaulter groups, so we cannot conclude anything specific from this exploration.

In [None]:
fig, ax = plt.subplots(1,2,figsize=(35,13))
sns.countplot(x=target0['NAME_EDUCATION_TYPE'], ax=ax[0]).set_title('Target 0 (Not a Defaulter)')
sns.countplot(x=target1['NAME_EDUCATION_TYPE'], ax=ax[1]).set_title('Target 1 (Defaulter)')
plt.show()

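Points to be concluded from the above graph:

1. From this comparison we can see that clients with secondary education account for the largest number of defaulters. This is a count rather than a default rate, so it partly reflects how many applicants have each education type.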
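A count plot of defaulters shows which group contributes the most defaulters, which is not the same as which group defaults most often. The default rate per category is the mean of TARGET within that category. A toy sketch (made-up rows) where one group has more defaulters but a lower rate:

```python
import pandas as pd

toy = pd.DataFrame({
    "NAME_EDUCATION_TYPE": ["Secondary"] * 8 + ["Higher"] * 4,
    "TARGET": [1, 1, 1, 0, 0, 0, 0, 0] + [1, 1, 0, 0],
})

# Number of defaulters per category (what the count plot shows)
defaulter_counts = toy.loc[toy["TARGET"] == 1, "NAME_EDUCATION_TYPE"].value_counts()
# Default rate per category: share of TARGET == 1 within each category
default_rate = toy.groupby("NAME_EDUCATION_TYPE")["TARGET"].mean()

print(defaulter_counts.to_dict())  # {'Secondary': 3, 'Higher': 2}
print(default_rate.to_dict())      # {'Higher': 0.5, 'Secondary': 0.375}
```

On the notebook's data the equivalent would be `application_data.groupby('NAME_EDUCATION_TYPE')['TARGET'].mean()`.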

In [None]:
plotfunc(target0,col='NAME_CONTRACT_TYPE',title='Target 0 of contract type',hue='CODE_GENDER')

Points to be concluded from the above graph for target = 0 (non-defaulters):

   1. Cash loan contracts have a higher number of credits than revolving loan contracts.
   2. The count of females is higher than that of males.

In [None]:
plotfunc(target1,col='NAME_CONTRACT_TYPE',title='Target 1 contract type',hue='CODE_GENDER')

Points to be concluded from the above graph for target = 1 (defaulters):

   1. Cash loan contracts have a higher number of credits than revolving loan contracts.
   2. Only females have revolving loans.

# Continuous Univariate Analysis 

In [None]:
# Function for box plot
def cusBoxPlot(data,col,title):
    sns.set_style('darkgrid')
    sns.set_context('poster')
    plt.rcParams["axes.labelsize"] = 20
    plt.rcParams['axes.titlesize'] = 22
    plt.rcParams['axes.titlepad'] = 30
    
    plt.title(title)
    plt.yscale('log')
    sns.boxplot(data =data, y=col)
    plt.show()

In [None]:
# Distribution of income amount for target0

cusBoxPlot(data=target0,col='AMT_INCOME_TOTAL',title='Target 0 income amount')

In [None]:
# Distribution of income amount for target1

cusBoxPlot(data=target1,col='AMT_INCOME_TOTAL',title='Target 1 income amount')

Points to be concluded from the above 2 graphs:

1. Outliers are present in both.
2. The interquartile range (the box) is narrow for both target 1 and target 0.
3. Most clients have incomes at the lower end of the range, well below the outliers.
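By definition each quartile contains about a quarter of the clients, so the box plots are best read through the quartile values and the whiskers. Seaborn's `boxplot` uses `whis=1.5` by default: points below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are drawn individually as outliers. A sketch of that rule on a toy income series:

```python
import pandas as pd

income = pd.Series([100, 120, 130, 150, 160, 180, 200, 1000])   # toy values

q1, q3 = income.quantile([0.25, 0.75])
iqr = q3 - q1                               # height of the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = income[(income < lower_fence) | (income > upper_fence)]

print(q1, q3, iqr)         # 127.5 185.0 57.5
print(outliers.tolist())   # [1000]
```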

In [None]:
# Distribution of credit amount for target 0

cusBoxPlot(data=target0,col='AMT_CREDIT',title='Target 0 credit amount')

In [None]:
# Distribution of credit amount for target 1

cusBoxPlot(data=target1,col='AMT_CREDIT',title='Target 1 credit amount')

Points to be concluded from the above 2 graphs:

1. Outliers are present in both.
2. The interquartile range (the box) is narrow for both target 1 and target 0.
3. Most clients have credit amounts at the lower end of the range, well below the outliers.

# Bivariate analysis

### Analysing  correlation for numerical columns for both target 0 and 1

#### Plotting Correlation matrix for Target 0 application data

In [None]:


d=target0[['SK_ID_CURR','CNT_CHILDREN','AMT_INCOME_TOTAL','AMT_CREDIT','AMT_ANNUITY',
                               'AMT_GOODS_PRICE','DAYS_BIRTH','DAYS_EMPLOYED','CNT_FAM_MEMBERS','REGION_RATING_CLIENT',
                              'REGION_POPULATION_RELATIVE','DAYS_ID_PUBLISH']]
plt.figure(figsize=(30,30))

sns.heatmap(d.corr(), fmt='.1f', cmap="RdYlGn", annot=True)
plt.show()

#### These column pairs have high correlation values for Target 0:
1. "AMT_GOODS_PRICE" and "AMT_CREDIT"
2. "AMT_ANNUITY" and "AMT_CREDIT"
3. "AMT_ANNUITY" and "AMT_GOODS_PRICE"
4. "CNT_FAM_MEMBERS" and "CNT_CHILDREN"
5. "AMT_ANNUITY" and "AMT_INCOME_TOTAL"
6. "AMT_INCOME_TOTAL" and "AMT_GOODS_PRICE"


#### -- Plotting Correlation matrix for Target 1 application data --

In [None]:

d=target1[['SK_ID_CURR','CNT_CHILDREN','AMT_INCOME_TOTAL','AMT_CREDIT','AMT_ANNUITY',
                               'AMT_GOODS_PRICE','DAYS_BIRTH','DAYS_EMPLOYED','CNT_FAM_MEMBERS','REGION_RATING_CLIENT',
                              'REGION_POPULATION_RELATIVE','DAYS_ID_PUBLISH']]
f, ax = plt.subplots(figsize=(30, 30))
sns.heatmap(d.corr(), annot=True, fmt='.1f',cmap="RdYlGn", linewidths=.5, ax=ax)

plt.show()

For both Target 0 and Target 1, these column pairs have high correlation values:
1. "AMT_GOODS_PRICE" and "AMT_CREDIT"
2. "AMT_ANNUITY" and "AMT_CREDIT"
3. "AMT_ANNUITY" and "AMT_GOODS_PRICE"
4. "CNT_FAM_MEMBERS" and "CNT_CHILDREN"
5. "AMT_ANNUITY" and "AMT_INCOME_TOTAL"
6. "AMT_INCOME_TOTAL" and "AMT_GOODS_PRICE"

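Reading pairs off a 12 × 12 heatmap is error-prone; the same kind of list can be extracted from the correlation matrix. A sketch with a hypothetical helper `high_correlation_pairs`, an assumed threshold of 0.7 and toy columns. An identifier such as SK_ID_CURR has no numeric meaning, so its correlations are not informative and it could be left out of the matrix.

```python
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, threshold: float = 0.7):
    """Hypothetical helper: (col_a, col_b, r) for every pair with |r| > threshold."""
    corr = df.corr()
    cols = corr.columns
    pairs = [
        (cols[i], cols[j], corr.iloc[i, j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))        # upper triangle only: each pair once
        if abs(corr.iloc[i, j]) > threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # perfectly correlated with a
    "c": [1, 3, 2, 3, 1],    # uncorrelated with a and b
})
for col_a, col_b, r in high_correlation_pairs(toy):
    print(col_a, col_b, round(r, 2))   # a b 1.0
```

On the notebook's data this would be called as `high_correlation_pairs(d)` after selecting the numeric columns of `target0` or `target1`.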
In [None]:
# Plotting income vs credit for target 0
sns.jointplot(data=target0, x='AMT_INCOME_TOTAL', y='AMT_CREDIT')
plt.show()

In [None]:
# Plotting income vs credit for target 1
sns.jointplot(data=target1, x='AMT_INCOME_TOTAL', y='AMT_CREDIT')
plt.show()

In [None]:
sns.jointplot(data=target0, x='CNT_CHILDREN', y='AMT_INCOME_TOTAL')
plt.show()

In [None]:
sns.jointplot(data=target1, x='CNT_CHILDREN', y='AMT_INCOME_TOTAL')
plt.show()

## 1.  Analysis of Credit amount with respect to Education status 

In [None]:

sns.catplot(data =target0, x='NAME_EDUCATION_TYPE',y='AMT_CREDIT', hue ='NAME_FAMILY_STATUS',height=6,aspect=4, kind="bar", palette="muted")
plt.title('Credit amount vs Education Status for Target 0')
plt.show()

Points to be concluded from the above graph for target = 0 (non-defaulters):

1. Customers holding an academic degree have a greater credit amount, with the Civil marriage segment being the highest among them.
2. Less educated customers tend to have a lower credit amount, with widows being the lowest among them.
3. Married customers have a higher credit amount in almost all education segments, except Lower secondary and Academic degree.

In [None]:
sns.catplot(data =target1, x='NAME_EDUCATION_TYPE',y='AMT_CREDIT', hue ='NAME_FAMILY_STATUS',height=6,aspect=4, kind="bar", palette="muted")
plt.title('Credit amount vs Education Status for Target 1')
plt.show()

Points to be concluded from the above graph for target = 1 (defaulters):

1. Married customers holding an academic degree generally have a higher credit amount. This plot shows credit amounts among defaulters, not default rates, so it does not by itself show that their default rate is higher.
2. Across all education segments, married customers tend to have a higher credit amount.
3. Customers with lower education tend to have a lower credit amount.
4. Single and Married are the only two family statuses present for Academic degree.

## 2. Analysis of  Income amount with respect to Education Status

In [None]:
# Box plotting for Income amount in logarithmic scale

plt.figure(figsize=(16,12))
plt.xticks(rotation=45)
plt.yscale('log')
sns.boxplot(data =target0, x='NAME_EDUCATION_TYPE',y='AMT_INCOME_TOTAL', hue ='NAME_FAMILY_STATUS',orient='v')
plt.title('Income amount vs Education Status for Target 0')
plt.show()

Points to be concluded from the above graph for target = 0 (non-defaulters):

1. For the education type 'Higher education', the median income amount is roughly the same across family statuses, and there are many outliers.
2. Academic degree has fewer outliers, and its income amount is slightly higher than that of Higher education.
3. For Lower secondary, the Civil marriage family status has a lower income amount than the others.


In [None]:
# Box plotting for Income amount in logarithmic scale

plt.figure(figsize=(16,12))
plt.xticks(rotation=45)
plt.yscale('log')
sns.boxplot(data =target1, x='NAME_EDUCATION_TYPE',y='AMT_INCOME_TOTAL', hue ='NAME_FAMILY_STATUS',orient='v')
plt.title('Income amount vs Education Status for Target 1')
plt.show()

####  Explanation: 

Points to be concluded from the above graph for target = 1 (defaulters):

1. The plot is similar to Target 0: for the education type 'Higher education', the income amount is roughly the same across family statuses.
2. Academic degree has no outliers, and its income amount is slightly higher than that of Higher education.
3. Lower secondary has a lower income amount than the others.

In [None]:
# Reading the dataset of previous application

previous_application =pd.read_csv(r"../input/loan-defaulter/previous_application.csv")

In [None]:
# Cleaning the missing data

# Listing the columns having more than 30% null values
# (percentage of rows, as for application_data)

emptycol1=previous_application.isnull().sum()/len(previous_application)*100
emptycol1=emptycol1[emptycol1.values>30.0]
len(emptycol1)


In [None]:
# Removing those columns

emptycol1 = list(emptycol1.index)
previous_application.drop(labels=emptycol1,axis=1,inplace=True)

previous_application.shape

In [None]:
# Removing the rows whose NAME_CASH_LOAN_PURPOSE is 'XNA' or 'XAP'

previous_application=previous_application[~previous_application['NAME_CASH_LOAN_PURPOSE'].isin(['XNA','XAP'])]

previous_application.shape

In [None]:
# Now merging the application dataset with the previous application dataset.
# Overlapping column names keep their name on the application side and get the
# suffix '_PREV' on the previous-application side (e.g. AMT_CREDIT and AMT_CREDIT_PREV).

Merged_data =pd.merge(left=application_data,right=previous_application,how='inner',on='SK_ID_CURR',suffixes=('','_PREV'))
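When both frames share non-key column names, the `suffixes` argument of `pd.merge` decides how they are told apart; it expects two strings, one for the left frame and one for the right. A toy sketch (made-up IDs and amounts) with an empty left suffix and a `_PREV` right suffix. It also shows that an inner merge on SK_ID_CURR is one-to-many: a client with two previous applications appears twice.

```python
import pandas as pd

current = pd.DataFrame({"SK_ID_CURR": [1, 2], "AMT_CREDIT": [100.0, 200.0]})
previous = pd.DataFrame({"SK_ID_CURR": [1, 1, 2], "AMT_CREDIT": [50.0, 60.0, 70.0]})

merged = pd.merge(current, previous, how="inner", on="SK_ID_CURR", suffixes=("", "_PREV"))

print(merged.columns.tolist())  # ['SK_ID_CURR', 'AMT_CREDIT', 'AMT_CREDIT_PREV']
print(len(merged))              # 3 rows: client 1 appears once per previous application
```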



In [None]:
# Removing unwanted columns for analysis

Merged_data.drop(['SK_ID_CURR','WEEKDAY_APPR_PROCESS_START', 'HOUR_APPR_PROCESS_START','REG_REGION_NOT_LIVE_REGION', 
              'REG_REGION_NOT_WORK_REGION','LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY',
              'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY','WEEKDAY_APPR_PROCESS_START_PREV',
              'HOUR_APPR_PROCESS_START_PREV', 'FLAG_LAST_APPL_PER_CONTRACT','NFLAG_LAST_APPL_IN_DAY'],axis=1,inplace=True)

**Performing univariate analysis**

In [None]:
# Distribution of contract status in logarithmic scale

sns.set_style('whitegrid')
sns.set_context('talk')

plt.figure(figsize=(15,30))
plt.rcParams["axes.labelsize"] = 20
plt.rcParams['axes.titlesize'] = 22
plt.rcParams['axes.titlepad'] = 30
plt.xticks(rotation=90)
plt.yscale('log')
plt.title('Distribution of contract status with purposes')
ax = sns.countplot(data = Merged_data, x= 'NAME_CASH_LOAN_PURPOSE', 
                   order=Merged_data['NAME_CASH_LOAN_PURPOSE'].value_counts().index,hue = 'NAME_CONTRACT_STATUS',palette='husl') 

We can conclude the below points from the graph:
1. Most loan rejections came from the purpose 'Repairs'.
2. The numbers of approvals and rejections are almost equal for the Medicine, everyday expenses and Education purposes.
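The count plot shows absolute numbers on a log scale, so "most rejections" is a statement about counts. To compare purposes of very different sizes, the share of each contract status within each purpose can be computed with `pd.crosstab(..., normalize='index')`. A sketch on toy rows (made-up values, not from the dataset):

```python
import pandas as pd

toy = pd.DataFrame({
    "NAME_CASH_LOAN_PURPOSE": ["Repairs", "Repairs", "Repairs", "Medicine", "Medicine"],
    "NAME_CONTRACT_STATUS": ["Refused", "Approved", "Refused", "Approved", "Refused"],
})

# Each row sums to 1: share of each status within a purpose
status_share = pd.crosstab(
    toy["NAME_CASH_LOAN_PURPOSE"], toy["NAME_CONTRACT_STATUS"], normalize="index"
)
print(status_share.round(2))
```

On the merged data the same call would take `Merged_data['NAME_CASH_LOAN_PURPOSE']` and `Merged_data['NAME_CONTRACT_STATUS']`.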

In [None]:
# Distribution of loan purposes by TARGET in logarithmic scale

sns.set_style('whitegrid')
sns.set_context('talk')

plt.figure(figsize=(15,30))
plt.rcParams["axes.labelsize"] = 20
plt.rcParams['axes.titlesize'] = 22
plt.rcParams['axes.titlepad'] = 30
plt.xticks(rotation=90)
plt.yscale('log')
plt.title('Distribution of purposes with target ')
ax = sns.countplot(data = Merged_data, x= 'NAME_CASH_LOAN_PURPOSE', 
                   order=Merged_data['NAME_CASH_LOAN_PURPOSE'].value_counts().index,hue = 'TARGET',palette='husl') 

We can conclude from the above plot that loans with the purpose 'Repairs' face more difficulties in paying on time.


**Performing bivariate analysis**

In [None]:
# Violin plot of previous credit amount vs housing type, split by TARGET
sns.catplot(x="NAME_HOUSING_TYPE", y="AMT_CREDIT_PREV", hue="TARGET", data=Merged_data, kind="violin",height=6,aspect=4,palette='husl')
plt.title('Prev Credit amount vs Housing type')
plt.show()

So, we can conclude that the bank should avoid giving loans to clients whose housing type is co-op apartment, as they have difficulties in payment.
