![](https://www.paisabazaar.com/wp-content/uploads/2017/11/Loan-2.jpg)


# **Introduction:**



This dataset contains details of 500 people who took out a loan. It also records whether each person has repaid the loan and, if so, how many days it took. In this project, we draw a few insights from this sample loan data.


**The features of the dataset are described below.**

1. Loan_id: A unique, system-generated ID assigned to each loan
2. loan_status: Whether the loan was paid off, is in collection (the customer has yet to pay it off), or was paid off after collection efforts
3. Principal: Principal loan amount at origination, i.e. the amount of loan applied for
4. terms: Repayment schedule (the time period to repay, in days)
5. effective_date: When the loan originated (started)
6. due_date: Date by which the loan should be paid off
7. paid_off_time: Actual time when the loan was paid off; null means it is yet to be paid
8. past_due_days: How many days the loan is past its due date
9. age: Age of the customer
10. education: Education level of the customer who applied for the loan
11. Gender: Customer gender (male/female)
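The introduction mentions how many days each customer took to repay. A minimal sketch of deriving that number from `effective_date` and `paid_off_time`, assuming both hold month/day/year date strings; the toy rows and the `days_to_pay` column name are invented for illustration and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical toy rows shaped like the loan data (values are invented)
toy = pd.DataFrame({
    "effective_date": ["9/8/2016", "9/8/2016", "9/10/2016"],
    "paid_off_time": ["9/14/2016 19:31", None, "9/25/2016 16:58"],
})

# Parse both columns; a missing paid_off_time becomes NaT
toy["effective_date"] = pd.to_datetime(toy["effective_date"], format="%m/%d/%Y")
toy["paid_off_time"] = pd.to_datetime(toy["paid_off_time"], format="%m/%d/%Y %H:%M")

# Whole days between origination and repayment (NaN for loans not yet paid)
toy["days_to_pay"] = (toy["paid_off_time"].dt.normalize() - toy["effective_date"]).dt.days
print(toy["days_to_pay"].tolist())
```

Normalizing `paid_off_time` to midnight first means the result counts calendar days rather than rounding down partial days.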

**Loading the required libraries**

In [None]:
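%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Plotly Express ships inside the plotly package; the standalone plotly_express package is deprecated
import plotly.express as px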

**Loading the dataset** 

In [None]:
loan = pd.read_csv('../input/loandata/Loan payments data.csv')
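As an alternative to converting the date columns after loading (done further below), `pd.read_csv` can parse them at load time through its `parse_dates` parameter. A minimal sketch on a small invented CSV; the column names follow the feature list above, and the month/day/year format is an assumption about the real file:

```python
import io

import pandas as pd

# Invented two-row CSV with the same kind of date columns
csv_text = """effective_date,due_date,paid_off_time
9/8/2016,10/7/2016,9/14/2016 19:31
9/8/2016,9/22/2016,
"""

# parse_dates converts the listed columns to datetime64; empty cells become NaT
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["effective_date", "due_date", "paid_off_time"],
)
print(df.dtypes)
```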

Checking the first 5 and last 5 records of the dataset

In [None]:
loan.head(5)

In [None]:
loan.tail(5)

Let's check for duplicate rows in the dataset.

In [None]:
loan.duplicated().sum()

There are no duplicate rows.
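`duplicated()` only flags rows that are identical in every column. A minimal sketch that additionally checks whether any loan ID appears twice, assuming the ID column is named `Loan_id` as in the feature list (the exact header in the CSV may differ); the rows are invented:

```python
import pandas as pd

# Invented rows: the last two share an ID but differ in Principal
toy = pd.DataFrame({
    "Loan_id": ["A1", "A2", "A2"],
    "Principal": [1000, 800, 1000],
})

# Rows identical in every column: none here
full_dupes = toy.duplicated().sum()
# Rows whose ID already appeared earlier: one here
id_dupes = toy.duplicated(subset=["Loan_id"]).sum()
print(full_dupes, id_dupes)
```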

In [None]:
loan.info()

In [None]:
loan.describe()

# Check Null Values

In [None]:
loan.isnull().sum()

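Null values appear in the `paid_off_time` and `past_due_days` columns. Both are expected: a missing `paid_off_time` means the loan has not been paid off yet, and a missing `past_due_days` means the loan is not past due.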

In [None]:
# Treat a missing past_due_days as 0 days past due
loan['past_due_days'] = loan['past_due_days'].fillna(0)

In [None]:
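# Leave 'paid_off_time' missing for unpaid loans: pd.to_datetime (used below) turns NaN
# into NaT, whereas a placeholder such as -1 is not a meaningful date.
# Number of loans not yet paid off:
loan['paid_off_time'].isnull().sum()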

In [None]:
loan.isnull().sum()
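Filling `past_due_days` with 0 keeps it numeric, while a missing `paid_off_time` can stay missing, since `pd.to_datetime` maps it to `NaT`, pandas' missing-datetime marker. A minimal sketch on invented values, assuming (as above) that a missing `past_due_days` means the loan was never past due:

```python
import numpy as np
import pandas as pd

# Invented values: two loans, the second not yet paid off
toy = pd.DataFrame({
    "paid_off_time": ["9/14/2016 19:31", np.nan],
    "past_due_days": [np.nan, 76.0],
})

# Missing past_due_days -> 0 days past due
toy["past_due_days"] = toy["past_due_days"].fillna(0)
# Missing paid_off_time -> NaT ("not a time") after conversion
toy["paid_off_time"] = pd.to_datetime(toy["paid_off_time"], format="%m/%d/%Y %H:%M")

print(toy.isnull().sum())
```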

# Spelling Correction

In [None]:
# Correct the misspelled value 'Bechalor' to 'Bachelor' in the 'education' column
loan['education'] = loan['education'].replace('Bechalor', 'Bachelor')
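`replace` also accepts a dict mapping old values to new ones, which scales to several corrections at once, and a quick count confirms the fix. A minimal sketch on invented values; apart from 'Bechalor' and 'Bachelor', the category strings are assumptions:

```python
import pandas as pd

edu = pd.Series(["Bechalor", "college", "Bachelor", "High School or Below"])

# Map each misspelled value to its correction; other values pass through unchanged
edu = edu.replace({"Bechalor": "Bachelor"})
print(edu.value_counts())
```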

In [None]:
loan.head()

# Checking Rows and Columns 

In [None]:
loan.shape

# Checking the Data Type of Each Column

In [None]:
loan.dtypes

Converting the date columns to datetime format

In [None]:
# Convert the date columns to datetime
loan['effective_date'] = pd.to_datetime(loan['effective_date'])
loan['due_date'] = pd.to_datetime(loan['due_date'])
# Keep only the date part of 'paid_off_time' (drop the time of day); missing values become NaT
loan['paid_off_time'] = pd.to_datetime(loan['paid_off_time']).dt.date
loan['paid_off_time'] = pd.to_datetime(loan['paid_off_time'])
loan.head()
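The two `paid_off_time` lines above drop the time of day by round-tripping through Python `date` objects. `Series.dt.normalize()` does the same in one step and keeps the column as datetime64 throughout. A small check on invented timestamps:

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2016-09-14 19:31", "2016-10-07 09:00", None]))

# Approach used above: convert to Python date objects, then parse back
via_date = pd.to_datetime(ts.dt.date)
# One-step alternative: set the time component to midnight
via_normalize = ts.dt.normalize()

print(via_normalize)
```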

In [None]:
loan.dtypes

# Exploratory Data Analysis - EDA

In [None]:
a = loan['loan_status'].value_counts()
pd.DataFrame(a)

**Observation:**

* Out of 500 people, 300 repaid the full amount on time. Collection shows that 100 people have not repaid the loan. Collection paid off shows that 100 people repaid the loan, but late, after the due date.
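The same counts can be expressed directly as percentages with `value_counts(normalize=True)`, which is how the 60/20/20 split in the conclusion follows from 300/100/100 out of 500. A minimal sketch with invented statuses in the same 3:1:1 proportion; the label strings are placeholders, and the real values in `loan_status` may be spelled differently:

```python
import pandas as pd

# Invented statuses in a 3:1:1 ratio (placeholder labels)
status = pd.Series(["paid off"] * 3 + ["collection", "collection paid off"])

# normalize=True returns fractions instead of counts
shares = (status.value_counts(normalize=True) * 100).round(1)
print(shares)
```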

In [None]:
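plt.figure(figsize=[10, 5])
# Take labels from the value_counts() index so each label matches its slice
# (unique() returns values in order of appearance, which can differ from the count order)
status_counts = loan['loan_status'].value_counts()
plt.pie(status_counts, labels=status_counts.index, explode=[0, 0.1, 0], startangle=144, autopct='%1.f%%')
plt.title('Loan Status Distribution', fontsize=20)
plt.show()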

# Gender Analysis

In [None]:
b= loan['Gender'].value_counts()
pd.DataFrame(b)

**Observation:**

* Of the 500 customers, 423 are male and 77 are female.

In [None]:
c = loan.groupby(['Gender'])['loan_status'].value_counts()
pd.DataFrame(c)

In [None]:
plt.figure(figsize=[10, 5])
# Pass column names via data=/x=/hue=; recent seaborn versions no longer accept x as a positional argument
sns.countplot(data=loan, x='Gender', hue='loan_status', palette='rocket')
plt.legend(loc='upper right')
plt.title('Gender vs Loan Status', fontsize=20)
plt.xlabel('Gender', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.show()

**Observations:**

* Around 40% of male customers repaid their loan late or have yet to repay it.

* Around 30% of female customers repaid their loan late or have yet to repay it.
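Percentages like these come from dividing each gender's late or unpaid loans by that gender's total, which a row-normalized `pd.crosstab` computes directly. A minimal sketch on invented rows; the `late_or_unpaid` flag and the status labels are hypothetical placeholders:

```python
import pandas as pd

# Invented rows (placeholder status labels)
toy = pd.DataFrame({
    "Gender": ["male", "male", "male", "male", "female", "female", "female"],
    "loan_status": ["paid off", "collection", "paid off", "collection paid off",
                    "paid off", "paid off", "collection"],
})
# Hypothetical helper flag: anything other than an on-time payoff
toy["late_or_unpaid"] = toy["loan_status"] != "paid off"

# normalize="index" divides each row by its total, giving per-gender shares
share = pd.crosstab(toy["Gender"], toy["late_or_unpaid"], normalize="index")
print(share)
```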

# Education Analysis

In [None]:
d = loan['education'].value_counts()
pd.DataFrame(d)

In [None]:
plt.figure(figsize=[10, 5])
sns.countplot(data=loan, x='education', hue='loan_status', palette='flare')
plt.legend(loc='upper right')
plt.title('Education vs Loan Status', fontsize=20)
plt.xlabel('Education', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.show()

**Observations:**

* Most borrowers have a High School or College education.

* Very few borrowers have a Master's degree or above.
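Education levels have a natural order, but count tables and plots sort them by frequency or alphabetically. A minimal sketch using an ordered categorical so the levels appear from lowest to highest; the exact category strings below are assumptions, so check `loan['education'].unique()` for the real ones:

```python
import pandas as pd

# Assumed education labels, from lowest to highest level
levels = ["High School or Below", "college", "Bachelor", "Master or Above"]
edu = pd.Series(["college", "Master or Above", "High School or Below", "college", "Bachelor"])

# An ordered categorical keeps the level order in counts and plots
edu = pd.Series(pd.Categorical(edu, categories=levels, ordered=True))
counts = edu.value_counts(sort=False)
print(counts)
```

With seaborn, passing `order=levels` to `countplot` has a similar effect on the bar order.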

# Age Analysis

In [None]:
plt.figure(figsize=[14, 5])
sns.countplot(data=loan, x='age', hue='loan_status', palette='twilight')
plt.legend(loc='upper left')
plt.title('Age vs Loan Status', fontsize=20)
plt.xlabel('Age', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.show()

**Observations:**

* Most borrowers are between 24 and 38 years old.
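A claim like this can be quantified with `Series.between` (inclusive on both ends by default) and `pd.cut` for age bands. A minimal sketch on invented ages; the band edges are arbitrary choices for illustration:

```python
import pandas as pd

ages = pd.Series([22, 24, 27, 30, 33, 38, 41, 50])  # invented ages

# Share of borrowers aged 24 to 38 inclusive
in_range = ages.between(24, 38).mean()
# Counts per age band (bins are right-inclusive: (17, 23], (23, 38], (38, 51])
bands = pd.cut(ages, bins=[17, 23, 38, 51]).value_counts(sort=False)
print(in_range, bands, sep="\n")
```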

# Principal Analysis

In [None]:
e = loan['Principal'].value_counts()
pd.DataFrame(e)

In [None]:
plt.figure(figsize=[10, 5])
sns.countplot(data=loan, x='Principal', hue='loan_status', palette='mako')
plt.legend(loc='upper left')
plt.title('Principal vs Loan Status', fontsize=20)
plt.xlabel('Principal', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.show()

**Observations:**

* Most people opted for a principal of 800 or 1000.
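The share of loans at the two most common principal amounts can be computed with `isin`. A minimal sketch on invented amounts:

```python
import pandas as pd

principal = pd.Series([1000, 800, 1000, 300, 1000, 800, 500])  # invented amounts

# Fraction of loans whose principal is exactly 800 or 1000
share_common = principal.isin([800, 1000]).mean()
print(round(share_common, 3))
```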

# Term Analysis

In [None]:
plt.figure(figsize=[10, 5])
sns.countplot(data=loan, x='terms', hue='loan_status', palette='YlGn')
plt.legend(loc='upper left')
plt.title('Terms vs Loan Status', fontsize=20)
plt.xlabel('Terms', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.show()

**Observations:**

* Only a few people opted for a 7-day loan term.

* Most of the late payments come from loans with 15-day or 30-day terms.
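Raw counts favor terms that simply have more loans, so a term can account for most late payments without having the highest late-payment rate. A minimal sketch that computes the late-or-unpaid rate per term as the mean of a boolean; the rows and status labels are invented placeholders:

```python
import pandas as pd

# Invented rows (placeholder status labels)
toy = pd.DataFrame({
    "terms": [7, 15, 15, 30, 30, 30],
    "loan_status": ["paid off", "collection", "paid off",
                    "collection paid off", "collection", "paid off"],
})

# Mean of a True/False flag per group = fraction of loans in that term that were late or unpaid
late_share = (toy["loan_status"] != "paid off").groupby(toy["terms"]).mean()
print(late_share)
```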

# Loan Effective Date Analysis

In [None]:
g = loan.groupby(['effective_date'])['loan_status'].value_counts()
pd.DataFrame(g)

In [None]:
plt.figure(figsize = [10,5])
dates = loan['effective_date'].dt.date
sns.countplot(x=dates, hue=loan['loan_status'],palette='Blues')
plt.legend(loc='upper right')
plt.title('Effective Date vs Loan Status',fontsize=20)
plt.xlabel('Effective Date', fontsize=16)
plt.ylabel('Count', fontsize=16)
plt.show()

**Observations:**

* Many loans were issued on 11 and 12 September, possibly as part of a lending drive.
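The busiest origination days can be listed directly instead of read off the chart. A minimal sketch using `value_counts` on the date part and `nlargest`; the dates are invented:

```python
import pandas as pd

# Invented origination timestamps
dates = pd.to_datetime(pd.Series([
    "2016-09-11", "2016-09-11", "2016-09-12",
    "2016-09-08", "2016-09-11", "2016-09-12",
]))

# Loans per calendar day, then the two busiest days
per_day = dates.dt.date.value_counts()
top_two = per_day.nlargest(2)
print(top_two)
```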


**Let's look at the data distribution**

In [None]:
loan.hist(figsize = (11,11), color="#008080")

# Age vs Past Due Days

In [None]:
px.scatter(loan, x="age", y="past_due_days", size ="terms" ,color="loan_status",
           hover_data=['Gender','Principal'], log_x=True, size_max=8)

**Observations:**

* Most older borrowers (35 to 50 years) paid back their loan on time.
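To check this observation numerically, ages can be split into groups with `pd.cut` and the on-time rate compared per group. A minimal sketch on invented rows; the group labels and the cut at 35 are choices made for illustration:

```python
import pandas as pd

# Invented rows (placeholder status labels)
toy = pd.DataFrame({
    "age": [26, 29, 33, 36, 41, 47],
    "loan_status": ["collection", "paid off", "collection paid off",
                    "paid off", "paid off", "collection"],
})

# Bins (0, 34] and (34, 50], i.e. under 35 and 35 to 50
group = pd.cut(toy["age"], bins=[0, 34, 50], labels=["under 35", "35 to 50"])
# Fraction of loans in each age group that were paid off on time
on_time = (toy["loan_status"] == "paid off").groupby(group, observed=False).mean()
print(on_time)
```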

In [None]:
correlation = loan[loan.columns].corr()
plt.figure(figsize=(12, 10))
plot = sns.heatmap(correlation, vmin = -1, vmax = 1,annot=True, annot_kws={"size": 10})
plot.set_xticklabels(plot.get_xticklabels(), rotation=30)
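In pandas 2.0 and later, `DataFrame.corr` raises on text columns unless they are excluded, and the `numeric_only` parameter (available since pandas 1.5) does that. A minimal sketch on an invented frame with one text column:

```python
import pandas as pd

# Invented frame: three numeric columns and one text column
toy = pd.DataFrame({
    "Principal": [1000, 800, 1000, 300],
    "terms": [30, 15, 30, 7],
    "age": [45, 28, 33, 26],
    "education": ["college", "Bachelor", "college", "Master or Above"],
})

# numeric_only=True skips text columns such as 'education'
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```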

# Conclusion:

* **60%** of the people **repaid** the loan **on time**, 20% **repaid** it late (after the due date), and 20% have **not repaid** it.
* Most borrowers have a High School or College education.
* Most borrowers are between 24 and 38 years old.
* Most people opted for a principal of $800 or $1000.
* Most of the late payments come from loans with 15-day or 30-day terms.
* Most older borrowers (35 to 50 years) paid back their loan on time.

# Thank You!