# Table of Contents
 
1. Importing Libraries
2. Dataset Description
3. Importing the Dataset
4. Exploratory Data Analysis

## 1. Importing Libraries

In [3]:
# data processing
import pandas as pd
import numpy as np 

# visualization
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns


## 2. Dataset Description

- person_age - Age of the borrower applying for a loan
- person_income - Annual income of the borrower applying for a loan
- person_home_ownership - Home ownership status of the borrower
    - OWN - They own a home
    - RENT - They rent a home
    - MORTGAGE - They have a mortgage on the home they own
    - OTHER - Other categories of home ownership
- person_emp_length - Employment length of borrower in years
- loan_intent - What the borrower intends to use the loan for
- loan_grade - Credit grade assigned to the loan, from A (lowest risk) to G (highest risk)
    - A - The borrower has a high creditworthiness, indicating low risk.
    - B - The borrower is relatively low-risk, but not as creditworthy as Grade A.
    - C - The borrower's creditworthiness is moderate.
    - D - The borrower is considered to have higher risk compared to previous grades.
    - E - The borrower's creditworthiness is lower, indicating a higher risk.
    - F - The borrower poses a significant credit risk.
    - G - The borrower's creditworthiness is the lowest, signifying the highest risk.
- loan_amnt - The loan amount the borrower is requesting
- loan_int_rate - The loan interest rate
- loan_status
    - 0 - Non-default: the borrower made every loan payment on time
    - 1 - Default: the borrower failed to pay according to the agreed terms
- loan_percent_income - Loan amount as a fraction of the borrower's annual income (for example, 0.08 means the loan equals 8% of income)
- cb_person_default_on_file
    - Y - The borrower has a history of defaulting on their loans
    - N - The borrower does not have a history of defaulting on their loans
- cb_person_cred_hist_length - Length of the borrower's credit history
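
The grade and default-history columns described above carry an implicit ordering and a yes/no meaning that plain strings do not capture. The sketch below shows one possible way to encode them in pandas. The rows are made up for illustration, and the column name `has_prior_default` is a hypothetical helper, not part of the dataset.

```python
import pandas as pd

# Made-up values for illustration only; these are not rows from the real dataset.
demo = pd.DataFrame({
    "loan_grade": ["C", "A", "G", "B"],
    "cb_person_default_on_file": ["N", "Y", "N", "N"],
})

# Grades run from A (lowest risk) to G (highest risk), so an ordered
# categorical keeps that ranking available for sorting and comparisons.
grade_order = list("ABCDEFG")
demo["loan_grade"] = pd.Categorical(
    demo["loan_grade"], categories=grade_order, ordered=True
)

# Map the Y/N flag to booleans (hypothetical column name).
demo["has_prior_default"] = demo["cb_person_default_on_file"].map(
    {"Y": True, "N": False}
)

print(demo.sort_values("loan_grade"))
print(demo["loan_grade"].max())  # riskiest grade present in the demo rows
```

With an ordered categorical, sorting and `min`/`max` follow the A-to-G risk ranking instead of relying on alphabetical order happening to match.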

## 3. Importing the Dataset

In [4]:
# Reading the data and printing a sample

data = pd.read_csv("credit_risk_dataset.csv")

data.sample(5)

| | person_age | person_income | person_home_ownership | person_emp_length | loan_intent | loan_grade | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_default_on_file | cb_person_cred_hist_length |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26995 | 30 | 120000 | MORTGAGE | 6.0 | MEDICAL | A | 9600 | 7.66 | 0 | 0.08 | N | 9 |
| 2931 | 24 | 115154 | RENT | 8.0 | HOMEIMPROVEMENT | C | 3000 | 14.27 | 0 | 0.03 | N | 2 |
| 22298 | 28 | 54000 | MORTGAGE | 11.0 | HOMEIMPROVEMENT | A | 1200 | 7.51 | 0 | 0.02 | N | 7 |
| 23158 | 29 | 54000 | MORTGAGE | 1.0 | PERSONAL | G | 25000 | 20.16 | 1 | 0.46 | Y | 8 |
| 11925 | 23 | 33000 | RENT | 0.0 | DEBTCONSOLIDATION | A | 11000 | 7.66 | 1 | 0.33 | N | 2 |
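
The five sampled rows above allow a quick sanity check of what `loan_percent_income` holds. In these rows it appears to be `loan_amnt` divided by `person_income`, stored as a fraction rounded to two decimals. The sketch below recomputes it from values copied out of the sample. Five rows are only a spot check, so the same comparison would need to be run on the full `data` frame before relying on it.

```python
import pandas as pd

# Values copied from the data.sample(5) output shown above.
rows = pd.DataFrame({
    "person_income": [120000, 115154, 54000, 54000, 33000],
    "loan_amnt": [9600, 3000, 1200, 25000, 11000],
    "loan_percent_income": [0.08, 0.03, 0.02, 0.46, 0.33],
})

# Recompute the ratio and round to two decimals, as the stored column appears to be.
rows["recomputed"] = (rows["loan_amnt"] / rows["person_income"]).round(2)

# Use a small tolerance rather than exact float equality.
rows["matches"] = (rows["recomputed"] - rows["loan_percent_income"]).abs() < 0.005

print(rows)
```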


## 4. Exploratory Data Analysis

### Key Questions

1. What is the distribution of key numerical variables?
    - How are age, income, employment length, loan amount, and interest rate distributed? Are there any anomalies or outliers?

2. What are the characteristics of categorical variables?
    - How are borrowers distributed across home ownership statuses, loan intents, loan grades, and default history (cb_person_default_on_file)?

3. Is there a relationship between borrower characteristics and loan default rates?
    - Does loan default correlate with specific borrower demographics (age, income level) or loan features (amount, interest rate, grade)?

4. How does the loan default rate vary by categorical features?
    - Are there significant differences in default rates across different home ownership statuses, loan purposes, or loan grades?

5. Are there any missing values or anomalies that need to be addressed?
    - Which variables have missing data, and how might these gaps impact the analysis and model performance?
    - Are there any data inconsistencies or outliers that suggest data entry errors or require special handling?

6. What insights can be drawn from the relationship between loan amount, income, and default?
    - Is there a pattern indicating that a higher loan amount relative to borrower income correlates with a higher likelihood of default?

7. How do loan interest rates affect the probability of default?
    - Does a higher interest rate correlate with a higher default rate, possibly indicating riskier loans?

8. What is the impact of employment length on default rates?
    - Does longer employment length correlate with lower default rates, suggesting more stable borrower profiles?

9. How does the length of credit history affect default probabilities?
    - Do borrowers with longer credit histories have lower default rates, indicating more financial stability?

10. Are there any notable patterns or trends when combining multiple features?
    - For example, does the combination of high loan amounts, higher interest rates, and short employment length significantly increase default risk?
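
Two of the questions above (default rate by categorical feature, and missing values) reduce to short pandas idioms. The following is a minimal sketch, not the analysis itself: `missing_summary` and `default_rate_by` are hypothetical helper names, and the toy frame is made up so the example runs without the CSV file.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Count and share of missing values, listing only columns that have gaps."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"n_missing": counts, "share": counts / len(df)})
    return summary[summary["n_missing"] > 0]


def default_rate_by(df, column, target="loan_status"):
    """Default rate per category: the mean of the 0/1 target within each group."""
    return df.groupby(column)[target].agg(default_rate="mean", n_loans="count")


# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "person_home_ownership": ["RENT", "RENT", "OWN", "MORTGAGE", "RENT", "OWN"],
    "loan_int_rate": [11.0, np.nan, 7.5, 9.0, 15.2, np.nan],
    "loan_status": [1, 0, 0, 0, 1, 0],
})

gaps = missing_summary(toy)
rates = default_rate_by(toy, "person_home_ownership")
print(gaps)
print(rates)
```

Because `loan_status` is coded 0/1, its group mean is the default rate directly. Reporting the group size alongside it helps avoid over-reading rates from very small categories.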

In [11]:
data.describe()

| | person_age | person_income | person_emp_length | loan_amnt | loan_int_rate | loan_status | loan_percent_income | cb_person_cred_hist_length |
|---|---|---|---|---|---|---|---|---|
| count | 32581.0 | 32581.0 | 31686.0 | 32581.0 | 29465.0 | 32581.0 | 32581.0 | 32581.0 |
| mean | 27.7346 | 66074.85 | 4.789686 | 9589.371106 | 11.011695 | 0.218164 | 0.170203 | 5.804211 |
| std | 6.348078 | 61983.12 | 4.14263 | 6322.086646 | 3.240459 | 0.413006 | 0.106782 | 4.055001 |
| min | 20.0 | 4000.0 | 0.0 | 500.0 | 5.42 | 0.0 | 0.0 | 2.0 |
| 25% | 23.0 | 38500.0 | 2.0 | 5000.0 | 7.9 | 0.0 | 0.09 | 3.0 |
| 50% | 26.0 | 55000.0 | 4.0 | 8000.0 | 10.99 | 0.0 | 0.15 | 4.0 |
| 75% | 30.0 | 79200.0 | 7.0 | 12200.0 | 13.47 | 0.0 | 0.23 | 8.0 |
| max | 144.0 | 6000000.0 | 123.0 | 35000.0 | 23.22 | 1.0 | 0.83 | 30.0 |
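
The summary above already hints at two data-quality issues: the `count` row is lower for `person_emp_length` and `loan_int_rate`, and the `max` row shows an age of 144 and an employment length of 123 years. The sketch below derives the missing-value counts from the reported counts and shows one possible way to flag implausible rows. `flag_implausible` and its `max_age=100` cutoff are assumptions chosen for illustration, and the toy rows are made up apart from the two extremes taken from the table.

```python
import pandas as pd

# Non-null counts copied from the "count" row of the describe() output above.
counts = pd.Series({
    "person_age": 32581,
    "person_income": 32581,
    "person_emp_length": 31686,
    "loan_amnt": 32581,
    "loan_int_rate": 29465,
    "loan_status": 32581,
    "loan_percent_income": 32581,
    "cb_person_cred_hist_length": 32581,
})

# Assumption: at least one numeric column is complete, so the largest count is the row count.
n_rows = counts.max()
missing = (n_rows - counts)[counts < n_rows]
print(missing)


def flag_implausible(df, max_age=100):
    """Return rows whose age exceeds max_age or whose employment length exceeds age."""
    too_old = df["person_age"] > max_age
    # Comparisons against NaN evaluate to False, so rows with a missing
    # employment length are not flagged by this second check.
    emp_exceeds_age = df["person_emp_length"] > df["person_age"]
    return df[too_old | emp_exceeds_age]


# Toy rows: 144 and 123 come from the "max" row above; everything else is made up.
toy = pd.DataFrame({
    "person_age": [22, 144, 30, 21],
    "person_emp_length": [3.0, 4.0, None, 123.0],
})
flagged = flag_implausible(toy)
print(flagged)
```

On the real data, the equivalent calls would be `data.isna().sum()` and `flag_implausible(data)`. Whether flagged rows are dropped, capped, or corrected is a separate decision that depends on how many there are.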
