# CAN I GET A LITTLE CREDIT? 
### An Exploration of Credit Worthiness
### Using Classification to Predict Serious Repayment Delinquency

-----

**Personal Project & Final Report Created By:** Rachel Robbins-Mayhill | April 27, 2022

---

<img src='loan_risk.png' width="1500" height="500" align="center"/>

```python
# Imports for data manipulation
import pandas as pd
import numpy as np

# Imports for data visualization
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.ticker import StrMethodFormatter
import squarify

# Import for hypothesis testing
import scipy.stats as stats

# Imports for data acquisition
import env
import os
import wrangle

# Display all rows and columns of DataFrames
pd.options.display.max_rows = None
pd.options.display.max_columns = None

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
```

## PROJECT DESCRIPTION

Banks play a crucial role in market economies. They decide who can obtain financing and on what terms, and they can make or break investment decisions. For markets and society to function, individuals and companies need access to credit.

Credit scoring algorithms, which estimate the probability of default, are the methods banks use to decide whether a loan should be granted. This project aims to improve upon the state of the art in credit scoring by predicting the probability that a borrower will experience financial distress within the next two years.

I am interested in this project because identifying at-risk populations helps protect the consumer, the business, the market, and society as a whole. Identifying customers who are at risk of default helps prevent consumers from entering situations that could harm their long-term financial stability. It also helps the banking institution avoid significant, costly losses that could threaten business sustainability and limit its capacity to help others. Lastly, as the 2008 housing crisis showed, accurately identifying at-risk loan applicants can help prevent market destabilization, which can have far-reaching consequences for society as a whole.

## EXECUTIVE SUMMARY

## PROJECT GOAL

The goal of this project is to build a classification model that predicts serious repayment delinquency, which borrowers can use to help make the best financial decisions.

## INITIAL QUESTIONS
Data-Focused Questions

1. Are applicants in certain age groups more likely to be seriously delinquent?
2. Are applicants with a higher debt-to-income ratio more likely to be seriously delinquent?


================================================================================

## I. ACQUIRE

The data for this report was acquired by accessing 'client_data.csv' from the Codeup SQL database. The following function call was used to acquire the data:

```python
df = wrangle.get_client_data()
```

```
Reading from .csv file.
Data acquisition complete.
```


```python
df.shape
```

```
(150000, 12)
```

## Original DataFrame Size: 150,000 rows, 12 columns

Data acquisition for this project requires a local copy of the data as a .csv file. The .csv file can be found in the 'Personal Project' repository on GitHub. Once it has been saved locally, the data can be accessed with the following function, defined in the wrangle.py file inside the 'Personal Project' repository on GitHub:

get_client_data

This function reads the local .csv file into a pandas DataFrame, returns it, and notifies the user when acquisition is complete.
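For readers without the repository at hand, the behavior described above can be sketched as follows. This is a hypothetical minimal version, not the code in wrangle.py: it assumes the file is named 'client_data.csv' and sits in the working directory, and the FileNotFoundError check is an added assumption.

```python
import os

import pandas as pd


def get_client_data(filename: str = "client_data.csv") -> pd.DataFrame:
    """Hypothetical sketch: read the locally saved client .csv into a DataFrame."""
    if not os.path.isfile(filename):
        # Fail early with a clear message if the .csv has not been saved locally
        raise FileNotFoundError(
            f"{filename} not found; save the .csv locally before acquiring the data."
        )
    print("Reading from .csv file.")
    df = pd.read_csv(filename)
    print("Data acquisition complete.")
    return df
```

On the full file, the notebook above reports a shape of (150000, 12); one of those 12 columns is the saved index that pandas reads back as "Unnamed: 0", which the preparation step removes.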

===================================================================================================================================

## II. PREPARE

After acquisition, the data was analyzed and cleaned to support exploration, replace unclear column names, and standardize data types.

The preparation of this data can be replicated using the following function, defined in the wrangle.py file inside the 'Personal Project' repository on GitHub:

wrangle_client
The function takes in the original client dataframe and returns it with the changes noted below.

Steps Taken to Clean & Prepare Data:

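- Delete the "Unnamed: 0" index column
- Rename columns to descriptive, lowercase names
- Drop rows with missing values (29,731 missing in monthly_income and 3,924 in quantity_dependents)
- Bin age and quantity_dependents into the categorical columns age_bins and quantity_dependents_bins
- Address outliers only where they affect exploration or modeling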
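The steps above can be sketched as a single function. This is a hypothetical illustration, not the author's wrangle_client: the raw header names in `RENAME_MAP` are assumptions (this report does not list them), and the bin edges are illustrative, chosen only so the labels match those visible in the `df.head()` output (for example `age_40-49` and `1_2_dep`). Outlier handling is left out because the report applies it only when needed.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: raw header names of client_data.csv; adjust to match the actual file.
RENAME_MAP = {
    "SeriousDlqin2yrs": "serious_delinquency",
    "RevolvingUtilizationOfUnsecuredLines": "revolv_unsec_utilization",
    "NumberOfTime30-59DaysPastDueNotWorse": "quantity_30_59_pd",
    "DebtRatio": "debt_to_income_ratio",
    "MonthlyIncome": "monthly_income",
    "NumberOfOpenCreditLinesAndLoans": "quantity_loans_and_lines",
    "NumberOfTimes90DaysLate": "quantity_90_days_pd",
    "NumberRealEstateLoansOrLines": "quantity_real_estate_loans",
    "NumberOfTime60-89DaysPastDueNotWorse": "quantity_60_89_days_pd",
    "NumberOfDependents": "quantity_dependents",
}


def prep_client_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch of the cleaning steps listed above."""
    # Drop the saved index column that pandas reads back as 'Unnamed: 0'
    df = df.drop(columns=["Unnamed: 0"], errors="ignore")
    # Rename to descriptive names, then lowercase every column name
    df = df.rename(columns=RENAME_MAP)
    df.columns = df.columns.str.lower()
    # Drop rows with any missing value; copy so new columns attach cleanly
    df = df.dropna().copy()
    # ASSUMPTION: illustrative decade bins for age (first bin includes 0)
    df["age_bins"] = pd.cut(
        df["age"],
        bins=[0, 29, 39, 49, 59, 69, np.inf],
        labels=["age_0-29", "age_30-39", "age_40-49", "age_50-59", "age_60-69", "age_70+"],
        include_lowest=True,
    )
    # ASSUMPTION: illustrative dependents bins: 0, 1-2, 3 or more
    df["quantity_dependents_bins"] = pd.cut(
        df["quantity_dependents"],
        bins=[-np.inf, 0, 2, np.inf],
        labels=["0_dep", "1_2_dep", "3_plus_dep"],
    )
    return df
```

Binning with `pd.cut` keeps the raw numeric columns alongside the categorical ones, which matches the prepared DataFrame shown below, where both age and age_bins are present.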

---

### Results of Data Preparation

```python
# Apply the data preparation steps above with the wrangle_client function defined in wrangle.py
df = wrangle.wrangle_client(df)
df.head()
```

| | serious_delinquency | revolv_unsec_utilization | age | quantity_30_59_pd | debt_to_income_ratio | monthly_income | quantity_loans_and_lines | quantity_90_days_pd | quantity_real_estate_loans | quantity_60_89_days_pd | quantity_dependents | age_bins | quantity_dependents_bins |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 0 | 6 | 0 | 2.0 | age_40-49 | 1_2_dep |
| 1 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 0 | 0 | 1.0 | age_40-49 | 1_2_dep |
| 2 | 0 | 0.65818 | 38 | 1 | 0.085113 | 3042.0 | 2 | 1 | 0 | 0 | 0.0 | age_30-39 | 0_dep |
| 3 | 0 | 0.23381 | 30 | 0 | 0.03605 | 3300.0 | 5 | 0 | 0 | 0 | 0.0 | age_30-39 | 0_dep |
| 4 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 0 | 1 | 0 | 0.0 | age_40-49 | 0_dep |


```python
df.shape
```

```
(120269, 13)
```

## Prepared DataFrame Size: 120,269 rows, 13 columns

----

### PREPARE - SPLIT

```python
# Split the data into train, validate, and test with the split_data function defined in wrangle.py
train, validate, test = wrangle.split_data(df)
```

```
train -> (67350, 13)
validate -> (28865, 13)
test -> (24054, 13)
```
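The row counts printed above are consistent with holding out 20% of the 120,269 rows for test (24,054, rounded up), then 30% of the remaining 96,215 rows for validate (28,865, rounded up), leaving 67,350 rows for train; scikit-learn's `train_test_split` rounds the test count up in exactly this way. A minimal sketch of such a split, assuming scikit-learn, those proportions, stratification on `serious_delinquency`, and an arbitrary `random_state` (none of which is confirmed by this report):

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df: pd.DataFrame, target: str = "serious_delinquency", seed: int = 123):
    """Hypothetical sketch: 20% test, then 30% of the remainder for validate."""
    # Hold out 20% of all rows as the test set, preserving the target's class balance
    train_validate, test = train_test_split(
        df, test_size=0.2, random_state=seed, stratify=df[target]
    )
    # Split the remaining 80% into train (70%) and validate (30%)
    train, validate = train_test_split(
        train_validate,
        test_size=0.3,
        random_state=seed,
        stratify=train_validate[target],
    )
    for name, part in (("train", train), ("validate", validate), ("test", test)):
        print(f"{name} -> {part.shape}")
    return train, validate, test
```

Stratifying matters here because seriously delinquent borrowers are a minority class; without it, the validate and test sets could end up with noticeably different delinquency rates than train.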


===================================================================================================================================

## III. EXPLORE

### EXPLORE QUESTIONS
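The two initial questions lend themselves to standard hypothesis tests with `scipy.stats`, which the notebook already imports. The sketch below is one possible approach, not the analysis performed in this report. It assumes the `age_bins`, `debt_to_income_ratio`, and `serious_delinquency` columns produced during preparation and a significance level of 0.05 (an assumption). Question 1 uses a chi-square test of independence between age group and delinquency. Question 2 uses a one-sided Mann-Whitney U test, chosen here because it does not assume the debt-to-income ratios are normally distributed. Running both tests on `train` only keeps the validate and test sets untouched.

```python
import pandas as pd
import scipy.stats as stats

ALPHA = 0.05  # ASSUMPTION: significance level


def check_age_groups(train: pd.DataFrame):
    """Q1: Is serious delinquency independent of age group? (chi-square test)"""
    observed = pd.crosstab(train["age_bins"], train["serious_delinquency"])
    chi2, p, dof, expected = stats.chi2_contingency(observed)
    # A small p-value suggests delinquency rates differ across age groups
    return p, p < ALPHA


def check_debt_to_income(train: pd.DataFrame):
    """Q2: Do seriously delinquent applicants have higher debt-to-income ratios?"""
    delinquent = train.loc[train["serious_delinquency"] == 1, "debt_to_income_ratio"]
    current = train.loc[train["serious_delinquency"] == 0, "debt_to_income_ratio"]
    # One-sided alternative: delinquent ratios tend to be larger
    stat, p = stats.mannwhitneyu(delinquent, current, alternative="greater")
    return p, p < ALPHA
```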

---

### EXPLORE - Univariate

### EXPLORE - Bivariate

### EXPLORE - Multivariate

### EXPLORATION SUMMARY

===================================================================================================================================

## IV. MODEL

### MODEL - SCALE

===================================================================================================================================

## V. CONCLUSION

### RECOMMENDATIONS

### NEXT STEPS