# Prosper Loan Data Analysis
## by John King

## Investigation Overview

> In this analysis, I investigate Prosper's credit rating model. I first examine the correlation between key credit features and the rating scores, then evaluate how well each rating scale predicts risk, and finally investigate less heavily weighted credit features as they relate to risk.

## Dataset Overview

> The dataset used in this analysis is a detailed record of loans originated by Prosper. It includes borrower information such as credit score, credit rating, and income, as well as loan status, loan category, loan amount, interest rate, yield, and loss. The dataset consists of approximately 114,000 rows and 81 attribute columns. Each row represents a single loan, and attributes are provided for each loan where applicable.

# Findings

## Credit features and their weight in determining Prosper Rating and Prosper Score

> We observe a limited set of credit features and find that credit score and available credit have fairly strong positive correlations with Prosper Rating and Prosper Score, while debt-to-income ratio, delinquencies, and credit utilization have negative correlations. Surprisingly, employment status duration and stated monthly income have little to no correlation.

![corr_matrix.png](attachment:corr_matrix.png)
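
A correlation matrix like the one pictured can be sketched with pandas. This is a minimal illustration, not the notebook's actual code: the column names are assumed to follow the public Prosper loan CSV, and the values are synthetic, made up only to show the mechanics.

```python
import pandas as pd

# Synthetic sample; the numbers are invented for illustration only.
# Column names are an assumption based on the public Prosper loan CSV.
loans = pd.DataFrame({
    "ProsperRating (numeric)": [1, 2, 3, 4, 5, 6, 7],
    "CreditScoreRangeLower": [600, 640, 660, 700, 720, 760, 800],
    "DebtToIncomeRatio": [0.45, 0.40, 0.33, 0.30, 0.22, 0.18, 0.10],
})

# Pearson correlation between every pair of numeric columns.
corr = loans.corr(method="pearson")

# Correlation of each feature with the rating, strongest positive first.
rating_col = "ProsperRating (numeric)"
print(corr[rating_col].drop(rating_col).sort_values(ascending=False))
```

The resulting `corr` DataFrame can be passed to a heatmap function to produce a figure similar to the one above.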

## Performance of Prosper Rating and Prosper Score in predicting charge-off, default and delinquency

> Prosper Rating displays a fairly linear relationship to rates of chargeoff, default, and delinquency, while Prosper Score displays some unexpected results in the mid-tier ratings.

![prosper_rating.png](attachment:prosper_rating.png)


![prosper_score.png](attachment:prosper_score.png)
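
One way to compute outcome rates per rating tier is a group-by over boolean indicator columns. The sketch below uses synthetic rows; the column names `ProsperRating (Alpha)` and `LoanStatus`, and the status labels `Chargedoff` and `Defaulted`, are assumptions about the dataset rather than details confirmed in this write-up.

```python
import pandas as pd

# Synthetic rows for illustration; names and labels are assumed.
loans = pd.DataFrame({
    "ProsperRating (Alpha)": ["AA", "AA", "A", "A", "D", "D", "D", "D"],
    "LoanStatus": ["Completed", "Current", "Completed", "Chargedoff",
                   "Chargedoff", "Defaulted", "Current", "Completed"],
})

bad_statuses = ["Chargedoff", "Defaulted"]

# Add one boolean indicator column per outcome of interest.
for status in bad_statuses:
    loans[status] = loans["LoanStatus"] == status

# The mean of a boolean column is the share of loans with that outcome.
rates = loans.groupby("ProsperRating (Alpha)")[bad_statuses].mean()
print(rates)
```

The same pattern works with `ProsperScore` as the grouping column, which allows the two rating scales to be compared side by side.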

## Prosper Rating is more determinative of borrower APR

> We find that Prosper Rating has an almost perfect correlation with borrower APR, with a correlation coefficient of 0.96, while Prosper Score has a weaker correlation of 0.67.

![rating_corr.png](attachment:rating_corr.png)

## Employment Status Duration and Stated Monthly Income can help predict chargeoff

> Visualizing chargeoffs in relation to these two lesser-weighted features reveals a trend: most chargeoffs occur on loans disbursed to borrowers with employment status duration less than 75 months, and stated monthly income below 5000.

![esd_scatter.png](attachment:esd_scatter.png)

## Segmentation reveals contradiction

> If we segment borrowers at two extremes of employment status duration, and incorporate stated monthly income and Prosper Rating into the segmentation, we find it is possible for a lower-risk borrower to be rated lower than a higher-risk borrower and therefore pay a substantially higher APR.

![d_e_apr.png](attachment:d_e_apr.png)

![aa_a_apr.png](attachment:aa_a_apr.png)
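
The segmentation described above can be sketched as boolean masks over the loan table. This is an illustrative sketch with synthetic rows: the helper `summarize_segment` is hypothetical, and the column names (`EmploymentStatusDuration`, `StatedMonthlyIncome`, `BorrowerAPR`, `LoanStatus`, `ProsperRating (Alpha)`) are assumed to match the dataset.

```python
import pandas as pd

def summarize_segment(loans: pd.DataFrame, mask: pd.Series) -> dict:
    """Hypothetical helper: summarize the loans selected by a boolean mask."""
    segment = loans[mask]
    return {
        "n_loans": len(segment),
        "chargeoff_rate": (segment["LoanStatus"] == "Chargedoff").mean(),
        "mean_apr": segment["BorrowerAPR"].mean(),
        "share_of_population": len(segment) / len(loans),
    }

# Synthetic rows for illustration only.
loans = pd.DataFrame({
    "ProsperRating (Alpha)": ["D", "D", "D", "E", "E", "E"],
    "EmploymentStatusDuration": [30, 60, 250, 220, 300, 40],  # months
    "StatedMonthlyIncome": [3000, 4000, 6000, 7000, 5500, 2500],
    "LoanStatus": ["Chargedoff", "Completed", "Completed",
                   "Completed", "Completed", "Chargedoff"],
    "BorrowerAPR": [0.28, 0.30, 0.27, 0.33, 0.31, 0.34],
})

rating = loans["ProsperRating (Alpha)"]
esd = loans["EmploymentStatusDuration"]
smi = loans["StatedMonthlyIncome"]

# Higher-rated borrowers with short tenure and low stated income.
d_short_low = summarize_segment(loans, (rating == "D") & (esd < 75) & (smi < 5000))
# Lower-rated borrowers with long tenure and high stated income.
e_long_high = summarize_segment(loans, (rating == "E") & (esd > 200) & (smi > 5000))

print(d_short_low)
print(e_long_high)
```

Keeping the thresholds (75 months, 200 months, 5000) as explicit mask conditions makes it easy to test how sensitive the comparison is to the chosen cutoffs.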

## Segmentation Findings

Our first segmentation compares D rated borrowers to E rated borrowers. Here is what we see:

Rating D, ESD < 75, SMI < 5000, chargeoff rate: 12.9%, mean APR: 28.2%  
Rating E, ESD > 200, SMI > 5000, chargeoff rate: 7.8%, mean APR: 32.9%

The borrowers in the lower-rated (E) segment are about 40% less likely to have their loans charged off (7.8% versus 12.9%), yet they are rated lower and therefore pay a higher rate. In this instance, **a borrower with 40% less risk is paying an APR 17% higher.**

We then compare AA rated borrowers with A rated borrowers:

Rating AA, ESD < 75, SMI < 5000, chargeoff rate: 1.7%, mean APR: 9%  
Rating A, ESD > 200, SMI > 5000, chargeoff rate: 1.6%, mean APR: 13.9%

The borrowers in the lower-rated (A) segment are 6% less likely to have their loans charged off (1.6% versus 1.7%), yet they are rated lower and therefore pay a higher rate. In this instance, **a borrower with 6% less risk is paying an APR 54% higher.**
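
The relative differences can be recomputed from the rounded chargeoff rates and mean APRs quoted for the four segments. The helper name `relative_difference` is introduced here only for illustration.

```python
def relative_difference(baseline: float, other: float) -> float:
    """Return the change from baseline to other as a fraction of baseline."""
    return (other - baseline) / baseline

# D (short tenure, low income) versus E (long tenure, high income).
risk_d_e = relative_difference(12.9, 7.8)   # chargeoff rate, in percent
apr_d_e = relative_difference(28.2, 32.9)   # mean APR, in percent

# AA (short tenure, low income) versus A (long tenure, high income).
risk_aa_a = relative_difference(1.7, 1.6)
apr_aa_a = relative_difference(9.0, 13.9)

print(f"D vs E:  risk {risk_d_e:.0%}, APR {apr_d_e:+.0%}")
print(f"AA vs A: risk {risk_aa_a:.0%}, APR {apr_aa_a:+.0%}")
```

Because the inputs are already rounded to one decimal place, the computed percentages are approximate.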

## The ESD > 200, SMI > 5000 segment is not a trivial population

This segment of borrowers represents 7.2% of the overall borrower population.

## Conclusion
It seems that employment status duration and stated monthly income are features that should be more heavily weighted in Prosper's rating models. As the segmentation explored earlier represents a non-trivial portion of the overall borrower population, adjusting for ESD and SMI might allow for more competitive rate offers for qualified borrowers and a more accurate rating system overall. Additional credit features should also be investigated for correlation to Prosper Rating and relationship to borrower risk.