# Part II - Effects of loan attributes on Borrower APR and Original Loan Amount
## by Naomi Kamweru





## Investigation Overview


> In this investigation, I look at some of the loan attributes that can affect the Borrower APR and the Original Loan Amount. My main focus is on Credit Grade, Prosper Rating, Income Range and Occupation.
 


## Dataset Overview

> This dataset consists of information on 113,937 Prosper loan listings with 81 variables, including Borrower APR, Credit Grade, Prosper Rating and Occupation, among many others.

In [1]:
# import all packages and set plots to be embedded inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

%matplotlib inline

# suppress warnings from final output
import warnings
warnings.simplefilter("ignore")

In [2]:
# load in the dataset into a pandas dataframe
loans = pd.read_csv('prosperLoanData.csv')

> Note that the above cells have been set as "Skip"-type slides. That means
that when the notebook is rendered as HTML slides, those cells won't show up.

## Distribution of occupations

> The dataset contains many different occupations. From the visualization, the most common borrower occupations are Computer Programmer, Executive and Teacher.

In [3]:
# Define a function to add the x and y labels and title to a plot
def addDesc(x,y,t):
    plt.xlabel(x)
    plt.ylabel(y)
    plt.title(t)

In [4]:
# Create a new series holding the mean original loan amount
# for each occupation
occ_amount = loans.groupby('Occupation')['LoanOriginalAmount'].mean()

In [None]:
# Plot how the mean original loan amount differs by the borrower's occupation
occ_amount.plot(kind='bar', figsize=(20,20))
addDesc('Occupation', 'Mean Original Loan Amount', 'Occupation vs Original Loan Amount')
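
The bar chart above plots the mean loan amount per occupation, which is a different quantity from how many borrowers each occupation has. The sketch below shows one way to look at both side by side. It runs on a small made-up dataframe (`toy` and its values are hypothetical, not taken from the Prosper data); only the column names `Occupation` and `LoanOriginalAmount` come from the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `loans` dataframe.
toy = pd.DataFrame({
    "Occupation": ["Teacher", "Teacher", "Executive", "Teacher", "Executive", "Nurse"],
    "LoanOriginalAmount": [4000, 6000, 15000, 5000, 25000, 8000],
})

# Number of listings and mean loan amount per occupation, most common first.
occ_summary = (
    toy.groupby("Occupation")["LoanOriginalAmount"]
       .agg(listings="count", mean_amount="mean")
       .sort_values("listings", ascending=False)
)
print(occ_summary)
```

On the real data, replacing `toy` with `loans` should give the same two columns for every occupation, which makes it easier to tell frequent occupations apart from occupations with large average loans.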


## Distributions of Credit Grades

> Only a few borrowers were not graded, as shown by the small count for the NC grade.

In [None]:
# Plot of the credit grade distribution
loans['CreditGrade'].value_counts().plot(kind='bar')
plt.xticks(rotation = 0)
addDesc('Credit Grade','Credit Grade Count','Distribution of Credit Grade')
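
`value_counts()` sorts the bars by frequency, so the grades do not appear in risk order. A minimal sketch of one alternative is to reindex the counts against an explicit grade order. The order used here (AA as lowest risk through HR, with NC last) is an assumption about how the grades rank, and the `grades` series is hypothetical toy data standing in for `loans['CreditGrade']`.

```python
import pandas as pd

# Assumed risk order, lowest risk first; NC (not graded) is placed last.
grade_order = ["AA", "A", "B", "C", "D", "E", "HR", "NC"]

# Hypothetical toy column standing in for loans['CreditGrade'].
grades = pd.Series(["C", "AA", "HR", "C", "B", "NC", "C", "AA", None])

grade_counts = (
    grades.value_counts()                       # missing values are dropped by default
          .reindex(grade_order, fill_value=0)   # impose the assumed order, 0 for absent grades
)
print(grade_counts)
```

Calling `grade_counts.plot(kind='bar')` would then draw the bars in that order instead of by count.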



## Original loan amount by Income Range and Credit Grade
> Listings graded as low risk show more outliers than those graded as high risk, regardless of the income range. However, borrowers who are not employed show far fewer outliers than the other groups.



In [None]:
# Create a facet grid showing how the original loan amount varies
# by credit grade for the different income ranges
grade_order = sorted(loans['CreditGrade'].dropna().unique())  # same category order in every facet
g = sb.FacetGrid(loans, col='IncomeRange', col_wrap=3)
g.map(sb.stripplot, 'CreditGrade', 'LoanOriginalAmount', order=grade_order, jitter=0.35, color='steelblue', s=3)
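
A strip plot shows outliers only visually. One way to put numbers on the comparison is to count the points that fall outside 1.5 times the interquartile range for each grade. This is a sketch under assumptions: the 1.5 × IQR rule and the helper `count_outliers` are not part of the original analysis, and `toy` is hypothetical data.

```python
import pandas as pd

def count_outliers(amounts: pd.Series) -> int:
    """Count values more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = amounts.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(((amounts < lower) | (amounts > upper)).sum())

# Hypothetical toy data: one unusually large loan in the AA group.
toy = pd.DataFrame({
    "CreditGrade": ["AA"] * 6 + ["HR"] * 6,
    "LoanOriginalAmount": [5000, 5500, 6000, 6500, 7000, 25000,
                           1000, 1500, 2000, 2500, 3000, 3500],
})

outliers_per_grade = (
    toy.groupby("CreditGrade")["LoanOriginalAmount"].apply(count_outliers)
)
print(outliers_per_grade)
```

On the real data, grouping `loans` by `['IncomeRange', 'CreditGrade']` instead should give one count per facet and grade, which can be compared against what the strip plots suggest.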

### Generate Slideshow
> The command below generates a slideshow.

In [None]:
# Use this command if you are running this file locally
!jupyter nbconvert Part_II_slide_deck_template.ipynb --to slides --post serve --no-input --no-prompt

> In the classroom workspace, the generated HTML slideshow will be placed in the home folder. 

> On a local machine, the command above should open a tab in your web browser where you can scroll through your presentation. Sub-slides can be accessed by pressing 'down' when viewing their parent slide. Make sure you remove all of the quote-formatted guide notes like this one before you finish your presentation! Finally, you can stop the kernel.