<a href="https://colab.research.google.com/github/rpdieego/credit_risk_modeling/blob/master/Credit_Risk_Modeling.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Credit Risk Modeling

## 1 - Context

## 2 - Data Understanding

### Data Dictionary

*   **addr_state** - The state provided by the borrower in the loan application;
*   **annual_inc** - The self-reported annual income provided by the borrower during registration;
*   **annual_inc_joint** - The combined self-reported annual income provided by the co-borrowers during registration;
*   **application_type** - Indicates whether the loan is an individual application or a joint application with two co-borrowers;
*   **collection_recovery_fee** - Post charge-off collection fee;
*   **collections_12_mths_ex_med** - Number of collections in 12 months excluding medical collections;
*   **delinq_2yrs** - The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years;
*   **desc** - Loan description provided by the borrower;
*   **dti** - The borrower's total monthly debt payments (excluding mortgage and the requested LC loan) divided by the borrower's self-reported monthly income;
*   **dti_joint** - The co-borrowers' total monthly debt payments (excluding mortgages and the requested LC loan) divided by the co-borrowers' combined self-reported monthly income;
*   **earliest_cr_line** - The month the borrower's earliest reported credit line was opened;
*   **emp_length** - Employment length in years. Possible values are between 0 and 10, where 0 means less than one year and 10 means ten or more years;
*   **emp_title** - The job title supplied by the borrower when applying for the loan;
    *  *Employer Title replaces Employer Name for all loans listed after 9/23/2013*
*   **fico_range_high** - The upper boundary range the borrower’s FICO at loan origination belongs to;
*   **fico_range_low** - The lower boundary range the borrower’s FICO at loan origination belongs to;
*   **funded_amnt** - The total amount committed to that loan at that point in time;
*   **funded_amnt_inv** - The total amount committed by investors for that loan at that point in time;
*   **grade** - LC assigned loan grade;
*   **home_ownership** - The home ownership status provided by the borrower during registration. Possible values are: RENT, OWN, MORTGAGE, OTHER;
*   **id** - A unique LC assigned ID for the loan listing;
*   **initial_list_status** - The initial listing status of the loan;
    * Possible values are **W** and **F**
*   **inq_last_6mths** - The number of inquiries in past 6 months (excluding auto and mortgage inquiries);
*   **installment** - The monthly payment owed by the borrower if the loan originates;
*   **int_rate** - Interest Rate on the loan;
*   **is_inc_v** - Indicates if income was verified by LC, not verified, or if the income source was verified (this field appears as `verification_status` in the dataset);
*   **issue_d** - The month in which the loan was funded;
*   **last_credit_pull_d** - The most recent month LC pulled credit for this loan;
*   **last_fico_range_high** - The upper boundary range the borrower’s last FICO pulled belongs to;
*   **last_fico_range_low** - The lower boundary range the borrower’s last FICO pulled belongs to;
*   **last_pymnt_amnt** - Last total payment amount received;
*   **last_pymnt_d** - Last month payment was received;
*   **loan_amnt** - The listed amount of the loan applied for by the borrower. If at some point in time, the credit department reduces the loan amount, then it will be reflected in this value;
*   **loan_status** - Current status of the loan;
*   **member_id** - A unique LC assigned Id for the borrower member;
*   **mths_since_last_delinq** - The number of months since the borrower's last delinquency;
*   **mths_since_last_major_derog** - Months since most recent 90-day or worse rating;
*   **mths_since_last_record** - The number of months since the last public record;
*   **next_pymnt_d** - Next scheduled payment date;
*   **open_acc** - The number of open credit lines in the borrower's credit file;
*   **out_prncp** - Remaining outstanding principal for total amount funded;
*   **out_prncp_inv** - Remaining outstanding principal for portion of total amount funded by investors;
*   **policy_code** - Possible values are:
    * 1: publicly available policy;
    * 2: new products not publicly available policy;
*   **pub_rec** - Number of derogatory public records;
*   **purpose** - A category provided by the borrower for the loan request;
*   **pymnt_plan** - Indicates if a payment plan has been put in place for the loan;
*   **recoveries** - Post charge-off gross recovery;
*   **revol_bal** - Total credit revolving balance;
*   **revol_util** - Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit;
*   **sub_grade** - LC assigned loan subgrade;
*   **term** - The number of payments on the loan. Values are in months and can be either 36 or 60;
*   **title** - The loan title provided by the borrower;
*   **total_acc** - The total number of credit lines currently in the borrower's credit file;
*   **total_pymnt** - Payments received to date for total amount funded;
*   **total_pymnt_inv** - Payments received to date for portion of total amount funded by investors;
*   **total_rec_int** - Interest received to date;
*   **total_rec_late_fee** - Late fees received to date;
*   **total_rec_prncp** - Principal received to date;
*   **url** - URL for the LC page with listing data;
*   **verified_status_joint** - Indicates if the co-borrowers' joint income was verified by LC, not verified, or if the income source was verified (this field appears as `verification_status_joint` in the dataset);
*   **zip_code** - The first 3 numbers of the zip code provided by the borrower in the loan application;
*   **open_acc_6m** - Number of open trades in last 6 months;
*   **open_il_6m** - Number of currently active installment trades;
*   **open_il_12m** - Number of installment accounts opened in past 12 months;
*   **open_il_24m** - Number of installment accounts opened in past 24 months;
*   **mths_since_rcnt_il** - Months since the most recent installment account was opened;
*   **total_bal_il** - Total current balance of all installment accounts;
*   **il_util** - Ratio of total current balance to high credit/credit limit on all installment accounts;
*   **open_rv_12m** - Number of revolving trades opened in past 12 months;
*   **open_rv_24m** - Number of revolving trades opened in past 24 months;
*   **max_bal_bc** - Maximum current balance owed on all revolving accounts;
*   **all_util** - Balance to credit limit on all trades;
*   **total_rev_hi_lim** - Total revolving high credit/credit limit;
*   **inq_fi** - Number of personal finance inquiries;
*   **total_cu_tl** - Number of finance trades;
*   **inq_last_12m** - Number of credit inquiries in past 12 months;
*   **acc_now_delinq** - The number of accounts on which the borrower is now delinquent;
*   **tot_coll_amt** - Total collection amounts ever owed;
*   **tot_cur_bal** - Total current balance of all accounts;
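
The dictionary describes `emp_length` as a number from 0 to 10 and `term` as 36 or 60, but the raw file stores both as strings (for example `10+ years`, `< 1 year` and `36 months` in the preview of `lc_df`). A minimal sketch of one way to convert them is shown here. The `sample` frame is made up for illustration, and the output column names `emp_length_years` and `term_months` are hypothetical, not part of the dataset.

```python
import pandas as pd

# Toy sample mimicking the raw string encodings; not real loan records.
sample = pd.DataFrame({
    "emp_length": ["10+ years", "< 1 year", "1 year", "3 years", None],
    "term": ["36 months", "60 months", "36 months", "60 months", "36 months"],
})

# "< 1 year" means less than one year, so map it to 0 before pulling out digits.
years = (
    sample["emp_length"]
    .str.replace("< 1 year", "0 years", regex=False)
    .str.extract(r"(\d+)", expand=False)
)
# Nullable Int64 keeps missing employment lengths as <NA> instead of forcing floats.
sample["emp_length_years"] = pd.to_numeric(years).astype("Int64")

# The term is the first run of digits in the string (36 or 60).
sample["term_months"] = sample["term"].str.extract(r"(\d+)", expand=False).astype(int)

print(sample[["emp_length_years", "term_months"]])
```

Missing employment lengths are left as missing rather than set to 0, since 0 already carries the meaning "less than one year".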







## 3 - Exploratory Data Analysis

In [13]:
import pandas as pd

In [4]:
# Code to read csv file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [11]:
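# Import loan_data_2007_2014.csv from Google Drive

link = 'https://drive.google.com/open?id=1zzsUZOgObyZ9l8-GzapqFyL9HIZeztcw'

# Everything after '=' is the Drive file ID (named file_id to avoid shadowing the built-in id)
_, file_id = link.split('=')
print(file_id)  # Verify that we have everything after '='

downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('loan_data_2007_2014.csv')
# low_memory=False parses the file in one pass, which avoids mixed-type inference across chunks
lc_df = pd.read_csv('loan_data_2007_2014.csv', low_memory=False)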

1zzsUZOgObyZ9l8-GzapqFyL9HIZeztcw
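
Splitting the link on `=` works for this URL, but the two-name unpacking fails as soon as the link carries a second query parameter. As a hedged alternative, the standard library can parse the query string directly. The variable names `query` and `file_id` are illustrative.

```python
from urllib.parse import urlparse, parse_qs

link = "https://drive.google.com/open?id=1zzsUZOgObyZ9l8-GzapqFyL9HIZeztcw"

# parse_qs maps each query parameter to a list of its values.
query = parse_qs(urlparse(link).query)
file_id = query["id"][0]
print(file_id)  # 1zzsUZOgObyZ9l8-GzapqFyL9HIZeztcw
```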




In [12]:
lc_df.head()

Unnamed: 0.1,Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,annual_inc_joint,dti_joint,verification_status_joint,acc_now_delinq,tot_coll_amt,tot_cur_bal,open_acc_6m,open_il_6m,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m
0,0,1077501,1296599,5000,5000,4975.0,36 months,10.65,162.87,B,B2,,10+ years,RENT,24000.0,Verified,Dec-11,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/22/11 > I need to upgra...,credit_card,Computer,860xx,AZ,27.65,0.0,Jan-85,1.0,,,3.0,0.0,13648,83.7,9.0,f,0.0,0.0,5861.071414,5831.78,5000.0,861.07,0.0,0.0,0.0,Jan-15,171.62,,Jan-16,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
1,1,1077430,1314167,2500,2500,2500.0,60 months,15.27,59.83,C,C4,Ryder,< 1 year,RENT,30000.0,Source Verified,Dec-11,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/22/11 > I plan to use t...,car,bike,309xx,GA,1.0,0.0,Apr-99,5.0,,,3.0,0.0,1687,9.4,4.0,f,0.0,0.0,1008.71,1008.71,456.46,435.17,0.0,117.08,1.11,Apr-13,119.66,,Sep-13,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
2,2,1077175,1313524,2400,2400,2400.0,36 months,15.96,84.33,C,C5,,10+ years,RENT,12252.0,Not Verified,Dec-11,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,,small_business,real estate business,606xx,IL,8.72,0.0,Nov-01,2.0,,,2.0,0.0,2956,98.5,10.0,f,0.0,0.0,3003.653644,3003.65,2400.0,603.65,0.0,0.0,0.0,Jun-14,649.91,,Jan-16,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
3,3,1076863,1277178,10000,10000,10000.0,36 months,13.49,339.31,C,C1,AIR RESOURCES BOARD,10+ years,RENT,49200.0,Source Verified,Dec-11,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/21/11 > to pay for prop...,other,personel,917xx,CA,20.0,0.0,Feb-96,1.0,35.0,,10.0,0.0,5598,21.0,37.0,f,0.0,0.0,12226.30221,12226.3,10000.0,2209.33,16.97,0.0,0.0,Jan-15,357.48,,Jan-15,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
4,4,1075358,1311748,3000,3000,3000.0,60 months,12.69,67.79,B,B5,University Medical Group,1 year,RENT,80000.0,Source Verified,Dec-11,Current,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/21/11 > I plan on combi...,other,Personal,972xx,OR,17.94,0.0,Jan-96,0.0,38.0,,15.0,0.0,27783,53.9,38.0,f,766.9,766.9,3242.17,3242.17,2233.1,1009.07,0.0,0.0,0.0,Jan-16,67.79,Feb-16,Jan-16,0.0,,1,INDIVIDUAL,,,,0.0,,,,,,,,,,,,,,,,,
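
Two things stand out in this preview: the leading `Unnamed` columns, which look like row indexes saved by an earlier export (an assumption, not something the source confirms), and the month columns stored as strings such as `Dec-11`. A small sketch of how both could be tidied follows. The `preview` frame copies a few values from the rows above.

```python
import pandas as pd

# A few columns and rows copied from the preview; everything else is omitted.
preview = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [1077501, 1077430, 1077175],
    "issue_d": ["Dec-11", "Dec-11", "Dec-11"],
    "earliest_cr_line": ["Jan-85", "Apr-99", "Nov-01"],
})

# Drop every column whose name starts with "Unnamed".
unnamed = [c for c in preview.columns if c.startswith("Unnamed")]
preview = preview.drop(columns=unnamed)

# Parse "Mon-YY" strings into timestamps (first day of the month).
# Note: %y maps 69-99 to 19xx and 00-68 to 20xx, so very old credit
# lines (before 1969) would need a manual correction.
for col in ["issue_d", "earliest_cr_line"]:
    preview[col] = pd.to_datetime(preview[col], format="%b-%y")

print(preview.dtypes)
```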


In [16]:
lc_df['fico_range_high'].unique()

KeyError: ignored
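
The `KeyError` suggests that `fico_range_high` is listed in the data dictionary but absent from this file, and the header printed by `lc_df.head()` does not include it. One way to find such gaps up front is to compare the dictionary names with the columns actually loaded. The sketch below uses a made-up empty frame `lc_sample` and a short, hand-picked `dictionary_columns` list rather than the full dataset.

```python
import pandas as pd

# Stand-in for lc_df with a handful of the columns shown in the preview.
lc_sample = pd.DataFrame(columns=["id", "grade", "verification_status", "dti"])

# A few names taken from the data dictionary.
dictionary_columns = ["id", "grade", "fico_range_high", "fico_range_low", "is_inc_v", "dti"]

# Names documented in the dictionary but not present in the frame.
missing = [c for c in dictionary_columns if c not in lc_sample.columns]
print(missing)  # ['fico_range_high', 'fico_range_low', 'is_inc_v']

# Guarded access avoids the KeyError when a column is absent.
if "fico_range_high" in lc_sample.columns:
    print(lc_sample["fico_range_high"].unique())
else:
    print("fico_range_high is not in this dataset")
```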