# Are religious Americans more likely to repay loans?
The recent rise of peer-to-peer lending has been stunning. Starting from zero a few years ago, platforms like Lending Club are already placing $10 billion of unsecured personal loans per year. 

The concept is simple: a borrower fills in an application, investors choose to fund part or all of the loan, and the platform takes care of fine details, like collecting payments and annoying late borrowers. In theory, this disintermediation of banks from consumer credit lowers borrowing rates, improves investor returns, and leaves a healthy margin for the platform's profits.

The platforms publish anonymized default history from previous loans. This data helps investors make smarter decisions about future loans - attracting more investment dollars. Investors use the data to try to pick loans with a high interest rate, but a lower likelihood of default.

## Data set
For now, I will use only 36 month loans originating before 2013 (maturing before 2016). For now, I will restrict my data set to 2012 (as we go back further, the underwriting standards change). If I fill rows to handle current loans (see below), I will bring in more recent data.

## Predictive factors
Many factors look to be of interest. Obvious ones include:
- FICO Score
- Debt to Income ratio
- Size of loan

Additional factors that people have provided some evidence for:
- Loan description length
  - Long descriptions are bad: http://drjasondavis.com/blog/2012/04/08/lending-club-loan-analysis-making-money-with-logistic-regression
  - Long descriptions are good: https://lendingclubmodeling.wordpress.com/2011/04/25/why-loan-descriptions-and-qa-matter/
- Loan purpose: http://blog.lendingrobot.com/research/predicting-the-number-of-payments-in-peer-lending/

Other useful articles:
- http://blog.lendingrobot.com/research/predicting-returns-for-ongoing-loans/
- http://kldavenport.com/gradient-boosting-analysis-of-lendingclubs-data/

## Analysis

### Predicted variable
We're ultimately concerned with risk-adjusted return. However, analysis is simpler with certain assumptions. These may be revisited in a future analysis:
- Risk-adjustment can be handled separately
- Return = interest rate - probability of default * loss given default
- PD follows the same hazard curve, regardless of risk grade or other factors
- LGD is the same for different risk grades and only depends on lateness severity

Therefore, we can focus on PD only.

Explore:
- shape of default curve at different risk grades
- chance of late payments developing into default at different grades
- recovery amounts at different grades
- do late fees help out?

## Conclusion

### Caveats
This study has focused on a methodology for answering sociological questions and predicting defaults. Other issues to resolve for successful investing in P2P include:
- API access: smart investors, such as hedge funds are rumored to quickly cream off the best loans. Any superior statistical view must also compete on execution speed.
- Returns should be risk-adjusted for proper comparison to alternative investments, such as the S&P 500. An examination of similar grades of unsecuritized personal loans (credit cards) across the entire credit cycle would help capture both beta and true risk of default.
- Lending Club uses NAR (Net Annualized Return, a non-standard method that assumes all current loans will pay 100%) http://danielodio.com/one-year-in-lending-money-to-complete-strangers-via-an-api-and-why-you-should-try-it
- Investment size: what is the proper amount to invest in each loan to achieve the Kelly Criterion?
- Loans do not receive capital gains treatment and are taxed as ordinary income. 

## Load data from Lending Club
Lending Club provides information on all past applications (and current loan status)

## Input files
- **LoanStats3a.csv.zip:** 2007-2011 Lending Club default history from https://www.lendingclub.com/info/download-data.action
- **LoanStats3b.csv.zip:** 2012-2013 Lending Club default history
- **LoanStats3c.csv.zip:** 2013-2014 Lending Club default history (currently reading this one only)
- **LoanStats3d.csv.zip:** 2015-2015 Lending Club default history
  - According to http://www.lendingmemo.com/lendingrobot-investing-review/, lending standards changed drastically starting in 2013
- **ZIP_COUNTY_122015.xlsx:** list of zip codes and county codes
- **national_county.txt:** list of states and counties mapped to code from US Census
- **2010.csv:** list of US counties, population, congregations, and adherents from http://www.rcms2010.org/ 
  - This website also has info breaking down into specific religion
- **US_elect_county:** 2012 presidential elections by county from http://www.theguardian.com/news/datablog/2012/nov/07/us-2012-election-county-results-download

In [10]:
import pandas as pd
pd.set_option('display.max_columns', 100)
%matplotlib inline
# 2012-2013 loan data - files are available from deeper history
# had to unzip file first
past_loans = pd.read_csv('LoanStats3b.csv', header = 1, index_col=0)
past_loans.head(100)

Unnamed: 0_level_0,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,collections_12_mths_ex_med,mths_since_last_major_derog,...,open_il_12m,open_il_24m,mths_since_rcnt_il,total_bal_il,il_util,open_rv_12m,open_rv_24m,max_bal_bc,all_util,total_rev_hi_lim,inq_fi,total_cu_tl,inq_last_12m,acc_open_past_24mths,avg_cur_bal,bc_open_to_buy,bc_util,chargeoff_within_12_mths,delinq_amnt,mo_sin_old_il_acct,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mo_sin_rcnt_tl,mort_acc,mths_since_recent_bc,mths_since_recent_bc_dlq,mths_since_recent_inq,mths_since_recent_revol_delinq,num_accts_ever_120_pd,num_actv_bc_tl,num_actv_rev_tl,num_bc_sats,num_bc_tl,num_il_tl,num_op_rev_tl,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1
10179520,12031088,3000,3000,3000,36 months,12.85%,100.87,B,B4,Auditor,10+ years,RENT,25000,Verified,Dec-2013,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,debt,322xx,FL,24.68,0,May-1991,0,58,53,5,2,2875,54.2%,26,f,0.00,0.00,3181.548905,3181.55,3000.00,181.55,0.00,0,0,Jul-2014,2677.23,,Jan-2016,0,69,...,,,,,,,,,,5300,,,,3,3906,2050,52.3,0,0,164,271,7,7,6,14,69,8,69,1,2,3,3,6,11,4,9,3,5,0,0,0,1,91.3,66.7,2,0,32082,19530,4300,26782
10129403,11981032,7550,7550,7550,36 months,16.24%,266.34,C,C5,Special Order Fulfillment Clerk,3 years,RENT,28000,Not Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,951xx,CA,8.40,0,Oct-2010,0,,,4,0,5759,72%,5,w,2704.82,2704.82,6673.720000,6673.72,4845.18,1828.54,0.00,0,0,Feb-2016,266.34,Mar-2016,Feb-2016,0,,...,,,,,,,,,,8000,,,,1,1440,160,96.0,0,0,,38,17,17,0,17,,17,,0,2,4,2,2,0,4,5,4,4,0,0,0,0,100.0,100.0,0,0,8000,5759,4000,0
10129506,11981122,20800,20800,20800,36 months,13.53%,706.16,B,B5,Operations Manager,10+ years,RENT,81500,Verified,Dec-2013,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/31/13 > My goal is to p...,debt_consolidation,Reducing Debt to Purchase Home,100xx,NY,16.73,0,Jun-1998,2,64,,29,0,23473,54.5%,41,f,0.00,0.00,23926.640008,23926.64,20800.00,3126.64,0.00,0,0,May-2015,13334.93,,Feb-2016,0,71,...,,,,,,,,,,43100,,,,9,869,6811,54.6,0,0,115,186,0,0,0,0,70,0,70,1,8,24,11,17,1,29,40,24,29,0,0,0,3,90.2,50.0,0,0,43100,23473,15000,0
10159548,12011167,15000,15000,15000,36 months,8.90%,476.30,A,A5,aircraft maintenance engineer,2 years,MORTGAGE,63000,Not Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/31/13 > To pay Home Dep...,debt_consolidation,Pay off,334xx,FL,16.51,0,Mar-1998,0,34,,8,0,11431,74.2%,29,w,5013.38,5013.38,11907.500000,11907.50,9986.62,1920.88,0.00,0,0,Feb-2016,476.30,Apr-2016,Feb-2016,0,34,...,,,,,,,,,,15400,,,,3,38927,2969,79.1,0,0,147,189,24,13,4,24,75,12,75,3,3,4,3,10,8,6,17,4,8,0,0,0,0,89.3,66.7,0,0,288195,39448,14200,33895
10159584,12011200,9750,9750,9750,36 months,13.98%,333.14,C,C1,Medical Assistant,1 year,RENT,26000,Not Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/31/13 > While being in ...,debt_consolidation,Debt Consilation,927xx,CA,25.12,0,Jan-2007,0,,,12,0,7967,52.8%,28,f,3421.05,3421.05,8327.970000,8327.97,6328.95,1999.02,0.00,0,0,Feb-2016,333.14,Apr-2016,Feb-2016,0,,...,,,,,,,,,,15100,,,,2,1177,1752,75.7,0,0,67,83,12,12,0,12,,20,,0,6,7,6,11,8,9,20,7,12,0,0,0,2,100.0,66.7,0,0,21314,14123,7200,6214
10159498,1319523,12000,12000,12000,36 months,6.62%,368.45,A,A2,MANAGER INFORMATION DELIVERY,10+ years,MORTGAGE,105000,Not Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,UNIVERSAL CARD,060xx,CT,14.05,0,Mar-1994,1,43,,12,0,13168,21.6%,22,w,3921.71,3921.71,9211.250000,9211.25,8078.29,1132.96,0.00,0,0,Feb-2016,368.45,Apr-2016,Feb-2016,0,,...,,,,,,,,,,61100,,,,4,26765,39432,25.0,0,0,146,237,20,3,4,20,,3,43,0,2,2,5,5,9,8,9,2,12,0,0,0,2,95.5,0.0,0,0,333044,42603,52600,42769
10159611,12011228,10000,10000,10000,36 months,9.67%,321.13,B,B1,Registered Nurse,7 years,MORTGAGE,102000,Not Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Clean Up,027xx,MA,15.55,2,Oct-1989,0,11,,9,0,9912,44.4%,22,f,3367.47,3367.47,8027.940000,8027.94,6632.53,1395.41,0.00,0,0,Feb-2016,321.13,Apr-2016,Feb-2016,0,54,...,,,,,,,,,,22300,,,,3,4349,973,89.4,0,0,243,290,23,8,0,25,11,8,11,1,3,4,3,6,9,6,13,4,9,0,0,0,1,77.3,66.7,0,0,58486,39143,9200,36186
10149342,12000897,27050,27050,27050,36 months,10.99%,885.46,B,B2,Team Leadern Customer Ops & Systems,10+ years,OWN,55000,Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/31/13 > Combining high ...,debt_consolidation,Debt Consolidation,481xx,MI,22.87,0,Oct-1986,0,,,14,0,36638,61.2%,27,w,9225.20,9225.20,22136.500000,22136.50,17824.80,4311.70,0.00,0,0,Feb-2016,885.46,Apr-2016,Dec-2015,0,,...,,,,,,,,,,59900,,,,3,9570,16473,53.9,0,0,117,326,16,6,4,16,,8,,0,2,4,4,8,8,10,15,4,14,0,0,0,1,100.0,25.0,0,0,138554,70186,35700,33054
10119623,11971241,12000,12000,12000,36 months,11.99%,398.52,B,B3,LTC,10+ years,MORTGAGE,130000,Source Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,809xx,CO,13.03,0,Nov-1997,1,,,9,0,10805,67%,19,f,4131.75,4131.75,9962.920000,9962.92,7868.25,2094.67,0.00,0,0,Feb-2016,398.52,Apr-2016,Feb-2016,0,,...,,,,,,,,,,16200,,,,4,36362,3567,93.0,0,0,173,193,4,4,3,85,,4,,0,3,5,4,4,8,5,8,5,9,,0,0,3,100.0,1.0,0,0,365874,44327,10700,57674
10129477,11981093,14000,14000,14000,36 months,12.85%,470.71,B,B4,Assistant Director - Human Resources,4 years,RENT,88000,Not Verified,Dec-2013,Current,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,282xx,NC,10.02,1,Jun-1988,0,16,115,6,1,3686,81.9%,14,f,4859.87,4859.87,11767.650000,11767.65,9140.13,2627.52,0.00,0,0,Feb-2016,470.71,Apr-2016,Feb-2016,0,,...,,,,,,,,,,4500,,,,3,2945,480,87.7,0,0,111,103,24,13,0,38,16,,16,0,3,4,3,9,3,4,10,4,6,0,0,0,0,78.6,100.0,1,0,31840,17672,3900,27340


In [19]:
import pandas as pd
pd.set_option('display.max_columns', 100)
%matplotlib inline
# 2012-2013 loan data - files are available from deeper history
# had to unzip file and remove first row with legal info
past_loans = pd.read_csv('LoanStats3b.csv', header = 1, index_col=0)
past_loans['term'] = past_loans['term'].astype('category')
past_loans['grade'] = past_loans['grade'].astype('category')
past_loans['sub_grade'] = past_loans['sub_grade'].astype('category')
past_loans['home_ownership'] = past_loans['home_ownership'].astype('category')
past_loans['verification_status'] = past_loans['verification_status'].astype('category')

# Making assumption that any currently late loans are bad; may need to revisit
# Assumptions from Lending Club https://www.lendingclub.com/account/investorReturnsAdjustments.action
# Current 0% loss
# In Grace Period 28% loss
# Late 16-30 days 58% loss
# Late 31-120 days 74% loss
# Default 89% loss
di = {'Current': 'Good', 'Fully Paid': 'Good', 'Charged Off': 'Bad', 'Late (31-120 days)': 'Bad',
       'In Grace Period': 'Bad', 'Late (16-30 days)': 'Bad', 'Default': 'Bad'}
# past_loans.replace({'loan_status': di}, inplace=True)
past_loans['loan_status'] = past_loans['loan_status'].astype('category')
past_loans['pymnt_plan'] = past_loans['pymnt_plan'].astype('category')
past_loans['purpose'] = past_loans['purpose'].astype('category')
past_loans['application_type'] = past_loans['application_type'].astype('category')

# convert dates
past_loans['issue_d'] = pd.to_datetime(past_loans['issue_d'])
past_loans['earliest_cr_line'] = pd.to_datetime(past_loans['earliest_cr_line'])
past_loans['last_pymnt_d'] = pd.to_datetime(past_loans['last_pymnt_d'])
past_loans['next_pymnt_d'] = pd.to_datetime(past_loans['next_pymnt_d'])
past_loans['last_credit_pull_d'] = pd.to_datetime(past_loans['last_credit_pull_d'])
past_loans['issue_d'] = pd.to_datetime(past_loans['issue_d'])

# convert floats
past_loans['int_rate'] = pd.to_numeric(past_loans['int_rate'].str.extract('(.*)%'))/100

# drop pointless data
past_loans=past_loans.dropna(axis=1,how='all')
past_loans.drop('application_type',axis = 1) # only contains "individual"

# drop data that's not 2012 origination and 36 months term
past_loans = past_loans.drop(past_loans[(past_loans.issue_d >= '2013-01-01') | (past_loans.term == '60 months')].index)


past_loans.head(100)

Unnamed: 0_level_0,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,acc_now_delinq,tot_coll_amt,tot_cur_bal,total_rev_hi_lim,acc_open_past_24mths,avg_cur_bal,bc_open_to_buy,bc_util,chargeoff_within_12_mths,delinq_amnt,mo_sin_old_il_acct,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mo_sin_rcnt_tl,mort_acc,mths_since_recent_bc,mths_since_recent_bc_dlq,mths_since_recent_inq,mths_since_recent_revol_delinq,num_accts_ever_120_pd,num_actv_bc_tl,num_actv_rev_tl,num_bc_sats,num_bc_tl,num_il_tl,num_op_rev_tl,num_rev_accts,num_rev_tl_bal_gt_0,num_sats,num_tl_120dpd_2m,num_tl_30dpd,num_tl_90g_dpd_24m,num_tl_op_past_12m,pct_tl_nvr_dlq,percent_bc_gt_75,pub_rec_bankruptcies,tax_liens,tot_hi_cred_lim,total_bal_ex_mort,total_bc_limit,total_il_high_credit_limit
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1
2828755,3411415,2000,2000,2000,36 months,0.1727,71.58,C,C5,University of Phoenix,2 years,RENT,26000.0,Not Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Consol,970xx,OR,25.62,0,2005-03-01,1,70,,11,0,7354,75%,13,w,0.00,0.00,2576.622657,2576.62,2000.00,576.62,0.00,0.00,0.0000,2016-01-01,71.32,NaT,2016-01-01,0,70,1,INDIVIDUAL,0,0,36339,9800,8,3634,1933,78.8,0,0,23,95,5,5,0,5,70,5,70,0,4,5,6,8,4,7,9,5,11,0,0,0,2,92,40.0,0,0,42364,36339,9100,32564
2828209,3410838,7750,7750,7750,36 months,0.1311,261.54,B,B4,city of branson,10+ years,MORTGAGE,39500.0,Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/27/12 > My loan is for ...,debt_consolidation,consolidation loan,656xx,MO,34.24,1,1993-08-01,2,6,,11,0,8787,38.5%,38,w,0.00,0.00,9415.390000,9415.39,7750.00,1665.39,0.00,0.00,0.0000,2016-01-01,261.49,NaT,2016-02-01,0,24,1,INDIVIDUAL,0,0,197956,22800,2,19796,1745,77.0,0,0,236,144,19,19,4,68,,1,,1,1,5,2,4,21,7,13,5,11,0,0,0,0,92,100.0,0,0,212574,82047,7600,69774
2634739,3176905,4500,4500,4500,36 months,0.1905,165.07,D,D4,henry Schein Inc,8 years,RENT,55000.0,Not Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,,moving,right track 2,115xx,NY,26.31,0,1998-10-01,0,,99,6,1,2686,95.9%,21,w,0.00,0.00,5306.001464,5306.00,4500.00,806.00,0.00,0.00,0.0000,2014-02-01,3327.31,NaT,2016-02-01,0,,1,INDIVIDUAL,0,0,33098,2800,3,5516,114,95.9,0,0,91,173,58,6,0,66,,,,0,2,2,2,2,14,2,7,2,6,0,0,0,1,100,100.0,1,0,59285,33098,2800,56485
2837824,3420387,20850,20850,20850,60 months,0.1777,526.85,D,D1,Sentara Healthcare,6 years,MORTGAGE,143784.0,Source Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/27/12 > This loan would...,debt_consolidation,Debt Consolidation,234xx,VA,24.20,0,1997-12-01,0,58,,27,0,23066,97.7%,56,w,0.00,0.00,24597.690000,24597.69,20850.00,3747.69,0.00,0.00,0.0000,2014-02-01,18275.49,NaT,2016-02-01,0,,1,INDIVIDUAL,0,0,534695,23600,13,20565,498,96.1,0,0,138,183,7,2,4,7,,7,,0,4,9,4,5,41,10,11,9,27,0,0,0,9,98,100.0,0,0,557228,197634,12900,191628
2837644,3420200,12000,12000,12000,36 months,0.1433,412.06,C,C1,MCSD,6 years,OWN,44000.0,Not Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/27/12 > Debt consolidat...,credit_card,Debt Consoldation,330xx,FL,25.01,2,1996-02-01,1,14,,18,0,12038,37.4%,26,w,0.00,0.00,14233.637927,14233.64,12000.00,2233.64,0.00,0.00,0.0000,2014-10-01,5430.38,NaT,2016-02-01,0,,1,INDIVIDUAL,0,0,52770,32200,4,2932,11134,49.4,0,0,136,205,2,2,1,2,14,2,14,0,4,6,4,7,11,9,14,6,18,0,0,0,4,92,50.0,0,0,82860,52770,22000,50660
2836798,3419283,10000,10000,10000,36 months,0.1972,370.22,D,D5,Dept.of Defense,10+ years,RENT,84000.0,Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/27/12 > My wife and I h...,wedding,Wedding expenses,928xx,CA,19.63,0,2002-09-01,2,,58,11,1,3320,48.8%,27,w,0.00,0.00,12392.039762,12392.04,10000.00,2392.04,0.00,0.00,0.0000,2014-07-01,6104.77,NaT,2015-12-01,0,,1,INDIVIDUAL,0,0,53546,6800,11,4868,486,83.8,0,0,125,116,1,1,0,32,,6,,0,2,5,2,7,9,7,18,5,11,0,0,0,3,100,100.0,1,0,68643,53546,3000,61843
2826877,3409383,6000,6000,6000,36 months,0.1629,211.81,C,C4,United States Postal Service,10+ years,RENT,66500.0,Not Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/28/12 > Last year I rel...,debt_consolidation,Debt consolidation,274xx,NC,5.61,0,2000-09-01,0,,95,5,1,8928,70.3%,9,w,0.00,0.00,6161.261757,6161.26,6000.00,161.26,0.00,0.00,0.0000,2013-03-01,5950.14,NaT,2016-02-01,0,,1,INDIVIDUAL,0,0,19031,12700,3,3806,272,93.2,0,0,130,150,10,10,1,44,,13,,0,1,4,1,2,3,4,5,4,5,0,0,0,1,100,100.0,1,0,27277,19031,4000,14577
2824664,3406833,8000,8000,8000,36 months,0.1849,291.19,D,D2,ref-chem,10+ years,MORTGAGE,80000.0,Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,,small_business,Business,776xx,TX,18.60,0,2004-05-01,1,,,6,0,3871,59.6%,18,w,0.00,0.00,9471.719653,9471.72,8000.00,1471.72,0.00,0.00,0.0000,2014-05-01,1013.72,NaT,2016-02-01,0,,1,INDIVIDUAL,0,0,44856,6500,5,7476,178,94.1,0,0,67,105,7,7,0,7,,4,,0,2,3,2,6,10,3,8,3,6,0,0,0,2,100,100.0,0,0,71127,44856,3000,64627
2837301,3419817,28000,28000,28000,36 months,0.0762,872.52,A,A3,Placid Express,10+ years,RENT,96200.0,Not Verified,2012-12-01,Fully Paid,n,https://www.lendingclub.com/browse/loanDetail....,Borrower added on 12/26/12 > Decided to fore...,debt_consolidation,Short Term Balance Transfer,113xx,NY,16.91,0,2004-11-01,0,,,7,0,12842,6.7%,12,w,0.00,0.00,31410.650000,31410.65,28000.00,3410.65,0.00,0.00,0.0000,2016-01-01,872.45,NaT,2016-02-01,0,,1,INDIVIDUAL,0,0,36455,59200,3,5208,29158,23.2,0,0,99,58,20,4,0,20,,7,,0,2,2,2,3,7,3,5,2,7,0,0,0,1,100,0.0,0,0,125810,36455,42000,66610
2694737,3256914,35000,35000,35000,60 months,0.1972,921.85,D,D5,Integrated Wind Energy,2 years,MORTGAGE,100000.0,Verified,2012-12-01,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Consolidation Loan,754xx,TX,25.62,5,2002-11-01,0,15,,11,0,4007,37.1%,21,w,0.00,0.00,23181.380000,23181.38,8611.02,10747.83,0.00,3822.53,688.0554,2014-10-01,921.85,NaT,2015-03-01,0,,1,INDIVIDUAL,0,0,191870,10800,6,17443,4876,39.8,0,0,79,123,1,1,4,1,15,10,15,0,3,5,3,5,10,5,7,5,11,0,0,0,1,76,33.3,0,0,229663,54371,8100,72984


In [2]:
past_loans[(past_loans['loan_status'] != "Current") & (past_loans['loan_status'] != "Fully Paid")].head(100)

Unnamed: 0_level_0,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,sub_grade,emp_title,emp_length,home_ownership,annual_inc,verification_status,issue_d,loan_status,pymnt_plan,url,desc,purpose,title,zip_code,addr_state,dti,delinq_2yrs,earliest_cr_line,fico_range_low,fico_range_high,inq_last_6mths,mths_since_last_delinq,mths_since_last_record,open_acc,pub_rec,revol_bal,revol_util,total_acc,initial_list_status,out_prncp,out_prncp_inv,total_pymnt,total_pymnt_inv,total_rec_prncp,total_rec_int,total_rec_late_fee,recoveries,collection_recovery_fee,last_pymnt_d,last_pymnt_amnt,next_pymnt_d,last_credit_pull_d,last_fico_range_high,last_fico_range_low,collections_12_mths_ex_med,mths_since_last_major_derog,policy_code,application_type,acc_now_delinq,tot_coll_amt,tot_cur_bal,total_credit_rv
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1
37662224,40425321,7650,7650,7650,36 months,0.1366,260.20,C,C3,Technical Specialist,< 1 year,RENT,50000.00,Source Verified,2014-12-01,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,850xx,AZ,34.81,0,2002-08-01,685,689,1,,,11,0,16822,91.9%,20,f,0.00,0.00,1043.99,1043.99,704.38,339.61,0.00,0.00,0.0000,2015-08-01,17.70,NaT,2015-12-01,539,535,0,,1,INDIVIDUAL,0,0,64426,18300
37800722,40563521,12975,12975,12975,36 months,0.1786,468.17,D,D5,Sales,10+ years,RENT,60000.00,Source Verified,2014-12-01,Late (31-120 days),n,https://www.lendingclub.com/browse/loanDetail....,,house,Home buying,331xx,FL,22.42,0,1999-01-01,680,684,0,48,,11,0,5200,33.1%,19,f,10346.86,10346.86,4181.34,4181.34,2628.14,1553.20,0.00,0.00,0.0000,2015-10-01,468.17,2016-01-01,2015-12-01,634,630,0,,1,INDIVIDUAL,0,900,17281,15700
37822030,40585070,18450,18450,18450,36 months,0.1431,633.36,C,C4,construction foreman,10+ years,MORTGAGE,108000.00,Not Verified,2014-12-01,Late (31-120 days),n,https://www.lendingclub.com/browse/loanDetail....,,home_improvement,Home improvement,810xx,CO,23.37,0,2003-03-01,680,684,3,,,11,0,5925,87%,20,f,12981.68,12981.68,6932.56,6932.56,5468.32,1464.24,0.00,0.00,0.0000,2015-09-01,633.36,2016-01-01,2015-12-01,614,610,0,100,1,INDIVIDUAL,0,0,237795,6800
37791995,40555038,4500,4500,4500,36 months,0.1499,155.98,C,C5,Nanny,4 years,OWN,34000.00,Not Verified,2014-12-01,Late (31-120 days),n,https://www.lendingclub.com/browse/loanDetail....,,major_purchase,Major purchase,210xx,MD,28.77,0,1997-02-01,660,664,1,25,54,18,1,12506,45%,51,f,3444.33,3444.33,1554.18,1554.18,1055.67,498.51,0.00,0.00,0.0000,2015-11-01,155.98,2016-01-01,2015-12-01,654,650,0,25,1,INDIVIDUAL,0,5098,24481,27800
37711991,40485052,14000,14000,14000,36 months,0.1649,495.60,D,D3,Customerservice,9 years,RENT,45000.00,Source Verified,2014-12-01,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,,credit_card,Credit card refinancing,331xx,FL,13.47,0,2008-08-01,660,664,1,,,20,0,10585,39.9%,22,f,0.00,0.00,2955.08,2955.08,303.22,173.14,0.00,2478.72,446.1696,2015-02-01,495.60,NaT,2015-06-01,559,555,0,,1,INDIVIDUAL,0,0,15659,26500
37671924,40434950,16000,16000,16000,36 months,0.1299,539.03,C,C2,Sr manager,< 1 year,MORTGAGE,80000.00,Source Verified,2014-12-01,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,800xx,CO,22.32,6,2001-12-01,660,664,1,4,,17,0,10768,53.8%,46,f,0.00,0.00,2677.83,2677.83,1869.18,808.65,0.00,0.00,0.0000,2015-06-01,539.03,NaT,2015-12-01,574,570,0,,1,INDIVIDUAL,0,0,318905,20000
37752007,40515020,22200,22200,22200,60 months,0.1714,553.40,D,D4,Clinical Data Lead,10+ years,RENT,74500.00,Source Verified,2014-12-01,Late (31-120 days),n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,015xx,MA,8.05,0,2001-09-01,695,699,0,36,,7,0,7014,47.4%,14,w,19679.08,19679.08,5502.29,5502.29,2520.92,2981.37,0.00,0.00,0.0000,2015-11-01,553.40,2016-01-01,2015-12-01,574,570,0,,1,INDIVIDUAL,0,0,11528,14800
37108024,39870843,13825,13825,13825,60 months,0.1366,319.26,C,C3,parks dept.,< 1 year,RENT,27800.00,Source Verified,2014-12-01,Late (31-120 days),n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,981xx,WA,24.22,0,1991-04-01,695,699,2,,,6,0,11048,47.4%,12,w,11962.36,11962.36,3496.12,3496.12,1862.64,1633.48,0.00,0.00,0.0000,2015-12-01,319.26,2016-02-01,2015-12-01,709,705,0,,1,INDIVIDUAL,0,0,19394,23300
37711880,40484927,20000,20000,20000,36 months,0.1559,699.10,D,D1,Graphic Designer,2 years,MORTGAGE,50000.00,Source Verified,2014-12-01,Late (31-120 days),n,https://www.lendingclub.com/browse/loanDetail....,,credit_card,Credit card refinancing,189xx,PA,16.47,0,2010-03-01,660,664,1,,,9,0,7186,74.1%,12,f,16334.19,16334.19,5636.74,5636.74,3665.81,1901.01,69.92,0.00,0.0000,2015-09-01,1433.16,2016-01-01,2015-12-01,499,0,0,,1,INDIVIDUAL,0,0,21850,9700
37781998,40545047,14400,14400,14400,60 months,0.1924,375.45,E,E2,Administrative Assistant,10+ years,MORTGAGE,70000.00,Verified,2014-12-01,Charged Off,n,https://www.lendingclub.com/browse/loanDetail....,,debt_consolidation,Debt consolidation,105xx,NY,26.81,0,1976-04-01,665,669,1,29,68,14,1,11080,73.9%,29,f,0.00,0.00,727.81,727.81,291.46,436.35,0.00,0.00,0.0000,2015-03-01,375.45,NaT,2015-12-01,584,580,0,67,1,INDIVIDUAL,0,0,40195,15000


# Explore different categorical data
What values are inside the categorical fields?

In [3]:
pd.unique(past_loans['term'])

array([' 36 months', ' 60 months', nan], dtype=object)

In [4]:
pd.unique(past_loans['grade'])

array(['A', 'C', 'D', 'B', 'E', 'F', 'G', nan], dtype=object)

In [5]:
pd.unique(past_loans['sub_grade'])

array(['A3', 'C1', 'D4', 'C3', 'D1', 'D5', 'B5', 'B4', 'C4', 'E5', 'D2',
       'B3', 'C5', 'E4', 'E3', 'C2', 'B2', 'A5', 'F1', 'B1', 'D3', 'A4',
       'E1', 'E2', 'G2', 'A1', 'G1', 'A2', 'F3', 'F2', 'G3', 'F4', 'G4',
       'F5', 'G5', nan], dtype=object)

In [6]:
pd.unique(past_loans['home_ownership'])

array(['MORTGAGE', 'RENT', 'OWN', 'ANY', nan], dtype=object)

In [7]:
pd.unique(past_loans['verification_status'])

array(['Not Verified', 'Source Verified', 'Verified', nan], dtype=object)

In [8]:
pd.unique(past_loans['loan_status'])

array(['Current', 'Fully Paid', 'Charged Off', 'Late (31-120 days)',
       'In Grace Period', 'Late (16-30 days)', 'Default', nan], dtype=object)

In [9]:
pd.unique(past_loans['pymnt_plan'])

array(['n', 'y', nan], dtype=object)

In [10]:
pd.unique(past_loans['purpose'])

array(['credit_card', 'debt_consolidation', 'car', 'house',
       'home_improvement', 'other', 'medical', 'moving', 'major_purchase',
       'vacation', 'small_business', 'renewable_energy', 'wedding', nan], dtype=object)

# Examine field types and amount of data
Is there any missing data?

In [11]:
past_loans.info()

<class 'pandas.core.frame.DataFrame'>
Index: 235631 entries, 36805548 to Total amount funded in policy code 2: 873652739
Data columns (total 60 columns):
member_id                      235629 non-null float64
loan_amnt                      235629 non-null float64
funded_amnt                    235629 non-null float64
funded_amnt_inv                235629 non-null float64
term                           235629 non-null category
int_rate                       235629 non-null float64
installment                    235629 non-null float64
grade                          235629 non-null category
sub_grade                      235629 non-null category
emp_title                      222393 non-null object
emp_length                     235629 non-null object
home_ownership                 235629 non-null category
annual_inc                     235629 non-null float64
verification_status            235629 non-null category
issue_d                        235629 non-null datetime64[ns]
loan_status

# Check out summary stats on the numeric fields

In [12]:
past_loans.describe().transpose() #.to_string()

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
member_id,235629,24019299.796719,8825162.024128,137225.0,15579724.0,22953173.0,31767065.0,40860827.0
loan_amnt,235629,14870.156793,8438.318193,1000.0,8325.0,13000.0,20000.0,35000.0
funded_amnt,235629,14870.156793,8438.318193,1000.0,8325.0,13000.0,20000.0,35000.0
funded_amnt_inv,235629,14865.334169,8435.524995,950.0,8325.0,13000.0,20000.0,35000.0
int_rate,235629,0.137713,0.043255,0.06,0.1099,0.1365,0.1629,0.2606
installment,235629,442.482374,245.050238,23.36,265.68,384.12,578.71,1409.99
annual_inc,235629,74854.148281,55547.533374,3000.0,45377.0,65000.0,90000.0,7500000.0
dti,235629,18.04077,8.023002,0.0,12.02,17.63,23.76,39.99
delinq_2yrs,235629,0.344512,0.898319,0.0,0.0,0.0,0.0,22.0
fico_range_low,235629,692.497358,29.246641,660.0,670.0,685.0,705.0,845.0


## Explore date fields

In [13]:
past_loans.groupby(past_loans['last_pymnt_d'].map(lambda x: x.year))

<pandas.core.groupby.DataFrameGroupBy object at 0x10ef49790>

In [14]:
pd.unique(past_loans.sort_values('last_pymnt_d')['last_pymnt_d'])

array(['2014-01-31T19:00:00.000000000-0500',
       '2014-02-28T19:00:00.000000000-0500',
       '2014-03-31T20:00:00.000000000-0400',
       '2014-04-30T20:00:00.000000000-0400',
       '2014-05-31T20:00:00.000000000-0400',
       '2014-06-30T20:00:00.000000000-0400',
       '2014-07-31T20:00:00.000000000-0400',
       '2014-08-31T20:00:00.000000000-0400',
       '2014-09-30T20:00:00.000000000-0400',
       '2014-10-31T20:00:00.000000000-0400',
       '2014-11-30T19:00:00.000000000-0500',
       '2014-12-31T19:00:00.000000000-0500',
       '2015-01-31T19:00:00.000000000-0500',
       '2015-02-28T19:00:00.000000000-0500',
       '2015-03-31T20:00:00.000000000-0400',
       '2015-04-30T20:00:00.000000000-0400',
       '2015-05-31T20:00:00.000000000-0400',
       '2015-06-30T20:00:00.000000000-0400',
       '2015-07-31T20:00:00.000000000-0400',
       '2015-08-31T20:00:00.000000000-0400',
       '2015-09-30T20:00:00.000000000-0400',
       '2015-10-31T20:00:00.000000000-0400',
       '20

In [15]:
pd.unique(past_loans['issue_d'])

array(['2014-11-30T19:00:00.000000000-0500',
       '2014-10-31T20:00:00.000000000-0400',
       '2014-09-30T20:00:00.000000000-0400',
       '2014-08-31T20:00:00.000000000-0400',
       '2014-07-31T20:00:00.000000000-0400',
       '2014-06-30T20:00:00.000000000-0400',
       '2014-05-31T20:00:00.000000000-0400',
       '2014-04-30T20:00:00.000000000-0400',
       '2014-03-31T20:00:00.000000000-0400',
       '2014-02-28T19:00:00.000000000-0500',
       '2014-01-31T19:00:00.000000000-0500',
       '2013-12-31T19:00:00.000000000-0500', 'NaT'], dtype='datetime64[ns]')

# Fill in rows for each date to last payment
A simple look at the data will be skewed because active loans may still default in the future. Only using 2012 36 month loans.

The data is supplied with status for each loan. Best to fill data with records for each month from loan start (and status = current) until today or last payment). This will allow me to determine the shape of the conditional default curve (what percentage of current loans default vs age?) and then use the current data set. I will also gain a better understanding of Loss Given Default for this data set.

Not sure if I have time to make this happen. When I do, I will go back and import data from 2013+.

## Create list of possible dates to match with

In [16]:
# What are all the dates in the table?
# for each date, what are the earlier dates
# create new table with "current" until paid or written off
# table should stop before defaults and late payments start registering


In [20]:
all_dates = pd.DataFrame(index=pd.unique(past_loans.sort_values('last_pymnt_d')['last_pymnt_d']))#, columns='date_d')
all_dates['report_d'] = pd.unique(past_loans.sort_values('last_pymnt_d')['last_pymnt_d'])
all_dates['common'] = 1
all_dates.dropna(how = 'any')
all_dates[(pd.notnull(all_dates['report_d']))]

Unnamed: 0,report_d,common
2014-02-01,2014-02-01,1
2014-03-01,2014-03-01,1
2014-04-01,2014-04-01,1
2014-05-01,2014-05-01,1
2014-06-01,2014-06-01,1
2014-07-01,2014-07-01,1
2014-08-01,2014-08-01,1
2014-09-01,2014-09-01,1
2014-10-01,2014-10-01,1
2014-11-01,2014-11-01,1


In [23]:
past_loans['common'] = 1
partly_merged = pd.merge(past_loans, all_dates, on = 'common')
full_past_loans = partly_merged[(partly_merged['last_pymnt_d'] >= partly_merged['report_d'] & 
                                 pd.notnull(partly_merged['report_d']))]
full_past_loans.head(100)

ValueError: operands could not be broadcast together with shapes (5655144,) (706893,) 

# Create dependent and independent data


In [80]:
X = past_loans.drop('loan_status', axis = 1)
y = past_loans['loan_status']

# Split into training and test data

In [81]:
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
# 10 cross validation iterations with 20% test / 80% train
from sklearn.cross_validation import ShuffleSplit
cv = ShuffleSplit(X_train.shape[0], n_iter=10, test_size=0.2, random_state=0)

# Standardize the data

In [82]:
from sklearn.preprocessing import StandardScaler
stdsc = StandardScaler()
# transform our training features
X_train_std = stdsc.fit_transform(X_train)
# transform the testing features in the same way
X_test_std = stdsc.transform(X_test)

ValueError: could not convert string to float: INDIVIDUAL