In [None]:
+ I dropped all of the date columns. Should I consider extracting more features from them before dropping? (Month has already been included.)
+ Should I remove the features that would not be available at the time of prediction? E.g., the number of lenders won't be known when the loan is posted.
+ Partner_ID is an ID number, so its magnitude carries no meaning: a bigger number does not indicate anything.
+ Is it OK to leave dates in? Can the model interpret this information?
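
On the first question, one possible direction is to derive a few numeric features from the timestamps before dropping them. The sketch below is only an illustration: it assumes POSTED_TIME and PLANNED_EXPIRATION_TIME are already datetime columns, the rows are made-up values, and the new column names (POSTED_DAYOFWEEK, POSTED_HOUR, PLANNED_DAYS) are hypothetical.

```python
import pandas as pd

# Made-up rows; only the two source column names come from the dataset.
toy = pd.DataFrame({
    "POSTED_TIME": pd.to_datetime(["2019-07-01 08:30:00", "2019-11-16 21:05:00"]),
    "PLANNED_EXPIRATION_TIME": pd.to_datetime(["2019-07-31 08:30:00", "2019-12-21 21:05:00"]),
})

# Day of week the loan was posted (Monday=0 ... Sunday=6).
toy["POSTED_DAYOFWEEK"] = toy["POSTED_TIME"].dt.dayofweek
# Hour of day the loan was posted.
toy["POSTED_HOUR"] = toy["POSTED_TIME"].dt.hour
# Length of the planned fundraising window in whole days.
toy["PLANNED_DAYS"] = (toy["PLANNED_EXPIRATION_TIME"] - toy["POSTED_TIME"]).dt.days

print(toy[["POSTED_DAYOFWEEK", "POSTED_HOUR", "PLANNED_DAYS"]])
```

Features like these are numeric, so they would pass through the later `select_dtypes` step, whereas raw datetime columns are dropped explicitly in this notebook.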

## Kiva Loan Funding - PreProcessing

**PURPOSE**: Predict which microfinance loans will be funded and how quickly they will be funded

**AUTHOR** : Maureen Wiebe

**DATA SOURCES**:<br> 
- Kiva Developer Tools: https://www.kiva.org/build/data-snapshots
    
**REV DATE**: 4-10-2020

In [21]:
import pandas as pd 
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
plt.style.use('seaborn')
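# use the full option name; the short form 'max_columns' is ambiguous in newer pandas versions
pd.set_option('display.max_columns', None)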

In [2]:
loans_2019 = pd.read_pickle('C:/Users/mwalz2/Documents/Python/Springboard/Kiva_Capstone_Project/data/interim/loans_2019_clean.pkl')

In [4]:
loans_2019.select_dtypes(include ='object').describe(include='object')

| | LOAN_NAME | ORIGINAL_LANGUAGE | DESCRIPTION | DESCRIPTION_TRANSLATED | STATUS | ACTIVITY_NAME | SECTOR_NAME | LOAN_USE | COUNTRY_CODE | COUNTRY_NAME | TOWN_NAME | CURRENCY_POLICY | CURRENCY | TAGS | BORROWER_NAMES | BORROWER_GENDERS | BORROWER_PICTURED | REPAYMENT_INTERVAL | DISTRIBUTION_MODEL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 108236 | 108237 | 108237 | 108237 | 108237 | 108237 | 108237 | 108237 | 108237 | 108237 | 105145 | 108237 | 108237 | 84024 | 108236 | 108237 | 108237 | 108237 | 108237 |
| unique | 51834 | 5 | 108198 | 108198 | 2 | 163 | 15 | 64778 | 66 | 66 | 4679 | 2 | 53 | 25061 | 55260 | 2292 | 1098 | 3 | 2 |
| top | Mary | English | Vanna’s Group lives in a rural village in Take... | Vanna’s Group lives in a rural village in Take... | funded | Personal Housing Expenses | Agriculture | to build a sanitary toilet for her family | PH | Philippines | Kaduna | shared | PHP | user_favorite | Mary | female | true | monthly | field_partner |
| freq | 361 | 77439 | 4 | 4 | 104162 | 15556 | 28895 | 8064 | 27610 | 27610 | 1434 | 80650 | 27610 | 6125 | 361 | 77009 | 97012 | 94941 | 107832 |


### Categorical Features 
The following features were selected for one-hot encoding, in addition to the features that were encoded previously: 
1. DISTRIBUTION_MODEL
2. REPAYMENT_INTERVAL
3. COUNTRY_CODE
4. CURRENCY
5. SECTOR_NAME
6. ORIGINAL_LANGUAGE 

Several other categorical features (e.g., the loan description and borrower names) were not included in the modeling dataset because, without further processing, they would not add useful information to the model. 

In [17]:
model_loans = pd.get_dummies(loans_2019, columns =['DISTRIBUTION_MODEL','REPAYMENT_INTERVAL','COUNTRY_CODE','CURRENCY','SECTOR_NAME','ORIGINAL_LANGUAGE'], 
                            prefix=['Dist_Model','Repayment', 'Country', 'Currency', 'Sector','Post_Language'],drop_first = True)
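
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a toy frame. The rows and the category value "bullet" are made up for illustration; only the column names and the prefix mirror the call above.

```python
import pandas as pd

# Toy data with illustrative values only.
toy = pd.DataFrame({
    "LOAN_AMOUNT": [200, 450, 800],
    "REPAYMENT_INTERVAL": ["monthly", "bullet", "irregular"],
})

# drop_first=True drops the first category in sorted order ("bullet" here),
# so a row whose indicator columns are all 0 represents that category.
encoded = pd.get_dummies(
    toy,
    columns=["REPAYMENT_INTERVAL"],
    prefix=["Repayment"],
    drop_first=True,
)

print(list(encoded.columns))
print(encoded)
```

Dropping one level per feature avoids a redundant column, since the dropped category is implied when every other indicator for that feature is 0.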

### Modeling Dataset
Additional numeric columns were dropped because they hold information that would not be known when predicting whether a loan will ultimately be funded, for example the total number of lenders or the amount of time taken to raise the loan funds. Other columns that duplicate information held elsewhere were also excluded from the modeling dataset. 

The feature set and the target were designated X and y, respectively.  


In [25]:
X = model_loans.drop(['LOAN_ID','FUNDED_PERCENT','CURRENCY_EXCHANGE_COVERAGE_RATE','RAISED_TIME','RAISED_HOURS','NUM_LENDERS_TOTAL','Status_funded','AVG_LENDER_AMT','POSTED_TIME','PLANNED_EXPIRATION_TIME','DISBURSE_TIME'],axis =1).select_dtypes(exclude ='object')
y = model_loans['Status_funded']

In [26]:
X.describe()

| | FUNDED_AMOUNT | LOAN_AMOUNT | PARTNER_ID | CURRENCY_EXCHANGE_RATE_CALC | MONTH | LENDER_TERM | NUM_JOURNAL_ENTRIES | NUM_BULK_ENTRIES | NUM_BORROWERS_TOTAL | FEMALE_ONLY_LOAN |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 108237.0 | 108237.0 | 107832.0 | 108237.0 | 108237.0 | 108237.0 | 108237.0 | 108237.0 | 108237.0 | 108237.0 |
| mean | 706.39869 | 741.476113 | 210.723644 | 0.057774 | 9.465229 | 13.340798 | 1.185972 | 1.007668 | 1.780657 | 0.775354 |
| std | 1251.819393 | 1275.54506 | 137.037221 | 0.049392 | 1.663367 | 6.447662 | 0.615601 | 0.230921 | 3.222503 | 0.417351 |
| min | 0.0 | 25.0 | 9.0 | 0.0 | 7.0 | 3.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 25% | 200.0 | 200.0 | 137.0 | 0.0 | 8.0 | 8.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 50% | 425.0 | 450.0 | 156.0 | 0.1 | 10.0 | 14.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 75% | 800.0 | 850.0 | 243.0 | 0.1 | 11.0 | 14.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| max | 100000.0 | 100000.0 | 604.0 | 0.1 | 12.0 | 140.0 | 90.0 | 45.0 | 44.0 | 1.0 |

The remaining columns (the Tag_, Activity_, Dist_Model_, Repayment_, Country_, Currency_, Sector_ and Post_Language_ indicators) are binary, each with a count of 108237.0, a minimum of 0.0 and a maximum of 1.0.
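
The count row above shows 107832 non-null values for PARTNER_ID against 108237 rows, so 405 values are missing. That count equals the number of field_partner loans in the earlier summary, which suggests (but does not prove) that the gaps belong to loans without a field partner. A minimal sketch of how this could be checked, using a made-up miniature of X:

```python
import pandas as pd

# Hypothetical miniature of X; the values are illustrative only.
X_toy = pd.DataFrame({
    "LOAN_AMOUNT": [200.0, 450.0, 800.0, 1000.0],
    "PARTNER_ID": [137.0, None, 243.0, 156.0],
    "Dist_Model_field_partner": [1, 0, 1, 1],
})

# Count missing values per column and keep only the columns that have any.
missing = X_toy.isna().sum()
missing = missing[missing > 0]
print(missing)

# Check whether every missing PARTNER_ID belongs to a non-field-partner loan.
no_partner = X_toy.loc[X_toy["PARTNER_ID"].isna(), "Dist_Model_field_partner"]
gaps_are_direct = bool((no_partner == 0).all())
print(gaps_are_direct)
```

Running the same two checks on the real X would show whether the missing values need separate handling before modeling.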


### Splitting the Dataset: Train & Test
To test reliably how well the model predicts outcomes, the dataset is split into a train set (for fitting the model) and a test set (for calculating performance metrics). A 70/30 split is used.  

In [27]:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
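
The target is imbalanced: 104162 of the 108237 loans (about 96%) are funded. One option worth considering, not used in the cell above, is a stratified split so that both sets keep the same class ratio, with a fixed seed so the split is repeatable. The sketch below uses a made-up 90/10 target, and the `random_state` value is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 90 positives and 10 negatives.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 90 + [0] * 10)

# stratify=y_toy keeps the class proportions equal in both sets;
# random_state makes the split reproducible (0 is an arbitrary choice).
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, stratify=y_toy, random_state=0
)

print(y_tr.mean(), y_te.mean())
```

Without stratification, a random 30% sample of a rare class can over- or under-represent it, which makes test metrics for the unfunded class noisier.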

### Data Scaling 
Some of the models that I will be testing give better predictions on a scaled dataset. When the features aren't scaled, a model can place greater importance on features that have a larger scale. For this dataset I will use the Min-Max scaler to normalize the data, since several of the features do not follow a normal distribution.   

In [29]:
#fit the scaler to the train dataset 
norm = MinMaxScaler().fit(X_train)

#transform the train dataset 
X_train_norm = norm.transform(X_train)

#transform the test dataset based on the scaler fitted to the train dataset to prevent any data leakage
X_test_norm = norm.transform(X_test)
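
With its default feature range of (0, 1), MinMaxScaler rescales each feature as (x - min) / (max - min), where min and max come from the data it was fitted on. The plain-Python sketch below reproduces that arithmetic for a single column to show what fitting on the train set only implies. The helper names and the values are hypothetical.

```python
def fit_min_max(column):
    """Return the (min, max) learned from a training column."""
    return min(column), max(column)


def transform_min_max(column, lo, hi):
    """Scale values with the training min and max: (x - lo) / (hi - lo)."""
    return [(x - lo) / (hi - lo) for x in column]


train_amounts = [200.0, 450.0, 800.0, 1000.0]  # made-up training values
test_amounts = [100.0, 600.0, 1200.0]          # made-up test values

# Learn the range from the training data only, then apply it to both sets.
lo, hi = fit_min_max(train_amounts)
train_scaled = transform_min_max(train_amounts, lo, hi)
test_scaled = transform_min_max(test_amounts, lo, hi)

print(train_scaled)
print(test_scaled)
```

Test values below the training minimum or above the training maximum land outside [0, 1]. That is expected when the scaler is fitted on the train set alone, and it is the price of keeping test information out of the fit.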