# Previous Applications
## About the data
<blockquote>previous_application: This dataset holds details of previous applications made by clients to Home Credit. It includes only clients who also appear in the <i>application</i> data. Each current loan in the <i>application</i> data (identified by <i>SK_ID_CURR</i>) can have multiple previous loan applications. Each previous application occupies one row and is identified by the feature <i>SK_ID_PREV</i>.</blockquote> 
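
The one-to-many relationship described above can be illustrated on a toy frame. This is only a sketch: the ID values are made up, and only the column names <i>SK_ID_CURR</i> and <i>SK_ID_PREV</i> come from the dataset.

```python
import pandas as pd

# Toy data (hypothetical values): three previous applications for two current loans
toy_prev = pd.DataFrame({
    "SK_ID_PREV": [11, 12, 13],
    "SK_ID_CURR": [100, 100, 200],
})

# SK_ID_PREV identifies each row uniquely; SK_ID_CURR may repeat
prev_is_unique = toy_prev["SK_ID_PREV"].is_unique
curr_is_unique = toy_prev["SK_ID_CURR"].is_unique
print(prev_is_unique, curr_is_unique)
```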


## Feature Explanations
<blockquote><p style="font-size:13px"> 
SK_ID_PREV : 	ID of previous credit in Home credit related to loan in our sample. (One loan in our sample can have 0,1,2 or more previous loan applications in Home Credit, previous application could, but not necessarily have to lead to credit) <br>						
SK_ID_CURR: 	ID of loan in our sample<br>						
NAME_CONTRACT_TYPE: 	Contract product type (Cash loan, consumer loan [POS] ,...) of the previous application<br>						
AMT_ANNUITY: 	Annuity of previous application<br>						
AMT_APPLICATION: 	For how much credit did client ask on the previous application<br>						
AMT_CREDIT: 	Final credit amount on the previous application. This differs from AMT_APPLICATION in a way that the AMT_APPLICATION is the amount for which the client initially applied for, but during our approval process he could have received different amount - AMT_CREDIT<br>						
AMT_DOWN_PAYMENT: 	Down payment on the previous application<br>						
AMT_GOODS_PRICE: 	Goods price of good that client asked for (if applicable) on the previous application<br>						
WEEKDAY_APPR_PROCESS_START: 	On which day of the week did the client apply for previous application<br>						
HOUR_APPR_PROCESS_START: 	Approximately at what day hour did the client apply for the previous application<br>						
FLAG_LAST_APPL_PER_CONTRACT: 	Flag if it was last application for the previous contract. Sometimes by mistake of client or our clerk there could be more applications for one single contract<br>						
NFLAG_LAST_APPL_IN_DAY: 	Flag if the application was the last application per day of the client. Sometimes clients apply for more applications a day. Rarely it could also be error in our system that one application is in the database twice<br>						
NFLAG_MICRO_CASH: 	Flag Micro finance loan<br>						
RATE_DOWN_PAYMENT: 	Down payment rate normalized on previous credit<br>						
RATE_INTEREST_PRIMARY: 	Interest rate normalized on previous credit<br>						
RATE_INTEREST_PRIVILEGED: 	Interest rate normalized on previous credit<br>						
NAME_CASH_LOAN_PURPOSE: 	Purpose of the cash loan<br>						
NAME_CONTRACT_STATUS: 	Contract status (approved, cancelled, ...) of previous application<br>						
DAYS_DECISION: 	Relative to current application when was the decision about previous application made<br>						
NAME_PAYMENT_TYPE: 	Payment method that client chose to pay for the previous application<br>						
CODE_REJECT_REASON: 	Why was the previous application rejected<br>						
NAME_TYPE_SUITE: 	Who accompanied client when applying for the previous application<br>						
NAME_CLIENT_TYPE: 	Was the client old or new client when applying for the previous application<br>						
NAME_GOODS_CATEGORY: 	What kind of goods did the client apply for in the previous application<br>						
NAME_PORTFOLIO: 	Was the previous application for CASH, POS, CAR, …<br>						
NAME_PRODUCT_TYPE: 	Was the previous application x-sell or walk-in<br>						
CHANNEL_TYPE: 	Through which channel we acquired the client on the previous application<br>						
SELLERPLACE_AREA: 	Selling area of seller place of the previous application<br>						
NAME_SELLER_INDUSTRY: 	The industry of the seller<br>						
CNT_PAYMENT: 	Term of previous credit at application of the previous application<br>						
NAME_YIELD_GROUP: 	Grouped interest rate into small medium and high of the previous application<br>						
PRODUCT_COMBINATION: 	Detailed product combination of the previous application<br>						
DAYS_FIRST_DRAWING: 	Relative to application date of current application when was the first disbursement of the previous application<br>						
DAYS_FIRST_DUE: 	Relative to application date of current application when was the first due supposed to be of the previous application<br>						
DAYS_LAST_DUE_1ST_VERSION: 	Relative to application date of current application when was the first due of the previous application<br>						
DAYS_LAST_DUE: 	Relative to application date of current application when was the last due date of the previous application<br>						
DAYS_TERMINATION: 	Relative to application date of current application when was the expected termination of the previous application<br>						
NFLAG_INSURED_ON_APPROVAL: 	Did the client request insurance during the previous application<br>	</p></blockquote>
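
Several features above are day offsets relative to the current application, so negative values lie in the past. A small sketch of reading such offsets as years, using three DAYS_DECISION values that appear in the sample rows further down; the 365.25 days-per-year divisor is an approximation chosen here, not something the data dictionary specifies.

```python
import pandas as pd

# DAYS_DECISION offsets: negative means "before the current application"
days_decision = pd.Series([-73, -164, -781], name="DAYS_DECISION")

# Express each offset as an approximate number of years before the application
years_before = (-days_decision / 365.25).round(2)
print(years_before.tolist())
```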

In [1]:
# Last amended: 24th October, 2020
# Myfolder: C:\Users\Administrator\OneDrive\Documents\home_credit_default_risk
# Objective: 
#           Solving Kaggle problem: Home Credit Default Risk
#           Processing previous_application dataset
#
# Data Source: https://www.kaggle.com/c/home-credit-default-risk/data
# Ref: https://www.kaggle.com/jsaguiar/lightgbm-with-simple-features

In [39]:
# 1.0 Libraries
#     (Some of these may not be needed here.)
%reset -f
import numpy as np
import pandas as pd
import gc

# 1.1 Reduce read data size
#     There is a file reducing.py
#      in this folder. A class
#       in it is used to reduce
#        dataframe size
#     (Code modified by me to
#      exclude 'category' dtype)
import reducing

# 1.2 Misc
import warnings
import os
warnings.simplefilter(action='ignore', category=FutureWarning)


In [40]:
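# 1.3 Display options
#     (None means no truncation of column text; the older
#      value -1 is deprecated in newer pandas versions)
pd.set_option('display.max_colwidth', None)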
pd.set_option('display.max_columns', 100)

In [41]:
# 1.4 Display multiple commands outputs from a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [42]:
# 2.0 Onehot encoding (OHE) function. Uses pd.get_dummies()
#     i) To transform 'object' columns to dummies. 
#    ii) Treat NaN as one of the categories
#   iii) Returns transformed-data and new-columns created

def one_hot_encoder(df, nan_as_category = True):
    original_columns = list(df.columns)
    categorical_columns = [col for col in df.columns if df[col].dtype == 'object']
    df = pd.get_dummies(df,
                        columns= categorical_columns,
                        dummy_na= nan_as_category       # Treat NaNs as category
                       )
    new_columns = [c for c in df.columns if c not in original_columns]
    return df, new_columns
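
A quick usage sketch of the encoder on a toy frame may help show what it returns. The function is repeated so the example runs on its own, and the data values are hypothetical. The toy column is created with an explicit object dtype because the function selects columns by that dtype; newer pandas versions may infer a dedicated string dtype instead.

```python
import numpy as np
import pandas as pd

def one_hot_encoder(df, nan_as_category=True):
    # Same logic as the function above, repeated so this sketch is self-contained
    original_columns = list(df.columns)
    categorical_columns = [col for col in df.columns if df[col].dtype == 'object']
    df = pd.get_dummies(df, columns=categorical_columns, dummy_na=nan_as_category)
    new_columns = [c for c in df.columns if c not in original_columns]
    return df, new_columns

# Toy data (hypothetical): one numeric column and one 'object' column with a NaN
toy = pd.DataFrame({
    "AMT_CREDIT": [1000.0, 2000.0, 1500.0],
    "NAME_CONTRACT_STATUS": pd.Series(["Approved", "Refused", np.nan], dtype=object),
})

encoded, new_cols = one_hot_encoder(toy)
print(new_cols)            # names of the dummy columns that were created
print(encoded[new_cols])   # each row has exactly one dummy set
```

With `nan_as_category=True`, the missing value gets its own indicator column instead of producing a row of all zeros.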

In [43]:
# 2.1
pathToFolder = "C:\\Users\\Administrator\\OneDrive\\Documents\\home_credit_default_risk"
os.chdir(pathToFolder)

In [44]:
# 2.2 Some constants
num_rows=None                # Implies read all rows
nan_as_category = True       # While transforming 
                             #   'object' columns to dummies

In [45]:
# 3.0 Read previous application data first
prev = pd.read_csv(
                   'previous_application.csv.zip',
                   nrows = num_rows
                   )

# 3.0.1 Reduce memory usage by appropriately
#       changing data-types per feature:

prev = reducing.Reducer().reduce(prev)

reduced df from 471.4808 MB to 414.1386 MB in 3.82 seconds
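
The `reducing` module is not shown in this notebook, so its internals are not known here. As a rough sketch of the general idea (downcasting each numeric column to the smallest dtype that still holds its values), something like the following could be used. The helper name `downcast_numeric` and the toy values are hypothetical, and this is not the `Reducer` implementation.

```python
import numpy as np
import pandas as pd

def downcast_numeric(df):
    """Return a copy with integer and float columns downcast where possible."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out

# Toy data (hypothetical values) stored in 64-bit dtypes
toy = pd.DataFrame({
    "HOUR_APPR_PROCESS_START": np.array([15, 11, 7], dtype="int64"),
    "AMT_ANNUITY": np.array([1730.5, 25188.5, 15060.75], dtype="float64"),
})

small = downcast_numeric(toy)
bytes_before = toy.memory_usage(deep=True).sum()
bytes_after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(bytes_before, "->", bytes_after)
```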


In [46]:
# 3.0.2
prev.shape             # (rows = 1,670,214, cols = 37)
prev.head(5)
prev.columns

(1670214, 37)

Unnamed: 0,SK_ID_PREV,SK_ID_CURR,NAME_CONTRACT_TYPE,AMT_ANNUITY,AMT_APPLICATION,AMT_CREDIT,AMT_DOWN_PAYMENT,AMT_GOODS_PRICE,WEEKDAY_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,FLAG_LAST_APPL_PER_CONTRACT,NFLAG_LAST_APPL_IN_DAY,RATE_DOWN_PAYMENT,RATE_INTEREST_PRIMARY,RATE_INTEREST_PRIVILEGED,NAME_CASH_LOAN_PURPOSE,NAME_CONTRACT_STATUS,DAYS_DECISION,NAME_PAYMENT_TYPE,CODE_REJECT_REASON,NAME_TYPE_SUITE,NAME_CLIENT_TYPE,NAME_GOODS_CATEGORY,NAME_PORTFOLIO,NAME_PRODUCT_TYPE,CHANNEL_TYPE,SELLERPLACE_AREA,NAME_SELLER_INDUSTRY,CNT_PAYMENT,NAME_YIELD_GROUP,PRODUCT_COMBINATION,DAYS_FIRST_DRAWING,DAYS_FIRST_DUE,DAYS_LAST_DUE_1ST_VERSION,DAYS_LAST_DUE,DAYS_TERMINATION,NFLAG_INSURED_ON_APPROVAL
0,2030495,271877,Consumer loans,1730.43,17145.0,17145.0,0.0,17145.0,SATURDAY,15,Y,1,0.0,0.182832,0.867336,XAP,Approved,-73,Cash through the bank,XAP,,Repeater,Mobile,POS,XNA,Country-wide,35,Connectivity,12.0,middle,POS mobile with interest,365243.0,-42.0,300.0,-42.0,-37.0,0.0
1,2802425,108129,Cash loans,25188.615,607500.0,679671.0,,607500.0,THURSDAY,11,Y,1,,,,XNA,Approved,-164,XNA,XAP,Unaccompanied,Repeater,XNA,Cash,x-sell,Contact center,-1,XNA,36.0,low_action,Cash X-Sell: low,365243.0,-134.0,916.0,365243.0,365243.0,1.0
2,2523466,122040,Cash loans,15060.735,112500.0,136444.5,,112500.0,TUESDAY,11,Y,1,,,,XNA,Approved,-301,Cash through the bank,XAP,"Spouse, partner",Repeater,XNA,Cash,x-sell,Credit and cash offices,-1,XNA,12.0,high,Cash X-Sell: high,365243.0,-271.0,59.0,365243.0,365243.0,1.0
3,2819243,176158,Cash loans,47041.335,450000.0,470790.0,,450000.0,MONDAY,7,Y,1,,,,XNA,Approved,-512,Cash through the bank,XAP,,Repeater,XNA,Cash,x-sell,Credit and cash offices,-1,XNA,12.0,middle,Cash X-Sell: middle,365243.0,-482.0,-152.0,-182.0,-177.0,1.0
4,1784265,202054,Cash loans,31924.395,337500.0,404055.0,,337500.0,THURSDAY,9,Y,1,,,,Repairs,Refused,-781,Cash through the bank,HC,,Repeater,XNA,Cash,walk-in,Credit and cash offices,-1,XNA,24.0,high,Cash Street: high,,,,,,


Index(['SK_ID_PREV', 'SK_ID_CURR', 'NAME_CONTRACT_TYPE', 'AMT_ANNUITY',
       'AMT_APPLICATION', 'AMT_CREDIT', 'AMT_DOWN_PAYMENT', 'AMT_GOODS_PRICE',
       'WEEKDAY_APPR_PROCESS_START', 'HOUR_APPR_PROCESS_START',
       'FLAG_LAST_APPL_PER_CONTRACT', 'NFLAG_LAST_APPL_IN_DAY',
       'RATE_DOWN_PAYMENT', 'RATE_INTEREST_PRIMARY',
       'RATE_INTEREST_PRIVILEGED', 'NAME_CASH_LOAN_PURPOSE',
       'NAME_CONTRACT_STATUS', 'DAYS_DECISION', 'NAME_PAYMENT_TYPE',
       'CODE_REJECT_REASON', 'NAME_TYPE_SUITE', 'NAME_CLIENT_TYPE',
       'NAME_GOODS_CATEGORY', 'NAME_PORTFOLIO', 'NAME_PRODUCT_TYPE',
       'CHANNEL_TYPE', 'SELLERPLACE_AREA', 'NAME_SELLER_INDUSTRY',
       'CNT_PAYMENT', 'NAME_YIELD_GROUP', 'PRODUCT_COMBINATION',
       'DAYS_FIRST_DRAWING', 'DAYS_FIRST_DUE', 'DAYS_LAST_DUE_1ST_VERSION',
       'DAYS_LAST_DUE', 'DAYS_TERMINATION', 'NFLAG_INSURED_ON_APPROVAL'],
      dtype='object')

In [47]:
# 3.1 Let us examine how many unique IDs exist 

prev['SK_ID_PREV'].nunique()   # 1670214 Every row has its own ID
prev['SK_ID_CURR'].nunique()   # 338857  So many IDs repeat.
                               # We have to aggregate over SK_ID_CURR
                               #  to extract behaviour of clients

1670214

338857

In [48]:
# 3.2 Let us see distribution of dtypes
#     There are 16 'object' types here

prev.dtypes.value_counts()

object     16
float64    14
uint8      2 
uint32     2 
float32    1 
int16      1 
int32      1 
dtype: int64

In [49]:
# 3.3
prev.shape                       # (1670214, 37)

# 3.3.1
# What is the actual number of persons
#  who might have taken multiple loans?

prev['SK_ID_CURR'].nunique()     # 338857  -- Many duplicate values exist
                                   #            Consider SK_ID_CURR as Foreign Key
                                   #            (it is the primary key of the application_train data)
                                   # Primary key of this table: SK_ID_PREV
            
# 3.3.2
# As expected, there are no duplicate values here

prev['SK_ID_PREV'].nunique()   # 1670214 -- Unique id for each row 

(1670214, 37)

338857

1670214
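
Because SK_ID_CURR repeats, the number of previous applications per current loan can be counted with a group-by. The sketch below uses a toy frame with made-up IDs, then derives the average for the real data from the two counts printed above.

```python
import pandas as pd

# Toy data (hypothetical IDs)
toy_prev = pd.DataFrame({
    "SK_ID_PREV": [11, 12, 13, 14],
    "SK_ID_CURR": [100, 100, 200, 100],
})

# Number of previous applications per current loan
apps_per_client = toy_prev.groupby("SK_ID_CURR")["SK_ID_PREV"].count()
print(apps_per_client.to_dict())

# Average for the real data: total rows / unique SK_ID_CURR values
avg_apps = 1670214 / 338857
print(round(avg_apps, 2))
```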

In [50]:
# 4.0 OneHotEncode (OHE) 'object' types in prev

prev, cat_cols = one_hot_encoder(
                                 prev,
                                 nan_as_category= True
                                 )

In [52]:
# 4.1

len(cat_cols)      # 159
cat_cols

159

['NAME_CONTRACT_TYPE_Cash loans',
 'NAME_CONTRACT_TYPE_Consumer loans',
 'NAME_CONTRACT_TYPE_Revolving loans',
 'NAME_CONTRACT_TYPE_XNA',
 'NAME_CONTRACT_TYPE_nan',
 'WEEKDAY_APPR_PROCESS_START_FRIDAY',
 'WEEKDAY_APPR_PROCESS_START_MONDAY',
 'WEEKDAY_APPR_PROCESS_START_SATURDAY',
 'WEEKDAY_APPR_PROCESS_START_SUNDAY',
 'WEEKDAY_APPR_PROCESS_START_THURSDAY',
 'WEEKDAY_APPR_PROCESS_START_TUESDAY',
 'WEEKDAY_APPR_PROCESS_START_WEDNESDAY',
 'WEEKDAY_APPR_PROCESS_START_nan',
 'FLAG_LAST_APPL_PER_CONTRACT_N',
 'FLAG_LAST_APPL_PER_CONTRACT_Y',
 'FLAG_LAST_APPL_PER_CONTRACT_nan',
 'NAME_CASH_LOAN_PURPOSE_Building a house or an annex',
 'NAME_CASH_LOAN_PURPOSE_Business development',
 'NAME_CASH_LOAN_PURPOSE_Buying a garage',
 'NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land',
 'NAME_CASH_LOAN_PURPOSE_Buying a home',
 'NAME_CASH_LOAN_PURPOSE_Buying a new car',
 'NAME_CASH_LOAN_PURPOSE_Buying a used car',
 'NAME_CASH_LOAN_PURPOSE_Car repairs',
 'NAME_CASH_LOAN_PURPOSE_Education',
 'NAME_CASH_L

In [53]:
# 4.2.1 Just examine NULLs in a few features
prev['DAYS_FIRST_DRAWING'].isnull().sum()     # 673065
# 4.2.2 And also this special constant value: 365243
(prev['DAYS_FIRST_DRAWING'] == 365243).sum()  # 934444

prev['DAYS_FIRST_DUE'].isnull().sum()         # 673065
(prev['DAYS_FIRST_DUE'] == 365243).sum()      #  40645

prev['DAYS_LAST_DUE'].isnull().sum()          # 673065
(prev['DAYS_LAST_DUE'] == 365243).sum()       # 211221

prev['DAYS_TERMINATION'].isnull().sum()       # 673065
(prev['DAYS_TERMINATION']== 365243).sum()     # 225913

673065

934444

673065

40645

673065

211221

673065

225913
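
The constant 365243 is roughly 1000 years expressed in days, which suggests it is a placeholder rather than a real offset. That reading is an assumption, though it is consistent with the conversion to NaN in the next step. The arithmetic can be checked directly:

```python
SENTINEL_DAYS = 365243

# Mean length of a Gregorian calendar year in days
DAYS_PER_YEAR = 365.2425

years = SENTINEL_DAYS / DAYS_PER_YEAR
print(round(years, 2))
```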

In [54]:
# 4.3 Examine total number of unique values
#     in each one of the above four features

prev['DAYS_FIRST_DRAWING'].nunique()     # 2838
prev['DAYS_FIRST_DRAWING'].sort_values(ascending = False)[:5]
prev['DAYS_FIRST_DUE'].nunique()         # 2892
prev['DAYS_LAST_DUE'].nunique()          # 2873
prev['DAYS_TERMINATION'].nunique()       # 2830

2838

1670213    365243.0
572469     365243.0
572494     365243.0
572491     365243.0
572489     365243.0
Name: DAYS_FIRST_DRAWING, dtype: float64

2892

2873

2830

In [55]:
# 4.4 Convert Days 365243 values to nan
#     (assign the result back: inplace=True on a single selected
#      column may not modify prev in newer pandas versions)

days_cols = ['DAYS_FIRST_DRAWING', 'DAYS_FIRST_DUE', 'DAYS_LAST_DUE_1ST_VERSION',
             'DAYS_LAST_DUE', 'DAYS_TERMINATION']
prev[days_cols] = prev[days_cols].replace(365243, np.nan)

In [56]:
# 4.5 So how many NULLS now exist in each one of
#     these four features:

prev['DAYS_FIRST_DRAWING'].isnull().sum()     # 1607509
prev['DAYS_FIRST_DUE'].isnull().sum()         #  713710
prev['DAYS_LAST_DUE'].isnull().sum()          #  884286
prev['DAYS_TERMINATION'].isnull().sum()       #  898978

1607509

713710

884286

898978
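
The new NULL counts should equal the earlier NULL counts plus the number of 365243 values in each column. A small sketch of that check, using a toy frame with hypothetical values followed by the real counts reported above:

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) mixing real offsets, NaN and the sentinel
toy = pd.DataFrame({
    "DAYS_FIRST_DRAWING": [365243.0, -42.0, np.nan],
    "DAYS_TERMINATION":   [-37.0, 365243.0, 365243.0],
})

days_cols = ["DAYS_FIRST_DRAWING", "DAYS_TERMINATION"]
nulls_before = toy[days_cols].isnull().sum()
toy[days_cols] = toy[days_cols].replace(365243, np.nan)
nulls_after = toy[days_cols].isnull().sum()
print(nulls_before.to_dict(), nulls_after.to_dict())

# Same bookkeeping with the counts reported for the real data
original_nulls = 673065
sentinel_counts = {"DAYS_FIRST_DRAWING": 934444, "DAYS_FIRST_DUE": 40645,
                   "DAYS_LAST_DUE": 211221, "DAYS_TERMINATION": 225913}
expected_nulls = {col: original_nulls + n for col, n in sentinel_counts.items()}
print(expected_nulls)
```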

## Perform aggregations
<blockquote>Aggregate the whole dataset by <i>SK_ID_CURR</i>. Numerical features are aggregated with <i>min, max, mean</i> (and, for some, <i>var</i> or <i>sum</i>), while the newly created OHE features are aggregated with <i>'mean'</i> only. The mean of an OHE column is the share of a client's previous applications that fall in that category.</blockquote>
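
A toy version of this aggregation, with made-up values and a single OHE-style column, shows the shape of the result: one row per SK_ID_CURR and a two-level column index of (feature, statistic).

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "SK_ID_CURR": [100, 100, 200],
    "AMT_CREDIT": [1000.0, 3000.0, 500.0],
    "NAME_CONTRACT_STATUS_Approved": [1, 0, 1],
})

toy_agg = toy.groupby("SK_ID_CURR").agg({
    "AMT_CREDIT": ["min", "max", "mean"],
    "NAME_CONTRACT_STATUS_Approved": ["mean"],   # share of approved applications
})
print(toy_agg)
```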

In [57]:
# 5.0 One special feature
#     Add feature: ratio of amount applied for to amount received

prev['APP_CREDIT_PERC'] = prev['AMT_APPLICATION'] / prev['AMT_CREDIT']

# 5.1 Numeric features aggregations:
#     Dictionary of the operations to be
#     performed on numerical features:

num_aggregations = {
                     'AMT_ANNUITY':             ['min', 'max', 'mean'],
                     'AMT_APPLICATION':         ['min', 'max', 'mean'],
                     'AMT_CREDIT':              ['min', 'max', 'mean'],
                     'APP_CREDIT_PERC':         ['min', 'max', 'mean', 'var'],
                     'AMT_DOWN_PAYMENT':        ['min', 'max', 'mean'],
                     'AMT_GOODS_PRICE':         ['min', 'max', 'mean'],
                     'HOUR_APPR_PROCESS_START': ['min', 'max', 'mean'],
                     'RATE_DOWN_PAYMENT':       ['min', 'max', 'mean'],
                     'DAYS_DECISION':           ['min', 'max', 'mean'],
                     'CNT_PAYMENT':             ['mean', 'sum'],
                    }
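
One thing to watch with the ratio feature is a zero AMT_CREDIT. With float columns, pandas returns NaN for 0/0 and inf for a positive amount divided by 0, and an inf would dominate the max and mean aggregations. Whether the second case occurs in the real data is not checked here. The sketch below uses hypothetical values, and replacing inf with NaN is one possible handling, not something this notebook does.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) including zero credit amounts
toy = pd.DataFrame({
    "AMT_APPLICATION": [17145.0, 0.0, 5000.0],
    "AMT_CREDIT":      [17145.0, 0.0, 0.0],
})

ratio = toy["AMT_APPLICATION"] / toy["AMT_CREDIT"]

# 0/0 gives NaN, while a positive amount divided by 0 gives inf
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)
print(ratio.tolist())
print(ratio_clean.tolist())
```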

In [58]:
# 5.2 Categorical features
#     Create a dictionary for aggregation operations:

cat_aggregations = {}
for cat in cat_cols:
    cat_aggregations[cat] = ['mean']

# 5.2.1    
cat_aggregations    

{'NAME_CONTRACT_TYPE_Cash loans': ['mean'],
 'NAME_CONTRACT_TYPE_Consumer loans': ['mean'],
 'NAME_CONTRACT_TYPE_Revolving loans': ['mean'],
 'NAME_CONTRACT_TYPE_XNA': ['mean'],
 'NAME_CONTRACT_TYPE_nan': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_FRIDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_MONDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_SATURDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_SUNDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_THURSDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_TUESDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_WEDNESDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_nan': ['mean'],
 'FLAG_LAST_APPL_PER_CONTRACT_N': ['mean'],
 'FLAG_LAST_APPL_PER_CONTRACT_Y': ['mean'],
 'FLAG_LAST_APPL_PER_CONTRACT_nan': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Building a house or an annex': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Business development': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a garage': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land': ['mean'],
 'NAME_CASH_LOAN_PU

In [59]:
# 5.3 Perform aggregation now on SK_ID_CURR:

grouped = prev.groupby('SK_ID_CURR')
prev_agg=grouped.agg({**num_aggregations, **cat_aggregations})


In [60]:
# 5.3.1
prev_agg.shape    # (338857, 189)
prev_agg.columns
prev_agg.head()

(338857, 189)

MultiIndex([(                                       'AMT_ANNUITY',  'min'),
            (                                       'AMT_ANNUITY',  'max'),
            (                                       'AMT_ANNUITY', 'mean'),
            (                                   'AMT_APPLICATION',  'min'),
            (                                   'AMT_APPLICATION',  'max'),
            (                                   'AMT_APPLICATION', 'mean'),
            (                                        'AMT_CREDIT',  'min'),
            (                                        'AMT_CREDIT',  'max'),
            (                                        'AMT_CREDIT', 'mean'),
            (                                   'APP_CREDIT_PERC',  'min'),
            ...
            (           'PRODUCT_COMBINATION_Cash X-Sell: middle', 'mean'),
            (   'PRODUCT_COMBINATION_POS household with interest', 'mean'),
            ('PRODUCT_COMBINATION_POS household without interest', 'mean

Unnamed: 0_level_0,AMT_ANNUITY,AMT_ANNUITY,AMT_ANNUITY,AMT_APPLICATION,AMT_APPLICATION,AMT_APPLICATION,AMT_CREDIT,AMT_CREDIT,AMT_CREDIT,APP_CREDIT_PERC,APP_CREDIT_PERC,APP_CREDIT_PERC,APP_CREDIT_PERC,AMT_DOWN_PAYMENT,AMT_DOWN_PAYMENT,AMT_DOWN_PAYMENT,AMT_GOODS_PRICE,AMT_GOODS_PRICE,AMT_GOODS_PRICE,HOUR_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,RATE_DOWN_PAYMENT,RATE_DOWN_PAYMENT,RATE_DOWN_PAYMENT,DAYS_DECISION,DAYS_DECISION,DAYS_DECISION,CNT_PAYMENT,CNT_PAYMENT,NAME_CONTRACT_TYPE_Cash loans,NAME_CONTRACT_TYPE_Consumer loans,NAME_CONTRACT_TYPE_Revolving loans,NAME_CONTRACT_TYPE_XNA,NAME_CONTRACT_TYPE_nan,WEEKDAY_APPR_PROCESS_START_FRIDAY,WEEKDAY_APPR_PROCESS_START_MONDAY,WEEKDAY_APPR_PROCESS_START_SATURDAY,WEEKDAY_APPR_PROCESS_START_SUNDAY,WEEKDAY_APPR_PROCESS_START_THURSDAY,WEEKDAY_APPR_PROCESS_START_TUESDAY,WEEKDAY_APPR_PROCESS_START_WEDNESDAY,WEEKDAY_APPR_PROCESS_START_nan,FLAG_LAST_APPL_PER_CONTRACT_N,FLAG_LAST_APPL_PER_CONTRACT_Y,FLAG_LAST_APPL_PER_CONTRACT_nan,NAME_CASH_LOAN_PURPOSE_Building a house or an annex,NAME_CASH_LOAN_PURPOSE_Business development,NAME_CASH_LOAN_PURPOSE_Buying a garage,NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land,...,NAME_PORTFOLIO_nan,NAME_PRODUCT_TYPE_XNA,NAME_PRODUCT_TYPE_walk-in,NAME_PRODUCT_TYPE_x-sell,NAME_PRODUCT_TYPE_nan,CHANNEL_TYPE_AP+ (Cash loan),CHANNEL_TYPE_Car dealer,CHANNEL_TYPE_Channel of corporate sales,CHANNEL_TYPE_Contact center,CHANNEL_TYPE_Country-wide,CHANNEL_TYPE_Credit and cash offices,CHANNEL_TYPE_Regional / Local,CHANNEL_TYPE_Stone,CHANNEL_TYPE_nan,NAME_SELLER_INDUSTRY_Auto technology,NAME_SELLER_INDUSTRY_Clothing,NAME_SELLER_INDUSTRY_Connectivity,NAME_SELLER_INDUSTRY_Construction,NAME_SELLER_INDUSTRY_Consumer electronics,NAME_SELLER_INDUSTRY_Furniture,NAME_SELLER_INDUSTRY_Industry,NAME_SELLER_INDUSTRY_Jewelry,NAME_SELLER_INDUSTRY_MLM 
partners,NAME_SELLER_INDUSTRY_Tourism,NAME_SELLER_INDUSTRY_XNA,NAME_SELLER_INDUSTRY_nan,NAME_YIELD_GROUP_XNA,NAME_YIELD_GROUP_high,NAME_YIELD_GROUP_low_action,NAME_YIELD_GROUP_low_normal,NAME_YIELD_GROUP_middle,NAME_YIELD_GROUP_nan,PRODUCT_COMBINATION_Card Street,PRODUCT_COMBINATION_Card X-Sell,PRODUCT_COMBINATION_Cash,PRODUCT_COMBINATION_Cash Street: high,PRODUCT_COMBINATION_Cash Street: low,PRODUCT_COMBINATION_Cash Street: middle,PRODUCT_COMBINATION_Cash X-Sell: high,PRODUCT_COMBINATION_Cash X-Sell: low,PRODUCT_COMBINATION_Cash X-Sell: middle,PRODUCT_COMBINATION_POS household with interest,PRODUCT_COMBINATION_POS household without interest,PRODUCT_COMBINATION_POS industry with interest,PRODUCT_COMBINATION_POS industry without interest,PRODUCT_COMBINATION_POS mobile with interest,PRODUCT_COMBINATION_POS mobile without interest,PRODUCT_COMBINATION_POS other with interest,PRODUCT_COMBINATION_POS others without interest,PRODUCT_COMBINATION_nan
Unnamed: 0_level_1,min,max,mean,min,max,mean,min,max,mean,min,max,mean,var,min,max,mean,min,max,mean,min,max,mean,min,max,mean,min,max,mean,mean,sum,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,...,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean,mean
SK_ID_CURR
100001,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13,13,13.0,0.104326,0.104326,0.104326,-1740,-1740,-1740.0,8.0,8.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
100002,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9,9,9.0,0.0,0.0,0.0,-606,-606,-606.0,24.0,24.0,0.0,1.0,0.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,1.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
100003,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12,17,14.666667,0.0,0.100061,0.05003,-2341,-746,-1305.0,10.0,30.0,0.333333,0.666667,0.0,0.0,0,0.333333,0.0,0.333333,0.333333,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,0.666667,0.0,0.333333,0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.333333,0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0,0.0,0.0,0.333333,0,0.0,0.0,0.0,0.333333,0.666667,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
100004,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5,5,5.0,0.212008,0.212008,0.212008,-815,-815,-815.0,4.0,4.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
100005,4813.2,4813.2,4813.2,0.0,44617.5,22308.75,0.0,40153.5,20076.75,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,10,11,10.5,0.108964,0.108964,0.108964,-757,-315,-536.0,12.0,12.0,0.5,0.5,0.0,0.0,0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0,0.5,0.5,0.0,0.0,0.0,0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0


In [61]:
# 5.4 Rename multiindex columns:

prev_agg.columns = pd.Index(['PREV_' + e[0] + "_" + e[1].upper() for e in prev_agg.columns.tolist()])
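
The renaming step can be seen in isolation on a toy aggregate (hypothetical values): each column label is a (feature, statistic) tuple, and the list comprehension joins the two parts into a single flat name.

```python
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "SK_ID_CURR": [100, 100, 200],
    "AMT_CREDIT": [1000.0, 3000.0, 500.0],
})
toy_agg = toy.groupby("SK_ID_CURR").agg({"AMT_CREDIT": ["min", "max"]})

# Before flattening, the labels are tuples such as ('AMT_CREDIT', 'min')
flat_names = ["PREV_" + feat + "_" + stat.upper() for feat, stat in toy_agg.columns]
toy_agg.columns = pd.Index(flat_names)
print(toy_agg.columns.tolist())
```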

In [62]:
# 5.5
prev_agg.shape      # (338857, 189)
prev_agg.columns
prev_agg.head()

(338857, 189)

Index(['PREV_AMT_ANNUITY_MIN', 'PREV_AMT_ANNUITY_MAX', 'PREV_AMT_ANNUITY_MEAN',
       'PREV_AMT_APPLICATION_MIN', 'PREV_AMT_APPLICATION_MAX',
       'PREV_AMT_APPLICATION_MEAN', 'PREV_AMT_CREDIT_MIN',
       'PREV_AMT_CREDIT_MAX', 'PREV_AMT_CREDIT_MEAN',
       'PREV_APP_CREDIT_PERC_MIN',
       ...
       'PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN',
       'PREV_PRODUCT_COMBINATION_POS household with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS household without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS other with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS others without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_nan_MEAN'],
      dtype='object', length=189)

Unnamed: 0_level_0,PREV_AMT_ANNUITY_MIN,PREV_AMT_ANNUITY_MAX,PREV_AMT_ANNUITY_MEAN,PREV_AMT_APPLICATION_MIN,PREV_AMT_APPLICATION_MAX,PREV_AMT_APPLICATION_MEAN,PREV_AMT_CREDIT_MIN,PREV_AMT_CREDIT_MAX,PREV_AMT_CREDIT_MEAN,PREV_APP_CREDIT_PERC_MIN,PREV_APP_CREDIT_PERC_MAX,PREV_APP_CREDIT_PERC_MEAN,PREV_APP_CREDIT_PERC_VAR,PREV_AMT_DOWN_PAYMENT_MIN,PREV_AMT_DOWN_PAYMENT_MAX,PREV_AMT_DOWN_PAYMENT_MEAN,PREV_AMT_GOODS_PRICE_MIN,PREV_AMT_GOODS_PRICE_MAX,PREV_AMT_GOODS_PRICE_MEAN,PREV_HOUR_APPR_PROCESS_START_MIN,PREV_HOUR_APPR_PROCESS_START_MAX,PREV_HOUR_APPR_PROCESS_START_MEAN,PREV_RATE_DOWN_PAYMENT_MIN,PREV_RATE_DOWN_PAYMENT_MAX,PREV_RATE_DOWN_PAYMENT_MEAN,PREV_DAYS_DECISION_MIN,PREV_DAYS_DECISION_MAX,PREV_DAYS_DECISION_MEAN,PREV_CNT_PAYMENT_MEAN,PREV_CNT_PAYMENT_SUM,PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN,PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN,PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN,PREV_NAME_CONTRACT_TYPE_XNA_MEAN,PREV_NAME_CONTRACT_TYPE_nan_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN,...,PREV_NAME_PORTFOLIO_nan_MEAN,PREV_NAME_PRODUCT_TYPE_XNA_MEAN,PREV_NAME_PRODUCT_TYPE_walk-in_MEAN,PREV_NAME_PRODUCT_TYPE_x-sell_MEAN,PREV_NAME_PRODUCT_TYPE_nan_MEAN,PREV_CHANNEL_TYPE_AP+ (Cash loan)_MEAN,PREV_CHANNEL_TYPE_Car dealer_MEAN,PREV_CHANNEL_TYPE_Channel of corporate sales_MEAN,PREV_CHANNEL_TYPE_Contact 
center_MEAN,PREV_CHANNEL_TYPE_Country-wide_MEAN,PREV_CHANNEL_TYPE_Credit and cash offices_MEAN,PREV_CHANNEL_TYPE_Regional / Local_MEAN,PREV_CHANNEL_TYPE_Stone_MEAN,PREV_CHANNEL_TYPE_nan_MEAN,PREV_NAME_SELLER_INDUSTRY_Auto technology_MEAN,PREV_NAME_SELLER_INDUSTRY_Clothing_MEAN,PREV_NAME_SELLER_INDUSTRY_Connectivity_MEAN,PREV_NAME_SELLER_INDUSTRY_Construction_MEAN,PREV_NAME_SELLER_INDUSTRY_Consumer electronics_MEAN,PREV_NAME_SELLER_INDUSTRY_Furniture_MEAN,PREV_NAME_SELLER_INDUSTRY_Industry_MEAN,PREV_NAME_SELLER_INDUSTRY_Jewelry_MEAN,PREV_NAME_SELLER_INDUSTRY_MLM partners_MEAN,PREV_NAME_SELLER_INDUSTRY_Tourism_MEAN,PREV_NAME_SELLER_INDUSTRY_XNA_MEAN,PREV_NAME_SELLER_INDUSTRY_nan_MEAN,PREV_NAME_YIELD_GROUP_XNA_MEAN,PREV_NAME_YIELD_GROUP_high_MEAN,PREV_NAME_YIELD_GROUP_low_action_MEAN,PREV_NAME_YIELD_GROUP_low_normal_MEAN,PREV_NAME_YIELD_GROUP_middle_MEAN,PREV_NAME_YIELD_GROUP_nan_MEAN,PREV_PRODUCT_COMBINATION_Card Street_MEAN,PREV_PRODUCT_COMBINATION_Card X-Sell_MEAN,PREV_PRODUCT_COMBINATION_Cash_MEAN,PREV_PRODUCT_COMBINATION_Cash Street: high_MEAN,PREV_PRODUCT_COMBINATION_Cash Street: low_MEAN,PREV_PRODUCT_COMBINATION_Cash Street: middle_MEAN,PREV_PRODUCT_COMBINATION_Cash X-Sell: high_MEAN,PREV_PRODUCT_COMBINATION_Cash X-Sell: low_MEAN,PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN,PREV_PRODUCT_COMBINATION_POS household with interest_MEAN,PREV_PRODUCT_COMBINATION_POS household without interest_MEAN,PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN,PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN,PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN,PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN,PREV_PRODUCT_COMBINATION_POS other with interest_MEAN,PREV_PRODUCT_COMBINATION_POS others without interest_MEAN,PREV_PRODUCT_COMBINATION_nan_MEAN
SK_ID_CURR
100001,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13,13,13.0,0.104326,0.104326,0.104326,-1740,-1740,-1740.0,8.0,8.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
100002,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9,9,9.0,0.0,0.0,0.0,-606,-606,-606.0,24.0,24.0,0.0,1.0,0.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,1.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
100003,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12,17,14.666667,0.0,0.100061,0.05003,-2341,-746,-1305.0,10.0,30.0,0.333333,0.666667,0.0,0.0,0,0.333333,0.0,0.333333,0.333333,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,0.666667,0.0,0.333333,0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.333333,0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0,0.0,0.0,0.333333,0,0.0,0.0,0.0,0.333333,0.666667,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0
100004,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5,5,5.0,0.212008,0.212008,0.212008,-815,-815,-815.0,4.0,4.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,0.0,0.0,0.0,1.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
100005,4813.2,4813.2,4813.2,0.0,44617.5,22308.75,0.0,40153.5,20076.75,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,10,11,10.5,0.108964,0.108964,0.108964,-757,-315,-536.0,12.0,12.0,0.5,0.5,0.0,0.0,0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0,1.0,0.0,0.0,0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0,0.5,0.5,0.0,0.0,0.0,0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0


## More aggregations
<blockquote>The table <i>prev_agg</i> from the previous operations is the main table that we carry to the next exercise. We now add more aggregations to it. <br>We aggregate two subsets of the data, using numerical features only in both. One subset is extracted by setting <i>NAME_CONTRACT_STATUS_Approved == 1</i> and the other by setting <i>NAME_CONTRACT_STATUS_Refused == 1</i>.<br><br>The aim is to capture, separately, how clients behaved on previous applications that were approved and on previous applications that were refused.</blockquote>
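The subsetting-then-aggregating idea described above can be sketched on a toy table. This is a minimal illustration only: `prev_toy`, its values, and `toy_aggregations` are made up for the example and stand in for the real `prev` table and `num_aggregations` dictionary.

```python
import pandas as pd

# Toy stand-in for the one-hot encoded `prev` table (all values are made up).
prev_toy = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 2, 3],
    "AMT_CREDIT": [100.0, 300.0, 50.0, 70.0, 20.0],
    "NAME_CONTRACT_STATUS_Approved": [1, 0, 1, 1, 0],
    "NAME_CONTRACT_STATUS_Refused":  [0, 1, 0, 0, 1],
})

# Hypothetical, much shorter version of `num_aggregations`.
toy_aggregations = {"AMT_CREDIT": ["min", "max", "mean"]}

# Keep only approved rows, then summarise per client.
approved_toy = prev_toy[prev_toy["NAME_CONTRACT_STATUS_Approved"] == 1]
approved_toy_agg = approved_toy.groupby("SK_ID_CURR").agg(toy_aggregations)

# Same pattern for refused rows.
refused_toy = prev_toy[prev_toy["NAME_CONTRACT_STATUS_Refused"] == 1]
refused_toy_agg = refused_toy.groupby("SK_ID_CURR").agg(toy_aggregations)

print(approved_toy_agg)
print(refused_toy_agg)
```

Client 3 has no approved application and client 2 has no refused one, so each aggregated table contains only a subset of the clients. This is why the aggregated tables in this section have fewer rows than `prev_agg`.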


In [67]:
# 6.0 Previous Applications: Summarise numerical features from Approved Applications

approved = prev[prev['NAME_CONTRACT_STATUS_Approved'] == 1]
approved_agg = approved.groupby('SK_ID_CURR').agg(num_aggregations)

In [68]:
# 6.1 Look at the aggregated results:

approved_agg.columns
approved_agg.head()

MultiIndex([(            'AMT_ANNUITY',  'min'),
            (            'AMT_ANNUITY',  'max'),
            (            'AMT_ANNUITY', 'mean'),
            (        'AMT_APPLICATION',  'min'),
            (        'AMT_APPLICATION',  'max'),
            (        'AMT_APPLICATION', 'mean'),
            (             'AMT_CREDIT',  'min'),
            (             'AMT_CREDIT',  'max'),
            (             'AMT_CREDIT', 'mean'),
            (        'APP_CREDIT_PERC',  'min'),
            (        'APP_CREDIT_PERC',  'max'),
            (        'APP_CREDIT_PERC', 'mean'),
            (        'APP_CREDIT_PERC',  'var'),
            (       'AMT_DOWN_PAYMENT',  'min'),
            (       'AMT_DOWN_PAYMENT',  'max'),
            (       'AMT_DOWN_PAYMENT', 'mean'),
            (        'AMT_GOODS_PRICE',  'min'),
            (        'AMT_GOODS_PRICE',  'max'),
            (        'AMT_GOODS_PRICE', 'mean'),
            ('HOUR_APPR_PROCESS_START',  'min'),
            ('HOUR_A

Unnamed: 0_level_0,AMT_ANNUITY,AMT_ANNUITY,AMT_ANNUITY,AMT_APPLICATION,AMT_APPLICATION,AMT_APPLICATION,AMT_CREDIT,AMT_CREDIT,AMT_CREDIT,APP_CREDIT_PERC,APP_CREDIT_PERC,APP_CREDIT_PERC,APP_CREDIT_PERC,AMT_DOWN_PAYMENT,AMT_DOWN_PAYMENT,AMT_DOWN_PAYMENT,AMT_GOODS_PRICE,AMT_GOODS_PRICE,AMT_GOODS_PRICE,HOUR_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,RATE_DOWN_PAYMENT,RATE_DOWN_PAYMENT,RATE_DOWN_PAYMENT,DAYS_DECISION,DAYS_DECISION,DAYS_DECISION,CNT_PAYMENT,CNT_PAYMENT
Unnamed: 0_level_1,min,max,mean,min,max,mean,min,max,mean,min,max,mean,var,min,max,mean,min,max,mean,min,max,mean,min,max,mean,min,max,mean,mean,sum
SK_ID_CURR
100001,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13,13,13.0,0.104326,0.104326,0.104326,-1740,-1740,-1740.0,8.0,8.0
100002,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9,9,9.0,0.0,0.0,0.0,-606,-606,-606.0,24.0,24.0
100003,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12,17,14.666667,0.0,0.100061,0.05003,-2341,-746,-1305.0,10.0,30.0
100004,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5,5,5.0,0.212008,0.212008,0.212008,-815,-815,-815.0,4.0,4.0
100005,4813.2,4813.2,4813.2,44617.5,44617.5,44617.5,40153.5,40153.5,40153.5,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,11,11,11.0,0.108964,0.108964,0.108964,-757,-757,-757.0,12.0,12.0


In [69]:
# 6.2 Rename multi-index column names:

approved_agg.columns = pd.Index(['APPROVED_' + e[0] + "_" + e[1].upper() for e in approved_agg.columns.tolist()])
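To see what the renaming step does, here is a small sketch on made-up data. Each column of the aggregated table is a `(feature, statistic)` tuple, and the list comprehension turns it into a single prefixed string. The toy frame and its values are assumptions for illustration.

```python
import pandas as pd

# Toy data (made up) aggregated with two statistics per feature.
toy = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2],
    "AMT_CREDIT": [100.0, 300.0, 50.0],
    "CNT_PAYMENT": [12.0, 24.0, 6.0],
})
toy_agg = toy.groupby("SK_ID_CURR").agg(
    {"AMT_CREDIT": ["min", "max"], "CNT_PAYMENT": ["sum"]}
)

# Before: a two-level MultiIndex of (feature, statistic) tuples.
print(toy_agg.columns.tolist())

# After: one flat name per column, e.g. ('AMT_CREDIT', 'min') -> 'APPROVED_AMT_CREDIT_MIN'.
flat_names = ["APPROVED_" + col + "_" + stat.upper() for col, stat in toy_agg.columns]
toy_agg.columns = pd.Index(flat_names)
print(toy_agg.columns.tolist())
```

Flat, prefixed names keep the approved and refused features distinguishable once both are joined to the same table.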


In [70]:
# 6.2.1 Look at it again:

approved_agg.shape     # (337698, 30)
approved_agg.head()

(337698, 30)

Unnamed: 0_level_0,APPROVED_AMT_ANNUITY_MIN,APPROVED_AMT_ANNUITY_MAX,APPROVED_AMT_ANNUITY_MEAN,APPROVED_AMT_APPLICATION_MIN,APPROVED_AMT_APPLICATION_MAX,APPROVED_AMT_APPLICATION_MEAN,APPROVED_AMT_CREDIT_MIN,APPROVED_AMT_CREDIT_MAX,APPROVED_AMT_CREDIT_MEAN,APPROVED_APP_CREDIT_PERC_MIN,APPROVED_APP_CREDIT_PERC_MAX,APPROVED_APP_CREDIT_PERC_MEAN,APPROVED_APP_CREDIT_PERC_VAR,APPROVED_AMT_DOWN_PAYMENT_MIN,APPROVED_AMT_DOWN_PAYMENT_MAX,APPROVED_AMT_DOWN_PAYMENT_MEAN,APPROVED_AMT_GOODS_PRICE_MIN,APPROVED_AMT_GOODS_PRICE_MAX,APPROVED_AMT_GOODS_PRICE_MEAN,APPROVED_HOUR_APPR_PROCESS_START_MIN,APPROVED_HOUR_APPR_PROCESS_START_MAX,APPROVED_HOUR_APPR_PROCESS_START_MEAN,APPROVED_RATE_DOWN_PAYMENT_MIN,APPROVED_RATE_DOWN_PAYMENT_MAX,APPROVED_RATE_DOWN_PAYMENT_MEAN,APPROVED_DAYS_DECISION_MIN,APPROVED_DAYS_DECISION_MAX,APPROVED_DAYS_DECISION_MEAN,APPROVED_CNT_PAYMENT_MEAN,APPROVED_CNT_PAYMENT_SUM
SK_ID_CURR
100001,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13,13,13.0,0.104326,0.104326,0.104326,-1740,-1740,-1740.0,8.0,8.0
100002,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9,9,9.0,0.0,0.0,0.0,-606,-606,-606.0,24.0,24.0
100003,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12,17,14.666667,0.0,0.100061,0.05003,-2341,-746,-1305.0,10.0,30.0
100004,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5,5,5.0,0.212008,0.212008,0.212008,-815,-815,-815.0,4.0,4.0
100005,4813.2,4813.2,4813.2,44617.5,44617.5,44617.5,40153.5,40153.5,40153.5,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,11,11,11.0,0.108964,0.108964,0.108964,-757,-757,-757.0,12.0,12.0


In [71]:
# 6.3 Join 'approved_agg' with 'prev_agg'.

prev_agg = prev_agg.join(                      # prev_agg is on the left
                         approved_agg,         # table on the right
                         how='left',           # Left join: every row of prev_agg is kept
                         on='SK_ID_CURR'       # Joining key (the index name of both tables)
                        )
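A side effect of the left join is worth noting: clients without any approved application receive NaN in the new columns, and integer columns from the right table are therefore stored as floats (which is consistent with values such as 13 appearing as 13.0 after the join). The sketch below uses made-up tables and column names, and it relies on the default index-to-index join rather than `on=`, on the assumption that both tables are indexed by `SK_ID_CURR`.

```python
import pandas as pd

# Left table: three clients (toy values).
left = pd.DataFrame(
    {"PREV_HOUR_MIN": [13, 9, 15]},
    index=pd.Index([100001, 100002, 100006], name="SK_ID_CURR"),
)

# Right table: only two of them have approved applications.
right = pd.DataFrame(
    {"APPROVED_HOUR_MIN": [13, 9]},
    index=pd.Index([100001, 100002], name="SK_ID_CURR"),
)

joined = left.join(right, how="left")   # all rows of `left` are kept

print(joined)
print(joined.dtypes)   # the right-hand column becomes float64 because of the NaN
```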

In [72]:
# 6.3.1 Check the shape after the join:

prev_agg.shape     # (338857, 219)
prev_agg.head()

(338857, 219)

Unnamed: 0_level_0,PREV_AMT_ANNUITY_MIN,PREV_AMT_ANNUITY_MAX,PREV_AMT_ANNUITY_MEAN,PREV_AMT_APPLICATION_MIN,PREV_AMT_APPLICATION_MAX,PREV_AMT_APPLICATION_MEAN,PREV_AMT_CREDIT_MIN,PREV_AMT_CREDIT_MAX,PREV_AMT_CREDIT_MEAN,PREV_APP_CREDIT_PERC_MIN,PREV_APP_CREDIT_PERC_MAX,PREV_APP_CREDIT_PERC_MEAN,PREV_APP_CREDIT_PERC_VAR,PREV_AMT_DOWN_PAYMENT_MIN,PREV_AMT_DOWN_PAYMENT_MAX,PREV_AMT_DOWN_PAYMENT_MEAN,PREV_AMT_GOODS_PRICE_MIN,PREV_AMT_GOODS_PRICE_MAX,PREV_AMT_GOODS_PRICE_MEAN,PREV_HOUR_APPR_PROCESS_START_MIN,PREV_HOUR_APPR_PROCESS_START_MAX,PREV_HOUR_APPR_PROCESS_START_MEAN,PREV_RATE_DOWN_PAYMENT_MIN,PREV_RATE_DOWN_PAYMENT_MAX,PREV_RATE_DOWN_PAYMENT_MEAN,PREV_DAYS_DECISION_MIN,PREV_DAYS_DECISION_MAX,PREV_DAYS_DECISION_MEAN,PREV_CNT_PAYMENT_MEAN,PREV_CNT_PAYMENT_SUM,PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN,PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN,PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN,PREV_NAME_CONTRACT_TYPE_XNA_MEAN,PREV_NAME_CONTRACT_TYPE_nan_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN,...,PREV_NAME_YIELD_GROUP_middle_MEAN,PREV_NAME_YIELD_GROUP_nan_MEAN,PREV_PRODUCT_COMBINATION_Card Street_MEAN,PREV_PRODUCT_COMBINATION_Card X-Sell_MEAN,PREV_PRODUCT_COMBINATION_Cash_MEAN,PREV_PRODUCT_COMBINATION_Cash Street: high_MEAN,PREV_PRODUCT_COMBINATION_Cash Street: low_MEAN,PREV_PRODUCT_COMBINATION_Cash Street: middle_MEAN,PREV_PRODUCT_COMBINATION_Cash X-Sell: high_MEAN,PREV_PRODUCT_COMBINATION_Cash X-Sell: low_MEAN,PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN,PREV_PRODUCT_COMBINATION_POS household with interest_MEAN,PREV_PRODUCT_COMBINATION_POS household without interest_MEAN,PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN,PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN,PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN,PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN,PREV_PRODUCT_COMBINATION_POS other with interest_MEAN,PREV_PRODUCT_COMBINATION_POS others without interest_MEAN,PREV_PRODUCT_COMBINATION_nan_MEAN,APPROVED_AMT_ANNUITY_MIN,APPROVED_AMT_ANNUITY_MAX,APPROVED_AMT_ANNUITY_MEAN,APPROVED_AMT_APPLICATION_MIN,APPROVED_AMT_APPLICATION_MAX,APPROVED_AMT_APPLICATION_MEAN,APPROVED_AMT_CREDIT_MIN,APPROVED_AMT_CREDIT_MAX,APPROVED_AMT_CREDIT_MEAN,APPROVED_APP_CREDIT_PERC_MIN,APPROVED_APP_CREDIT_PERC_MAX,APPROVED_APP_CREDIT_PERC_MEAN,APPROVED_APP_CREDIT_PERC_VAR,APPROVED_AMT_DOWN_PAYMENT_MIN,APPROVED_AMT_DOWN_PAYMENT_MAX,APPROVED_AMT_DOWN_PAYMENT_MEAN,APPROVED_AMT_GOODS_PRICE_MIN,APPROVED_AMT_GOODS_PRICE_MAX,APPROVED_AMT_GOODS_PRICE_MEAN,APPROVED_HOUR_APPR_PROCESS_START_MIN,APPROVED_HOUR_APPR_PROCESS_START_MAX,APPROVED_HOUR_APPR_PROCESS_START_MEAN,APPROVED_RATE_DOWN_PAYMENT_MIN,APPROVED_RATE_DOWN_PAYMENT_MAX,APPROVED_RATE_DOWN_PAYMENT_MEAN,APPROVED_DAYS_DECISION_MIN,APPROVED_DAYS_DECISION_MAX,APPROVED_DAYS_DECISION_MEAN,APPROVED_CNT_PAYMENT_MEAN,APPROVED_CNT_PAYMENT_SUM
SK_ID_CURR
100001,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13,13,13.0,0.104326,0.104326,0.104326,-1740,-1740,-1740.0,8.0,8.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13.0,13.0,13.0,0.104326,0.104326,0.104326,-1740.0,-1740.0,-1740.0,8.0,8.0
100002,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9,9,9.0,0.0,0.0,0.0,-606,-606,-606.0,24.0,24.0,0.0,1.0,0.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9.0,9.0,9.0,0.0,0.0,0.0,-606.0,-606.0,-606.0,24.0,24.0
100003,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12,17,14.666667,0.0,0.100061,0.05003,-2341,-746,-1305.0,10.0,30.0,0.333333,0.666667,0.0,0.0,0,0.333333,0.0,0.333333,0.333333,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0.666667,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12.0,17.0,14.666667,0.0,0.100061,0.05003,-2341.0,-746.0,-1305.0,10.0,30.0
100004,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5,5,5.0,0.212008,0.212008,0.212008,-815,-815,-815.0,4.0,4.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,1.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5.0,5.0,5.0,0.212008,0.212008,0.212008,-815.0,-815.0,-815.0,4.0,4.0
100005,4813.2,4813.2,4813.2,0.0,44617.5,22308.75,0.0,40153.5,20076.75,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,10,11,10.5,0.108964,0.108964,0.108964,-757,-315,-536.0,12.0,12.0,0.5,0.5,0.0,0.0,0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,0.0,0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,4813.2,4813.2,4813.2,44617.5,44617.5,44617.5,40153.5,40153.5,40153.5,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,11.0,11.0,11.0,0.108964,0.108964,0.108964,-757.0,-757.0,-757.0,12.0,12.0


In [73]:
# 6.4 Similarly for refused applications perform aggregations of numerical features:

refused = prev[prev['NAME_CONTRACT_STATUS_Refused'] == 1]
refused_agg = refused.groupby('SK_ID_CURR').agg(num_aggregations)

In [74]:
# 6.4.1 Look at the aggregated results:

refused_agg.shape      # (118277, 30)
refused_agg.head()

(118277, 30)

Unnamed: 0_level_0,AMT_ANNUITY,AMT_ANNUITY,AMT_ANNUITY,AMT_APPLICATION,AMT_APPLICATION,AMT_APPLICATION,AMT_CREDIT,AMT_CREDIT,AMT_CREDIT,APP_CREDIT_PERC,APP_CREDIT_PERC,APP_CREDIT_PERC,APP_CREDIT_PERC,AMT_DOWN_PAYMENT,AMT_DOWN_PAYMENT,AMT_DOWN_PAYMENT,AMT_GOODS_PRICE,AMT_GOODS_PRICE,AMT_GOODS_PRICE,HOUR_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,RATE_DOWN_PAYMENT,RATE_DOWN_PAYMENT,RATE_DOWN_PAYMENT,DAYS_DECISION,DAYS_DECISION,DAYS_DECISION,CNT_PAYMENT,CNT_PAYMENT
Unnamed: 0_level_1,min,max,mean,min,max,mean,min,max,mean,min,max,mean,var,min,max,mean,min,max,mean,min,max,mean,min,max,mean,min,max,mean,mean,sum
SK_ID_CURR
100006,32696.1,32696.1,32696.1,688500.0,688500.0,688500.0,906615.0,906615.0,906615.0,0.759418,0.759418,0.759418,,,,,688500.0,688500.0,688500.0,15,15,15.0,,,,-181,-181,-181.0,48.0,48.0
100011,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,0.0,0.0,0.0,9,9,9.0,,,,-1162,-1162,-1162.0,,0.0
100027,22556.475,22556.475,22556.475,225000.0,225000.0,225000.0,239850.0,239850.0,239850.0,0.938086,0.938086,0.938086,,,,,225000.0,225000.0,225000.0,14,14,14.0,,,,-181,-181,-181.0,12.0,12.0
100030,2826.45,6176.925,4073.265,21969.224609,43870.5,33767.121094,21969.225,43870.5,32533.2225,1.0,1.20058,1.04228,0.005227,0.0,6714.0,1344.6,21969.225,43870.5,33767.1225,7,16,12.5,0.0,0.210919,0.042874,-2689,-840,-2053.9,9.833333,59.0
100035,22308.75,33238.8,27773.775,0.0,1260000.0,241875.0,0.0,1260000.0,241875.0,1.0,1.0,1.0,0.0,,,,675000.0,1260000.0,967500.0,11,14,12.375,,,,-160,-119,-143.375,54.0,108.0


In [75]:
# 6.5 Rename multi-index column names:

refused_agg.columns = pd.Index(['REFUSED_' + e[0] + "_" + e[1].upper() for e in refused_agg.columns.tolist()])
refused_agg.head()
refused_agg.shape   # (118277, 30)

Unnamed: 0_level_0,REFUSED_AMT_ANNUITY_MIN,REFUSED_AMT_ANNUITY_MAX,REFUSED_AMT_ANNUITY_MEAN,REFUSED_AMT_APPLICATION_MIN,REFUSED_AMT_APPLICATION_MAX,REFUSED_AMT_APPLICATION_MEAN,REFUSED_AMT_CREDIT_MIN,REFUSED_AMT_CREDIT_MAX,REFUSED_AMT_CREDIT_MEAN,REFUSED_APP_CREDIT_PERC_MIN,REFUSED_APP_CREDIT_PERC_MAX,REFUSED_APP_CREDIT_PERC_MEAN,REFUSED_APP_CREDIT_PERC_VAR,REFUSED_AMT_DOWN_PAYMENT_MIN,REFUSED_AMT_DOWN_PAYMENT_MAX,REFUSED_AMT_DOWN_PAYMENT_MEAN,REFUSED_AMT_GOODS_PRICE_MIN,REFUSED_AMT_GOODS_PRICE_MAX,REFUSED_AMT_GOODS_PRICE_MEAN,REFUSED_HOUR_APPR_PROCESS_START_MIN,REFUSED_HOUR_APPR_PROCESS_START_MAX,REFUSED_HOUR_APPR_PROCESS_START_MEAN,REFUSED_RATE_DOWN_PAYMENT_MIN,REFUSED_RATE_DOWN_PAYMENT_MAX,REFUSED_RATE_DOWN_PAYMENT_MEAN,REFUSED_DAYS_DECISION_MIN,REFUSED_DAYS_DECISION_MAX,REFUSED_DAYS_DECISION_MEAN,REFUSED_CNT_PAYMENT_MEAN,REFUSED_CNT_PAYMENT_SUM
SK_ID_CURR
100006,32696.1,32696.1,32696.1,688500.0,688500.0,688500.0,906615.0,906615.0,906615.0,0.759418,0.759418,0.759418,,,,,688500.0,688500.0,688500.0,15,15,15.0,,,,-181,-181,-181.0,48.0,48.0
100011,,,,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,,,0.0,0.0,0.0,9,9,9.0,,,,-1162,-1162,-1162.0,,0.0
100027,22556.475,22556.475,22556.475,225000.0,225000.0,225000.0,239850.0,239850.0,239850.0,0.938086,0.938086,0.938086,,,,,225000.0,225000.0,225000.0,14,14,14.0,,,,-181,-181,-181.0,12.0,12.0
100030,2826.45,6176.925,4073.265,21969.224609,43870.5,33767.121094,21969.225,43870.5,32533.2225,1.0,1.20058,1.04228,0.005227,0.0,6714.0,1344.6,21969.225,43870.5,33767.1225,7,16,12.5,0.0,0.210919,0.042874,-2689,-840,-2053.9,9.833333,59.0
100035,22308.75,33238.8,27773.775,0.0,1260000.0,241875.0,0.0,1260000.0,241875.0,1.0,1.0,1.0,0.0,,,,675000.0,1260000.0,967500.0,11,14,12.375,,,,-160,-119,-143.375,54.0,108.0


(118277, 30)

In [76]:
# 7.0 Join refused_agg with prev_agg:

prev_agg = prev_agg.join(                     # prev_agg: left
                         refused_agg,         # table on the right
                         how='left',
                         on='SK_ID_CURR'
                        )
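Because only some clients have a refused application, many rows of the joined table carry NaN in every `REFUSED_` column. A quick way to check how many, sketched here on made-up tables, is to test index membership rather than inspect individual values, since a NaN can also come from missing data in the refused rows themselves. All names ending in `_toy` and all values are assumptions for illustration.

```python
import pandas as pd

# Toy aggregated tables indexed by SK_ID_CURR (values are made up).
prev_agg_toy = pd.DataFrame(
    {"PREV_AMT_CREDIT_MEAN": [1.0, 2.0, 3.0]},
    index=pd.Index([100001, 100002, 100006], name="SK_ID_CURR"),
)
refused_agg_toy = pd.DataFrame(
    {"REFUSED_AMT_CREDIT_MEAN": [3.0]},
    index=pd.Index([100006], name="SK_ID_CURR"),
)

joined = prev_agg_toy.join(refused_agg_toy, how="left")

# Clients present in the refused table versus those who are not.
has_refused = joined.index.isin(refused_agg_toy.index)
n_without_refused = int((~has_refused).sum())

# For clients without refused applications, every REFUSED_ column is NaN.
refused_cols = [c for c in joined.columns if c.startswith("REFUSED_")]
all_nan = bool(joined.loc[~has_refused, refused_cols].isna().all().all())

print(n_without_refused, all_nan)
```

How these NaN values should be handled (left as-is, filled, or flagged with an indicator column) depends on the model used in the next exercise.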


In [77]:
# 7.1 Our final table:

prev_agg.shape     # (338857, 249)
prev_agg.head()

(338857, 249)

Unnamed: 0_level_0,PREV_AMT_ANNUITY_MIN,PREV_AMT_ANNUITY_MAX,PREV_AMT_ANNUITY_MEAN,PREV_AMT_APPLICATION_MIN,PREV_AMT_APPLICATION_MAX,PREV_AMT_APPLICATION_MEAN,PREV_AMT_CREDIT_MIN,PREV_AMT_CREDIT_MAX,PREV_AMT_CREDIT_MEAN,PREV_APP_CREDIT_PERC_MIN,PREV_APP_CREDIT_PERC_MAX,PREV_APP_CREDIT_PERC_MEAN,PREV_APP_CREDIT_PERC_VAR,PREV_AMT_DOWN_PAYMENT_MIN,PREV_AMT_DOWN_PAYMENT_MAX,PREV_AMT_DOWN_PAYMENT_MEAN,PREV_AMT_GOODS_PRICE_MIN,PREV_AMT_GOODS_PRICE_MAX,PREV_AMT_GOODS_PRICE_MEAN,PREV_HOUR_APPR_PROCESS_START_MIN,PREV_HOUR_APPR_PROCESS_START_MAX,PREV_HOUR_APPR_PROCESS_START_MEAN,PREV_RATE_DOWN_PAYMENT_MIN,PREV_RATE_DOWN_PAYMENT_MAX,PREV_RATE_DOWN_PAYMENT_MEAN,PREV_DAYS_DECISION_MIN,PREV_DAYS_DECISION_MAX,PREV_DAYS_DECISION_MEAN,PREV_CNT_PAYMENT_MEAN,PREV_CNT_PAYMENT_SUM,PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN,PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN,PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN,PREV_NAME_CONTRACT_TYPE_XNA_MEAN,PREV_NAME_CONTRACT_TYPE_nan_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN,PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN,PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN,PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN,...,APPROVED_APP_CREDIT_PERC_MAX,APPROVED_APP_CREDIT_PERC_MEAN,APPROVED_APP_CREDIT_PERC_VAR,APPROVED_AMT_DOWN_PAYMENT_MIN,APPROVED_AMT_DOWN_PAYMENT_MAX,APPROVED_AMT_DOWN_PAYMENT_MEAN,APPROVED_AMT_GOODS_PRICE_MIN,APPROVED_AMT_GOODS_PRICE_MAX,APPROVED_AMT_GOODS_PRICE_MEAN,APPROVED_HOUR_APPR_PROCESS_START_MIN,APPROVED_HOUR_APPR_PROCESS_START_MAX,APPROVED_HOUR_APPR_PROCESS_START_MEAN,APPROVED_RATE_DOWN_PAYMENT_MIN,APPROVED_RATE_DOWN_PAYMENT_MAX,APPROVED_RATE_DOWN_PAYMENT_MEAN,APPROVED_DAYS_DECISION_MIN,APPROVED_DAYS_DECISION_MAX,APPROVED_DAYS_DECISION_MEAN,APPROVED_CNT_PAYMENT_MEAN,APPROVED_CNT_PAYMENT_SUM,REFUSED_AMT_ANNUITY_MIN,REFUSED_AMT_ANNUITY_MAX,REFUSED_AMT_ANNUITY_MEAN,REFUSED_AMT_APPLICATION_MIN,REFUSED_AMT_APPLICATION_MAX,REFUSED_AMT_APPLICATION_MEAN,REFUSED_AMT_CREDIT_MIN,REFUSED_AMT_CREDIT_MAX,REFUSED_AMT_CREDIT_MEAN,REFUSED_APP_CREDIT_PERC_MIN,REFUSED_APP_CREDIT_PERC_MAX,REFUSED_APP_CREDIT_PERC_MEAN,REFUSED_APP_CREDIT_PERC_VAR,REFUSED_AMT_DOWN_PAYMENT_MIN,REFUSED_AMT_DOWN_PAYMENT_MAX,REFUSED_AMT_DOWN_PAYMENT_MEAN,REFUSED_AMT_GOODS_PRICE_MIN,REFUSED_AMT_GOODS_PRICE_MAX,REFUSED_AMT_GOODS_PRICE_MEAN,REFUSED_HOUR_APPR_PROCESS_START_MIN,REFUSED_HOUR_APPR_PROCESS_START_MAX,REFUSED_HOUR_APPR_PROCESS_START_MEAN,REFUSED_RATE_DOWN_PAYMENT_MIN,REFUSED_RATE_DOWN_PAYMENT_MAX,REFUSED_RATE_DOWN_PAYMENT_MEAN,REFUSED_DAYS_DECISION_MIN,REFUSED_DAYS_DECISION_MAX,REFUSED_DAYS_DECISION_MEAN,REFUSED_CNT_PAYMENT_MEAN,REFUSED_CNT_PAYMENT_SUM
SK_ID_CURR
100001,3951.0,3951.0,3951.0,24835.5,24835.5,24835.5,23787.0,23787.0,23787.0,1.044079,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13,13,13.0,0.104326,0.104326,0.104326,-1740,-1740,-1740.0,8.0,8.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,1.044079,1.044079,,2520.0,2520.0,2520.0,24835.5,24835.5,24835.5,13.0,13.0,13.0,0.104326,0.104326,0.104326,-1740.0,-1740.0,-1740.0,8.0,8.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
100002,9251.775,9251.775,9251.775,179055.0,179055.0,179055.0,179055.0,179055.0,179055.0,1.0,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9,9,9.0,0.0,0.0,0.0,-606,-606,-606.0,24.0,24.0,0.0,1.0,0.0,0.0,0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,1.0,1.0,,0.0,0.0,0.0,179055.0,179055.0,179055.0,9.0,9.0,9.0,0.0,0.0,0.0,-606.0,-606.0,-606.0,24.0,24.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
100003,6737.31,98356.995,56553.99,68809.5,900000.0,435436.5,68053.5,1035882.0,484191.0,0.868825,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12,17,14.666667,0.0,0.100061,0.05003,-2341,-746,-1305.0,10.0,30.0,0.333333,0.666667,0.0,0.0,0,0.333333,0.0,0.333333,0.333333,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,1.011109,0.949329,0.005324,0.0,6885.0,3442.5,68809.5,900000.0,435436.5,12.0,17.0,14.666667,0.0,0.100061,0.05003,-2341.0,-746.0,-1305.0,10.0,30.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
100004,5357.25,5357.25,5357.25,24282.0,24282.0,24282.0,20106.0,20106.0,20106.0,1.207699,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5,5,5.0,0.212008,0.212008,0.212008,-815,-815,-815.0,4.0,4.0,0.0,1.0,0.0,0.0,0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,1.207699,1.207699,,4860.0,4860.0,4860.0,24282.0,24282.0,24282.0,5.0,5.0,5.0,0.212008,0.212008,0.212008,-815.0,-815.0,-815.0,4.0,4.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
100005,4813.2,4813.2,4813.2,0.0,44617.5,22308.75,0.0,40153.5,20076.75,1.111173,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,10,11,10.5,0.108964,0.108964,0.108964,-757,-315,-536.0,12.0,12.0,0.5,0.5,0.0,0.0,0,0.5,0.0,0.0,0.0,0.5,0.0,0.0,0,0.0,1.0,0,0.0,0.0,0.0,0.0,...,1.111173,1.111173,,4464.0,4464.0,4464.0,44617.5,44617.5,44617.5,11.0,11.0,11.0,0.108964,0.108964,0.108964,-757.0,-757.0,-757.0,12.0,12.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [78]:
# 8.0 Save the results for subsequent use:
prev_agg.to_csv("processed_prev_agg.csv.zip", compression = "zip")
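For the subsequent exercise, the saved file can be read back with `SK_ID_CURR` restored as the index. The round trip below is a sketch on a made-up two-row table written to a temporary directory. The file name `toy_prev_agg.csv.zip` is hypothetical, and pandas is assumed to infer the zip compression from the file extension when reading.

```python
import os
import tempfile

import pandas as pd

# Toy table standing in for prev_agg (values are made up).
toy = pd.DataFrame(
    {"PREV_AMT_CREDIT_MEAN": [10.5, 20.0]},
    index=pd.Index([100001, 100002], name="SK_ID_CURR"),
)

# Write a zipped CSV, as in the cell above, but to a temporary location.
path = os.path.join(tempfile.mkdtemp(), "toy_prev_agg.csv.zip")
toy.to_csv(path, compression="zip")

# Read it back; the index was written as the first column, named SK_ID_CURR.
restored = pd.read_csv(path, index_col="SK_ID_CURR")
print(restored)
```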
