# Settings

## Installation

In [1]:
!pip install --upgrade autocarver



## Setting up samples

This dataset is available from the corresponding Kaggle competition: https://www.kaggle.com/competitions/GiveMeSomeCredit/

In [1]:
import pandas as pd

data_path = "GiveMeSomeCredit"

credit_data = pd.read_csv(f"{data_path}/cs-training.csv", index_col=0)
print(credit_data.shape)
credit_data.head()

(150000, 11)


Unnamed: 0,SeriousDlqin2yrs,RevolvingUtilizationOfUnsecuredLines,age,NumberOfTime30-59DaysPastDueNotWorse,DebtRatio,MonthlyIncome,NumberOfOpenCreditLinesAndLoans,NumberOfTimes90DaysLate,NumberRealEstateLoansOrLines,NumberOfTime60-89DaysPastDueNotWorse,NumberOfDependents
1,1,0.766127,45,2,0.802982,9120.0,13,0,6,0,2.0
2,0,0.957151,40,0,0.121876,2600.0,4,0,0,0,1.0
3,0,0.65818,38,1,0.085113,3042.0,2,1,0,0,0.0
4,0,0.23381,30,0,0.03605,3300.0,5,0,0,0,0.0
5,0,0.907239,49,1,0.024926,63588.0,7,0,1,0,0.0


In [2]:
from sklearn.model_selection import train_test_split

X_train, X_dev = train_test_split(credit_data, test_size=0.33, random_state=42)

## Picking columns to carve

In [3]:
X_train.dtypes

SeriousDlqin2yrs                          int64
RevolvingUtilizationOfUnsecuredLines    float64
age                                       int64
NumberOfTime30-59DaysPastDueNotWorse      int64
DebtRatio                               float64
MonthlyIncome                           float64
NumberOfOpenCreditLinesAndLoans           int64
NumberOfTimes90DaysLate                   int64
NumberRealEstateLoansOrLines              int64
NumberOfTime60-89DaysPastDueNotWorse      int64
NumberOfDependents                      float64
dtype: object

In [4]:
X_train.isna().mean()

SeriousDlqin2yrs                        0.000000
RevolvingUtilizationOfUnsecuredLines    0.000000
age                                     0.000000
NumberOfTime30-59DaysPastDueNotWorse    0.000000
DebtRatio                               0.000000
MonthlyIncome                           0.197383
NumberOfOpenCreditLinesAndLoans         0.000000
NumberOfTimes90DaysLate                 0.000000
NumberRealEstateLoansOrLines            0.000000
NumberOfTime60-89DaysPastDueNotWorse    0.000000
NumberOfDependents                      0.026129
dtype: float64

In [5]:
target = "SeriousDlqin2yrs"
quantitative_features = [feature for feature in X_train if feature != target]

# Feature processing with AutoCarver
## Fitting train samples and testing robustness

In [6]:
from AutoCarver import AutoCarver

auto_carver = AutoCarver(
    quantitative_features=quantitative_features,
    qualitative_features=[],
    sort_by='cramerv',  # Best combination according to Cramer's V
    dropna=False,  # don't group NaNs with other values; leave them to XGBoost
    min_freq=0.1,  # minimum frequency per modality
    max_n_mod=5,  # maximum number of modalities per carved feature
    copy=True,  # in order not to modify X_train directly
    pretty_print=True,  # prints nice tables
)
x_discretized = auto_carver.fit_transform(
    # specifying dataset to carve
    X_train, X_train[target],
    # specifying a dataset to test robustness
    X_dev=X_dev, y_dev=X_dev[target]
)

------
[Discretizer] Fit Quantitative Features
---
 - [QuantileDiscretizer] Fit ['age', 'NumberOfDependents', 'DebtRatio', 'RevolvingUtilizationOfUnsecuredLines', 'NumberOfTimes90DaysLate', 'NumberOfTime60-89DaysPastDueNotWorse', 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines', 'NumberOfTime30-59DaysPastDueNotWorse', 'MonthlyIncome']
 - [BaseDiscretizer] Transform Quantitative ['age', 'NumberOfDependents', 'DebtRatio', 'RevolvingUtilizationOfUnsecuredLines', 'NumberOfTimes90DaysLate', 'NumberOfTime60-89DaysPastDueNotWorse', 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines', 'NumberOfTime30-59DaysPastDueNotWorse', 'MonthlyIncome']
 - [OrdinalDiscretizer] Fit ['NumberOfDependents', 'NumberOfTime30-59DaysPastDueNotWorse', 'DebtRatio', 'RevolvingUtilizationOfUnsecuredLines', 'NumberOfTimes90DaysLate', 'NumberOfTime60-89DaysPastDueNotWorse', 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines', 'age', 'MonthlyIncome']
------
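The log above shows a quantile discretization step running before any carving. A minimal sketch of that idea, assuming that `min_freq=0.1` translates into roughly `1 / min_freq` quantile buckets with right-closed intervals (the helper names `quantile_edges` and `assign_bucket` are hypothetical, and this is not AutoCarver's internal code):

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def quantile_edges(values, min_freq):
    """Cut points that split values into about 1 / min_freq equal-sized buckets."""
    n_buckets = round(1 / min_freq)
    cuts = quantiles(values, n=n_buckets, method="inclusive")
    # duplicated cut points (heavily repeated values) collapse into one edge
    return sorted(set(cuts))


def assign_bucket(value, edges):
    """Index of the right-closed interval (edge[i-1] < x <= edge[i]) holding value."""
    return bisect_left(edges, value)


# toy feature: the integers 1..100
values = list(range(1, 101))
edges = quantile_edges(values, min_freq=0.1)
counts = Counter(assign_bucket(v, edges) for v in values)
print(len(edges), sorted(counts.items()))
```

On skewed features many cut points coincide, which is one plausible reason why the count-like columns in this dataset end up with far fewer than ten raw buckets.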

 - [BaseDiscretize

NumberOfDependents,target_rate,frequency
x <= 0.0,0.06,0.58
0.0 < x <= 1.0,0.073,0.175
1.0 < x,0.085,0.219
__NAN__,0.048,0.026

NumberOfDependents,target_rate,frequency
x <= 0.0,0.057,0.579
0.0 < x <= 1.0,0.074,0.177
1.0 < x,0.086,0.218
__NAN__,0.042,0.026


Grouping modalities   : 100%|██████████| 3/3 [00:00<?, ?it/s]
Computing associations: 100%|██████████| 3/3 [00:00<00:00, 3078.76it/s]
Testing robustness    :   0%|          | 0/3 [00:00<?, ?it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 0.0,0.06,0.58
0.0 < x <= 1.0,0.073,0.175
1.0 < x,0.085,0.219
__NAN__,0.048,0.026

Unnamed: 0,target_rate,frequency
x <= 0.0,0.057,0.579
0.0 < x <= 1.0,0.074,0.177
1.0 < x,0.086,0.218
__NAN__,0.042,0.026


------


------
[AutoCarver] Fit NumberOfTime30-59DaysPastDueNotWorse (2/10)
---

 - [AutoCarver] Raw feature distribution


NumberOfTime30-59DaysPastDueNotWorse,target_rate,frequency
x <= 0.0,0.04,0.839
0.0 < x,0.208,0.161

NumberOfTime30-59DaysPastDueNotWorse,target_rate,frequency
x <= 0.0,0.039,0.842
0.0 < x,0.207,0.158


Grouping modalities   : 100%|██████████| 1/1 [00:00<?, ?it/s]
Computing associations: 100%|██████████| 1/1 [00:00<00:00, 1001.98it/s]
Testing robustness    :   0%|          | 0/1 [00:00<?, ?it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 0.0,0.04,0.839
0.0 < x,0.208,0.161

Unnamed: 0,target_rate,frequency
x <= 0.0,0.039,0.842
0.0 < x,0.207,0.158


------


------
[AutoCarver] Fit DebtRatio (3/10)
---

 - [AutoCarver] Raw feature distribution


DebtRatio,target_rate,frequency
x <= 31.1m,0.052,0.1
31.1m < x <= 133.6m,0.07,0.1
133.6m < x <= 213.9m,0.061,0.1
213.9m < x <= 287.6m,0.054,0.1
287.6m < x <= 467.4m,0.061,0.2
467.4m < x <= 648.0m,0.088,0.1
648.0m < x <= 3.8,0.114,0.1
3.8 < x,0.056,0.2

DebtRatio,target_rate,frequency
x <= 31.1m,0.056,0.101
31.1m < x <= 133.6m,0.064,0.099
133.6m < x <= 213.9m,0.058,0.101
213.9m < x <= 287.6m,0.055,0.1
287.6m < x <= 467.4m,0.062,0.199
467.4m < x <= 648.0m,0.077,0.099
648.0m < x <= 3.8,0.115,0.1
3.8 < x,0.054,0.202


Grouping modalities   : 100%|██████████| 98/98 [00:00<00:00, 7372.02it/s]
Computing associations: 100%|██████████| 98/98 [00:00<00:00, 3828.24it/s]
Testing robustness    :   2%|▏         | 2/98 [00:00<00:00, 799.98it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 31.1m,0.059,0.4
287.6m < x <= 467.4m,0.061,0.2
467.4m < x <= 648.0m,0.088,0.1
648.0m < x <= 3.8,0.114,0.1
3.8 < x,0.056,0.2

Unnamed: 0,target_rate,frequency
x <= 31.1m,0.059,0.401
287.6m < x <= 467.4m,0.062,0.199
467.4m < x <= 648.0m,0.077,0.099
648.0m < x <= 3.8,0.115,0.1
3.8 < x,0.054,0.202


------


------
[AutoCarver] Fit RevolvingUtilizationOfUnsecuredLines (4/10)
---

 - [AutoCarver] Raw feature distribution


RevolvingUtilizationOfUnsecuredLines,target_rate,frequency
x <= 3.0m,0.025,0.1
3.0m < x <= 19.2m,0.014,0.1
19.2m < x <= 43.6m,0.014,0.1
43.6m < x <= 83.8m,0.019,0.1
83.8m < x <= 155.4m,0.025,0.1
155.4m < x <= 273.6m,0.037,0.1
273.6m < x <= 447.4m,0.053,0.1
447.4m < x <= 700.7m,0.088,0.1
700.7m < x,0.199,0.2

RevolvingUtilizationOfUnsecuredLines,target_rate,frequency
x <= 3.0m,0.025,0.1
3.0m < x <= 19.2m,0.012,0.099
19.2m < x <= 43.6m,0.014,0.103
43.6m < x <= 83.8m,0.019,0.103
83.8m < x <= 155.4m,0.022,0.1
155.4m < x <= 273.6m,0.031,0.1
273.6m < x <= 447.4m,0.052,0.099
447.4m < x <= 700.7m,0.09,0.098
700.7m < x,0.199,0.198


Grouping modalities   : 100%|██████████| 162/162 [00:00<00:00, 6461.12it/s]
Computing associations: 100%|██████████| 162/162 [00:00<00:00, 3819.84it/s]
Testing robustness    :   0%|          | 0/162 [00:00<?, ?it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 3.0m,0.02,0.5
155.4m < x <= 273.6m,0.037,0.1
273.6m < x <= 447.4m,0.053,0.1
447.4m < x <= 700.7m,0.088,0.1
700.7m < x,0.199,0.2

Unnamed: 0,target_rate,frequency
x <= 3.0m,0.018,0.504
155.4m < x <= 273.6m,0.031,0.1
273.6m < x <= 447.4m,0.052,0.099
447.4m < x <= 700.7m,0.09,0.098
700.7m < x,0.199,0.198


------


------
[AutoCarver] Fit NumberOfTimes90DaysLate (5/10)
---

 - [AutoCarver] Raw feature distribution


NumberOfTimes90DaysLate,target_rate,frequency
x <= nan,0.067,1.0

NumberOfTimes90DaysLate,target_rate,frequency
x <= nan,0.066,1.0


 - [AutoCarver] No robust combination for feature 'NumberOfTimes90DaysLate' could be found. It will be ignored. You might have to increase the size of your test sample (test sample not representative of test sample for this feature) or you should consider dropping this features.
------


------
[AutoCarver] Fit NumberOfTime60-89DaysPastDueNotWorse (6/10)
---

 - [AutoCarver] Raw feature distribution


NumberOfTime60-89DaysPastDueNotWorse,target_rate,frequency
x <= nan,0.067,1.0

NumberOfTime60-89DaysPastDueNotWorse,target_rate,frequency
x <= nan,0.066,1.0


 - [AutoCarver] No robust combination for feature 'NumberOfTime60-89DaysPastDueNotWorse' could be found. It will be ignored. You might have to increase the size of your test sample (test sample not representative of test sample for this feature) or you should consider dropping this features.
------


------
[AutoCarver] Fit NumberOfOpenCreditLinesAndLoans (7/10)
---

 - [AutoCarver] Raw feature distribution


NumberOfOpenCreditLinesAndLoans,target_rate,frequency
x <= 3.0,0.107,0.147
3.0 < x <= 5.0,0.064,0.163
5.0 < x <= 8.0,0.053,0.262
8.0 < x <= 10.0,0.061,0.141
10.0 < x <= 12.0,0.061,0.102
12.0 < x,0.067,0.185

NumberOfOpenCreditLinesAndLoans,target_rate,frequency
x <= 3.0,0.107,0.147
3.0 < x <= 5.0,0.063,0.164
5.0 < x <= 8.0,0.053,0.265
8.0 < x <= 10.0,0.055,0.138
10.0 < x <= 12.0,0.057,0.102
12.0 < x,0.066,0.183


Grouping modalities   : 100%|██████████| 30/30 [00:00<00:00, 3600.36it/s]
Computing associations: 100%|██████████| 30/30 [00:00<00:00, 3562.24it/s]
Testing robustness    :   0%|          | 0/30 [00:00<?, ?it/s]



 - [AutoCarver] Carved feature distribution


Unnamed: 0,target_rate,frequency
x <= 3.0,0.107,0.147
3.0 < x <= 5.0,0.064,0.163
5.0 < x <= 8.0,0.053,0.262
8.0 < x <= 10.0,0.061,0.243
12.0 < x,0.067,0.185

Unnamed: 0,target_rate,frequency
x <= 3.0,0.107,0.147
3.0 < x <= 5.0,0.063,0.164
5.0 < x <= 8.0,0.053,0.265
8.0 < x <= 10.0,0.056,0.24
12.0 < x,0.066,0.183


------


------
[AutoCarver] Fit NumberRealEstateLoansOrLines (8/10)
---

 - [AutoCarver] Raw feature distribution


NumberRealEstateLoansOrLines,target_rate,frequency
x <= 0.0,0.083,0.376
0.0 < x <= 1.0,0.053,0.348
1.0 < x,0.065,0.277

NumberRealEstateLoansOrLines,target_rate,frequency
x <= 0.0,0.083,0.373
0.0 < x <= 1.0,0.052,0.352
1.0 < x,0.059,0.276


Grouping modalities   : 100%|██████████| 3/3 [00:00<00:00, 1506.03it/s]
Computing associations: 100%|██████████| 3/3 [00:00<00:00, 3008.83it/s]
Testing robustness    :   0%|          | 0/3 [00:00<?, ?it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 0.0,0.083,0.376
0.0 < x <= 1.0,0.053,0.348
1.0 < x,0.065,0.277

Unnamed: 0,target_rate,frequency
x <= 0.0,0.083,0.373
0.0 < x <= 1.0,0.052,0.352
1.0 < x,0.059,0.276


------


------
[AutoCarver] Fit age (9/10)
---

 - [AutoCarver] Raw feature distribution


age,target_rate,frequency
x <= 33.0,0.115,0.114
33.0 < x <= 39.0,0.097,0.1
39.0 < x <= 48.0,0.085,0.203
48.0 < x <= 56.0,0.071,0.193
56.0 < x <= 61.0,0.051,0.112
61.0 < x,0.028,0.277

age,target_rate,frequency
x <= 33.0,0.11,0.114
33.0 < x <= 39.0,0.094,0.098
39.0 < x <= 48.0,0.082,0.204
48.0 < x <= 56.0,0.072,0.194
56.0 < x <= 61.0,0.048,0.113
61.0 < x,0.029,0.277


Grouping modalities   : 100%|██████████| 30/30 [00:00<00:00, 3313.21it/s]
Computing associations: 100%|██████████| 30/30 [00:00<00:00, 2360.64it/s]
Testing robustness    :   0%|          | 0/30 [00:00<?, ?it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 33.0,0.115,0.114
33.0 < x <= 39.0,0.089,0.304
48.0 < x <= 56.0,0.071,0.193
56.0 < x <= 61.0,0.051,0.112
61.0 < x,0.028,0.277

Unnamed: 0,target_rate,frequency
x <= 33.0,0.11,0.114
33.0 < x <= 39.0,0.086,0.302
48.0 < x <= 56.0,0.072,0.194
56.0 < x <= 61.0,0.048,0.113
61.0 < x,0.029,0.277


------


------
[AutoCarver] Fit MonthlyIncome (10/10)
---

 - [AutoCarver] Raw feature distribution


MonthlyIncome,target_rate,frequency
x <= 2.3K,0.085,0.101
2.3K < x <= 3.4K,0.097,0.101
3.4K < x <= 5.4K,0.08,0.201
5.4K < x <= 8.2K,0.061,0.199
8.2K < x <= 10.7K,0.051,0.1
10.7K < x,0.044,0.1
__NAN__,0.056,0.197

MonthlyIncome,target_rate,frequency
x <= 2.3K,0.089,0.101
2.3K < x <= 3.4K,0.098,0.101
3.4K < x <= 5.4K,0.075,0.199
5.4K < x <= 8.2K,0.06,0.199
8.2K < x <= 10.7K,0.042,0.098
10.7K < x,0.047,0.102
__NAN__,0.056,0.2


Grouping modalities   : 100%|██████████| 30/30 [00:00<00:00, 3022.63it/s]
Computing associations: 100%|██████████| 30/30 [00:00<00:00, 2698.00it/s]
Testing robustness    :   0%|          | 0/30 [00:00<?, ?it/s]


 - [AutoCarver] Carved feature distribution





Unnamed: 0,target_rate,frequency
x <= 2.3K,0.085,0.101
2.3K < x <= 3.4K,0.097,0.101
3.4K < x <= 5.4K,0.08,0.201
5.4K < x <= 8.2K,0.061,0.199
8.2K < x <= 10.7K,0.047,0.201
__NAN__,0.056,0.197

Unnamed: 0,target_rate,frequency
x <= 2.3K,0.089,0.101
2.3K < x <= 3.4K,0.098,0.101
3.4K < x <= 5.4K,0.075,0.199
5.4K < x <= 8.2K,0.06,0.199
8.2K < x <= 10.7K,0.044,0.2
__NAN__,0.056,0.2


------
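In the MonthlyIncome tables above, the two highest raw income buckets swap their target-rate order between the first and second table, and the carved feature merges them. One way to read this, assuming the robustness test requires the ranking of target rates to match on both samples (an assumption about the criterion, not a statement of AutoCarver's exact rule), is sketched below with the rates copied from those tables. The helper names are hypothetical.

```python
def rate_ranking(target_rates):
    """Indices of the groups, ordered by increasing target rate."""
    return sorted(range(len(target_rates)), key=lambda i: target_rates[i])


def is_rank_stable(train_rates, dev_rates):
    """True when both samples order the groups identically by target rate."""
    return rate_ranking(train_rates) == rate_ranking(dev_rates)


# raw MonthlyIncome buckets (NaN bucket left out): first table, then second table
raw_train = [0.085, 0.097, 0.08, 0.061, 0.051, 0.044]
raw_dev = [0.089, 0.098, 0.075, 0.06, 0.042, 0.047]

# carved MonthlyIncome buckets, after the two highest buckets were merged
carved_train = [0.085, 0.097, 0.08, 0.061, 0.047]
carved_dev = [0.089, 0.098, 0.075, 0.06, 0.044]

print(is_rank_stable(raw_train, raw_dev))
print(is_rank_stable(carved_train, carved_dev))
```

Under this reading, the raw buckets fail the check because of the swap in the last two groups, while the carved grouping passes.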

 - [BaseDiscretizer] Transform Quantitative ['age', 'NumberOfDependents', 'DebtRatio', 'RevolvingUtilizationOfUnsecuredLines', 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines', 'NumberOfTime30-59DaysPastDueNotWorse', 'MonthlyIncome']
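The `sort_by='cramerv'` setting ranks candidate groupings by Cramér's V against the target. As a rough illustration of the statistic itself, here is a plain-Python sketch without any continuity or bias correction; whether AutoCarver applies such corrections is not stated in this notebook, so treat the details as assumptions.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length sequences of categorical labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    # Pearson chi-square statistic over the full contingency table
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# toy data: bucket labels against a binary target
buckets = [0, 0, 0, 0, 1, 1, 1, 1]
perfect = [0, 0, 0, 0, 1, 1, 1, 1]    # target fully determined by the bucket
unrelated = [0, 1, 0, 1, 0, 1, 0, 1]  # same target rate in every bucket
print(cramers_v(buckets, perfect), cramers_v(buckets, unrelated))
```

A grouping whose buckets separate the target well scores close to 1, and one whose buckets all share the same target rate scores 0.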


## Inspecting Discretization

In [7]:
x_discretized[quantitative_features].head()

Unnamed: 0,RevolvingUtilizationOfUnsecuredLines,age,NumberOfTime30-59DaysPastDueNotWorse,DebtRatio,MonthlyIncome,NumberOfOpenCreditLinesAndLoans,NumberOfTimes90DaysLate,NumberRealEstateLoansOrLines,NumberOfTime60-89DaysPastDueNotWorse,NumberOfDependents
87936,0.0,4,0,4.0,,2,0,1,0,0.0
3893,3.0,0,0,2.0,2.0,1,0,0,0,0.0
41405,4.0,1,0,4.0,,2,0,1,0,0.0
91125,2.0,2,0,2.0,3.0,3,0,0,0,2.0
67373,3.0,2,3,4.0,,2,2,1,1,


In [8]:
auto_carver.summary()

feature,dtype,label,content
DebtRatio,float,0,[x <= 287.6m]
DebtRatio,float,1,[287.6m < x <= 467.4m]
DebtRatio,float,2,[467.4m < x <= 648.0m]
DebtRatio,float,3,[648.0m < x <= 3.8]
DebtRatio,float,4,[3.8 < x]
MonthlyIncome,float,0,[x <= 2.3K]
MonthlyIncome,float,1,[2.3K < x <= 3.4K]
MonthlyIncome,float,2,[3.4K < x <= 5.4K]
MonthlyIncome,float,3,[5.4K < x <= 8.2K]
MonthlyIncome,float,4,[8.2K < x]
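To make the link between this summary and the integer labels in `x_discretized` concrete, the sketch below maps raw DebtRatio values to labels by hand. It assumes that the `m` suffix in the summary means thousandths (so `287.6m` is 0.2876) and that the displayed edges are rounded versions of the fitted ones. `to_label` is a hypothetical helper, not part of AutoCarver.

```python
from bisect import bisect_left
from math import isnan

# upper edges of DebtRatio labels 0..3, read off the summary (label 4 is open-ended)
debt_ratio_edges = [0.2876, 0.4674, 0.648, 3.8]


def to_label(value, edges):
    """Label of the right-closed interval containing value; NaN stays NaN."""
    if isnan(value):
        return value
    return bisect_left(edges, value)


labels = [to_label(v, debt_ratio_edges) for v in (0.1, 0.3, 0.5, 1.0, 10.0)]
print(labels)  # [0, 1, 2, 3, 4]
```

`bisect_left` is used because the intervals are closed on the right: a value equal to an edge belongs to the lower label.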


## Saving for later use

In [9]:
import json

# storing as json file
with open('my_carver.json', 'w') as my_carver_json:
    json.dump(auto_carver.to_json(), my_carver_json)
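A quick sanity check after saving is to read the file back and compare it with what was written. The sketch below uses a hypothetical stand-in dictionary in place of the real `auto_carver.to_json()` output, and a temporary directory, so that it runs on its own. It only demonstrates the JSON round trip, not how to rebuild a carver from the file.

```python
import json
import os
import tempfile

# hypothetical stand-in for the dictionary returned by auto_carver.to_json()
payload = {"DebtRatio": {"edges": [0.2876, 0.4674, 0.648, 3.8]}}

path = os.path.join(tempfile.mkdtemp(), "my_carver.json")
with open(path, "w") as carver_file:
    json.dump(payload, carver_file)

# reading the file back should give an equal dictionary
with open(path) as carver_file:
    restored = json.load(carver_file)

print(restored == payload)
```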

# Feature Selection

## Setting up measures and filters

In [10]:
from AutoCarver.feature_selection import FeatureSelector

n_best = 10  # number of features to select

feature_selector = FeatureSelector(
    quantitative_features=quantitative_features, 
    n_best=n_best,
    pretty_print=True,
)
best_features = feature_selector.select(x_discretized, x_discretized[target])

------
[FeatureSelector] Selecting from Features: ['age', 'NumberOfDependents', 'DebtRatio', 'RevolvingUtilizationOfUnsecuredLines', 'NumberOfTimes90DaysLate', 'NumberOfTime60-89DaysPastDueNotWorse', 'NumberOfOpenCreditLinesAndLoans', 'NumberRealEstateLoansOrLines', 'NumberOfTime30-59DaysPastDueNotWorse', 'MonthlyIncome']
---

 - Association between X and y


Unnamed: 0,dtype,pct_nan,pct_mode,mode,kruskal_measure
NumberOfTimes90DaysLate,int64,0.0,0.94396,0.0,11854.250713
NumberOfTime60-89DaysPastDueNotWorse,int64,0.0,0.949254,0.0,7636.377475
RevolvingUtilizationOfUnsecuredLines,float64,0.0,0.5,0.0,6158.299923
NumberOfTime30-59DaysPastDueNotWorse,int64,0.0,0.839035,0.0,6076.412797
age,int64,0.0,0.303592,1.0,1370.032527
MonthlyIncome,float64,0.197383,0.200945,2.0,337.382149
NumberOfDependents,float64,0.026129,0.579741,0.0,175.384642
NumberOfOpenCreditLinesAndLoans,int64,0.0,0.261602,2.0,121.575254
NumberRealEstateLoansOrLines,int64,0.0,0.375512,0.0,120.839911
DebtRatio,float64,0.0,0.4,0.0,47.123007



 - Association between X and y, filtered for inter-feature assocation


Unnamed: 0,dtype,pct_nan,pct_mode,mode,kruskal_measure
NumberOfTimes90DaysLate,int64,0.0,0.94396,0.0,11854.250713
NumberOfTime60-89DaysPastDueNotWorse,int64,0.0,0.949254,0.0,7636.377475
RevolvingUtilizationOfUnsecuredLines,float64,0.0,0.5,0.0,6158.299923
NumberOfTime30-59DaysPastDueNotWorse,int64,0.0,0.839035,0.0,6076.412797
age,int64,0.0,0.303592,1.0,1370.032527
MonthlyIncome,float64,0.197383,0.200945,2.0,337.382149
NumberOfDependents,float64,0.026129,0.579741,0.0,175.384642
NumberOfOpenCreditLinesAndLoans,int64,0.0,0.261602,2.0,121.575254
NumberRealEstateLoansOrLines,int64,0.0,0.375512,0.0,120.839911
DebtRatio,float64,0.0,0.4,0.0,47.123007


------



['NumberOfTimes90DaysLate',
 'NumberOfTime60-89DaysPastDueNotWorse',
 'RevolvingUtilizationOfUnsecuredLines',
 'NumberOfTime30-59DaysPastDueNotWorse',
 'age',
 'MonthlyIncome',
 'NumberOfDependents',
 'NumberOfOpenCreditLinesAndLoans',
 'NumberRealEstateLoansOrLines',
 'DebtRatio']
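The `kruskal_measure` column ranks features by a Kruskal-Wallis statistic. As a sketch of what such a statistic measures, assuming the feature values are split into one group per target class (how FeatureSelector builds its groups is an assumption here), a plain-Python version of the H statistic with tie correction could look like this:

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def kruskal_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


# toy data: carved labels of one feature, split by target class
non_defaulters = [0, 0, 0, 1, 0, 1]
defaulters = [2, 3, 1, 4]
h_stat = kruskal_h(non_defaulters, defaulters)
print(round(h_stat, 3))
```

The larger the statistic, the more the distribution of the feature differs between the two target classes, which is why the late-payment counters come out on top of the table above.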

### Enjoy modeling!