# Objective

Register the credit risk features in the Feast registry database.

## Dataset

The dataset consists of the following 34 credit risk feature columns:

```
'purpose_business',
'purpose_car',
'purpose_domestic appliances',
'purpose_education',
'purpose_furniture/equipment',
'purpose_radio/TV',
'purpose_repairs',
'purpose_vacation/others',
'gender_female',
'gender_male',
'property_free',
'property_own',
'property_rent',
'savings_little',
'savings_moderate',
'savings_no_inf',
'savings_quite rich',
'savings_rich',
'check_little',
'check_moderate',
'check_no_inf',
'check_rich',
'generation_Student',
'generation_Young',
'generation_Adult',
'generation_Senior',
'job_0',
'job_1',
'job_2',
'job_3',
'amount_0',
'amount_1',
'amount_2',
'amount_3'
```
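The column names follow a `<variable>_<category>` pattern, which suggests one-hot encoding of eight categorical variables. As a minimal sketch (the helper `group_by_prefix` is hypothetical and not part of this notebook), the names can be grouped by the text before the first underscore to see how many categories each variable has:

```python
from collections import defaultdict

FEATURE_NAMES = [
    "purpose_business", "purpose_car", "purpose_domestic appliances",
    "purpose_education", "purpose_furniture/equipment", "purpose_radio/TV",
    "purpose_repairs", "purpose_vacation/others",
    "gender_female", "gender_male",
    "property_free", "property_own", "property_rent",
    "savings_little", "savings_moderate", "savings_no_inf",
    "savings_quite rich", "savings_rich",
    "check_little", "check_moderate", "check_no_inf", "check_rich",
    "generation_Student", "generation_Young", "generation_Adult",
    "generation_Senior",
    "job_0", "job_1", "job_2", "job_3",
    "amount_0", "amount_1", "amount_2", "amount_3",
]


def group_by_prefix(names):
    """Group feature names by the text before the first underscore."""
    groups = defaultdict(list)
    for name in names:
        # partition splits on the first "_" only, so "savings_no_inf"
        # becomes ("savings", "_", "no_inf").
        prefix, _, category = name.partition("_")
        groups[prefix].append(category)
    return dict(groups)


groups = group_by_prefix(FEATURE_NAMES)
for prefix, categories in groups.items():
    print(f"{prefix}: {len(categories)} categories")
```

Splitting on the first underscore only keeps category names such as `no_inf` intact.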

## Note

On macOS, install `psycopg2-binary`, which provides the `psycopg2` PostgreSQL driver.
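One way to confirm the driver is available before running the rest of the notebook is to look the module up without importing it. This is a minimal sketch; the helper name `is_installed` is an assumption introduced here for illustration:

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The psycopg2-binary package installs the importable module "psycopg2".
if not is_installed("psycopg2"):
    print("psycopg2 not found; install it with: pip install psycopg2-binary")
```

Inside a notebook, `%pip install psycopg2-binary` installs the package into the environment of the running kernel.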

# Setup

In [13]:
import json
import logging
import os
import sys
import warnings
from datetime import datetime

import pandas as pd
import numpy as np

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
warnings.filterwarnings('ignore')

# Show full DataFrame contents without truncation
pd.set_option('display.max_colwidth', None)   # Do not shorten long column values
pd.set_option('display.max_seq_items', None)  # Do not truncate long sequences
pd.set_option('display.expand_frame_repr', False)  # Do not wrap wide DataFrames across multiple lines

In [18]:
%load_ext autoreload
%autoreload 2

from psql import (
    batch_insert_with_progress,
    exists_table,
    get_all_tables,
    truncate,
    select_one,
)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Features for Model Consumption

Verify the features that model training will consume.


In [3]:
df = pd.read_csv(
    "../data/processed/customer_credit_risk_features.csv"
)
df.head()

Unnamed: 0,purpose_business,purpose_car,purpose_domestic appliances,purpose_education,purpose_furniture/equipment,purpose_radio/TV,purpose_repairs,purpose_vacation/others,gender_female,gender_male,...,generation_Adult,generation_Senior,job_0,job_1,job_2,job_3,amount_0,amount_1,amount_2,amount_3
0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,...,0.0,1.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
2,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0
4,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,...,1.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0
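
Beyond inspecting `df.head()` by eye, the verification can be made explicit by comparing the loaded column names with the expected feature list. The following is a sketch under assumptions: `check_feature_columns` is a hypothetical helper, and the short lists are illustrative stand-ins for `df.columns` and the full feature list.

```python
def check_feature_columns(columns, expected):
    """Return (missing, unexpected) names, comparing `columns` to `expected`."""
    actual = set(columns)
    wanted = set(expected)
    missing = [name for name in expected if name not in actual]
    unexpected = [name for name in columns if name not in wanted]
    return missing, unexpected


# Illustrative stand-ins; in the notebook, pass df.columns and the full list.
loaded_columns = ["gender_female", "gender_male", "extra_column"]
expected_columns = ["gender_female", "gender_male", "property_own"]

missing, unexpected = check_feature_columns(loaded_columns, expected_columns)
print("missing:", missing)        # missing: ['property_own']
print("unexpected:", unexpected)  # unexpected: ['extra_column']
```

Two empty lists mean the loaded columns match the expected features exactly by name.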


In [4]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 34 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   purpose_business             1000 non-null   float64
 1   purpose_car                  1000 non-null   float64
 2   purpose_domestic appliances  1000 non-null   float64
 3   purpose_education            1000 non-null   float64
 4   purpose_furniture/equipment  1000 non-null   float64
 5   purpose_radio/TV             1000 non-null   float64
 6   purpose_repairs              1000 non-null   float64
 7   purpose_vacation/others      1000 non-null   float64
 8   gender_female                1000 non-null   float64
 9   gender_male                  1000 non-null   float64
 10  property_free                1000 non-null   float64
 11  property_own                 1000 non-null   float64
 12  property_rent                1000 non-null   float64
 13  savings_little     