# SQL TABLES SETUP
This project uses a set of tables from a Kaggle dataset intended for examining the default risk of mortgage applicants.

This notebook builds the SQL statements that create new database tables from the existing .csv files. Some of these files contain 100+ fields. To avoid typing out the table schemas by hand, the notebook imports small segments of each .csv file and uses them to help compose the CREATE TABLE statements. The statements are meant to be run against PostgreSQL from the command line rather than through a GUI.
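The approach described above can be sketched in a few lines. This is a minimal illustration, not the notebook's own code: the helper `compose_create_table`, the `PG_TYPES` mapping, the table name, and the three-column sample data are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical mapping from pandas dtype names to PostgreSQL column types.
# Any dtype not listed here falls back to VARCHAR (an assumption of this sketch).
PG_TYPES = {"int64": "INT", "float64": "DECIMAL", "object": "VARCHAR"}


def compose_create_table(df, table_name):
    """Return a CREATE TABLE statement with one column per DataFrame field."""
    columns = [
        f"    {name} {PG_TYPES.get(str(dtype), 'VARCHAR')}"
        for name, dtype in df.dtypes.items()
    ]
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"


# Made-up sample standing in for a real .csv file.
sample_csv = io.StringIO(
    "SK_ID_CURR,AMT_CREDIT,CODE_GENDER\n"
    "100002,406597.5,M\n"
    "100003,1293502.5,F\n"
)

# Only a small segment is needed to infer the column types.
sample_df = pd.read_csv(sample_csv, nrows=10)
statement = compose_create_table(sample_df, "application")
print(statement)
```

Joining the column definitions with `",\n"` avoids the trailing comma that would otherwise have to be removed by hand before running the statement.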

In [16]:
## Common Python Modules:
import pandas as pd
import numpy as np

## 1) APPLICATION TABLE

### Import Data into Pandas:
Errors occurred when trying to import the .csv file directly into PostgreSQL. For this reason, Pandas is used to preprocess the data and cast values correctly before loading it into SQL.
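The notebook does not show which casts were applied, so the following is only one plausible example of such preprocessing. When a column of whole numbers contains missing values, pandas reads it as `float64`; the hypothetical helper `cast_whole_floats` converts such columns to the nullable `Int64` dtype. The sample data and column names are made up.

```python
import numpy as np
import pandas as pd


def cast_whole_floats(df):
    """Return a copy where float columns holding only whole numbers
    (ignoring missing values) are cast to the nullable Int64 dtype."""
    out = df.copy()
    for col in out.select_dtypes(include="float").columns:
        values = out[col].dropna()
        # Cast only when every non-missing value is a whole number.
        if (values == values.round()).all():
            out[col] = out[col].astype("Int64")
    return out


# Made-up sample: the missing value forces CNT to be stored as float64.
sample_df = pd.DataFrame(
    {"CNT": [1.0, 2.0, np.nan], "AMT": [10.5, 20.0, np.nan]}
)
clean_df = cast_whole_floats(sample_df)
print(clean_df.dtypes)
```

Columns with genuine fractional values, such as `AMT` here, are left as floats.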

In [17]:
## Read application_train .csv file
application_path = 'data_files/home-credit-default-risk/application_train.csv'
application_df = pd.read_csv(application_path)

In [18]:
## Create Pandas Series of all field data types
application_dtypes = pd.Series(application_df.dtypes)

## Create a list of tuples with every field in `application_df` and its corresponding data type
application_schema = list(zip(application_dtypes.index.values, application_dtypes.values))

### SQL code for dataframe fields:
Print the list of fields in `application_df` with their respective data types. This output will be used to create a new table in PostgreSQL.

In [19]:
## Print each field with the SQL type that matches its pandas data type
for field, dtype in application_schema:

    if dtype == 'int64':
        print(field, "INT, ")

    elif dtype == 'O':
        print(field, "VARCHAR, ")

    elif dtype == 'float64':
        print(field, "DECIMAL, ")

SK_ID_CURR INT, 
TARGET INT, 
NAME_CONTRACT_TYPE VARCHAR, 
CODE_GENDER VARCHAR, 
FLAG_OWN_CAR VARCHAR, 
FLAG_OWN_REALTY VARCHAR, 
CNT_CHILDREN INT, 
AMT_INCOME_TOTAL DECIMAL, 
AMT_CREDIT DECIMAL, 
AMT_ANNUITY DECIMAL, 
AMT_GOODS_PRICE DECIMAL, 
NAME_TYPE_SUITE VARCHAR, 
NAME_INCOME_TYPE VARCHAR, 
NAME_EDUCATION_TYPE VARCHAR, 
NAME_FAMILY_STATUS VARCHAR, 
NAME_HOUSING_TYPE VARCHAR, 
REGION_POPULATION_RELATIVE DECIMAL, 
DAYS_BIRTH INT, 
DAYS_EMPLOYED INT, 
DAYS_REGISTRATION DECIMAL, 
DAYS_ID_PUBLISH INT, 
OWN_CAR_AGE DECIMAL, 
FLAG_MOBIL INT, 
FLAG_EMP_PHONE INT, 
FLAG_WORK_PHONE INT, 
FLAG_CONT_MOBILE INT, 
FLAG_PHONE INT, 
FLAG_EMAIL INT, 
OCCUPATION_TYPE VARCHAR, 
CNT_FAM_MEMBERS DECIMAL, 
REGION_RATING_CLIENT INT, 
REGION_RATING_CLIENT_W_CITY INT, 
WEEKDAY_APPR_PROCESS_START VARCHAR, 
HOUR_APPR_PROCESS_START INT, 
REG_REGION_NOT_LIVE_REGION INT, 
REG_REGION_NOT_WORK_REGION INT, 
LIVE_REGION_NOT_WORK_REGION INT, 
REG_CITY_NOT_LIVE_CITY INT, 
REG_CITY_NOT_WORK_CITY INT, 
LIVE_CITY_NOT_

## 2) BUREAU DATA TABLE

In [20]:
## Read bureau.csv file
## Limit load to first 10 rows
## Only need this data for creating SQL schema

bureau_path = 'data_files/home-credit-default-risk/bureau.csv'
bureau_df = pd.read_csv(bureau_path, nrows=10)
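One caveat with inferring a schema from only the first rows: a column that looks like an integer in the sample may contain missing values further down, in which case pandas would read the full column as float. The sketch below illustrates this with made-up data; `CNT_EXAMPLE` is a hypothetical column, not one from bureau.csv.

```python
import io

import pandas as pd

# Made-up CSV: the third row has a missing value in CNT_EXAMPLE.
csv_text = "SK_ID_BUREAU,CNT_EXAMPLE\n1,5\n2,7\n3,\n"

# Reading only the first two rows: no missing values are seen.
head_df = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Reading every row: the missing value forces a float column.
full_df = pd.read_csv(io.StringIO(csv_text))

print(head_df["CNT_EXAMPLE"].dtype)  # int64
print(full_df["CNT_EXAMPLE"].dtype)  # float64
```

If the generated types are used for loading the full file, it may be worth spot-checking columns like this before creating the table.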

In [21]:
## Create Pandas Series of all field data types
bureau_dtypes = pd.Series(bureau_df.dtypes)

## Create a list of tuples with every field in `bureau_df` and its corresponding data type
bureau_schema = list(zip(bureau_dtypes.index.values, bureau_dtypes.values))

In [22]:
## Print each field with the SQL type that matches its pandas data type
for field, dtype in bureau_schema:

    if dtype == 'int64':
        print(field, "INT, ")

    elif dtype == 'O':
        print(field, "VARCHAR, ")

    elif dtype == 'float64':
        print(field, "DECIMAL, ")

SK_ID_CURR INT, 
SK_ID_BUREAU INT, 
CREDIT_ACTIVE VARCHAR, 
CREDIT_CURRENCY VARCHAR, 
DAYS_CREDIT INT, 
CREDIT_DAY_OVERDUE INT, 
DAYS_CREDIT_ENDDATE DECIMAL, 
DAYS_ENDDATE_FACT DECIMAL, 
AMT_CREDIT_MAX_OVERDUE DECIMAL, 
CNT_CREDIT_PROLONG INT, 
AMT_CREDIT_SUM DECIMAL, 
AMT_CREDIT_SUM_DEBT DECIMAL, 
AMT_CREDIT_SUM_LIMIT DECIMAL, 
AMT_CREDIT_SUM_OVERDUE DECIMAL, 
CREDIT_TYPE VARCHAR, 
DAYS_CREDIT_UPDATE INT, 
AMT_ANNUITY DECIMAL, 


## 3) ADDITIONAL TABLES
Use the code above to repeat the import process for the following additional tables:
- bureau_balance.csv
- credit_card_balance.csv
- installments_payments.csv
- POS_CASH_balance.csv
- previous_application.csv
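Rather than copying the cells once per file, the same steps can be wrapped in a function and run in a loop. This is a minimal sketch: the helper `schema_lines` and the `PG_TYPES` mapping are hypothetical names, the directory is assumed to be the same one used for the earlier files, and files that are not found are skipped.

```python
import os

import pandas as pd

# Assumed to be the same directory used for application_train.csv and bureau.csv.
DATA_DIR = 'data_files/home-credit-default-risk'

ADDITIONAL_FILES = [
    'bureau_balance.csv',
    'credit_card_balance.csv',
    'installments_payments.csv',
    'POS_CASH_balance.csv',
    'previous_application.csv',
]

# Hypothetical dtype-to-SQL mapping; unlisted dtypes fall back to VARCHAR.
PG_TYPES = {'int64': 'INT', 'float64': 'DECIMAL', 'object': 'VARCHAR'}


def schema_lines(source, nrows=10):
    """Read the first `nrows` rows of a CSV (path or file-like object)
    and return one 'FIELD TYPE, ' line per column."""
    df = pd.read_csv(source, nrows=nrows)
    return [
        f"{name} {PG_TYPES.get(str(dtype), 'VARCHAR')}, "
        for name, dtype in df.dtypes.items()
    ]


for file_name in ADDITIONAL_FILES:
    csv_path = os.path.join(DATA_DIR, file_name)
    if not os.path.exists(csv_path):
        print(f"-- {file_name}: not found, skipped")
        continue
    print(f"-- {file_name}")
    print("\n".join(schema_lines(csv_path)))
```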