**Store HMDA Data in PostgreSQL by Year**

**This script creates a function which will enable the team to store each year of HMDA Data consistently**

Requires the raw CSV file to be loaded into PostgreSQL first.  That load is kept in a separate script because PostgreSQL has trouble loading null fields that appear as '' in the original CSV file.  Reference Link: https://www.postgresql.org/message-id/4C882E8E.6080301%40postnewspapers.com.au
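
One possible way to clean up the '' values after the raw load is to turn them into NULLs with `NULLIF` before any casting. The sketch below only builds the SQL string; the helper name `build_nullif_update` is hypothetical, and it assumes the raw columns were loaded as text.

```python
def build_nullif_update(schema, table, columns):
    """Return an UPDATE statement that replaces '' with NULL in the given text columns."""
    if not columns:
        raise ValueError('at least one column is required')
    # NULLIF(col, '') yields NULL when col equals the empty string, otherwise col unchanged.
    assignments = ', '.join(f"{col} = NULLIF({col}, '')" for col in columns)
    return f'UPDATE {schema}.{table} SET {assignments};'


# Example with a table and column that appear later in this notebook.
stmt = build_nullif_update('public', 'hmda_lar_2017_allrecords', ['rate_spread'])
print(stmt)
```

The schema, table, and column names are interpolated without escaping, so this should only be used with trusted, hard-coded identifiers.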

_______
*To-Do:* 
Create the following objects, and implement the following syntax.
    - schema v__macro_economic_indicators
    - table populationestimates__usda_ers_2010_to_2018 
    - table education__acs_1970_to_2017_5yravgs 

    GRANT USAGE ON SCHEMA v__macro_economic_indicators TO reporting_user;
    GRANT SELECT ON TABLE v__macro_economic_indicators.populationestimates__usda_ers_2010_to_2018 TO reporting_user;
    GRANT SELECT ON TABLE v__macro_economic_indicators.education__acs_1970_to_2017_5yravgs        TO reporting_user;

*Notes from initial SQL File, determine whether to keep these in:
--> A. Finish uploading the rest of "main" dataset into schema "usa_mortgage_market";
--> B.1 Write a script that keeps only the variables we want (i.e. labels and codes);
--> B.2 Cast these variables and combine into one time series dataset.*

In [None]:
import os
import psycopg2
import pandas as pd
import pandas.io.sql as psql
import sqlalchemy
from sqlalchemy import create_engine

**Create PostgreSQL engine**

In [None]:
# Postgres username, password, and database name.
postgres_host = ''  
postgres_port = '5432' 
postgres_username = '' 
postgres_password = ''
postgres_dbname = ''
postgres_str = ('postgresql://{username}:{password}@{host}:{port}/{dbname}'
                .format(username = postgres_username,
                        password = postgres_password,
                        host = postgres_host,
                        port = postgres_port,
                        dbname = postgres_dbname)
               )


# Creating the connection.
cnx = create_engine(postgres_str)
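
A password containing characters such as `@` or `/` can break the connection string above. A minimal sketch of one way around that, assuming the credentials are plain strings, is to URL-encode the username and password with the standard library's `quote_plus`. The helper name `build_postgres_url` and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_url(username, password, host, port, dbname):
    """Build a postgresql:// URL with the username and password URL-encoded."""
    return (f'postgresql://{quote_plus(username)}:{quote_plus(password)}'
            f'@{host}:{port}/{dbname}')


# Hypothetical credentials, used only to show the encoding.
url = build_postgres_url('analyst', 'p@ss/word', 'localhost', '5432', 'hmda')
print(url)
```

The resulting string can be passed to `create_engine` in place of `postgres_str`.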

**Create Python objects for key tables**

In [None]:
hmda_schema = 'public'
tables_a = ['17', '16', '15', '14']
tables_t = ['13', '12', '11', '10']
tables_b = ['09','08']
year = tables_a[0]
test_table = f'hmda_lar_20{year}_allrecords'

path = ''
sourcedata = f'{path}{test_table}.csv'

print(sourcedata)
print(test_table)
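
The cell above only builds the name for the first year in `tables_a`. A small sketch of how the same naming pattern could be applied to every year in the three lists, assuming all years follow the `hmda_lar_20YY_allrecords` pattern (the helper `table_name` is hypothetical):

```python
tables_a = ['17', '16', '15', '14']
tables_t = ['13', '12', '11', '10']
tables_b = ['09', '08']


def table_name(year_suffix):
    """Return the raw table name for a two-digit year suffix such as '17'."""
    return f'hmda_lar_20{year_suffix}_allrecords'


# One table name per year, newest first.
all_tables = [table_name(y) for y in tables_a + tables_t + tables_b]
for name in all_tables:
    print(name)
```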

**I. Store codes with labels in a "usa_mortgage_code_keys" schema for HMDA data tables.**

Original demarcation:
END I. Store codes with labels in a "code_keys" schema [ 2010 - 2017 ]

In [None]:
cnx.execute('DROP SCHEMA IF EXISTS usa_mortgage_code_keys CASCADE;')

In [None]:
sql_code_keys = '''

            CREATE SCHEMA usa_mortgage_code_keys ;

            CREATE TABLE
                usa_mortgage_code_keys.acts_codes
                ( code int PRIMARY KEY,
                  name varchar(60) NOT NULL
                )
            ;
            INSERT INTO usa_mortgage_code_keys.acts_codes (code, name)
            VALUES (1, 'Loan originated'),
                   (2, 'Application approved but not accepted'),
                   (3, 'Application denied by financial institution'),
                   (4, 'Application withdrawn by applicant'),
                   (5, 'File closed for incompleteness'),
                   (6, 'Loan purchased by the institution'),
                   (7, 'Preapproval request denied by financial institution'),
                   (8, 'Preapproval request approved but not accepted')
            ;
           
            CREATE TABLE
                usa_mortgage_code_keys.agency_codes
                ( code int PRIMARY KEY,
                  name varchar(60) NOT NULL,
                  abbr varchar(5) NOT NULL
                )
            ;
            INSERT INTO usa_mortgage_code_keys.agency_codes (code, name, abbr )
            VALUES (1, 'Office of the Comptroller of the Currency', 'OCC'),
                   (2, 'Federal Reserve System', 'FRS'),
                   (3, 'Federal Deposit Insurance Corporation', 'FDIC'),
                   (5, 'National Credit Union Administration', 'NCUA'),
                   (7, 'Department of Housing and Urban Development', 'HUD'),
                   (9, 'Consumer Financial Protection Bureau', 'CFPB')
            ;
            '''

cnx.execute(sql_code_keys)
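
Before running the inserts, it can help to confirm that every label fits the declared column widths (`varchar(60)` for names, `varchar(5)` for abbreviations). The sketch below copies the values from the SQL above into plain Python dictionaries (with OCC spelled with the letter O); the helper `too_long` is hypothetical.

```python
acts_codes = {
    1: 'Loan originated',
    2: 'Application approved but not accepted',
    3: 'Application denied by financial institution',
    4: 'Application withdrawn by applicant',
    5: 'File closed for incompleteness',
    6: 'Loan purchased by the institution',
    7: 'Preapproval request denied by financial institution',
    8: 'Preapproval request approved but not accepted',
}
agency_codes = {
    1: ('Office of the Comptroller of the Currency', 'OCC'),
    2: ('Federal Reserve System', 'FRS'),
    3: ('Federal Deposit Insurance Corporation', 'FDIC'),
    5: ('National Credit Union Administration', 'NCUA'),
    7: ('Department of Housing and Urban Development', 'HUD'),
    9: ('Consumer Financial Protection Bureau', 'CFPB'),
}


def too_long(values, limit):
    """Return the values whose length exceeds the column width."""
    return [v for v in values if len(v) > limit]


names = list(acts_codes.values()) + [name for name, _ in agency_codes.values()]
long_names = too_long(names, 60)
long_abbrs = too_long([abbr for _, abbr in agency_codes.values()], 5)
print(long_names, long_abbrs)
```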

In [None]:
list_schemas = 'SELECT nspname FROM pg_catalog.pg_namespace;'
result = pd.read_sql_query(list_schemas,cnx)
print(result)

**ISSUE: Not sure how to handle the next two cells; the select statements need supporting syntax for agency_code, action_taken_codes to link back to the new schema above.  Confirm how best to handle this.**

In [None]:
#To-Do: Determine whether we need this cell.  Note agency_code designation.
sql_test_schema = f'''select distinct {test_table}.agency_code, 
{test_table}.agency_abbr, 
{test_table}.agency_name 
from public.{test_table};'''
df_test_schema = pd.read_sql_query(sql_test_schema, cnx)
print(df_test_schema)
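
One way the raw table could be linked back to the new key schema is a LEFT JOIN that lists any `agency_code` values with no match in `usa_mortgage_code_keys.agency_codes`. This is only a sketch of that idea: the helper name `build_unmatched_agency_query` is hypothetical, and the cast assumes `agency_code` was loaded as text containing only integers.

```python
def build_unmatched_agency_query(raw_schema, raw_table):
    """Return SQL listing agency codes in the raw table that are missing from the key table."""
    return f'''
        SELECT DISTINCT src.agency_code
        FROM {raw_schema}.{raw_table} AS src
        LEFT JOIN usa_mortgage_code_keys.agency_codes AS keys
               ON CAST(src.agency_code AS int) = keys.code
        WHERE keys.code IS NULL;
    '''


unmatched_sql = build_unmatched_agency_query('public', 'hmda_lar_2017_allrecords')
print(unmatched_sql)
```

An empty result from this query would suggest that every agency code in the raw data is covered by the key table.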

In [None]:
#To-Do: Determine how to handle this cell.  We need this one for the action_taken_name codes.
# Assumes the raw table carries a numeric action_taken column next to action_taken_name.
sql_action_taken_codes = f'''WITH action_taken_codes AS (
                SELECT DISTINCT {test_table}.action_taken AS act_code,
                CAST({test_table}.action_taken_name AS varchar(60)) AS action_taken_nm,
                'action_taken' AS cat
                FROM public.{test_table}
                )
                SELECT DISTINCT act_code, action_taken_nm
                INTO usa_mortgage_code_keys.hmda_act_taken_codes
                FROM action_taken_codes;'''
cnx.execute(sql_action_taken_codes)

**II. Re-structuring: Assess and Execute Variable Changes -- Casting and UNION ALLs**

**NOTE: need to review how to handle the first, second and third cells in this section.  It seems like we don't need them.**

In [None]:
#To-Do: Determine if we need this cell.
# The SQL is kept in a string and not executed: the first query is incomplete
# (its column list stops at "us07.").
sql_scratch = '''
SELECT us07.action_taken As act_taken, us07.
    FROM usa_mortgage_market.hmda_lar_2007 us07

Select * From paddle_loan_canoe.usa_mortgage_market.hmda_lar__2017 go;
'''

In [None]:
#To-Do: Determine if we can delete this cell.
sql_var_formatting = f'''SELECT CAST({test_table}.action_taken_name As varchar(56)) As outcome, {test_table}.as_of_year As year,
       CAST({test_table}.denial_reason_name_1 As varchar(56)) dn_reason1 , CAST({test_table}.agency_name As varchar(56)) As agency,
       CAST({test_table}.state_name As varchar(28)) As state,         CAST({test_table}.county_name As varchar(56)) As county,
       CAST({test_table}.loan_type_name As varchar(56)) As ln_type,   CAST({test_table}.loan_purpose_name As varchar(56)) As ln_purp, 
       {test_table}.loan_amount_000s As ln_amt_000s, {test_table}.hud_median_family_income As hud_med_fm_inc, population as pop,
       CAST ( CAST ( CASE
                         WHEN {test_table}.rate_spread = '' Then '0'
                         ELSE {test_table}.rate_spread
                      END As varchar(5)
                   ) As numeric
             )
       As rt_spread
From public.{test_table}
;'''
cnx.execute(sql_var_formatting)

In [None]:
#To-Do: Determine if we need this cell.
sql_distinct_actions = f'select distinct action_taken_name from usa_mortgage_market.{test_table};'
sql_distinct_types = f'select distinct loan_type_name, property_type_name from usa_mortgage_market.{test_table};'
print(pd.read_sql_query(sql_distinct_actions, cnx))
print(pd.read_sql_query(sql_distinct_types, cnx))

**Create Roles**

Reference link for creating roles:
https://aws.amazon.com/blogs/database/managing-postgresql-users-and-roles/

In [None]:
sql_user_roles = f'''CREATE ROLE reporting_user WITH LOGIN PASSWORD 'team_loan_canoe2019' ;
GRANT CONNECT ON DATABASE postgres TO reporting_user;
GRANT USAGE ON SCHEMA public   TO reporting_user;
GRANT SELECT ON TABLE public.{test_table} TO reporting_user;'''
cnx.execute(sql_user_roles)
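
The same grant pattern is needed for the `v__macro_economic_indicators` objects in the To-Do list at the top. A small sketch of a helper that generates those statements from a schema, a list of tables, and a role (the name `build_grants` is hypothetical; the identifiers come from the To-Do list):

```python
def build_grants(schema, tables, role):
    """Return GRANT statements giving a role read access to the listed tables."""
    statements = [f'GRANT USAGE ON SCHEMA {schema} TO {role};']
    statements += [f'GRANT SELECT ON TABLE {schema}.{table} TO {role};'
                   for table in tables]
    return statements


grants = build_grants(
    'v__macro_economic_indicators',
    ['populationestimates__usda_ers_2010_to_2018',
     'education__acs_1970_to_2017_5yravgs'],
    'reporting_user',
)
for statement in grants:
    print(statement)
```

Each statement could then be passed to `cnx.execute` once the schema and tables exist.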

**Cast Select Variables**

AK Note: changed county_nm to varchar(56)

*Note for last variable in cell below:*
This is three nested operations in one expression: the CASE replaces an empty rate_spread with '0', and the two CASTs then convert the result to numeric in two steps (first to varchar(5), then to numeric).
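
To make the three steps easier to follow, here is a rough Python mirror of the same logic. It is an illustration rather than part of the pipeline: the function name `rate_spread_to_numeric` is hypothetical, and it assumes rate_spread arrives as text.

```python
from decimal import Decimal


def rate_spread_to_numeric(raw_value):
    """Mimic CAST(CAST(CASE ... END AS varchar(5)) AS numeric) for one value."""
    if raw_value is None:
        # A SQL NULL falls through the CASE and stays NULL after both casts.
        return None
    # Step 1 (CASE): an empty string is replaced with '0'.
    text = '0' if raw_value == '' else raw_value
    # Step 2 (CAST AS varchar(5)): an explicit cast truncates to five characters.
    text = text[:5]
    # Step 3 (CAST AS numeric): the text is parsed as an exact decimal number.
    return Decimal(text)


print(rate_spread_to_numeric(''))
print(rate_spread_to_numeric('1.75'))
```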


In [None]:
sql_var_formatting = f'''SELECT
       CAST({test_table}.county_name As varchar(56))                               As county_nm,
       CAST({test_table}.agency_name As varchar(128))                              As agency_nm,
       CAST({test_table}.loan_type_name As varchar(128))                           As loan_type_nm,
       CAST({test_table}.loan_purpose_name As varchar(128))                        As loan_purpose_nm,
       CAST({test_table}.action_taken_name As varchar(128))                        As action_taken_nm,
       {test_table}.applicant_income_000s                                          As applicant_income_000s,
       {test_table}.hud_median_family_income                                       As hud_median_fam_inc,
       {test_table}.loan_amount_000s                                               As loan_amt_000s,
       CAST ( CAST ( CASE
                         WHEN {test_table}.rate_spread = '' Then '0'
                         ELSE {test_table}.rate_spread
                     END As varchar(5)
                   ) As numeric
             )
       As rt_spread
From public.{test_table}
;'''
cnx.execute(sql_var_formatting)

In [None]:
# result is a pandas DataFrame and has no close(); release the engine's pooled connections instead.
cnx.dispose()