## ETL of Census Data

The "Escape the Bay" project was a success, but it generated 11 separate CSV files!

Since each analysis was performed separately, it was difficult to draw correlations between the datasets.

The purpose of this ETL homework is to create a database where all the data can be stored and queries can be written across datasets.

Thus the Tasks will be:

### 1) Extract data from 6 CSVs (there is some duplication of information) and import into Pandas

### 2) Transform
  #### A. Eliminate unneeded data and missing data
  #### B. Harmonize the naming of the key cities in the analysis so the tables can be joined more easily
  #### C. Based on the dataset size, determine whether to join the data in Pandas or in SQL
  #### D. Output CSV files into the SQL_data folder
  
### 3) Load
   
 #### A. Create the schema for the Escape-The-Bay database using QuickDBD

 #### B. Create the tables in SQL

 #### C. Upload the transformed CSV data (from the SQL_data folder) into a PostgreSQL database

 #### D. Check the database and write a few sample queries using SQLAlchemy

### Document!
   
References: The original data sources come from Vanessa Oakes, Emily Todd, Stefan Zobrist, and Rebecca Mih.
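
The Load and query steps in the task list can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for PostgreSQL, the table names `destinations` and `home` are hypothetical, and the rows are placeholder values rather than the project's data.

```python
import sqlite3

import pandas as pd

# Placeholder rows standing in for the CSVs written to SQL_data (values are made up)
destinations = pd.DataFrame(
    {"County_Name": ["Alameda", "King"], "State_Name": ["California", "Washington"]}
)
home = pd.DataFrame({"County_Name": ["Alameda", "King"], "Med_Rent": [1000, 2000]})

# SQLite in memory is an assumption; the project itself targets PostgreSQL
conn = sqlite3.connect(":memory:")
destinations.to_sql("destinations", conn, index=False)
home.to_sql("home", conn, index=False)

# Harmonized county names let one table be queried against another
query = """
SELECT d.County_Name, d.State_Name, h.Med_Rent
FROM destinations AS d
JOIN home AS h ON h.County_Name = d.County_Name
ORDER BY d.County_Name
"""
joined = pd.read_sql_query(query, conn)
conn.close()
print(joined)
```

The join only works because both tables spell the county names the same way, which is the reason for the harmonization step in Transform.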

## Extract

In [4]:
import csv
from matplotlib.ticker import FuncFormatter
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import requests
import json

In [5]:
# Census Flows Mapper outputs
csv_path = "./Resources/SF_All_OUT.csv"
sf_out_df = pd.read_csv(csv_path)

# Census QuickFacts CSVs
# The "counties" CSVs contain information on:
# population demographics, number of owner-occupied housing, median value of owner-occupied housing, median gross rent,
# median income, total # of employer establishments, total annual payroll, FIPS code

CA_counties = pd.read_csv("./Resources/CA_counties.csv")
non_CA_counties = pd.read_csv("./Resources/non_CA_counties.csv")
marital_ca_df = pd.read_csv('./Resources/California - Marital 3.csv')
marital_out_df = pd.read_csv('./Resources/Out of State - Marital.csv')

# Census FactFinder Advanced Search CSV from the American Community Survey (ACS)
# The "income and mortgage" CSV is a unique dataset that contains the distributions (bins) of the total household income,
# the mortgage values, and the debt-to-income ratio, by county
ACS_data = pd.read_csv("./Resources/2017_income_mortgage.csv")

# Census API call and CSV creation
# This API call creates the dataset for Median home values, Median Rental costs
base_url = "https://api.census.gov/data/2017/acs/acs1/profile"



# Transform 

## Census Flows Mapper Data

The U.S. Census has a very handy tool called Census Flows Mapper, which reports the outbound and inbound migrants for any given county.

https://flowsmapper.geo.census.gov/

The data in the CSVs came from the output of that website.

Activities
- Keep in mind that this CSV supplies the primary key for the SQL database, via the FIPS code
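
Because the FIPS code is meant to serve as a key, it has to survive the CSV round trip unchanged. A minimal sketch, assuming a small inline CSV in place of the real Flows Mapper file, shows how `dtype` keeps the leading zero that integer parsing would drop:

```python
import io

import pandas as pd

# Inline sample standing in for a Flows Mapper export (an assumption for illustration)
sample_csv = (
    "State/County FIPS,County Name,Total\n"
    "06001,Alameda County,10791\n"
    "36061,New York County,1419\n"
)

# Default parsing turns the codes into integers and loses the leading zero
as_numbers = pd.read_csv(io.StringIO(sample_csv))

# Forcing the column to str keeps the code exactly as written
as_text = pd.read_csv(io.StringIO(sample_csv), dtype={"State/County FIPS": str})

print(as_numbers["State/County FIPS"].tolist())
print(as_text["State/County FIPS"].tolist())
```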



In [9]:
# First determine where people migrate to from San Francisco. Look at those moving within CA

# Top 5 in-CA counties
ranked_sf_out_df = sf_out_df.sort_values(by='Total', ascending=False)
sf_to_ca = ranked_sf_out_df.loc[ranked_sf_out_df["State Name"] == "California", :]


# Transform the data -- only take the top 5 destinations

sftoca = sf_to_ca.iloc[:5]

sftoca = sftoca.rename(columns={"Total": "# Migrated from SF County (2017)"})

sftoca['County Name'] = sftoca['County Name'].replace(
    {'Alameda County': 'Alameda', 'San Mateo County': 'San Mateo', "Contra Costa County":'Contra Costa',
     "Los Angeles County":'Los Angeles', 'Santa Clara County':'Santa Clara'})


#sftoca = sftoca.set_index(["County Name"])

sftoca.head()


Unnamed: 0,State/County FIPS,State FIPS,County FIPS,County Name,State Name,# Migrated from SF County (2017),Margin of Error (+/-)
19,'06001','06','001',Alameda,California,10791,1127
52,'06081','06','081',San Mateo,California,8995,1054
25,'06013','06','013',Contra Costa,California,4085,631
34,'06037','06','037',Los Angeles,California,3726,547
54,'06085','06','085',Santa Clara,California,3383,447


In [10]:
# Top destinations moving outside of CA

# Top 5 non-CA counties
sf_not_ca = ranked_sf_out_df.loc[ranked_sf_out_df["State Name"] != "California", :]

# Transform the data -- only take the top 5 destinations
sfnotca = sf_not_ca.iloc[:5]
sfnotca = sfnotca.rename(columns={"Total": "# Migrated from SF County (2017)"})

# No column filtering needed; keep the FIPS codes for the SQL database
#sfnotca_summary = sf_not_ca.iloc[:5,3:7]
sfnotca['County Name'] = sfnotca['County Name'].replace(
    {'New York County': 'NY (Manhattan)', 'King County': 'King', "Multnomah County":'Multnomah',
     "Kings County":'Kings (Brooklyn)', 'Cook County':'Cook'})

#sfnotca = sfnotca.set_index(["County Name"])

sfnotca.head()

Unnamed: 0,State/County FIPS,State FIPS,County FIPS,County Name,State Name,# Migrated from SF County (2017),Margin of Error (+/-)
259,'36061','36','061',NY (Manhattan),New York,1419,657
413,'53033','53','033',King,Washington,1293,336
326,'41051','41','051',Multnomah,Oregon,1094,282
255,'36047','36','047',Kings (Brooklyn),New York,887,300
131,'17031','17','031',Cook,Illinois,635,223
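
The two rename maps above are written out by hand. As a possible alternative, the common " County" suffix could be stripped with a regex, leaving a small override map for the nicknames. This is a sketch on a made-up Series, not the notebook's actual column:

```python
import pandas as pd

counties = pd.Series(
    ["Alameda County", "San Mateo County", "New York County"], name="County Name"
)

# Strip the trailing " County" suffix; the $ anchor leaves other text alone
short = counties.str.replace(r" County$", "", regex=True)

# Exact-match overrides for names that need a nickname
overrides = {"New York": "NY (Manhattan)"}
harmonized = short.replace(overrides)

print(harmonized.tolist())
```

The override map still has to match the labels used in the other tables, otherwise the later joins on county name will miss rows.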


In [14]:
# Join the two tables together for a single destinations file

Destinations = pd.merge(sftoca,sfnotca,how='outer')


#Destinations = Destinations.set_index(["State/County FIPs"])

# Keep the FIPs ids as strings, so that they don't lose the 0 at the beginning

# Add in San Francisco County to the table ??
Destinations.index.name = 'Index'
Destinations = Destinations.rename(columns = {
                'State FIPS': 'State_FIPs', 'County FIPS':'County_FIPS',
                'County Name':'County_Name', 'State Name':'State_Name',
                '# Migrated from SF County (2017)':'#_Migrated_(2017)',
                'Margin of Error (+/-)': 'Margin_of_Error',
                })
#Destinations.to_csv('./SQL_data/Destinations.csv')

Destinations.head(10)

Index,State/County FIPS,State_FIPs,County_FIPS,County_Name,State_Name,#_Migrated_(2017),Margin_of_Error
0,'06001','06','001',Alameda,California,10791,1127
1,'06081','06','081',San Mateo,California,8995,1054
2,'06013','06','013',Contra Costa,California,4085,631
3,'06037','06','037',Los Angeles,California,3726,547
4,'06085','06','085',Santa Clara,California,3383,447
5,'36061','36','061',NY (Manhattan),New York,1419,657
6,'53033','53','033',King,Washington,1293,336
7,'41051','41','051',Multnomah,Oregon,1094,282
8,'36047','36','047',Kings (Brooklyn),New York,887,300
9,'17031','17','031',Cook,Illinois,635,223
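
The outer merge above has no rows in common between its two inputs, so it effectively stacks one table under the other. A hedged alternative is `pd.concat`, which states that intent directly and keeps the input row order. The frames below are cut-down stand-ins with a hypothetical `Migrated` column:

```python
import pandas as pd

# Two small frames with identical columns, mirroring the in-CA and out-of-CA tables
in_ca = pd.DataFrame({"County_Name": ["Alameda", "San Mateo"], "Migrated": [10791, 8995]})
out_ca = pd.DataFrame({"County_Name": ["King", "Cook"], "Migrated": [1293, 635]})

# ignore_index=True rebuilds a clean 0..n-1 index for the stacked table
stacked = pd.concat([in_ca, out_ca], ignore_index=True)

print(stacked)
```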


# Transform From US Census "QuickFacts Utility"

## Demographics, Age, Income, Median Housing, Median Rents, Commute, Employers


Documentation of the Journey

1. Use the graphical interface to input up to 6 locations (by city, county, state, etc.)
Reference:  https://www.census.gov/quickfacts/fact/table/US/PST045218

Note: The only data cleaning done in Excel was the manual addition of Travis County, TX (where Austin is located) to the out-of-state file, "non_CA_counties.csv".

### Transform Data CleanUp Steps

* Reduce the data size and clean up the naming (for easier reference later on)

* df.drop(columns=['column name'], inplace=True) - Drop columns which have no important data
* df.dropna() - Drop rows with NaN
* df.reset_index() - Reset the index because a few rows were dropped
* df[:x] - Keep only the first x rows
* df.rename() - Rename the columns with shorter names so the plots look OK
* pd.merge() - Join all the destinations where people move to from San Francisco County, both within CA and outside CA

Output to SQL_data
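
The cleanup steps listed above can be chained on a small QuickFacts-shaped frame. This is a sketch with placeholder values (the numbers are not Census figures), and selecting the note columns by name pattern is an assumption about how the real files are laid out:

```python
import pandas as pd

# Tiny stand-in for a QuickFacts export; all values are placeholders
raw = pd.DataFrame(
    {
        "Fact": ["Population estimates", "Median gross rent", None],
        "Fact Note": [None, None, None],
        "Alameda County, California": ["1,000", "$1,500", None],
        "Value Note for Alameda County, California": [None, None, None],
    }
)

# Collect the note columns by name instead of listing each one
note_cols = [c for c in raw.columns if c == "Fact Note" or c.startswith("Value Note")]

clean = (
    raw.drop(columns=note_cols)   # drop columns with no useful data
    .dropna()                     # drop rows containing NaN
    .reset_index(drop=True)       # renumber the surviving rows
    .rename(columns={"Alameda County, California": "Alameda"})
)

print(clean)
```

Chaining returns a new DataFrame at each step, so the raw table stays available for inspection.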

In [2]:
CAcols = list(CA_counties.columns.values)
CAcols 


In [3]:
# Clean up the raw data 
# Select the columns wanted

#CA_df = CA_counties[['Fact',  'San Francisco County, California','Alameda County, California',
#                    'San Mateo County, California', 'Contra Costa County, California',
#                    'Los Angeles County, California','Santa Clara County, California',]]

# Drop the note columns, which carry no data
CA_counties.drop(columns=[
    'Fact Note',
    'Value Note for San Francisco County, California',
    'Value Note for Alameda County, California',
    'Value Note for San Mateo County, California',
    'Value Note for Contra Costa County, California',
    'Value Note for Los Angeles County, California',
    'Value Note for Santa Clara County, California',
], inplace=True)

# Drop the note columns, plus the San Francisco and Travis County data columns
non_CA_counties.drop(columns=[
    'Fact Note',
    'San Francisco County, California',
    'Value Note for San Francisco County, California',
    'Value Note for King County, Washington',
    'Value Note for New York County (Manhattan Borough), New York',
    'Value Note for Multnomah County, Oregon',
    'Value Note for Kings County (Brooklyn Borough), New York',
    'Value Note for Cook County, Illinois',
    'Value Note for Travis County, Texas',
    'Travis County, Texas',
], inplace=True)


# Remove the rows which have NaNs (in place)
CA_counties.dropna(inplace=True)
non_CA_counties.dropna(inplace=True)

# Reset the index to keep everything in order, drop = True means that the original index will be discarded
# Do this because we need to have one DF that shows the row number as a reference (later code)
# Reference:  https://stackoverflow.com/questions/33165734/update-index-after-sorting-data-frame

CA_counties.reset_index(drop=True, inplace=True)
non_CA_counties.reset_index(drop=True, inplace=True)


# Only keep the top 62 rows of data

CA_counties = CA_counties[:62]
non_CA_counties = non_CA_counties[:62]

non_CA_counties


In [None]:
CA_counties.head()

In [None]:
cols = list(non_CA_counties.columns.values)
cols

In [None]:
CA_df = CA_counties.rename(columns = { 'San Francisco County, California': 'San Francisco',
                                      'Alameda County, California': 'Alameda',
                                      'San Mateo County, California': 'San Mateo', 
                                      'Contra Costa County, California': 'Contra Costa',
                                      'Los Angeles County, California':'Los Angeles',
                                      'Santa Clara County, California':'Santa Clara',})


non_CA_df = non_CA_counties.rename(columns = {'New York County (Manhattan Borough), New York': 'NY (Manhattan)',
                                             'King County, Washington': 'King', 'Multnomah County, Oregon': 'Multnomah',
                                             'Kings County (Brooklyn Borough), New York': 'Kings (Brooklyn)',
                                             'Cook County, Illinois': 'Cook', })


CA_df.head()

In [None]:
non_CA_df.head()

## Inspect the DataFrame to see what data to remove

In [None]:
Demo_data = pd.merge(CA_df, non_CA_df, how="outer")
#Demo_data2 = Demo_data.set_index(['Fact'])
Demo_data.head()


In [None]:
# Drop rows by position
# Reference: https://chrisalbon.com/python/data_wrangling/pandas_dropping_column_and_rows/
# Proper usage of reset_index: https://stackoverflow.com/questions/40755680/how-to-reset-index-pandas-dataframe-after-dropna-pandas-dataframe
# drop=True discards the old index instead of keeping it as a new column

Demo_summary = Demo_data.drop(Demo_data.index[[1,2,7,8,9,10,11,12,13,14,15,16,17,18,19,24,25,26,27,28,
                                               29,30,31,32,33,34,37,38,39,40,41,42,45,46,53,54,55,56,57,58,59,60]])

Demo_summary = Demo_summary.reset_index(drop=True)

Demo_summary.head()
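
Dropping rows by position works only while the source file keeps the same row order. A possible alternative, sketched here on a made-up frame with a hypothetical `keep` list, is to select rows by their `Fact` label:

```python
import pandas as pd

# Stand-in for the merged QuickFacts table; values are placeholders
demo = pd.DataFrame(
    {
        "Fact": ["Population", "Veterans", "Median gross rent"],
        "Alameda": ["1", "2", "3"],
    }
)

# Name the facts to keep rather than the row numbers to drop
keep = ["Population", "Median gross rent"]
summary = demo[demo["Fact"].isin(keep)].reset_index(drop=True)

print(summary)
```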




In [None]:
# Move the Fact column into the index
Demographics = Demo_summary.set_index('Fact')

Demographics.head()



In [None]:
demoT = Demographics.T
demoT
demoT_cols=list(demoT.columns.values)
demoT_cols


In [None]:
# Update the column titles to be easier to read

demoT_summary = demoT.rename(columns={'Population estimates, July 1, 2017,  (V2017)': 'Population estimate, 2017', 
                                        'Population, Census, April 1, 2010': 'Population, Census, 2010',
                                        'Persons under 5 years, percent': 'Age <5 yrs, %',
                                        'Persons under 18 years, percent': 'Age <18 yrs, %',
                                        'Persons 65 years and over, percent': 'Age 65 yrs+, %',
                                        'Median value of owner-occupied housing units, 2013-2017': 'Home Median Value, $',
                                        'Median selected monthly owner costs -with a mortgage, 2013-2017': 'Med. Monthly Costs with mortgage, $',
                                        'Median selected monthly owner costs -without a mortgage, 2013-2017': 'Monthly Costs, no mortgage, $',
                                        'Median gross rent, 2013-2017': 'Median gross rent, $',
                                        'In civilian labor force, total, percent of population age 16 years+, 2013-2017': 'Employment, %',
                                        'In civilian labor force, female, percent of population age 16 years+, 2013-2017': 'Employment, Females, %',
                                        'Mean travel time to work (minutes), workers age 16 years+, 2013-2017': "Travel time to work, mean",
                                        'Median household income (in 2017 dollars), 2013-2017': 'Median Household income $',
                                        'Total employer establishments, 2016': 'Total # of employers, 2016',
                                        'Total employment, 2016':'Total Employed, 2016',
                                        'Total annual payroll, 2016 ($1,000)':'Annual payroll, 2016 ($1K)',
                                        'Total nonemployer establishments, 2016': 'Total nonemployers',
                                        'All firms, 2012':'Total # of employers, 2012',
                                        'FIPS Code': 'State/County FIPS'
                          
                                            })
demoT_summary

## Transform to numerical values
Because the raw data in the CSV is formatted with $, %, or ',', Pandas reads all of the data into the DataFrame as objects.

Clean the entire table; the user can then specify the specific data fact (row) they wish to use.

* df.replace() - Replace the %, $, and , characters in the data with blanks
* df.apply(pd.to_numeric) - Convert the objects in each column into numerics; "apply" runs on all columns
* Use errors='coerce' to force values to a number. Any alphanumerics become NaN and the text is lost; to keep the text, use errors='ignore'
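
The effect of errors='coerce' is easiest to see on a small example. The sketch below uses placeholder values and a single character-class regex, which is an assumption made for brevity rather than the notebook's own replacement map:

```python
import pandas as pd

# Placeholder values formatted the way the raw CSV formats them
df = pd.DataFrame(
    {
        "Median gross rent, $": ["$1,500", "$2,000"],
        "Employment, %": ["65.5%", "N/A"],
    },
    index=["A", "B"],
)

# Remove $ , % and " in one pass
stripped = df.replace({r'[$,%"]': ""}, regex=True)

# Text that cannot be parsed becomes NaN under errors='coerce'
numeric = stripped.apply(pd.to_numeric, errors="coerce")

print(numeric)
```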


In [None]:
# Remove non-numerics in the dataframe
cols = demoT_summary.columns

demoT_summary[cols] = demoT_summary[cols].replace({r'\$': '', ',': '', '%': '', '"': ''}, regex=True)


In [None]:
Demographics = demoT_summary.apply(pd.to_numeric, errors='coerce')

Demographics.index.name ='County_Name'

Demographics

# Reference for how to set the index name https://stackoverflow.com/questions/18022845/pandas-index-column-title-or-name

In [None]:
Demographics.to_csv('./SQL_data/Demographics.csv')

## Income and Mortgage data from census FactFinder Advanced Search utility
Reference:
Use the US Census FactFinder Advanced Search functionality to get detailed data in the areas of employment (including income), housing (including mortgage information), and population demographics.

The utility is fairly easy to use, but be warned: there is a LOT of data, and the data is often repeated.

Recommendations: Use the filtering and editing functions on the Advanced Search BEFORE creating your CSV file.

Select all counties of interest first as a filter.
* Use the graphical interface to input as many locations as you want (by city, county, state, etc.)
* Once you have selected all the key locations, you can save the query, which saves time if you are going to do other analyses later on

Edit the output down to the minimal data you need.
* The census provides the data with calculated error or percent error. Those can be filtered out
* Remove any columns of data you don't need. It's difficult to change column names with very large datasets, so it is better to minimize the number of columns if possible

The Income and Mortgage CSV in this file was filtered and edited on the Census website, with some additional description cleaning in Excel.

In [None]:
# Files to load
ACS_data = pd.read_csv("./Resources/2017_income_mortgage.csv")

ACS_data.head()



In [None]:
ACS_data.columns = [
    'ID',
    'County',
    '% of Mortgages Valued at <$50K',
    '% of Mortgages Valued at $50-$99K',
    '% of Mortgages Valued at $100K-$299K',
    '% of Mortgages Valued at $300K-$499K',
    '% of Mortgages Valued at $500K-$749K',
    '% of Mortgages Valued at $750K-$999K',
    '% of Mortgages Valued at >$1M',
    'Median Value of Mortgages ($)',
    '% Household income <$10K',
    '% Household income $10K-$24K',
    '% Household income $25K-34K',
    '% Household income $35K-$49K',
    '% Household income $50K-$74K',
    '% Household income $75K-$99K',
    '% Household income $100K-$150K',
    '% Household income >$150K',
    '2017 Household Median Income ($)',
    'Ratio of Mortgage Value to Income, % <2',
    'Ratio of Mortgage Value to Income, % 2-2.9',
    'Ratio of Mortgage Value to Income %, 3-3.9',
    'Ratio of Mortgage Value to Income, % > 4.0',
    'ID2',
]
ACS_data.drop(columns = ['ID'], inplace=True)  
ACS_data.set_index('County', inplace=True)                 

ACS_data.head()

In [None]:
ACS_dataT = ACS_data.T

# Reset the index to keep everything in order, drop = True means that the original index will be discarded
# Do this because we need to have one DF that shows the row number as a reference (later code)
# Reference:  https://stackoverflow.com/questions/33165734/update-index-after-sorting-data-frame

#ACS_dataT.reset_index(inplace=True)

# Remove the rows which have NaNs (in place)
ACS_dataT.dropna(inplace=True)

# Only keep the top 25 rows of data
ACS_dataT = ACS_dataT[:25]

ACS_dataT.drop(columns = ['Geography'], inplace=True)  

ACS_dataT.head()

In [None]:
# Rename the columns, look at the DataFrame

ACS_cleanup = ACS_dataT.drop(columns = ['Travis County, Texas'])
ACS_cleanup = ACS_cleanup.rename(columns={
    "San Francisco County, California": "San Francisco",
    "Alameda County, California": "Alameda",
    "San Mateo County, California": "San Mateo",
    "Contra Costa County, California": "Contra Costa",
    "Los Angeles County, California": "Los Angeles",
    "Santa Clara County, California": "Santa Clara",
    "New York County, New York": "NY (Manhattan)",
    "King County, Washington": "King",
    "Multnomah County, Oregon": "Multnomah",
    "Kings County, New York": "Kings (Brooklyn)",
    "Cook County, Illinois": "Cook",
})


ACS_cleanup.head()

In [None]:
Financial = ACS_cleanup.T
Financial.index.name = 'County_Name'
Financial.head()


In [None]:
# Move the Facts into the index to get it out of the way since we don't need to clean the numbers in that column
# Making a new DF ca_data,  so  you can always refer to ca_df to see the line number of the row
#ca_data = CA_counties.set_index("Fact")

# Clean the $, %, " and , characters from every column; first collect the column names
# Reference:  https://stackoverflow.com/questions/38516481/trying-to-remove-commas-and-dollars-signs-with-pandas-in-python

cols = Financial.columns

# Pass cols to df.replace(), specifying that $, %, " and , are replaced by blanks

Financial[cols] = Financial[cols].replace({r'\$': '', ',': '', '%': '', '"': ''}, regex=True)


# Convert all objects to numerics
# Reference:  https://stackoverflow.com/questions/36814100/pandas-to-numeric-for-multiple-columns
# https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html
#cols = ACS_dataT.columns[ACS_dataT.dtypes.eq('object')]
Financial = Financial[cols].apply(pd.to_numeric, errors='ignore')
Financial.head()


In [None]:
Financial.to_csv('./SQL_data/Financial.csv')

## Using US Census API to extract Home Ownership and Home Rental data


In [None]:
#dictionary for in-CA counties

base_url = "https://api.census.gov/data/2017/acs/acs1/profile"

ca_cty_name = ["San Francisco","Alameda","San Mateo","Contra Costa","Los Angeles","Santa Clara"]
ca_st_fips = ["06","06","06","06","06","06"]
ca_cty_fips = ["075","001","081","013","037","085"]


in_ca_dict = {
    "County Name": ca_cty_name,
    "State_FIPS": ca_st_fips,
    "County_FIPS": ca_cty_fips
}

in_ca_df = pd.DataFrame(in_ca_dict)

#dictionary for non-CA counties

nonca_cty_name = ['NY (Manhattan)',"King","Multnomah","Kings (Brooklyn)","Cook"]
nonca_st_fips = ["36","53","41","36","17"]
nonca_cty_fips = ["061","033","051","047","031"]

non_ca_dict = {
    "County Name": nonca_cty_name,
    "State_FIPS": nonca_st_fips,
    "County_FIPS": nonca_cty_fips
    
}



non_ca_df = pd.DataFrame(non_ca_dict)



In [None]:
#collect median home values by county and add to data frames

ca_med_home_val = []
med_home_var = "DP04_0089E"
    
for county_id, state_id in zip(ca_cty_fips, ca_st_fips):
    med_home_val = requests.get(f"{base_url}?get={med_home_var}&for=county:{county_id}&in=state:{state_id}").json()
    ca_med_home_val.append(int(med_home_val[1][0]))
    
in_ca_df["Med_Home_Value"] = ca_med_home_val
in_ca_df.to_csv('./Extra/ca_home_value.csv')
print(in_ca_df)

non_ca_med_home_val = []
med_home_var = "DP04_0089E"

for county_id, state_id in zip(nonca_cty_fips, nonca_st_fips):
    med_home_val = requests.get(f"{base_url}?get={med_home_var}&for=county:{county_id}&in=state:{state_id}").json()
    non_ca_med_home_val.append(int(med_home_val[1][0]))
    
non_ca_df["Med_Home_Value"] = non_ca_med_home_val
non_ca_df.to_csv('./Extra/nonca_home_value.csv')
print(non_ca_df)



In [None]:
#follow the same process for median gross rents

ca_med_rent = []
med_rent_var = "DP04_0134E"
    
for county_id, state_id in zip(ca_cty_fips, ca_st_fips):
    med_rent = requests.get(f"{base_url}?get={med_rent_var}&for=county:{county_id}&in=state:{state_id}").json()
    ca_med_rent.append(int(med_rent[1][0]))
    
in_ca_df["Med_Rent"] = ca_med_rent

in_ca_df.to_csv('./Extra/ca_rents.csv')
print(in_ca_df)

non_ca_med_rent = []
med_rent_var = "DP04_0134E"

for county_id, state_id in zip(nonca_cty_fips, nonca_st_fips):
    med_rent= requests.get(f"{base_url}?get={med_rent_var}&for=county:{county_id}&in=state:{state_id}").json()
    non_ca_med_rent.append(int(med_rent[1][0]))
    
non_ca_df["Med_Rent"] = non_ca_med_rent
non_ca_df.to_csv('./Extra/nonca_rents.csv')
print(non_ca_df)

In [None]:
#follow the same process for home owner rate

ca_own_rate = []
home_own_var = "DP04_0046PE"
    
for county_id, state_id in zip(ca_cty_fips, ca_st_fips):
    own_rate = requests.get(f"{base_url}?get={home_own_var}&for=county:{county_id}&in=state:{state_id}").json()
    ca_own_rate.append(float(own_rate[1][0]))
    
in_ca_df["Home Own Rate"] = ca_own_rate
in_ca_df.to_csv('./Extra/ca_homeowner_rates.csv')
print(in_ca_df)

nonca_own_rate = []
home_own_var = "DP04_0046PE"
    
for county_id, state_id in zip(nonca_cty_fips, nonca_st_fips):
    own_rate = requests.get(f"{base_url}?get={home_own_var}&for=county:{county_id}&in=state:{state_id}").json()
    nonca_own_rate.append(float(own_rate[1][0]))
    
non_ca_df["Home Own Rate"] = nonca_own_rate
non_ca_df.to_csv('./Extra/nonca_homeowner_rates.csv')
print(non_ca_df)
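
The three cells above repeat the same request pattern with a different variable code each time. One way to factor that out is sketched below. The helper names `build_query` and `first_value` are hypothetical, the sample payload holds a placeholder value, and the response shape (a header row followed by one data row) is inferred from the `[1][0]` indexing used above:

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data/2017/acs/acs1/profile"


def build_query(variable, state_fips, county_fips, base_url=BASE_URL):
    """Build the request URL for one variable in one county."""
    params = {
        "get": variable,
        "for": f"county:{county_fips}",
        "in": f"state:{state_fips}",
    }
    # safe=":" keeps the colon readable, matching the hand-built URLs above
    return f"{base_url}?{urlencode(params, safe=':')}"


def first_value(payload, cast=float):
    """Return the first field of the data row (row 0 is the header)."""
    return cast(payload[1][0])


url = build_query("DP04_0089E", "06", "075")
print(url)

# Placeholder payload in the assumed shape; not a real API response
sample = [["DP04_0089E", "state", "county"], ["100", "06", "075"]]
print(first_value(sample, int))
```

With helpers like these, each metric would become one loop over the FIPS lists, with `requests.get(url).json()` supplying the payload.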

In [None]:
Home = pd.merge(in_ca_df, non_ca_df, how = 'outer')

# Rename the columns to harmonize to other tables
Home = Home.rename(columns = {
        'County Name':'County_Name', "Home Own Rate":"Home Ownership Rate, %",       
})

Home.index.name = 'Index'
Home.to_csv('./SQL_data/Home.csv')
            
Home

# Rename the columns to harmonize with other tables