<a href="https://colab.research.google.com/github/nina-adhikari/enjoyment-maximizing-maps/blob/main/citydata.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data cleaning + preprocessing**
In this notebook, we import data from a few spreadsheets, strip out the rows and columns we do not need, and combine the result into a single table. The code is written to run in Google Colab (the files must be stored in the Google Drive folder drive/MyDrive/walkability), but it can be modified to run locally.

The datasets are:

*   **500 Cities: Local Data for Better Health**: This dataset contains estimates for 27 measures of chronic disease related to unhealthy behaviors (5), health outcomes (13), and use of preventive services (9). It includes estimates for the 500 largest US cities and approximately 28,000 census tracts within these cities.  Data sources used to generate these measures include Behavioral Risk Factor Surveillance System (BRFSS) data (2017, 2016), Census Bureau 2010 census population data, and American Community Survey (ACS) 2013-2017, 2012-2016 estimates. More information about the methodology can be found at www.cdc.gov/500cities. Link to dataset: https://data.cdc.gov/500-Cities-Places/500-Cities-Local-Data-for-Better-Health-2019-relea/6vp6-wxuq/about_data
*   **Smart Location Database (SLD)**: summarizes more than 90 different indicators associated with the built environment and location efficiency. Indicators include density of development, diversity of land use, street network design, and accessibility to destinations as well as various demographic and employment statistics. Most attributes are available for all U.S. block groups. Link to direct download of dataset: https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
*   **Median household income**: Median household income in the past 12 months (in 2017 inflation-adjusted dollars), for the District of Columbia only, year 2017. Link to dataset: https://data.census.gov/table/ACSDT5Y2017.B19013?q=b19013&g=040XX00US11$1400000&moe=false&tp=true

We focus our analysis on the **District of Columbia only**. To do so, it is necessary to filter out all other cities. We also filter out factors that we do not wish to include in the analysis.



In [None]:
#import packages to handle data
import geopandas as gpd #extends the datatypes used by pandas to allow spatial operations on geometric types
import pandas as pd
!pip install mapclassify #Choropleth map classification

DIR = 'drive/MyDrive/walkability/'   # directory where all files are stored

In [2]:
# allow Colab to access your Google Drive; an authentication window will pop up
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

Mounted at /content/drive


In [20]:
#import datasets stored in drive/MyDrive/walkability/
health = DIR + 'health.csv'      # 500 Cities: Local Data for Better Health https://data.cdc.gov/500-Cities-Places/500-Cities-Local-Data-for-Better-Health-2019-relea/6vp6-wxuq/about_data
epacsv = DIR + 'epdownload.csv'  # Smart Location Database https://edg.epa.gov/EPADataCommons/public/OA/EPA_SmartLocationDatabase_V3_Jan_2021_Final.csv
incomesource = DIR + 'ACSDT5Y2017.B19013-2024-03-26T221220.csv'      # Median household income https://data.census.gov/table/ACSDT5Y2017.B19013?q=b19013&g=040XX00US11$1400000&moe=false&tp=true
income = DIR + 'income.csv'      # we will store the modified csv here
walk_gdb = DIR + 'walk_index.gdb'

Clean up the income dataset. Important: in every file, each row is identified by its census tract ID.

In [None]:
#reformat the income file to have two columns: tract and income
with open(incomesource, 'r') as incomefile:
  #remove all unnecessary wording and spaces
  incomestr = incomefile.read().replace(
      ', District of Columbia, District of Columbia",""\n"    Estimate', '').replace(
          'Median household income in the past 12 months (in 2017 inflation-adjusted dollars)', 'Income').replace(
              '\ufeff"Label (Grouping)"', 'Tract').replace(
                  'Census Tract ', '')

#write reformatted data into new file
with open(income, 'w') as incomefile:
  incomefile.write(incomestr)

inc = gpd.read_file(income).drop('geometry', axis=1) # the CSV has no spatial data, so drop the geometry column and keep only the tabular columns
inc['Income'] = inc['Income'].str.replace(',','') #replace comma format for numbers
inc = inc.apply(pd.to_numeric, errors='coerce') #convert to numeric, coerce makes invalid parsing be set as NaN
inc['Tract'] = round(inc['Tract'].astype(float)*100).astype(int) #format census tract id as integer
inc.set_index('Tract', inplace=True) #tract id is the index of this dataframe
inc #sanity check
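
The tract conversion above can be illustrated in isolation. The following is a minimal sketch, not part of the original pipeline: the helper name `tract_label_to_id` and the sample labels are hypothetical. It shows why the code rounds before casting to an integer: multiplying a decimal label by 100 in floating point can land slightly off the exact value, and a plain `int()` cast truncates instead of rounding.

```python
def tract_label_to_id(label: str) -> int:
    """Convert a census tract label such as '40.02' to the integer ID 4002."""
    # Round first: float arithmetic may give a value just below or above the
    # exact product, and int() alone would truncate it.
    return int(round(float(label) * 100))


# Hypothetical sample labels in the same style as the income file.
labels = ["1", "2.01", "40.02", "110"]
ids = [tract_label_to_id(label) for label in labels]
print(ids)  # [100, 201, 4002, 11000]
```

The resulting integers have the same form as the `TRACTCE` values used as join keys later in the notebook.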

Clean up the SLD dataset. We are keeping only the rows corresponding to the District of Columbia (state FIPS code 11), and the columns that are not derived from other variables in the dataset (see Documentation: https://www.epa.gov/system/files/documents/2023-10/epa_sld_3.0_technicaldocumentationuserguide_may2021_0.pdf)

In [61]:
columns_to_drop = ['D1A', 'D1B', 'D1C', 'D1C5_RET', 'D1C5_OFF', 'D1C5_IND', 'D1C5_SVC',
                   'D1C5_ENT', 'D1C8_RET', 'D1C8_OFF', 'D1C8_IND', 'D1C8_SVC', 'D1C8_ENT',
                   'D1C8_ED', 'D1C8_HLTH', 'D1C8_PUB', 'D1D', 'D1_FLAG', 'D2A_JPHH',
                   'D2B_E5MIX', 'D2B_E5MIXA', 'D2B_E8MIX', 'D2B_E8MIXA', 'D2A_EPHHM',
                   'D2C_TRPMX1', 'D2C_TRPMX2', 'D2C_TRIPEQ', 'D2R_JOBPOP', 'D2R_WRKEMP',
                   'D2A_WRKEMP', 'D2C_WREMLX', 'D4D', 'D4E', 'D5CR', 'D5CRI', 'D5CE',
                   'D5CEI', 'D5DR', 'D5DRI', 'D5DE', 'D5DEI',
                   'D2A_Ranked', 'D2B_Ranked', 'D3B_Ranked', 'D4A_Ranked',
                   'GEOID20'
                   ]
epa_init = gpd.read_file(epacsv, where="STATEFP='11'").drop(columns=columns_to_drop)   # only interested in DC

# GEO ID data is corrupted in csv because it was stored as a float, so we generate it using other data and padding
epa_init['GEOID10'] = epa_init['STATEFP'].str.zfill(2) + epa_init['COUNTYFP'].str.zfill(3) + epa_init['TRACTCE'].str.zfill(6) + epa_init['BLKGRPCE']

epa_init['TRACTCE'] = epa_init['TRACTCE'].astype(int) # format as integer to use for merging later
epa_init.set_index('OBJECTID', inplace=True)
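
As a small illustration of the padding step, the sketch below rebuilds one 12-digit block group ID from its parts. The function name `build_geoid10` is hypothetical; the input values are taken from the first row of the output shown in the next cell.

```python
def build_geoid10(statefp: str, countyfp: str, tractce: str, blkgrpce: str) -> str:
    """Concatenate zero-padded FIPS components into a 12-digit block group ID."""
    # State: 2 digits, county: 3 digits, tract: 6 digits, block group: 1 digit.
    return statefp.zfill(2) + countyfp.zfill(3) + tractce.zfill(6) + blkgrpce


geoid = build_geoid10("11", "1", "4002", "3")
print(geoid)  # 110010040023
```

Working with strings until the ID is complete keeps the leading zeros of each component; the conversion to an integer happens only afterwards.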

Since the CSV doesn't have geometry data for plotting, we import it from another file (which comes from the same EPA dataset):

In [62]:
gdb = gpd.read_file(walk_gdb, where="STATEFP='11'")[['GEOID10', 'geometry']]
gdb['GEOID10'] = gdb['GEOID10'].astype(int)
gdb.set_index('GEOID10', inplace=True)
epa_init['GEOID10'] = epa_init['GEOID10'].astype(int)
epa = epa_init.join(other=gdb, on='GEOID10', how='left', validate='1:1', lsuffix='epa')
epa.drop(columns=['geometryepa'], inplace=True)
epa

OBJECTID,GEOID10,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,CSA,CSA_Name,CBSA,CBSA_Name,CBSA_POP,...,D4B050,D4C,D5AR,D5AE,D5BR,D5BE,NatWalkInd,Shape_Length,Shape_Area,geometry
61344,110010040023,11,1,4002,3,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,0.965849027,66.33,358602,236111,564752,295441,17.16666667,1201.674704,68587.96738,"MULTIPOLYGON (((1617361.678 1927352.465, 16173..."
61345,110010041001,11,1,4100,1,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,0.999999674,97,373993,242930,569837,296090,17.83333333,1328.83334,103759.0251,"MULTIPOLYGON (((1617028.696 1927124.163, 16170..."
61346,110010042022,11,1,4202,2,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,1,113.33,402080,260847,622134,324820,17.5,1450.239008,96448.75188,"MULTIPOLYGON (((1617427.574 1927158.696, 16174..."
61347,110010053014,11,1,5301,4,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,1,132.33,397964,251302,652515,363001,15.66666667,1158.771615,68329.30242,"MULTIPOLYGON (((1617857.365 1926906.416, 16178..."
61348,110010050021,11,1,5002,1,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,1,99.67,395250,254008,855234,380231,13.16666667,801.1621232,39703.77108,"MULTIPOLYGON (((1618967.176 1926639.840, 16189..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61789,110010007022,11,1,702,2,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,0,32.67,332374,223170,460099,169268,12.33333333,1338.001954,73175.60627,"MULTIPOLYGON (((1614569.609 1927830.272, 16146..."
61790,110010007012,11,1,701,2,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,0,24,327595,220368,400177,172329,12,1175.108163,80152.05213,"MULTIPOLYGON (((1614183.231 1928298.295, 16141..."
61791,110010013023,11,1,1302,3,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,0.505727605,62.67,292105,210160,611544,312928,13.16666667,4543.930515,508923.4088,"MULTIPOLYGON (((1615366.380 1930018.785, 16153..."
61792,110010055004,11,1,5500,4,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,1,113.67,428670,251279,731570,359141,16.33333333,908.923,49850.70702,"MULTIPOLYGON (((1617341.161 1926369.481, 16173..."


Clean up the Local Data for Better Health dataset. Again, we keep only the rows for DC.

In [63]:
hl = gpd.read_file(health, where="StateAbbr='DC' AND GeographicLevel='Census Tract'", include_fields=['StateAbbr', 'GeographicLevel', 'UniqueID', 'MeasureId', 'CityFIPS', 'TractFIPS', 'Data_Value'])
hl['TractCE'] = hl['TractFIPS'].str.removeprefix('110010').astype(int) #format index column
hlp = hl.pivot(index='TractCE', columns='MeasureId', values='Data_Value') #pivot table to have the columns be the MeasureId
hlp

TractCE,ACCESS2,ARTHRITIS,BINGE,BPHIGH,BPMED,CANCER,CASTHMA,CHD,CHECKUP,CHOLSCREEN,...,KIDNEY,LPA,MAMMOUSE,MHLTH,OBESITY,PAPTEST,PHLTH,SLEEP,STROKE,TEETHLOST
100,3.6,15.8,26.0,21.8,73.1,6.8,7.7,3.7,75.5,85.0,...,2.0,14.6,80.5,7.4,17.3,88.6,5.9,27.4,1.8,4.2
201,12.9,5.1,23.9,11.7,25.0,1.0,12.3,1.6,65.6,44.3,...,1.9,31.4,77.1,26.2,22.2,70.7,9.9,34.9,1.1,26.9
202,4.8,10.6,30.3,15.5,64.8,4.0,8.7,2.6,71.4,73.2,...,1.6,15.1,79.3,10.3,15.8,82.4,5.5,29.0,1.3,7.4
300,4.5,10.3,30.1,15.2,61.7,3.9,8.5,2.3,70.4,76.1,...,1.5,15.0,80.6,9.6,17.3,86.3,5.5,29.2,1.3,5.7
400,4.8,18.4,22.7,25.4,75.8,7.3,8.1,4.6,77.0,85.3,...,2.4,17.9,79.7,8.3,18.8,86.8,7.6,28.7,2.3,6.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10700,5.1,8.7,30.0,14.1,58.8,3.1,8.4,2.1,69.6,72.7,...,1.4,15.3,81.6,10.0,16.6,84.8,5.2,29.7,1.2,6.2
10800,8.5,4.4,30.4,8.8,28.1,1.1,10.9,1.1,66.5,53.8,...,1.2,21.0,77.3,18.5,16.4,74.6,6.1,33.3,0.7,19.5
10900,14.9,17.1,19.0,34.3,69.3,3.3,13.6,4.8,79.6,71.9,...,3.4,36.4,82.6,18.7,36.1,85.2,14.0,47.5,4.3,37.5
11000,4.4,18.9,21.2,30.4,78.7,7.3,8.6,4.5,80.7,87.5,...,2.7,17.6,82.3,7.5,21.7,89.6,7.1,31.2,2.7,5.7
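
The pivot step reshapes the health data from long format (one row per tract and measure) to wide format (one row per tract, one column per measure). The sketch below reproduces this on a tiny hand-built frame; the four values are copied from the output above, while the frame itself is only an illustration and not the real input file.

```python
import pandas as pd

# Long format: each row is one (tract, measure) pair.
long_df = pd.DataFrame({
    "TractCE": [100, 100, 201, 201],
    "MeasureId": ["OBESITY", "SLEEP", "OBESITY", "SLEEP"],
    "Data_Value": [17.3, 27.4, 22.2, 34.9],
})

# Wide format: one row per tract, one column per measure.
wide_df = long_df.pivot(index="TractCE", columns="MeasureId", values="Data_Value")
print(wide_df)
```

Note that `pivot` raises a `ValueError` if the same tract and measure pair appears more than once, so it also acts as a check that each measure is reported once per tract.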


Finally, merge the datasets on the Census tract FIPS code.

In [64]:
firstjoin = epa.join(other=hlp, on='TRACTCE', how='left', validate='m:1')           # merge EPA and CDC
secondjoin = firstjoin.join(other=inc, on='TRACTCE', how='left', validate='m:1')    # merge the result with census income data; validate='m:1' checks that the join keys are unique in the right-hand table
secondjoin

OBJECTID,GEOID10,STATEFP,COUNTYFP,TRACTCE,BLKGRPCE,CSA,CSA_Name,CBSA,CBSA_Name,CBSA_POP,...,LPA,MAMMOUSE,MHLTH,OBESITY,PAPTEST,PHLTH,SLEEP,STROKE,TEETHLOST,Income
61344,110010040023,11,1,4002,3,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,12.6,82.0,7.5,16.5,88.9,4.4,29.2,1.1,4.0,102455.0
61345,110010041001,11,1,4100,1,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,14.3,81.3,7.0,17.3,88.4,5.7,27.8,1.7,4.0,143586.0
61346,110010042022,11,1,4202,2,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,12.7,81.8,7.8,16.1,88.2,4.4,29.3,1.0,4.1,105978.0
61347,110010053014,11,1,5301,4,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,13.6,81.3,8.0,16.9,88.5,4.7,29.6,1.1,4.3,90402.0
61348,110010050021,11,1,5002,1,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,19.7,82.4,10.3,22.2,87.1,6.9,33.6,1.7,10.2,87969.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
61789,110010007022,11,1,702,2,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,16.4,80.2,10.0,18.2,87.2,5.9,31.2,1.3,7.2,71671.0
61790,110010007012,11,1,701,2,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,17.1,79.8,8.1,18.3,87.8,7.0,28.1,2.3,5.8,89889.0
61791,110010013023,11,1,1302,3,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,14.6,80.5,8.0,17.0,87.9,5.5,29.2,1.5,5.1,113300.0
61792,110010055004,11,1,5500,4,548,"Washington-Baltimore-Arlington, DC-MD-VA-WV-PA",47900,"Washington-Arlington-Alexandria, DC-VA-MD-WV",6151521,...,15.8,81.3,9.0,18.5,87.3,5.8,30.3,1.6,5.9,79306.0
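
To make the role of `validate='m:1'` concrete, here is a minimal sketch on toy data. It uses `DataFrame.merge` with `left_on` and `right_index`, which we assume behaves like the `join(..., on=...)` calls above for this purpose; the frames and the names `block_groups`, `tracts` and `dup_tracts` are hypothetical. Several block groups may share one tract (the "many" side), but each tract must appear only once in the right-hand table (the "one" side).

```python
import pandas as pd

# Many side: two block groups belong to tract 4002, one to tract 4100.
block_groups = pd.DataFrame({"TRACTCE": [4002, 4002, 4100]})

# One side: exactly one income row per tract, indexed by tract ID.
tracts = pd.DataFrame({"Income": [102455.0, 143586.0]},
                      index=pd.Index([4002, 4100], name="Tract"))

merged = block_groups.merge(tracts, left_on="TRACTCE", right_index=True,
                            how="left", validate="m:1")
print(merged["Income"].tolist())  # [102455.0, 102455.0, 143586.0]

# If a tract is duplicated on the right-hand side, validation fails loudly.
dup_tracts = pd.DataFrame({"Income": [1.0, 2.0]},
                          index=pd.Index([4002, 4002], name="Tract"))
try:
    block_groups.merge(dup_tracts, left_on="TRACTCE", right_index=True,
                       how="left", validate="m:1")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)  # True
```

Without the validation, a duplicated tract would silently multiply the block group rows in the merged table.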


Print the resulting columns, then build a dictionary that maps each column name to its description.

In [65]:
print('there are',len(secondjoin.columns.to_list()),'columns')
print(secondjoin.columns.to_list())

there are 100 columns
['GEOID10', 'STATEFP', 'COUNTYFP', 'TRACTCE', 'BLKGRPCE', 'CSA', 'CSA_Name', 'CBSA', 'CBSA_Name', 'CBSA_POP', 'CBSA_EMP', 'CBSA_WRK', 'Ac_Total', 'Ac_Water', 'Ac_Land', 'Ac_Unpr', 'TotPop', 'CountHU', 'HH', 'P_WrkAge', 'AutoOwn0', 'Pct_AO0', 'AutoOwn1', 'Pct_AO1', 'AutoOwn2p', 'Pct_AO2p', 'Workers', 'R_LowWageWk', 'R_MedWageWk', 'R_HiWageWk', 'R_PCTLOWWAGE', 'TotEmp', 'E5_Ret', 'E5_Off', 'E5_Ind', 'E5_Svc', 'E5_Ent', 'E8_Ret', 'E8_off', 'E8_Ind', 'E8_Svc', 'E8_Ent', 'E8_Ed', 'E8_Hlth', 'E8_Pub', 'E_LowWageWk', 'E_MedWageWk', 'E_HiWageWk', 'E_PctLowWage', 'D3A', 'D3AAO', 'D3AMM', 'D3APO', 'D3B', 'D3BAO', 'D3BMM3', 'D3BMM4', 'D3BPO3', 'D3BPO4', 'D4A', 'D4B025', 'D4B050', 'D4C', 'D5AR', 'D5AE', 'D5BR', 'D5BE', 'NatWalkInd', 'Shape_Length', 'Shape_Area', 'geometry', 'ACCESS2', 'ARTHRITIS', 'BINGE', 'BPHIGH', 'BPMED', 'CANCER', 'CASTHMA', 'CHD', 'CHECKUP', 'CHOLSCREEN', 'COLON_SCREEN', 'COPD', 'COREM', 'COREW', 'CSMOKING', 'DENTAL', 'DIABETES', 'HIGHCHOL', 'KIDNEY', 'L

In [66]:
#create dictionary of columns
cols_dict_data = 'drive/MyDrive/walkability/column-dictionary.csv'

# import csv
import csv

#create dictionary with the meaning of the cols names
with open(cols_dict_data, mode='r', encoding='utf-8-sig') as infile:
    reader = csv.reader(infile)
    cols_dict = {rows[0]:rows[1] for rows in reader}

cols_dict

{'GEOID10': 'Census block group 12-digit FIPS code (2010)',
 'GEOID20': 'Census block group 12-digit FIPS code (2018)',
 'STATEFP': 'State FIPS code',
 'COUNTYFP': 'County FIPS cod',
 'TRACTCE': 'Census tract FIPS code in which CBG resides',
 'BLKGRPCE': 'Census block group FIPS code in which CBG resides',
 'CSA': 'Combined Statistical Area (CSA) Code',
 'CSA_Name': 'Name of CSA in which CBG resides',
 'CBSA': 'FIPS for Core-Based Statistical Area (CBSA) in which\nCBG resides',
 'CBSA_Name': 'Name of CBSA in which CBG resides',
 'CBSA_POP': 'Total population in CBSA',
 'CBSA_EMP': 'Total employment in CBSA',
 'CBSA_WRK': 'Total number of workers that live in CBSA',
 'Ac_Total': 'Total geometric area (acres) of the CBG',
 'Ac_Water': 'Total water area (acres)',
 'Ac_Land': 'Total land area (acres)',
 'Ac_Unpr': 'Total land area (acres) that is not protected from\ndevelopment (i.e., not a park, natural area or conservation\narea)',
 'TotPop': 'Population, 2018',
 'CountHU': 'Housing unit
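
One possible follow-up check, sketched below under stated assumptions, is to list the merged columns that have no entry in the dictionary. The in-memory CSV text and the column list are hypothetical stand-ins for `column-dictionary.csv` and `secondjoin.columns`; whether any real column is actually missing a description is not known from this notebook.

```python
import csv
import io

# Hypothetical two-row stand-in for the column dictionary file.
sample_csv = (
    "GEOID10,Census block group 12-digit FIPS code (2010)\n"
    "STATEFP,State FIPS code\n"
)

# Same construction as above, skipping rows that lack a description field.
reader = csv.reader(io.StringIO(sample_csv))
sample_dict = {row[0]: row[1] for row in reader if len(row) >= 2}

# Hypothetical column list; in the notebook this would be secondjoin.columns.
columns = ["GEOID10", "STATEFP", "Income"]
missing = [name for name in columns if name not in sample_dict]
print(missing)  # ['Income']
```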

Save the dictionary so we don't have to run the above script every time:

In [67]:
import pickle

picklefile = DIR + 'cols_dict.pickle'
with open(picklefile, 'wb') as handle:
    pickle.dump(cols_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)
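
Loading the dictionary back in a later session is the mirror image of the cell above. The sketch below writes to a temporary directory so that it runs anywhere; in the notebook the path would be `picklefile`, and the one-entry dictionary here is only a placeholder.

```python
import os
import pickle
import tempfile

sample_dict = {"STATEFP": "State FIPS code"}  # placeholder content

# Write to a temporary location instead of Google Drive.
path = os.path.join(tempfile.mkdtemp(), "cols_dict.pickle")
with open(path, "wb") as handle:
    pickle.dump(sample_dict, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Read it back; note the binary read mode 'rb'.
with open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == sample_dict)  # True
```

Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.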

Save the dataframe to a GeoPackage file so that we can load the data directly from the file instead of rerunning the cells above every time.

In [69]:
secondjoin = secondjoin.apply(pd.to_numeric, errors='ignore')
secondjoin.to_file("drive/MyDrive/walkability/maindata.gpkg", driver="GPKG")

In [70]:
finaljoin = gpd.read_file("drive/MyDrive/walkability/maindata.gpkg")
finaljoin.set_index('OBJECTID', inplace=True)
finaljoin = finaljoin.apply(pd.to_numeric, errors='ignore')
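
A note on `errors='ignore'`: to the best of our knowledge, pandas 2.2 and later deprecate this option of `pd.to_numeric`. A possible replacement, sketched below, catches the parsing exception explicitly and leaves non-numeric columns unchanged. The helper name `to_numeric_if_possible` and the toy frame are hypothetical; this is an alternative to consider, not what the cells above run.

```python
import pandas as pd


def to_numeric_if_possible(column: pd.Series) -> pd.Series:
    """Return the column converted to a numeric dtype, or unchanged if parsing fails."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column


# Toy frame: one column of numeric strings, one column of names.
df = pd.DataFrame({
    "TRACTCE": ["4002", "4100"],
    "CSA_Name": ["Washington-Baltimore-Arlington, DC-MD-VA-WV-PA"] * 2,
})
converted = df.apply(to_numeric_if_possible)
print(converted["TRACTCE"].tolist())  # [4002, 4100]
```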

Checking that what we load from the file is the same as what we stored:

In [71]:
finaljoin.sort_index(axis=1).equals(secondjoin.sort_index(axis=1))

True

Check that map features work:

In [73]:
finaljoin.explore('GEOID10')

In [74]:
dict(finaljoin.loc['61476'])

{'GEOID10': 110010073011,
 'STATEFP': 11,
 'COUNTYFP': 1,
 'TRACTCE': 7301,
 'BLKGRPCE': 1,
 'CSA': 548,
 'CSA_Name': 'Washington-Baltimore-Arlington, DC-MD-VA-WV-PA',
 'CBSA': 47900,
 'CBSA_Name': 'Washington-Arlington-Alexandria, DC-VA-MD-WV',
 'CBSA_POP': 6151521,
 'CBSA_EMP': 2806497,
 'CBSA_WRK': 2713860,
 'Ac_Total': 2427.426139,
 'Ac_Water': 1269.851595,
 'Ac_Land': 1157.574544,
 'Ac_Unpr': 247.02735,
 'TotPop': 4746,
 'CountHU': 1383,
 'HH': 1280,
 'P_WrkAge': 0.609,
 'AutoOwn0': 88,
 'Pct_AO0': 0.06875,
 'AutoOwn1': 497,
 'Pct_AO1': 0.38828125,
 'AutoOwn2p': 695,
 'Pct_AO2p': 0.54296875,
 'Workers': 717,
 'R_LowWageWk': 179,
 'R_MedWageWk': 226,
 'R_HiWageWk': 312,
 'R_PCTLOWWAGE': 0.249651325,
 'TotEmp': 566,
 'E5_Ret': 4,
 'E5_Off': 44,
 'E5_Ind': 10,
 'E5_Svc': 496,
 'E5_Ent': 12,
 'E8_Ret': 4,
 'E8_off': 44,
 'E8_Ind': 10,
 'E8_Svc': 437,
 'E8_Ent': 12,
 'E8_Ed': 43,
 'E8_Hlth': 16,
 'E8_Pub': 0,
 'E_LowWageWk': 52,
 'E_MedWageWk': 82,
 'E_HiWageWk': 432,
 'E_PctLowWage': 