# Enrich

Frequently, the areas needing enrichment are not standard geographies. Rather, these geographies come from other data sources, and we need to add demographics to them for subsequent analysis. Enrichment enables selecting and retrieving demographic variables for these areas.

In [1]:
from arcgis.features import FeatureSet
from dm import Country, utils
from pathlib import Path

## Load the Geographies

Often the geographies are delineated elsewhere or come from some other source. In this case, we are loading trade areas from a saved JSON file. We start the process by loading the data into a Spatially Enabled DataFrame.

In [2]:
dir_prj = Path('./').absolute().parent

dir_data = dir_prj / 'data'
dir_test = dir_data / 'test'

ta_pth = dir_test / 'trade_areas.json'

ta_pth

WindowsPath('D:/projects/demographic-modeling-module/data/test/trade_areas.json')

The JSON file was previously exported using ArcGIS Pro. All we are doing here is loading it into a FeatureSet, converting that to a Spatially Enabled DataFrame, and removing extraneous columns.

In [3]:
drop_cols = ['OBJECTID', 'AREA_ID', 'AREA_DESC', 'AREA_DESC2', 'AREA_DESC3', 'RING',
             'RING_DEFN', 'STORE_LAT', 'STORE_LON', 'STORE_ID', 'LOCNUM', 'CONAME',
             'STREET', 'CITY', 'STATE', 'STATE_NAME', 'ZIP', 'ZIP4', 'NAICS', 'SIC',
             'SALESVOL', 'HDBRCH', 'ULTNUM', 'PUBPRV', 'EMPNUM', 'FRNCOD', 'ISCODE',
             'SQFTCODE', 'LOC_NAME', 'STATUS', 'SCORE', 'SOURCE', 'REC_TYPE']

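# read the exported JSON, convert it to a Spatially Enabled DataFrame, and drop the unneeded columns
with open(ta_pth) as ta_file:
    ta_df = FeatureSet.from_json(ta_file.read()).sdf.drop(columns=drop_cols)

# register SHAPE as the geometry column now that the file is closed
ta_df.spatial.set_geometry('SHAPE')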
    
ta_df.head()

Unnamed: 0,id,brand_name,brand_name_category,Shape_Length,Shape_Area,SHAPE
0,216082099,SOUTH END ACE HARDWARE,SOUTH END ACE HARDWARE,2.074207,0.011872,"{""rings"": [[[-122.46810973099997, 47.156592059..."
1,371889957,GRAHAM ACE HARDWARE,GRAHAM ACE HARDWARE,1.883413,0.011683,"{""rings"": [[[-122.29338820699996, 47.116617212..."
2,460556608,OAKBROOK ACE HARDWARE,OAKBROOK ACE HARDWARE,0.947195,0.006806,"{""rings"": [[[-122.52649995699994, 47.223291664..."
3,405129289,GIG HARBOR ACE HARDWARE,GIG HARBOR ACE HARDWARE,1.774549,0.006318,"{""rings"": [[[-122.62508960799994, 47.410589544..."
4,404324160,AGRISHOP ACE HARDWARE,AGRISHOP ACE HARDWARE,1.362285,0.009509,"{""rings"": [[[-122.49640653299997, 47.291563313..."
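
To decide which columns are extraneous, it can help to inspect the field names stored in the exported file before loading it. The sketch below uses only the standard library and assumes the file follows the usual Esri feature set JSON layout, with a top-level `fields` list whose entries carry a `name` key. The inline JSON string and the `list_field_names` helper are hypothetical stand-ins for illustration, not part of the `dm` package.

```python
import json

# Hypothetical, heavily trimmed stand-in for the exported trade area JSON.
sample_json = """
{
  "fields": [
    {"name": "OBJECTID", "type": "esriFieldTypeOID", "alias": "OBJECTID"},
    {"name": "id", "type": "esriFieldTypeString", "alias": "id"},
    {"name": "brand_name", "type": "esriFieldTypeString", "alias": "brand_name"}
  ],
  "features": []
}
"""


def list_field_names(feature_set_json: str) -> list:
    """Return the field names declared in an Esri feature set JSON string."""
    parsed = json.loads(feature_set_json)
    return [field["name"] for field in parsed.get("fields", [])]


field_names = list_field_names(sample_json)

# Columns we want to keep; everything else is a candidate for drop_cols.
keep_cols = {"id", "brand_name"}
candidate_drop_cols = [name for name in field_names if name not in keep_cols]
print(candidate_drop_cols)
```

Building the drop list from the file itself, rather than typing it by hand, avoids a `KeyError` when a column named in the list is not present in a particular export.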


If we want to see where these trade areas are, we can take a look at them on a map.

In [4]:
ta_df.spatial.plot()

MapView(layout=Layout(height='400px', width='100%'))

## Enrich

From here, we can begin the process of enrichment using a Country object instance. First, though, we need to decide which enrichment variables to use. Using an introspection property on the Country, `enrich_variables`, we can investigate what is available.

In [5]:
cntry = Country('USA')

In [6]:
e_vars = cntry.enrich_variables

e_vars

Unnamed: 0,name,alias,type,vintage,data_collection,enrich_str,enrich_field_name
0,AGE0_CY,2020 Population Age <1,COUNT,2020,1yearincrements,1yearincrements.AGE0_CY,F1yearincrements_AGE0_CY
1,AGE1_CY,2020 Population Age 1,COUNT,2020,1yearincrements,1yearincrements.AGE1_CY,F1yearincrements_AGE1_CY
2,AGE2_CY,2020 Population Age 2,COUNT,2020,1yearincrements,1yearincrements.AGE2_CY,F1yearincrements_AGE2_CY
3,AGE3_CY,2020 Population Age 3,COUNT,2020,1yearincrements,1yearincrements.AGE3_CY,F1yearincrements_AGE3_CY
4,AGE4_CY,2020 Population Age 4,COUNT,2020,1yearincrements,1yearincrements.AGE4_CY,F1yearincrements_AGE4_CY
...,...,...,...,...,...,...,...
9054,ACSRMV2000,2018 RHHs/Moved In: 2000-2009 (ACS 5-Yr),COUNT,2014-2018,yearmovedin,yearmovedin.ACSRMV2000,yearmovedin_ACSRMV2000
9055,ACSRMV1990,2018 RHHs/Moved In: 1990-1999 (ACS 5-Yr),COUNT,2014-2018,yearmovedin,yearmovedin.ACSRMV1990,yearmovedin_ACSRMV1990
9056,ACSRMV1989,2018 RHHs/Moved In: 1989/Before (ACS 5-Yr),COUNT,2014-2018,yearmovedin,yearmovedin.ACSRMV1989,yearmovedin_ACSRMV1989
9057,ACSMEDYRMV,2018 Median Year Householder Moved In (ACS 5-Yr),COUNT,2014-2018,yearmovedin,yearmovedin.ACSMEDYRMV,yearmovedin_ACSMEDYRMV
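
With several thousand variables available, a summary by data collection is often an easier starting point than scrolling through the full table. The following is a minimal sketch using a small hypothetical stand-in DataFrame (`vars_df`) that borrows a few column names and values from the output above; with the real data, the same call could be applied to `e_vars`.

```python
import pandas as pd

# Hypothetical stand-in with the same column names as a few rows of e_vars.
vars_df = pd.DataFrame({
    "name": ["AGE0_CY", "AGE1_CY", "TOTPOP_CY", "ACSRMV2000"],
    "data_collection": ["1yearincrements", "1yearincrements", "KeyUSFacts", "yearmovedin"],
})

# Count how many variables each data collection contributes.
collection_counts = vars_df["data_collection"].value_counts()
print(collection_counts)
```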


Frequently, I like to start my analysis, and demonstrate, using the current-year key variables in the Key Facts data collection. Using Pandas selection methods, we can identify these variables.

In [7]:
key_vars = e_vars[(e_vars.data_collection.str.startswith('Key')) & (e_vars.name.str.endswith('CY'))]

key_vars

Unnamed: 0,name,alias,type,vintage,data_collection,enrich_str,enrich_field_name
6445,TOTPOP_CY,2020 Total Population,COUNT,2020,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY
6447,GQPOP_CY,2020 Group Quarters Population,COUNT,2020,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY
6448,DIVINDX_CY,2020 Diversity Index,COUNT,2020,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY
6451,TOTHH_CY,2020 Total Households,COUNT,2020,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY
6453,AVGHHSZ_CY,2020 Average Household Size,COUNT,2020,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY
6454,MEDHINC_CY,2020 Median Household Income,CURRENCY,2020,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY
6456,AVGHINC_CY,2020 Average Household Income,CURRENCY,2020,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY
6458,PCI_CY,2020 Per Capita Income,CURRENCY,2020,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY
6462,TOTHU_CY,2020 Total Housing Units,COUNT,2020,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY
6464,OWNER_CY,2020 Owner Occupied HUs,COUNT,2020,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY
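
The selection above combines two boolean masks with `&`, so a row is kept only when both conditions are true. As a small illustration, the sketch below applies the same pattern to a hypothetical four-row stand-in DataFrame; the rows are invented for the example, and naming each mask separately is just one way to make the filter easier to read and adjust.

```python
import pandas as pd

# Hypothetical stand-in rows; only the column names mirror e_vars.
vars_df = pd.DataFrame({
    "name": ["TOTPOP_CY", "TOTPOP_FY", "AGE0_CY", "MEDHINC_CY"],
    "data_collection": ["KeyUSFacts", "KeyUSFacts", "1yearincrements", "KeyUSFacts"],
})

# Each condition produces a boolean Series aligned with the rows.
in_key_facts = vars_df["data_collection"].str.startswith("Key")
is_current_year = vars_df["name"].str.endswith("CY")

# Combining with & keeps only rows where both masks are True.
selected = vars_df[in_key_facts & is_current_year]
print(selected["name"].tolist())
```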


These variables can now be used to enrich the trade areas loaded into a Spatially Enabled DataFrame. Once the data is retrieved, we can also view the metadata and preview the returned tabular data.

In [8]:
enrich_df = ta_df.dm.enrich(key_vars)

enrich_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33 entries, 0 to 32
Data columns (total 26 columns):
 #   Column                 Non-Null Count  Dtype   
---  ------                 --------------  -----   
 0   id                     33 non-null     object  
 1   brand_name             33 non-null     object  
 2   brand_name_category    33 non-null     object  
 3   Shape_Length           33 non-null     float64 
 4   Shape_Area             33 non-null     float64 
 5   KeyUSFacts_TOTPOP_CY   33 non-null     float64 
 6   KeyUSFacts_GQPOP_CY    33 non-null     float64 
 7   KeyUSFacts_DIVINDX_CY  33 non-null     float64 
 8   KeyUSFacts_TOTHH_CY    33 non-null     float64 
 9   KeyUSFacts_AVGHHSZ_CY  33 non-null     float64 
 10  KeyUSFacts_MEDHINC_CY  33 non-null     float64 
 11  KeyUSFacts_AVGHINC_CY  33 non-null     float64 
 12  KeyUSFacts_PCI_CY      33 non-null     float64 
 13  KeyUSFacts_TOTHU_CY    33 non-null     float64 
 14  KeyUSFacts_OWNER_CY    33 non-null     float

In [9]:
enrich_df.head()

Unnamed: 0,id,brand_name,brand_name_category,Shape_Length,Shape_Area,KeyUSFacts_TOTPOP_CY,KeyUSFacts_GQPOP_CY,KeyUSFacts_DIVINDX_CY,KeyUSFacts_TOTHH_CY,KeyUSFacts_AVGHHSZ_CY,...,KeyUSFacts_VACANT_CY,KeyUSFacts_MEDVAL_CY,KeyUSFacts_AVGVAL_CY,KeyUSFacts_POPGRW10CY,KeyUSFacts_HHGRW10CY,KeyUSFacts_FAMGRW10CY,KeyUSFacts_DPOP_CY,KeyUSFacts_DPOPWRK_CY,KeyUSFacts_DPOPRES_CY,SHAPE
0,216082099,SOUTH END ACE HARDWARE,SOUTH END ACE HARDWARE,2.074207,0.011872,69019.0,639.0,69.3,23970.0,2.85,...,1081.0,261541.0,309639.0,1.68,1.59,1.58,57337.0,17350.0,39987.0,"{""rings"": [[[-122.46810973099997, 47.156592059..."
1,371889957,GRAHAM ACE HARDWARE,GRAHAM ACE HARDWARE,1.883413,0.011683,54916.0,105.0,56.3,17893.0,3.06,...,595.0,302252.0,347842.0,1.95,1.86,1.79,43429.0,12835.0,30594.0,"{""rings"": [[[-122.29338820699996, 47.116617212..."
2,460556608,OAKBROOK ACE HARDWARE,OAKBROOK ACE HARDWARE,0.947195,0.006806,75414.0,1443.0,74.5,30795.0,2.4,...,2271.0,316336.0,395257.0,0.79,0.76,0.65,71701.0,28211.0,43490.0,"{""rings"": [[[-122.52649995699994, 47.223291664..."
3,405129289,GIG HARBOR ACE HARDWARE,GIG HARBOR ACE HARDWARE,1.774549,0.006318,40174.0,1045.0,43.1,17006.0,2.3,...,954.0,461563.0,569335.0,1.56,1.49,1.53,41425.0,19148.0,22277.0,"{""rings"": [[[-122.62508960799994, 47.410589544..."
4,404324160,AGRISHOP ACE HARDWARE,AGRISHOP ACE HARDWARE,1.362285,0.009509,146166.0,5533.0,67.7,60959.0,2.31,...,5686.0,304730.0,369759.0,1.0,1.04,0.84,172244.0,92722.0,79522.0,"{""rings"": [[[-122.49640653299997, 47.291563313..."


## Save Results

Finally, the enriched data can be exported to an Esri Feature Class, Esri's spatial tabular format. Since feature classes support field name aliases, we can use a utility function to add these much more human-readable names to the output data. This way, instead of seeing `KeyUSFacts_FAMGRW10CY` when viewing the data in ArcGIS, we will see `2010-2020 Growth Rate: Families`, which is much easier to understand.

In [10]:
ta_enrich_fc = '../data/interim/interim.gdb/seattle_trade_areas_enriched'

out_fc = enrich_df.spatial.to_featureclass(ta_enrich_fc)

utils.add_enrich_aliases(out_fc, cntry)

out_fc

'D:\\projects\\demographic-modeling-module\\data\\interim\\interim.gdb\\seattle_trade_areas_enriched'
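
The idea behind the aliases can be illustrated with plain Pandas. How `utils.add_enrich_aliases` works internally is not shown here, so the following is only a sketch of the lookup it relies on conceptually: pairing each `enrich_field_name` with its `alias`. The `vars_df` frame and `output_columns` list are hypothetical stand-ins built from values shown earlier.

```python
import pandas as pd

# Hypothetical stand-in for two rows of the enrichment variable table.
vars_df = pd.DataFrame({
    "alias": ["2020 Total Population", "2020 Median Household Income"],
    "enrich_field_name": ["KeyUSFacts_TOTPOP_CY", "KeyUSFacts_MEDHINC_CY"],
})

# Map each output field name to its human-readable alias.
alias_lookup = dict(zip(vars_df["enrich_field_name"], vars_df["alias"]))

# Columns without an alias (such as id) fall back to their own name.
output_columns = ["id", "KeyUSFacts_TOTPOP_CY", "KeyUSFacts_MEDHINC_CY"]
readable = [alias_lookup.get(col, col) for col in output_columns]
print(readable)
```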