# Introspection

Discovering what data is available and distilling it down to something usable is the first step in analysis. Consequently, _introspection_ is the first piece of functionality we added support for. It provides the ability to discover which countries are available and, within a country, which enrichment variables are available.

In [1]:
import os

from arcgis.gis import GIS
from arcgis.geoenrichment import get_countries, Country
from dotenv import find_dotenv, load_dotenv

load_dotenv(find_dotenv())

True

## GIS _Source_

The source GIS being used determines the countries available. Business Analyst can be accessed either locally (ArcGIS Pro with Business Analyst and Data) or through a connection to a Web GIS (ArcGIS Enterprise or ArcGIS Online). In this case, we are connecting to an instance of ArcGIS Online.

In [2]:
gis_agol = GIS(
    url=os.getenv('ESRI_GIS_URL'), 
    username=os.getenv('ESRI_GIS_USERNAME'),
    password=os.getenv('ESRI_GIS_PASSWORD')
)

gis_agol
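
Because `os.getenv` returns `None` for a variable that is not set, a missing entry in the `.env` file does not raise an error at this point, so it can be worth checking the values before connecting. A minimal sketch of such a check follows. The helper name `missing_settings` is hypothetical, the example values are illustrative, and only the three variable names come from the cell above.

```python
import os

# Variable names taken from the connection cell above.
REQUIRED = ("ESRI_GIS_URL", "ESRI_GIS_USERNAME", "ESRI_GIS_PASSWORD")


def missing_settings(environ=os.environ, required=REQUIRED):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


# Illustrative values only, not real credentials.
example_env = {
    "ESRI_GIS_URL": "https://www.arcgis.com",
    "ESRI_GIS_USERNAME": "analyst",
}
missing = missing_settings(example_env)
print(missing)
```

Calling `missing_settings()` with no arguments checks the real process environment after `load_dotenv` has run.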

## Discovering Countries

Since the data is organized by country, the first introspection step is discovering which countries are available.

### Country Source - local

A local `gis` source can be used by passing in an instance of the `GIS` object created using the `'pro'` keyword. As you can see, I have quite a few datasets installed on my machine.

In [3]:
get_countries(GIS('pro'))

Unnamed: 0,iso2,iso3,country_name,vintage,country_id,data_source_id
0,CA,CAN,Canada,2020,CAN_ESRI_2019,LOCAL;;CAN_ESRI_2019
1,JP,JPN,Japan,2020,JAPAN2020,LOCAL;;JAPAN2020
2,US,USA,United States,2019,USA_ESRI_2019,LOCAL;;USA_ESRI_2019
3,US,USA,United States,2020,USA_ESRI_2020,LOCAL;;USA_ESRI_2020
4,US,USA,United States,2021,USA_ESRI_2021,LOCAL;;USA_ESRI_2021


### Country Source - Web GIS

Similarly, we can access the countries available on the Web GIS through the `arcgis.gis.GIS` object. Since this is ArcGIS Online, this is a _lot_ of countries.

In [4]:
get_countries(gis_agol)

Unnamed: 0,iso2,iso3,country_name,datasets,default_dataset,alt_name,continent
0,AL,ALB,Albania,[ALB_MBR_2020],ALB_MBR_2020,ALBANIA,Europe
1,DZ,DZA,Algeria,[DZA_MBR_2019],DZA_MBR_2019,ALGERIA,Africa
2,AD,AND,Andorra,[AND_MBR_2020],AND_MBR_2020,ANDORRA,Europe
3,AO,AGO,Angola,[AGO_MBR_2019],AGO_MBR_2019,ANGOLA,Africa
4,AI,AIA,Anguilla,[AIA_MBR_2020],AIA_MBR_2020,ANGUILLA,North America
...,...,...,...,...,...,...,...
149,UZ,UZB,Uzbekistan,[UZB_MBR_2020],UZB_MBR_2020,UZBEKISTAN,Asia
150,VE,VEN,Venezuela,[VEN_MBR_2020],VEN_MBR_2020,"VENEZUELA, BOLIVARIAN REPUBLIC OF",South America
151,VN,VNM,Vietnam,[VNM_MBR_2020],VNM_MBR_2020,VIET NAM,Asia
152,VI,VIR,Virgin Islands,[VIR_MBR_2020],VIR_MBR_2020,UNITED STATES VIRGIN ISLANDS,North America


## Creating a `Country`

Before digging into enrichment variables, we need to create a `Country` object instance. A `Country` is created using the ISO3 code displayed in the data frame above along with the corresponding `gis` source.

In [5]:
usa = Country('USA', gis=GIS('pro'))

usa

<Country - United States 2021 ('local')>

### Country - Explicit `year`

Recall from the country introspection above that three vintages of data are available on my machine for the USA: 2019, 2020, and 2021. If a model was developed against a specific country and data vintage, that vintage can be referenced explicitly using the `year` parameter.

In [6]:
usa2019 = Country('USA', gis=GIS('pro'), year=2019)

usa2019

<Country - United States 2019 ('local')>
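
Before pinning a vintage, it can help to confirm that it is actually installed. The sketch below assumes the data frame returned by `get_countries` has the `iso3` and `vintage` columns shown in the local output earlier. The rows are copied from that output so the example runs without ArcGIS Pro, and `available_vintages` is a hypothetical helper name.

```python
import pandas as pd

# Rows copied from the local get_countries output shown earlier.
countries = pd.DataFrame(
    {
        "iso3": ["CAN", "JPN", "USA", "USA", "USA"],
        "vintage": [2020, 2020, 2019, 2020, 2021],
    }
)


def available_vintages(df, iso3):
    """Return the sorted data vintages listed for one ISO3 country code."""
    return sorted(int(v) for v in df.loc[df["iso3"] == iso3, "vintage"])


usa_vintages = available_vintages(countries, "USA")
print(usa_vintages)
print(2019 in usa_vintages)
```

In practice the same function could be pointed at the result of `get_countries(GIS('pro'))` instead of the hand-built frame.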

## Enrichment Variables

Discovering enrichment variables available is possible through the `Country` object's `enrich_variables` property.

In [7]:
usa.enrich_variables

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,CHILD_CY,2021 Child Population,AgeDependency,AgeDependency.CHILD_CY,AgeDependency_CHILD_CY
1,WORKAGE_CY,2021 Working-Age Population,AgeDependency,AgeDependency.WORKAGE_CY,AgeDependency_WORKAGE_CY
2,SENIOR_CY,2021 Senior Population,AgeDependency,AgeDependency.SENIOR_CY,AgeDependency_SENIOR_CY
3,CHLDDEP_CY,2021 Child Dependency Ratio,AgeDependency,AgeDependency.CHLDDEP_CY,AgeDependency_CHLDDEP_CY
4,AGEDEP_CY,2021 Age Dependency Ratio,AgeDependency,AgeDependency.AGEDEP_CY,AgeDependency_AGEDEP_CY
...,...,...,...,...,...
17958,MOEMEDYRMV,2019 Median Year Householder Moved In MOE (ACS...,yearmovedin,yearmovedin.MOEMEDYRMV,yearmovedin_MOEMEDYRMV
17959,RELMEDYRMV,2019 Median Year Householder Moved In REL (ACS...,yearmovedin,yearmovedin.RELMEDYRMV,yearmovedin_RELMEDYRMV
17960,ACSOWNER,2019 Owner Households (ACS 5-Yr),yearmovedin,yearmovedin.ACSOWNER,yearmovedin_ACSOWNER
17961,MOEOWNER,2019 Owner Households MOE (ACS 5-Yr),yearmovedin,yearmovedin.MOEOWNER,yearmovedin_MOEOWNER


## Filtering Variables

The usefulness of relevant metadata, especially categorical data, cannot be overstated. Using relevant criteria we can quickly identify variables to use for enrichment. To make this easier, we save the dataframe into an easily accessible variable, `ev`.

In [8]:
ev = usa.enrich_variables

ev.head()

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,CHILD_CY,2021 Child Population,AgeDependency,AgeDependency.CHILD_CY,AgeDependency_CHILD_CY
1,WORKAGE_CY,2021 Working-Age Population,AgeDependency,AgeDependency.WORKAGE_CY,AgeDependency_WORKAGE_CY
2,SENIOR_CY,2021 Senior Population,AgeDependency,AgeDependency.SENIOR_CY,AgeDependency_SENIOR_CY
3,CHLDDEP_CY,2021 Child Dependency Ratio,AgeDependency,AgeDependency.CHLDDEP_CY,AgeDependency_CHLDDEP_CY
4,AGEDEP_CY,2021 Age Dependency Ratio,AgeDependency,AgeDependency.AGEDEP_CY,AgeDependency_AGEDEP_CY


### Get Current Income Metrics

Since `ev` is a Pandas DataFrame, finding income indices for use in analysis is relatively straightforward.

In [9]:
inc_vars = ev[
    (ev.alias.str.lower().str.contains('income'))
    & (ev.name.str.endswith('CY'))
].drop_duplicates('name').reset_index(drop=True)

inc_vars

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,MEDHINC_CY,2021 Median Household Income,Health,Health.MEDHINC_CY,Health_MEDHINC_CY
1,AVGIA55UCY,2021 Avg HH Income: HHr 55+,Age_50_Profile_rep,Age_50_Profile_rep.AVGIA55UCY,Age_50_Profile_rep_AVGIA55UCY
2,IA55UBASCY,2021 HH Income Base: HHr 55+,Age_50_Profile_rep,Age_50_Profile_rep.IA55UBASCY,Age_50_Profile_rep_IA55UBASCY
3,AVGHINC_CY,2021 Average Household Income,AtRisk,AtRisk.AVGHINC_CY,AtRisk_AVGHINC_CY
4,HINC0_CY,2021 HH Income <$15000,Policy,Policy.HINC0_CY,Policy_HINC0_CY
5,HINC15_CY,2021 HH Income $15000-24999,Policy,Policy.HINC15_CY,Policy_HINC15_CY
6,HINC25_CY,2021 HH Income $25000-34999,Policy,Policy.HINC25_CY,Policy_HINC25_CY
7,HINC35_CY,2021 HH Income $35000-49999,Policy,Policy.HINC35_CY,Policy_HINC35_CY
8,HINC50_CY,2021 HH Income $50000-74999,Policy,Policy.HINC50_CY,Policy_HINC50_CY
9,HINC75_CY,2021 HH Income $75000-99999,Policy,Policy.HINC75_CY,Policy_HINC75_CY
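
The filter above can be wrapped in a small function so other keywords are easy to try. This is only a sketch: `find_current_year_variables` is a hypothetical helper, and the sample rows are copied from the outputs shown in this notebook so the block runs without a Business Analyst data source.

```python
import pandas as pd

# A handful of rows copied from the enrich_variables outputs above.
ev_sample = pd.DataFrame(
    {
        "name": ["CHILD_CY", "MEDHINC_CY", "AVGHINC_CY", "ACSOWNER"],
        "alias": [
            "2021 Child Population",
            "2021 Median Household Income",
            "2021 Average Household Income",
            "2019 Owner Households (ACS 5-Yr)",
        ],
    }
)


def find_current_year_variables(df, keyword):
    """Keep rows whose alias mentions keyword and whose name ends in 'CY'."""
    # case=False makes the match case-insensitive; na=False treats missing text as no match.
    in_alias = df["alias"].str.contains(keyword, case=False, regex=False, na=False)
    is_current = df["name"].str.endswith("CY", na=False)
    return df[in_alias & is_current].drop_duplicates("name").reset_index(drop=True)


income = find_current_year_variables(ev_sample, "income")
print(income["name"].tolist())
```

Bracket access (`df["name"]`) is used here rather than attribute access (`df.name`) only as a precaution against column names that collide with DataFrame attributes.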


### Get Current Key Metrics

One of the more common example datasets I use when exploring whether an idea is viable is the current year key metrics.

In [10]:
kv = ev[
    (ev.name.str.contains('CY'))
    & (ev.data_collection.str.lower().str.contains('key'))
].reset_index(drop=True)

kv

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,TOTPOP_CY,2021 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY
1,GQPOP_CY,2021 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY
2,DIVINDX_CY,2021 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY
3,TOTHH_CY,2021 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY
4,AVGHHSZ_CY,2021 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY
5,MEDHINC_CY,2021 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY
6,AVGHINC_CY,2021 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY
7,PCI_CY,2021 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY
8,TOTHU_CY,2021 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY
9,OWNER_CY,2021 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY
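
Note that this cell matches `'CY'` anywhere in the name, while the income filter required the name to end with it. The difference matters for names such as `POPGRWCYFY`, which appears in the joined variable string later in this notebook. A quick plain-Python check, using only names that appear in this notebook's outputs:

```python
# Variable names copied from this notebook's outputs.
names = ["TOTPOP_CY", "POPGRW10CY", "POPGRWCYFY", "ACSOWNER"]

# Names whose suffix is CY.
ends_with_cy = [n for n in names if n.endswith("CY")]

# Names with CY anywhere in them.
contains_cy = [n for n in names if "CY" in n]

print(ends_with_cy)
print(contains_cy)
```

Only the second list includes `POPGRWCYFY`, so the choice between the two tests changes which variables end up in the result.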


## Onto Enrich

From here the next step is enriching data using the retrieved variables. The local (`arcpy.ba.EnrichLayer`) and online (`arcgis.geoenrichment.Country.enrich`) enrich methods expect slightly different values from the retrieved DataFrame, but these can easily be formatted for input into the respective functions.

__NOTE:__ Fortunately, this will not be necessary once the updated `enrich` method gets released, which is already in review. (28 Jan 2022)

### Enrich Variables for Web GIS

If planning to enrich using the Web GIS, we need the values in the `name` column joined by semicolons for use with the `arcgis.geoenrichment.Country.enrich` method.

In [11]:
var_str_gis = ";".join(kv.name)

var_str_gis

'TOTPOP_CY;GQPOP_CY;DIVINDX_CY;TOTHH_CY;AVGHHSZ_CY;MEDHINC_CY;AVGHINC_CY;PCI_CY;TOTHU_CY;OWNER_CY;RENTER_CY;VACANT_CY;MEDVAL_CY;AVGVAL_CY;POPGRW10CY;HHGRW10CY;FAMGRW10CY;POPGRWCYFY;HHGRWCYFY;FAMGRWCYFY;MHIGRWCYFY;PCIGRWCYFY;DPOP_CY;DPOPWRK_CY;DPOPRES_CY'

### Enrich Variables for Local

...and if planning to enrich locally (ArcGIS Pro) using the `arcpy.ba.EnrichLayer` method, we need the values in the `enrich_name` column joined by semicolons.

In [12]:
var_str_lcl = ';'.join(kv.enrich_name)

var_str_lcl

'KeyUSFacts.TOTPOP_CY;KeyUSFacts.GQPOP_CY;KeyUSFacts.DIVINDX_CY;KeyUSFacts.TOTHH_CY;KeyUSFacts.AVGHHSZ_CY;KeyUSFacts.MEDHINC_CY;KeyUSFacts.AVGHINC_CY;KeyUSFacts.PCI_CY;KeyUSFacts.TOTHU_CY;KeyUSFacts.OWNER_CY;KeyUSFacts.RENTER_CY;KeyUSFacts.VACANT_CY;KeyUSFacts.MEDVAL_CY;KeyUSFacts.AVGVAL_CY;KeyUSFacts.POPGRW10CY;KeyUSFacts.HHGRW10CY;KeyUSFacts.FAMGRW10CY;KeyUSFacts.POPGRWCYFY;KeyUSFacts.HHGRWCYFY;KeyUSFacts.FAMGRWCYFY;KeyUSFacts.MHIGRWCYFY;KeyUSFacts.PCIGRWCYFY;KeyUSFacts.DPOP_CY;KeyUSFacts.DPOPWRK_CY;KeyUSFacts.DPOPRES_CY'
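
In the rows shown in this notebook, `enrich_name` is the `data_collection` and `name` joined with a period. The sketch below builds both input strings from plain `(data_collection, name)` pairs under that assumption. It is inferred from the displayed output rather than taken from the package itself, and `build_enrich_strings` is a hypothetical helper.

```python
# (data_collection, name) pairs copied from the key metrics output above.
key_vars = [
    ("KeyUSFacts", "TOTPOP_CY"),
    ("KeyUSFacts", "GQPOP_CY"),
    ("KeyUSFacts", "DIVINDX_CY"),
]


def build_enrich_strings(pairs, sep=";"):
    """Return (web_gis_string, local_string) for the given variable pairs."""
    # Web GIS form: bare variable names.
    web = sep.join(name for _, name in pairs)
    # Local form: collection-qualified names, assumed to follow "collection.name".
    local = sep.join(f"{collection}.{name}" for collection, name in pairs)
    return web, local


web_str, local_str = build_enrich_strings(key_vars)
print(web_str)
print(local_str)
```

Keeping both strings derived from one list of pairs avoids the two inputs drifting apart when the variable selection changes.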