# Business Analyst Introspection

The first step in using ArcGIS Business Analyst for data enrichment is figuring out what is possible: what you *can* do. This begins with selecting a source, figuring out which countries are available from that source, and discovering which variables are available for the country you are interested in working in.

## The `BusinessAnalyst` object

Business Analyst functionality can be accessed throughout the ArcGIS Platform from a variety of sources, available either locally or through a Web GIS. When instantiated, the `BusinessAnalyst` object either selects the source automatically or honors the `source` input parameter to set it explicitly.

```python
from arcgis.geoenrichment import BusinessAnalyst as BA
```

### Local Business Analyst Source

The `BusinessAnalyst` object can be set explicitly to reference local resources by passing the string `'local'` as the first input parameter, `source`. Using local resources requires a Python environment with `arcpy` installed, referencing an ArcGIS Pro installation with Business Analyst and local data for at least one country.

```python
ba_lcl = BA('local')

ba_lcl
```

```
<BusinessAnalyst (local)>
```
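
Because the local source depends on `arcpy`, it can help to check whether `arcpy` is importable before requesting `BA('local')`. The sketch below uses only the standard library; `arcpy_available` is a hypothetical helper name, and it only detects `arcpy`, not whether Business Analyst or any local country data is installed.

```python
import importlib.util


def arcpy_available() -> bool:
    """Return True if the arcpy package can be found in this environment.

    This only confirms arcpy is importable; it does not confirm that
    Business Analyst or any local country data is installed.
    """
    return importlib.util.find_spec("arcpy") is not None


if __name__ == "__main__":
    print("arcpy found" if arcpy_available() else "arcpy not found; use a GIS source")
```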

### Web GIS `BusinessAnalyst` Source

A Web GIS can also be used as the `BusinessAnalyst` source, in the form of a `GIS` object instance connected to either an Enterprise GIS or ArcGIS Online. If using an Enterprise GIS, Enrichment must be configured in the settings, referencing either ArcGIS Online or a configured Business Analyst server.

```python
from arcgis.gis import GIS
gis = GIS()  # connecting to ArcGIS Online as an anonymous user

ba_gis = BA(gis)

ba_gis
```

```
<BusinessAnalyst (GIS @ https://www.arcgis.com version:9.1)>
```

### Implicit `BusinessAnalyst` Source

When instantiating the `BusinessAnalyst` object, it is not necessary to set the `source` explicitly. If it is omitted, the object attempts to set the source to `local` if ArcGIS Pro with Business Analyst is available, or otherwise uses the `active_gis`, if one has been set in the Python session.

```python
ba_impl = BA()  # ArcGIS Pro with Business Analyst is installed here, so the source defaults to local

ba_impl
```

```
<BusinessAnalyst (local)>
```
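
The lookup order described above can be sketched as a plain function. `resolve_source` and its arguments are hypothetical, and this is not the `arcgis` implementation; it only illustrates the documented precedence: an explicit `source`, then local ArcGIS Pro with Business Analyst, then the active GIS. Raising an error when neither is available is an assumption made for the sketch.

```python
from typing import Optional


def resolve_source(source: Optional[object] = None,
                   local_ba_available: bool = False,
                   active_gis: Optional[object] = None) -> object:
    """Pick an enrichment source using the precedence described above (sketch)."""
    if source is not None:        # an explicit source always wins
        return source
    if local_ba_available:        # ArcGIS Pro with Business Analyst is installed
        return "local"
    if active_gis is not None:    # fall back to the session's active GIS
        return active_gis
    # Assumption: with nothing available, fail loudly rather than guess
    raise ValueError("No Business Analyst source available; pass one explicitly.")


print(resolve_source(local_ba_available=True, active_gis="my_gis"))  # local
```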

## Discovering Countries

Both sources enable discovering which countries are available using identical syntax.

### Local Country Introspection

```python
ba_lcl.countries
```

|   | iso2 | iso3 | country_name | vintage | country_id | data_source_id |
|---|------|------|--------------|---------|------------|----------------|
| 0 | CA | CAN | Canada | 2020 | CAN_ESRI_2019 | LOCAL;;CAN_ESRI_2019 |
| 1 | US | USA | United States | 2019 | USA_ESRI_2019 | LOCAL;;USA_ESRI_2019 |
| 2 | US | USA | United States | 2020 | USA_ESRI_2020 | LOCAL;;USA_ESRI_2020 |
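
To see at a glance which countries have more than one local vintage, the `countries` output can be grouped by ISO3 code. This standard-library sketch hard-codes the three rows shown above (only the columns it needs) rather than reading the live DataFrame; `vintages_by_country` is a hypothetical helper. With pandas, `ba_lcl.countries.groupby('iso3')['vintage'].unique()` should give a similar view.

```python
from collections import defaultdict

# Rows copied from the local `countries` output above (subset of columns)
local_countries = [
    {"iso3": "CAN", "vintage": 2020, "country_id": "CAN_ESRI_2019"},
    {"iso3": "USA", "vintage": 2019, "country_id": "USA_ESRI_2019"},
    {"iso3": "USA", "vintage": 2020, "country_id": "USA_ESRI_2020"},
]


def vintages_by_country(rows):
    """Map each ISO3 code to a sorted list of its available vintages."""
    vintages = defaultdict(set)
    for row in rows:
        vintages[row["iso3"]].add(row["vintage"])
    return {iso3: sorted(years) for iso3, years in vintages.items()}


print(vintages_by_country(local_countries))
# {'CAN': [2020], 'USA': [2019, 2020]}
```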


### Web GIS Country Introspection

```python
ba_gis.countries
```

|   | iso2 | iso3 | country_name | country_id | alt_name | continent |
|---|------|------|--------------|------------|----------|-----------|
| 0 | AL | ALB | Albania | ALB_MBR_2019 | ALBANIA | Europe |
| 1 | DZ | DZA | Algeria | DZA_MBR_2019 | ALGERIA | Africa |
| 2 | AD | AND | Andorra | AND_MBR_2019 | ANDORRA | Europe |
| 3 | AO | AGO | Angola | AGO_MBR_2019 | ANGOLA | Africa |
| 4 | AR | ARG | Argentina | ARG_MBR_2020 | ARGENTINA | South America |
| ... | ... | ... | ... | ... | ... | ... |
| 131 | UY | URY | Uruguay | URY_MBR_2020 | URUGUAY | South America |
| 132 | UZ | UZB | Uzbekistan | UZB_MBR_2020 | UZBEKISTAN | Asia |
| 133 | VE | VEN | Venezuela | VEN_MBR_2020 | VENEZUELA, BOLIVARIAN REPUBLIC OF | South America |
| 134 | VN | VNM | Vietnam | VNM_MBR_2020 | VIET NAM | Asia |


## Country Instantiation

Currently, most data in Business Analyst is organized by country, represented by the `Country` object. Consequently, before discovering which variables are available for enrichment, a country must be retrieved from the instantiated `BusinessAnalyst` object. The syntax is identical whether using `local` or a `GIS` object instance as the source.

### Local Country Instantiation

```python
usa_lcl = ba_lcl.get_country('USA')

usa_lcl
```

```
<Country - USA 2020 (local)>
```

### Web GIS Country Instantiation

```python
usa_gis = ba_gis.get_country('USA')

usa_gis
```

```
<Country - USA (GIS @ https://www.arcgis.com version:9.1)>
```

### Local Country with Year Instantiation

With a local source, clients also periodically need to access older data explicitly, for instance to use with models built against a specific year's data. Explicitly setting the year is supported through the `year` parameter.

```python
usa_lcl_2019 = ba_lcl.get_country('USA', year=2019)

usa_lcl_2019
```

```
<Country - USA 2019 (local)>
```
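
The default behavior visible above (`get_country('USA')` returned the 2020 data) and the `year` override can be mimicked against the rows of the local `countries` table. `select_country_id` is a hypothetical helper, and it assumes the default is the most recent vintage, which is inferred from the outputs rather than documented.

```python
from typing import Optional

# Rows copied from the local `countries` output (subset of columns)
local_countries = [
    {"iso3": "CAN", "vintage": 2020, "country_id": "CAN_ESRI_2019"},
    {"iso3": "USA", "vintage": 2019, "country_id": "USA_ESRI_2019"},
    {"iso3": "USA", "vintage": 2020, "country_id": "USA_ESRI_2020"},
]


def select_country_id(rows, iso3: str, year: Optional[int] = None) -> str:
    """Return the country_id for iso3, defaulting to the latest vintage (assumed)."""
    matches = [r for r in rows if r["iso3"] == iso3.upper()]
    if year is not None:
        # An explicit year narrows the candidates to that vintage only
        matches = [r for r in matches if r["vintage"] == year]
    if not matches:
        raise LookupError(f"No local data for {iso3!r} (year={year})")
    return max(matches, key=lambda r: r["vintage"])["country_id"]


print(select_country_id(local_countries, "USA"))        # USA_ESRI_2020
print(select_country_id(local_countries, "USA", 2019))  # USA_ESRI_2019
```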

## Enrich Variable Introspection

Once a country is instantiated, the data available for enrichment can be discovered through the `variables` attribute of the `Country` object instance. The syntax is identical whether the source is local or a GIS instance.

### Local Variable Introspection

```python
usa_lcl.variables
```

|   | name | alias | data_collection | enrich_name | enrich_field_name |
|---|------|-------|-----------------|-------------|-------------------|
| 0 | CHILD_CY | 2020 Child Population | AgeDependency | AgeDependency.CHILD_CY | AgeDependency_CHILD_CY |
| 1 | WORKAGE_CY | 2020 Working-Age Population | AgeDependency | AgeDependency.WORKAGE_CY | AgeDependency_WORKAGE_CY |
| 2 | SENIOR_CY | 2020 Senior Population | AgeDependency | AgeDependency.SENIOR_CY | AgeDependency_SENIOR_CY |
| 3 | CHLDDEP_CY | 2020 Child Dependency Ratio | AgeDependency | AgeDependency.CHLDDEP_CY | AgeDependency_CHLDDEP_CY |
| 4 | AGEDEP_CY | 2020 Age Dependency Ratio | AgeDependency | AgeDependency.AGEDEP_CY | AgeDependency_AGEDEP_CY |
| ... | ... | ... | ... | ... | ... |
| 16869 | MOEMEDYRMV | 2018 Median Year Householder Moved In MOE (ACS... | yearmovedin | yearmovedin.MOEMEDYRMV | yearmovedin_MOEMEDYRMV |
| 16870 | RELMEDYRMV | 2018 Median Year Householder Moved In REL (ACS... | yearmovedin | yearmovedin.RELMEDYRMV | yearmovedin_RELMEDYRMV |
| 16871 | ACSOWNER | 2018 Owner Households (ACS 5-Yr) | yearmovedin | yearmovedin.ACSOWNER | yearmovedin_ACSOWNER |
| 16872 | MOEOWNER | 2018 Owner Households MOE (ACS 5-Yr) | yearmovedin | yearmovedin.MOEOWNER | yearmovedin_MOEOWNER |


### Web GIS Variable Introspection

```python
usa_gis.variables
```

|   | name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units |
|---|------|-------|-----------------|-------------|-------------------|-------------|---------|-------|
| 0 | AGE0_CY | 2020 Population Age <1 | 1yearincrements | 1yearincrements.AGE0_CY | F1yearincrements_AGE0_CY | 2020 Total Population Age <1 (Esri) | 2020 | count |
| 1 | AGE1_CY | 2020 Population Age 1 | 1yearincrements | 1yearincrements.AGE1_CY | F1yearincrements_AGE1_CY | 2020 Total Population Age 1 (Esri) | 2020 | count |
| 2 | AGE2_CY | 2020 Population Age 2 | 1yearincrements | 1yearincrements.AGE2_CY | F1yearincrements_AGE2_CY | 2020 Total Population Age 2 (Esri) | 2020 | count |
| 3 | AGE3_CY | 2020 Population Age 3 | 1yearincrements | 1yearincrements.AGE3_CY | F1yearincrements_AGE3_CY | 2020 Total Population Age 3 (Esri) | 2020 | count |
| 4 | AGE4_CY | 2020 Population Age 4 | 1yearincrements | 1yearincrements.AGE4_CY | F1yearincrements_AGE4_CY | 2020 Total Population Age 4 (Esri) | 2020 | count |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37 | MOEMEDYRMV | 2019 Median Year Householder Moved In MOE (ACS... | yearmovedin | yearmovedin.MOEMEDYRMV | yearmovedin_MOEMEDYRMV | 2019 Median Year Householder Moved into Unit M... | 2015-2019 | count |
| 38 | RELMEDYRMV | 2019 Median Year Householder Moved In REL (ACS... | yearmovedin | yearmovedin.RELMEDYRMV | yearmovedin_RELMEDYRMV | 2019 Median Year Householder Moved into Unit R... | 2015-2019 | count |
| 39 | ACSOWNER | 2019 Owner Households (ACS 5-Yr) | yearmovedin | yearmovedin.ACSOWNER | yearmovedin_ACSOWNER | 2019 Owner Households (ACS 5-Yr) | 2015-2019 | count |
| 40 | MOEOWNER | 2019 Owner Households MOE (ACS 5-Yr) | yearmovedin | yearmovedin.MOEOWNER | yearmovedin_MOEOWNER | 2019 Owner Households MOE (ACS 5-Yr) | 2015-2019 | count |
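
The identifier columns in both variable tables appear to follow a simple pattern: `enrich_name` joins `data_collection` and `name` with a period, and `enrich_field_name` joins them with an underscore. In the Web GIS table, a collection name starting with a digit (`1yearincrements`) gains an `F` prefix in the field name. The sketch below reproduces that pattern; it is inferred from the rows shown here, not taken from Business Analyst documentation, and `enrich_names` is a hypothetical helper.

```python
def enrich_names(data_collection: str, name: str) -> tuple:
    """Rebuild enrich_name and enrich_field_name from their parts (inferred pattern)."""
    enrich_name = f"{data_collection}.{name}"
    field_name = f"{data_collection}_{name}"
    # Inferred: field names starting with a digit get an "F" prefix,
    # as with F1yearincrements_AGE0_CY in the Web GIS table
    if field_name[0].isdigit():
        field_name = "F" + field_name
    return enrich_name, field_name


print(enrich_names("KeyUSFacts", "TOTPOP_CY"))
# ('KeyUSFacts.TOTPOP_CY', 'KeyUSFacts_TOTPOP_CY')
print(enrich_names("1yearincrements", "AGE0_CY"))
# ('1yearincrements.AGE0_CY', 'F1yearincrements_AGE0_CY')
```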


## Selecting Variables for Enrichment

Since `variables` returns a Pandas DataFrame, different combinations of variables can be discovered quickly based on analysis needs.

### Unique Variables

Because a variable can be included in multiple Data Collections, the same name may appear more than once. These duplicates are easily removed using the functionality of the DataFrame.

```python
uniq_df = usa_lcl.variables.drop_duplicates('name').reset_index(drop=True)

uniq_df
```

|   | name | alias | data_collection | enrich_name | enrich_field_name |
|---|------|-------|-----------------|-------------|-------------------|
| 0 | CHILD_CY | 2020 Child Population | AgeDependency | AgeDependency.CHILD_CY | AgeDependency_CHILD_CY |
| 1 | WORKAGE_CY | 2020 Working-Age Population | AgeDependency | AgeDependency.WORKAGE_CY | AgeDependency_WORKAGE_CY |
| 2 | SENIOR_CY | 2020 Senior Population | AgeDependency | AgeDependency.SENIOR_CY | AgeDependency_SENIOR_CY |
| 3 | CHLDDEP_CY | 2020 Child Dependency Ratio | AgeDependency | AgeDependency.CHLDDEP_CY | AgeDependency_CHLDDEP_CY |
| 4 | AGEDEP_CY | 2020 Age Dependency Ratio | AgeDependency | AgeDependency.AGEDEP_CY | AgeDependency_AGEDEP_CY |
| ... | ... | ... | ... | ... | ... |
| 14660 | RELRMV1989 | 2018 RHHs/Moved In: 1989/Before REL (ACS 5-Yr) | yearmovedin | yearmovedin.RELRMV1989 | yearmovedin_RELRMV1989 |
| 14661 | ACSMEDYRMV | 2018 Median Year Householder Moved In (ACS 5-Yr) | yearmovedin | yearmovedin.ACSMEDYRMV | yearmovedin_ACSMEDYRMV |
| 14662 | MOEMEDYRMV | 2018 Median Year Householder Moved In MOE (ACS... | yearmovedin | yearmovedin.MOEMEDYRMV | yearmovedin_MOEMEDYRMV |
| 14663 | RELMEDYRMV | 2018 Median Year Householder Moved In REL (ACS... | yearmovedin | yearmovedin.RELMEDYRMV | yearmovedin_RELMEDYRMV |
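
`drop_duplicates('name')` keeps the first occurrence of each name by default (`keep='first'`), so the `data_collection` and `enrich_name` retained for a repeated variable depend on row order. The standard-library sketch below mirrors that behavior on a list of row dictionaries; `drop_duplicate_names` is a hypothetical helper, and `AnotherCollection` is a made-up collection name used only to create a duplicate.

```python
def drop_duplicate_names(rows):
    """Keep the first row for each variable name, preserving order.

    Mirrors DataFrame.drop_duplicates('name') with the default keep='first'.
    """
    seen = set()
    unique = []
    for row in rows:
        if row["name"] not in seen:
            seen.add(row["name"])
            unique.append(row)
    return unique


rows = [
    {"name": "TOTPOP_CY", "data_collection": "KeyUSFacts"},
    {"name": "CHILD_CY", "data_collection": "AgeDependency"},
    {"name": "TOTPOP_CY", "data_collection": "AnotherCollection"},  # hypothetical repeat
]
print([r["data_collection"] for r in drop_duplicate_names(rows)])
# ['KeyUSFacts', 'AgeDependency']
```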


### Unique Current Year Variables

One of the things I like to do is grab all the current year demographics. By naming convention, the names of these variables end with `CY`, which makes them quick to find.

```python
# uniq_df already holds one row per name, so a boolean filter is enough
cy_df = uniq_df[uniq_df['name'].str.endswith('CY')].reset_index(drop=True)

cy_df
```

|   | name | alias | data_collection | enrich_name | enrich_field_name |
|---|------|-------|-----------------|-------------|-------------------|
| 0 | CHILD_CY | 2020 Child Population | AgeDependency | AgeDependency.CHILD_CY | AgeDependency_CHILD_CY |
| 1 | WORKAGE_CY | 2020 Working-Age Population | AgeDependency | AgeDependency.WORKAGE_CY | AgeDependency_WORKAGE_CY |
| 2 | SENIOR_CY | 2020 Senior Population | AgeDependency | AgeDependency.SENIOR_CY | AgeDependency_SENIOR_CY |
| 3 | CHLDDEP_CY | 2020 Child Dependency Ratio | AgeDependency | AgeDependency.CHLDDEP_CY | AgeDependency_CHLDDEP_CY |
| 4 | AGEDEP_CY | 2020 Age Dependency Ratio | AgeDependency | AgeDependency.AGEDEP_CY | AgeDependency_AGEDEP_CY |
| ... | ... | ... | ... | ... | ... |
| 1318 | NHSPASN_CY | 2020 Non-Hispanic Asian Pop | raceandhispanicorigin | raceandhispanicorigin.NHSPASN_CY | raceandhispanicorigin_NHSPASN_CY |
| 1319 | NHSPPI_CY | 2020 Non-Hispanic Pacific Islander Pop | raceandhispanicorigin | raceandhispanicorigin.NHSPPI_CY | raceandhispanicorigin_NHSPPI_CY |
| 1320 | NHSPOTH_CY | 2020 Non-Hispanic Other Race Pop | raceandhispanicorigin | raceandhispanicorigin.NHSPOTH_CY | raceandhispanicorigin_NHSPOTH_CY |
| 1321 | NHSPMLT_CY | 2020 Non-Hispanic Multiple Race Pop | raceandhispanicorigin | raceandhispanicorigin.NHSPMLT_CY | raceandhispanicorigin_NHSPMLT_CY |
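
Matching on `CY` alone also picks up names such as `POPGRW10CY`, which appears in the `KeyUSFacts` collection used later. If only names with the `_CY` suffix are wanted, a stricter test narrows the list. The sketch below compares the two filters on a few names taken from the tables in this notebook; the pandas equivalent of the strict version would be `uniq_df['name'].str.endswith('_CY')`. Which filter is appropriate depends on whether variables like `POPGRW10CY` belong in your analysis.

```python
# Variable names taken from the tables in this notebook
names = ["TOTPOP_CY", "POPGRW10CY", "ACSOWNER", "CHILD_CY"]

# Loose filter, equivalent to Series.str.endswith('CY')
loose = [n for n in names if n.endswith("CY")]

# Stricter filter: only names ending in the "_CY" suffix
strict = [n for n in names if n.endswith("_CY")]

print(loose)   # ['TOTPOP_CY', 'POPGRW10CY', 'CHILD_CY']
print(strict)  # ['TOTPOP_CY', 'CHILD_CY']
```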


### Current Year Sample Variables

When quickly demonstrating an ad-hoc analysis, I frequently take advantage of Data Collections to grab a few useful current year variables, here by keeping the current year variables from collections whose names start with `Key`.

```python
var_df = usa_lcl.variables

smpl_df = var_df[
    (var_df['name'].str.endswith('CY'))                  # current year
    & (var_df['data_collection'].str.startswith('Key'))  # key variables
].reset_index(drop=True)

smpl_df
```

|   | name | alias | data_collection | enrich_name | enrich_field_name |
|---|------|-------|-----------------|-------------|-------------------|
| 0 | TOTPOP_CY | 2020 Total Population | KeyUSFacts | KeyUSFacts.TOTPOP_CY | KeyUSFacts_TOTPOP_CY |
| 1 | GQPOP_CY | 2020 Group Quarters Population | KeyUSFacts | KeyUSFacts.GQPOP_CY | KeyUSFacts_GQPOP_CY |
| 2 | DIVINDX_CY | 2020 Diversity Index | KeyUSFacts | KeyUSFacts.DIVINDX_CY | KeyUSFacts_DIVINDX_CY |
| 3 | TOTHH_CY | 2020 Total Households | KeyUSFacts | KeyUSFacts.TOTHH_CY | KeyUSFacts_TOTHH_CY |
| 4 | AVGHHSZ_CY | 2020 Average Household Size | KeyUSFacts | KeyUSFacts.AVGHHSZ_CY | KeyUSFacts_AVGHHSZ_CY |
| 5 | MEDHINC_CY | 2020 Median Household Income | KeyUSFacts | KeyUSFacts.MEDHINC_CY | KeyUSFacts_MEDHINC_CY |
| 6 | AVGHINC_CY | 2020 Average Household Income | KeyUSFacts | KeyUSFacts.AVGHINC_CY | KeyUSFacts_AVGHINC_CY |
| 7 | PCI_CY | 2020 Per Capita Income | KeyUSFacts | KeyUSFacts.PCI_CY | KeyUSFacts_PCI_CY |
| 8 | TOTHU_CY | 2020 Total Housing Units | KeyUSFacts | KeyUSFacts.TOTHU_CY | KeyUSFacts_TOTHU_CY |
| 9 | OWNER_CY | 2020 Owner Occupied HUs | KeyUSFacts | KeyUSFacts.OWNER_CY | KeyUSFacts_OWNER_CY |
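
When variable names are typed by hand rather than filtered from the DataFrame, a quick membership check against the introspection table can catch typos before any enrichment is attempted. This is a minimal sketch assuming plain lists of names; in pandas, `var_df['name']` would supply the available names. `missing_variables` and the misspelled `MEDHINC_CYY` are hypothetical.

```python
def missing_variables(requested, available):
    """Return requested variable names not found among the available names."""
    available_set = set(available)  # set lookup keeps this fast for ~17,000 names
    return [name for name in requested if name not in available_set]


available = ["TOTPOP_CY", "TOTHH_CY", "MEDHINC_CY"]
requested = ["TOTPOP_CY", "MEDHINC_CY", "MEDHINC_CYY"]  # last one is a deliberate typo
print(missing_variables(requested, available))  # ['MEDHINC_CYY']
```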


## Enrichment

There are two ways to access enrichment from Python. The ArcGIS Pro Enrich Layer geoprocessing tool enriches data using local resources, and the Python API already includes a wrapper around the enrich REST endpoint for using a Web GIS. Each method requires the variables to be input in a slightly different format.

### Local Enrichment

For input into the Enrich Layer geoprocessing tool, the enrichment variables must be concatenated into a single semicolon-separated string of the qualified `enrich_name` values. This takes only a little string concatenation in Python.

```python
enrich_str = ';'.join(smpl_df.enrich_name)

print(enrich_str)
```

```
KeyUSFacts.TOTPOP_CY;KeyUSFacts.GQPOP_CY;KeyUSFacts.DIVINDX_CY;KeyUSFacts.TOTHH_CY;KeyUSFacts.AVGHHSZ_CY;KeyUSFacts.MEDHINC_CY;KeyUSFacts.AVGHINC_CY;KeyUSFacts.PCI_CY;KeyUSFacts.TOTHU_CY;KeyUSFacts.OWNER_CY;KeyUSFacts.RENTER_CY;KeyUSFacts.VACANT_CY;KeyUSFacts.MEDVAL_CY;KeyUSFacts.AVGVAL_CY;KeyUSFacts.POPGRW10CY;KeyUSFacts.HHGRW10CY;KeyUSFacts.FAMGRW10CY;KeyUSFacts.DPOP_CY;KeyUSFacts.DPOPWRK_CY;KeyUSFacts.DPOPRES_CY
```


### Web GIS Enrichment

Enriching through the Python API, which calls the REST endpoint, requires slightly different input: a semicolon-separated string of the bare variable names from the `name` column.

```python
enrich_str = ';'.join(smpl_df.name)

print(enrich_str)
```

```
TOTPOP_CY;GQPOP_CY;DIVINDX_CY;TOTHH_CY;AVGHHSZ_CY;MEDHINC_CY;AVGHINC_CY;PCI_CY;TOTHU_CY;OWNER_CY;RENTER_CY;VACANT_CY;MEDVAL_CY;AVGVAL_CY;POPGRW10CY;HHGRW10CY;FAMGRW10CY;DPOP_CY;DPOPWRK_CY;DPOPRES_CY
```
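
Since the two targets differ only in which column is joined, both strings can come from one helper. This sketch assumes rows shaped like the variables DataFrame (dictionaries with `name` and `enrich_name` keys); `format_variables` and its `source` values are hypothetical, not part of the `arcgis` API.

```python
def format_variables(rows, source: str) -> str:
    """Join variables into the semicolon-separated string each source expects.

    "local" uses the qualified enrich_name (DataCollection.NAME) for the
    Enrich Layer tool; "gis" uses the bare name for the REST endpoint.
    """
    column = {"local": "enrich_name", "gis": "name"}.get(source)
    if column is None:
        raise ValueError("source must be 'local' or 'gis'")
    return ";".join(row[column] for row in rows)


rows = [
    {"name": "TOTPOP_CY", "enrich_name": "KeyUSFacts.TOTPOP_CY"},
    {"name": "TOTHH_CY", "enrich_name": "KeyUSFacts.TOTHH_CY"},
]
print(format_variables(rows, "local"))  # KeyUSFacts.TOTPOP_CY;KeyUSFacts.TOTHH_CY
print(format_variables(rows, "gis"))    # TOTPOP_CY;TOTHH_CY
```

With the sample DataFrame, `format_variables(smpl_df.to_dict('records'), 'local')` should reproduce the local string printed earlier.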


## Next Step - Enrich Method

The next step is adding an `enrich` method to the `Country` object that supports both local and remote resources and accepts the variable introspection DataFrames as input. This enables a single workflow for discovering what data is available and retrieving it, whether from a local or a Web GIS source.

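```python
# This does not run yet: the `enrich` method is planned, not yet available
import pandas as pd
from arcgis.features import GeoAccessor  # registers the DataFrame.spatial accessor

in_pth = 'C:/path/to/data.gdb/features'
in_df = pd.DataFrame.spatial.from_featureclass(in_pth)

# variables selected during introspection, e.g. the sample DataFrame from above
key_vars = smpl_df

# local enrichment
enrich_lcl_df = usa_lcl.enrich(in_df, key_vars)

# web gis enrichment
enrich_gis_df = usa_gis.enrich(in_df, key_vars)
```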
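
One way a single `enrich` method could serve both sources is to pick the variable column from the instance's source and hand the result to a source-specific backend. The class below is a hypothetical illustration only: `SketchCountry`, `fake_backend`, and the list-of-dict inputs stand in for the real `Country` object, the Enrich Layer tool or REST call, and spatially enabled DataFrames, none of which are used here.

```python
from typing import Callable, Dict, List


class SketchCountry:
    """Hypothetical stand-in for a Country object with a unified enrich method."""

    # Column of the variables table each source expects (from the sections above)
    VARIABLE_COLUMN = {"local": "enrich_name", "gis": "name"}

    def __init__(self, source: str, backends: Dict[str, Callable[[list, str], list]]):
        self.source = source
        self.backends = backends

    def enrich(self, features: list, variables: List[dict]) -> list:
        # Format the variables for this source, then route to its backend
        column = self.VARIABLE_COLUMN[self.source]
        variable_str = ";".join(v[column] for v in variables)
        return self.backends[self.source](features, variable_str)


def fake_backend(features: list, variable_str: str) -> list:
    # Placeholder: a real backend would call Enrich Layer or the REST endpoint
    return [dict(f, variables=variable_str) for f in features]


backends = {"local": fake_backend, "gis": fake_backend}
key_vars = [{"name": "TOTPOP_CY", "enrich_name": "KeyUSFacts.TOTPOP_CY"}]
print(SketchCountry("local", backends).enrich([{"id": 1}], key_vars))
# [{'id': 1, 'variables': 'KeyUSFacts.TOTPOP_CY'}]
print(SketchCountry("gis", backends).enrich([{"id": 1}], key_vars))
# [{'id': 1, 'variables': 'TOTPOP_CY'}]
```

Injecting the backends keeps the routing testable without `arcpy` or a network connection.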