# Modeling Module

The modeling module provides a single Python API that lets data scientists use the capabilities of ArcGIS in demographic and geographic data science workflows.

This first cell is largely a bunch of bubblegum and duct tape tying all the things together so I can use this notebook for prototyping and testing.

In [1]:
import importlib
import os
from pathlib import Path
import re
import sys

from dotenv import load_dotenv, find_dotenv
import pandas as pd

# load the "autoreload" extension so that code can change, & always reload modules so that as you change code in src, it gets loaded
%load_ext autoreload
%autoreload 2

# load environment variables from .env
load_dotenv(find_dotenv())

dir_src = Path.cwd().parent.parent.parent/'geosaurus'/'src'

sys.path.insert(0, str(dir_src))
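
The path handling above can be made a little more defensive. The following is a minimal sketch, assuming a hypothetical helper named `add_to_sys_path` (it is not part of this notebook or of any library), which only modifies `sys.path` when the directory actually exists and is not already listed.

```python
import sys
from pathlib import Path


def add_to_sys_path(directory) -> bool:
    """Hypothetical helper: put `directory` first on sys.path if it exists.

    Returns True when the directory exists (and is now importable from),
    False when it does not exist, in which case sys.path is left untouched.
    """
    directory = Path(directory).resolve()
    if not directory.is_dir():
        return False
    dir_str = str(directory)
    # avoid stacking duplicate entries when the cell is re-run
    if dir_str not in sys.path:
        sys.path.insert(0, dir_str)
    return True
```

With this in place, `add_to_sys_path(dir_src)` would return `False` rather than silently adding a path that does not exist, which makes a later import failure easier to diagnose.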

From here on, things look more like they should. We import the `GIS` object so we can connect to ArcGIS Online and ArcGIS Enterprise environments, along with the `Country` object and the `get_countries` function from the new `modeling` module.

In [2]:
from arcgis.gis import GIS
from arcgis.modeling import get_countries, Country

## Working Locally

If working in an environment with ArcGIS Pro with Business Analyst and local data, we can introspect the local environment to see what is available to work with. As you can see, on my machine I have a couple of countries, with a couple of years' worth of data.

In [3]:
get_countries()

Unnamed: 0,iso2,iso3,country_name,vintage,country_id,data_source_id
0,CA,CAN,Canada,2019,CAN_ESRI_2019,LOCAL;;CAN_ESRI_2019
1,US,USA,United States,2019,USA_ESRI_2019,LOCAL;;USA_ESRI_2019
2,US,USA,United States,2020,USA_ESRI_2020,LOCAL;;USA_ESRI_2020


Next, we can instantiate a `Country` object using the `iso3` identifier. This object is used for most of the rest of the workflow. When working with local data, a year can optionally be specified to work with a specific vintage of data. If not specified, the default is the most current year.

In [4]:
can = Country('CAN')

can

<modeling.Country - CAN (local 2019)>
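
The "most current year" default described above can be illustrated with plain pandas. This is only a sketch of the idea, not how `Country` resolves the vintage internally. The rows are copied from the `get_countries()` output shown earlier, and `latest_vintage` is a hypothetical helper name.

```python
import pandas as pd

# rows copied from the local get_countries() output shown earlier
local_countries = pd.DataFrame({
    'iso3': ['CAN', 'USA', 'USA'],
    'country_name': ['Canada', 'United States', 'United States'],
    'vintage': [2019, 2019, 2020],
    'country_id': ['CAN_ESRI_2019', 'USA_ESRI_2019', 'USA_ESRI_2020'],
})


def latest_vintage(countries: pd.DataFrame, iso3: str) -> int:
    """Hypothetical helper: most recent vintage available for a country."""
    matches = countries.loc[countries['iso3'] == iso3.upper(), 'vintage']
    if matches.empty:
        raise ValueError(f'No local data found for {iso3!r}')
    return int(matches.max())


print(latest_vintage(local_countries, 'USA'))  # 2020
print(latest_vintage(local_countries, 'CAN'))  # 2019
```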

Next, it is extremely useful to be able to discover the hierarchical geographic levels in a country.

In [5]:
can.levels

Unnamed: 0,geo_name,geo_alias,col_id,col_name,feature_class_path
0,disseminationareas,DisseminationAreas,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
1,censustracts,CensusTracts,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
2,fsas,FSAs,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
3,censussubdivisions,CensusSubdivisions,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
4,feds,FEDs,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
5,cmacas,CMACAs,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
6,censusdivisions,CensusDivisions,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
7,provinceterritories,ProvinceTerritories,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
8,country,Country,ID,NAME,D:\arcgis\ba_data\can_2019\Data\Demographic Da...
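
Since the levels come back as a dataframe, looking one up is ordinary pandas work. The sketch below rebuilds a few rows of the output above by hand (leaving out the truncated `feature_class_path` column) and uses a hypothetical `find_level` helper. It assumes, based only on the rows shown, that `geo_name` is always lowercase.

```python
import pandas as pd

# a few rows copied from the can.levels output; the path column is omitted
levels = pd.DataFrame({
    'geo_name': ['disseminationareas', 'censustracts', 'fsas',
                 'provinceterritories', 'country'],
    'geo_alias': ['DisseminationAreas', 'CensusTracts', 'FSAs',
                  'ProvinceTerritories', 'Country'],
    'col_id': ['ID'] * 5,
    'col_name': ['NAME'] * 5,
})


def find_level(levels_df: pd.DataFrame, name: str) -> pd.Series:
    """Hypothetical helper: return the row for a level, ignoring case."""
    mask = levels_df['geo_name'] == name.lower()
    if not mask.any():
        raise KeyError(f'Unknown geographic level: {name!r}')
    return levels_df[mask].iloc[0]


fsa = find_level(levels, 'FSAs')
print(fsa['geo_alias'], fsa['col_id'], fsa['col_name'])
```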


It is also extremely useful to be able to discover the enrichment variables, along with all the ways to reference them.

In [6]:
can_ev = can.enrich_variables

can_ev

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,A16AITFNAT,2016 First Nations Single Ident,AboriginalIdentity,AboriginalIdentity.A16AITFNAT,AboriginalIdentity_A16AITFNAT
1,A16AITIDT,2016 Aboriginal Identity,AboriginalIdentity,AboriginalIdentity.A16AITIDT,AboriginalIdentity_A16AITIDT
2,A16AITIDX,2016 Aboriginal Identities,AboriginalIdentity,AboriginalIdentity.A16AITIDX,AboriginalIdentity_A16AITIDX
3,A16AITINUK,2016 Inuk Single Identity,AboriginalIdentity,AboriginalIdentity.A16AITINUK,AboriginalIdentity_A16AITINUK
4,A16AITMETI,2016 Metis Single Identity,AboriginalIdentity,AboriginalIdentity.A16AITMETI,AboriginalIdentity_A16AITMETI
...,...,...,...,...,...
4332,ECYVISKOR,2019 VM: Korean,VisibleMinorityStatus,VisibleMinorityStatus.ECYVISKOR,VisibleMinorityStatus_ECYVISKOR
4333,ECYVISJAPA,2019 VM: Japanese,VisibleMinorityStatus,VisibleMinorityStatus.ECYVISJAPA,VisibleMinorityStatus_ECYVISJAPA
4334,ECYVISOVM,2019 VM: All Other VM,VisibleMinorityStatus,VisibleMinorityStatus.ECYVISOVM,VisibleMinorityStatus_ECYVISOVM
4335,ECYVISMVM,2019 VM: Multiple VM,VisibleMinorityStatus,VisibleMinorityStatus.ECYVISMVM,VisibleMinorityStatus_ECYVISMVM
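
The different ways of referencing a variable appear to follow a simple pattern. The sketch below reproduces that pattern as inferred purely from the outputs displayed in this notebook, not from the module's source or documentation: `enrich_name` joins the data collection and variable name with a period, and `enrich_field_name` joins them with an underscore, gaining a leading `F` when the collection starts with a digit (as with `1yearincrements` in the United States output later on). Both function names are hypothetical.

```python
def enrich_name(data_collection: str, name: str) -> str:
    """Pattern seen in the enrich_name column: '<collection>.<name>'."""
    return f'{data_collection}.{name}'


def enrich_field_name(data_collection: str, name: str) -> str:
    """Pattern seen in the enrich_field_name column: '<collection>_<name>'.

    Assumption from the displayed outputs: a leading 'F' is added when the
    result would otherwise start with a digit.
    """
    field = f'{data_collection}_{name}'
    return f'F{field}' if field[0].isdigit() else field


print(enrich_name('AboriginalIdentity', 'A16AITFNAT'))
print(enrich_field_name('AboriginalIdentity', 'A16AITFNAT'))
print(enrich_field_name('1yearincrements', 'AGE0_CY'))
```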


For instance, if we are only interested in key variables for the current observed year (2019), we can reduce our subset to just these variables. Typically, this is a very useful way to get started and see what type of results a preliminary exploratory modeling effort can discover.

In [7]:
can_ev[
    (can_ev.data_collection.str.lower().str.contains('key'))  # get the key variables
    & (can_ev.name.str.startswith('ECY'))                     # just current year (2019) variables
].reset_index(drop=True)

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,ECYDAYPOP,2019 Daytime Pop Total Pop,KeyCanFacts,KeyCanFacts.ECYDAYPOP,KeyCanFacts_ECYDAYPOP
1,ECYWORKPOP,2019 Daytime Pop at Work,KeyCanFacts,KeyCanFacts.ECYWORKPOP,KeyCanFacts_ECYWORKPOP
2,ECYHOMEPOP,2019 Daytime Pop at Home,KeyCanFacts,KeyCanFacts.ECYHOMEPOP,KeyCanFacts_ECYHOMEPOP
3,ECYPTAPOP,2019 Total Population,KeyCanFacts,KeyCanFacts.ECYPTAPOP,KeyCanFacts_ECYPTAPOP
4,ECYCFSCF,2019 Total Census Families,KeyCanFacts,KeyCanFacts.ECYCFSCF,KeyCanFacts_ECYCFSCF
5,ECYHSZHHD,2019 HH Size - Total HHs,KeyCanFacts,KeyCanFacts.ECYHSZHHD,KeyCanFacts_ECYHSZHHD
6,ECYHNIAVG,2019 HH Inc Average Curr$,KeyCanFacts,KeyCanFacts.ECYHNIAVG,KeyCanFacts_ECYHNIAVG
7,ECYHNIMED,2019 HH Inc Median Curr$,KeyCanFacts,KeyCanFacts.ECYHNIMED,KeyCanFacts_ECYHNIMED
8,ECYTENHHD,2019 Tenure - Total HHs,KeyCanFacts,KeyCanFacts.ECYTENHHD,KeyCanFacts_ECYTENHHD
9,ECYTENOWN,2019 Tenure - Owned,KeyCanFacts,KeyCanFacts.ECYTENOWN,KeyCanFacts_ECYTENOWN


## Using ArcGIS Online

While it is common to work with a primary country locally, quite a few large corporations, once they have a successful model running for their primary country (typically the United States), want to start experimenting with their international markets. The `modeling` module provides a single interface to explore these hypotheticals without having to install the data locally or learn a new API or interface.

Using the `GIS` object instance connected to ArcGIS Online, we can investigate what countries are available.

In [8]:
agol = GIS(
    os.getenv('ESRI_GIS_URL'),
    username=os.getenv('ESRI_GIS_USERNAME'),
    password=os.getenv('ESRI_GIS_PASSWORD')
)

cntry_df = get_countries(agol)

cntry_df

Unnamed: 0,iso2,iso3,country_name,country_id,alt_name,continent
0,AL,ALB,Albania,ALB_MBR_2019,ALBANIA,Europe
1,DZ,DZA,Algeria,DZA_MBR_2019,ALGERIA,Africa
2,AD,AND,Andorra,AND_MBR_2019,ANDORRA,Europe
3,AO,AGO,Angola,AGO_MBR_2019,ANGOLA,Africa
4,AR,ARG,Argentina,ARG_MBR_2020,ARGENTINA,South America
...,...,...,...,...,...,...
131,UY,URY,Uruguay,URY_MBR_2020,URUGUAY,South America
132,UZ,UZB,Uzbekistan,UZB_MBR_2020,UZBEKISTAN,Asia
133,VE,VEN,Venezuela,VEN_MBR_2020,"VENEZUELA, BOLIVARIAN REPUBLIC OF",South America
134,VN,VNM,Vietnam,VNM_MBR_2020,VIET NAM,Asia


With more than 130 countries available, rather than scanning the entire dataframe, we can simply check whether `GBR`, the ISO3 code for the United Kingdom, is available.

In [9]:
cntry_df.iso3.str.contains('GBR').any()

True

In [10]:
cntry_df[cntry_df.iso3.str.contains('GBR')]

Unnamed: 0,iso2,iso3,country_name,country_id,alt_name,continent
129,GB,GBR,United Kingdom,GBR_MBR_2019,UNITED KINGDOM,Europe
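
One note on the check above: `str.contains` matches substrings, so it is slightly looser than needed for fixed three-letter codes. A sketch of an exact-match alternative follows, using a hand-built sample dataframe and a hypothetical `has_country` helper.

```python
import pandas as pd

# hand-built sample with the same columns used in the check above
cntry_sample = pd.DataFrame({
    'iso3': ['ALB', 'GBR', 'URY'],
    'country_name': ['Albania', 'United Kingdom', 'Uruguay'],
})


def has_country(countries: pd.DataFrame, iso3: str) -> bool:
    """Hypothetical helper: True only when the full ISO3 code is present."""
    return bool(countries['iso3'].eq(iso3.upper()).any())


print(has_country(cntry_sample, 'GBR'))                # True
print(has_country(cntry_sample, 'GB'))                 # False
print(cntry_sample['iso3'].str.contains('GB').any())   # True, substring match
```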


Since it is, we can create a `Country` object instance just like before, and find the key facts for the United Kingdom. Each country has a different set of key facts, based on what is available. A few more columns are displayed here because the REST endpoint exposes more columns than are easily available through local introspection. We are working on exposing these locally as well.

In [11]:
gbr = Country('GBR', agol)

gbr_ev = gbr.enrich_variables

gbr_ev[
    (gbr_ev.data_collection.str.lower().str.contains('keyfacts'))
    & (gbr_ev.name.str.endswith('CY'))
].reset_index(drop=True)

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
0,TOTPOP_CY,2019 Total Population,KeyFacts,KeyFacts.TOTPOP_CY,KeyFacts_TOTPOP_CY,2019 Total Population,2019,count
1,POPDENS_CY,2019 Population Density (per sq. km),KeyFacts,KeyFacts.POPDENS_CY,KeyFacts_POPDENS_CY,2019 Population Density (Population per Square...,2019,count
2,POPPRM_CY,2019 Population Per Mill,KeyFacts,KeyFacts.POPPRM_CY,KeyFacts_POPPRM_CY,2019 Population Per Mill,2019,count
3,MALES_CY,2019 Total Male Population,KeyFacts,KeyFacts.MALES_CY,KeyFacts_MALES_CY,2019 Total Male Population,2019,count
4,FEMALES_CY,2019 Total Female Population,KeyFacts,KeyFacts.FEMALES_CY,KeyFacts_FEMALES_CY,2019 Total Female Population,2019,count
5,TOTHH_CY,2019 Total Households,KeyFacts,KeyFacts.TOTHH_CY,KeyFacts_TOTHH_CY,2019 Total Households,2019,count
6,AVGHHSZ_CY,2019 Average Household Size,KeyFacts,KeyFacts.AVGHHSZ_CY,KeyFacts_AVGHHSZ_CY,2019 Average Household Size,2019,count
7,PAGE01_CY,2019 Total Population Age 0-14,KeyFacts,KeyFacts.PAGE01_CY,KeyFacts_PAGE01_CY,2019 Total Population Age 0-14,2019,count
8,PAGE02_CY,2019 Total Population Age 15-29,KeyFacts,KeyFacts.PAGE02_CY,KeyFacts_PAGE02_CY,2019 Total Population Age 15-29,2019,count
9,PAGE03_CY,2019 Total Population Age 30-44,KeyFacts,KeyFacts.PAGE03_CY,KeyFacts_PAGE03_CY,2019 Total Population Age 30-44,2019,count


## ArcGIS Enterprise (Business Analyst Server)

Within an organization, once more than a few data scientists are working with the data, it begins to make sense to migrate local analysis workflows to Business Analyst Server as part of an ArcGIS Enterprise installation, instead of using ArcGIS Pro with Business Analyst or ArcGIS Online. ArcGIS Pro with Business Analyst does not scale out well, technically or fiscally. ArcGIS Online does not scale fiscally, since every enrichment run costs credits. Thus, it makes sense to be able to move quickly to Business Analyst Server to scale out. The modeling module makes this as easy as passing in a `GIS` object instance connected to an ArcGIS Enterprise instance with Business Analyst Server.

In [12]:
prtl = GIS(
    os.getenv('ESRI_PORTAL_URL'),
    username=os.getenv('ESRI_PORTAL_USERNAME'),
    password=os.getenv('ESRI_PORTAL_PASSWORD')
)

get_countries(prtl)

Unnamed: 0,iso2,iso3,country_name,country_id,alt_name,continent
0,US,USA,United States,USA_ESRI_2020,UNITED STATES,North America


In [13]:
usa = Country('USA', prtl)

usa

<modeling.Country - USA (GIS at https://geoai-ent.bd.esri.com/portal/ logged in as jmccune)>

In [14]:
usa_ev = usa.enrich_variables

usa_ev

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
0,AGE0_CY,2020 Population Age <1,1yearincrements,1yearincrements.AGE0_CY,F1yearincrements_AGE0_CY,2020 Total Population Age <1 (Esri),2020,count
1,AGE1_CY,2020 Population Age 1,1yearincrements,1yearincrements.AGE1_CY,F1yearincrements_AGE1_CY,2020 Total Population Age 1 (Esri),2020,count
2,AGE2_CY,2020 Population Age 2,1yearincrements,1yearincrements.AGE2_CY,F1yearincrements_AGE2_CY,2020 Total Population Age 2 (Esri),2020,count
3,AGE3_CY,2020 Population Age 3,1yearincrements,1yearincrements.AGE3_CY,F1yearincrements_AGE3_CY,2020 Total Population Age 3 (Esri),2020,count
4,AGE4_CY,2020 Population Age 4,1yearincrements,1yearincrements.AGE4_CY,F1yearincrements_AGE4_CY,2020 Total Population Age 4 (Esri),2020,count
...,...,...,...,...,...,...,...,...
37,MOEMEDYRMV,2018 Median Year Householder Moved In MOE (ACS...,yearmovedin,yearmovedin.MOEMEDYRMV,yearmovedin_MOEMEDYRMV,2018 Median Year Householder Moved into Unit M...,2014-2018,count
38,RELMEDYRMV,2018 Median Year Householder Moved In REL (ACS...,yearmovedin,yearmovedin.RELMEDYRMV,yearmovedin_RELMEDYRMV,2018 Median Year Householder Moved into Unit R...,2014-2018,count
39,ACSOWNER,2018 Owner Households (ACS 5-Yr),yearmovedin,yearmovedin.ACSOWNER,yearmovedin_ACSOWNER,2018 Owner Households (ACS 5-Yr),2014-2018,count
40,MOEOWNER,2018 Owner Households MOE (ACS 5-Yr),yearmovedin,yearmovedin.MOEOWNER,yearmovedin_MOEOWNER,2018 Owner Households MOE (ACS 5-Yr),2014-2018,count


In [15]:
usa_ev[
    (usa_ev.data_collection.str.lower().str.contains('key'))
    & (usa_ev.name.str.endswith('CY'))
].reset_index(drop=True)

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
0,TOTPOP_CY,2020 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY,2020 Total Population (Esri),2020,count
1,GQPOP_CY,2020 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY,2020 Group Quarters Population (Esri),2020,count
2,DIVINDX_CY,2020 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY,2020 Diversity Index (Esri),2020,count
3,TOTHH_CY,2020 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY,2020 Total Households (Esri),2020,count
4,AVGHHSZ_CY,2020 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY,2020 Average Household Size (Esri),2020,count
5,MEDHINC_CY,2020 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY,2020 Median Household Income (Esri),2020,currency
6,AVGHINC_CY,2020 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY,2020 Average Household Income (Esri),2020,currency
7,PCI_CY,2020 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY,2020 Per Capita Income (Esri),2020,currency
8,TOTHU_CY,2020 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY,2020 Total Housing Units (Esri),2020,count
9,OWNER_CY,2020 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY,2020 Owner Occupied Housing Units (Esri),2020,count


## Flexibility Illustrated

Each of the above examples used a different data source. To illustrate how easy it is to move between them, here is the same workflow for the United States using ArcGIS Online and then local data.

In [16]:
usa = Country('USA', agol)
usa_ev = usa.enrich_variables
usa_ev[
    (usa_ev.data_collection.str.lower().str.contains('key'))
    & (usa_ev.name.str.endswith('CY'))
].reset_index(drop=True)

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name,description,vintage,units
0,TOTPOP_CY,2020 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY,2020 Total Population (Esri),2020,count
1,GQPOP_CY,2020 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY,2020 Group Quarters Population (Esri),2020,count
2,DIVINDX_CY,2020 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY,2020 Diversity Index (Esri),2020,count
3,TOTHH_CY,2020 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY,2020 Total Households (Esri),2020,count
4,AVGHHSZ_CY,2020 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY,2020 Average Household Size (Esri),2020,count
5,MEDHINC_CY,2020 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY,2020 Median Household Income (Esri),2020,currency
6,AVGHINC_CY,2020 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY,2020 Average Household Income (Esri),2020,currency
7,PCI_CY,2020 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY,2020 Per Capita Income (Esri),2020,currency
8,TOTHU_CY,2020 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY,2020 Total Housing Units (Esri),2020,count
9,OWNER_CY,2020 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY,2020 Owner Occupied Housing Units (Esri),2020,count


In [17]:
usa = Country('USA', 'local')
usa_ev = usa.enrich_variables
usa_ev[
    (usa_ev.data_collection.str.lower().str.contains('key'))
    & (usa_ev.name.str.endswith('CY'))
].reset_index(drop=True)

Unnamed: 0,name,alias,data_collection,enrich_name,enrich_field_name
0,TOTPOP_CY,2020 Total Population,KeyUSFacts,KeyUSFacts.TOTPOP_CY,KeyUSFacts_TOTPOP_CY
1,GQPOP_CY,2020 Group Quarters Population,KeyUSFacts,KeyUSFacts.GQPOP_CY,KeyUSFacts_GQPOP_CY
2,DIVINDX_CY,2020 Diversity Index,KeyUSFacts,KeyUSFacts.DIVINDX_CY,KeyUSFacts_DIVINDX_CY
3,TOTHH_CY,2020 Total Households,KeyUSFacts,KeyUSFacts.TOTHH_CY,KeyUSFacts_TOTHH_CY
4,AVGHHSZ_CY,2020 Average Household Size,KeyUSFacts,KeyUSFacts.AVGHHSZ_CY,KeyUSFacts_AVGHHSZ_CY
5,MEDHINC_CY,2020 Median Household Income,KeyUSFacts,KeyUSFacts.MEDHINC_CY,KeyUSFacts_MEDHINC_CY
6,AVGHINC_CY,2020 Average Household Income,KeyUSFacts,KeyUSFacts.AVGHINC_CY,KeyUSFacts_AVGHINC_CY
7,PCI_CY,2020 Per Capita Income,KeyUSFacts,KeyUSFacts.PCI_CY,KeyUSFacts_PCI_CY
8,TOTHU_CY,2020 Total Housing Units,KeyUSFacts,KeyUSFacts.TOTHU_CY,KeyUSFacts_TOTHU_CY
9,OWNER_CY,2020 Owner Occupied HUs,KeyUSFacts,KeyUSFacts.OWNER_CY,KeyUSFacts_OWNER_CY
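
The two outputs above differ only in the extra `description`, `vintage` and `units` columns returned from ArcGIS Online. One way to confirm that the sources agree on everything they share is to compare just the common columns. The sketch below does this with two hand-built dataframes containing a couple of the rows shown above; it does not call the `modeling` module.

```python
import pandas as pd

# two rows as returned from ArcGIS Online (eight columns)
online = pd.DataFrame({
    'name': ['TOTPOP_CY', 'TOTHH_CY'],
    'alias': ['2020 Total Population', '2020 Total Households'],
    'data_collection': ['KeyUSFacts', 'KeyUSFacts'],
    'enrich_name': ['KeyUSFacts.TOTPOP_CY', 'KeyUSFacts.TOTHH_CY'],
    'enrich_field_name': ['KeyUSFacts_TOTPOP_CY', 'KeyUSFacts_TOTHH_CY'],
    'description': ['2020 Total Population (Esri)', '2020 Total Households (Esri)'],
    'vintage': ['2020', '2020'],
    'units': ['count', 'count'],
})

# the same two rows as discovered locally (five columns)
local = online[['name', 'alias', 'data_collection',
                'enrich_name', 'enrich_field_name']].copy()

# columns present in both sources, and those only available online
shared = [col for col in online.columns if col in local.columns]
online_only = [col for col in online.columns if col not in local.columns]

print(online_only)
print(online[shared].equals(local[shared]))
```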
