# Modeling Module

The modeling module provides a single Python API that lets data scientists use ArcGIS capabilities in demographic and geographic data science workflows.

This first cell is largely a bunch of bubblegum and duct tape tying all the things together so I can use this notebook for prototyping and testing.

In [1]:
import importlib
import os
from pathlib import Path
import re
import sys

from dotenv import load_dotenv, find_dotenv
import pandas as pd

# load the "autoreload" extension and reload all modules before each execution, so edits to code in src are picked up
%load_ext autoreload
%autoreload 2

# load environment variables from .env
load_dotenv(find_dotenv())

dir_src = Path.cwd().parent.parent/'src'

sys.path.insert(0, str(dir_src))
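
The GIS connections in the following cells read six environment variables loaded from `.env`. As a minimal sketch, assuming a hypothetical helper named `missing_env_vars` (it is not part of the modeling module), the notebook could check for them up front so a missing value surfaces before a login attempt:

```python
import os

# Variable names copied from the os.getenv calls used in this notebook.
REQUIRED_VARS = (
    "ESRI_GIS_URL", "ESRI_GIS_USERNAME", "ESRI_GIS_PASSWORD",
    "ESRI_PORTAL_URL", "ESRI_PORTAL_USERNAME", "ESRI_PORTAL_PASSWORD",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; env defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` right after `load_dotenv(find_dotenv())` returns an empty list when everything is set; a non-empty list names the entries to add to `.env`.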

In [2]:
from arcgis.gis import GIS
from modeling import Country

In [3]:
gis_agol = GIS(os.getenv('ESRI_GIS_URL'), username=os.getenv('ESRI_GIS_USERNAME'), password=os.getenv('ESRI_GIS_PASSWORD'))

gis_agol

In [4]:
gis_ent = GIS(os.getenv('ESRI_PORTAL_URL'), username=os.getenv('ESRI_PORTAL_USERNAME'), password=os.getenv('ESRI_PORTAL_PASSWORD'))

gis_ent

In [5]:
ent_usa = Country('USA', gis_ent)

ent_usa

<modeling.Country - USA (GIS at https://geoai-ent.bd.esri.com/portal/ logged in as jmccune)>

In [11]:
%%time
ent_df = ent_usa.cbsas.get('seattle').mdl.counties.get()

ent_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   ID      3 non-null      object  
 1   NAME    3 non-null      object  
 2   SHAPE   3 non-null      geometry
dtypes: geometry(1), object(2)
memory usage: 200.0+ bytes
Wall time: 838 ms


In [12]:
ev = ent_usa.enrich_variables

enrich_variables = ev[
    (ev.data_collection.str.lower().str.contains('key'))  # get the key variables
    & (ev.name.str.endswith('CY'))                        # just current year (2020) variables
].reset_index(drop=True)

enrich_variables.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               20 non-null     object
 1   alias              20 non-null     object
 2   data_collection    20 non-null     object
 3   enrich_name        20 non-null     object
 4   enrich_field_name  20 non-null     object
 5   description        20 non-null     object
 6   vintage            20 non-null     object
 7   units              20 non-null     object
dtypes: object(8)
memory usage: 1.4+ KB


In [13]:
enrich_variables.sample(5)

|    | name | alias | data_collection | enrich_name | enrich_field_name | description | vintage | units |
|----|------|-------|-----------------|-------------|-------------------|-------------|---------|-------|
| 2  | DIVINDX_CY | 2020 Diversity Index | KeyUSFacts | KeyUSFacts.DIVINDX_CY | KeyUSFacts_DIVINDX_CY | 2020 Diversity Index (Esri) | 2020 | count |
| 15 | HHGRW10CY | 2010-2020 Growth Rate: Households | KeyUSFacts | KeyUSFacts.HHGRW10CY | KeyUSFacts_HHGRW10CY | 2010-2020 Households: Annual Growth Rate (Esri) | 2020 | pct |
| 0  | TOTPOP_CY | 2020 Total Population | KeyUSFacts | KeyUSFacts.TOTPOP_CY | KeyUSFacts_TOTPOP_CY | 2020 Total Population (Esri) | 2020 | count |
| 12 | MEDVAL_CY | 2020 Median Home Value | KeyUSFacts | KeyUSFacts.MEDVAL_CY | KeyUSFacts_MEDVAL_CY | 2020 Median Home Value (Esri) | 2020 | currency |
| 10 | RENTER_CY | 2020 Renter Occupied HUs | KeyUSFacts | KeyUSFacts.RENTER_CY | KeyUSFacts_RENTER_CY | 2020 Renter Occupied Housing Units (Esri) | 2020 | count |
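
The variable selection in cell 12 combines two boolean masks: a case-insensitive substring match on `data_collection` and a suffix match on `name`. The sketch below reproduces that filter on a toy frame so it can run without a GIS connection. The first three rows copy values from the sample output; the two `EXAMPLE_*` rows and `OtherCollection` are invented for illustration so each mask has something to exclude.

```python
import pandas as pd

# Toy stand-in for ent_usa.enrich_variables (only the two columns the filter uses).
# Rows 0-2 copy values shown in the sample; rows 3-4 are hypothetical.
ev = pd.DataFrame({
    "name": ["TOTPOP_CY", "MEDVAL_CY", "HHGRW10CY", "EXAMPLE_FY", "EXAMPLE_CY"],
    "data_collection": ["KeyUSFacts", "KeyUSFacts", "KeyUSFacts",
                        "KeyUSFacts", "OtherCollection"],
})

# Mask 1: data collection name contains "key", ignoring case.
key_mask = ev["data_collection"].str.lower().str.contains("key")

# Mask 2: variable name ends with "CY" (no underscore required, so HHGRW10CY matches).
current_year_mask = ev["name"].str.endswith("CY")

selected = ev[key_mask & current_year_mask].reset_index(drop=True)
selected_names = selected["name"].tolist()
```

Matching on the bare `CY` suffix rather than `_CY` is what keeps growth-rate variables such as `HHGRW10CY` in the selection.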


In [14]:
%%time
out_df = ent_df.mdl.enrich(enrich_variables)

out_df

Wall time: 2.44 s


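|   | ID | NAME | TOTPOP_CY | GQPOP_CY | DIVINDX_CY | TOTHH_CY | AVGHHSZ_CY | MEDHINC_CY | AVGHINC_CY | PCI_CY | ... | VACANT_CY | MEDVAL_CY | AVGVAL_CY | POPGRW10CY | HHGRW10CY | FAMGRW10CY | DPOP_CY | DPOPWRK_CY | DPOPRES_CY | SHAPE |
|---|----|------|-----------|----------|------------|----------|------------|------------|------------|--------|-----|-----------|-----------|-----------|------------|-----------|------------|---------|------------|------------|-------|
| 0 | 53053 | Pierce County | 917565 | 20565 | 60.3 | 342092 | 2.62 | 77326 | 99077 | 37149 | ... | 23485 | 343546 | 417408 | 1.41 | 1.29 | 1.24 | 884081 | 376219 | 507862 | {"rings": [[[-122.41766497932484, 47.320186386... |
| 1 | 53033 | King County | 2271785 | 41334 | 65.4 | 924539 | 2.41 | 100598 | 135093 | 55065 | ... | 61820 | 614306 | 716039 | 1.6 | 1.56 | 1.49 | 2381091 | 1239788 | 1141303 | {"rings": [[[-121.36751200012111, 47.780137999... |
| 2 | 53061 | Snohomish County | 834034 | 9847 | 56.9 | 311214 | 2.65 | 89662 | 111835 | 41812 | ... | 15524 | 438450 | 509596 | 1.54 | 1.46 | 1.41 | 779826 | 340646 | 439180 | {"rings": [[[-121.68612849977673, 48.298988999... |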


In [19]:
url = f'{gis_ent.properties.helperServices("geoenrichment").url}/Geoenrichment/ServiceLimits'

url

'https://geoai-ent.bd.esri.com/baserver/rest/services/World/GeoEnrichmentServer/Geoenrichment/ServiceLimits'

In [26]:
res = gis_ent._con.get(url)

max_record_count = [v['value'] for v in res['serviceLimits']['value'] if v['paramName'] == 'maxRecordCount'][0]

max_record_count

1000
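
A record limit like this usually matters when an input has more rows than one request accepts. The sketch below shows one way to read a limit defensively and split a row count into batches. Both helpers (`get_service_limit`, `batch_ranges`) are hypothetical, the mock response mirrors only the keys used in the cell above, and whether `mdl.enrich` already batches internally is not shown in this notebook.

```python
def get_service_limit(res, param_name, default=None):
    """Look up one limit by paramName in a ServiceLimits-style response dict.

    Returns default instead of raising IndexError when the name is absent.
    """
    limits = res.get("serviceLimits", {}).get("value", [])
    return next(
        (v["value"] for v in limits if v.get("paramName") == param_name),
        default,
    )


def batch_ranges(n_rows, batch_size):
    """Yield (start, stop) row offsets covering n_rows in chunks of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


# Mock response with only the keys read above; not a real service payload.
mock_res = {"serviceLimits": {"value": [{"paramName": "maxRecordCount", "value": 1000}]}}

max_records = get_service_limit(mock_res, "maxRecordCount")
batches = list(batch_ranges(2500, max_records))
```

Each `(start, stop)` pair can be passed to `DataFrame.iloc[start:stop]` to slice an input frame into pieces no larger than the limit.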