# Authenticate & Example API Call (geocode)
An example of authenticating with a UMICH organizational license and performing a basic geocoding operation with the ArcGIS API for Python.

Requires:

- ArcGIS Pro installed
- Signed in to ArcGIS Pro with an organizational account (UMICH)
- ArcGIS Pro does not need to be running, but your sign-in must not have expired.

## Import required modules and packages

In [1]:
# Required modules
import arcpy
import pandas as pd

# ArcGIS packages
from arcgis.gis import GIS
from arcgis.geocoding import get_geocoders, batch_geocode, geocode
# Connect to the GIS using the active ArcGIS Pro sign-in
gis = GIS('pro')

## EDA of Data

### Read & Examine Data
This example uses the Tecumseh, MI addresses from the OpenAddresses Midwest data set.

In [2]:
df_mi_tecumseh = pd.read_csv("./example-data/city_of_tecumseh.csv", keep_default_na=False)
print("Initial dataframe shape:", df_mi_tecumseh.shape)

Initial dataframe shape: (3706, 11)


In [3]:
df_mi_tecumseh.head()

Unnamed: 0,LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH
0,-83.930694,42.020842,4587,MACON RD,,,,,,,0f1d71c71bb84f25
1,-83.93137,42.020581,952,MACON RD,,,,,,,e063eaa401c7e1d5
2,-83.932255,42.020563,944,MACON RD,,,,,,,c6b1c6f444a0684c
3,-83.933133,42.020526,940,MACON RD,,,,,,,83e2941141f61fba
4,-83.9356,42.020365,700,MACON RD BLK,,,,,,,40ffdb3c081d2003


### Add state variable with default value

In [4]:
df_mi_tecumseh['State'] = "MI"
df_mi_tecumseh.head()

Unnamed: 0,LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH,State
0,-83.930694,42.020842,4587,MACON RD,,,,,,,0f1d71c71bb84f25,MI
1,-83.93137,42.020581,952,MACON RD,,,,,,,e063eaa401c7e1d5,MI
2,-83.932255,42.020563,944,MACON RD,,,,,,,c6b1c6f444a0684c,MI
3,-83.933133,42.020526,940,MACON RD,,,,,,,83e2941141f61fba,MI
4,-83.9356,42.020365,700,MACON RD BLK,,,,,,,40ffdb3c081d2003,MI


In [5]:
df_mi_tecumseh.shape

(3706, 12)

### Populate CITY
We'll assume that the city column values should be Tecumseh.

In [6]:
df_mi_tecumseh['CITY'] = 'Tecumseh'
df_mi_tecumseh.head()

Unnamed: 0,LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH,State
0,-83.930694,42.020842,4587,MACON RD,,Tecumseh,,,,,0f1d71c71bb84f25,MI
1,-83.93137,42.020581,952,MACON RD,,Tecumseh,,,,,e063eaa401c7e1d5,MI
2,-83.932255,42.020563,944,MACON RD,,Tecumseh,,,,,c6b1c6f444a0684c,MI
3,-83.933133,42.020526,940,MACON RD,,Tecumseh,,,,,83e2941141f61fba,MI
4,-83.9356,42.020365,700,MACON RD BLK,,Tecumseh,,,,,40ffdb3c081d2003,MI


### Simplify address data
Drop columns that aren't needed for geocoding.

In [7]:
df_mi_tecumseh.drop(columns=['LON', 'LAT', 'UNIT', 'DISTRICT', 'REGION', 'ID', 'HASH'], inplace=True)
df_mi_tecumseh.head()

Unnamed: 0,NUMBER,STREET,CITY,POSTCODE,State
0,4587,MACON RD,Tecumseh,,MI
1,952,MACON RD,Tecumseh,,MI
2,944,MACON RD,Tecumseh,,MI
3,940,MACON RD,Tecumseh,,MI
4,700,MACON RD BLK,Tecumseh,,MI


Create an `Address` column from the `NUMBER` and `STREET` columns. The geocoder expects the street address as a single field.

In [8]:
df_mi_tecumseh['Address'] = df_mi_tecumseh['NUMBER'] + ' ' + df_mi_tecumseh['STREET']
df_mi_tecumseh.head()

Unnamed: 0,NUMBER,STREET,CITY,POSTCODE,State,Address
0,4587,MACON RD,Tecumseh,,MI,4587 MACON RD
1,952,MACON RD,Tecumseh,,MI,952 MACON RD
2,944,MACON RD,Tecumseh,,MI,944 MACON RD
3,940,MACON RD,Tecumseh,,MI,940 MACON RD
4,700,MACON RD BLK,Tecumseh,,MI,700 MACON RD BLK
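
The concatenation above works when `NUMBER` is read as text. If pandas parses `NUMBER` as an integer column instead, adding a string to it fails with a type error. A minimal sketch of a more defensive version, assuming a small frame built from the first rows shown above:

```python
import pandas as pd

# Small frame mirroring the notebook's columns; NUMBER is an integer column here.
df = pd.DataFrame({
    "NUMBER": [4587, 952, 700],
    "STREET": ["MACON RD", "MACON RD", "MACON RD BLK"],
})

# Cast both parts to str so the concatenation works whatever dtype pandas inferred,
# then strip stray whitespace left behind by empty values.
df["Address"] = (df["NUMBER"].astype(str) + " " + df["STREET"].astype(str)).str.strip()

print(df["Address"].tolist())
# ['4587 MACON RD', '952 MACON RD', '700 MACON RD BLK']
```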


Reorder the dataframe and remove `NUMBER` and `STREET` columns.

In [9]:
df_mi_tecumseh = df_mi_tecumseh[['Address', 'CITY', 'POSTCODE', 'State']]
df_mi_tecumseh.head()

Unnamed: 0,Address,CITY,POSTCODE,State
0,4587 MACON RD,Tecumseh,,MI
1,952 MACON RD,Tecumseh,,MI
2,944 MACON RD,Tecumseh,,MI
3,940 MACON RD,Tecumseh,,MI
4,700 MACON RD BLK,Tecumseh,,MI


## Sample 100 addresses to geocode

In [10]:
df_mi_tecumseh_100 = df_mi_tecumseh.sample(n=100, replace=True, random_state=1)
print("Sampled dataframe shape:", df_mi_tecumseh_100.shape)

Sampled dataframe shape: (100, 4)


In [11]:
df_mi_tecumseh_100.head()

Unnamed: 0,Address,CITY,POSTCODE,State
1061,315 W SHAWNEE ST,Tecumseh,,MI
235,907 N UNION ST,Tecumseh,,MI
1096,301 W SHAWNEE ST,Tecumseh,,MI
905,404 BURT ST,Tecumseh,,MI
2763,605 ADRIAN ST,Tecumseh,,MI
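
Note that `replace=True` samples with replacement, so the same row can appear more than once in the sample, while `replace=False` guarantees distinct rows. A small sketch of the difference on a made-up frame (the addresses are illustrative, not taken from the data set):

```python
import pandas as pd

# Made-up frame of ten addresses; the index plays the role of the row ID.
df = pd.DataFrame({"Address": [f"{n} MAIN ST" for n in range(1, 11)]})

# Drawing 20 rows from 10 with replacement must repeat at least one row.
with_replacement = df.sample(n=20, replace=True, random_state=1)

# Drawing without replacement can never repeat a row.
without_replacement = df.sample(n=10, replace=False, random_state=1)

has_repeats = bool(with_replacement.index.duplicated().any())
all_unique = bool(without_replacement.index.is_unique)
print(has_repeats, all_unique)  # True True
```

If repeated addresses are not wanted in the geocoding sample, `replace=False` is one option.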


## Geocode Sample Addresses
Use Esri's ArcGIS World Geocoding Service to geocode the sample addresses.

The service reports both a maximum and a suggested batch size. If the number of addresses exceeds the maximum batch size, the addresses must be split and processed in multiple batches.

More info: https://developers.arcgis.com/python/guide/batch-geocoding/

Accessing & Creating Content: https://developers.arcgis.com/python/guide/accessing-and-creating-content/

In [12]:
# Use the first of GIS's configured geocoders
geocoder = get_geocoders(gis)[0]
print("Geocoder:", geocoder)

# Display batch size settings for Geocoder
print("MaxBatchSize : " + str(geocoder.properties.locatorProperties.MaxBatchSize))
print("SuggestedBatchSize : " + str(geocoder.properties.locatorProperties.SuggestedBatchSize))

Geocoder: <Geocoder url:"https://geocode.arcgis.com/arcgis/rest/services/World/GeocodeServer">
MaxBatchSize : 1000
SuggestedBatchSize : 150
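
When an address list is longer than the batch size, it has to be split before geocoding. A minimal sketch of that split follows; `chunk_addresses` is a hypothetical helper, not part of the ArcGIS API, and the address records are made up for illustration.

```python
from typing import Iterator, List, Sequence


def chunk_addresses(addresses: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of `addresses`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(addresses), batch_size):
        yield list(addresses[start:start + batch_size])


# Made-up records standing in for a larger address list.
addresses = [
    {"Address": f"{n} MAIN ST", "CITY": "Tecumseh", "State": "MI"}
    for n in range(1, 401)
]

# 150 mirrors the SuggestedBatchSize reported above.
batches = list(chunk_addresses(addresses, batch_size=150))
print([len(batch) for batch in batches])  # [150, 150, 100]
```

Each batch could then be passed to `batch_geocode` in turn and the returned lists concatenated.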


### Convert to a list of dictionaries
`batch_geocode` takes a list of addresses. Each address can be a single-line string or a dictionary of address fields.

In [13]:
# Use the full 'records' orient; the 'r' shorthand is no longer accepted in pandas 2.0+
geocode_addresses = df_mi_tecumseh_100.to_dict('records')
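
The conversion above yields one dictionary per row, keyed by column name. A small sketch of the resulting shape, using a made-up two-row frame with the same columns:

```python
import pandas as pd

# Made-up rows with the same columns as the sampled dataframe.
df = pd.DataFrame({
    "Address": ["1 MAIN ST", "2 MAIN ST"],
    "CITY": ["Tecumseh", "Tecumseh"],
    "POSTCODE": ["", ""],
    "State": ["MI", "MI"],
})

# 'records' orient: a list with one {column: value} dict per row.
records = df.to_dict("records")
print(records[0])
# {'Address': '1 MAIN ST', 'CITY': 'Tecumseh', 'POSTCODE': '', 'State': 'MI'}
```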

### Geocode addresses

In [14]:
# Pass the geocoder selected above explicitly instead of relying on the default
geocode_results = batch_geocode(geocode_addresses, geocoder=geocoder)

Load results into a dataframe and examine match scores.

In [15]:
df_geocode_results = pd.DataFrame(data=geocode_results)
df_geocode_results['score'].describe()

count    100.000000
mean      99.777900
std        0.740106
min       95.830000
25%      100.000000
50%      100.000000
75%      100.000000
max      100.000000
Name: score, dtype: float64
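
The summary shows that a few matches score below 100, and those are worth inspecting individually. A minimal sketch of pulling them out of the raw result list follows. It assumes each result is a dictionary with `score` and `address` keys; the 98.0 threshold, the `low_score_matches` helper, and the sample results are all hypothetical.

```python
def low_score_matches(results, threshold=98.0):
    """Return (address, score) pairs for results scoring below `threshold`."""
    return [
        (result.get("address"), result["score"])
        for result in results
        if result["score"] < threshold
    ]


# Made-up results shaped like geocoder output; not real geocoding data.
sample_results = [
    {"address": "1 Example St", "score": 100.0, "location": {"x": -83.94, "y": 42.00}},
    {"address": "2 Example St", "score": 95.83, "location": {"x": -83.95, "y": 42.01}},
    {"address": "3 Example St", "score": 100.0, "location": {"x": -83.96, "y": 42.02}},
]

print(low_score_matches(sample_results))  # [('2 Example St', 95.83)]
```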

### Visual check of geocode results

In [16]:
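# Named tecumseh_map to avoid shadowing Python's built-in map()
tecumseh_map = gis.map("Tecumseh, Michigan", 13)
tecumseh_map

MapView(layout=Layout(height='400px', width='100%'), zoom=13.0)

In [17]:
# Picture marker symbol (red pin) shared by every geocoded point
pin_symbol = {
    "type": "esriPMS", "angle": 0, "xoffset": 0, "yoffset": 10,
    "url": "http://static.arcgis.com/images/Symbols/Shapes/RedPin1LargeB.png",
    "contentType": "image/png", "width": 24, "height": 24,
}
for address in geocode_results:
    tecumseh_map.draw(address['location'], symbol=pin_symbol)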