Offline parcel-based geocoding for addresses in Hamilton County, Ohio, USA
This package is designed to geocode address strings using an offline copy of the Cincinnati Area Geographic Information System (CAGIS) master address/parcel files. This drastically improves geocoding accuracy, but is only available for addresses that have a zipcode beginning with 450, 451, or 452, i.e. the Greater Cincinnati Area.
Its major functions include:
- parse an address string into individual components
- converts all letters to lower case
- replaces consecutive whitespace with one whitespace character
- removes non-alphanumeric characters
- takes an address string and returns a tibble of address components
- parcel-based geocoding based on included CAGIS address database
- fuzzy text matching on street will return the best match and its similarity score
- return CAGIS parcel id to link with other extant databases including Hamilton County Auditor
Install the package by running the following in
The package relies on python to parse the address, which means that both
python and the
usaddress python library must be installed (
pip install usaddress). Alternative installation methods are available
depending on your operating system; seek instructions online.
Geocode a single address string using
geocodeCAGIS(). This will
automatically take care of parsing the address string and will return
the latitude and longitude, the CAGIS parcel identifier, and optionally,
some additional diagnostic information regarding the geocoding match
geocodeCAGIS('3333 Burnet Ave, Cincinnati, OH 45229') #> # A tibble: 1 x 5 #> lat lon PARCELID score match #> <dbl> <dbl> <chr> <dbl> <chr> #> 1 39.1 -84.5 010400020052 1 3333 burnet av 45229
Address String Formatting
- An address must be submitted as a single string
- The city and state are optional and not used for geocoding.
- Street number, street name, and zip code must be in the address string.
- Street numbers must be numeric (i.e. “18”, not “Eighteen”).
Geocoding an address string requires a significant amount of
computational time, so it may be useful to memoise results, provide a
progress bar, implement graceful error handling, and/or use parallel
addrs <- c('3333 Burnet Ave, Cincinnati, OH 45229', '222 E Central Pkwy, Cincinnati, OH 45202', '1600 Pennsylvania Ave NW Washington, DC 20500') CB::mappp(addrs, geocodeCAGIS) #> ... :what ( 0%) [ ETA: ?s | Elapsed: 0s ] ... processing 1 of 3 ( 33%) [ ETA: 0s | Elapsed: 0s ] ... processing 2 of 3 ( 67%) [ ETA: 0s | Elapsed: 1s ] ... processing 3 of 3 (100%) [ ETA: 0s | Elapsed: 1s ] #> warning: zip code does not begin with 450, 451, or 452; returning NA #> [] #> # A tibble: 1 x 5 #> lat lon PARCELID score match #> <dbl> <dbl> <chr> <dbl> <chr> #> 1 39.1 -84.5 010400020052 1 3333 burnet av 45229 #> #> [] #> # A tibble: 1 x 5 #> lat lon PARCELID score match #> <dbl> <dbl> <chr> <dbl> <chr> #> 1 39.1 -84.5 007500040247 0 222 central pkwy 45202 e #> #> [] #>  NA