geocode(): Toggle geographic columns returned #58

Closed
kaijagahm opened this issue Aug 25, 2020 · 4 comments

@kaijagahm

I've noticed that depending on the specificity of the address sent to hereR::geocode(), different columns are returned in the resulting sf/data frame object.

Here's a reproducible example (assuming you have the geocoding API set up):

# Make some sample data
vec <- c("boston, massachusetts", "massachusetts") # two localities, one of which can be geocoded down to the city/zip code level
vec2 <- "massachusetts" # locality that can only be geocoded down to the US state level

# Geocode each of these (sf = FALSE only keeps the printed output simple; the behavior is the same with sf = TRUE).
vec_gc <- hereR::geocode(vec, sf = FALSE)
vec2_gc <- hereR::geocode(vec2, sf = FALSE)

# View the results
vec_gc
  id                   address postalCode   city  county state country  type       lng      lat
1  1 Boston, MA, United States      02109 Boston Suffolk    MA     USA point -71.05675 42.35866
2  2         MA, United States       <NA>   <NA>    <NA>    MA     USA point -71.05675 42.35866

vec2_gc
  id           address state country  type       lng      lat
1  1 MA, United States    MA     USA point -71.05675 42.35866

Note that because vec2 could not be geocoded more precisely than a US state, additional columns (city, county, postalCode) don't show up.

I can understand why this behavior would be desired: it prevents the user from getting back a needlessly large data frame with a huge number of NA values. But I'm trying to write an automated workflow in which I'll be geocoding addresses at varying levels of specificity. I need to know ahead of time which columns will be present in my output data frame, so that I can write functions to operate on it without first checking whether the county/state/city columns exist.

Is there currently a way to toggle the inclusion of additional columns in geocode()? If not, would you consider adding this feature? Maybe an argument show_all_cols that defaults to FALSE, or something of that nature.

munterfi self-assigned this Aug 26, 2020
munterfi added the enhancement (New feature or request) label Aug 26, 2020
@munterfi
Owner

Thank you for reporting. The reason for this behavior of geocode() is that the response received from the Geocoder API differs depending on the input addresses.

library(hereR)
set_verbose(TRUE)

addr_1 <- "boston, massachusetts"
addr_2 <- "massachusetts"

url_1 <- geocode(addr_1, url_only = TRUE)
url_2 <- geocode(addr_2, url_only = TRUE)

json_1 <- hereR:::.get_content(url_1)
#> Sending 1 request(s) to: 'https://geocoder.ls.hereapi.com/6.2/geocode.json?...'
#> Received 1 response(s) with total size: 1.2 Kb
json_2 <- hereR:::.get_content(url_2)
#> Sending 1 request(s) to: 'https://geocoder.ls.hereapi.com/6.2/geocode.json?...'
#> Received 1 response(s) with total size: 1.1 Kb

See the "Address" key in the JSON response of your example addresses, which {hereR} receives from the API:

First response
jsonlite::prettify(json_1)
#> {
#>     "Response": {
#>         "MetaInfo": {
#>             "Timestamp": "2020-08-26T18:04:05.229+0000"
#>         },
#>         "View": [
#>             {
#>                 "_type": "SearchResultsViewType",
#>                 "ViewId": 0,
#>                 "Result": [
#>                     {
#>                         "Relevance": 1.0,
#>                         "MatchLevel": "city",
#>                         "MatchQuality": {
#>                             "State": 1.0,
#>                             "City": 1.0
#>                         },
#>                         "Location": {
#>                             "LocationId": "NT_0iuf-KHj2X8GfdEYL.GQXD",
#>                             "LocationType": "point",
#>                             "DisplayPosition": {
#>                                 "Latitude": 42.35866,
#>                                 "Longitude": -71.05675
#>                             },
#>                             "NavigationPosition": [
#>                                 {
#>                                     "Latitude": 42.35866,
#>                                     "Longitude": -71.05675
#>                                 }
#>                             ],
#>                             "MapView": {
#>                                 "TopLeft": {
#>                                     "Latitude": 42.39726,
#>                                     "Longitude": -71.19121
#>                                 },
#>                                 "BottomRight": {
#>                                     "Latitude": 42.22786,
#>                                     "Longitude": -70.80852
#>                                 }
#>                             },
#>                             "Address": {
#>                                 "Label": "Boston, MA, United States",
#>                                 "Country": "USA",
#>                                 "State": "MA",
#>                                 "County": "Suffolk",
#>                                 "City": "Boston",
#>                                 "PostalCode": "02109",
#>                                 "AdditionalData": [
#>                                     {
#>                                         "value": "United States",
#>                                         "key": "CountryName"
#>                                     },
#>                                     {
#>                                         "value": "Massachusetts",
#>                                         "key": "StateName"
#>                                     },
#>                                     {
#>                                         "value": "Suffolk",
#>                                         "key": "CountyName"
#>                                     },
#>                                     {
#>                                         "value": "N",
#>                                         "key": "PostalCodeType"
#>                                     }
#>                                 ]
#>                             }
#>                         }
#>                     }
#>                 ]
#>             }
#>         ]
#>     }
#> }
#> 
Second response
jsonlite::prettify(json_2)
#> {
#>     "Response": {
#>         "MetaInfo": {
#>             "Timestamp": "2020-08-26T18:04:05.243+0000"
#>         },
#>         "View": [
#>             {
#>                 "_type": "SearchResultsViewType",
#>                 "ViewId": 0,
#>                 "Result": [
#>                     {
#>                         "Relevance": 1.0,
#>                         "MatchLevel": "state",
#>                         "MatchQuality": {
#>                             "State": 1.0
#>                         },
#>                         "Location": {
#>                             "LocationId": "NT_xsuPM8XX3.ynKq8rCjhCcB",
#>                             "LocationType": "point",
#>                             "DisplayPosition": {
#>                                 "Latitude": 42.35866,
#>                                 "Longitude": -71.05675
#>                             },
#>                             "NavigationPosition": [
#>                                 {
#>                                     "Latitude": 42.35866,
#>                                     "Longitude": -71.05675
#>                                 }
#>                             ],
#>                             "MapView": {
#>                                 "TopLeft": {
#>                                     "Latitude": 42.88677,
#>                                     "Longitude": -73.50814
#>                                 },
#>                                 "BottomRight": {
#>                                     "Latitude": 41.13234,
#>                                     "Longitude": -69.92823
#>                                 }
#>                             },
#>                             "Address": {
#>                                 "Label": "MA, United States",
#>                                 "Country": "USA",
#>                                 "State": "MA",
#>                                 "AdditionalData": [
#>                                     {
#>                                         "value": "United States",
#>                                         "key": "CountryName"
#>                                     },
#>                                     {
#>                                         "value": "Massachusetts",
#>                                         "key": "StateName"
#>                                     }
#>                                 ]
#>                             }
#>                         }
#>                     }
#>                 ]
#>             }
#>         ]
#>     }
#> }
#> 
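
To see the difference without reading both responses in full, the keys of the "Address" object can be compared directly. This is only a minimal sketch, assuming {jsonlite} is installed: the two JSON strings are hand-trimmed excerpts of the responses above (not live API output), and address_keys() is a hypothetical helper, not part of {hereR}.

```r
library(jsonlite)

# Hand-trimmed excerpts of the two responses shown above (illustration only).
json_city <- '{"Response":{"View":[{"Result":[{"Location":{"Address":{
  "Label":"Boston, MA, United States","Country":"USA","State":"MA",
  "County":"Suffolk","City":"Boston","PostalCode":"02109"}}}]}]}}'
json_state <- '{"Response":{"View":[{"Result":[{"Location":{"Address":{
  "Label":"MA, United States","Country":"USA","State":"MA"}}}]}]}}'

# Hypothetical helper: names of the "Address" keys in the first result.
address_keys <- function(json) {
  parsed <- fromJSON(json, simplifyVector = FALSE)
  names(parsed$Response$View[[1]]$Result[[1]]$Location$Address)
}

# Keys present for the city-level match but absent for the state-level match.
missing_keys <- setdiff(address_keys(json_city), address_keys(json_state))
print(missing_keys)
```

The keys that differ correspond to the columns (county, city, postalCode) that are dropped from the state-level result.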

A quick fix for your automated workflow could be rbindlist() with the argument fill = TRUE from the {data.table} package:

data.table::rbindlist(
  list(
    geocode(addr_1, sf = FALSE),
    geocode(addr_2, sf = FALSE)
  ),
  fill = TRUE
)
#> Sending 1 request(s) to: 'https://geocoder.ls.hereapi.com/6.2/geocode.json?...'
#> Received 1 response(s) with total size: 1.2 Kb
#> Sending 1 request(s) to: 'https://geocoder.ls.hereapi.com/6.2/geocode.json?...'
#> Received 1 response(s) with total size: 1.1 Kb
#>    id                   address postalCode   city  county state country  type
#> 1:  1 Boston, MA, United States      02109 Boston Suffolk    MA     USA point
#> 2:  1         MA, United States       <NA>   <NA>    <NA>    MA     USA point
#>          lng      lat
#> 1: -71.05675 42.35866
#> 2: -71.05675 42.35866
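
If the results are processed one at a time rather than bound together, a small base-R helper could pad each result to a fixed set of columns instead. This is a minimal sketch under stated assumptions: pad_columns() and expected_cols are hypothetical names (not part of {hereR}), missing columns are assumed to be character columns, and the data frame is a hand-built stand-in for the state-level geocode() output, so no API call is made.

```r
# Columns the downstream code expects (taken from the wider output above).
expected_cols <- c("id", "address", "postalCode", "city", "county",
                   "state", "country", "type", "lng", "lat")

# Hypothetical helper: add any missing columns as NA, then fix the column order.
# Assumes every missing column should be of type character.
pad_columns <- function(df, cols) {
  for (col in setdiff(cols, names(df))) {
    df[[col]] <- rep(NA_character_, nrow(df))
  }
  df[, cols, drop = FALSE]
}

# Hand-built stand-in for a state-level result (no postalCode, city, county).
state_only <- data.frame(
  id = 1L, address = "MA, United States", state = "MA", country = "USA",
  type = "point", lng = -71.05675, lat = 42.35866,
  stringsAsFactors = FALSE
)

padded <- pad_columns(state_only, expected_cols)
print(names(padded))
```

Unlike rbindlist(), this keeps the result a plain data frame and guarantees the column order, at the cost of maintaining the list of expected columns by hand.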

I will consider adding consistent columns in the return value of geocode() as default behavior in the next release of the package.

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.6      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Europe/Zurich               
#>  date     2020-08-26                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.1.8   2020-06-17 [1] CRAN (R 4.0.2)
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 4.0.0)
#>  class         7.3-17  2020-04-26 [1] CRAN (R 4.0.2)
#>  classInt      0.4-3   2020-04-07 [1] CRAN (R 4.0.0)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 4.0.0)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  curl          4.3     2019-12-02 [1] CRAN (R 4.0.1)
#>  data.table    1.13.0  2020-07-24 [1] CRAN (R 4.0.2)
#>  DBI           1.1.0   2019-12-15 [1] CRAN (R 4.0.0)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools      2.3.1   2020-07-21 [1] CRAN (R 4.0.2)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 4.0.0)
#>  dplyr         1.0.2   2020-08-18 [1] CRAN (R 4.0.2)
#>  e1071         1.7-3   2019-11-26 [1] CRAN (R 4.0.0)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.0.2   2018-11-29 [1] CRAN (R 4.0.0)
#>  glue          1.4.1   2020-05-13 [1] CRAN (R 4.0.0)
#>  hereR       * 0.4.1   2020-08-24 [1] CRAN (R 4.0.2)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.2)
#>  jsonlite      1.7.0   2020-06-25 [1] CRAN (R 4.0.2)
#>  KernSmooth    2.23-17 2020-04-26 [1] CRAN (R 4.0.2)
#>  knitr         1.29    2020-06-23 [1] CRAN (R 4.0.2)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pillar        1.4.6   2020-07-10 [1] CRAN (R 4.0.2)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.2)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx      3.4.3   2020-07-05 [1] CRAN (R 4.0.2)
#>  ps            1.3.4   2020-08-11 [1] CRAN (R 4.0.2)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.0)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 4.0.0)
#>  Rcpp          1.0.5   2020-07-06 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.2)
#>  rlang         0.4.7   2020-07-09 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.3     2020-06-18 [1] CRAN (R 4.0.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  sf            0.9-5   2020-07-14 [1] CRAN (R 4.0.2)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 4.0.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.2)
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 4.0.2)
#>  tibble        3.0.3   2020-07-10 [1] CRAN (R 4.0.2)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
#>  units         0.6-7   2020-06-13 [1] CRAN (R 4.0.2)
#>  usethis       1.6.1   2020-04-29 [1] CRAN (R 4.0.0)
#>  vctrs         0.3.2   2020-07-15 [1] CRAN (R 4.0.2)
#>  withr         2.2.0   2020-04-20 [1] CRAN (R 4.0.0)
#>  xfun          0.16    2020-07-24 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

@kaijagahm
Author

Thank you for such a detailed response! I came across rbindlist as a solution, and that seems to be working pretty well for me, so I'm glad to hear that's also what you would do.

@munterfi
Owner

geocode() now returns the same set of columns, regardless of the level of detail of the input address.

library(hereR)
set_verbose(TRUE)

geocode("boston, massachusetts", sf = FALSE)
#> Sending 1 request(s) to: 'https://geocode.search.hereapi.com/v1/geocode?...'
#> Received 1 response(s) with total size: 896 bytes
#>   id                   address     type street house_number postal_code
#> 1  1 Boston, MA, United States locality   <NA>         <NA>       02109
#>   district   city  county         state       country lng_access lat_access
#> 1     <NA> Boston Suffolk Massachusetts United States         NA         NA
#>   lng_position lat_position
#> 1    -71.05674     42.35866

geocode("massachusetts", sf = FALSE)
#> Sending 1 request(s) to: 'https://geocode.search.hereapi.com/v1/geocode?...'
#> Received 1 response(s) with total size: 832 bytes
#>   id           address               type street house_number postal_code
#> 1  1 MA, United States administrativeArea   <NA>         <NA>        <NA>
#>   district city county         state       country lng_access lat_access
#> 1     <NA> <NA>   <NA> Massachusetts United States         NA         NA
#>   lng_position lat_position
#> 1    -71.05674     42.35866

To check, install the development version from GitHub: remotes::install_github("munterfinger/hereR@develop")

@kaijagahm
Author

kaijagahm commented Dec 21, 2020

@munterfinger thanks for adding this. I'm noticing that in the latest package release, the country column returns a country name, rather than a 3-letter ISO country code as in previous releases.

Is this a change that comes from hereR, or from the HERE API itself? Either way, do you know whether the country names returned are ISO standard names, or something else? This is a change that breaks some code I had written. That's okay--I'm updating my code accordingly. But I do need to convert those country names back to the ISO codes for consistency with my previous results, so I need to know what standards the names are following.

EDIT: Actually, it seems like the names returned are not ISO standard names, and I can't tell what they are. In an example I just ran, I got "United States", but "Brasil" (not "Brazil"). I can't find a system of country names that would include both of those.

If you know which country names are being used, that would be great. If not, is there a way to still return those country codes?
Thanks!
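
Until the naming scheme is known, one stopgap would be a hand-maintained lookup from the returned names back to ISO 3166-1 alpha-3 codes. This is a minimal sketch: name_to_iso3 and to_iso3() are hypothetical names, and the table contains only the two country names mentioned above, so it would need to be extended for whatever names actually appear in the results.

```r
# Hypothetical lookup table: returned country name -> ISO 3166-1 alpha-3 code.
# Only the two names observed in this thread are included.
name_to_iso3 <- c("United States" = "USA", "Brasil" = "BRA")

# Hypothetical helper: map names to codes and warn about names not in the table.
to_iso3 <- function(country_names) {
  codes <- unname(name_to_iso3[country_names])
  unmatched <- unique(country_names[is.na(codes) & !is.na(country_names)])
  if (length(unmatched) > 0) {
    warning("No ISO code for: ", paste(unmatched, collapse = ", "))
  }
  codes
}

codes <- to_iso3(c("United States", "Brasil"))
print(codes)
```

The warning makes unmapped names visible instead of silently producing NA, which matters when comparing against results from earlier releases.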
