Pad fips with 0's for geocode() #47

Closed
atungate opened this issue Sep 16, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@atungate

Fantastic package! One thing to consider is left-padding the FIPS codes returned by geocode(..., method = 'census', full_results = TRUE, return_type = "geographies") with zeros so that they have the number of digits the Census Bureau specifies (https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html#par_textimage_8). Of course, those columns would also have to be character to keep the leading zeros. For example, changing this:

library(dplyr)
library(tidygeocoder)

address_single <- tibble(singlelineaddress = c('11 Wall St, NY, NY', 
                                               '600 Peachtree Street NE, Atlanta, Georgia'))
address_components <- tribble(
  ~street                      , ~cty,               ~st,
  '11 Wall St',                  'NY',               'NY',
  '600 Peachtree Street NE',     'Atlanta',          'GA'
)

address_single %>% geocode(address = singlelineaddress, method = 'census',
                           full_results = TRUE, return_type = "geographies") %>%
  select(matches("(fips|tract|block)$"))
# A tibble: 2 x 4
  state_fips county_fips census_tract census_block
       <int>       <int>        <int>        <int>
1         36          61          700         1008
2         13         121         1900         2003

To this:

address_single %>% geocode(address = singlelineaddress, method = 'census',
                           full_results = TRUE, return_type = "geographies") %>%
  transmute(state_fips = stringr::str_pad(state_fips, 2, pad = "0"),
            county_fips = stringr::str_pad(county_fips, 3, pad = "0"),
            census_tract = stringr::str_pad(census_tract, 6, pad = "0"),
            census_block = stringr::str_pad(census_block, 4, pad = "0"))
# A tibble: 2 x 4
  state_fips county_fips census_tract census_block
  <chr>      <chr>       <chr>        <chr>       
1 36         061         000700       1008        
2 13         121         001900       2003 

If it would help, I can submit a PR. I just need to dig into the code more to figure out whether the zeros can be retained upstream or whether it has to be done similarly to what I did above (but avoiding the stringr import).
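For reference, the padding above can be done without stringr. The following is a minimal base R sketch, not code from tidygeocoder: the helper name pad_zeros() is hypothetical, and the data frame simply mimics the integer columns shown in the first output.

```r
# Hypothetical helper (not part of tidygeocoder): left-pad integer codes
# with zeros using base R only.
pad_zeros <- function(x, width) {
  # flag = "0" pads on the left with zeros; as.integer() makes sure
  # whole-number doubles are formatted without decimals.
  formatC(as.integer(x), width = width, flag = "0")
}

# Stand-in for the geocoder output shown above
results <- data.frame(
  state_fips   = c(36L, 13L),
  county_fips  = c(61L, 121L),
  census_tract = c(700L, 1900L),
  census_block = c(1008L, 2003L)
)

# Widths match the ones passed to str_pad() above: 2, 3, 6 and 4
widths <- c(state_fips = 2, county_fips = 3, census_tract = 6, census_block = 4)
for (col in names(widths)) {
  results[[col]] <- pad_zeros(results[[col]], widths[[col]])
}
```

After the loop every column is character, so the leading zeros survive printing and joins.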

@jessecambon jessecambon added the enhancement New feature or request label Sep 18, 2020
@jessecambon
Owner

jessecambon commented Sep 22, 2020

@atungate thanks for this, I agree this would be a good enhancement. I would just want to make sure that it's done in a way that doesn't interfere with columns of the same name (e.g. "census_block") that might already exist in an input dataset (i.e. not coming from the geocoding service results). Also, as you note, having the solution in base R so that we don't have to add a dependency would be good.

In line 294 of geo.R there is a call to the extract_results() function, which is in utils.R. I think we could probably add an if statement to extract_results() to implement this solution (i.e. if method == 'census' & full_results == TRUE & return_type == "geographies"). I'll have somewhat limited time for development in the near future, but you're welcome to take a stab at it and submit a PR.
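As a rough illustration of that idea, here is a sketch of a conditional padding step. It is not the actual extract_results() code: the function name pad_census_results() and its arguments are hypothetical, and it assumes the padding runs on the geocoder results before they are combined with the input dataset, so same-named input columns are never touched.

```r
# Hypothetical sketch, not tidygeocoder's actual extract_results():
# pad Census identifier columns only when the query would return them.
pad_census_results <- function(results, method, full_results, return_type) {
  widths <- c(state_fips = 2L, county_fips = 3L,
              census_tract = 6L, census_block = 4L)
  if (method == "census" && full_results && return_type == "geographies") {
    # Only touch identifier columns that are present in the results
    for (col in intersect(names(widths), names(results))) {
      # Builds a format such as "%03d" for a three-digit county code
      results[[col]] <- sprintf(paste0("%0", widths[[col]], "d"),
                                as.integer(results[[col]]))
    }
  }
  results
}

raw <- data.frame(state_fips = 36L, county_fips = 61L,
                  census_tract = 700L, census_block = 1008L)
padded    <- pad_census_results(raw, "census", TRUE, "geographies")
untouched <- pad_census_results(raw, "census", TRUE, "locations")
```

Here padded holds character columns with leading zeros, while untouched is returned unchanged because the condition is not met.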

@atungate
Author

@jessecambon, sounds good and thanks for the head-start! I will take a look at things and submit a PR in the next week or two--then you can decide from there!

jessecambon added a commit that referenced this issue Dec 20, 2020
@jessecambon
Owner

jessecambon commented Dec 21, 2020

@atungate I've got a fix for this in the master branch now if you want to test it out. The state_fips, county_fips, census_tract, and census_block fields are now read in as character (instead of numeric), so the leading zeros are preserved (when using the Census batch geocoder).

library(tidygeocoder)
library(dplyr)
a <- sample_addresses %>% head(3) %>% geocode(addr, full_results = TRUE, return_type = 'geographies')
View(a)
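The effect of reading identifier fields as character rather than numeric can be seen with plain base R. This is only a sketch of the general idea, not the code from the commit: the CSV text below is made up for illustration and is not the Census batch response format.

```r
# Illustrative CSV text; the column names mirror the fields mentioned above,
# but the layout is an assumption, not the Census batch response format.
csv_text <- "state_fips,county_fips,census_tract,census_block
36,061,000700,1008
13,121,001900,2003"

# Default type guessing converts the codes to integers and drops the zeros
as_numeric <- read.csv(text = csv_text)

# Forcing character columns keeps the codes exactly as they were sent
as_text <- read.csv(text = csv_text, colClasses = "character")
```

Comparing as_numeric$county_fips with as_text$county_fips shows the difference: the first is an integer column, the second keeps the three-digit strings.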


@atungate
Author

Hey @jessecambon, works great for me! Thank you! I took a crack at fixing it upstream but couldn't figure it out. I was going to come back to it and use sprintf() in place of str_pad(), but I am glad you figured out a better solution than the one I was going to try. Great work!
