Automated shapefile updates #5

Closed
pdil opened this issue Dec 11, 2023 · 1 comment · Fixed by #7

pdil commented Dec 11, 2023

Currently, updating shapefiles is only semi-automated. Scripts in the data-raw folder convert US Census Bureau shapefiles into data frames readable by usmapdata, but the process still requires manually checking the Census website and downloading the appropriate files.

To provide more timely updates as new shapefiles are released, it would be beneficial to create (or attempt to create) a fully automated process that checks the relevant website(s), downloads the files, and creates the CSV data files.

GitHub Actions workflows can be used to facilitate this:

  1. Scheduled monthly task to check US Census Bureau website for shapefile updates*
  2. If update exists, download required files*
  3. Process files and convert them to .csv files that are readable by usmapdata
  4. Open a pull request with the changes and new files (possibly send a notification using Pushover)
  5. Project maintainer(s) review the changes and ensure everything is correct, works, and tests pass
  6. Changes are made manually to resolve any issues and then the pull request is merged

*Monitoring and download can be done with shell, Python, or R scripts, whichever is more ergonomic for the given task.
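
As a rough illustration of steps 1 and 2, the update check could be sketched in Python along these lines. This is only a sketch under stated assumptions: the URL pattern for the Census Bureau cartographic boundary files, the `state` and `20m` defaults, and the helper names `shapefile_url`, `url_exists`, and `newest_available_year` are all hypothetical and not part of usmapdata. The URL pattern in particular should be verified against the Census website before use.

```python
import urllib.request
from typing import Callable, Optional

# ASSUMPTION: URL layout of the Census Bureau cartographic boundary files.
# Verify this against the Census website before relying on it.
URL_TEMPLATE = (
    "https://www2.census.gov/geo/tiger/GENZ{year}/shp/"
    "cb_{year}_us_{entity}_{resolution}.zip"
)


def shapefile_url(year: int, entity: str = "state", resolution: str = "20m") -> str:
    """Build the (assumed) download URL for one shapefile archive."""
    return URL_TEMPLATE.format(year=year, entity=entity, resolution=resolution)


def url_exists(url: str, timeout: float = 30.0) -> bool:
    """Return True if a HEAD request to `url` succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers HTTP errors (e.g. 404), connection failures, and timeouts.
        return False


def newest_available_year(
    current_year: int,
    latest_possible_year: int,
    exists: Callable[[str], bool] = url_exists,
) -> Optional[int]:
    """Return the newest year after `current_year` with a published state
    shapefile, or None if nothing newer is available."""
    for year in range(latest_possible_year, current_year, -1):
        if exists(shapefile_url(year)):
            return year
    return None
```

In a scheduled workflow, a result other than `None` would trigger the download and processing steps. Passing a stub for `exists` keeps the decision logic testable without network access.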

pdil added the tech label (Technical issues and pull requests not directly related to usmapdata functionality) Dec 11, 2023
pdil self-assigned this Dec 11, 2023
pdil pinned this issue Dec 11, 2023

pdil commented Dec 16, 2023

PR #6 lays the groundwork for this change by modernizing the shapefiles used by the package.

pdil added a commit that referenced this issue Dec 16, 2023
Shapefiles were previously created using a mixture of retired packages
such as `rgeos` and `rgdal`. The new system is fully integrated into the
`usmapdata` package using an internal function, `create_map_data()`, which
creates modified shapefiles in a similar way to the legacy method,
except through the use of the `sf` package. This should help
future-proof the creation of modified shapefiles for `usmap` and allow for
interesting automations in the future to help keep shapefiles up to
date.

For now the default behavior of the `us_map` and `fips_data` functions
remains unchanged. To obtain the new shapefiles, the `as_sf` parameter
must be set to `TRUE`. In the future this will become the new default
and the parameter will be removed.

The new functions also return an `sf` object instead of a `data.frame`.
This will make plotting easier through the use of `ggplot2::geom_sf()`.
The shapefiles are now stored in GeoPackages (`.gpkg`), which are much
smaller than the CSVs used previously and allow for much
more flexibility of data manipulation using `sf`.

This pull request is a prerequisite for #5.
pdil closed this as completed in #7 Dec 17, 2023
pdil added a commit that referenced this issue Dec 27, 2023
pdil unpinned this issue Dec 30, 2023