Add USGS PAD data to FedData? #100

Closed
kevinwolz opened this issue Jun 1, 2023 · 8 comments
Comments

@kevinwolz

@bocinsky have you considered adding the USGS's PAD (Protected Areas Database) to FedData? It's an incredible dataset, aggregated across federal, state, county, and other agencies. I'm not sure whether they offer an easy way to query it. I'd be happy to help if you think it's feasible/of interest.

@bocinsky
Collaborator

bocinsky commented Jun 1, 2023

I love the PAD-US dataset, and USGS has ArcGIS web services available that should make including it straightforward. The way we access the NHD could serve as a template. I support including PAD, but haven't had ample time for development lately (beyond bug-squashing). I'll add it to the to-do for the next major release!
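For context, ArcGIS REST feature services expose a query endpoint that returns GeoJSON, which is roughly the pattern such an implementation would wrap. A minimal sketch of that pattern in R (the service URL below is a placeholder, not the actual PAD-US endpoint):

library(sf)
library(httr)

# Query one layer of an ArcGIS REST service for features intersecting a
# bounding box; the service returns GeoJSON, which sf can read directly.
query_arcgis_layer <- function(service_url, layer_id, bbox) {
  resp <- httr::GET(
    paste0(service_url, "/", layer_id, "/query"),
    query = list(
      geometry       = paste(bbox, collapse = ","),  # xmin,ymin,xmax,ymax
      geometryType   = "esriGeometryEnvelope",
      inSR           = 4326,
      spatialRel     = "esriSpatialRelIntersects",
      outFields      = "*",
      returnGeometry = "true",
      f              = "geojson"
    )
  )
  sf::read_sf(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# e.g., with a placeholder service URL:
# pad <- query_arcgis_layer(
#   "https://<host>/arcgis/rest/services/PADUS/FeatureServer", 0,
#   sf::st_bbox(sf::st_transform(FedData::meve, 4326))
# )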

@bocinsky
Collaborator

bocinsky commented Jun 6, 2023

Found some time, and was excited to implement this. Do you mind testing out the implementation? Install the dev version, and see ?get_padus.
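For anyone following along, the suggested test is roughly:

# Install the development version of FedData from GitHub, then read the help
# page for the new function:
devtools::install_github("ropensci/FedData")
library(FedData)
?get_padus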

@kevinwolz
Author

kevinwolz commented Jun 13, 2023

@bocinsky thanks for diving in! Sorry for the slow follow-up; I just dove in myself. The function works very well, as intended.

I do have a philosophical question/feedback, though:
Previously, I had been downloading the PAD-US data as Shapefiles by state from HERE. That data comes into R as a single sf feature collection. I'm not very familiar with the PAD-US data, so I was a bit surprised to see that get_padus does not also return a single sf feature collection. From a quick glance, it looks like the layers in the returned list mostly designate the same areas. Some are more aggregated (boundaries dissolved) than others, and each carries different associated data fields.

From a simplicity and ease-of-use perspective, especially for those not as familiar with the PAD-US data, this is a bit confusing. If each of the returned layers represented distinct geometries (e.g., as is the case for the multiple layers returned by get_nhd), I could see this analogous method making the most sense, but currently what's returned seems to be the same spatial data in 10 different variations. Especially since downloading the same spatial data in 10 variations does increase download times by 10x!

I realize that you did not come up with this data structure, so if you want to maintain the FedData approach of adhering to the raw structure of the data source, I can definitely understand that. However, you might consider one or more of the following approaches, which could improve ease of use for the function in the majority of use cases (a rough sketch follows the list):

  • Only retrieve the "highest resolution" layer, which seems to be Manager_Name or Manager_Type, and return that.
  • Retain the ability to retrieve the other layers, but have the function default to layer = "Manager_Name" so that the default functionality is the simplified, most streamlined use case.
  • Add a processing step wherein the various layers are merged together, so that you end up with the highest-resolution features (from Manager_Name) but with all of the additional data fields from the other layers.
  • See how the data have been processed into the state-level shapefiles (linked above), and mimic that approach to aggregate the various layers into a single layer.
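To make the second suggestion concrete, a rough sketch of what the streamlined call could look like (the layer argument name and default shown here are hypothetical):

# Hypothetical: default to the most finely divided layer, while still letting
# users request the other PAD-US layers explicitly.
meve_pad <- get_padus(
  template = FedData::meve,
  label    = "meve",
  layer    = "Manager_Name"
)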

@bocinsky
Collaborator

Hey @kevinwolz, sorry for the month-and-a-half delay in responding to this... it's been a busy summer! Thanks for this discussion... you highlight something that I've really struggled with in developing FedData.

I have definitely landed on the "don't mess with the original data" side of things, and have tried to adhere to that across the different datasets FedData provides access to. An example is the incredibly complicated SSURGO data structure, which comes back as a list with a spatial (sf) table defining map units (a term of art for NRCS) and a very long, complicated list of tabular data with a hierarchical structure and various one-to-one and one-to-many relationships between the tables. I've actually included some functions that enable simplification of that data structure (basic filtering, for example), but generally I just give people the data for their AOI and leave it at that.
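For comparison, the SSURGO return looks roughly like this in practice (using the example area shipped with FedData; see ?get_ssurgo for the full structure):

# SSURGO comes back as a list: an sf table of map unit polygons plus a set of
# related tabular data (mapunit, component, chorizon, ...).
ssurgo <- FedData::get_ssurgo(template = FedData::meve, label = "meve")
ssurgo$spatial          # sf polygons delineating the map units
names(ssurgo$tabular)   # the related tabular tables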

That being said, the approach you suggest for PAD is quite sensible — and is in fact what I've done myself when I've used the PAD-US data. For FedData, I might take a somewhat different approach where I do spatial joins between all of the layers, which would preserve geometrical differences between the layers and identify (via a new column) from which PAD-US layer the data derive. (This is similar to your third bullet point, I think, except retaining all of the unique geometries.)
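A hand-wavy sketch of that spatial-join idea, assuming get_padus() returns a named list of sf layers (the layer and column names below are illustrative):

library(sf)

fine   <- padus[["Manager_Name"]]                          # finest subdivisions
coarse <- padus[["Protection_Status_by_GAP_Status_Code"]]  # coarser, dissolved

# Attach the coarser layer's attributes to the finer geometries; with
# largest = TRUE, each fine feature keeps the coarse feature it overlaps most.
combined <- sf::st_join(fine, coarse, join = sf::st_intersects, largest = TRUE)

# A column recording the source layer could be added before joining, e.g.
# coarse$padus_layer <- "Protection_Status_by_GAP_Status_Code"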

The simplest solution for now, I think, is to just be opinionated about the layer we download by default ("Manager Name" seems right) and retain the ability for people to get the other layers. I'll open this back up and implement that suggestion.

bocinsky reopened this Jul 28, 2023
@lanescher

Hey @bocinsky, I use your functions all the time and am excited to access PAD-US data! I've tried several variations of get_padus() that all give me the same error, and I can't figure it out. Do you know what could be causing this? When I run:

devtools::install_github("ropensci/FedData")
library(FedData)

PADUS <- get_padus(
  template = FedData::meve,
  label = "meve"
)

I get the following error:

Error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: Protection_Status_by_GAP_Status_Code.
Caused by error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: Protected Areas Database of the United States (PAD-US) v3.0.
Caused by error:
! lexical error: invalid char in json text.
                                       Bad Request
                     (right here) ------^
Run `rlang::last_trace()` to see where the error occurred.
> rlang::last_trace()
<error/purrr_error_indexed>
Error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: Protection_Status_by_GAP_Status_Code.
Caused by error in `purrr::map()`:
ℹ In index: 1.
ℹ With name: Protected Areas Database of the United States (PAD-US) v3.0.
Caused by error:
! lexical error: invalid char in json text.
                                       Bad Request
                     (right here) ------^
---
Backtrace:
     ▆
  1. ├─FedData::get_padus(template = FedData::meve, label = "meve")
  2. │ └─padus_services[layer] %>% ...
  3. ├─purrr::map(...)
  4. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
  5. │   ├─purrr:::with_indexed_errors(...)
  6. │   │ └─base::withCallingHandlers(...)
  7. │   ├─purrr:::call_with_cleanup(...)
  8. │   └─FedData (local) .f(.x[[i]], ...)
  9. │     └─... %>% magrittr::extract2(1)
 10. ├─FedData:::esri_feature_query(., geom = template)
 11. │ └─layer_ids %>% ...
 12. ├─purrr::map(...)
 13. │ └─purrr:::map_("list", .x, .f, ..., .progress = .progress)
 14. │   ├─purrr:::with_indexed_errors(...)
 15. │   │ └─base::withCallingHandlers(...)
 16. │   ├─purrr:::call_with_cleanup(...)
 17. │   └─FedData (local) .f(.x[[i]], ...)
 18. │     └─... %>% magrittr::extract2("objectIds")
 19. └─jsonlite::fromJSON(.)
 20.   └─jsonlite:::parse_and_simplify(...)
 21.     └─jsonlite:::parseJSON(txt, bigint_as_char)
 22.       └─jsonlite:::parse_string(txt, bigint_as_char)

Thank you!

@bocinsky
Collaborator

bocinsky commented Aug 9, 2023

Hi there! I'm glad you find FedData useful! I'm at a bit of a loss as to what is going wrong here, though... I confirmed on a friend's computer that this is a problem on Windows machines (I use a Mac), so it definitely isn't just an issue for you. Let me get on a VM and I'll try and get to the bottom of it.

@bocinsky
Collaborator

Alright, I think this is fixed; I tested on a local install of Windows 10. @lanescher, could you install the latest dev version from GitHub and give it a go? Closing for now.

@lanescher

That worked! Thanks so much!
