Review process_* functions for time data and layer name #68

Closed
mitchellmanware opened this issue Apr 17, 2024 · 7 comments
Labels: enhancement (New feature or request), manuscript (Development or enhancement identified during manuscript writing.)

Comments

@mitchellmanware
Collaborator

Review the "static" dataset functions (e.g., GMTED, population, groads, NLCD) to ensure that time information is returned. All objects should carry a time reference, even if the underlying data are not frequently updated. This may require hard-coding the year.

@mitchellmanware mitchellmanware added enhancement New feature or request manuscript Development or enhancement identified during manuscript writing. labels Apr 17, 2024
@mitchellmanware mitchellmanware self-assigned this Apr 17, 2024
@sigmafelix
Collaborator

@mitchellmanware In process_nlcd, I use terra::metags to record the year. Since the GMTED and SEDAC data were each produced in a single year, the year could be added in a similar way, with the value hard-coded. If we expect future updates to these datasets, it would be good to add a year argument to the process_* functions.

terra::metags(nlcd) <- c(year = year)

@mitchellmanware
Collaborator Author

mitchellmanware commented Apr 30, 2024

  • GMTED
  • SEDAC groads
  • SEDAC population
  • Koppen-Geiger
  • Ecoregions

@mitchellmanware
Collaborator Author

mitchellmanware commented Apr 30, 2024

@sigmafelix
When reviewing the process_ and calc_ functions, I have noticed that some of the calc_ functions you created accept only SpatVector or sf objects as locations.

For example, in calc_ecoregion:

calc_ecoregion <-
  function(
    from = NULL,
    locs,
    locs_id = "site_id",
    ...
  ) {

    if (!methods::is(locs, "SpatVector")) {
      locs <- terra::vect(locs)
    }

Is there a reason you do not use the process_conformity function to accept SpatVector, sf, and data.frame alike?

Could be:

calc_ecoregion <-
  function(
    from = NULL,
    locs,
    locs_id = "site_id",
    ...
  ) {

    if (!methods::is(locs, "SpatVector")) {
      locs <- process_conformity(locs = locs)
    }

to accept all three classes.

@mitchellmanware
Collaborator Author

See commit 062f448.

A year/range metadata tag has been added to the GMTED, groads, population, and Koppen Geiger process_* functions, and a $time column to their calc_ functions. For GMTED and SEDAC population, a single year is returned (always 2010 for GMTED; for population, it depends on the user-selected year).

For the SEDAC groads, Koppen Geiger, and ecoregions functions, I have added the year range covered, as indicated by the datasets' descriptions. For example, SEDAC groads data cover the period from 1980 to 2010, so that range is added as a metadata tag and covariate column.

> ### sedac groads
> g <- process_sedac_groads(
+   path = "tests/testdata/groads_test.shp"
+ )
> calc_sedac_groads(
+   g,
+   l,
+   "id"
+ )
                id        time GRD_TOTAL_0_01000 GRD_DENKM_0_01000
1 3799900018810101 1980 - 2010          1.762476         0.5633273



> ### koppen geiger
> k <- process_koppen_geiger(
+   path = "tests/testdata/koppen_subset.tif"
+ )
> terra::metags(k)
         year 
"1980 - 2016" 
> calc_koppen_geiger(
+   k,
+   l,
+   "id"
+ )
                id        time DUM_CLRGA_0_00000 DUM_CLRGB_0_00000 DUM_CLRGC_0_00000 DUM_CLRGD_0_00000 DUM_CLRGE_0_00000
1 3799900018810101 1980 - 2016                 0                 0                 1                 0                 0



> ### ecoregions
> e <- process_ecoregion(
+   path = "tests/testdata/eco_l3_clip.gpkg"
+ )
> site_faux <-
+   data.frame(
+     site_id = "37999109988101",
+     lon = -77.576,
+     lat = 39.40,
+     date = as.Date("2022-01-01")
+   )
> site_faux <-
+   terra::vect(
+     site_faux,
+     geom = c("lon", "lat"),
+     keepgeom = TRUE,
+     crs = "EPSG:4326")
> site_faux <- terra::project(site_faux, "EPSG:5070")
> calc_ecoregion(
+   e,
+   site_faux,
+   "site_id"
+ )
         site_id        time DUM_E2083_0_00000 DUM_E3064_0_00000
1 37999109988101 1997 - 2024                 1                 1
> 

Although these year ranges do not conform to the usual values in the $time column, they are at least consistent with the original datasets.

HUC and OpenLandMap are the only datasets that do not include some sort of time information.

@sigmafelix
Collaborator

@mitchellmanware I think the time field is supposed to work as one of the keys. In the demonstration above, the time field looks like a description of the time period represented in the source dataset. An advantage of using the time field as a key is that users will be able to join multiple calc_* results on common keys. Could we move the source data description into a separate field named, for example, description?
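
To illustrate the point about keys, here is a minimal base R sketch. The data frames below are hypothetical stand-ins for calc_* results (the column names elevation, population, and road_density and all values are made up for illustration; they are not the package's actual output). It shows that a year-range label in the time column cannot match a single-year key, while moving the label to a description column keeps the join on the remaining key intact.

```r
# Hypothetical stand-ins for two calc_* results keyed by site and year.
elev <- data.frame(
  site_id = c("A", "B"),
  time = c("2010", "2010"),
  elevation = c(120.5, 342.0),
  stringsAsFactors = FALSE
)
pop <- data.frame(
  site_id = c("A", "B"),
  time = c("2010", "2010"),
  population = c(1500, 230),
  stringsAsFactors = FALSE
)

# With a plain year in `time`, the tables join on the common keys.
joined <- merge(elev, pop, by = c("site_id", "time"))

# Hypothetical result whose `time` column holds a range label instead of a key.
roads <- data.frame(
  site_id = c("A", "B"),
  time = c("1980 - 2010", "1980 - 2010"),
  road_density = c(0.56, 0.12),
  stringsAsFactors = FALSE
)

# The range label never equals "2010", so joining on both keys returns no rows.
unmatched <- merge(joined, roads, by = c("site_id", "time"))

# Moving the label into a separate `description` column leaves `site_id`
# as the only shared key, and the join succeeds.
names(roads)[names(roads) == "time"] <- "description"
joined_all <- merge(joined, roads, by = "site_id")
print(joined_all)
```

Here unmatched has zero rows, while joined_all keeps one row per site with the range preserved in description.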

@mitchellmanware
Collaborator Author

@sigmafelix
Yes, that makes sense. I will update.

@mitchellmanware
Collaborator Author

Update

> e <- process_ecoregion(
+   path = "tests/testdata/eco_l3_clip.gpkg"
+ )
> site_faux <-
+    data.frame(
+      id = "1",
+      lon = -77.576,
+      lat = 39.40,
+      date = as.Date("2022-01-01")
+ )
> site_faux <- terra::vect(site_faux, crs = "EPSG:4326")
> site_proj <- terra::project(site_faux, terra::crs(e))
> calc_ecoregion(
+   e,
+   site_proj,
+   "id"
+ )
  id description DUM_E2083_0_00000 DUM_E3064_0_00000
1  1 1997 - 2024                 1                 1

sigmafelix added a commit that referenced this issue May 1, 2024
- To make calc_tri, calc_nlcd, calc_sedac_population abide by the protocol discussed in #68