Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submission: grainchanger #289

Closed
laurajanegraham opened this issue Mar 14, 2019 · 23 comments

Comments

@laurajanegraham
Copy link

commented Mar 14, 2019

Submitting Author: Name (@laurajanegraham)
Repository: https://github.com/laurajanegraham/grainchanger
Version submitted: 0.1.0 .9000
Editor: @karthik
Reviewer 1: @mbjoseph
Reviewer 2: @johnbaums
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: grainchanger
Title: Moving-Window and Direct Data Aggregation
Version: 0.1.0.9000
Authors@R: c(person("Laura", "Graham", 
             email = "LauraJaneEGraham@gmail.com", 
             role = c("aut", "cre"),
             comment = c(ORCID = "0000-0002-3611-7281")),
             person("Felix", "Eigenbrod",
             role = "ctb",
             email = "f.eigenbrod@soton.ac.uk",
             comment = "Input on initial conceptual development"),
             person("Marco", "Sciaini",
             role = "ctb",
             email = "sciaini.marco@gmail.com",
             comment = "Input on package development and structure"))
Description: Data aggregation via moving window or direct methods. Aggregate a 
    fine-resolution raster to a grid. The moving window method smooths the surface 
    using a specified function within a moving window of a specified size and shape 
    prior to aggregation. The direct method simply aggregates to the grid using the 
    specified function.
Depends: R (>= 3.3)
License: GPL-3
Encoding: UTF-8
LazyData: true
Imports: 
    raster,
    sf,
    furrr,
    checkmate,
    methods
Suggests: 
    testthat,
    spelling,
    knitr,
    rmarkdown,
    covr,
    ggplot2,
    landscapetools
RoxygenNote: 6.1.0
Language: en-GB
VignetteBuilder: knitr
URL: https://github.com/laurajanegraham/grainchanger
BugReports: https://github.com/laurajanegraham/grainchanger/issues

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • database access
    • data munging
    • data deposition
    • reproducibility
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

The package provides utilities to aggregate fine resolution spatial data to a coarser resolution grid through 2 methods: (1) moving-window data aggregation; and (2) direct data aggregation.

  • Who is the target audience and what are scientific applications of this package?

Spatial scientists (geographers, spatial ecologists, social scientists, anyone who works with spatial data)

No. It is possible to do these analyses in the raster package, but it certainly isn't trivial to do so. We have streamlined the process for (a) ease of use; and (b) computational speed.

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Technical checks

Confirm each of the following by checking the box. This package:

Publication options

JOSS Options
  • The package has an obvious research application according to JOSS's definition.
    • The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
    • The package is deposited in a long-term repository with the DOI:
    • (Do not submit your package separately to JOSS)
MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@karthik

This comment has been minimized.

Copy link
Member

commented Mar 15, 2019

👋 @laurajanegraham! I will be your editor for this. Stay tuned for next steps.

@karthik

This comment has been minimized.

Copy link
Member

commented Mar 27, 2019

👋 @laurajanegraham Just a quick update that I'm waiting to hear back from reviewers. Once they accept I will update this thread soon. 🙏

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Mar 27, 2019

Thanks for letting me know @karthik .

@karthik

This comment has been minimized.

Copy link
Member

commented Mar 27, 2019

I've assigned @mbjoseph as the first reviewer. Thanks Max for agreeing to review! Looks like you’ve reviewed for us before but please let me know if you have any questions or issues. We look forward to your review by April 17th (let me know if anything comes up).

@karthik

This comment has been minimized.

Copy link
Member

commented Apr 1, 2019

Assigning @johnbaums as reviewer 2! Please get your review in by April 22. 🙏

@mbjoseph

This comment has been minimized.

Copy link
Member

commented Apr 10, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 8


Review Comments

Really neat package @laurajanegraham! Overall I think this is in great shape.

README

  • A description in the README that outlines the similarities and differences to raster::focal(), and how the moving window functionality relates to spatial convolution would be helpful. It took me a while to wrap my head around what the package did based on the README, but a more explicit connection to other operations that users may know already could reduce the spin-up time for new users.

  • It could be good to point out the target audience more directly in the README - i.e., some explicit statement that a user could identify with - landscape ecologists, or generally folks working with spatial data at different resolutions, etc.

  • The three panel figure in the README (mwda_schematic.png) I thought would be helpful to wrap my head around what's happening, but there were a few things that I found confusing:

  1. The terms "response" and "predictor" along with f(x) in the first panel might suggest some kind of model, but these aren't explained elsewhere.

  2. It would be nice to have an explanation for why the number of grid cells increases in the second figure. Was a torus created?

  3. An explanation of what the thick black squares represent would be useful.

  • The show_landscape function applied to a categorical raster might be improved by using a discrete colormap (via discrete = TRUE). I was confused briefly by the current example, where the colorbar implies continuity.

  • In the README there seem to be some places where "surface" and "landscape" are used interchangeably. Would using one or the other, (or "raster") generalize the language a bit?

Implementation and interface

  • Some of the function arguments are very concise, possibly at the expense of clarity. For example, the parameter g represents a grid, and the parameter dat represents a raster. Some more expressive argument names might be useful here (e.g., maybe grd and ras). This is definitely tricky because you might not want to use grid and raster, but something to think about.

  • The na.rm argument to nm_prop appears to be unused - is that intentional?

Documentation

  • An example of how to use a custom function for moving window and direct aggregation might be useful for users who want to extend or modify the current functionality.

  • In the help file for winmove_agg, it might be useful to elaborate on the description of the type argument to list the valid values (I can tell that these are at least "circle", "Gauss", and "rectangle", but maybe there are more?).

  • Great to have the code of conduct, but it would be even better to also have contributing guidelines (either in the README or in a separate file) to outline your preferred way of having others contribute to the package.

  • You could add a CodeMeta file for the package as described in the rOpenSci dev guide: https://ropensci.github.io/dev_guide/building.html#creating-metadata-for-your-package

  • Missing a closing parenthesis on the Description field for var_range in the table of functions in the getting started vignette: https://laurajanegraham.github.io/grainchanger/articles/getting-started.html#functions

Testing

  • Building out some unit tests for winmove(...fun = 'prop'), winmove() with user defined functions, and the uncovered functions in R/calc_functions.R would be a nice addition.

  • For the winmove tests, it might be a good idea to use some different values of d - currently all of the tests use d = 5, which is great for that case, but as a consequence the tests might not reveal whether future changes break the functionality that enable different values of d to alter the function's behavior.

  • If possible, it would be nice to have testing on Windows via AppVeyor.

Generic checks

  • goodpractice checks are pretty clean, but do generate some potentially useful suggestions:
> goodpractice::gp()
...
It is good practice towrite unit tests for all functions, and all package code in general. 81% of code lines are covered by test cases.

    R/calc_functions.R:11:NA
    R/calc_functions.R:12:NA
    R/calc_functions.R:43:NA
    R/calc_functions.R:44:NA
    R/calc_functions.R:47:NA
    ... and 16 more linesavoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters
    wide. Try make your lines shorter than 80 characters

    R/create_torus.R:41:1
    R/create_torus.R:43:1
    R/create_torus.R:47:1
    R/create_torus.R:49:1
    R/create_torus.R:51:1
    ... and 4 more linesavoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using
    vapply() instead.

    R/calc_functions.R:34:8
  • A quick spell check raises a bunch of false positives as usual, but also some valid misspellings like "Ecololgy":
> spelling::spell_check_package()
  WORD                       FOUND IN
Codecov                    README.md:11
                           README.Rmd:20
CRS                        winmove_agg.Rd:15
                           winmove.Rd:12
doi                        cat_ls.Rd:11
                           cont_ls.Rd:11
                           g_sf.Rd:11
Ecololgy                   cat_ls.Rd:10
                           cont_ls.Rd:10
                           g_sf.Rd:10
Fritsch                    cat_ls.Rd:9,13
                           cont_ls.Rd:9
                           g_sf.Rd:9
grainchangerREADME.md:161
landscapetools             cat_ls.Rd:9,13,14
                           cont_ls.Rd:9
                           g_sf.Rd:9
lc                         README.md:128,130,132,133
NLMR                       cat_ls.Rd:9
                           cont_ls.Rd:9
                           g_sf.Rd:9
                           README.md:144
                           README.Rmd:119
                           getting-started.Rmd:100
org                        cat_ls.Rd:11,14
                           cont_ls.Rd:11
                           g_sf.Rd:11
rm                         nomove_agg.Rd:17
                           winmove_agg.Rd:24
                           winmove.Rd:21
shei                       README.md:130,132
Simpkins                   cat_ls.Rd:9
                           cont_ls.Rd:9
                           g_sf.Rd:9
sp                         nomove_agg.Rd:23
SpatialPolygonsDataFrame   nomove_agg.Rd:11
                           winmove_agg.Rd:11
useage                     README.md:118
                           README.Rmd:91
                           getting-started.Rmd:72
var                        README.md:134
wm                         README.md:128,129,130,131
@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Apr 12, 2019

Thanks for your thorough review @mbjoseph - that's all really useful.

I'll deal with the comments all at once when we have the second review through, but in the meantime @karthik how would you recommend going about changing parameter names for a package that may already be in use? I agree that they should be made more meaningful.

@karthik

This comment has been minimized.

Copy link
Member

commented Apr 25, 2019

Hi @laurajanegraham Thanks for the question. We have a section in the dev guide that addresses this specific issue: https://ropensci.github.io/dev_guide/evolution.html#functions-changing-function-names

I hope that's helpful. If not please let us know and we can help more.

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Apr 26, 2019

Thanks @karthik that sounds like it'll answer the question.

@johnbaums

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).

Documentation

The package includes all the following forms of documentation:

  • A statement of need clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s) demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions in R help
  • Examples for all exported functions in R Help that run successfully locally
  • Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • A short summary describing the high-level functionality of the software
  • Authors: A list of authors with their affiliations
  • A statement of need clearly stating problems the software is designed to solve and its target audience.
  • References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software been confirmed.
  • Performance: Any performance claims of the software been confirmed.
  • Automated tests: Unit tests cover essential functions of the package
    and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

  • The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing: 6


Review Comments

Great submission, @laurajanegraham! I'm always on the lookout for packages that make spatial data munging more efficient - this is a nice one to add to the toolkit.

I've had a quick look through @mbjoseph's review and I second his suggestions regarding additional unit tests.

Installation

  • I initially had some trouble getting the key functions to run. This was first due to an outdated version of dplyr (0.7.8; error: object 'group_map' not found ... dplyr 0.8.0.1 rectified this), and then due to an outdated version of R (3.4.2; invalid class "Extent" object: TRUE ... all tests ran succeeded after updating to R 3.5.3). It might be worth stipulating minimum versions for these in DESCRIPTION. (Note that the old R issue is was also reported here.)

README

  • Nice but a little brief. It might be worth stating that winmove can be used on its own, without subsequently aggregating. I'd also like to see more examples, e.g. using rasters as the target grids. It would also benefit from a clearer description of the process involved in moving window aggregation. In particular, it's not clear that what is returned is the average of smoothed pixel values within the larger extents.
  • I was a bit confused by the schematic, until I realised that the leftmost grid represents one of the 16 grid cells shown in the middle grid.. I'm not sure how that could be more effectively portrayed.
  • For people unfamiliar with construction of sf polygon grids, it'd be useful to include an example in the readme (and in the function docs) that creates g_sf.

General comments

  • typos: Ecololgy, useage
  • I'd like to see more details about the available functions (i.e. things in calc_functions.R). The names given in the Details section (e.g., 'wm_shei') throw errors, and it's not immediately clear that the wm_ prefix should be dropped by the user.
  • It's unclear how you intend for wm_classes to be used with winmove.
  • Might be an idea to provide a reference or formula for Shannon's evenness.

create_torus()

  • I think that the output extent should be bsaed on the input raster. Currently, the extent is reset to c(0, ncol(data_pad) * resolution, 0, nrow(dat_pad) * resolution). It would make more sense to use extent(dat) + 2*r so that this propagates nicely to other functions in the smoothing/aggregation pipeline (and also because create_torus will be useful independent of the pipeline).

  • Documentation:

    • r: "The radius by which to pad the raster" - this should stipulate the units (i.e., number of cells). Actually I don't think "radius" is the best word choice here.
    • Define torus e.g. with "wrapping"/"tiling" - it's unclear without digging into the source, and the torus approach to padding may not always be appropriate so it should be made explicit. I had initially expected that "Pad a raster by a specified radius" meant it would be padded with NA, as per raster::extend). Though maybe the approach you're using is also what raster::focal does (where it's also not clearly described)?

winmove_agg()

  • is grid_buffer <- sf::st_geometry(grid_buffer) necessary?
  • I'm not sure whether this potentially a problem or not.. the distance, d, is used to define the moving window, but also to buffer the extent that is being considered (i.e. to buffer grid_cell). The latter is necessary to prevent edge effects of blocked moving window analysis, but I think the resulting raster (winmove_cellr) should then be cropped back to the extent of grid_cell prior to calculating the mean over the cells. Otherwise, raster cells 2d away from the grid cell might(?) contribute to the aggregate mean of that grid cell. I think this may in fact not happen, since the cells at the margins are NA after winmove, at least in one test I ran.
  • It's probably worth letting users define the function used for aggregation of smoothed cell values, and not just the moving window summary. Currently the function enforces calculation of the cells' mean. In contrast, the nomove_agg function uses the user-specified function for aggregation. This inconsistency might be confusing, and since it's trivial to allow users to specify the aggregation function as well, I think it's worth doing.
  • g cannot be a SpatialPolygons object - if there's no harm in allowing that, I would do so (checkmate::check_class(g, "SpatialPolygons")).
  • winmove_agg fails if g is a RasterLayer filled with NA, e.g.:
g <- raster(matrix(NA, ncol=2, nrow=2), xmn=0, xmx=8, ymn=0, ymx=8)
r <- raster(matrix(1:64, ncol=8), xmn=0, xmx=8, ymn=0, ymx=8)
winmove_agg(g, r, 2, 'rectangle', 'mean')
## Error in UseMethod("st_geometry") : 
##  no applicable method for 'st_geometry' applied to an object of class "NULL"
  • Should winmove_agg return a raster object by default if the target grid is a raster object? It currently returns a numeric vector. This is inconsistent with nomove_agg, which returns a raster if input is a raster.
  • If the destination grid is a regular grid aligned with the initial raster (though alignment may be unecessary since raster::crop snaps to nearest cell by default), then it's much faster in some cases to use a raster::focal and raster::aggregate workflow. It might be worth testing for this and using that workflow where appropriate. For example:
# small raster:
r <- raster(matrix(runif(200^2), 200), xmn=0, xmx=200, ymn=0, ymx=200)
g <- raster(matrix(1, nc=40, nr=40), xmn=0, xmx=200, ymn=0, ymx=200)
system.time(d1 <- winmove_agg(g, r, 5, "rectangle", "mean"))
##   user  system elapsed 
##  26.97    0.01   27.16 

system.time({
  rf <- focal(r, raster::focalWeight(r, 5, 'rectangle'))
  d2 <- raster::aggregate(rf, 5, mean)
})
##   user  system elapsed 
##   0.02    0.00    0.02 

range(d1-d2[], na.rm=T)
## [1] -2.220446e-16  2.220446e-16

# scales well:
r <- raster(matrix(runif(2000^2), 2000), xmn=0, xmx=2000, ymn=0, ymx=2000)
system.time({
  rf <- focal(r, raster::focalWeight(r, 5, 'rectangle'))
  d2 <- raster::aggregate(rf, 5, mean)
})
##   user  system elapsed 
##   1.54    0.21    1.75
  • It would be nice if these functions worked on raster stacks as well.

  • Documentation:

    • docs list only a subset of the functions that exist in calc_functions.R ("grainchanger has several built-in functions. Functions currently included are: wm_shei, wm_prop, wm_classes, var_range"). This is missing wm_mean. I would also suggest not referring to the nm_ and wm_ prefixes in the docs. Users should be able to (and can) just use, e.g. "shei". I don't think they need to know the underlying function name, unless you intend for it to be used directly (which I doubt since it's not exported).
    • Function names could be more intuitive, e.g. "shannon" as opposed to "shei".
    • d: "the size of the matrix returned" - I know that this relates to raster::focalWeights, but users might be confused about which matrix you refer to.
    • fun: use \code{} to mark up code.

winmove()

  • is grid_buffer <- sf::st_geometry(grid_buffer) necessary?

  • w <- ifelse(w > 0, 1, NA) in winmove can break expected behaviour when type='Gauss', for example if fun=max. E.g., compare winmove(cont_ls, 5, 'Gauss', 'max') to raster::focal(cont_ls, raster::focalWeight(cont_ls, 5, 'Gauss'), max).

  • I think it would be nice to accomodate anonymous functions in winmove. E.g.: winmove(cont_ls, 5, 'rectangle', function(x, ...) diff(range(x))). This applies to nomove_agg as well.

  • Documentation:

    • "The shape of the moving window" is unclear. What options can this take? Specify in Details or in the argument description. See ?raster::focalWeight, for example.

nomove_agg()

  • nomove_agg uses as.vector() while winmove_agg uses raster::values(). The latter is perhaps slightly more performant, though the difference is trivial. But you could change for consistency.

  • when g is a raster layer, it's only used to determine the aggregation factor. However, users might expect that if the layers are offset, the aggregation will be done within the offset, target cells. This isn't the case.

  • In nm_shei, I believe p <- sum(dat == i) / raster::ncell(dat) needs a na.rm=TRUE, and since dat is not a raster object in that scope, you could replace the ncell call with length. If you want to ignore NA cells from the number of cells (I'm not sure which is more appropriate), then you can replace the whole line with p <- mean(dat==i, na.rm=TRUE).

  • In var_range, can replace if (sum(is.na(dat)) == length(dat)) with if (all(is.na(dat)))

  • Documentation:

    • Use \code{} in docs (fun arg description)
    • I think the final example in nomove_agg (d <- nomove_agg(g_sf, cont_ls, "nm_shei", lc_class = 0:3)) should use cat_ls not cont_ls.

Tests and checks

  • devtools::test() - all passed
  • devtools::check() - no errors, warnings, or notes

sessionInfo

## R version 3.5.3 (2019-03-11)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows >= 8 x64 (build 9200)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                      
## [5] LC_TIME=English_Australia.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] purrr_0.3.2             grainchanger_0.1.0.9000 testthat_2.1.1          raster_2.8-19           sp_1.3-1               
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.1         compiler_3.5.3     class_7.3-15       prettyunits_1.0.2  remotes_2.0.4      tools_3.5.3        digest_0.6.18      pkgbuild_1.0.3    
##  [9] pkgload_1.0.2      checkmate_1.9.1    memoise_1.1.0      lattice_0.20-38    rlang_0.3.4        DBI_1.0.0          cli_1.1.0          rstudioapi_0.10   
## [17] commonmark_1.7     parallel_3.5.3     yaml_2.2.0         xopen_1.0.0        e1071_1.7-1        furrr_0.1.0        withr_2.1.2        stringr_1.4.0     
## [25] roxygen2_6.1.1     xml2_1.2.0         globals_0.12.4     desc_1.2.0         fs_1.2.7           devtools_2.0.2     classInt_0.3-3     rprojroot_1.3-2   
## [33] grid_3.5.3         glue_1.3.1         listenv_0.7.0      sf_0.7-4           R6_2.4.0           processx_3.3.0     rcmdcheck_1.3.2    sessioninfo_1.1.1 
## [41] callr_3.2.0        magrittr_1.5       units_0.6-2        backports_1.1.4    codetools_0.2-16   ps_1.3.0           usethis_1.5.0      assertthat_0.2.1  
## [49] future_1.12.0      KernSmooth_2.23-15 stringi_1.4.3      crayon_1.3.4      
@johnbaums

This comment has been minimized.

Copy link
Contributor

commented Apr 30, 2019

Assigning @johnbaums as reviewer 2! Please get your review in by April 22. 🙏

Apologies for the slow review, and thank you for your patience

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented May 1, 2019

Thank you @johnbaums and @mbjoseph for your thorough and thoughtful reviews. I'll get working on them!

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Jun 1, 2019

Firstly, thanks to @mbjoseph and @johnbaums for really thorough and helpful reviews. They really allowed me to properly think through the package structure, and hopefully I've managed to improve it as a result. The package now passes goodpractice checks and spelling mistakes have been corrected.

As a general note, I've added additional functionality to the package - the coarse resolution data (to which the fine resolution data is being aggregated) no longer needs to be gridded. This means there is an additional option is_grid and I've added examples and tests to reflect this.

First, responses to @mbjoseph review:

README

A description in the README that outlines the similarities and differences to raster::focal(), and how the moving window functionality relates to spatial convolution would be helpful. It took me a while to wrap my head around what the package did based on the README, but a more explicit connection to other operations that users may know already could reduce the spin-up time for new users.

I've added text to the readme which provides a package overview. In particular:


The winmove function acts as a convenient wrapper to raster::focalWeight and raster::focal which takes advantage of optimised functions built into the grainchanger package.


It could be good to point out the target audience more directly in the README - i.e., some explicit statement that a user could identify with - landscape ecologists, or generally folks working with spatial data at different resolutions, etc.

Statement added to readme:


As landscape ecologists and macroecologists, we often need to aggregate data in order to harmonise datasets. In doing so, we often lose a lot of information about the spatial structure and environmental heterogeneity of data measured at finer resolution. An example of this is when the response data (e.g. species' atlas data) are available at a coarser resolution to the predictor data (e.g. land-use data). We developed this method and R package in order to overcome some of these issues.


The three panel figure in the README (mwda_schematic.png) I thought would be helpful to wrap my head around what's happening, but there were a few things that I found confusing:

The terms "response" and "predictor" along with f(x) in the first panel might suggest some kind of model, but these aren't explained elsewhere.

It would be nice to have an explanation for why the number of grid cells increases in the second figure. Was a torus created?

An explanation of what the thick black squares represent would be useful.

Apologies, I just stuck this figure in from a paper without any explanation! I've moved it into a separate vignette Background & Motivation and given it a caption, which hopefully answers all the above points:


The above is a graphical representation of the MWDA method. In calculating the MWDA measure, three aspects of scale are considered. Predictor grain is the characteristic spatial scale of a predictor variable, that is, the resolution of the environmental data; scale‐of‐effect determines the appropriate scale of the relationship between predictor and response, for example, an ecological neighbourhood; response grain is the grain of the unit into which you are predicting, that is, the resolution of the response variable (represented by the black lines). Note that the colour scale is unitless. Yellow cells represent ‘high’ values and dark blue cells ‘low’ values. Panel 1 shows a close up of one of the response grain cells in panel 2, whereas panel 2 shows all response grain cells for the study region. Panel 3 shows the study region after aggregation. From Graham et al. 2019.


The show_landscape function applied to a categorical raster might be improved by using a discrete colormap (via discrete = TRUE). I was confused briefly by the current example, where the colorbar implies continuity.

Done

In the README there seem to be some places where "surface" and "landscape" are used interchangeably. Would using one or the other, (or "raster") generalize the language a bit?

Done - changed all to raster

Implementation and interface

Some of the function arguments are very concise, possibly at the expense of clarity. For example, the parameter g represents a grid, and the parameter dat represents a raster. Some more expressive argument names might be useful here (e.g., maybe grd and ras). This is definitely tricky because you might not want to use grid and raster, but something to think about.

I've changed the parameter names to fine_dat, coarse_dat, win_fun, and agg_fun which I think better explains what they represent.

The na.rm argument to nm_prop appears to be unused - is that intentional?

Unintentional, the new structure of the calc_functions sorts this.

Documentation

An example of how to use a custom function for moving window and direct aggregation might be useful for users who want to extend or modify the current functionality.

Good point. I've now provided this in the vignette Built-in functions

In the help file for winmove_agg, it might be useful to elaborate on the description of the type argument to list the valid values (I can tell that these are at least "circle", "Gauss", and "rectangle", but maybe there are more?).

Done

Great to have the code of conduct, but it would be even better to also have contributing guidelines (either in the README or in a separate file) to outline your preferred way of having others contribute to the package.

Contributions section added to the readme


We welcome contributions to this package. To contribute, submit a pull request making sure develop is the destination branch on the grainchanger repository.


You could add a CodeMeta file for the package as described in the rOpenSci dev guide: https://ropensci.github.io/dev_guide/building.html#creating-metadata-for-your-package

Done

Missing a closing parenthesis on the Description field for var_range in the table of functions in the getting started vignette: https://laurajanegraham.github.io/grainchanger/articles/getting-started.html#functions

Fixed

Testing

Building out some unit tests for winmove(...fun = 'prop'), winmove() with user defined functions, and the uncovered functions in R/calc_functions.R would be a nice addition.

Included user-defined functions in the testing, and testing now 99.5% coverage

For the winmove tests, it might be a good idea to use some different values of d - currently all of the tests use d = 5, which is great for that case, but as a consequence the tests might not reveal whether future changes break the functionality that enable different values of d to alter the function's behavior.

Sorted. Have also varied the inputs and moving window shape

If possible, it would be nice to have testing on Windows via AppVeyor.

Done

Responses to @johnbaums review:

Installation

I initially had some trouble getting the key functions to run. This was first due to an outdated version of dplyr (0.7.8; error: object 'group_map' not found ... dplyr 0.8.0.1 rectified this), and then due to an outdated version of R (3.4.2; invalid class "Extent" object: TRUE ... all tests ran succeeded after updating to R 3.5.3). It might be worth stipulating minimum versions for these in DESCRIPTION. (Note that the old R issue is was also reported here.)

I've updated the minimum versions in DESCRIPTION

README

Nice but a little brief. It might be worth stating that winmove can be used on its own, without subsequently aggregating.

I've updated the readme a bit with information about who may want to use it:


As landscape ecologists and macroecologists, we often need to aggregate data in order to harmonise datasets. In doing so, we often lose a lot of information about the spatial structure and environmental heterogeneity of data measured at finer resolution. An example of this is when the response data (e.g. species’ atlas data) are available at a coarser resolution to the predictor data (e.g. land-use data). We developed this method and R package in order to overcome some of these issues.


An overview of the package - which includes that winmove can be used on its own:


The primary functions of the grainchanger package are those which facilitate moving-window (winmove_agg) and direct (nomove_agg) data aggregation. These functions aggregate fine-grain data (fine_dat) to a coarse-grain (coarse_dat) using a function specified by the user (agg_fun). The moving-window method takes in an additional function (win_fun) which smooths the fine-grain data prior to aggregation.

The moving-window smoothing function is also available in the package (winmove), as well as several built-in functions, and an additional utility function for use with simulated landsacpes (create_torus).

The winmove function acts as a convenient wrapper to raster::focalWeight and raster::focal which takes advantage of optimised functions built into the grainchanger package.


I'd also like to see more examples, e.g. using rasters as the target grids. It would also benefit from a clearer description of the process involved in moving window aggregation. In particular, it's not clear that what is returned is the average of smoothed pixel values within the larger extents.

I've included a raster as a target grid in one of the examples. I've also better explained the output:


The below example shows the moving-window data aggregation in action. It aggregates a categorical raster (fine_dat) to a grid using Shannon evenness (specified by win_fun) as the function calculated within a square moving window of 5 units. The value returned is the mean (specified by agg_fun) of the smoothed value for each cell of coarse_dat. This value is included as a column on the grid sf object.


I was a bit confused by the schematic, until I realised that the leftmost grid represents one of the 16 grid cells shown in the middle grid.. I'm not sure how that could be more effectively portrayed.

Apologies, I just stuck this figure in from a paper without any explanation! I've moved it into a separate vignette Background & Motivation and given it a caption, which hopefully answers all the above points:


The above is a graphical representation of the MWDA method. In calculating the MWDA measure, three aspects of scale are considered. Predictor grain is the characteristic spatial scale of a predictor variable, that is, the resolution of the environmental data; scale‐of‐effect determines the appropriate scale of the relationship between predictor and response, for example, an ecological neighbourhood; response grain is the grain of the unit into which you are predicting, that is, the resolution of the response variable (represented by the black lines). Note that the colour scale is unitless. Yellow cells represent ‘high’ values and dark blue cells ‘low’ values. Panel 1 shows a close up of one of the response grain cells in panel 2, whereas panel 2 shows all response grain cells for the study region. Panel 3 shows the study region after aggregation. From Graham et al. 2019.


For people unfamiliar with construction of sf polygon grids, it'd be useful to include an example in the readme (and in the function docs) that creates g_sf.

I've now included this in the getting started vignette. There is one example which loads an sf object using sf::st_read, and another which creates one using the st_make_grid function from the sf package. #

General comments

typos: Ecololgy, useage

Fixed

I'd like to see more details about the available functions (i.e. things in calc_functions.R). The names given in the Details section (e.g., 'wm_shei') throw errors, and it's not immediately clear that the wm_ prefix should be dropped by the user.

I've changed the structure of the built in functions to take an object oriented approach. They are now fully documented, don't have the prefix, and there is a vignette (Built-in functions) which explains their usage.

It's unclear how you intend for wm_classes to be used with winmove.

I've removed this one for now - it may be built into a future update, but if it is, it will be fully documented in the same way as (e.g) shei and shdi.

Might be an idea to provide a reference or formula for Shannon's evenness.

This has been provided in the Built-in functions vignette and the help file for diversity-metrics.

create_torus()

I think that the output extent should be bsaed on the input raster. Currently, the extent is reset to c(0, ncol(data_pad) * resolution, 0, nrow(dat_pad) * resolution). It would make more sense to use extent(dat) + 2*r so that this propagates nicely to other functions in the smoothing/aggregation pipeline (and also because create_torus will be useful independent of the pipeline).

Agreed - I've updated the function to work in this way.

I toyed with removing this function from the package because it's not been used anywhere in the pipeline. The reason for its creation was for me to use with simulated landscapes for testing the properties of winmove and winmove_agg - which may be useful to other users. Would appreciate thoughts on this!

Documentation:

r: "The radius by which to pad the raster" - this should stipulate the units (i.e., number of cells). Actually I don't think "radius" is the best word choice here.

Agreed. Updated


The amount by which to pad the raster (in the same units as the raster)


Define torus e.g. with "wrapping"/"tiling" - it's unclear without digging into the source, and the torus approach to padding may not always be appropriate so it should be made explicit. I had initially expected that "Pad a raster by a specified radius" meant it would be padded with NA, as per raster::extend). Though maybe the approach you're using is also what raster::focal does (where it's also not clearly described)?

Added a description to the details:


A torus is an infinite surface where the top joins the bottom, and the left side meets the right side. See https://en.wikipedia.org/wiki/Torus for a full mathematical description.

In this function, the torus effect is achieved by adding the specified number of rows of the top of the raster to the bottom (and vice versa) and the specified number of rows of the right of the raster to the left (and vice versa)


winmove_agg()

is grid_buffer <- sf::st_geometry(grid_buffer) necessary?

Yes, and I've not a clue why. If I run the code outside of the future_map it runs fine without, but within it, it needs this to work... Sorry, not the best answer!!

I'm not sure whether this potentially a problem or not.. the distance, d, is used to define the moving window, but also to buffer the extent that is being considered (i.e. to buffer grid_cell). The latter is necessary to prevent edge effects of blocked moving window analysis, but I think the resulting raster (winmove_cellr) should then be cropped back to the extent of grid_cell prior to calculating the mean over the cells. Otherwise, raster cells 2d away from the grid cell might(?) contribute to the aggregate mean of that grid cell. I think this may in fact not happen, since the cells at the margins are NA after winmove, at least in one test I ran.

Yes, all the margins will be NA, so it doesn't need cropping back to the extent.

It's probably worth letting users define the function used for aggregation of smoothed cell values, and not just the moving window summary. Currently the function enforces calculation of the cells' mean. In contrast, the nomove_agg function uses the user-specified function for aggregation. This inconsistency might be confusing, and since it's trivial to allow users to specify the aggregation function as well, I think it's worth doing.

Agreed - I've updated this to be an additional argument (agg_fun) with the default set to mean.

g cannot be a SpatialPolygons object - if there's no harm in allowing that, I would do so (checkmate::check_class(g, "SpatialPolygons")).

Updated to allow this.

winmove_agg fails if g is a RasterLayer filled with NA

Fixed.

Should winmove_agg return a raster object by default if the target grid is a raster object? It currently returns a numeric vector. This is inconsistent with nomove_agg, which returns a raster if input is a raster.

Good point, I've updated the code to output a raster if the input is a raster.

If the destination grid is a regular grid aligned with the initial raster (though alignment may be unecessary since raster::crop snaps to nearest cell by default), then it's much faster in some cases to use a raster::focal and raster::aggregate workflow. It might be worth testing for this and using that workflow where appropriate.

Agreed, but these methods take advantage of parallel processing (which I've now included in the documentation for winmove_agg and nomove_agg as well as in the getting started vignette). For now I have left the code as is, but this is something worth considering for a future update.

It would be nice if these functions worked on raster stacks as well.

Agreed. I think that perhaps this is something for the next update as it requires a bit of thinking as to how to approach it (e.g. the focal function on which this is built only takes in RasterLayer as input).

Documentation:

docs list only a subset of the functions that exist in calc_functions.R ("grainchanger has several built-in functions. Functions currently included are: wm_shei, wm_prop, wm_classes, var_range"). This is missing wm_mean. I would also suggest not referring to the nm_ and wm_ prefixes in the docs. Users should be able to (and can) just use, e.g. "shei". I don't think they need to know the underlying function name, unless you intend for it to be used directly (which I doubt since it's not exported).

I've updated the way that the functions work and all are properly documented now.

Function names could be more intuitive, e.g. "shannon" as opposed to "shei".

I've gone with this because it is how it is referred to in the fragstats manual. As I build in more of the functions from there I will keep the consistency of using these codes.

d: "the size of the matrix returned" - I know that this relates to raster::focalWeights, but users might be confused about which matrix you refer to.

Gauss has been removed as an option - see below.

fun: use \code{} to mark up code.

Fixed

winmove()

is grid_buffer <- sf::st_geometry(grid_buffer) necessary?

See above

w <- ifelse(w > 0, 1, NA) in winmove can break expected behaviour when type='Gauss', for example if fun=max. E.g., compare winmove(cont_ls, 5, 'Gauss', 'max') to raster::focal(cont_ls, raster::focalWeight(cont_ls, 5, 'Gauss'), max).

Ah! Good spot, I completely missed this (as I don't use the Gaussian filter). I've removed Gauss for now, because I can't think of a straightforward way to include it in the workflow. Additionally, the primary focus of this package is on the aggregation, and some simulations I did a while ago showed that when aggregating smoothed measures, there is little difference between using circle and Gauss (the problems pointed out above weren't present as this was using focal).

I think it would be nice to accomodate anonymous functions in winmove. E.g.: winmove(cont_ls, 5, 'rectangle', function(x, ...) diff(range(x))). This applies to nomove_agg as well.

This is now possible (functions are no longer input as strings).

Documentation:

"The shape of the moving window" is unclear. What options can this take? Specify in Details or in the argument description. See ?raster::focalWeight, for example.

Done

nomove_agg uses as.vector() while winmove_agg uses raster::values(). The latter is perhaps slightly more performant, though the difference is trivial. But you could change for consistency.

Both now use raster::values

when g is a raster layer, it's only used to determine the aggregation factor. However, users might expect that if the layers are offset, the aggregation will be done within the offset, target cells. This isn't the case.

This is now possible

In nm_shei, I believe p <- sum(dat == i) / raster::ncell(dat) needs a na.rm=TRUE, and since dat is not a raster object in that scope, you could replace the ncell call with length. If you want to ignore NA cells from the number of cells (I'm not sure which is more appropriate), then you can replace the whole line with p <- mean(dat==i, na.rm=TRUE).

I've changed this to be sum(x %in% i) / length(x) to be in line with the winmove version, and to take into account the comment about the input not being a raster.

In var_range, can replace if (sum(is.na(dat)) == length(dat)) with if (all(is.na(dat)))

Done

Documentation:

Use \code{} in docs (fun arg description)

Done

I think the final example in nomove_agg (d <- nomove_agg(g_sf, cont_ls, "nm_shei", lc_class = 0:3)) should use cat_ls not cont_ls.

Examples have changed.

@karthik

This comment has been minimized.

Copy link
Member

commented Jun 3, 2019

Thank you so much Laura for this really comprehensive and exhaustive response to the reviewer's comments! 💯
I've never been such an impressive response here or elsewhere. Kudos.

@mbjoseph and @johnbaums Thanks very much for your time and expertise on this review. When you get a chance could you look over Laura's responses and changes and either check off the final item on your respective checklists (recommending approval) or raise any additional questions? 🙏

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Jun 4, 2019

Aw, thanks @karthik 😊

@mbjoseph

This comment has been minimized.

Copy link
Member

commented Jun 5, 2019

@laurajanegraham nice work on this! Thanks for your thoughtful revisions and the well-laid out responses to the reviews. I'm satisfied with the changes.

@johnbaums

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2019

Awesome, @laurajanegraham - revisions look great!

I noticed one typo:

an additional utility function for use with simulated landsacpes

Regarding create_torus() - I think it's fine to leave it in, but maybe clarify in the docs that it's only really intended to be used with simulated data. Docs currently state "Allows for moving window analysis that avoids edge effects (e.g. on simulated landscapes)", which might lead users to apply it to real world data as well in order to avoid edge effects.

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Jun 11, 2019

Thanks @mbjoseph and @johnbaums . I've corrected the typo and made clearer in the docs that create_torus() is for simulated landscapes.

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Jun 12, 2019

Oops. I've run the spellcheck now and fixed this (and a couple of others). Thanks all.

@karthik

This comment has been minimized.

Copy link
Member

commented Jun 12, 2019

Congrats @laurajanegraham, your submission has been approved! 🎉 Thank you for submitting and @mbjoseph and @johnbaums for thorough and timely reviews. To-dos:

  • Transfer the repo to the rOpenSci organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.

  • Add the rOpenSci footer to the bottom of your README

[![ropensci_footer](https://ropensci.org/public_images/ropensci_footer.png)](https://ropensci.org)

  • Fix any links in badges for CI and coverage to point to the ropensci URL. (We'll turn on the services on our end as needed)

Welcome aboard! 🎉

@karthik karthik added the 6/approved label Jun 12, 2019

@karthik

This comment has been minimized.

Copy link
Member

commented Jun 12, 2019

@laurajanegraham I would also encourage you to archive this version of the accepted package on Zenodo and add the DOI to your readme.

@karthik karthik added the package label Jun 12, 2019

@laurajanegraham

This comment has been minimized.

Copy link
Author

commented Jun 12, 2019

Great news, thank you! I've done most of these things. I've not been able to get the Zenodo DOI (but I drafted the release from the repo after moving to ropensci, which may be the issue).

@karthik karthik closed this Aug 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.