
rsi: Efficiently Retrieve and Process Satellite Imagery #636

Closed
14 of 29 tasks
mikemahoney218 opened this issue Mar 30, 2024 · 61 comments


@mikemahoney218
Member

mikemahoney218 commented Mar 30, 2024

Date accepted: 2024-10-01
Submitting Author Name: Mike Mahoney
Submitting Author Github Handle: @mikemahoney218
Repository: https://github.com/Permian-Global-Research/rsi/
Version submitted:
Submission type: Standard
Editor: @jhollist
Reviewers: @mdsumner, @OldLipe

Archive: TBD
Version accepted: TBD
Language: en


  • Paste the full DESCRIPTION file inside a code block below:
Package: rsi
Title: Efficiently Retrieve and Process Satellite Imagery
Version: 0.2.0.9000
Authors@R: c(
    person("Michael", "Mahoney", , "mike.mahoney.218@gmail.com", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-2402-304X")),
    person("Permian Global", role = c("cph", "fnd"))
  )
Description: Downloads spatial data from spatiotemporal asset catalogs 
    ('STAC'), computes standard spectral indices from the Awesome Spectral 
    Indices project (Montero et al. (2023) <doi:10.1038/s41597-023-02096-0>) 
    against raster data, and glues the outputs together into predictor bricks. 
    Methods focus on interoperability with the broader spatial ecosystem; 
    function arguments and outputs use classes from 'sf' and 'terra', and data 
    downloading functions support complex 'CQL2' queries using 'rstac'.
License: Apache License (>= 2)
Depends: 
    R (>= 4.0)
Imports: 
    future.apply,
    glue,
    httr,
    jsonlite,
    lifecycle,
    proceduralnames,
    rlang,
    rstac,
    sf,
    terra,
    tibble
Suggests: 
    curl,
    knitr,
    progressr,
    rmarkdown,
    testthat (>= 3.0.0),
    withr
Config/testthat/edition: 3
Config/testthat/parallel: true
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.0
URL: https://github.com/Permian-Global-Research/rsi, https://permian-global-research.github.io/rsi/
BugReports: https://github.com/Permian-Global-Research/rsi/issues
VignetteBuilder: knitr

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • data validation and testing
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):

This package supports (spatial) data retrieval from APIs implementing the OGC STAC API standard, processing of the downloaded data (including automated masking, compositing, and rescaling), computing spectral indices from those data, and wrangling the outputs into formats useful for modeling and visualization.

  • Who is the target audience and what are scientific applications of this package?

Anyone with a need to download and process spatial data, particularly remote sensing data and satellite-based Earth observation rasters. We've used rsi to automate the entire data preparation process for forest carbon and structure models (not yet published), but the package is broadly useful to anyone working in Earth surface modeling.

Yes:

  • rstac is a fantastic package for querying and downloading data from STAC APIs. rstac does not implement the other elements of rsi (compositing, masking, rescaling, computing spectral indices, wrangling the outputs), and rsi provides a "higher level" method for downloading data from STAC APIs (powered by lower-level rstac functions); rsi also uses a faster download method.
  • gdalcubes and sits both provide higher-level approaches for accessing data from STAC APIs, organized around data cube models. There is more substantial overlap between these packages and rsi. The biggest difference is that rsi, very intentionally, does not provide a new data model. To quote the vignette:

A core difference between rsi and these packages is that rsi does not have a data model: rsi is focused entirely on finding the bits of data you want from remote endpoints, and getting those bits on your local machine for you to process with your normal spatial data tooling. There are no new classes in rsi (other than the band mapping objects), and the outputs of functions are local rasters. This is an approach that fits better in my head than the more abstract delayed computations in some other packages; at the same time, it’s possible that this approach can be less efficient, downloading more data at finer resolutions than is actually needed for a given task.

There are a few minor things that I think rsi does better than other approaches (we sign items right before each one is downloaded, for instance, whereas some other packages sign items before starting to download the entire set, meaning the signature can expire and cause large downloads to fail), but I think this difference is mostly a matter of taste. I'm very familiar with local GDAL, terra, and sf, so rsi tries to get users back to working with local GDAL, terra, and sf as fast as possible.
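The "sign right before each download" point above can be sketched as a loop that interleaves signing and fetching, rather than signing the whole set up front. The function and item names below are hypothetical placeholders, not rsi's actual internals; in practice signing a STAC item would go through something like rstac::items_sign().

```r
# Hedged sketch: interleave signing and fetching so each item's signature
# is fresh when its download starts (vs. signing everything up front,
# where later signatures can expire during a long download).
download_all <- function(items, sign_one, fetch_one) {
  lapply(items, function(item) {
    signed <- sign_one(item)  # signature generated for *this* item, now
    fetch_one(signed)         # so it cannot expire partway through the set
  })
}

# Tiny demonstration with stand-in functions that just record call order:
items <- list("a", "b", "c")
log <- character()
sign_one <- function(x) { log <<- c(log, paste0("sign:", x)); x }
fetch_one <- function(x) { log <<- c(log, paste0("fetch:", x)); x }
invisible(download_all(items, sign_one, fetch_one))

# Signing and fetching alternate per item:
stopifnot(identical(
  log,
  c("sign:a", "fetch:a", "sign:b", "fetch:b", "sign:c", "fetch:c")
))
```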

NA

  • If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

NA

  • Explain reasons for any pkgcheck items which your package is unable to pass.

I have never successfully gotten the CI item to pass as a check when running locally, and I have no idea why. I'm using the standard usethis functions to set up my CI, for what it's worth! Apparently this is a local-only issue.

I addressed failing covr in #636 (comment)

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@ropensci-review-bot
Collaborator

Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type @ropensci-review-bot help for help.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@mikemahoney218
Member Author

Another quick note -- I will not be able to transfer this repository to rOpenSci if accepted. I had asked on Slack and was told by Yani that this was acceptable, though it's only mentioned in the book here. I want to flag this at the start, in case it turns out to be an issue!

@ropensci-review-bot
Collaborator

Checks for rsi (v0.2.0.9000)

git hash: 694d2a5f

  • ✔️ Package is already on CRAN.
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✖️ Package coverage failed
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)

Package License: Apache License (>= 2)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 175
internal rsi 34
internal methods 3
internal stats 2
internal tools 2
imports rlang 10
imports terra 10
imports rstac 5
imports glue 4
imports future.apply 2
imports httr 1
imports sf 1
imports jsonlite NA
imports lifecycle NA
imports proceduralnames NA
imports tibble NA
suggests progressr 1
suggests curl NA
suggests knitr NA
suggests rmarkdown NA
suggests testthat NA
suggests withr NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

names (20), c (19), class (10), lapply (9), length (8), vapply (8), args (6), formals (6), list (6), mget (6), tryCatch (5), file.path (4), for (4), max (4), min (4), options (4), tempfile (4), ifelse (3), url (3), all (2), call (2), character (2), drop (2), eval (2), is.null (2), nrow (2), paste (2), paste0 (2), unlist (2), with (2), col (1), data.frame (1), dirname (1), get (1), grep (1), mapply (1), merge (1), ncol (1), numeric (1), readLines (1), replicate (1), seq_len (1), source (1), str2lang (1), suppressWarnings (1), t (1), tempdir (1), tolower (1), toupper (1), vector (1)

rsi

build_progressr (5), spectral_indices (3), extract_urls (2), remap_band_names (2), alos_palsar_mask_function (1), calc_scale_strings (1), calculate_indices (1), check_indices (1), check_type_and_length (1), default_query_function (1), download_web_indices (1), filter_bands (1), filter_platforms (1), get_alos_palsar_imagery (1), get_dem (1), get_landsat_imagery (1), get_naip_imagery (1), get_rescaling_formula (1), get_sentinel1_imagery (1), get_sentinel2_imagery (1), get_stac_data (1), is_pc (1), landsat_mask_function (1), maybe_sign_items (1), set_gdalwarp_extent (1), spectral_indices_url (1)

rlang

arg_match (4), caller_env (2), warn (2), exec (1), new_environment (1)

terra

rast (5), sprc (2), crs (1), nlyr (1), predict (1)

rstac

assets_url (2), items_datetime (2), stac_search (1)

glue

glue (4)

methods

is (3)

future.apply

future_lapply (2)

stats

predict (1), setNames (1)

tools

file_ext (1), R_user_dir (1)

httr

user_agent (1)

progressr

progressor (1)

sf

st_bbox (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 15 files) and
  • 1 authors
  • 3 vignettes
  • 5 internal data files
  • 11 imported packages
  • 21 exported functions (median 19 lines of code)
  • 62 non-exported functions in R (median 15 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 15 73.0
files_vignettes 3 92.4
files_tests 9 89.6
loc_R 1437 77.1
loc_vignettes 453 75.7
loc_tests 853 84.6
num_vignettes 3 94.2
data_size_total 26424 76.4
data_size_median 4831 74.3
n_fns_r 83 71.4
n_fns_r_exported 21 68.8
n_fns_r_not_exported 62 73.1
n_fns_per_file_r 3 55.1
num_params_per_fn 7 85.3
loc_per_fn_r 15 46.1
loc_per_fn_r_exp 19 44.7
loc_per_fn_r_not_exp 15 49.5
rel_whitespace_R 10 63.3
rel_whitespace_vignettes 32 76.2
rel_whitespace_tests 13 73.5
doclines_per_fn_exp 51 64.2
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 36 59.4

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
8486854009 Lock Threads success 694d2a 156 2024-03-30
8482543797 pages build and deployment success 1de384 65 2024-03-29
8482287996 pkgdown success 694d2a 135 2024-03-29
8482287992 R-CMD-check success 694d2a 131 2024-03-29
8482287990 R-CMD-check-hard success 694d2a 131 2024-03-29
8482287998 test-coverage success 694d2a 131 2024-03-29

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

ERROR: Test Coverage Failed

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
get_stac_data 44
stack_rasters 28
check_type_and_length 25

Static code analyses with lintr

lintr found the following 127 potential issues:

message number of times
Avoid library() and require() calls in packages 4
Lines should not be more than 80 characters. 123


4. Other Checks

Details of other checks (click to open)

✖️ The following 2 function names are duplicated in other packages:

    • calculate_indices from ClusterStability
    • sign_planetary_computer from rstac


Package Versions

package version
pkgstats 0.1.3.11
pkgcheck 0.1.2.21


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@mikemahoney218
Member Author

Guessing covr fails due to not setting my custom is_covr environment variable:
Permian-Global-Research/rsi@c62e6e9

This environment variable is used to skip a test on my covr CI. The tl;dr is that rsi executes some code in a minimal environment to protect against malicious code downloaded from the internet, which prevents covr from injecting its tracking inside of that minimal environment. I still want the file to be tested (and the other pieces of the file to be counted in coverage), though, so I wrapped the local environment section in nocov and added this environment variable.

You can see my code coverage report at https://app.codecov.io/gh/Permian-Global-Research/rsi and my CI workflow for this at https://github.com/Permian-Global-Research/rsi/blob/main/.github/workflows/test-coverage.yaml
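The custom `is_covr` flag described above can be sketched as a simple environment-variable guard around the affected tests. This is a hedged illustration of the general pattern, not rsi's actual test code; the variable name `is_covr` comes from the comment above, and the testthat usage shown in the comment block is an assumption about how such a guard would typically appear.

```r
# Hedged sketch: skip a test when a coverage workflow has set a custom
# environment variable (here assumed to be named "is_covr").
run_under_covr <- function() identical(Sys.getenv("is_covr"), "true")

# Inside a testthat file this guard would typically look like:
# test_that("downloaded indices run in a minimal environment", {
#   skip_if(run_under_covr(), "covr cannot instrument the minimal environment")
#   ...
# })

# Simulate both states of the flag:
Sys.setenv(is_covr = "true")
stopifnot(run_under_covr())
Sys.unsetenv("is_covr")
stopifnot(!run_under_covr())
```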

@ldecicco-USGS

@ropensci-review-bot assign @jhollist as editor

@ropensci-review-bot
Collaborator

Assigned! @jhollist is now the editor

@jhollist
Member

jhollist commented Apr 4, 2024

@mikemahoney218 Been swamped these last few days. I will work on digging through this today and tomorrow and get back to you soon and should hopefully be ready to start finding reviewers.

I am looking forward to this review. Does look like an interesting package!

@mikemahoney218
Member Author

No worries, and thanks for the update!

@jhollist
Member

@ropensci-review-bot check rsi

@ropensci-review-bot
Collaborator

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@ropensci-review-bot help

@jhollist
Member

@ropensci-review-bot check package

@ropensci-review-bot
Collaborator

Thanks, about to send the query.

@ropensci-review-bot
Collaborator

🚀

Editor check started

👋

@mikemahoney218
Member Author

mikemahoney218 commented Apr 11, 2024

Just want to flag that I'm still expecting (your version of) covr to fail, due to #636 (comment)

The core issue is that calculate_indices() is basically intended to run code from a random site on the internet -- a trusted site, but still a security risk. As such, that downloaded code is run inside a minimal environment that prevents injecting any unexpected code or functions. Unfortunately, covr works by injecting its own functions into your source code and counting how many times they get run; calculate_indices() doesn't allow those injected functions to execute, causing the function to fail.

As a result, I toggle the tests that hit this code path using a custom is_covr variable, which isn't set by your covr check (because I just made it up, I don't know of a supported way to do this) and so your covr check fails.

I don't want to disable the whole .R file from covr, because covr can instrument the rest of the file, and I don't want to drop these tests (or make them off by default) because I'd like R CMD check to check this function. I've got a live coverage report running via GHA and hopefully viewable at:
https://app.codecov.io/gh/Permian-Global-Research/rsi?branch=main
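The "minimal environment" idea above can be shown in a few lines. This is a hedged sketch of the general mechanism, not rsi's actual implementation: evaluating code in an environment whose parent is baseenv() means functions defined only in the global environment (which is where coverage counters would be injected from, conceptually) are not visible during evaluation.

```r
# Hedged sketch: a minimal environment whose search path skips globalenv(),
# so globally-injected functions cannot be called from code evaluated inside it.
minimal_env <- new.env(parent = baseenv())

# A function defined only in the global environment (hypothetical stand-in
# for an injected tracking function)...
injected_counter <- function() "tracked"

# ...is visible when evaluating against the global environment:
stopifnot(identical(eval(quote(injected_counter()), globalenv()), "tracked"))

# ...but lookup from the minimal environment goes straight to baseenv(),
# so the call fails there:
res <- try(eval(quote(injected_counter()), minimal_env), silent = TRUE)
stopifnot(inherits(res, "try-error"))
```

This is why instrumented calls inserted by a coverage tool would error out inside such an environment, while the same code runs fine uninstrumented.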

@ropensci-review-bot
Collaborator

Checks for rsi (v0.2.0.9000)

git hash: e71186f2

  • ✔️ Package is already on CRAN.
  • ✔️ has a 'codemeta.json' file.
  • ✔️ has a 'contributing' file.
  • ✔️ uses 'roxygen2'.
  • ✔️ 'DESCRIPTION' has a URL field.
  • ✔️ 'DESCRIPTION' has a BugReports field.
  • ✔️ Package has at least one HTML vignette
  • ✔️ All functions have examples.
  • ✔️ Package has continuous integration checks.
  • ✖️ Package coverage failed
  • ✔️ R CMD check found no errors.
  • ✔️ R CMD check found no warnings.
  • 👀 Function names are duplicated in other packages

Important: All failing checks above must be addressed prior to proceeding

(Checks marked with 👀 may be optionally addressed.)

Package License: Apache License (>= 2)


1. Package Dependencies

Details of Package Dependency Usage (click to open)

The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.

type package ncalls
internal base 175
internal rsi 34
internal methods 3
internal stats 2
internal tools 2
imports rlang 10
imports terra 10
imports rstac 5
imports glue 4
imports future.apply 2
imports httr 1
imports sf 1
imports jsonlite NA
imports lifecycle NA
imports proceduralnames NA
imports tibble NA
suggests progressr 1
suggests curl NA
suggests knitr NA
suggests rmarkdown NA
suggests testthat NA
suggests withr NA
linking_to NA NA

Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.

base

names (20), c (19), class (10), lapply (9), length (8), vapply (8), args (6), formals (6), list (6), mget (6), tryCatch (5), file.path (4), for (4), max (4), min (4), options (4), tempfile (4), ifelse (3), url (3), all (2), call (2), character (2), drop (2), eval (2), is.null (2), nrow (2), paste (2), paste0 (2), unlist (2), with (2), col (1), data.frame (1), dirname (1), get (1), grep (1), mapply (1), merge (1), ncol (1), numeric (1), readLines (1), replicate (1), seq_len (1), source (1), str2lang (1), suppressWarnings (1), t (1), tempdir (1), tolower (1), toupper (1), vector (1)

rsi

build_progressr (5), spectral_indices (3), extract_urls (2), remap_band_names (2), alos_palsar_mask_function (1), calc_scale_strings (1), calculate_indices (1), check_indices (1), check_type_and_length (1), default_query_function (1), download_web_indices (1), filter_bands (1), filter_platforms (1), get_alos_palsar_imagery (1), get_dem (1), get_landsat_imagery (1), get_naip_imagery (1), get_rescaling_formula (1), get_sentinel1_imagery (1), get_sentinel2_imagery (1), get_stac_data (1), is_pc (1), landsat_mask_function (1), maybe_sign_items (1), set_gdalwarp_extent (1), spectral_indices_url (1)

rlang

arg_match (4), caller_env (2), warn (2), exec (1), new_environment (1)

terra

rast (5), sprc (2), crs (1), nlyr (1), predict (1)

rstac

assets_url (2), items_datetime (2), stac_search (1)

glue

glue (4)

methods

is (3)

future.apply

future_lapply (2)

stats

predict (1), setNames (1)

tools

file_ext (1), R_user_dir (1)

httr

user_agent (1)

progressr

progressor (1)

sf

st_bbox (1)

NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.


2. Statistical Properties

This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.

Details of statistical properties (click to open)

The package has:

  • code in R (100% in 15 files) and
  • 1 authors
  • 3 vignettes
  • 5 internal data files
  • 11 imported packages
  • 21 exported functions (median 19 lines of code)
  • 62 non-exported functions in R (median 15 lines of code)

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
The following terminology is used:

  • loc = "Lines of Code"
  • fn = "function"
  • exp/not_exp = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the checks_to_markdown() function

The final measure (fn_call_network_size) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

measure value percentile noteworthy
files_R 15 73.0
files_vignettes 3 92.4
files_tests 9 89.6
loc_R 1437 77.1
loc_vignettes 499 78.0
loc_tests 853 84.6
num_vignettes 3 94.2
data_size_total 26424 76.4
data_size_median 4831 74.3
n_fns_r 83 71.4
n_fns_r_exported 21 68.8
n_fns_r_not_exported 62 73.1
n_fns_per_file_r 3 55.1
num_params_per_fn 7 85.3
loc_per_fn_r 15 46.1
loc_per_fn_r_exp 19 44.7
loc_per_fn_r_not_exp 15 49.5
rel_whitespace_R 10 63.3
rel_whitespace_vignettes 33 79.8
rel_whitespace_tests 13 73.5
doclines_per_fn_exp 51 64.2
doclines_per_fn_not_exp 0 0.0 TRUE
fn_call_network_size 36 59.4

2a. Network visualisation

Click to see the interactive network visualisation of calls between objects in package


3. goodpractice and other checks

Details of goodpractice checks (click to open)

3a. Continuous Integration Badges

R-CMD-check.yaml

GitHub Workflow Results

id name conclusion sha run_number date
8572262292 Commands skipped bc9e46 46 2024-04-05
8639486278 Lock Threads success e71186 168 2024-04-11
8575408691 pages build and deployment success f75519 69 2024-04-05
8575285598 pkgdown success e71186 140 2024-04-05
8575285595 R-CMD-check success e71186 135 2024-04-05
8575285599 R-CMD-check-hard success e71186 135 2024-04-05
8575285597 test-coverage success e71186 135 2024-04-05

3b. goodpractice results

R CMD check with rcmdcheck

rcmdcheck found no errors, warnings, or notes

Test coverage with covr

ERROR: Test Coverage Failed

Cyclocomplexity with cyclocomp

The following functions have cyclocomplexity >= 15:

function cyclocomplexity
get_stac_data 44
stack_rasters 28
check_type_and_length 25

Static code analyses with lintr

lintr found the following 127 potential issues:

message number of times
Avoid library() and require() calls in packages 4
Lines should not be more than 80 characters. 123


4. Other Checks

Details of other checks (click to open)

✖️ The following 2 function names are duplicated in other packages:

    • calculate_indices from ClusterStability
    • sign_planetary_computer from rstac


Package Versions

package version
pkgstats 0.1.3.11
pkgcheck 0.1.2.21


Editor-in-Chief Instructions:

Processing may not proceed until the items marked with ✖️ have been resolved.

@jhollist
Member

@mikemahoney218 Thanks for the update. And as you expected, the bot checks fail. I am not too worried about that, given you have good coverage and can demonstrate it with the codecov reports. We will want to make sure that this coverage is reported in the repository's README. You may already be doing that; I just haven't checked yet!

Stay tuned, I am working on this today and tomorrow and expect to move on to finding reviewers shortly after that!

@jhollist
Member

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package. In particular,
    • Is the case for the package well made?
    • Is the reference index page clear (grouped by topic if necessary)?
    • Are vignettes readable, sufficiently detailed and not just perfunctory?
  • Fit: The package meets criteria for fit and overlap.
  • Installation instructions: Are installation instructions clear enough for human users?
  • Tests: If the package has some interactivity / HTTP / plot production etc. are the tests using state-of-the-art tooling?
  • Contributing information: Is the documentation for contribution clear enough e.g. tokens for tests, playgrounds?
  • License: The package has a CRAN or OSI accepted license.
  • Project management: Are the issue and PR trackers in a good shape, e.g. are there outstanding bugs, is it clear when feature requests are meant to be tackled?

Editor comments

I think we are ready to pass on to reviewers! Nice Job!

Only one very small request:

  • Add a link to the CONTRIBUTING.md file on the README.

I like to see a more upfront CONTRIBUTING. I always get lost trying to find them when embedded inside .github. A simple link to that file should suffice!

Also, I have no concerns about the tests failing on the bot. You have implemented them well, the coverage is good, and the badge makes it easy to find.

@jhollist
Member

@ropensci-review-bot seeking reviewers

@ropensci-review-bot
Collaborator

Please add this badge to the README of your package repository:

[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/636_status.svg)](https://github.com/ropensci/software-review/issues/636)

Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news

@mikemahoney218
Member Author

Added the link to CONTRIBUTING!

@jhollist
Member

@ropensci-review-bot assign @mdsumner as reviewer

@ropensci-review-bot
Collaborator

@mdsumner added to the reviewers list. Review due date is 2024-05-09. Thanks @mdsumner for accepting to review! Please refer to our reviewer guide.

rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.

@ropensci-review-bot
Collaborator

@mdsumner: If you haven't done so, please fill this form for us to update our reviewers records.

@jhollist
Member

@mdsumner just pinging to see if you are finished with the review yet.

@mdsumner

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

  • Briefly describe any working relationship you have (had) with the package authors.
  • As the reviewer I confirm that there are no conflicts of interest.

Documentation

The package includes all the following forms of documentation:

  • A statement of need: clearly stating problems the software is designed to solve and its target audience in README
  • Installation instructions: for the development version of package and any non-standard dependencies in README
  • Vignette(s): demonstrating major functionality that runs successfully locally
  • Function Documentation: for all exported functions
  • Examples: (that run successfully locally) for all exported functions
  • Community guidelines: including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).

Functionality

  • Installation: Installation succeeds as documented.
  • Functionality: Any functional claims of the software have been confirmed.
  • Performance: Any performance claims of the software have been confirmed.
  • Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

Estimated hours spent reviewing: 7

  • Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.

Review Comments

I really like this package; I see a lot of familiar experiences, and the result is a good and consistent set of functions for navigating this space. I haven't personally done anything with spectral indices; usually I explore imagery and how "it looks". I appreciate having all this ease-of-use tooling at hand, I will be pointing colleagues directly to this package, and I hope to stay involved in at least small ways.

Please mention somewhere that "where the compute runs" (i.e., what "local" means) is important here. Performance differs between regions, and it would be good to include some pointers on how one might run in a region closer to where the usual data sources are (us-west2, for example). I'm not suggesting that's an easy topic to cover; I would just like to see it mentioned. It takes about 2x as long to run the examples here in Tasmania as on computers in us-west (I know the comparison is a lot more complex than this). It might be an idea to point to ways of running compute on public systems closer to the data, or at least a guide to that.

I'm a bit disappointed in our community in that packages in R have tended towards doing "everything", and (not a criticism of this package) here we are doing a lot of things: getting file sources from STAC (a json web query), configuring for authentication, downloading specific files (a curl task), mosaicing and standardizing imagery (a GDAL task). I like that the top level functions are decomposed in this package itself, and the documentation is clear as to what underlying functions are used, and I can understand why parts of "foundational" packages are used with a mix of underlying generic tools for this complex stack of work. I'm massively impressed with how much this package actually does, and exposes in a general way down to the level of templated types of functionality and details like the GDALWarp options.

rsi is a little bit like a datacube, but with exposure of the underlying components at key steps. I really like how the asset is a lightly classed object with the other functions it needs stored as attributes, and that that is how the remaining arguments to the getter function are structured, so it seamlessly inherits each level but also remains flexible to user changes in that one call.

Questions I had that I could put more effort+examples into:

I'd like to see some validation of the raster values obtained in some examples. If there were an external example documented elsewhere (in a Python notebook, or another R package), it would be neat to have a clear comparison of the scale(s) shown by these raster files against an independent example. (I had intended to do this myself ...)

I was looking for an example where I can make a true-colour image with RGB. I don't understand the scaling that occurs in the examples (we have 0-1 scale numbers, which don't work with terra::plotRGB or its stretch argument off the shelf), and that leaves me unclear about the scaled ranges and whether I am plotting an RGB image correctly. The first example in the README plots an RGB image as separate bands, and I'm unclear what to do to plot that as an "image". Again, I had meant to provide an actual example here. I believe the advice should be:

```r
plotRGB(stretch(x))

## note that this provides different defaults to stretch() itself
plotRGB(x, stretch = "lin")
```

but also, it's open to interpretation and expert use, as far as I can see.

Specific notes

Two of these three files have no data in the aoi region.

```r
qfiles <- get_landsat_imagery(
  aoi,
  start_date = "2022-06-01", end_date = "2022-06-30",
  composite_function = NULL
)
```

I tried this naive thing, running rsi_query_api, and I think at the least it should fail fast when not given a bbox in longlat.

```
> rsi_query_api(  aoi,
+                 start_date = "2022-06-01",
+                 end_date = "2022-06-30",)
Error in rsi_query_api(aoi, start_date = "2022-06-01", end_date = "2022-06-30",  : 
  argument "stac_source" is missing, with no default
> rsi_query_api(  aoi, stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+                 start_date = "2022-06-01",
+                 end_date = "2022-06-30",)
Error in rsi_query_api(aoi, stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",  : 
  argument "collection" is missing, with no default
> rsi_query_api(  aoi, stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+                 start_date = "2022-06-01",
+                 end_date = "2022-06-30", collection = "sentinel2-c1-l2a")
Error in Ops.sfg(bbox[[2]], bbox[[4]]) : 
  operation > not supported for sfg objects
> rsi_query_api(  sf::st_bbox(aoi), stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+                 start_date = "2022-06-01",
+                 end_date = "2022-06-30", collection = "sentinel2-c1-l2a")
Error in rsi_query_api(sf::st_bbox(aoi), stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",  : 
  argument "limit" is missing, with no default
> rsi_query_api(  sf::st_bbox(aoi), stac_source = "https://planetarycomputer.microsoft.com/api/stac/v1/",
+                 start_date = "2022-06-01",
+                 end_date = "2022-06-30", collection = "sentinel2-c1-l2a", limit  = 100)
```

It could detect that the bbox input is not in longlat and either 1) error or 2) just transform it.

It's often useful to buffer your aoi object slightly, on the order of 1-2 cell widths, in order to ensure that data is downloaded for your entire AOI even after accounting for any reprojection needed to compare your AOI to the data on the STAC server.

Here I would rather reproject the extent; this is not so hard to do and exists in a few places (raster::projectExtent, reproj::reproj_extent, and terra::project), but possibly the best option is to use GDAL warp (via sf, as elsewhere here) to reproject an empty raster. (Happy to follow up to illustrate.)
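
One minimal sketch of the extent-reprojection idea, using only sf (the AOI coordinates here are made up for illustration; st_segmentize densifies the boundary so the transformed bbox tracks the curved edges of the reprojected polygon, not just its four corners):

```r
library(sf)

# hypothetical AOI bbox in a projected CRS (EPSG:6542, NC State Plane)
aoi <- st_as_sfc(
  st_bbox(
    c(xmin = 1250000, ymin = 910000, xmax = 1270000, ymax = 930000),
    crs = st_crs(6542)
  )
)

# densify the boundary, transform, then take the bbox of the result;
# this covers the whole AOI in longlat without an arbitrary buffer
aoi_ll <- st_bbox(st_transform(st_segmentize(aoi, 100), 4326))
```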

This function can either download all data that intersects with your spatiotemporal AOI as multiple files (if composite_function = NULL), or can be used to rescale band values, apply a mask function, and create a composite from the resulting files in a single function call

On this point, GDAL warp can itself do these things (itself a monolith of abstractions), and we could possibly avoid downloading entire scenes. I mention this not as a strong suggestion, mainly as an invite to discuss further (I'm looking at the rise of {gdalraster} here).

It can be a good idea to tile your aoi using sf::st_make_grid()

I think this really needs an example, because there's room for guidance on creating a nice tile mosaic definition (with terra::align, for example, or actually with st_tile or getTileExtents). It's a very important point; say we wanted (CONUS) state-level imagery. This can be done a little more abstractly using terra::makeTiles and stars::st_tile (which give the index and extents helpfully, but separately). (Maybe an assign-to-me task.)
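
A rough sketch of the sf::st_make_grid() approach, one download per tile (illustrative only: the `aoi` object, dates, 4x4 grid, and filename pattern are assumptions, with get_landsat_imagery() used as in the package docs):

```r
library(sf)

# assume `aoi` is an sf/sfc object covering a large area
tiles <- st_make_grid(aoi, n = c(4, 4))

# one download per tile, with a filename encoding the tile index;
# get_landsat_imagery() returns the output path, so we collect those
tile_files <- vapply(
  seq_along(tiles),
  function(i) {
    rsi::get_landsat_imagery(
      tiles[i],
      start_date = "2022-06-01",
      end_date = "2022-06-30",
      output_filename = sprintf("landsat_tile_%02d.tif", i)
    )
  },
  character(1)
)
```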

If you set the rsi_pc_key environment variable to your key (either primary or secondary; there is no difference), rsi will automatically use this key to sign all requests against Planetary Computer.

I only note this as a discussion-bounce, for possible follow-up: GDAL can do this too, and has file-system abstractions that can copy, info, or warp-in-part to a target grid.
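
For concreteness, the environment-variable setup quoted above amounts to (the variable name rsi_pc_key comes from the rsi docs; the value is a placeholder to fill in, and .Renviron works too if you want it to persist across sessions):

```r
# set once per session, before any rsi download calls;
# rsi will then sign all Planetary Computer requests automatically
Sys.setenv(rsi_pc_key = "<your-planetary-computer-key>")
```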

In the vignette I like when there's "one change", that's an excellent situation when you can tweak major changes with only one tiny plumbing modification.

This example (in the README) should save the file name as an output. It's otherwise a pipeline that doesn't give me an object at the end, and it can take some time (in Australia).

```r
projected_ashe <- sf::st_transform(ashe, 6542)
get_landsat_imagery(
  aoi = projected_ashe,
  start_date = "2021-06-01",
  end_date = "2021-06-30",
  output_filename = tempfile(fileext = ".tif")
) |>
  terra::rast() |>
  terra::plot()
```

I have some minor discomforts about when exactly warping or masking is done. I just want to talk through the details of this and might follow up; I'm a bit lost in the depths of the compositing function and the use of sprc and mosaic, though.

A standalone question I had: can we use the aoi to drive the STAC query, but then download the files as-is without cropping/warping or compositing? I guess that would mean providing a link to the temp-file space being used.

Fin. Thanks so much for such a great package, and much apology for being so late here (especially appreciative of the patience of @mikemahoney218 and @jhollist).

@mikemahoney218
Member Author

Thank you so much, @mdsumner ! I'm excited to dive into your review.

As I said to Jeff over email, I also haven't been a paragon of timeliness here -- I got started replying to Felipe three weeks ago, and then immediately lost my focus. I'm hoping to carve out time to respond to both reviews starting on Wednesday of this week!

@mikemahoney218
Member Author

Response to @OldLipe

Thank you so, so much for your review here @OldLipe ! I feel like your comments have really improved the package. I'll walk through specific changes below, but first I wanted to ask about one remaining comment:

1- By default, the functions that download satellite images use random names for the images. I believe that using random names for satellite images could pose challenges for end users.

The issue I have with generating non-random names is that I feel users are in a better position than I am to know how to organize their files within a project. Users downloading multiple files are likely iterating through either collections, time ranges, or spatial areas of interest, and probably have a pre-existing idea of what distinguishes each file (and therefore would make for a good name). This is why, for instance, the documentation shows examples of providing your own output_filename values when you know how your downloads are being "chunked".

So the idea behind random filenames is that it's something that is good enough for fast proof-of-concept "does this function work" tests, but is clearly not good enough for "real" usage. I'm hoping to force users to come up with their own file name conventions, rather than accepting the default options. This is also why I do handle the auto-naming when composite_function = NULL, because I am very confident that I know what differentiates each file (the datetime) in that situation.

I'm curious what you think about this reasoning. If anything, I'd lean towards removing the default filenames altogether, but I do think they're useful for quick evaluation of the package.

1- [R]egarding the parameters gdalwarp_options and gdal_config_options [...] Could these GDAL values be stored in a configuration file?

I moved these into functions: Permian-Global-Research/rsi@5adacb2

1- In the example of the alos_palsar_mask_function(), two warnings appeared. Listed below:
[...]
Is this behavior expected? Additionally, in this example, wouldn't it be more appropriate to save the image in a temporary directory? From a user's perspective, these warnings might scare new users.

I couldn't reproduce the warning (though I've seen it before, I just don't understand what triggers it) but I updated the documentation with a gdalwarp option to silence the warning: Permian-Global-Research/rsi@b1a15ba

I also saved to a temporary file -- thanks! Because these examples don't run on CRAN, that one is a really easy thing to accidentally miss 😅 There were a few more of these here: Permian-Global-Research/rsi@fa7dc91

2- The example in the calculate_indices() function presents an error. I understand that you are demonstrating an example that does not work. Users sometimes enter the documentation, copy the code, execute it, and then test it. Instead of the error, perhaps providing more examples of how to use the function would be helpful.

I left the error in place, because I want to explain why this function intentionally doesn't let users use arbitrary functions. But I added more documentation around it: a comment right above the try() to emphasize that the error is expected, and then another block below it to show how to work around this design: Permian-Global-Research/rsi@e2ec137

3- The functions that access catalogs [...] share the same documentation. [...] [I]nclude new examples for other types of data, such as radar, since the names of the assets vary for each collection.

I added additional examples to this document, using multiple data-retrieving functions and showing how to work with band mapping objects: Permian-Global-Research/rsi@7b82f3d

* In the `sentinel2_mask_function()` function, I would like to understand why the value `2` (`shadows`) is kept as a non-masked value?

This was a straight-up mistake; thank you. Fixed here: Permian-Global-Research/rsi@f499c95

* Regarding Landsat, I believe there could be improvements in filtering the values to be retained. Additionally, the implementation could be easier if working with bit values rather than integers.

Thank you for the examples here; without them, I absolutely never would have figured out how bitmasks work. I added a masked_bits argument to the Landsat mask function, which can take vectors of bits to mask out. The include argument now works by setting these bits: Permian-Global-Research/rsi@f499c95

I find these filtering functions interesting, but I believe they may not be scalable. For instance, you provide two filtering functions: filter_platforms() and filter_bands(). What if a user wants to filter by cloud percentage within items? Or apply more complex filters to the properties of each item?

Discussed in #636 (comment)

Thank you again -- this was a massively helpful review. Let me know if I missed anything (or made anything worse by mistake 😄)

@jhollist
Member

jhollist commented Aug 15, 2024

@mdsumner Thank you for your review and no worries on the timeline. We are very much appreciative (and understanding) of the time that all of our reviewers dedicate to rOpenSci! Keep an eye out for revisions and once those are in, use our review template (https://devguide.ropensci.org/approval2template.html) to indicate your approval of the revisions or if other changes are needed.

@OldLipe Thank you for your review as well! As you can see @mikemahoney218 has addressed your review. When you have a chance could you take a look at that and let us know if his revisions address your concerns or if you would like to see some additional changes. As mentioned above, please use the review template (https://devguide.ropensci.org/approval2template.html) for this.

For both of you, can you provide me with a rough estimate of hours spent on the review? This is something we keep track of.

Thank you all!

@mikemahoney218
Member Author

Response to @mdsumner

Thank you so much, Mike! This was an incredibly useful review process. I've responded to your specific comments below:

Please mention somewhere that "where the compute runs" (i.e. what "local" means) matters here. Performance differs between regions, so some pointers on how one might run in a region closer to the usual data sources (us-west-2, for example) would help. I'm not suggesting that's an easy topic to cover; I'd just like to see it mentioned. It takes about 2x as long to run the examples here in Tasmania as on computers in us-west (I know that comparison is a lot more complex than this). It might be an idea to point to ways of running computers on public systems that run closer to the data, or at least a guide to that.

I added a small mention of this to the README and a larger mention to the Downloading vignette: Permian-Global-Research/rsi@55ad0d9

I'd like to see some validation of the raster values obtained in some examples. If there were an external example documented elsewhere (in a Python notebook, or another R package), it would be neat to have a clear comparison of the scale(s) shown by these raster files against an independent example. (I had intended to do this myself ...)

I'm curious if you have ideas on an efficient way to do this. I've tried to stub out tests for it, but the problem is that I wrote rsi specifically to fix the rough edges I found when downloading data sets, which means I'm effectively re-implementing rsi's download functions in the test itself to try and square the circle. To give a more concrete example: I can download the assets of an item using rstac::assets_download(), but that takes quite some time as it requires downloading the entire tile. I can work around that by using GDAL to do a partial download, but then I need to start writing the options for gdalwarp... which starts becoming just the code inside rsi itself again. I'm having a hard time thinking of a clever way to avoid downloading entire images while also not just copying package internals to confirm that they agree with the package itself.

I was looking for an example where I can make a true-colour image with RGB. I don't understand the scaling that occurs in the examples (we have 0-1 scale numbers, which don't work with terra::plotRGB or its stretch argument off the shelf), and that leaves me unclear about the scaled ranges and whether I am plotting an RGB image correctly. The first example in the README plots an RGB image as separate bands, and I'm unclear what to do to plot that as an "image". Again, I had meant to provide an actual example here. I believe the advice should be:

```r
plotRGB(stretch(x))

## note that this provides different defaults to stretch() itself
plotRGB(x, stretch = "lin")
```

but also, it's open to interpretation and expert use, as far as I can see.

Added to the "How can I" vignette: Permian-Global-Research/rsi@528e31f

And to the README and the get_stac_data() examples: Permian-Global-Research/rsi@7c2c0cf

Specific notes

Two of these three files have no data in the aoi region.

```r
qfiles <- get_landsat_imagery(
  aoi,
  start_date = "2022-06-01", end_date = "2022-06-30",
  composite_function = NULL
)
```

I tried this naive thing, running rsi_query_api, and I think at the least it should fail fast when not given a bbox in longlat.

It could detect that the bbox input is not in longlat and either 1) error or 2) just transform it.

Fixed in Permian-Global-Research/rsi@d0acb46 . The documentation was also just wrong here; this function needed a bbox, not an sfc object. I changed things so either works. You can tell this function was pulled out from get_stac_data() relatively late in the game 😅

It's often useful to buffer your aoi object slightly, on the order of 1-2 cell widths, in order to ensure that data is downloaded for your entire AOI even after accounting for any reprojection needed to compare your AOI to the data on the STAC server.

Here I would rather reproject the extent; this is not so hard to do and exists in a few places (raster::projectExtent, reproj::reproj_extent, and terra::project), but possibly the best option is to use GDAL warp (via sf, as elsewhere here) to reproject an empty raster. (Happy to follow up to illustrate.)

I'm not sure I entirely follow you here! Would you be able to give an example?

This function can either download all data that intersects with your spatiotemporal AOI as multiple files (if composite_function = NULL), or can be used to rescale band values, apply a mask function, and create a composite from the resulting files in a single function call

On this point, GDAL warp can itself do these things (itself a monolith of abstractions), and we could possibly avoid downloading entire scenes. I mention this not as a strong suggestion, mainly as an invite to discuss further (I'm looking at the rise of {gdalraster} here).

Unfortunately I don't think gdalwarp can handle complicated compositing yet: OSGeo/gdal#5176
And I haven't found any way to make it handle masking, either.

With regards to simple composites ("latest pixel wins" style), this is one of the messiest parts of get_stac_data(), but we actually do use gdalwarp directly to handle those. Specifically, these lines check if we can get away with using the warper to stamp a bunch of images together:

https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L366-L369

The output of that gets passed as merge to the download function:

https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L374-L383

Then this is where things get really silly: if merge == TRUE, we only create a single output file for each asset:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L64-L76

And then we wind up calling this warp with multiple source URLs and only the single output path, meaning we warp all the files while downloading:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L116-L123

This is actually a lot less complicated than it used to be -- there used to be an rsi_simple_download and an rsi_complex_download to handle the warpable/not-warpable downloads separately, which didn't share any code paths and as a result got quickly out of sync. Handling both via the same path makes things less readable, but means that all downloads flow down the same (heavily tested) pathway.

All that said, I documented this a bit more here:
Permian-Global-Research/rsi@0b294d3

With regards to rescaling via the warper -- I'm definitely interested in this, but I've seen some rather complex rescaling formulas in the wild that aren't just a simple scale and offset, which has made me a bit spooked. I think there's still some dark magic you can do by writing a VRT with a complex transform equation, but I don't know that I understand VRTs well enough to maintain a package that did that, right now.

It can be a good idea to tile your aoi using sf::st_make_grid()

I think this really needs an example, because there's room for guidance on creating a nice tile mosaic definition (with terra::align, for example, or actually with st_tile or getTileExtents). It's a very important point; say we wanted (CONUS) state-level imagery. This can be done a little more abstractly using terra::makeTiles and stars::st_tile (which give the index and extents helpfully, but separately). (Maybe an assign-to-me task.)

I added an example of using st_make_grid() here: Permian-Global-Research/rsi@f95714f

This example (in the README) should save the file name as an output. It's otherwise a pipeline that I don't get an object for in the end, and it can take some time (in Australia).

Fixed: Permian-Global-Research/rsi@280d513

I have some minor discomforts about when exactly warping or masking is done. I just want to talk about the details of this and might follow up, I'm a bit lost in the depths of the compositing function and use of sprc and mosaic, though.

Yeah, it's an easy function to get lost in. You can see my own notes here from the last time I was refactoring:

https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L240

The steps are usually querying, filtering down the returned results, (warping and) downloading the relevant items, masking them, compositing the outputs, and then rescaling.

The warping is controlled by the gdalwarp_options object, primarily. The first "live" code (not just parameter checking) is this bit here, which processes those options:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L295-L300

That processing function just sets the t_srs and tr options, if they weren't provided, to handle the actual warp:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L781-L798

Which then eventually gets passed to the actual download call:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/download.R#L116-L123

So that's warping and downloading handled: each asset is (usually) warped and downloaded separately.

Each asset then gets masked independently:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L388-L394

These are masked independently mostly to make the implementation easier, because now each asset is composited into a single file per asset:
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L399-L409

I didn't want to try and track if all assets existed in all items, or so on, and so we don't aggregate assets into items until after rescaling.

As for compositing: there's three pathways here. The first one I discussed above: if files didn't need to get masked or rescaled, we composited them during the download stage and they skip the composite process entirely.

The second one is if users specified "merge" but also wanted a mask or rescaling, in which case we're basically just calling terra::merge():
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L690-L697

The third is for all the other functions, which are applied using terra::mosaic():
https://github.com/Permian-Global-Research/rsi/blob/9e50b37c51bbc23100d52d7d4b7c91247ae61d08/R/get_stac_data.R#L699-L707

As far as I'm concerned, sprc() is just a container that can hold one or more possibly-overlapping rasters (compared to rast(), which can't). We're using it to hold the unknown number of possibly-overlapping files associated with a single asset, then merging them using some terra function.
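
As a tiny illustration of that container role (toy rasters, not rsi internals):

```r
library(terra)

# two overlapping single-layer rasters with different extents
r1 <- rast(xmin = 0, xmax = 10, ymin = 0, ymax = 10, resolution = 1, vals = 1)
r2 <- rast(xmin = 5, xmax = 15, ymin = 0, ymax = 10, resolution = 1, vals = 3)

# rast(list(r1, r2)) would fail because the extents differ;
# sprc() happily holds both
sc <- sprc(r1, r2)

# mosaic() then combines the collection, applying `fun` where they overlap
composite <- mosaic(sc, fun = "mean")
```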

Hope this all made sense!

A standalone question I had: can we use the aoi to drive the STAC query, but then download the files as-is without cropping/warping or compositing? I guess that would mean providing a link to the temp-file space being used.

I added a section to the "How Can I" vignette about one version of this -- downloading each item separately:
Permian-Global-Research/rsi@528e31f

(To skip masking, you'd also set mask_function = NULL)

If you want to get each asset separately, that's probably where rstac::assets_download() becomes more useful -- or building a query with rstac and using GDAL to grab items yourself. I think at that point, you no longer want the additional abstraction rsi gives you, and it makes sense to go one level "lower" down the stack.
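
A sketch of that per-item pattern, combining the two arguments mentioned above (the `aoi` object and dates are illustrative; rsi auto-names each output by item datetime when composite_function = NULL, per the earlier discussion):

```r
# one file per STAC item, unmasked and uncomposited
files <- rsi::get_landsat_imagery(
  aoi,
  start_date = "2022-06-01",
  end_date = "2022-06-30",
  mask_function = NULL,
  composite_function = NULL
)
```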

@mdsumner

mdsumner commented Sep 17, 2024

Ok amazing, love these responses and the changes you've made. I think you'll need to link to this issue in the Details section, because it's a really great section on the concerns and the journey you've been on. (I'm getting more enmeshed in the python side via odc and so each time I come to rsi I have more perspective and learn a lot more).

There are no showstoppers now from my perspective. I think you've responded to this review brilliantly, and I'm stoked with all the updates you've made and the explanations. Please consider my take as 'approved'. @jhollist

I'm curious if you have ideas on an efficient way to do this.

I actually didn't mean automated testing validation, just a real-world example where we can get confirmation of the values we see in a small context. I will follow up when I can, but I don't consider this a blocker or anything.

Also, I need to follow up here (I can't remember exactly now; I may have been thinking about a different part of the help content). When I explore again I will bring this up, outside this review, as an issue/discussion piece.

I'm not sure I entirely follow you here! Would you be able to give an example?

Thanks!!

@jhollist
Member

Looks like we are really close on this one!

@OldLipe, do you feel that @mikemahoney218 has addressed the issues raised in your review?

@mdsumner Thank you for the follow up and the approval! How many hours do you think you spent on the review?

You both can use this template for your response: https://devguide.ropensci.org/approval2template.html or you can just provide that directly as well!

@jhollist
Member

@mdsumner and @OldLipe Just trying to finalize this. See #636 (comment)

@mdsumner

7 hours was my tally 🙏

@jhollist
Member

@ropensci-review-bot submit review #636 (comment) time 7

@ropensci-review-bot
Collaborator

Logged review for mdsumner (hours: 7)

@jhollist
Member

jhollist commented Oct 1, 2024

@ropensci-review-bot submit review #636 (comment) time 5

@ropensci-review-bot
Collaborator

Logged review for OldLipe (hours: 5)

@jhollist
Member

jhollist commented Oct 1, 2024

@OldLipe thanks for the email. I am recording your response and acceptance of @mikemahoney218 revisions here. No need for you to do it as well.

And I think we are all set to go. Thank you all for your efforts on this! Will work on moving this along later this AM.

@jhollist
Member

jhollist commented Oct 1, 2024

@ropensci-review-bot approve rsi

@ropensci-review-bot
Collaborator

Approved! Thanks @mikemahoney218 for submitting and @mdsumner, @OldLipe for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You will need to enable two-factor authentication for your GitHub account.
    This invitation will expire after one week. If it happens write a comment @ropensci-review-bot invite me to ropensci/<package-name> which will re-send an invitation.
  • After transfer write a comment @ropensci-review-bot finalize transfer of <package-name> where <package-name> is the repo/package name. This will give you admin access back.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar, https://github.com/ropensci/foobar
  • Skim the docs of the pkgdown automatic deployment, in particular if your website needs MathJax.
  • Fix any links in badges for CI and coverage to point to the new repository URL.
  • Increment the package version to reflect the changes you made during review. In NEWS.md, add a heading for the new version and one bullet for each user-facing change, and each developer-facing change that you think is relevant.
  • We're starting to roll out software metadata files to all rOpenSci packages via the Codemeta initiative, see https://docs.ropensci.org/codemetar/ for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.
  • You can add this installation method to your package README install.packages("<package-name>", repos = "https://ropensci.r-universe.dev") thanks to R-universe.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent).

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @ropensci/blog-editors in your reply. They will get in touch about timing and can answer any questions.

We maintain an online book with our best practices and tips; this chapter starts the 3rd section, which covers guidance for after onboarding (with advice on releases, package marketing, GitHub grooming); the guide also features CRAN gotchas. Please tell us what could be improved.

Last but not least, you can volunteer as a reviewer via filling a short form.

@mikemahoney218
Member Author

Thank you so much @jhollist , @mdsumner , and @OldLipe ! This was a fantastic process (as usual with rOpenSci!).

@jhollist -- I mentioned at the start of the review, but I'm not able to transfer this repo to the rOpenSci namespace. I'm not sure which boxes on the checklist still apply -- and I've got the faintest memory that the package needs to get added to a registry somewhere, but I forget where that is. Does this make sense?

@maelle
Member

maelle commented Oct 3, 2024

👋 here! ropensci/roregistry@e5d2c02 should be it but I'll be checking the package and registry building to be sure. 😸

@mpadge
Member

mpadge commented Oct 3, 2024

Registry now updated as expected. All good!

@maelle
Member

maelle commented Oct 3, 2024

https://docs.ropensci.org/rsi/ 🎉

@jhollist
Member

jhollist commented Oct 3, 2024

@mpadge and @maelle Is there anything special that @mikemahoney218 needs to do for this since he won't be transferring the package to the rOpenSci org? Looks like things have moved along fine without that.
