scoringutils 0.1.7.2 review #121

Bisaloo · 2021-07-27T14:26:00Z

Preface

This is an informal review conducted by a lab member. To ensure maximal objectivity, the rOpenSci review template is used. This template also guarantees that this package is following the most up-to-date and strictest standards available in the R community.

The template is released under CC-BY-NC-SA and this review is therefore published under the same license.

The review was finished on 2021-07-27 and concerns the version 0.1.7.2 of scoringutils (commit de45fb7).

Package Review

Documentation

The package includes all the following forms of documentation:

☒ A statement of need clearly stating problems the software is designed to solve and its target audience in README

The scoringutils package provides a collection of metrics and proper scoring rules that make it simple to score forecasts against the true observed values.

Functionality

Estimated hours spent reviewing: 13h

Review Comments / Code Review

I noticed some occurrences of bool == TRUE which doesn’t really make sense and makes the code more difficult to read. If the object is already a logical, you can use it as-is in if. For example:

scoringutils/R/bias.R

Lines 86 to 90 in de45fb7

    
           if (all.equal(as.vector(predictions), as.integer(predictions)) != TRUE) { 
        
             continuous_predictions <- TRUE 
        
           } else { 
        
             continuous_predictions <- FALSE 
        
           }

could be simplified as

continuous_predictions <- !all.equal(as.vector(predictions), as.integer(predictions))

Some other occurrences:

scoringutils/R/eval_forecasts_continuous_integer.R

Line 52 in de45fb7

if (all.equal(data$prediction, as.integer(data$prediction)) == TRUE) {

scoringutils/R/pit.R

Line 157 in de45fb7

if (all.equal(as.vector(predictions), as.integer(predictions)) != TRUE) {

Good job on specifying the type of your NA (NA_real_, NA_integer_, etc.)!
I think there is some inconsistency in type / value checking before computation. For example, in brier_score(), there is a check that predictions takes values between 0 and 1 but this check is not present in, e.g., bias(). Would it make sense to have check_truth(), check_predictions() internal functions that you call each time?
Would it be worth printing a warning when the user requests ‘coverage’ in eval_forecasts_sample() instead of silently dropping it?
As mentioned in another discussion, there is some inconsistency in the use of data.table and modifying in place vs copying. Beyond the stylistic issue, this is a possible source of bugs so I’d recommend sticking to one or the other.
Some functions have a very large number of arguments which makes them difficult to use. Research in software engineering tends to suggests to the number of arguments should not exceed ~7. As the function is complex, it may be difficult to reduce the number of arguments but here are possible some ways:
- drop the verbose argument. Either the diagnostic messages are unnecessary and should be dropped entirely or they are useful and should be printed. If users really don’t want to see messages/warnings, they can use base functions suppressMessages() / suppressWarnings(). This could also be controlled by a global switch in options() like usethis is doing.
- plotting side effects could be removed. The primary goal of eval_forecasts() is to return a data.frame with the scores. Users that want the plot could call another function afterwards. This would allow the removal of the pit_plots argument.
- remove the possibility of having either a single data or forecasts, truth_data and merge_by as inputs.
- get rid of summarised and act as if summarised = TRUE if by != summarise_by (is this Check whether we can get rid of the summarised = TRUE argument? #106?)
I understand the wish to provide flexibility but since eval_forecasts() is meant to be a high-level wrapper / one-liner to compute everything, I believe it’s okay to provide a limited interface. Users that strive for more flexibility can always use the low-level individual scoring functions.
I think there is a misplaced closing parenthesis here and you mean
```
if (length(cols_to_delete) > 1) {
```
instead of the current:

scoringutils/R/pairwise-comparisons.R

Line 165 in de45fb7

if (length(cols_to_delete > 1)) {

Same here:

scoringutils/R/pairwise-comparisons.R

Line 220 in de45fb7

if (length(cols_to_remove > 0)) {

If this is indeed a bug, its presence twice in the same file suggests this code portion should be refactored as a function (isn’t that the purpose of delete_columns() already?). By the way, why is it > 1 in one case and > 0 in the other?
I cannot say for sure because as mentioned previously, I don’t understand the documentation of the test_options argument in compare_two_models() but this selection of the first element ([1]) does not seem super robust:

scoringutils/R/pairwise-comparisons.R

Line 373 in de45fb7

if (test_options$test_type[1] == "permutation") {
There are some occurrence where loops (vapply()) are used when you could rely on vectorized functions / linear algebra for much faster computation and a more readable code (fixed in Remove unnecessary vapply() #120):

[ ]

scoringutils/R/bias.R

Lines 95 to 99 in de45fb7

    
           P_x <- vapply(seq_along(true_values), 
        
                         function(i) { 
        
                           sum(predictions[i,] <= true_values[i]) / n_pred 
        
                         }, 
        
                         .0)

[ ]

scoringutils/R/bias.R

Lines 106 to 110 in de45fb7

    
           P_xm1 <- vapply(seq_along(true_values), 
        
                           function(i) { 
        
                             sum(predictions[i,] <= true_values[i] - 1) / n_pred 
        
                           }, 
        
                           .0)

[ ]

scoringutils/R/pit.R

Lines 169 to 173 in de45fb7

    
           P_x <- vapply(seq_along(true_values), 
        
                         function(i) { 
        
                           sum(predictions[i, ] <= true_values[i]) / n_pred 
        
                         }, 
        
                         .0)

[ ]

scoringutils/R/pit.R

Lines 193 to 197 in de45fb7

    
           P_xm1 <- vapply(seq_along(true_values), 
        
                           function(i) { 
        
                             sum(predictions[i,] <= true_values[i] - 1) / n_pred 
        
                           }, 
        
                           .0)

predictions <- matrix(rnorm(100), ncol = 10, nrow = 10)
true_values <- rnorm(10)

microbenchmark::microbenchmark(
  "vapply" = { vapply(seq_along(true_values), function(i) sum(predictions[i, ] <= true_values[i]), numeric(1)) },
  "vector" = rowSums(predictions <= true_values),
  check = "identical"
)

## Unit: microseconds
##    expr    min      lq     mean  median     uq    max neval
##  vapply 15.492 16.4605 17.74523 16.8215 17.302 47.741   100
##  vector  5.161  5.7475  6.24385  5.9215  6.089 34.897   100

In ggplot2 plots with the facet_wrap_or_grid argument, I would change the default value to c("facet_wrap", "facet_grid") and start the function with:
```
facet_wrap_or_grid <- match.arg(facet_wrap_or_grid)
```
Currently, a very minor and inconspicuous typo such as "facet_warp" would make it silently switch to facet_grid and it would be very difficult to notice the mistake.
All functions in scoringRules_wrappers.R seem to have the same checks at the beginning. It would be less error-prone to refactor this.
This is not a robust way to get the value of an argument with defaults:

scoringutils/R/utils.R

Line 208 in de45fb7

if (comparison_mode[1] == "ratio") {

scoringutils/R/utils.R

Line 220 in de45fb7

if (comparison_mode[1] == "ratio") {

scoringutils/R/utils_data_handling.R

Lines 422 to 430 in de45fb7

    
           if (join[1] == "left") { 
        
             # do a left_join, where all data in the observations are kept. 
        
             combined <- merge(observations, forecasts, by = by, all.x = TRUE) 
        
           } else if (join[1] == "full") { 
        
             # do a full, where all data is kept. 
        
             combined <- merge(observations, forecasts, by = by, all = TRUE) 
        
           } else { 
        
             combined <- merge(observations, forecasts, by = by, all.y = TRUE) 
        
           }

Instead, you should use:

comparison_mode <- match.arg(comparison_model)

...

if (comparison_mode == "ratio") { ... }

Deprecated functions from utils_data_handling.R should be categorised as such in the pkgdown reference index.
There is no unit testing here since this is not a testthat function:

scoringutils/tests/testthat/test-bias.R

Line 48 in de45fb7

    
           all(scoringutils::bias(true_values, predictions) == scoringutils::bias(true_values, predictions))

Conclusion

This is overall a solid package that could become a widely used tool in forecast sciences. I could not see any bugs in the code and the performance looks very good on the examples I ran. The package interface is clever and can surely prove useful to a large array of users thanks to the two levels of functions (low-level scoring functions vs all-in-one eval_forecasts()).

Two points could slow down / reduce adoption and these should be fixed for this package to reach its full potential and attract as many users as possible:

the package remains complex to use. This complexity is in part inherent to the task but it could nonetheless be reduced by following best practices in software engineering such as reducing the number of parameters and adopting a consistent naming scheme.
there is no strong evidence that this package implements correctly the computed metrics. This is especially important for fields that can have a policy impact. Test coverage should be increased and comparisons to computation via other tools / methods should be added.

The text was updated successfully, but these errors were encountered:

seabbs · 2021-08-04T10:22:25Z

This is amazing and very useful.

These points about reducing interface complexity seem spot on:

drop the verbose argument. Either the diagnostic messages are unnecessary and should be dropped entirely or they are useful and should be printed. If users really don’t want to see messages/warnings, they can use base functions > suppressMessages() / suppressWarnings(). This could also be controlled by a global switch in options() like usethis is > doing.

plotting side effects could be removed. The primary goal of eval_forecasts() is to return a data.frame with the scores. Users that want the plot could call another function afterwards. This would allow the removal of the pit_plots argument.

remove the possibility of having either a single data or forecasts, truth_data and merge_by as inputs.
get rid of summarised and act as if summarised = TRUE if by != summarise_by (is this Check whether we can get rid of the summarised = TRUE argument? Check whether we can get rid of the summarised = TRUE argument? #106?)

nikosbosse · 2022-01-12T14:41:59Z

Bonjour!

A few questions:

do I need examples for non-exported functions? In the list above, this is compare_two_models() and pairwise_comparison_one_group()

recommended using the fct() when talking about functions. It is then easier to make the difference between functions and other objects and it enables auto-linking to the function documentation in the pkgdown website.

does that work in the vignette as well?
Also re the vignette: at some point probably the future paper should become the vignette. At the moment the vignette is essentially the same as the Readme.Rmd file (that is then displayed as Readme.md). At the moment I've just been using function() both in the readme and the vignette. Is there a better way to handle this?

there is a minor issue with the equation rendering in the pkgdown website (e.g., https://epiforecasts.io/scoringutils/reference/abs_error.html). The solution is probably to pass both a LaTeX/mathjax and a ASCII version of the equation to \deqn{}.

how can I do that?

As mentioned in another discussion, there is some inconsistency in the use of data.table and modifying in place vs copying. Beyond the stylistic issue, this is a possible source of bugs so I’d recommend sticking to one or the other.

not entirely sure what to do here

Thank you very much!

seabbs · 2022-01-13T15:55:33Z

I think you can't have examples for non-exported functions? Or at least you can't without some workaround. I would say no anyway.

Yes

Will the paper be kept updated for ever? I would probably be in favour of having the paper content spread across multiple vignettes as I imagine quite long. That way will be more fluid and easy to update. If the vignette is the same as the readme I would probably drop the vignette or focus move most of the content into the vignette and just keep a small quick start in the readme.

no idea on the equation issue (@Bisaloo probably knows).

I would just use data.table or dplyr. I doon't think its a major issue though.

The preview you gave today looked great by the way.

seabbs · 2022-03-24T17:59:03Z

Is this closable or worth going back over?

nikosbosse · 2022-03-27T16:16:15Z

I think we can close it. Testing remains an issue, but that has its own issue :)

nikosbosse · 2022-03-27T16:16:30Z

Thank you again @Bisaloo, this was amazing!

seabbs · 2022-03-28T10:06:19Z

Might be best to let @Bisaloo close as part of the review process?

Bisaloo · 2022-03-29T13:42:55Z

Yep, I think the major points have been moved to separate issues. Let's continue the discussion there.

Bisaloo mentioned this issue Jan 20, 2022

Provide both a LaTeX and ASCII version of equations when necessary #173

Closed

seabbs mentioned this issue Feb 3, 2022

Scoringutils 1.0.0 review #179

Closed

17 tasks

nikosbosse closed this as completed Mar 27, 2022

seabbs reopened this Mar 28, 2022

Bisaloo closed this as completed Mar 29, 2022

nikosbosse mentioned this issue Oct 30, 2023

Review scoringutils 2.0.0 #373

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scoringutils 0.1.7.2 review #121

scoringutils 0.1.7.2 review #121

Bisaloo commented Jul 27, 2021 •

edited by nikosbosse

Loading

seabbs commented Aug 4, 2021

nikosbosse commented Jan 12, 2022 •

edited

Loading

seabbs commented Jan 13, 2022

seabbs commented Mar 24, 2022

nikosbosse commented Mar 27, 2022

nikosbosse commented Mar 27, 2022

seabbs commented Mar 28, 2022

Bisaloo commented Mar 29, 2022

scoringutils 0.1.7.2 review #121

scoringutils 0.1.7.2 review #121

Comments

Bisaloo commented Jul 27, 2021 • edited by nikosbosse Loading

Preface

Package Review

Documentation

Functionality

Review Comments / Code Review

Conclusion

seabbs commented Aug 4, 2021

nikosbosse commented Jan 12, 2022 • edited Loading

seabbs commented Jan 13, 2022

seabbs commented Mar 24, 2022

nikosbosse commented Mar 27, 2022

nikosbosse commented Mar 27, 2022

seabbs commented Mar 28, 2022

Bisaloo commented Mar 29, 2022

Bisaloo commented Jul 27, 2021 •

edited by nikosbosse

Loading

nikosbosse commented Jan 12, 2022 •

edited

Loading