
create proposal for a add_transformation function #271

Merged · 15 commits into master from add_scale_fct · Apr 11, 2023

Conversation

nikosbosse
Contributor

I just implemented a first proposal for #270.

I think one question is whether we want to call the extra column 'scale' or something else. 'transformation'? I like the word scale, but also "log scale" is potentially ambiguous.

I haven't done any checking here (which one could do), but I thought it would be a bit of an overkill since it's an extremely simple function and running e.g. check_forecasts() can take a while.

@seabbs (Contributor) left a comment

This looks nice. I was a little confused by the workflow you were proposing, so I think more docs for that (though if it's as I imagine, it seems nice enough). We also just need more docs in general, and adding some default options with verbose messaging of what they mean seems like it would be very useful.

How do you imagine this function being used in serial (i.e. when I want to apply a series of transformations)? At the moment I think it will just break, but we could easily code around this.

nikosbosse commented Feb 28, 2023

Ah yes, I think your suggestion probably makes more sense than mine if you have more than one transformation. The workflow I had in mind was something like

forecasts |>
  add_transformation(option = "log") |>
  score() |>
  summarise_scores(by = c("model", "scale"))

with results something like

model   wis   scale
A       2.0   natural
A       0.1   log
B       3.0   natural
B       0.3   log

So it would automatically just add the log scaled data to your data.frame and add a separate column. With your alternative suggestion the equivalent workflow would be something like

forecasts |>
  mutate(scale = "natural") |>
  rbind(
    scale_forecasts(forecasts, option = "log") |>
      mutate(scale = "log")
  ) |>
  score() |>
  summarise_scores(by = c("model", "scale"))

(the additional suggestion in your workflow to enable data |> check_forecasts() |> score() would require an additional change to check_forecasts() that we probably should deal with separately)

seabbs commented Mar 1, 2023

I like your workflow as it's certainly less clunky, but yes, I worry about what happens when you want multiple transformations. Perhaps the solution is to add some logic to add_transformation so that it looks for a scale variable: if it finds one, it filters for natural, applies everything there, and appends the result to the end; otherwise it applies the current logic?
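For illustration, the scale-column logic described here could look roughly like the following sketch — a hypothetical add_transformation() operating on a plain data frame with toy columns, not the package's actual implementation:

```r
# Hypothetical sketch of the scale-column logic discussed above (base R).
# If a "scale" column exists, transform only the "natural" rows and append;
# otherwise label the input "natural" first, then append the transformed copy.
add_transformation <- function(data, fun = log, label = "log") {
  if (!"scale" %in% names(data)) {
    data$scale <- "natural"
  }
  natural <- data[data$scale == "natural", , drop = FALSE]
  transformed <- natural
  transformed$true_value <- fun(transformed$true_value)
  transformed$prediction <- fun(transformed$prediction)
  transformed$scale <- label
  rbind(data, transformed)
}

forecasts <- data.frame(
  model = c("A", "B"),
  true_value = c(2, 3),
  prediction = c(2.5, 2.8)
)

out <- forecasts |>
  add_transformation(fun = function(x) log(x + 1), label = "log") |>
  add_transformation(fun = sqrt, label = "sqrt")
```

Calling it a second time only transforms the rows still labelled "natural", so transformations stack without double-transforming anything.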

The alternative would be making it callable only once and enabling passing a list of transformations, but I am not a big fan of that as it hides quite a lot of code internally for small gain.

I hadn't realised check_forecasts wasn't pipeable - I guess it doesn't totally need to be, but it would be nice if it were. The solution is likely some kind of S3 class, I would guess.
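One way an S3 class could make check_forecasts() pipeable, as speculated here, is for it to return the (validated) data with a class attached rather than only a check report. A rough base-R sketch under that assumption — toy column names and a toy absolute-error "score", not how scoringutils actually implements it:

```r
# Speculative sketch: check_forecasts() validates, tags the data with a
# class, and returns it so downstream functions can receive it in a pipe.
check_forecasts <- function(data) {
  stopifnot(is.data.frame(data),
            all(c("true_value", "prediction") %in% names(data)))
  class(data) <- c("checked_forecasts", class(data))
  data  # returned (not just a report), so the pipe can continue
}

score <- function(data) {
  # re-check automatically if the user skipped check_forecasts()
  if (!inherits(data, "checked_forecasts")) data <- check_forecasts(data)
  data$ae <- abs(data$true_value - data$prediction)  # toy "score"
  data
}

forecasts <- data.frame(true_value = c(2, 3), prediction = c(2.5, 2.8))
scores <- forecasts |> check_forecasts() |> score()
```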

@nikosbosse

Trying to summarise the discussion, we want to:

  • implement the version with add_transformation(option = "log")
    • have a way to do that multiple times, checking whether there is already a "scale" column
  • have your proposed function that transforms without adding as well?
  • have default options?
    • a problem with that one, e.g. for the log, would be that one would need to pass in the offset as an extra argument
    • and for population that would require a very specific column with a name that's appropriate in epi, but maybe not in other instances
  • have the possibility to pass in a custom function? I think that would be nice
  • have both options, i.e. some prespecified choices + custom functions?

seabbs commented Mar 1, 2023

I think we:

implement the version with add_transformation(option = "log")
have a way to do that multiple times, checking whether there is already a "scale" column

Want this as its nice and clear for one but feel we need to support multiple.

have your proposed function that transforms without adding as well?

No I don't think we need this as well.

have default options?
problem with that one e.g. for the log would be that one would need to pass in the offset as an extra argument
and for population that would require a very specific column with a name that's appropriate in epi, but maybe not in other instances

I think we want defaults as lots of users won't know what to use. For the log we could just set the offset and for more advanced use expect users to specify manually (we can clearly document this).

For population adjustments, it's really just a normalisation using an offset, which is common everywhere. If we label it as that, and say in the docs that using it for population would mean X, it might work?

have the possibility to pass in a custom function? I think that would be nice

Agree, it definitely needs this, and we can just have this in the first instance (though as default options aren't that much work, I would suggest we do both).

sbfnk commented Mar 2, 2023

implement the version with add_transformation(option = "log")
have a way to do that multiple times, checking whether there is already a "scale" column

Are the transformations always targeting the natural values? Or might we want to apply a scaling to already-scaled values? The former makes more sense to me overall (as anything could be passed in as a custom function), but it might be a bit complicated having to communicate to the user that what the operation does depends on whether a scale column is present and contains the value "natural". Alternatively, the log+1 could be implemented as e.g.

forecasts |>
  transform(operation = sum,  offset = 1) |>
  transform(operation = log) |>
  score() |>
  summarise_scores(by = c("model"))

which would remove the need to name the transform and then filter out column values (I think this is your option (a), @nikosbosse). If implemented in an S3 class (as mentioned by @seabbs in #270 (comment)), then we could track what has been applied.
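This chained idea could be sketched as a tiny helper — named transform_values() here to avoid masking base::transform(), and using toy column names; both are assumptions, not the package's API:

```r
# Hypothetical sketch of the chained-transformation idea above: each call
# applies one operation to both true values and predictions, so steps
# (e.g. offset, then log) can be piped one after the other.
transform_values <- function(data, operation, ...) {
  data$true_value <- operation(data$true_value, ...)
  data$prediction <- operation(data$prediction, ...)
  data
}

forecasts <- data.frame(true_value = c(0, 3), prediction = c(1, 2))

transformed <- forecasts |>
  transform_values(operation = `+`, 1) |>  # the "offset = 1" step
  transform_values(operation = log)        # then the log step
```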

I think your suggestion might work better than this but would require careful signalling to the user as it could be confusing.

seabbs commented Mar 2, 2023

@sbfnk with your suggestion you would lose the ability to score natural and transformed forecasts at once, as in @nikosbosse's approach, and you wouldn't be able to stack transformations without additional code. On the plus side, the code is much simpler, the workflow for a single transform is cleaner, and you don't need to bother labelling things, which is a pain.

It is similar in spirit to my suggestion here: #271 (comment)

Having seen @nikosbosse's suggested use case, I do like the idea of scoring multiple transformations at once and having functionality to make this easier. Not entirely convinced it's worth the complexity trade-offs you highlight, though.

Default options: I really like the idea of default options (like those supplied in the scale_ ggplot2 functions) as they're a nice hand-holder for less experienced users and can easily be documented (i.e. in the examples). That said, it really isn't much code to add these, so perhaps just having examples and links to further reading is all that is needed.

I do like the use case for S3 here and in other parts of this package, but we have discussed this previously and decided not to go this way. I think it might be better to have a general discussion about whether scoringutils should have an S3 rewrite rather than introduce it piecemeal.

codecov bot commented Mar 5, 2023

Codecov Report

Merging #271 (7bf1bab) into master (66487aa) will decrease coverage by 0.40%.
The diff coverage is 78.04%.

❗ Current head 7bf1bab differs from pull request most recent head 4a72b1d. Consider uploading reports for the commit 4a72b1d to get more accurate results

@@            Coverage Diff             @@
##           master     #271      +/-   ##
==========================================
- Coverage   90.82%   90.42%   -0.40%     
==========================================
  Files          21       22       +1     
  Lines        1286     1327      +41     
==========================================
+ Hits         1168     1200      +32     
- Misses        118      127       +9     
Impacted Files Coverage Δ
R/convenience-functions.R 78.04% <78.04%> (ø)


@nikosbosse

So I updated the suggested code a bit:

  • For now I decided against dedicated default options. Mainly because option = "log" doesn't seem that much easier to me than fun = log. Open to being convinced otherwise.
  • For fun = log I added some "clever" code to check whether there are any zero values. If there are, then it automatically applies an offset and complains to the user with a warning. You can pass an argument offset = 1 or whatever offset you like to the function. I think the workflow is well documented and intuitive, but I imagine this to be somewhat controversial :D
  • It has an argument add = TRUE to toggle whether you want to transform the existing data, or whether you want to add the transformed data to your existing data. This also works multiple times, and there is an example for it.

What do you think?

Some workflow examples:

#' library(magrittr) #(pipe operator)
#' # replace negative values with zero
#' transformed <- example_quantile %>%
#'   transform_forecasts(fun = function(x) {pmax(0, x)}, add = FALSE)
#'
#' # add log transformed forecasts (produces a warning as some values are zero)
#' transform_forecasts(transformed)
#'
#' # specifying an offset manually for the log transformation removes the warning
#' transform_forecasts(transformed, offset = 1)
#'
#' # adding multiple transformations
#' transformed %>%
#'   transform_forecasts(offset = 1) %>%
#'   transform_forecasts(fun = sqrt, label = "sqrt") %>%
#'   score() %>%
#'   summarise_scores(by = c("model", "scale"))

sbfnk commented Mar 6, 2023

An alternative would be to create a small function log_shift or similar that removes the need for filtering through dot arguments, hiding default options, and checking whether a function is a specific primitive?

log_shift <- function(x, offset = 1, ...) {
  return(log(x + offset, ...))
}

This could then be the default argument for transform_forecasts if you really wanted a default argument.

it has an argument add = TRUE to toggle whether you want to transform the existing data, or whether you want to add the transformed data to your existing data. This also works multiple times and there is an example for it

I'd call it append or similar - add to me implies numerical addition as in e.g. magrittr::add

@nikosbosse

Maybe the best idea would be to just have an option offset = 0 (or shift = 0?) whose value is simply added to everything. It could then automatically apply that for the log (or not do it and just complain / alert the user, which I guess is the better option).
With this we wouldn't have to change the log function, which I think is good, but there would also be a simple argument that allows the user to apply an offset.

seabbs commented Mar 8, 2023

I still prefer @sbfnk's suggestion of a custom transformation function over giving special design consideration to the log transform in this function, but it's your call.

seabbs commented Mar 24, 2023

@nikosbosse any help needed to unstick this?

@nikosbosse

No real blockers, I was waiting on more feedback from Le Big Boss. I'll implement the log_shift() version :)

seabbs commented Mar 24, 2023

I was waiting on more feedback from Le Big Boss.

Really makes me feel empowered 😆

@nikosbosse

You had cast your vote :)
Implemented a new suggestion. Current draft exports a function log_shift() that allows the user to add an offset and also truncate negative values in one function call. Default offset is 0.

@seabbs (Contributor) left a comment

Looks good to me. Seems like a nice workflow. Just flagged that the docs are currently out of date.

Not for this PR, but a vignette on forecast transformation seems like a nice and useful idea.

@nikosbosse

@sbfnk @seabbs happy to merge?

@sbfnk (Contributor) left a comment

Nearly there. Just one more question.

log_shift <- function(
  x,
  offset = 0,
  truncate = FALSE,
Contributor

Do we really want this option here? I'd think it's better for the log function to throw an error and the user to pre-process the forecasts/data so that there are no negatives.

If sticking with it I would suggest to make this more self-explanatory, i.e. calling the option negative_to_zero or something like that (although given that we might be adding an offset anyway I'm not clear why we'd choose zero over anything else).

Contributor Author

I think the reason to have that truncation is that the alternative (at least staying in the same workflow) is a bit more clunky. You would have to use something like transform_forecasts(fun = function(x) pmax(0, x), append = FALSE, label = "natural")

See e.g. these examples:

#' example_quantile %>%
#'   transform_forecasts(fun = function(x) pmax(0, x), append = FALSE) %>%
#'   transform_forecasts(fun = sqrt, label = "sqrt") 
#' # adding multiple transformations
#' library(magrittr) # pipe operator
#' example_quantile %>%
#'   transform_forecasts(offset = 1, truncate = TRUE) %>%
#'   # manually truncate all negative values before applying sqrt
#'   transform_forecasts(fun = function(x) pmax(0, x), append = FALSE, label = "natural") %>%
#'   transform_forecasts(fun = sqrt, label = "sqrt") %>%
#'   score() %>%
#'   summarise_scores(by = c("model", "scale"))

(the part with the label = natural here is only because it's transforming the original while something else has already been computed, which maybe is a stupid thing to do?)

Contributor Author

Looking at it again I should definitely simplify the second example...

Contributor

I think the reason to have that truncation is that the alternative (at least staying in the same workflow) is a bit more clunky. You would have to use something like transform_forecasts(fun = function(x) pmax(0, x), append = FALSE, label = "natural")

Wouldn't it be more natural to do something like:

example_quantile %>%
  dplyr::filter(true_value >= 0) %>%
  transform_forecasts(fun = sqrt, label = "sqrt") 

or

example_quantile %>%
  dplyr::mutate(true_value = ifelse(true_value < 0, 0, true_value)) %>%
  transform_forecasts(fun = sqrt, label = "sqrt") 

i.e. fix the problem in the data rather than letting the transformation function take care of it?

@nikosbosse Mar 30, 2023

I don't have very strong feelings either way. The truncate option (now changed to negative_to_zero) is easier if you have negative values in both predictions and true values. So if you know what you're doing it's convenient (and also turned off by default). But agree that it feels maybe a bit odd?

@nikosbosse

@seabbs do you have feelings about the negative_to_zero argument and whether or not it should be kept?
Pro: it's a bit more convenient for users who know what they're doing (as they can replace forecasts and observations in one go)
Con: it's a bit cleaner to force users to make the changes manually

Current version has the examples replaced to make everything a bit easier to understand:

#' library(magrittr) # pipe operator
#'
#' # add log transformed forecasts (produces a warning as some values are zero)
#' # negative values need to be handled (here by replacing them with 0)
#' example_quantile %>%
#'   .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
#'   transform_forecasts(append = FALSE)
#'
#' # alternatively:
#' transform_forecasts(example_quantile, negative_to_zero = TRUE, append = FALSE)
#'
#' # specifying an offset manually for the log transformation removes the warning
#' example_quantile %>%
#'   .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
#'   transform_forecasts(offset = 1, append = FALSE)
#'
#' # truncating forecasts manually before sqrt
#' example_quantile %>%
#'   .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
#'   transform_forecasts(fun = sqrt, label = "sqrt")
#'
#' # alternatively, this achieves the same
#' example_quantile %>%
#'   transform_forecasts(fun = function(x) pmax(0, x), append = FALSE) %>%
#'   transform_forecasts(fun = sqrt, label = "sqrt")
#'
#' # adding multiple transformations
#' example_quantile %>%
#'   .[, true_value := ifelse(true_value < 0, 0, true_value)] %>%
#'   transform_forecasts(offset = 1) %>%
#'   transform_forecasts(fun = sqrt, label = "sqrt") %>%
#'   score() %>%
#'   summarise_scores(by = c("model", "scale"))

@nikosbosse

@Bisaloo do you have thoughts? :) democratic development

seabbs commented Apr 7, 2023

I think if we are keeping it, then it should be its own helper function. I also think this could be true for offsetting.

@nikosbosse

I think if we are keeping it, then it should be its own helper function. I also think this could be true for offsetting.

Yes definitely - it's currently an argument of the log_shift() helper function (maybe I should adapt the examples again to make that clear).

the transform_forecasts() function currently only has the following arguments:

transform_forecasts <- function(data,
                                fun = log_shift,
                                append = TRUE,
                                label = "log",
                                ...)

and log_shift() has

log_shift <- function(x,
                      offset = 0,
                      negative_to_zero = FALSE,
                      base = exp(1))
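
Given that signature, a minimal log_shift() consistent with the behaviour discussed in this thread (warn when values are zero after the offset, optionally truncate negatives) might look like the following sketch; the merged version ultimately dropped negative_to_zero, so this is illustrative only, not the package's actual code:

```r
# Illustrative sketch of log_shift() matching the signature above.
# Behaviour as discussed in this thread, not the merged implementation.
log_shift <- function(x, offset = 0, negative_to_zero = FALSE, base = exp(1)) {
  if (negative_to_zero) {
    x <- pmax(x, 0)  # truncate negative values before taking the log
  }
  if (any(x + offset == 0, na.rm = TRUE)) {
    warning("Some values are 0 after adding the offset; log() returns -Inf ",
            "for these. Consider using a positive offset.")
  }
  log(x + offset, base = base)
}

log_shift(c(1, 2, 4), base = 2)                          # -> 0 1 2
log_shift(c(-1, 0, 9), offset = 1, negative_to_zero = TRUE)
```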

seabbs commented Apr 7, 2023

  • it's currently an argument of the

Yes, so not an argument but a helper function on its own.

@nikosbosse

I'm not sure I understand. So you would have a separate helper function that's a wrapper for \(x) pmax(0,x)? And then one would call transform_forecasts() twice?

seabbs commented Apr 7, 2023

A helper independent of transform_forecasts() but 🤷🏻

@nikosbosse nikosbosse merged commit 4a72b1d into master Apr 11, 2023
@nikosbosse

I merged a slightly different version without a negative_to_zero argument in log_shift() now. Could still decide to introduce that argument later if we like.

@seabbs seabbs deleted the add_scale_fct branch April 11, 2023 09:42
3 participants