Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 432: Add some basic functions for as_point and as_quantile #790

Closed
wants to merge 19 commits into from
Closed
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
7 changes: 7 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,11 @@

S3method(`[`,scores)
S3method(as_forecast,default)
S3method(as_point,default)
S3method(as_point,forecast_quantile)
S3method(as_point,forecast_sample)
S3method(as_quantile,default)
S3method(as_quantile,forecast_sample)
seabbs marked this conversation as resolved.
Show resolved Hide resolved
Comment on lines +5 to +9
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comment is that this feels as if there is a new point, or quantile class. I believe it would make sense, and be slightly easier to wrap my head around as a potential user, if we re-used the existing classes and terminology. I.e., forecast_point and forecast_quantile.

This would lead to longer names but that's probably still ok 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think if we are going with a abstract class as in #373 we should rename.

S3method(assert_forecast,default)
S3method(assert_forecast,forecast_binary)
S3method(assert_forecast,forecast_point)
Expand All @@ -27,6 +32,8 @@ export(add_relative_skill)
export(ae_median_quantile)
export(ae_median_sample)
export(as_forecast)
export(as_point)
export(as_quantile)
export(assert_forecast)
export(assert_forecast_generic)
export(bias_quantile)
Expand Down
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,9 @@ scores <- score(forecast_quantile)
- To create and validate a new `forecast` object, users can use `as_forecast()`. To revalidate an existing `forecast` object users can call `assert_forecast()` (which validates the input and returns `invisible(NULL)`. `assert_forecast()` is a generic with methods for the different forecast types. Alternatively, `validate_forecast()` can be used (which calls `assert_forecast()`), which returns the input and is useful in a pipe. Lastly, users can simply print the object to obtain additional information.
- Users can test whether an object is of class `forecast_*()` using the function `is_forecast()`. Users can also test for a specific `forecast_*` class using the appropriate `is_forecast.forecast_*` method. For example, to check whether an object is of class `forecast_quantile`, you would use you would use `scoringutils:::is_forecast.forecast_quantile()`.

### Updating a forecast object
- The functions `as_point` and `as_quantile` have been added to allow conversion of various forecast types to point and quantile forecasts respectively.
seabbs marked this conversation as resolved.
Show resolved Hide resolved

### Pairwise comparisons and relative skill
- The functionality for computing pairwise comparisons was now split from `summarise_scores()`. Instead of doing pairwise comparisons as part of summarising scores, a new function, `add_relative_skill()`, was introduced that takes summarised scores as an input and adds columns with relative skill scores and scaled relative skill scores.
- The function `pairwise_comparison()` was renamed to `get_pairwise_comparisons()`, in line with other `get_`-functions. Analogously, `plot_pairwise_comparison()` was renamed to `plot_pairwise_comparisons()`.
Expand Down
109 changes: 109 additions & 0 deletions R/as_point.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,109 @@
#' @title Convert a forecast object to a point forecast
#'
#' @description
#' The `as_point` function is used to convert a forecast object to a point
#' forecast. It is a generic function that dispatches the conversion to the
#' appropriate method based on the class of the input forecast object.
#'
#' @param forecast An object of class `forecast_{type}`,
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#' representing a forecast.
#' @param ... Additional arguments to be passed to the specific method.
#' @return The function returns a point forecast object, which is a specific
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#' type of forecast object that represents a single value prediction.
#'
#' @export
#' @keywords check-forecasts
#' @examples
#' as_point(as_forecast(example_quantile))
#' as_point(as_forecast(example_sample_continuous))
as_point <- function(forecast, ...) {
UseMethod("as_point")
}

#' @rdname as_point
#' @export
#' @importFrom cli cli_abort
as_point.default <- function(forecast, ...) {
cli_abort(
seabbs marked this conversation as resolved.
Show resolved Hide resolved
c(
"!" = "The input needs to be a forecast object.",
"i" = "Please run `as_forecast()` first." # nolint
)
)
}

#' Convert a quantile forecast to a point forecast
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#'
#' This function takes a quantile forecast and converts it to a point forecast
#' by selecting the forecast corresponding to the specified quantile level.
#'
#' @param forecast The `forecast_quantile` object.
#' @param quantile_level The desired quantile level for the point forecast.
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#' Defaults to 0.5 (median).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised to see this as an argument - do you have a use case in mind where one would want it not to be 0.5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not reallly but I thought there was little issue with giving this option

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it's a minor issue but it does add to the cognitive burden on anyone coming tot he function so if we think there's no use case it might be better to simplify?

#' @param ... Additional arguments passing inherited from the default method but
#' unused.
#'
#' @return A `forecast_point` object.
#'
#' @export
#' @keywords check-forecasts
#' @examples
#' as_point(as_forecast(example_quantile))
as_point.forecast_quantile <- function(forecast, quantile_level = 0.5, ...) {
assert_forecast(forecast, verbose = FALSE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_forecast(forecast, verbose = FALSE)
assert_forecast(forecast, forecast_type = "quantile", verbose = FALSE)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we validate the forecast at all then we should also make sure the forecast type is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seee above point and my disagreement with this

assert_numeric(quantile_level, lower = 0, upper = 1, len = 1)
seabbs marked this conversation as resolved.
Show resolved Hide resolved

forecast <- forecast[
quantile_level == target_quantile_level, ,
env = list(target_quantile_level = quantile_level)
]
forecast[, "quantile_level" := NULL]

point_forecast <- remake_forecast(
forecast, "forecast_quantile", "forecast_point", verbose = FALSE
)
return(point_forecast)
}

#' Convert a sample based forecast to a point forecast
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#'
#' This function converts a forecast object to a point forecast by either
#' taking the quantile forecast at a specified quantile level or by summarizing
#' the forecast using a custom function (for example `mean`).
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#'
#' @param forecast The `forecast_sample` object to be converted to a point
#' forecast.
#'
#' @param fun A custom function to summarize the forecast. If provided, this
#' function will be used instead of the quantile method. An example could
#' be the `mean`.
#' @inheritParams as_point.forecast_quantile
#' @return A `forecast_point` object.
#' @export
#' @keywords check-forecasts
#' @examples
#' sample_forecast <- as_forecast(example_sample_continuous)
#'
#' # Quantile approach
#' as_point(sample_forecast)
#'
#' # Function approach
#' as_point(sample_forecast, fun = mean)
as_point.forecast_sample <- function(forecast, quantile_level = 0.5, fun, ...) {
seabbs marked this conversation as resolved.
Show resolved Hide resolved
assert_forecast(forecast, verbose = FALSE)
sbfnk marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert_forecast(forecast, verbose = FALSE)
assert_forecast(forecast, forecast_type = "sample", verbose = FALSE)

I think it makes sense to explicitly check for the correct forecast type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't thinks its needeed given it dispatches based on the type in the first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we do this with other auto dispatch functions? If not I think its a new issue and something to implement package widee.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we don't do this elsewhere so suggest it as a new issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new issue sounds good. If we're keeping the assert_forecast() then I think we definitely want it here.

The bigger question is probably "do we want to do a validation check at all?" and I think that's up for debate.
But if we're doing a validation check then we should also check whether the object has the right forecast type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we're doing a validation check then we should also check whether the object has the right forecast type.

I don't think I agree. as_point.forecast_sample isn't a user facing function and so they should only be calling as_point hence specifying the class again is a not needed. I can see the safety argument but it also bloats the code and makes it hard to ever rely on 3 properly with that pattern.

if (missing(fun)) {
seabbs marked this conversation as resolved.
Show resolved Hide resolved
quantile_forecast <- as_quantile(forecast, quantile_levels = quantile_level)
point_forecast <- as_point(
quantile_forecast, quantile_level = quantile_level
)
} else {
sum_forecast <- summarise_scores(
forecast, fun = fun, by = c(get_forecast_unit(forecast), "observed"),
metrics = "predicted"
)
point_forecast <- remake_forecast(
sum_forecast, "forecast_sample", "forecast_point"
)
}
return(point_forecast)
}
80 changes: 80 additions & 0 deletions R/as_quantile.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
#' Convert a forecast object to a quantile object
#'
#' This function is used to convert a forecast object to a quantile object.
#' It dispatches to the appropriate method based on the class of the forecast
#' object.
#'
#' @inheritParams as_point
#'
#' @return A `forecast_quantile` object.
#'
#' @export
#' @keywords check-forecasts
#' @examples
#' as_quantile(as_forecast(example_sample_continuous))
#'
as_quantile <- function(forecast, ...) {
UseMethod("as_quantile")
}

#' @rdname as_quantile
#' @export
#' @importFrom cli cli_abort
as_quantile.default <- function(forecast, ...) {
cli_abort(
c(
"!" = "The input needs to be a forecast object.",
"i" = "Please run `as_forecast()` first." # nolint
)
)
}

#' Convert a sample forecast to a quantile forecast
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#'
#' This function takes a sample forecast and converts it to a quantile forecast
#' sample.
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#'
#' @param forecast The `forecast_sample`` object to convert to a
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#' `forecast_quantile`.
#'
#' @param quantile_levels A vector of quantile levels. Defaults to
#' 0.01 to 0.99 by 0.01.
#' @inheritParams as_point.forecast_quantile
#'
#' @return A `forecast_quantile` object.
#' @export
#' @keywords check-forecasts
#' @importFrom data.table melt
#'
#' @export
#' @examples
#' as_quantile(as_forecast(example_sample_continuous))
as_quantile.forecast_sample <- function(
seabbs marked this conversation as resolved.
Show resolved Hide resolved
forecast, quantile_levels = seq(from = 0.01, to = 0.99, by = 0.01), ...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
forecast, quantile_levels = seq(from = 0.01, to = 0.99, by = 0.01), ...
forecast, quantile_level = seq(from = 0.01, to = 0.99, by = 0.01), ...

I vote for calling this quantile_level for two reasons:

  • the argument is called quantile_level everywhere else
  • the column in the output object will also be called quantile_level
    I see the point for using the plural, but overall think it would be easier to use the singular here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree but happy to standardise with the rest of the code base. It feels like this needs a new issue though as its not very intuitive. (i.e its probs not prob in quantile for a reason).

) {
assert_forecast(forecast, verbose = FALSE)
assert_numeric(quantile_levels, lower = 0, upper = 1)

sum_forecast <- forecast[,
as.list(quantile(predicted, probs = quantile_levels, na.rm = TRUE)),
by = c(eval(get_forecast_unit(forecast)), "observed")
]

sum_forecast <- melt(
sum_forecast,
measure.vars = paste0(quantile_levels * 100, "%"),
variable.name = "quantile_level",
value.name = "predicted"
)

sum_forecast[,
quantile_level := as.numeric(
gsub("%", "", quantile_level, fixed = TRUE)
) / 100
]

quantile_forecast <- remake_forecast(
sum_forecast, "forecast_sample", "forecast_quantile"
)
return(quantile_forecast)
}
89 changes: 61 additions & 28 deletions R/forecast.R
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,9 @@ as_forecast <- function(data,
#' @param sample_id (optional) Name of the column in `data` that contains the
#' sample id. This column will be renamed to "sample_id". Only applicable to
#' sample-based forecasts.
#' @param check_forecast_type Logical, defaults to `FALSE`. Should the
#' `forecast_type` be checked and reset if not as implied by the `data`?
seabbs marked this conversation as resolved.
Show resolved Hide resolved
#'
#' @export
#' @importFrom cli cli_warn
as_forecast.default <- function(data,
Expand All @@ -75,6 +78,7 @@ as_forecast.default <- function(data,
model = NULL,
quantile_level = NULL,
sample_id = NULL,
check_forecast_type = TRUE,
...) {
# check inputs
data <- ensure_data.table(data)
Expand Down Expand Up @@ -119,42 +123,44 @@ as_forecast.default <- function(data,
}

# assert forecast type is as expected
assert_forecast_type(data, desired = forecast_type)
forecast_type <- get_forecast_type(data)
if (isTRUE(check_forecast_type)) {
assert_forecast_type(data, desired = forecast_type)
forecast_type <- get_forecast_type(data)

# produce warning if old format is suspected
# old quantile format
if (forecast_type == "point" && "quantile" %in% colnames(data)) {
#nolint start: keyword_quote_linter
cli_warn(
c(
"Found column 'quantile' in the input data",
"i" = "This column name was used before scoringutils version 2.0.0.
Should the column be called 'quantile_level' instead?"
),
.frequency = "once",
.frequency_id = "quantile_col_present"
)
#nolint end
}
#old binary format
if (forecast_type == "point") {
looks_binary <- check_input_binary(factor(data$observed), data$predicted)
if (is.logical(looks_binary)) {
#nolint start: keyword_quote_linter duplicate_argument_linter
# produce warning if old format is suspected
# old quantile format
if (forecast_type == "point" && "quantile" %in% colnames(data)) {
#nolint start: keyword_quote_linter
cli_warn(
c(
"All observations are either 1 or 0.",
"i" = "The forecast type was classified as 'point', but it looks like
it could be a binary forecast as well.",
"i" = "In case you want it to be a binary forecast, convert `observed`
to a factor. See `?as_forecast()` for more information."
"Found column 'quantile' in the input data",
"i" = "This column name was used before scoringutils version 2.0.0.
Should the column be called 'quantile_level' instead?"
),
.frequency = "once",
.frequency_id = "looks_binary"
.frequency_id = "quantile_col_present"
)
#nolint end
}
#old binary format
seabbs marked this conversation as resolved.
Show resolved Hide resolved
if (forecast_type == "point") {
looks_binary <- check_input_binary(factor(data$observed), data$predicted)
if (is.logical(looks_binary)) {
#nolint start: keyword_quote_linter duplicate_argument_linter
cli_warn(
c(
"All observations are either 1 or 0.",
"i" = "The forecast type was classified as 'point', but it looks like
it could be a binary forecast as well.",
"i" = "In case you want it to be a binary forecast, convert `observed`
to a factor. See `?as_forecast()` for more information."
),
.frequency = "once",
.frequency_id = "looks_binary"
)
#nolint end
}
}
}

# construct class
Expand Down Expand Up @@ -426,6 +432,33 @@ new_forecast <- function(data, classname) {
}


#' Remakes a forecast object with a subclass
#'
#' This function takes a forecast object and modifies its class name by removing
#' the old class name and adding a new class name. It also performs an assertion
#' check on the modified forecast object.
#'
#' @param forecast The forecast object to be modified
#' @param old_classname The name of the class to be removed from the forecast
#' object
#' @param new_classname The name of the class to be added to the forecast object
#' @param verbose A logical value indicating whether to print verbose output
#'
#' @return The modified forecast object
#' @keywords internal
remake_forecast <- function(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could just call as_forecast() instead. The current sample_to_quantile() function does that.
I'd be happy to avoid the additional code complexity of introducing a new function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do this hence the new checking option and it still wouldn't work. This what motivated the question about refactoring as_forecast as the current approach with the type specific checking really doesn't seem feasible for future extensions.

I think we should make a new issue for this (related to the as_forecast issue). There is also a more general point about whether or not as_forecast should be able to reclass a forecast object or whether it does actually make sense workflow wise to have something that explicitly removes and then readds the class.

forecast, old_classname, new_classname, verbose = TRUE
) {
remade_forecast <- forecast
class(remade_forecast) <- setdiff(class(forecast), old_classname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new_forecast() calls as.data.table() which strips the class anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and was having some weird errors. Can you confirm? More generally I think its helpful to make this explicit in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'll check it out and play around a bit with this. Circling back!

remade_forecast <- new_forecast(
remade_forecast, classname = new_classname
)
assert_forecast(remade_forecast, verbose = verbose)
return(remade_forecast)
}


#' @title Test whether an object is a forecast object
#'
#' @description
Expand Down
8 changes: 7 additions & 1 deletion R/summarise_scores.R
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@
#' alternative to specifying `by` directly. If `across` is set, `by` will be
#' ignored. If `across` is `NULL` (default), then `by` will be used.
#' @param fun A function used for summarising scores. Default is [mean()].
#' @param metrics The metrics to summarise. If missing then `get_metrics()`
#' is used internally to infer these.
#' @param ... Additional parameters that can be passed to the summary function
#' provided to `fun`. For more information see the documentation of the
#' respective function.
Expand Down Expand Up @@ -60,13 +62,17 @@ summarise_scores <- function(scores,
by = "model",
across = NULL,
fun = mean,
metrics,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we strictly need an additional argument given that users can specify the metrics in score().

If we decide to have it then I'd vote for making the default metrics = get_metrics(scores).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does get_metric do if the input isn't a score object? That was the rub here.

I only added this so I can could use summarise_scores on more general objects and save introducing more code.

More generally I also quite like the idea that naive users can reduce their score object at this point as many likely won't know to manipulate the metric list going into score regardless of docs.

All of above being said I think this is a new issue to discuss.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it makes sense not to use summarise_scores() here (at least for now).

summarise_scores() ultimately is a glorified one-liner and the actual work happens here:

scores <- scores[, lapply(.SD, fun, ...),
    by = c(by),
    .SDcols = colnames(scores) %like% paste(metrics, collapse = "|")
  ]

Then we could discuss the metrics argument in a separate issue/PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

...) {
# input checking ------------------------------------------------------------
assert_data_frame(scores)
assert_subset(by, names(scores), empty.ok = TRUE)
assert_subset(across, names(scores), empty.ok = TRUE)
assert_function(fun)
metrics <- get_metrics(scores, error = TRUE)
if (missing(metrics)) {
metrics <- get_metrics(scores, error = TRUE)
}


forecast_unit <- get_forecast_unit(scores)

Expand Down
3 changes: 2 additions & 1 deletion R/z_globalVariables.R
Original file line number Diff line number Diff line change
Expand Up @@ -80,5 +80,6 @@ globalVariables(c(
"wis_component_name",
"x",
"y",
"g"
"g",
"target_quantile_level"
))