Online algorithm and increasing exponential weights #23

Eluvias · 2019-11-07T17:33:18Z

Issue description

We don’t get the expected results when online = TRUE and the weights
is a vector of increasing exponential weights, i.e., within the sliding
window the most recent observation has the smallest weight and the
oldest the highest weight (grows exponentially as we go further back in time).

Minimal example:

roll_sum(1:5,
         width = 2L,
         weights = c(1, 0),
         online = TRUE)

## [1]  NA NaN NaN NaN NaN

# NOT EXP. WEIGHTS BUT PROBLEMATIC CASE
roll_sum(1:5,
         width = 3L,
         weights = c(1, 0, 0),
         online = TRUE)

## [1]  NA  NA NaN NaN NaN

When switching to off-line it works as expected:

roll_sum(1:5,
         width = 2L,
         weights = c(1, 0),
         online = FALSE)

## [1] NA  1  2  3  4

roll_sum(1:5,
         width = 3L,
         weights = c(1, 0, 0),
         online = FALSE)

## [1] NA NA  1  2  3

The above examples might be considered corner case because of the
selected weights.

So let’s consider the following example that it seems to work as
expected:

# ONLINE AND INCR. EXP. WEIGHTS
roll_sum(1:5,
         width = 2L,
         weights = c(1, 0.8),
         online = TRUE)

## [1]  NA 2.6 4.4 6.2 8.0

# OFFLINE AND INCR. EXP. WEIGHTS
roll_sum(1:5,
         width = 2L,
         weights = c(1, 0.8),
         online = FALSE)

## [1]  NA 2.6 4.4 6.2 8.0

The above examples are equal, although not identical. In addition, we
used a vector of length 5. So let’s re-run the above examples by
increasing the input size.

# ONLINE AND INCR. EXP. WEIGHTS
res.online <- roll_sum(1:200,
                       width = 2L,
                       weights = c(1, 0.8),
                       online = TRUE)

# OFFLINE AND INCR. EXP. WEIGHTS
res.offline <- roll_sum(1:200,
                        width = 2L,
                        weights = c(1, 0.8),
                        online = FALSE)

all.equal(res.offline, res.online)

## [1] "Mean relative difference: 2.976919"

identical(res.offline, res.online)

## [1] FALSE

By increasing the vector size to 200, the online version didn’t give the
expected result. To get an idea of the different results between online
and offine compare the following:

tail(res.offline)

## [1] 350.0 351.8 353.6 355.4 357.2 359.0

tail(res.online)

## [1]  7362.624  9117.580 11310.825 14051.932 17477.865 21759.831

For completeness, for arbitrary weights the algorithim switches to
offline mode and throws a warning:

# ONLINE AND ARB. WEIGHTS
res.online <- roll_sum(1:200,
                       width = 2L,
                       weights = c(1, 0.8, 0.2),
                       online = TRUE)

## Warning in roll_sum(1:200, width = 2L, weights = c(1, 0.8, 0.2), online =
## TRUE): 'online' is only supported for equal or exponential 'weights'

Multiple Test Scenarios

To investigate further, I have created a custom test function to check
the output from base vs roll-online vs roll-offline for sum function.
The custom function accepts two input vectors and has the option to
reverse the weights.

run_check <- function(x, y, wts, rev_wts = FALSE) {

  n <- length(wts)

  if (rev_wts) {
    wts <- rev(wts)
  }

    x_base <- zoo::rollapplyr(x, n, function(x) sum(x * wts), fill = NA)
    y_base <- zoo::rollapplyr(y, n, function(x) sum(x * wts), fill = NA)



  x_online_true <- roll_sum(x, width = n,
                               weights = wts,
                               online = TRUE)

  x_online_false <- roll_sum(x, width = n,
                                weights = wts,
                                online = FALSE)




  y_online_true <- roll_sum(y, width = n,
                                weights = wts,
                                online = TRUE)

  y_online_false <- roll_sum(y, width = n,
                                weights = wts,
                                online = FALSE)


  out <- data.frame(x = c(all.equal(x_base, x_online_false),
                              all.equal(x_online_true, x_online_false)),
                    y = c(all.equal(y_base, y_online_false),
                              all.equal(y_online_true, y_online_false))
                    )


  rownames(out) <- c("base vs offline  : ",
                     "online vs offline: ")
  colnames(out) <- c(paste("x", NROW(x), sep = "_"),
                     paste("y", NROW(y), sep = "_"))

  out
}

Test inputs:

x_100 <- 1:100
x_1000 <- 1:1000

Case: Decreasing Exponential Weights

Takeaways:

All test cases worked as expected.

rv <- FALSE

run_check(x_100, x_1000,  0.5 ^ (3:0), rev_wts = rv)  # works both ways

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  0.51 ^ (3:0), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  0.9 ^ (3:0), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  2 ^ (0:3), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

Case: Increasing Exponential Weights

Similar test cases as above with their weights reversed.

Takeaways:

base vs offline test cases produce equal results.
online vs offline test cases produce inconsistent results.
Correct results when weights either increase or decrease by half,
i.e, 0.5 ^ (3:0) or rev(0.5 ^ (3:0)) will give the same result
under online or offline.

rv <- TRUE

run_check(x_100, x_1000,  0.5 ^ (3:0), rev_wts = rv)  # works both ways

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

run_check(x_100, x_1000,  0.51 ^ (3:0), rev_wts = rv)

##                                           x_100
## base vs offline  :                         TRUE
## online vs offline:  Mean relative difference: 1
##                                          y_1000
## base vs offline  :                         TRUE
## online vs offline:  Mean relative difference: 1

run_check(x_100, x_1000,  0.9 ^ (3:0), rev_wts = rv)

##                     x_100                      y_1000
## base vs offline  :   TRUE                        TRUE
## online vs offline:   TRUE Mean relative difference: 1

run_check(x_100, x_1000,  2 ^ (0:3), rev_wts = rv)

##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

The text was updated successfully, but these errors were encountered:

jasonjfoster · 2019-11-10T10:59:01Z

Thanks for the report. Looks like the second issue is related to floating point arithmetic, which is (unfortunately) because online algorithms are prone to the loss of precision due to round-off errors. The issue is magnified when both the window is rolling (because observations are subtracted) and the exponential weights are increasing, like you have demonstrated.

In terms of solutions, if both of these conditions are met then I think it would make sense for the roll package to warn the user that the likelihood for the loss of precision is high and either suggest or require the offline methodology. My preference is to require the offline methodology because otherwise the user will need to identify the issue, if applicable (and that is unlikely). What do you think?

Also, the first issue is an overlook on my part and will be resolved, i.e. when weights are something like c(1, 0, 0) then the function should use the offline methodology too.

Eluvias · 2019-11-11T18:35:14Z

I will agree that the way to go is to throw a warning and revert to offline algorithm when (1) we pass rolling weights which grow exponentially and (2) the online algorithm is used (default), e.g., online = TRUE.

Here is a table which shows on which weighting scheme the online or offine algorithm is available:

WEIGHTS	ONLINE	OFFLINE
EQUAL	YES	YES
ARBITRARY	NO	YES
EXP-DECAY	YES	YES
EXP-GROWTH	NO	YES

Looking at this table, this question comes naturally: Why the offline algorithm is not the default method?

jangorecki · 2019-11-11T21:34:13Z

I believe online has a smaller time complexity, and it makes sense to offer faster method as the default one

jasonjfoster · 2019-11-15T11:59:09Z

Thanks for the report. I have fixed the issue in the development version:

# install.packages("devtools")
devtools::install_github("jjf234/roll")

I agree with @jangorecki and also my sense is that most users don’t change the default weights argument either. If so, I would expect exponential decay weights to be the next most common use case (as users typically want to emphasize more recent data in time-series). Because equal and exponential-decay weights are the most common use cases, it makes sense to set online = TRUE as the default.

Eluvias · 2019-11-15T23:04:29Z

Thank you for your prompt response on this issue. I confirm now that the issues have been resolved on the dev. version.

Feel free to close it.

jasonjfoster closed this as completed Nov 16, 2019

jasonjfoster added bug enhancement labels Nov 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Online algorithm and increasing exponential weights #23

Online algorithm and increasing exponential weights #23

Eluvias commented Nov 7, 2019 •

edited

jasonjfoster commented Nov 10, 2019

Eluvias commented Nov 11, 2019 •

edited

jangorecki commented Nov 11, 2019

jasonjfoster commented Nov 15, 2019

Eluvias commented Nov 15, 2019

Online algorithm and increasing exponential weights #23

Online algorithm and increasing exponential weights #23

Comments

Eluvias commented Nov 7, 2019 • edited

Issue description

Minimal example:

Multiple Test Scenarios

Case: Decreasing Exponential Weights

Case: Increasing Exponential Weights

jasonjfoster commented Nov 10, 2019

Eluvias commented Nov 11, 2019 • edited

jangorecki commented Nov 11, 2019

jasonjfoster commented Nov 15, 2019

Eluvias commented Nov 15, 2019

Eluvias commented Nov 7, 2019 •

edited

Eluvias commented Nov 11, 2019 •

edited