Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online algorithm and increasing exponential weights #23

Closed
Eluvias opened this issue Nov 7, 2019 · 5 comments
Closed

Online algorithm and increasing exponential weights #23

Eluvias opened this issue Nov 7, 2019 · 5 comments

Comments

@Eluvias
Copy link

Eluvias commented Nov 7, 2019

Issue description

We don’t get the expected results when online = TRUE and the weights
is a vector of increasing exponential weights, i.e., within the sliding
window the most recent observation has the smallest weight and the
oldest the highest weight (grows exponentially as we go further back in time).

Minimal example:

roll_sum(1:5,
         width = 2L,
         weights = c(1, 0),
         online = TRUE)
## [1]  NA NaN NaN NaN NaN
# NOT EXP. WEIGHTS BUT PROBLEMATIC CASE
roll_sum(1:5,
         width = 3L,
         weights = c(1, 0, 0),
         online = TRUE)
## [1]  NA  NA NaN NaN NaN

When switching to off-line it works as expected:

roll_sum(1:5,
         width = 2L,
         weights = c(1, 0),
         online = FALSE)
## [1] NA  1  2  3  4
roll_sum(1:5,
         width = 3L,
         weights = c(1, 0, 0),
         online = FALSE)
## [1] NA NA  1  2  3

The above examples might be considered corner case because of the
selected weights.

So let’s consider the following example that it seems to work as
expected:

# ONLINE AND INCR. EXP. WEIGHTS
roll_sum(1:5,
         width = 2L,
         weights = c(1, 0.8),
         online = TRUE)
## [1]  NA 2.6 4.4 6.2 8.0
# OFFLINE AND INCR. EXP. WEIGHTS
roll_sum(1:5,
         width = 2L,
         weights = c(1, 0.8),
         online = FALSE)
## [1]  NA 2.6 4.4 6.2 8.0

The above examples are equal, although not identical. In addition, we
used a vector of length 5. So let’s re-run the above examples by
increasing the input size.

# ONLINE AND INCR. EXP. WEIGHTS
res.online <- roll_sum(1:200,
                       width = 2L,
                       weights = c(1, 0.8),
                       online = TRUE)

# OFFLINE AND INCR. EXP. WEIGHTS
res.offline <- roll_sum(1:200,
                        width = 2L,
                        weights = c(1, 0.8),
                        online = FALSE)

all.equal(res.offline, res.online)
## [1] "Mean relative difference: 2.976919"
identical(res.offline, res.online)
## [1] FALSE

By increasing the vector size to 200, the online version didn’t give the
expected result. To get an idea of the different results between online
and offine compare the following:

tail(res.offline)
## [1] 350.0 351.8 353.6 355.4 357.2 359.0
tail(res.online)
## [1]  7362.624  9117.580 11310.825 14051.932 17477.865 21759.831

For completeness, for arbitrary weights the algorithim switches to
offline mode and throws a warning:

# ONLINE AND ARB. WEIGHTS
res.online <- roll_sum(1:200,
                       width = 2L,
                       weights = c(1, 0.8, 0.2),
                       online = TRUE)
## Warning in roll_sum(1:200, width = 2L, weights = c(1, 0.8, 0.2), online =
## TRUE): 'online' is only supported for equal or exponential 'weights'

Multiple Test Scenarios

To investigate further, I have created a custom test function to check
the output from base vs roll-online vs roll-offline for sum function.
The custom function accepts two input vectors and has the option to
reverse the weights.

run_check <- function(x, y, wts, rev_wts = FALSE) {

  n <- length(wts)

  if (rev_wts) {
    wts <- rev(wts)
  }

    x_base <- zoo::rollapplyr(x, n, function(x) sum(x * wts), fill = NA)
    y_base <- zoo::rollapplyr(y, n, function(x) sum(x * wts), fill = NA)



  x_online_true <- roll_sum(x, width = n,
                               weights = wts,
                               online = TRUE)

  x_online_false <- roll_sum(x, width = n,
                                weights = wts,
                                online = FALSE)




  y_online_true <- roll_sum(y, width = n,
                                weights = wts,
                                online = TRUE)

  y_online_false <- roll_sum(y, width = n,
                                weights = wts,
                                online = FALSE)


  out <- data.frame(x = c(all.equal(x_base, x_online_false),
                              all.equal(x_online_true, x_online_false)),
                    y = c(all.equal(y_base, y_online_false),
                              all.equal(y_online_true, y_online_false))
                    )


  rownames(out) <- c("base vs offline  : ",
                     "online vs offline: ")
  colnames(out) <- c(paste("x", NROW(x), sep = "_"),
                     paste("y", NROW(y), sep = "_"))

  out
}

Test inputs:

x_100 <- 1:100
x_1000 <- 1:1000

Case: Decreasing Exponential Weights

Takeaways:

  • All test cases worked as expected.
rv <- FALSE

run_check(x_100, x_1000,  0.5 ^ (3:0), rev_wts = rv)  # works both ways
##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE
run_check(x_100, x_1000,  0.51 ^ (3:0), rev_wts = rv)
##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE
run_check(x_100, x_1000,  0.9 ^ (3:0), rev_wts = rv)
##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE
run_check(x_100, x_1000,  2 ^ (0:3), rev_wts = rv)
##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE

Case: Increasing Exponential Weights

Similar test cases as above with their weights reversed.

Takeaways:

  • base vs offline test cases produce equal results.
  • online vs offline test cases produce inconsistent results.
  • Correct results when weights either increase or decrease by half,
    i.e, 0.5 ^ (3:0) or rev(0.5 ^ (3:0)) will give the same result
    under online or offline.
rv <- TRUE

run_check(x_100, x_1000,  0.5 ^ (3:0), rev_wts = rv)  # works both ways
##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE
run_check(x_100, x_1000,  0.51 ^ (3:0), rev_wts = rv)
##                                           x_100
## base vs offline  :                         TRUE
## online vs offline:  Mean relative difference: 1
##                                          y_1000
## base vs offline  :                         TRUE
## online vs offline:  Mean relative difference: 1
run_check(x_100, x_1000,  0.9 ^ (3:0), rev_wts = rv)
##                     x_100                      y_1000
## base vs offline  :   TRUE                        TRUE
## online vs offline:   TRUE Mean relative difference: 1
run_check(x_100, x_1000,  2 ^ (0:3), rev_wts = rv)
##                     x_100 y_1000
## base vs offline  :   TRUE   TRUE
## online vs offline:   TRUE   TRUE
@jasonjfoster
Copy link
Owner

Thanks for the report. Looks like the second issue is related to floating point arithmetic, which is (unfortunately) because online algorithms are prone to the loss of precision due to round-off errors. The issue is magnified when both the window is rolling (because observations are subtracted) and the exponential weights are increasing, like you have demonstrated.

In terms of solutions, if both of these conditions are met then I think it would make sense for the roll package to warn the user that the likelihood for the loss of precision is high and either suggest or require the offline methodology. My preference is to require the offline methodology because otherwise the user will need to identify the issue, if applicable (and that is unlikely). What do you think?

Also, the first issue is an overlook on my part and will be resolved, i.e. when weights are something like c(1, 0, 0) then the function should use the offline methodology too.

@Eluvias
Copy link
Author

Eluvias commented Nov 11, 2019

I will agree that the way to go is to throw a warning and revert to offline algorithm when (1) we pass rolling weights which grow exponentially and (2) the online algorithm is used (default), e.g., online = TRUE.

Here is a table which shows on which weighting scheme the online or offine algorithm is available:

WEIGHTS ONLINE OFFLINE
EQUAL YES YES
ARBITRARY NO YES
EXP-DECAY YES YES
EXP-GROWTH NO YES

Looking at this table, this question comes naturally: Why the offline algorithm is not the default method?

@jangorecki
Copy link

I believe online has a smaller time complexity, and it makes sense to offer faster method as the default one

@jasonjfoster
Copy link
Owner

Thanks for the report. I have fixed the issue in the development version:

# install.packages("devtools")
devtools::install_github("jjf234/roll")

I agree with @jangorecki and also my sense is that most users don’t change the default weights argument either. If so, I would expect exponential decay weights to be the next most common use case (as users typically want to emphasize more recent data in time-series). Because equal and exponential-decay weights are the most common use cases, it makes sense to set online = TRUE as the default.

@Eluvias
Copy link
Author

Eluvias commented Nov 15, 2019

Thank you for your prompt response on this issue. I confirm now that the issues have been resolved on the dev. version.

Feel free to close it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants