slider provides a family of general purpose “sliding window” functions. The API is purposefully very similar to purrr. The goal of these functions is usually to compute rolling averages, cumulative sums, rolling regressions, or other “window” based computations.
There are 3 core functions in slider:
slide() iterates over your data like purrr::map(), but uses a sliding window to do so. It is type-stable, and always returns a result with the same size as its input.
slide_index() computes a rolling calculation relative to an index. If you have ever wanted to compute something like a “3 month rolling average” where the number of days in each month is irregular, you might like this function.
slide_period() is similar to slide_index() in that it slides relative to an index, but it first breaks the index up into “time blocks”, like 2 month blocks of time, and then it slides over .x using indices defined by those blocks.
Each of these core functions has the same variants as purrr::map(). For example, slide() has slide_dbl(), slide2(), and pslide(), along with the other combinations of these variants that you might expect from having previously used purrr.
To learn more about these three functions, read the introduction vignette.
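As a rough illustration of how the variants mirror purrr, the sketch below runs one typed variant, one two-input variant, and one list-input variant. The input vectors and the helper function passed to pslide_dbl() are made up for this example.

```r
library(slider)

x <- 1:4
y <- 5:8

# slide_dbl() is the typed variant of slide(): it returns a double vector
slide_dbl(x, sum, .before = 1)
#> [1] 1 3 5 7

# slide2_dbl() slides over two inputs in parallel, like purrr::map2_dbl()
slide2_dbl(x, y, ~ sum(.x) + sum(.y), .before = 1)
#> [1]  6 14 18 22

# pslide_dbl() slides over a list of inputs, like purrr::pmap_dbl()
pslide_dbl(list(x, y, 9:12), function(a, b, c) sum(a, b, c))
#> [1] 15 18 21 24
```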
There is also a set of extremely fast specialized variants of slide_dbl() for the most common use cases. These include slide_sum() for rolling sums and slide_mean() for rolling averages. There are index variants of each of these as well, like slide_index_sum().
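A minimal sketch of the specialized variants, assuming a version of slider recent enough to include them. Note that their window arguments are spelled without a leading dot (before rather than .before). The data here is invented for illustration.

```r
library(slider)

x <- c(1, 2, 3, 4, 5)

# General purpose version: calls the R function sum() once per window
general <- slide_dbl(x, sum, .before = 2)

# Specialized version: the rolling sum is computed without calling back into R
special <- slide_sum(x, before = 2)

all.equal(general, special)
#> [1] TRUE

# Index variant: a rolling sum over "the current day + 2 days before"
i <- as.Date("2019-08-15") + c(0, 1, 3, 4, 8)
by_date <- slide_index_sum(x, i, before = 2)
by_date
#> [1] 1 3 5 7 5
```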
Install the released version from CRAN with:

    install.packages("slider")
Install the development version from GitHub with:
The help page for
slide() has many
examples, but here are a few:
The classic example would be to do a moving average. slide() handles this with a combination of the .before and .after arguments, which control the width of the window and the alignment.
    # Moving average (Aligned right)
    # "The current element + 2 elements before"
    slide_dbl(1:5, ~mean(.x), .before = 2)
    #> [1] 1.0 1.5 2.0 3.0 4.0

    # Align left
    # "The current element + 2 elements after"
    slide_dbl(1:5, ~mean(.x), .after = 2)
    #> [1] 2.0 3.0 4.0 4.5 5.0

    # Center aligned
    # "The current element + 1 element before + 1 element after"
    slide_dbl(1:5, ~mean(.x), .before = 1, .after = 1)
    #> [1] 1.5 2.0 3.0 4.0 4.5
By setting .before to Inf, you can do a “cumulative slide” to compute cumulative expressions. I think of this as saying “give me everything before the current element”.
    slide(1:4, ~.x, .before = Inf)
    #> [[1]]
    #> [1] 1
    #>
    #> [[2]]
    #> [1] 1 2
    #>
    #> [[3]]
    #> [1] 1 2 3
    #>
    #> [[4]]
    #> [1] 1 2 3 4
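To make the “cumulative” idea concrete, here is a small sketch with made-up data that reproduces base R's cumsum() and cummax() using an unbounded window.

```r
library(slider)

x <- c(3, 1, 4, 1, 5)

# Each window holds the current element and everything before it
running_total <- slide_dbl(x, sum, .before = Inf)
running_total
#> [1]  3  4  8  9 14

running_max <- slide_dbl(x, max, .before = Inf)
running_max
#> [1] 3 3 4 4 5

# Same results as the dedicated base R functions
identical(running_total, cumsum(x))
#> [1] TRUE
```

In practice cumsum() is the better tool for this particular job; the unbounded window is more useful when .f has no built-in cumulative counterpart.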
With .complete, you can decide whether or not .f should be evaluated on incomplete windows. In the following example, the requested window size is 3, but the first two results are computed on windows of size 1 and 2 because partial results are allowed by default. When .complete is set to TRUE, the first two results are not computed.
    slide(1:4, ~.x, .before = 2)
    #> [[1]]
    #> [1] 1
    #>
    #> [[2]]
    #> [1] 1 2
    #>
    #> [[3]]
    #> [1] 1 2 3
    #>
    #> [[4]]
    #> [1] 2 3 4

    slide(1:4, ~.x, .before = 2, .complete = TRUE)
    #> [[1]]
    #> NULL
    #>
    #> [[2]]
    #> NULL
    #>
    #> [[3]]
    #> [1] 1 2 3
    #>
    #> [[4]]
    #> [1] 2 3 4
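With the typed variants, results that are skipped because of .complete = TRUE show up as NA rather than NULL. A short sketch of that behavior:

```r
library(slider)

# Partial windows are allowed by default
partial <- slide_dbl(1:4, mean, .before = 2)
partial
#> [1] 1.0 1.5 2.0 3.0

# Only complete windows of size 3 are evaluated; the rest are NA
complete <- slide_dbl(1:4, mean, .before = 2, .complete = TRUE)
complete
#> [1] NA NA  2  3
```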
slide() iterates over data frames in a row wise fashion. Interestingly this means the default of slide() becomes a generic row wise iterator, with nice syntax for accessing data frame columns. There is a vignette specifically about this.
    mini_cars <- cars[1:4,]

    slide(mini_cars, ~.x)
    #> [[1]]
    #>   speed dist
    #> 1     4    2
    #>
    #> [[2]]
    #>   speed dist
    #> 1     4   10
    #>
    #> [[3]]
    #>   speed dist
    #> 1     7    4
    #>
    #> [[4]]
    #>   speed dist
    #> 1     7   22

    slide_dbl(mini_cars, ~.x$speed + .x$dist)
    #> [1]  6 14 11 29
This makes rolling regressions trivial!
    library(tibble)

    set.seed(123)

    df <- tibble(
      y = rnorm(100),
      x = rnorm(100)
    )

    # Window size of 20 rows
    # The current row + 19 before
    # (see slide_index() for how to do this relative to a date vector!)
    df$regressions <- slide(df, ~lm(y ~ x, data = .x), .before = 19, .complete = TRUE)

    df[15:25,]
    #> # A tibble: 11 × 3
    #>         y      x regressions
    #>     <dbl>  <dbl> <list>
    #>  1 -0.556  0.519 <NULL>
    #>  2  1.79   0.301 <NULL>
    #>  3  0.498  0.106 <NULL>
    #>  4 -1.97  -0.641 <NULL>
    #>  5  0.701 -0.850 <NULL>
    #>  6 -0.473 -1.02  <lm>
    #>  7 -1.07   0.118 <lm>
    #>  8 -0.218 -0.947 <lm>
    #>  9 -1.03  -0.491 <lm>
    #> 10 -0.729 -0.256 <lm>
    #> 11 -0.625  1.84  <lm>
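If only one number per regression is needed, a typed variant can return it directly instead of storing whole model objects. The sketch below is one possible way to do that; the slope() helper is hypothetical and written just for this example, and a base data frame is used to keep it self-contained.

```r
library(slider)

set.seed(123)

df <- data.frame(
  y = rnorm(100),
  x = rnorm(100)
)

# Hypothetical helper: fit the model on one window and return the slope on x
slope <- function(data) {
  fit <- lm(y ~ x, data = data)
  coef(fit)[["x"]]
}

# One slope per row, NA wherever fewer than 20 rows are available
slopes <- slide_dbl(df, slope, .before = 19, .complete = TRUE)

length(slopes)
#> [1] 100
sum(is.na(slopes))
#> [1] 19
```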
In many business settings, the value you want to compute is tied to some index, like a date vector. In these cases, you’ll probably want to compute sliding windows relative to the index, and not using the fixed windows that slide() provides. You can use slide_index() to pass in .x and an index, .i, and the window will be calculated relative to that index.
Here, when computing a “2 day window”, you probably don’t want "2019-08-16" and "2019-08-18" to be grouped together. slide() has no concept of an index, so when you specify a window size of 2, it will group these two together. slide_index(), on the other hand, will do the right thing.
    x <- 1:3

    i <- as.Date(c("2019-08-15", "2019-08-16", "2019-08-18"))

    # slide() has no concept of an "index"
    slide(x, ~.x, .before = 1)
    #> [[1]]
    #> [1] 1
    #>
    #> [[2]]
    #> [1] 1 2
    #>
    #> [[3]]
    #> [1] 2 3

    # "index aware"
    slide_index(x, i, ~.x, .before = 1)
    #> [[1]]
    #> [1] 1
    #>
    #> [[2]]
    #> [1] 1 2
    #>
    #> [[3]]
    #> [1] 3
Essentially what happens is that when we get to "2019-08-18", it “looks backwards” 1 day to set a window boundary at "2019-08-17". Since the date at position 2, "2019-08-16", is before "2019-08-17", it is not included.
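The same boundary logic applies to the typed variants. As a sketch with invented daily values, a “3 day” rolling mean over irregular dates could look like this:

```r
library(slider)

i <- as.Date("2019-08-15") + c(0, 1, 3, 4, 8)
x <- c(10, 20, 30, 40, 50)

# For each date, the window covers [date - 2 days, date]
rolling <- slide_index_dbl(x, i, mean, .before = 2)
rolling
#> [1] 10 15 25 35 50
```

The last value is computed from a single observation, because no other date falls within 2 days of "2019-08-23".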
Powerfully, you can pass through any object to .before that computes a value from .i - .before. This means that you could also have used a lubridate period object (which gets even more interesting when you use longer periods such as months() or years()).
    slide_index(x, i, ~.x, .before = lubridate::days(1))
    #> [[1]]
    #> [1] 1
    #>
    #> [[2]]
    #> [1] 1 2
    #>
    #> [[3]]
    #> [1] 3
slide_period() is different from slide_index() in that it first breaks the index into “time blocks” and then slides over .x relative to those blocks. For example, in the monthly period slide below, i is broken up into 4 time blocks of “the current block of monthly data, plus one block before this one”. The locations of those blocks are the locations that are used to slice .x.
    i <- as.Date(c(
      "2019-01-29",
      "2019-01-30",
      "2019-02-05",
      "2019-04-01",
      "2019-05-10"
    ))

    slide_period(i, i, "month", ~.x, .before = 1)
    #> [[1]]
    #> [1] "2019-01-29" "2019-01-30"
    #>
    #> [[2]]
    #> [1] "2019-01-29" "2019-01-30" "2019-02-05"
    #>
    #> [[3]]
    #> [1] "2019-04-01"
    #>
    #> [[4]]
    #> [1] "2019-04-01" "2019-05-10"
One neat thing to notice is that
slide_period() is aware of the
distance between elements of
.i in the period you specify. The
practical implication of this is that in the above example, group 3 with
2019-04-01 did not include
2019-02-05 in it, because it is more
than 1 month group away.
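Here is a sketch of the same monthly blocks applied to a separate vector of values, which is the more typical use. The values in x are made up; note that the result has one element per time block, not one per element of x.

```r
library(slider)

i <- as.Date(c(
  "2019-01-29",
  "2019-01-30",
  "2019-02-05",
  "2019-04-01",
  "2019-05-10"
))

x <- c(1, 2, 3, 4, 5)

# "Total for the current month + the month before it"
totals <- slide_period_dbl(x, i, "month", sum, .before = 1)
totals
#> [1] 3 6 4 9
```

The third result is 4 rather than 7 because March has no data and February is more than one monthly block before April.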
This package is inspired heavily by SQL’s window functions. The API is similar, but more general because you can iterate over any kind of R object.
There have been multiple attempts at creating sliding window functions
(I personally created
rollify(), and worked a little bit on
tsibble::slide() with Earo Wang).
I believe that slider is the next iteration of these. There are a few reasons for this:
To me, the API is more intuitive, and is more flexible because .before and .after let you completely control the entry point (as opposed to fixed entry points like "center" or "left").
It is objectively faster because it is written purely in C.
With slide_vec() you can return any kind of object, and are not limited to the suffixed versions such as _dbl and _int.
It iterates rowwise over data frames, consistent with the vctrs framework.
I believe it is overall more consistent, backed by a theory that can always justify the sliding window generated by any combination of the parameters.
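As a small illustration of the slide_vec() point, the sketch below returns a Date vector, a type that has no dedicated suffixed variant. The dates are invented for the example.

```r
library(slider)

dates <- as.Date(c("2019-01-03", "2019-01-01", "2019-01-02"))

# "Most recent date among the current element and the one before it"
latest <- slide_vec(dates, max, .before = 1)

class(latest)
#> [1] "Date"
latest
#> [1] "2019-01-03" "2019-01-03" "2019-01-02"
```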
Earo and I have spoken, and we have mutually agreed that it would be best to deprecate tsibble::slide() in favor of slider::slide().
Additionally, data.table’s non-equi joins have been pretty much the only solution to the problem that slide_index() tries to solve. Their solution is robust and quite fast, and has been a nice benchmark for slider. slider is trying to solve a much narrower problem, so the API here is more focused.
Like purrr::map(), the core functions of slider, such as slide() and slide_index(), are optimized in C to be as fast as possible, but there is overhead involved in calling .f repeatedly. These functions are meant to be as general purpose as possible, at the cost of some performance. This means that slider can be used for more abstract computations, like rolling regressions, or any other custom function that you want to use in a rolling fashion.
slider also provides specialized functions for some of the most common use cases, such as slide_mean() and slide_index_sum(). These compute their corresponding metric at the C level, using a specialized algorithm, and are often much faster than the equivalent call to slide_dbl() or slide_index_dbl().
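One way to see the difference on your own machine is a quick timing comparison. This is only a rough sketch: the vector length and window size are arbitrary, system.time() is a coarse tool, and the timings will vary, so no output is shown for them.

```r
library(slider)

set.seed(123)
x <- rnorm(1e5)

# General purpose: mean() is called from R once per window
general <- system.time(
  slow <- slide_dbl(x, mean, .before = 99, .complete = TRUE)
)

# Specialized: the rolling mean is computed at the C level
special <- system.time(
  fast <- slide_mean(x, before = 99, complete = TRUE)
)

# The two approaches should agree up to floating point tolerance
all.equal(slow, fast)

# Elapsed seconds for each approach
general[["elapsed"]]
special[["elapsed"]]
```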
I’ve found the following references very useful to understand more about window functions:
Code of Conduct
Please note that the slider project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.