As far as I understand, reductions are an important part of sktime, but this is my first time seeing this, so I'm not particularly sure why this approach works. I watched some sktime talks (like https://youtu.be/Wf2naBHRo8Q?t=1880) and read the sktime papers [1, 2]. I'm also familiar with basic forecasting techniques like ARIMA-GARCH and exponential smoothing. What these papers say is that "a common example of reduction is to solve classical forecasting through time series regression via a sliding window approach and iteration over the forecasting horizon" [2] and "we first split the training series into fixed-length windows and stack them on top of each other. This gives us a matrix of lagged values in a tabular format, and thus allows us to apply any tabular regression algorithm" [1].

When discussing reduction and sliding windows, both papers cite [3], which in turn cites papers from the 90s and earlier. Are there any more recent treatments of this "reduction + sliding windows" approach? I found another 2022 talk whose conclusion is that "forecasting can be treated as a tabular ML task [reduction?] and can compete with statistical models" (https://youtu.be/9QtL7m3YS9I?t=2044), but I don't think this talk really explained why such an approach works. In particular, since the windows are constructed from dependent data, overlapping windows share observations, so the resulting (window, target) pairs are clearly not independent, which seems to conflict with the usual i.i.d. assumptions of supervised learning.

Could someone please recommend some papers or maybe textbooks that explain what reduction is, why it works, and how to use it in more detail? Sorry if such requests aren't allowed.
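To make sure I understand the sliding window tabulation quoted from [1] correctly, here is a small numpy sketch of how I picture it (the series, window length, and variable names are just my own illustration, not anything from the papers):

```python
import numpy as np

# toy series standing in for the training series
y = np.arange(10, dtype=float)  # [0., 1., ..., 9.]

window_length = 3  # hypothetical choice

# "split the training series into fixed-length windows and stack them":
# each row of X holds the window_length values preceding the target
X = np.column_stack(
    [y[i : len(y) - window_length + i] for i in range(window_length)]
)
y_target = y[window_length:]

# X[0] == [0., 1., 2.] with target y_target[0] == 3., and so on;
# any tabular regressor can now be fit on (X, y_target)
```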
I think the reference for "reduction" in the context of forecasting and ML, specifically the sliding window tabulation strategies, is indeed the Bontempi paper. In sktime, we use it slightly more generally, in that it is a composition where one estimator of a certain type is used to solve a problem of a different type, e.g., a supervised regressor is used for forecasting as part of an algorithm that can call the regressor interface. For this more general concept, you can have a look at our 2019 and 2020 papers, or also "Designing Machine Learning Toolboxes: Concepts, Principles and Patterns" (2021), https://arxiv.org/abs/2101.04938.
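For concreteness, this is roughly what that composition looks like through the sktime interface. A minimal sketch, assuming a reasonably recent sktime version; the choice of regressor, window length, and strategy are purely illustrative:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sktime.datasets import load_airline
from sktime.forecasting.compose import make_reduction

y = load_airline()

# wrap a tabular regressor as a forecaster via sliding-window reduction;
# window_length and strategy are illustrative choices, not recommendations
regressor = GradientBoostingRegressor()
forecaster = make_reduction(regressor, window_length=12, strategy="recursive")

forecaster.fit(y)
y_pred = forecaster.predict(fh=[1, 2, 3])  # forecast the next three periods
```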
Perhaps the most prominent recent discussion is around the M4 competition, and the M5 competition, where the ML approaches reappear in the accuracy/probabilistic context. A good starting point is "The M4 Competition: Results, findings, conclusion and way forward".
Regarding the independence question: a common assumption for ML methods on tabular data, i.e., in the supervised learning space, is indeed i.i.d. data/label pairs. However, this is neither sufficient for a method to work - imagine a bad algorithm - nor necessary (see below). It's just a common assumption in the analysis of algorithms.

Why it works for time series can be gleaned from an example where the data are stationary. Assume a situation where we have a stationary time series and take only non-overlapping windows that are far apart: by stationarity, the resulting (window, target) pairs all follow the same distribution, and if the autocorrelation decays, pairs far apart are approximately independent, so the usual supervised learning picture roughly applies. If you now add in more overlapping blocks, the sliding window reduction method just gets "more information", and it's similar enough that it usually won't make things worse. More quantitative statements can be derived, along the lines of using an "effective sample size", which, due to auto-correlation, is smaller than the nominal number of windows.

Now, I cannot point to a paper that carries out the full formal argument to show that this is indeed the case for abstract ML models, but it also shouldn't be too hard. (Note: a key assumption in this discussion was stationarity, or, more weakly, certain properties of the autocorrelation.)
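To make the effective-sample-size point a bit more tangible, here is a rough back-of-envelope sketch (my own illustration, not taken from any of the papers above), using the standard approximation n_eff ≈ n / (1 + 2 Σ_k ρ(k)) on a simulated AR(1) series:

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a stationary AR(1) series: y_t = phi * y_{t-1} + eps_t
phi, n = 0.8, 1000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# sample autocorrelations up to a truncation lag
def acf(x, lag):
    x = x - x.mean()
    return np.array(
        [np.dot(x[: len(x) - k], x[k:]) / np.dot(x, x) for k in range(1, lag + 1)]
    )

rho = acf(y, lag=50)

# effective sample size: n / (1 + 2 * sum of autocorrelations);
# for AR(1) with phi = 0.8 this is roughly n * (1 - phi) / (1 + phi), i.e. about n / 9
n_eff = n / (1 + 2 * rho.sum())
print(f"nominal n = {n}, effective n ~ {n_eff:.0f}")
```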