Is the Point Forecast calculation wrong? #37

Closed
mhnierhoff opened this Issue Jun 8, 2016 · 18 comments


@mhnierhoff

First of all, really great package with tons of adjustment possibilities. I really like it.
But it seems to me that the point forecast calculation is off (please check the attached image). If it is not wrong, can you please explain the calculation to me? Thanks in advance.

[Attached plot: rplot_060816_060538_pm]

@ellisp
Owner
ellisp commented Jun 8, 2016

That does look odd. Can you provide a reproducible example?

@dashaub
Collaborator
dashaub commented Jun 8, 2016

nnetar models currently do not produce prediction intervals (this is, however, on the 2016 roadmap for the "forecast" package, see here: http://robjhyndman.com/hyndsight/forecast7-part-2/). For now the ensemble's upper prediction interval is taken as the highest upper bound among the component models, and the lower interval as the lowest lower bound. If the forecast from the nnetar component model falls outside the range of these bounds, the problem you are seeing can occur. It's not exactly a bug, but it certainly isn't a feature. I'm accepting that behavior for now until we get proper prediction intervals.
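The min/max combination described above can be sketched with hypothetical numbers (illustrative values only, not forecastHybrid internals):

```r
# At one forecast horizon, suppose two component models report 95% bounds
# and the ensemble simply takes the widest ones.
upper_by_model <- c(auto.arima = 480, stlm = 495)  # component upper bounds
lower_by_model <- c(auto.arima = 430, stlm = 445)  # component lower bounds

ensemble_upper <- max(upper_by_model)  # 495
ensemble_lower <- min(lower_by_model)  # 430

# An nnetar point forecast of, say, 510 contributes to the weighted mean
# but has no interval of its own, so the combined point forecast can land
# outside [430, 495], as in the attached plot.
```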

In general you're probably better off leaving weights = "equal" instead of using weights = "insample.errors". Neural net models tend to overfit on the training set and show small in-sample errors, so they get higher weights than they deserve in the ensemble, and this type of problem will manifest itself more often.

@dashaub
Collaborator
dashaub commented Jun 8, 2016

Also in issue #19 we're considering ways of adjusting the prediction intervals. While that wouldn't exactly solve this problem (since the nnetar models don't produce prediction intervals), there is probably utility in allowing alternative methods beyond the simple max/min for each prediction interval.

@dashaub dashaub added the bug label Jun 8, 2016
@ellisp
Owner
ellisp commented Jun 9, 2016

For now, perhaps we should make the upper prediction interval max(prediction interval 1, prediction interval 2, ..., point estimate 1, point estimate 2, ...) and the lower prediction interval the equivalent min. That way the interval is at least forced to include the highest and lowest point estimates when a contributing model doesn't have a prediction interval.
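A minimal sketch of that proposal, with hypothetical numbers: the combined bounds are forced to cover every component point estimate, including those from models without intervals.

```r
# Upper/lower bounds from the component models that produce intervals
component_upper <- c(495, 480)
component_lower <- c(430, 445)

# Point estimates from all components, including an interval-less nnetar at 510
point_estimates <- c(450, 470, 510)

ensemble_upper <- max(component_upper, point_estimates)  # 510, now covers nnetar
ensemble_lower <- min(component_lower, point_estimates)  # 430
```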

@ellisp ellisp self-assigned this Jun 9, 2016
@mhnierhoff

Hi @ellisp & @dashaub, thanks for your quick replies. Although it has now been clarified that this is effectively a bug, here is a reproducible example:

suppressPackageStartupMessages(c(
  library(data.table),
  library(lubridate),
  library(xts),
  library(zoo),
  library(magrittr),
  library(plyr),
  library(dplyr),
  library(forecast),
  library(forecastHybrid),
  library(fma)))

####-1-####
beer_Hybrid_Model_1 <- hybridModel( y = beer,
  models = "ns",
  n.args = list(repeats = 5000, size = 25),
  s.args = list(robust = TRUE, biasadj = TRUE),
  weights = "insample.errors", 
  errorMethod = "MASE")

beer_Hybrid_FC_1 <- forecast(beer_Hybrid_Model_1, h = 18)
beer_Hybrid_Acc_1 <- t(accuracy(beer_Hybrid_FC_1))
beer_Hybrid_Acc_1
        Training set
ME        0.21129899
RMSE      3.28243652
MAE       2.39216998
MPE       0.04648739
MAPE      1.69597399
MASE      0.26053336
ACF1     -0.17526242

Plot V1: [attached image]

####-2-####
beer_Hybrid_Model_2 <- hybridModel(y = beer,
  models = "ns",
  n.args = list(repeats = 5000, size = 25),
  s.args = list(robust = TRUE, biasadj = TRUE))

beer_Hybrid_FC_2 <- forecast(beer_Hybrid_Model_2, h = 18)
beer_Hybrid_Acc_2 <- t(accuracy(beer_Hybrid_FC_2))
beer_Hybrid_Acc_2
        Training set
ME         0.4324798
RMSE       4.1162260
MAE        3.2629172
MPE        0.1657647
MAPE       2.2770645
MASE       0.3553672
ACF1      -0.2103333

Plot V2: [attached image]

####-3-####
beer_Hybrid_Model_3 <- hybridModel(y = beer,
  models = "ns",
  n.args = list(repeats = 5000, size = 25),
  s.args = list(robust = TRUE, biasadj = TRUE),
  weights = "equal")

beer_Hybrid_FC_3 <- forecast(beer_Hybrid_Model_3, h = 18)
beer_Hybrid_Acc_3 <- t(accuracy(beer_Hybrid_FC_3))
beer_Hybrid_Acc_3
        Training set
ME         0.4319815
RMSE       4.1376981
MAE        3.2780442
MPE        0.1645838
MAPE       2.2877603
MASE       0.3570147
ACF1      -0.2111934

Plot V3: [attached image]

As you can see, in none of the three cases does the point forecast sit in the middle of the prediction intervals, as you would expect of a mean.

Thanks again for your work and your commitment. 👍

@dashaub
Collaborator
dashaub commented Jun 10, 2016

@ellisp Nice idea

@mhnierhoff I didn't run the first two examples, but the point estimates in the third example look fine to me. Am I missing something?

# All true
beer_Hybrid_FC_3$mean ==
  (beer_Hybrid_Model_3$weights["nnetar"] * beer_Hybrid_FC_3$nnetar$mean +
     beer_Hybrid_Model_3$weights["stlm"] * beer_Hybrid_FC_3$stlm$mean)

@dashaub
Collaborator
dashaub commented Jun 10, 2016

@ellisp On second thought, there is a fairly large potential downside to this: package stability and user transparency. In cases such as the models above where the bug occurs, the offending model probably isn't the best component model (since its forecast strongly disagrees with the others), so pulling the overall point estimate closer to the others would be beneficial for out-of-sample accuracy. However, I don't like the idea of doing this without documenting the unexpected behavior and at least throwing a warning. Furthermore, once prediction intervals are finally implemented in the "forecast" package, we could remove this temporary measure, but then point forecasts from the old and new versions would not match, which would be very bad for reproducibility.

It might just be better to accept this idiosyncratic behavior for now. The point forecasts are more important than the prediction intervals, so tampering with them seems a little wrong. This scenario also shouldn't arise too often (it's more likely when the nnetar model gets a large weight and few other models are used, as here). Finally, which upper/lower prediction interval do we extend to the point forecast? With the default level = c(80, 95), the adjusted bounds will end up at different numbers than with level = 70, level = 80, or fan = TRUE, which is another undesirable characteristic of this approach.

@mhnierhoff Here are the other two models. They look fine too.

# All true
beer_Hybrid_FC_1$mean ==
  (beer_Hybrid_Model_1$weights["nnetar"] * beer_Hybrid_FC_1$nnetar$mean +
     beer_Hybrid_Model_1$weights["stlm"] * beer_Hybrid_FC_1$stlm$mean)

# All true
beer_Hybrid_FC_2$mean ==
  (beer_Hybrid_Model_2$weights["nnetar"] * beer_Hybrid_FC_2$nnetar$mean +
     beer_Hybrid_Model_2$weights["stlm"] * beer_Hybrid_FC_2$stlm$mean)

@mhnierhoff

@dashaub Running your check on the hybrid model from the first posted screenshot, I get this error message:

Error in NextMethod(.Generic) : cannot assign 'tsp' to zero-length vector

@dashaub
Collaborator
dashaub commented Jun 10, 2016 edited

Does the left-hand side or the right-hand side of that comparison run fine on its own? If the error is on the right-hand side, which part?

@mhnierhoff

@dashaub Sorry, I don't know what you mean by the right and left side.
Independent of that, I've recalculated the very first forecast after your package update. Again, the point forecast isn't really the mean in my opinion, so I calculated the mean of the 95% upper and lower bounds and inserted it as a red line into the attached plot.

[Attached plot: rplot01_061316_110612_am]

@dashaub
Collaborator
dashaub commented Jun 13, 2016

@mhnierhoff To be clear, as far as I've seen it is the prediction intervals that are at fault here. The mean forecast is constructed at each forecast horizon by taking the individual models' point forecasts and multiplying each by its model's weight. The prediction intervals will show this bug if you use an nnetar model, because nnetar currently does not produce a prediction interval. Try running your forecasts again with other component models, such as models = "aest", and the intervals should look normal.
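The weighted combination of point forecasts can be sketched with hypothetical weights and values:

```r
# Hypothetical weights and component point forecasts at a single horizon
weights   <- c(nnetar = 0.6, stlm = 0.4)
forecasts <- c(nnetar = 510, stlm = 450)

# The ensemble point forecast is the weighted sum of the component forecasts
ensemble_point <- sum(weights * forecasts)  # 0.6 * 510 + 0.4 * 450 = 486
```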

@mhnierhoff
mhnierhoff commented Jun 13, 2016 edited

@dashaub Okay, I got it. But unfortunately I get the best accuracy values when I use an "ns" model. And I have compared the point forecast of a plain stlf model from the "forecast" package with the manually calculated hybrid point forecast, and they are really close.

@dashaub
Collaborator
dashaub commented Jun 13, 2016

@mhnierhoff Keep in mind that in-sample errors are usually a poor measure of how the model will forecast in the future, so using withheld data or cross validation will give a better performance metric; I'd personally never use weights = "insample.errors". If you're already using CV or a withheld test set for selecting the best model and the prediction interval/point forecast discrepancy still bothers you, you can make the prediction intervals wider by setting level to a larger number, e.g. 99, 99.9, etc., until the point forecast is eventually captured in the intervals. They still won't be the "mean" in the sense of being in the middle of the prediction intervals, but they will asymptotically converge to the middle as level approaches 100, and as we know in statistics, asymptotic convergence indicates everything is ok ;)

@mhnierhoff

@dashaub Thanks a lot for your helpful replies. 👍

@dashaub
Collaborator
dashaub commented Oct 13, 2016

@mhnierhoff Good news, "forecast" v7.3 was released today and included an argument in forecast.nnetar() to create prediction intervals for nnetar objects. I've modified the code so these are now allowed in the ensemble. Could you grab the development version (0.3.0) of the package from github and see if this has fixed your problem?

@mhnierhoff

@dashaub Hi, thanks for the reminder. I tested it with the code above, and unfortunately the "forecast" function doesn't finish. I don't get any error message, but it runs indefinitely.
I've updated to forecast v7.3 and used your dev version.

@dashaub
Collaborator
dashaub commented Oct 17, 2016

Rerunning the example above works for me. I changed repeats to 50 since it would be very slow otherwise. Keep in mind that running forecast.nnetar() to create prediction intervals is slow since it has to run simulations; on my i7 processor this takes about 1 minute with only 50 repeats. The time should scale roughly linearly with repeats, so repeats = 5000 would take about 100 minutes. Try it again with a smaller repeats. The results look much better than above.

beer_Hybrid_Model_1 <- hybridModel( y = beer,
  models = "ns",
  n.args = list(repeats = 50, size = 25),
  s.args = list(robust = TRUE, biasadj = TRUE),
  weights = "insample.errors", 
  errorMethod = "MASE")
beer_Hybrid_FC_1 <- forecast(beer_Hybrid_Model_1, h = 18)
plot(beer_Hybrid_FC_1)

[Attached plot: rplot]

@mhnierhoff

Changing from 5000 to 50 repeats also works for me, so everything works now. Thanks a lot for your efforts.

@dashaub dashaub closed this Oct 19, 2016