cvts? #63

Closed
savvaskef opened this issue Jan 14, 2017 · 9 comments

Comments

@savvaskef

Where can I find a tutorial, vignette, or example about cvts? I know there is an example in the manual, but I did not understand how to use the constituents of cvmod1 (which is fully packed with properties). Can you provide a link or an example? Also, as a request on behalf of a wide range of users, I would like to suggest a methodology: why not build an example that goes from easy to complex, progressively adding parameters for the functions along with comments on the background needed? For illustration, I am not very comfortable with logarithms, and they appear as parameters everywhere. Shouldn't there be a short definition of how they are used, i.e. their properties related to the example?

Thanks again on behalf of my "students"

@dashaub
Collaborator

dashaub commented Jan 15, 2017

A vignette for cvts() would definitely be a good addition. I recently clarified the documentation in the GitHub version of the package and will be releasing an update to CRAN within the next few weeks. I'll add some more clarifying comments to the existing examples as well.

Until this lands on CRAN, hopefully these clarifications will help:

  • The models slot contains the model object fit to each CV fold.
  • The forecasts slot of the returned object contains the individual forecasts returned from each CV fold. Each forecast should have the length set by maxHorizon.
  • The residuals slot holds the corresponding residuals from each individual fold. It is a matrix that similarly has a total of maxHorizon columns, corresponding to the forecast error for the 1st, 2nd, ... nth future period from that individual model's CV fold. The number of rows in the matrix corresponds to the total number of folds.
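
As a rough illustration of that layout, the base R sketch below builds a residual matrix by hand, with one row per fold and one column per forecast horizon. It does not call cvts(); the naive "repeat the last value" forecast and the names window_size, max_horizon, train_ends, and resid_mat are assumptions made only for this example.

```r
# Hand-rolled sketch of a fold-by-horizon residual matrix (not the cvts() implementation).
y <- as.numeric(AirPassengers)   # built-in monthly series, 144 observations
window_size <- 48                # assumed index where the first training window ends
max_horizon <- 12                # assumed number of periods forecast per fold

# Last training index of each fold; stepping by max_horizon keeps test blocks disjoint.
train_ends <- seq(window_size, length(y) - max_horizon, by = max_horizon)

resid_mat <- t(sapply(train_ends, function(end) {
  naive_fc <- rep(y[end], max_horizon)           # naive forecast: repeat the last training value
  y[(end + 1):(end + max_horizon)] - naive_fc    # errors for horizons 1..max_horizon
}))

dim(resid_mat)             # number of folds, then max_horizon
colMeans(abs(resid_mat))   # mean absolute error at each horizon
```

Averaging the absolute values down each column gives one error figure per horizon, which is the kind of summary that makes different forecasting methods comparable.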

One of the most interesting things to do is run accuracy() on the cvts object. This can be used to compare the accuracy of several forecasting methods on the time series.

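library(forecast)        # provides thetam(), stlm(), and accuracy()
library(forecastHybrid)  # provides cvts()
accuracy(cvts(AirPassengers, FUN = thetam))
accuracy(cvts(AirPassengers, FUN = stlm))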

Not sure what you mean about logarithms appearing everywhere.

@ganesh-krishnan
Contributor

I wonder if we should add the characteristics of the cvts call itself to the cvts object. Was the object a rolling fit? What was the maxHorizon? What was the windowSize? etc.

@dashaub
Collaborator

dashaub commented Jan 16, 2017

Good idea, I'll save those in the object.

@savvaskef
Author

Is it possible for you to clarify a couple of terms for those who do not understand what the algorithm does? For example: 1) What are folds? 2) What is a rolling fit (and, consequently, what are windowSize and maxHorizon)?

A description of the algorithm would be very helpful. (I bet you can explain all of this in a couple of paragraphs, but it is missing from the manual.)

@savvaskef
Author

Also related seem to be the cv.errors in forecastHybrid. What is the statistic according to which the different models are weighted? Is it computed on a rolling basis or over the whole series?

@ganesh-krishnan
Contributor

ganesh-krishnan commented Jan 16, 2017

@savvaskef the documentation appears plenty clear enough. Regardless, here is some clarification.

Edit: apologies, looks like the documentation was edited recently

Regarding folds, this is not as relevant to time series cross validation and is only meant to serve as an analogy to regular cross validation (non time-series data). In non time-series data, if you perform k-fold cross validation, you will split the dataset into k partitions or folds. For each fold, you will train the model on the other folds and test on the current fold. As an example, let us say you are performing 5 fold cross validation. Then the dataset will be split into folds 1-5. Fold 1 will be held out and folds 2-5 will be used to fit the model. This process will be repeated for fold 2-5. For fold 2, folds 1, 3, 4, 5 will be used for model fitting.
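
The 5-fold procedure described above can be sketched in base R as follows. This is only an illustration of ordinary (non-time-series) k-fold splitting; the helper name kfold_splits and the choice of 20 rows are hypothetical and are not part of forecastHybrid.

```r
# Minimal k-fold split sketch: every row lands in exactly one held-out fold.
kfold_splits <- function(n, k) {
  # Shuffle fold labels 1..k across the n rows so folds are (nearly) equal in size.
  fold_id <- sample(rep(seq_len(k), length.out = n))
  lapply(seq_len(k), function(fold) {
    list(train = which(fold_id != fold),   # rows from the other k - 1 folds
         test  = which(fold_id == fold))   # rows in the held-out fold
  })
}

set.seed(1)                                # reproducible shuffling
splits <- kfold_splits(n = 20, k = 5)

for (i in seq_along(splits)) {
  cat(sprintf("Fold %d: train on %d rows, test on %d rows\n",
              i, length(splits[[i]]$train), length(splits[[i]]$test)))
}
```

Because the rows are shuffled before being assigned to folds, this scheme ignores ordering, which is exactly why it does not carry over directly to time series.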

In non-time-series data, each of the rows or cases is independent of the others and can thus be sampled independently. This is, however, not the case for time series models. Model fitting for time series depends on the observations being sequential. In order to get an idea of the generalization performance of a time series model, the sampling has to be in line with the model fitting procedure. Two ways to do this are to use rolling (or sliding) windows and non-rolling windows. You can read up about it here. It also has nice diagrams to help you understand the procedures.
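
To make the two windowing schemes concrete, here is a generic base R sketch that only computes index ranges. It assumes that "rolling" means a fixed-length training window that slides forward and that "non-rolling" means a training window that keeps its start and grows; cvts() may construct its windows differently. The helper ts_cv_windows and its arguments window_size and max_horizon are hypothetical names for this example.

```r
# Sketch: training/test index ranges for time series cross validation.
ts_cv_windows <- function(n, window_size, max_horizon, rolling = TRUE) {
  # Last training index of each fold; stepping by max_horizon keeps test blocks disjoint.
  train_ends <- seq(from = window_size, to = n - max_horizon, by = max_horizon)
  lapply(train_ends, function(end) {
    # Sliding window keeps a fixed length; otherwise training always starts at observation 1.
    start <- if (rolling) end - window_size + 1 else 1
    list(train = start:end,
         test  = (end + 1):(end + max_horizon))   # the next max_horizon observations
  })
}

n <- length(AirPassengers)                        # built-in monthly series
folds     <- ts_cv_windows(n, window_size = 48, max_horizon = 12, rolling = TRUE)
expanding <- ts_cv_windows(n, window_size = 48, max_horizon = 12, rolling = FALSE)

range(folds[[2]]$train)       # sliding: the start moves forward with each fold
range(expanding[[2]]$train)   # growing: the start stays at the first observation
```

In both schemes the test observations always come after the training observations, so the model is never evaluated on data that precedes what it was fit on.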

Regarding horizons, this is standard forecasting terminology. A simple google search led me to this link. I suggest you spend some time reading up on time series since you seem to have some very basic questions.

@dashaub
Collaborator

dashaub commented Jan 16, 2017

@savvaskef Glad to get feedback on the clarity of this function and how to improve it. When writing the documentation one of my concerns was that it may not be clear to others. I think a vignette could help here a lot, particularly one that includes the type of graphics in Rob's blog post. I'll add this to the package roadmap. @ganesh-krishnan is right that you should probably read up on the terminology outside of just the cvts documentation. The documentation does assume some existing knowledge and probably wouldn't be comprehensible without it. That said, I chose default values here for cvts() that should be sensible for most decently long time series if you just want to use cross validation in hybridModel() with weights = "cv.errors". I can also suggest just playing around with it with different input time series and values of the maxHorizon, rolling, and windowSize parameters and examining the resulting cvts object to see what is going on. Even examining the code for the function could help. The source is all available here after all, so you can see exactly what it is doing.

@dashaub
Collaborator

dashaub commented Jan 17, 2017

Related #66

dashaub closed this as completed Jan 17, 2017

@russellcameronthomas

russellcameronthomas commented Aug 3, 2018

@dashaub If you are writing a vignette for cvts(), I would like you to include examples where you show how to cross validate a hybrid model that has been previously fit using hybridModel(), where model weights have been generated and special arguments are sent to some component models.

It seems obvious that there should be a separate function to cross validate a hybrid model after it is fit. Am I missing something????

I know that there is the cv.errors option in hybridModel(), but I can't get it to work successfully when I include the stlm model because I get the error "series is not periodic or has less than two periods". I've tried everything I can to eliminate this error, with no success. I can successfully run stlm() separately on the same data and the same parameters.
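
One possible cause, offered only as an assumption about this case: base R's stl() raises this same message whenever the series it receives does not contain more than two full seasonal periods, so a cross-validation training window that is too short could trigger it even though the full series fits without trouble. The sketch below reproduces the message with base R alone; the window lengths are chosen purely for illustration.

```r
# Sketch: stl() needs strictly more than two full periods of data.
short_window <- window(AirPassengers, end = c(1950, 12))   # 24 monthly values = exactly 2 periods
msg <- tryCatch({
  stl(short_window, s.window = "periodic")
  "no error"
}, error = function(e) conditionMessage(e))
msg                                                         # the error text, if any

long_window <- window(AirPassengers, end = c(1951, 12))    # 36 monthly values = 3 periods
fit <- stl(long_window, s.window = "periodic")              # fits without error
class(fit)
```

If this is what is happening, a larger windowSize (more than twice the seasonal frequency of the series) would be the first thing to try.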


General comment: your package is very good and very useful. Generally the documentation is good (at least better than most). But like nearly all R packages, even better documentation is, by far, the best way to improve usability and popularity.
