[ENH] panel forecasting should also work to forecast only a single time series (not all) #4209
Hm, it would be much appreciated if you could try whether you can cause the problem with an on-board dataset. To generate datasets, you can use the built-in generators. (And if you cannot cause this with an on-board dataset, that would also be useful information, together with the code that you tried.)
I couldn't quite find a panel dataset. Perhaps I missed something? In any case, I think I have quite a minimal example below, with just 9 rows and 3 time series in one dataset. My repro:
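Since the original snippet did not survive the page extraction, here is a pandas-only sketch of what such a 9-row, 3-series panel looks like in sktime's panel convention (a 2-level instance/time MultiIndex); the column and level names here are assumptions, not the poster's actual code:

```python
import pandas as pd

# A hypothetical minimal panel: 3 time series ("instances"),
# 3 monthly time points each, i.e. 9 rows in one DataFrame.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], pd.period_range("2000-01", periods=3, freq="M")],
    names=["instances", "timepoints"],
)
y = pd.DataFrame({"y": range(9)}, index=index)
print(y)
```

Each instance is identified by the outer index level, which is what the rest of the discussion refers to when selecting a subset of series.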
results in
I believe the key here is the "instances" value of the second time series (1). Only the first time series (0) is actually present, because I just so happen to be interested only in 0 this time. But I may have more queries at a later point. At that later point, I may just query for a few points of time series 1, or perhaps 2. Or maybe I'm using sktime completely the wrong way here. In that case, I would very much appreciate your feedback, of course 🙂
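To make the failure mode concrete: if code loops over all instances seen in training and indexes the supplied data per instance, the absent instance raises the `KeyError`. A pandas-only illustration with hypothetical labels (this is not sktime's actual internals):

```python
import pandas as pd

# Panel slice containing only instance 0 (the one we are interested in).
index = pd.MultiIndex.from_product(
    [[0], range(3)], names=["instances", "timepoints"]
)
X_subset = pd.DataFrame({"feature": [1.0, 2.0, 3.0]}, index=index)

# Code that assumes every trained instance is present would also look up
# instance 1 -- and fail, since only instance 0 was supplied:
try:
    X_subset.loc[[1]]
    caught = False
except KeyError:
    caught = True
print(caught)
```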
@romanlutz, sorry for the late reply, was travelling. I think the issue is a feature limitation: there is currently no way to tell a forecaster to predict only a subset of the instances it was trained on. Would appreciate your thoughts.
Thanks, @fkiraly, no worries at all! We all live lives outside of open source 😄 I'm usually working with Azure ML forecasting models and they all support it. Hmm, this is kind of a blocker for my scenario. I am building a little interactive visualization and need to be able to query for individual time series forecasts rather than everything. I could have a model per time series, of course, but that seems a bit tedious. Alternatively, I could get all the forecasts and then discard what I don't need. In any case, it seems like this is more of a feature request than a bug 🤣 Should we rephrase it, or close this and open a new item with a clearly stated feature request? I understand that it's not necessarily something that'll happen anytime soon, but capturing the request may be useful regardless.
Hm, can you provide a code snippet from Azure ML forecasting for the same scenario?
Yes, if we were to build the interface, then this is probably what it would default to for most forecasters. I could also look into whether there's any quick way to enable that for global forecasters.
FYI @danbartl, @KishManani, @ilkersigirci - would this be a useful feature for reducers?
Absolutely, sounds like a good thing to have.
I agree, it would be a good addition for reduced models.
I think we'd certainly want the ability to train on multiple instances but then perform inference on a subset of instances. This is quite easy to do with global forecasting models (i.e., via reduction).
I see - seems like a common enough scenario. I/we'll need to think about what exactly the interface should look like. E.g., how do we tell the forecaster that we want a forecast only for series 0 and 1? I think it needs to be encoded in the arguments to `predict`.
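Until such an interface exists, the workaround mentioned earlier in the thread (forecast all instances, then discard what isn't needed) is a plain selection on the outer index level. A sketch with made-up prediction values:

```python
import pandas as pd

# Hypothetical full panel forecast over instances 0, 1, 2.
index = pd.MultiIndex.from_product(
    [[0, 1, 2], pd.period_range("2000-04", periods=2, freq="M")],
    names=["instances", "timepoints"],
)
y_pred_all = pd.DataFrame({"y": [10, 11, 20, 21, 30, 31]}, index=index)

# "Only series 0 and 1" is then a selection on the outer index level,
# which keeps the MultiIndex structure intact.
y_pred_subset = y_pred_all.loc[[0, 1]]
```

The cost, of course, is that forecasts for the discarded instances are still computed, which is exactly what the requested feature would avoid.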
@romanlutz, I rephrased this as a feature request. It would be interesting to see some code - hypothetical or real (e.g., Azure ML) - showing how it could look.
I think this is the closest to what you're asking for: https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-forecast#forecasting-with-a-trained-model
hm, it is hard to read in terms of specification. Where is the information passed about the instance/subset that you want to forecast for?
We only pass the rows in the test data that we want forecasts for. The time series identifying columns are just normal columns in the dataset (unlike sktime, where they are in the index).
Ah, so it's basically a forecasting horizon equivalent that also has the instance identifier?
Here's a notebook: https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-forecast-function/auto-ml-forecasting-function.ipynb Some snippets that may be helpful:
This is just bundled into an AutoML config that is passed to Azure ML, which ... deals with it. When it's done, you can download the trained model. There's also a section about forecasting away from the data which I found interesting. Not related to this at all, but you might find that interesting, too 🙂
Getting back to your earlier comment regarding what this should look like, @fkiraly:
The reason this works a little more easily for Azure ML is that the time series ID columns are part of X; in other words, X always exists even if there are no features. I don't really see how it would relate to the forecasting horizon, though.
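The layout difference described here can be shown side by side; a sketch with made-up data (column names are assumptions, the series labels follow the repro below):

```python
import pandas as pd

# Azure-ML-style long format: the series identifier is an ordinary column.
df_long = pd.DataFrame({
    "store": ["tropicana", "tropicana", "dominicks", "dominicks"],
    "week": [1, 2, 1, 2],
    "sales": [10.0, 11.0, 20.0, 21.0],
})

# Subsetting to one series is a plain row filter ...
azure_subset = df_long[df_long["store"] == "tropicana"]

# sktime-style panel: identifier and time live in a 2-level MultiIndex.
df_panel = df_long.set_index(["store", "week"]).sort_index()

# ... and, in the index-based layout, an outer-level selection.
sktime_subset = df_panel.loc[["tropicana"]]
```

In the column-based layout, "which instances to forecast" falls out of the rows you pass; in the index-based layout, it would have to be an explicit part of the interface.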
update by @fkiraly - summarizing the below, I think this is a combination of a feature request - "forecasters should support forecasting on a subset of instances, if panel data" - and a frustrated user expectation that it already works. Overall, an interesting idea; labelling this as "API design" in addition, to discuss how it should look.
Describe the bug
If I train a model on multiple time series but provide only a single one of the time series for `predict`, I get a `KeyError`.

To Reproduce
results in
`X_test.iloc[:20]` corresponds to the first time series ("tropicana"), but "dominicks" is not in this subset. It is very much part of the training data, though.

Expected behavior
I would hope that this is supported since I may sometimes need to predict just for one time series and sometimes for another or multiple.
Additional context
Versions