Prediction on unseen data #83

Closed
randomgitdude opened this issue Oct 7, 2020 · 10 comments
Labels
question Further information is requested

Comments

@randomgitdude

Hi,
Firstly - thank you for the time, work, and commitment that went into this package. It's all good stuff. Yet one thing I'm struggling with is how to check predictions on data that was not seen by the trainer class (from the documentation). I guess I should append it to the original data - but do you have any good practices you can share?

@AlexMRuch

Hi @randomgitdude, check out #67 and the updated tutorials at https://github.com/jdb78/pytorch-forecasting/blob/master/docs/source/tutorials/stallion.ipynb.

@jdb78 jdb78 added the question Further information is requested label Oct 8, 2020
@randomgitdude
Author

@AlexMRuch Thank you for pointing me to that updated tutorial. However, I have a few questions:

  1. encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]
    This indeed creates a new data set - but what about scaling these features? AFAIK only the TimeSeriesDataSet class does that, and if we don't call that class on the new data up front, we are feeding the NN the unseen data "as-is", without the pre-processing it was initially trained with.

  2. I managed to get predictions with the following steps (consolidated into a sketch below):
    -> Loading the new data
    -> Standard pre-processing (assigning categorical variables)
    -> Assigning a date index, the same way as for the training set:
    bdays = pd.bdate_range("2006-01-01", "2030-01-01")
    date_map = dict(zip(bdays, np.arange(len(bdays)) + 1))
    df_valid["Date_Time"] = df_valid["Date_Time"].map(date_map).astype(int)
    -> Creating abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True)
    -> Then testing_sample = abc.to_dataloader(train=False)
    -> Finally raw_predictions = best_tft.predict(testing_sample)

But I wonder if this is the correct approach, or am I missing something?
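Pulling those steps together, a minimal sketch (assuming training is the TimeSeriesDataSet the model was fitted on, best_tft the trained model, and df_valid the pre-processed new data, all as named above):

```python
from pytorch_forecasting import TimeSeriesDataSet

# reuse the pre-processing (encoders, normalizers, scalers) fitted on `training`
abc = TimeSeriesDataSet.from_dataset(
    training, df_valid, predict=True, stop_randomization=True
)

# predict=True keeps only the last max_prediction_length points of each
# series for decoding; train=False disables shuffling
testing_sample = abc.to_dataloader(train=False, batch_size=64)
raw_predictions = best_tft.predict(testing_sample)
```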

@AlexMRuch

For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that the model was trained on? If you mean the latter, I'm not sure, as my time-series skills are not that great and @jdb78 may have more thoughts. For historical data, I think you can still do all the scaling you need on the data DataFrame before this step, as the lambda is only slicing off a section of the DataFrame.
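For illustration, a sketch of that ordering (assuming a hypothetical continuous column "volume" in data, and max_encoder_length as in the tutorial):

```python
from sklearn.preprocessing import StandardScaler

# scale the full DataFrame first ("volume" is a placeholder covariate column)
scaler = StandardScaler()
data["volume"] = scaler.fit_transform(data[["volume"]])

# ...then slice off the encoder window, as in the tutorial snippet
encoder_data = data[lambda x: x.time_idx > x.time_idx.max() - max_encoder_length]
```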

For point 2, I'm glad to hear you got the predictions working. I implemented the forecasting methods just as @jdb78 did in the tutorial, and the results have face validity with what I'd expect (and they do differ from the evaluation plots), so that's about all I can say on whether the approach is correct.

@randomgitdude
Author

> For point 1, do you mean scaling the future target and covariate data, or scaling the past historical data that the model was trained on?

Future target data.

> For historical data, I think you can still do all the scaling you need on the data DataFrame before this step,

I was referring to future data. As for the historical data - it is actually scaled inside the training dataset class, so my assumption is that its statistics should also be used to scale the future data. Why? Simply because you have to scale the future values according to the mean and std of the training data, not according to the mean and std of the future values themselves.
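To illustrate the point with a plain sklearn scaler (a generic sketch, not the package's internal mechanism; train_df, future_df, and the "target" column are hypothetical):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# fit on the training period only, so mean/std come from the training data
train_df["target"] = scaler.fit_transform(train_df[["target"]])
# reuse the training statistics on the future data - no refitting
future_df["target"] = scaler.transform(future_df[["target"]])
```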

As for point no. 2, maybe @jdb78 can elaborate?

@AlexMRuch

Ah, yeah, I definitely see your point and am curious to know what is best-practice as well! Thanks for clarifying!

@jdb78
Owner

jdb78 commented Oct 9, 2020

Issue #51 sheds some light on this. There are basically two approaches, and both are implemented in PyTorch Forecasting.
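For concreteness, the two approaches map onto the target_normalizer argument of TimeSeriesDataSet: fit the normalization on the training set (e.g. per series via GroupNormalizer), or normalize each sample on its own encoder sequence via EncoderNormalizer, which carries over naturally to unseen data. A sketch, with all column names and lengths as placeholders:

```python
from pytorch_forecasting import TimeSeriesDataSet
from pytorch_forecasting.data import EncoderNormalizer, GroupNormalizer

common = dict(
    time_idx="time_idx",
    target="target",
    group_ids=["series"],
    max_encoder_length=60,
    max_prediction_length=20,
)

# Approach 1: target statistics fitted on the training set, per series
training_a = TimeSeriesDataSet(
    data, target_normalizer=GroupNormalizer(groups=["series"]), **common
)

# Approach 2: target normalized on each encoder sequence individually
training_b = TimeSeriesDataSet(
    data, target_normalizer=EncoderNormalizer(), **common
)
```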

@randomgitdude
Author

The first option is off the table for various reasons - at least IMHO.
Now, the second: the EncoderNormalizer inherits from pytorch_forecasting.data.encoders.TorchNormalizer, which in turn inherits from sklearn's standard sklearn.base.BaseEstimator and sklearn.base.TransformerMixin. So far so good - but in that case, should I use it inside the TimeSeriesDataSet class or before feeding data into it? Because if I understand correctly, TimeSeriesDataSet only normalizes the target with the EncoderNormalizer.
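For what it's worth, the sklearn-style inheritance means the normalizer can also be exercised on its own, something like this sketch (assuming fit/transform/inverse_transform behave as the inheritance suggests):

```python
import torch
from pytorch_forecasting.data.encoders import EncoderNormalizer

normalizer = EncoderNormalizer()
encoder_target = torch.tensor([10.0, 12.0, 11.0, 13.0, 12.5])

normalizer.fit(encoder_target)                   # statistics from this encoder window only
scaled = normalizer.transform(encoder_target)    # normalized values
restored = normalizer.inverse_transform(scaled)  # back to the original scale
```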

@jdb78
Owner

jdb78 commented Oct 10, 2020

In practice, there should be minimal leakage from normalising on the entire training set instead of on the encoder sequence, provided the variable in question is not the target. Normalising something other than the target on the encoder sequence only would probably not work, because the normalisation would not be stable. If you want to contribute this feature, feel invited to raise a PR!
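If the goal is to normalise a covariate on the full training set, that can be wired in through the dataset's scalers argument, e.g. (a sketch; "discount" and the other column names are placeholders):

```python
from sklearn.preprocessing import StandardScaler
from pytorch_forecasting import TimeSeriesDataSet

training = TimeSeriesDataSet(
    data,
    time_idx="time_idx",
    target="target",
    group_ids=["series"],
    max_encoder_length=60,
    max_prediction_length=20,
    time_varying_unknown_reals=["discount"],
    # the covariate scaler is fitted on the whole training set,
    # not per encoder window
    scalers={"discount": StandardScaler()},
)
```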

@randomgitdude
Author

Ok - so a few questions:

  1. Why would the normalization not be stable?
  2. In the given case, does normalizing the target yield any benefit?
  3. Is abc = TimeSeriesDataSet.from_dataset(training, df_valid, predict=True, stop_randomization=True) a viable way of pre-processing an unseen dataset?

@jdb78
Owner

jdb78 commented Oct 12, 2020

Sure.

  1. Calculating variance when most of the values are constant is likely to be difficult (e.g. a price that is mostly constant). You can imagine the normalization changing vastly just by moving a few timesteps, which prevents learning useful information (see the sketch after this list).
  2. NNs have trouble outputting unnormalised numbers. It is possible, but you run into issues because all the non-linearities are built for values between roughly -2 and 2. Further, normalisation makes values across time series comparable, hence facilitating transfer learning.
  3. Yes, because it means that you copy the pre-processors from training to abc.
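A quick numeric illustration of point 1 (plain numpy, not package code): with a mostly-constant series, shifting the encoder window by a few timesteps changes the scale statistics dramatically.

```python
import numpy as np

# a mostly-constant price series with one recent jump
series = np.array([100.0] * 58 + [130.0, 131.0])

window_a = series[:30]    # encoder window before the jump
window_b = series[30:60]  # encoder window including the jump

print(window_a.std())  # 0.0  -> normalization is degenerate (zero scale)
print(window_b.std())  # ~7.6 -> a few timesteps later the scale is completely different
```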

Hope this is helpful.
