
prediction problems #39

Closed · LIngerwsk opened this issue Dec 15, 2020 · 3 comments
@LIngerwsk

Hi, I have read the discussion in #5, but I am still confused about the applicability of this repo to prediction problems. I am working on applying the Transformer to load forecasting: I want to use the load values of the first 168 time steps to predict the load values of the next 24 time steps. So I set the input shape (x) to [batch_size, 168, 1], which is the input of the encoder layers, and the target shape (y) to [batch_size, 24, 1], which is part of the input of the decoder layers. Obviously, this does not work in your code, because K is mismatched (168 != 24): the output shape is still [batch_size, 168, 1] instead of the [batch_size, 24, 1] I want. So I would like to know whether the original Transformer, or your Transformer, can be applied to my problem.
What's more, why is K the same in the encoder layers and decoder layers of your Transformer? In other code I have looked at, the sequence length (K) can differ between the encoder and the decoder.
Thank you very much!
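For what it's worth, the original Transformer has no problem with different source and target lengths: the decoder's cross-attention queries the encoder memory, so a 24-step decoder can attend to a 168-step encoder. A minimal sketch with PyTorch's stock `nn.Transformer` (this is not this repo's API; `embed`, `head`, and all sizes are illustrative):

```python
import torch
import torch.nn as nn

# Minimal sketch of the asymmetric 168 -> 24 setup with the stock
# nn.Transformer (NOT this repo's API; names and sizes are illustrative).
d_model, n_heads, batch_size = 64, 4, 32
embed = nn.Linear(1, d_model)              # scalar load value -> d_model
head = nn.Linear(d_model, 1)               # d_model -> scalar prediction
model = nn.Transformer(d_model=d_model, nhead=n_heads)

x = torch.randn(batch_size, 168, 1)        # history: [batch_size, 168, 1]
y = torch.randn(batch_size, 24, 1)         # targets: [batch_size, 24, 1]

# nn.Transformer expects (seq_len, batch, d_model)
src = embed(x).transpose(0, 1)             # (168, batch, d_model)
tgt = embed(y).transpose(0, 1)             # (24, batch, d_model)
tgt_mask = model.generate_square_subsequent_mask(24)  # causal mask
out = model(src, tgt, tgt_mask=tgt_mask)   # (24, batch, d_model)
pred = head(out).transpose(0, 1)           # back to [batch_size, 24, 1]
```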

@maxjcohen
Owner

Hi, I understand the problem you are trying to solve. I fixed K to be equal in the input and output shapes in order to avoid dealing with prediction problems, as neither I nor this repo are qualified to address them. All the modifications of the original Transformer that I implemented are intended for many-to-many coherent time series problems, where the prediction at time step k depends mostly on input time steps k-Δ:k.

That being said, it doesn't mean that the Transformer can't be adapted for prediction problems, but it will require some modifications.

  • I would start, as you suggested, by relaxing the equality constraint on the input and output sequence lengths. This can also be achieved by adding an output embedding layer after the last decoder.
  • The auto-regressive nature of the Transformer becomes essential here, so make sure to mask subsequent predictions in the decoder (see the sketch after this list).
  • Most alternative implementations aiming at reducing the quadratic complexity, such as ChunkMHA or WindowMHA in this repo, should be avoided, as the model would no longer be coherent.
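To make the second point concrete, here is a minimal sketch of teacher forcing with a causal mask, assuming some generic batch-first encoder-decoder Transformer `model(src, tgt, tgt_mask)`; the zero start token is a hypothetical choice, and all names are illustrative rather than this repo's:

```python
import torch

# Causal ("subsequent") mask: True entries are masked, so query position i
# cannot attend to any position j > i.
def make_causal_mask(size: int) -> torch.Tensor:
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)

# Teacher forcing: the decoder sees the targets shifted right by one step,
# so predicting step t only uses ground truth up to step t-1.
x = torch.randn(32, 168, 1)                    # [batch, 168, 1] history
y = torch.randn(32, 24, 1)                     # [batch, 24, 1] targets
start = torch.zeros(32, 1, 1)                  # hypothetical start token
tgt_in = torch.cat([start, y[:, :-1]], dim=1)  # [batch, 24, 1], shifted
tgt_mask = make_causal_mask(24)                # (24, 24) boolean mask
# pred = model(x, tgt_in, tgt_mask=tgt_mask)   # then train pred against y
```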

@LIngerwsk
Author

Thanks for your answer. So your repo is not applicable to prediction problems; how, then, should I modify the model to apply it to my problem? I have changed the embedding layer, replacing it with a Linear layer, and I changed the output layer to a sigmoid layer. What else should I do to apply the Transformer model to my problem? What's more,

> I fixed K to be equal in the input and output shapes in order to avoid dealing with prediction problems, as neither I nor this repo are qualified to address them

I can't quite make out what you mean: can your repo solve the prediction problem or not? If not, why?
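For reference, the two changes described above might look something like this (a sketch with illustrative names; the sigmoid head assumes the load values are normalized to [0, 1]):

```python
import torch.nn as nn

d_model = 64                              # illustrative model width

# Linear "embedding" for scalar inputs, replacing a token embedding.
input_embedding = nn.Linear(1, d_model)

# Sigmoid output head, which only makes sense if the load values
# are normalized to [0, 1].
output_head = nn.Sequential(
    nn.Linear(d_model, 1),
    nn.Sigmoid(),
)
```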

@maxjcohen
Owner

As I said, in order to adapt this repo to your problem, you could start by:

* I would start, as you suggested, by relaxing the equality constraint on the input and output sequence lengths. This can also be achieved by adding an output embedding layer after the last decoder.

* The auto-regressive nature of the Transformer becomes essential here, so make sure to mask subsequent predictions in the decoder (the auto-regressive inference loop is sketched after this list).

* Most alternative implementations aiming at reducing the quadratic complexity, such as `ChunkMHA` or `WindowMHA` in this repo, should be avoided, as the model would no longer be coherent.
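At inference time the 24 future loads are unknown, so the decoder has to run auto-regressively, feeding each prediction back in. A minimal sketch, assuming the same hypothetical batch-first `model(src, tgt, tgt_mask)` and zero start token as above:

```python
import torch

@torch.no_grad()
def forecast(model, x, horizon=24):
    # x: [batch, 168, 1] history; returns [batch, horizon, 1] predictions.
    tgt = torch.zeros(x.size(0), 1, 1)     # hypothetical start token
    for _ in range(horizon):
        t = tgt.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        out = model(x, tgt, tgt_mask=mask)           # [batch, t, 1]
        tgt = torch.cat([tgt, out[:, -1:]], dim=1)   # append last step
    return tgt[:, 1:]                      # drop the start token
```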
