
prediction problems #39

Closed · LIngerwsk opened this issue Dec 15, 2020 · 3 comments
@LIngerwsk

Hi, I have read the discussion in #5, but I am still confused about the applicability of this repo to prediction problems. I am working on applying the Transformer to load forecasting: I want to use the load values of the first 168 time steps to predict the load values of the next 24 time steps. So I set the input shape (x) to [batch_size, 168, 1], which is the input of the encoder layers, and the target shape (y) to [batch_size, 24, 1], which is part of the input of the decoder layers. Obviously, this does not work in your code, because K is mismatched (168 != 24): the output shape is still [batch_size, 168, 1] instead of the [batch_size, 24, 1] I want. So I would like to know whether the original Transformer, or your Transformer, can be applied to my problem.
What's more, why is K the same in the encoder layers and decoder layers of your Transformer? In other code I have looked at, the sequence length (K) can differ between the encoder and the decoder.
Thank you very much!
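For what it's worth, the original Transformer has no problem with different source and target lengths: the decoder's cross-attention queries the encoder memory, so a 24-step decoder can attend to a 168-step encoder. A minimal sketch with PyTorch's stock `nn.Transformer` (this is not this repo's API; `embed`, `head`, and all sizes are illustrative):

```python
import torch
import torch.nn as nn

# Minimal sketch of the asymmetric 168 -> 24 setup with the stock
# nn.Transformer (NOT this repo's API; names and sizes are illustrative).
d_model, n_heads, batch_size = 64, 4, 32
embed = nn.Linear(1, d_model)              # scalar load value -> d_model
head = nn.Linear(d_model, 1)               # d_model -> scalar prediction
model = nn.Transformer(d_model=d_model, nhead=n_heads)

x = torch.randn(batch_size, 168, 1)        # history: [batch_size, 168, 1]
y = torch.randn(batch_size, 24, 1)         # targets: [batch_size, 24, 1]

# nn.Transformer expects (seq_len, batch, d_model)
src = embed(x).transpose(0, 1)             # (168, batch, d_model)
tgt = embed(y).transpose(0, 1)             # (24, batch, d_model)
tgt_mask = model.generate_square_subsequent_mask(24)  # causal mask
out = model(src, tgt, tgt_mask=tgt_mask)   # (24, batch, d_model)
pred = head(out).transpose(0, 1)           # back to [batch_size, 24, 1]
```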

@maxjcohen
Owner

Hi, I understand the problem you are trying to solve. I fixed K to be equal in the input and output shapes in order to avoid dealing with prediction problems, as neither I nor this repo are qualified to address them. All the modifications of the original Transformer that I implemented are intended for many-to-many coherent time series problems, where the prediction at time step k depends mostly on input time steps k-Δ:k.

That being said, it doesn't mean that the Transformer can't be adapted for prediction problems, but it will require some modifications.

  • I would start, as you suggested, by relaxing the equality constraint on the input and output sequence lengths. This can also be achieved by adding an output embedding layer after the last decoder.
  • The auto-regressive nature of the Transformer becomes essential here, so make sure to mask subsequent predictions in the decoder (see the sketch after this list).
  • Most alternative implementations aiming at reducing the quadratic complexity, such as ChunkMHA or WindowMHA in this repo, should be avoided, as the model would no longer be coherent.
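To make the second point concrete, here is a minimal sketch of teacher forcing with a causal mask, assuming some generic batch-first encoder-decoder Transformer `model(src, tgt, tgt_mask)`; the zero start token is a hypothetical choice, and all names are illustrative rather than this repo's:

```python
import torch

# Causal ("subsequent") mask: True entries are masked, so query position i
# cannot attend to any position j > i.
def make_causal_mask(size: int) -> torch.Tensor:
    return torch.triu(torch.ones(size, size, dtype=torch.bool), diagonal=1)

# Teacher forcing: the decoder sees the targets shifted right by one step,
# so predicting step t only uses ground truth up to step t-1.
x = torch.randn(32, 168, 1)                    # [batch, 168, 1] history
y = torch.randn(32, 24, 1)                     # [batch, 24, 1] targets
start = torch.zeros(32, 1, 1)                  # hypothetical start token
tgt_in = torch.cat([start, y[:, :-1]], dim=1)  # [batch, 24, 1], shifted
tgt_mask = make_causal_mask(24)                # (24, 24) boolean mask
# pred = model(x, tgt_in, tgt_mask=tgt_mask)   # then train pred against y
```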

@LIngerwsk
Author

Thanks for your answer. So your repo is not applicable to prediction problems; how, then, should I modify the model to apply it to my problem? I have changed the embedding layer, replacing it with a Linear layer, and I changed the output layer to a sigmoid layer. What else should I do to apply the Transformer model to my problem? What's more,

> I fixed K to be equal in the input and output shapes in order to avoid dealing with prediction problems, as neither I nor this repo are qualified to address them

I can't quite make out what you mean: can your repo solve the prediction problem or not? If not, why?
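For reference, the two changes described above might look something like this (a sketch with illustrative names; the sigmoid head assumes the load values are normalized to [0, 1]):

```python
import torch.nn as nn

d_model = 64                              # illustrative model width

# Linear "embedding" for scalar inputs, replacing a token embedding.
input_embedding = nn.Linear(1, d_model)

# Sigmoid output head, which only makes sense if the load values
# are normalized to [0, 1].
output_head = nn.Sequential(
    nn.Linear(d_model, 1),
    nn.Sigmoid(),
)
```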

@maxjcohen
Owner

As I said, in order to adapt this repo to your problem, you could start by:

* I would start, as you suggested, by relaxing the equality constraint on the input and output sequence lengths. This can also be achieved by adding an output embedding layer after the last decoder.

* The auto-regressive nature of the Transformer becomes essential here, so make sure to mask subsequent predictions in the decoder (the auto-regressive inference loop is sketched after this list).

* Most alternative implementations aiming at reducing the quadratic complexity, such as `ChunkMHA` or `WindowMHA` in this repo, should be avoided, as the model would no longer be coherent.
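At inference time the 24 future loads are unknown, so the decoder has to run auto-regressively, feeding each prediction back in. A minimal sketch, assuming the same hypothetical batch-first `model(src, tgt, tgt_mask)` and zero start token as above:

```python
import torch

@torch.no_grad()
def forecast(model, x, horizon=24):
    # x: [batch, 168, 1] history; returns [batch, horizon, 1] predictions.
    tgt = torch.zeros(x.size(0), 1, 1)     # hypothetical start token
    for _ in range(horizon):
        t = tgt.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        out = model(x, tgt, tgt_mask=mask)           # [batch, t, 1]
        tgt = torch.cat([tgt, out[:, -1:]], dim=1)   # append last step
    return tgt[:, 1:]                      # drop the start token
```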
