I replaced the nn.Embedding layer (token embeddings) with a simple linear layer, but the model fails to fit the data.
I also trained with many different configurations, but it fails to overfit even a very small time series (100 steps). Since it cannot overfit such a small dataset, I think something is missing or wrong.
Any idea how to properly apply this model to time series?
Could the problem be in the positional embeddings, or in the LayerNorms?
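For reference, the input change described above looks roughly like the sketch below. This is only an illustration, not the exact code from my run: the class name `SeriesInput`, the sizes, and the sine-wave toy series are placeholders, and it assumes learned positional embeddings added to the projected inputs.

```python
import torch
import torch.nn as nn

class SeriesInput(nn.Module):
    """Hypothetical input block: a linear projection in place of nn.Embedding."""

    def __init__(self, n_features, d_model, max_len):
        super().__init__()
        # Real-valued inputs cannot index an embedding table, so project them instead.
        self.proj = nn.Linear(n_features, d_model)
        # Learned positional embeddings, one vector per time step (an assumption).
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, x):
        # x: (batch, time, n_features)
        t = torch.arange(x.size(1), device=x.device)
        # self.pos(t) has shape (time, d_model) and broadcasts over the batch.
        return self.proj(x) + self.pos(t)

torch.manual_seed(0)
# Toy series with 100 steps, matching the size of the overfitting test.
x = torch.sin(torch.linspace(0, 12.0, 100)).view(1, 100, 1)
block = SeriesInput(n_features=1, d_model=32, max_len=100)
h = block(x)  # (1, 100, 32), fed to the transformer blocks
```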
I found some ideas. The paper below applies convolution across time and then uses the outputs as Q, K, V for the multi-head attention mechanism. (I didn't dig much, but it probably applies some transformation to the conv outputs, as in the normal case.) https://arxiv.org/abs/1907.00235
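A minimal sketch of that idea in PyTorch, not the paper's reference implementation. It assumes causal 1D convolutions produce the queries and keys while the values use a plain point-wise linear projection (my reading of the paper; worth checking against it). The class name `ConvSelfAttention` and all sizes are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvSelfAttention(nn.Module):
    """Hypothetical causal self-attention with convolutional queries and keys."""

    def __init__(self, d_model, n_heads, kernel_size=3):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.pad = kernel_size - 1  # left padding keeps the convolution causal
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.v_proj = nn.Linear(d_model, d_model)  # point-wise values (assumption)
        self.out = nn.Linear(d_model, d_model)

    def _causal_conv(self, conv, x):
        # (B, T, C) -> (B, C, T), pad only on the left so step t sees steps <= t.
        x = F.pad(x.transpose(1, 2), (self.pad, 0))
        return conv(x).transpose(1, 2)  # back to (B, T, C)

    def forward(self, x):
        B, T, C = x.shape

        def split(t):
            # (B, T, C) -> (B, heads, T, C // heads)
            return t.reshape(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)

        q = split(self._causal_conv(self.q_conv, x))
        k = split(self._causal_conv(self.k_conv, x))
        v = split(self.v_proj(x))
        att = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
        # Standard causal mask: each step attends only to itself and the past.
        mask = torch.tril(torch.ones(T, T, dtype=torch.bool, device=x.device))
        att = att.masked_fill(~mask, float("-inf")).softmax(dim=-1)
        y = (att @ v).transpose(1, 2).reshape(B, T, C)
        return self.out(y)

torch.manual_seed(0)
attn = ConvSelfAttention(d_model=16, n_heads=4, kernel_size=3)
x = torch.randn(2, 10, 16)
y = attn(x)  # same shape as the input: (2, 10, 16)
```

With `kernel_size=1` this reduces to ordinary causal self-attention, so it can be dropped into an existing block and the kernel size varied to see whether the local context helps the model fit the series.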