Issue related to LayerNorm #8
Comments
Hey! The block is structured as

    X_qkv = Attn(layerNorm(X_q), layerNorm(X_kv))
    X_qkv = X_qkv + X_q                      # if required
    X_qkv = X_qkv + MLP(layerNorm(X_qkv))

so "on paper" everything should be OK 🥲. I think this is not a problem :) Could you clarify the description, or share some details of the problem / model architecture?
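For readers following along, a minimal PyTorch sketch of the block described above might look like the following. The class name, the use of `nn.MultiheadAttention`, and the dimension arguments are assumptions for illustration, not the repository's exact code:

```python
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Pre-norm cross-attention: queries attend to a separate key/value stream."""
    def __init__(self, dim, heads=8, mlp_ratio=4, residual=True):
        super().__init__()
        self.residual = residual
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x_q, x_kv):
        # X_qkv = Attn(layerNorm(X_q), layerNorm(X_kv))
        q = self.norm_q(x_q)
        kv = self.norm_kv(x_kv)
        x_qkv, _ = self.attn(q, kv, kv)
        # X_qkv = X_qkv + X_q  (if required)
        if self.residual:
            x_qkv = x_qkv + x_q
        # X_qkv = X_qkv + MLP(layerNorm(X_qkv))
        x_qkv = x_qkv + self.mlp(self.norm_mlp(x_qkv))
        return x_qkv
```

Here `x_q` and `x_kv` would be `(batch, seq, dim)` tensors coming from the two separate networks.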
Hi, thank you for answering so quickly. This is my model architecture.
Hi there!
I still get the 0 metric after the NormLayer, as I mentioned. Have you tried integrating the repo into specific tasks?
Yep, I've tried to train simple text and image classifiers from scratch.
I'm terribly sorry for the late reply. I could not make it work as I expected. I guess that's it; it's probably better suited to being integrated into a Transformer model than into other tasks. Thanks for the reply!
No problem :)
Hello! First of all, thank you a lot for your effort. I can see that it took quite a bit of your time to write such clear code.
However, I have a small question about the Cross Attention class:
When I integrated the repository into my program as the last layer, the outputs of these LayerNorm layers were always 0.
When I removed these norm layers, the code ran fine, but performed much worse than the simple method (say, simply concatenating the inputs and queries).
P.S.: To be more specific, my queries and inputs were taken from two separate nets.
Do you have any idea about it?
Once again, thank you for your great work.
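As a side note on the symptom above: a LayerNorm can output exact zeros when each feature vector is constant along the normalized dimension (including the degenerate case where that dimension has size 1), because the mean-subtraction cancels everything and the default bias is zero. A small illustrative check, with shapes that are hypothetical and not taken from this issue:

```python
import torch
import torch.nn as nn

# If every feature vector is constant along the normalized dimension,
# (x - mean) is exactly zero, so LayerNorm returns its bias (all zeros
# by default) regardless of the input scale.
ln = nn.LayerNorm(4)
x = torch.full((2, 3, 4), 5.0)   # constant along the last dim
print(ln(x).abs().max())          # tensor(0., ...)

# The same thing happens when the normalized dimension has size 1.
ln1 = nn.LayerNorm(1)
y = torch.randn(2, 3, 1)
print(ln1(y).abs().max())         # tensor(0., ...)
```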