
Issue related to LayerNorm #8

Closed
quangnguyenbn99 opened this issue Nov 15, 2021 · 7 comments

quangnguyenbn99 commented Nov 15, 2021

Hello! First of all, thank you a lot for your effort; I can see it took you quite a lot of time to write such clear code.
However, I have a small question about the CrossAttention class:

        self.kv_layer_norm = nn.LayerNorm(kv_dim)
        self.q_layer_norm = nn.LayerNorm(q_dim)
        self.qkv_layer_norm = nn.LayerNorm(q_dim)

When I integrated the repository into my program as the last layer, the outputs of these LayerNorm layers were always 0.
When I removed these norm layers, the code ran pretty well, but much worse than the simple method (let's say, simply concatenating the inputs and queries).
P.S.: To be more specific, my queries and inputs are taken from 2 separate nets.
Do you have any idea about it?
Once again, thank you a lot for your great work.
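For context, nn.LayerNorm normalizes over the last (feature) dimension and computes weight * (x - mean) / std + bias with bias initialized to zero, so its output only collapses to all zeros when the features it normalizes are (near-)constant. A minimal sketch (not from the repo) illustrating both cases:

    import torch
    import torch.nn as nn

    q_dim = 300
    layer_norm = nn.LayerNorm(q_dim)  # normalizes over the last (feature) dimension

    # Typical case: features vary, so the normalized output is clearly non-zero
    queries = torch.randn(8, 1, q_dim)            # (batch, 1, q_dim), as in the issue
    print(layer_norm(queries).abs().mean())       # roughly 0.8, non-zero

    # Degenerate case: constant features -> (x - mean) == 0 -> output is all zeros
    constant_queries = torch.full((8, 1, q_dim), 3.0)
    print(layer_norm(constant_queries).abs().mean())  # ~0.0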

@esceptico esceptico self-assigned this Nov 15, 2021
esceptico (Owner) commented

Hey!
This follows from the original article (page 17).

X_qkv = Attn(layerNorm(X_q), layerNorm(X_kv))
X_qkv = X_qkv + X_q  # if required
X_qkv = X_qkv + MLP(layerNorm(X_qkv))

So "on paper" everything should be ok 🥲.

> P.S.: To be more specific, my queries and inputs are taken from 2 separate nets.

I think this is not a problem :)

Could you clarify the problem a bit, or share some details of the model architecture?

quangnguyenbn99 (Author) commented Nov 16, 2021

Hi, thank you for answering so quickly.
As I mentioned, I take 2 outputs (1 MLP, 1 RNN) and feed them into PerceiverIO:

A_inputs -> RNN_Out (Batch, 500, 64) -> Inputs
B_inputs -> Graphnet_Out (last layer) (Batch, 1, 300) -> Queries
-> PerceiverIO_Out (Batch, 1, 300), but it was all ZEROS.

This is my model architecture.
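One small diagnostic that may help here (hedged: `graphnet_out` below is a hypothetical name for the (Batch, 1, 300) query tensor, not something from the repo): check whether the query features already have near-zero variance before the norm, since that is exactly the case where LayerNorm returns zeros.

    import torch
    import torch.nn as nn

    def check_query_stats(queries: torch.Tensor) -> None:
        """Print feature-wise variability before and after LayerNorm for a
        (batch, 1, q_dim) query tensor; a near-zero pre-norm std means the
        normalized output will collapse to zeros."""
        layer_norm = nn.LayerNorm(queries.shape[-1])
        print("pre-norm std over features :", queries.std(dim=-1).mean().item())
        print("post-norm mean |activation|:", layer_norm(queries).abs().mean().item())

    # Hypothetical usage with the shapes above:
    # check_query_stats(graphnet_out)          # graphnet_out: (Batch, 1, 300)
    check_query_stats(torch.randn(4, 1, 300))  # sanity check with random data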

esceptico (Owner) commented

Hi there!
I've fixed a bug with attention scaling (#9).
Can you pull the master branch and check again?

quangnguyenbn99 (Author) commented Nov 24, 2021

I still get 0 after the LayerNorm, as I mentioned. Have you tried integrating the repo into specific tasks?

esceptico (Owner) commented

Yep, I've tried to train simple text and image classifiers from scratch.

quangnguyenbn99 (Author) commented

I'm terribly sorry for the late reply. I could not make it work as I expected. I guess it would be better to integrate it into a Transformer model than into other tasks. Thanks for the reply!
Please close the issue at your convenience.

esceptico (Owner) commented

No problem :)
Anyway, Perceiver IO is being added to the Transformers library soon.
