Issue related to LayerNorm #8
Comments
Hey! The block is structured as

    X_qkv = Attn(layerNorm(X_q), layerNorm(X_kv))
    X_qkv = X_qkv + X_q                      # if required
    X_qkv = X_qkv + MLP(layerNorm(X_qkv))

so "on paper" everything should be OK 🥲. I think this is not a problem :) Could you clarify the description, or share some details of the problem / model architecture?
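For readers following along, a minimal PyTorch sketch of the block described above might look like the following. The class name, the use of `nn.MultiheadAttention`, and the dimension arguments are assumptions for illustration, not the repository's exact code:

```python
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Pre-norm cross-attention: queries attend to a separate key/value stream."""
    def __init__(self, dim, heads=8, mlp_ratio=4, residual=True):
        super().__init__()
        self.residual = residual
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, x_q, x_kv):
        # X_qkv = Attn(layerNorm(X_q), layerNorm(X_kv))
        q = self.norm_q(x_q)
        kv = self.norm_kv(x_kv)
        x_qkv, _ = self.attn(q, kv, kv)
        # X_qkv = X_qkv + X_q  (if required)
        if self.residual:
            x_qkv = x_qkv + x_q
        # X_qkv = X_qkv + MLP(layerNorm(X_qkv))
        x_qkv = x_qkv + self.mlp(self.norm_mlp(x_qkv))
        return x_qkv
```

Here `x_q` and `x_kv` would be `(batch, seq, dim)` tensors coming from the two separate networks.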
Hi, thank you for answering so quickly. This is my model architecture.
Hi there!
I still get the 0 metric after the NormLayer, as I mentioned. Have you tried integrating the repo into specific tasks?
Yep, I've tried to train simple text and image classifiers from scratch.
I'm terribly sorry for the late reply. I could not make it work as I expected. I guess that's it; it's probably better suited to being integrated into a Transformer model than into other tasks. Thanks for the reply!
No problem :)
Hello! First of all, thank you a lot for your effort. I can see that it took quite a bit of your time to write such clear code.
However, I have a small question about the Cross Attention class:
When I integrated the repository into my program as the last layer, the outputs of these LayerNorm layers were always 0.
When I removed these norm layers, the code ran fine, but performed much worse than the simple method (say, simply concatenating the inputs and queries).
P.S.: To be more specific, my queries and inputs were taken from two separate nets.
Do you have any idea about it?
Once again, thank you for your great work.
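As a side note on the symptom above: a LayerNorm can output exact zeros when each feature vector is constant along the normalized dimension (including the degenerate case where that dimension has size 1), because the mean-subtraction cancels everything and the default bias is zero. A small illustrative check, with shapes that are hypothetical and not taken from this issue:

```python
import torch
import torch.nn as nn

# If every feature vector is constant along the normalized dimension,
# (x - mean) is exactly zero, so LayerNorm returns its bias (all zeros
# by default) regardless of the input scale.
ln = nn.LayerNorm(4)
x = torch.full((2, 3, 4), 5.0)   # constant along the last dim
print(ln(x).abs().max())          # tensor(0., ...)

# The same thing happens when the normalized dimension has size 1.
ln1 = nn.LayerNorm(1)
y = torch.randn(2, 3, 1)
print(ln1(y).abs().max())         # tensor(0., ...)
```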