I tried to change the model to the B/16 MLP-Mixer.
In this setting, the patch number (sequence length) != the token-mixing MLP dimension.
But the code reports an error when it executes "x = layers.Add()([x, token_mixing])", because the two operands have different shapes.
Take an example. B/16 settings:
image size 32*32, hidden size C = 768, patch size P*P = 16*16, token-mixing MLP dimension D_S = 384, channel-mixing MLP dimension D_C = 3072.
Thus the patch number (sequence length) is S = (32/16)^2 = 4, and the token table fed to the Mixer layers has shape (4, 768).
When the code runs x = layers.Add()([x, token_mixing]) in the token-mixing layer:
x shape = [4, 768], token_mixing shape = [384, 768].
So it is strange: how can the MLP-Mixer paper set the patch number (sequence length) != the token-mixing MLP dimension?
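For concreteness, here is a minimal standalone sketch that reproduces the error with these numbers. The wiring (LayerNorm, transpose, MLP, transpose back, residual add) follows the paper's token-mixing block, and mlp_block is the repo's original version as quoted in the reply below; the remaining names are my own:

import tensorflow as tf
from tensorflow.keras import layers

def mlp_block(x, mlp_dim):                 # repo's original version
    x = layers.Dense(mlp_dim)(x)
    x = tf.nn.gelu(x)
    return layers.Dense(x.shape[-1])(x)    # x was reassigned above, so this is Dense(mlp_dim)

inputs = layers.Input(shape=(4, 768))      # (S, C): 4 patches, 768 channels
y = layers.LayerNormalization()(inputs)
y = layers.Permute((2, 1))(y)              # (768, 4): mix across the patch axis
y = mlp_block(y, 384)                      # (768, 384) -- should have come back as (768, 4)
token_mixing = layers.Permute((2, 1))(y)   # (384, 768)
x = layers.Add()([inputs, token_mixing])   # ValueError: (None, 4, 768) vs (None, 384, 768)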
Hi, I just found an error in the code.
The mlp_block needs to be changed like this:
def mlp_block(x, mlp_dim):
    y = layers.Dense(mlp_dim)(x)           # expand to the MLP dimension (e.g. D_S = 384)
    y = tf.nn.gelu(y)
    return layers.Dense(x.shape[-1])(y)    # project back to the *original* input width
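A quick shape check (my own snippet, using the B/16 numbers above) confirms the output width now matches the input:

inp = layers.Input(shape=(768, 4))   # (C, S), i.e. after the transpose in the token-mixing block
out = mlp_block(inp, 384)
print(out.shape)                     # (None, 768, 4): same as inp, so the residual Add succeeds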
Notice: the token-mixing output dimension needs to equal the initial input's last dimension, input.shape[-1].
In the original version the dimension differs; I think the setting patch number = token-mixing MLP dimension is what let this problem go unnoticed. For reference, the original version:
def mlp_block(x, mlp_dim):
    x = layers.Dense(mlp_dim)(x)           # bug: x is reassigned, so the original width is lost
    x = tf.nn.gelu(x)
    return layers.Dense(x.shape[-1])(x)    # x.shape[-1] is now mlp_dim, not the input width
That is why the original paper mentions: "Note that D_S is selected independently of the number of input patches."
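Putting it together: with the corrected mlp_block, a token-mixing block wired as in the paper builds cleanly for any D_S. A sketch under those assumptions (token_mixing_block and the layer wiring are my own, not the repo's exact code):

import tensorflow as tf
from tensorflow.keras import layers

def mlp_block(x, mlp_dim):                   # corrected version from above
    y = layers.Dense(mlp_dim)(x)
    y = tf.nn.gelu(y)
    return layers.Dense(x.shape[-1])(y)

def token_mixing_block(x, tokens_mlp_dim):
    # x: (batch, S, C); mix information across the S (patch) axis
    y = layers.LayerNormalization()(x)
    y = layers.Permute((2, 1))(y)            # (batch, C, S)
    y = mlp_block(y, tokens_mlp_dim)         # (batch, C, S): width restored by the fix
    y = layers.Permute((2, 1))(y)            # (batch, S, C)
    return layers.Add()([x, y])              # shapes match for any tokens_mlp_dim

inputs = layers.Input(shape=(4, 768))        # S = 4 patches, C = 768 channels
out = token_mixing_block(inputs, 384)        # D_S = 384 != S = 4, yet no shape error
print(out.shape)                             # (None, 4, 768)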