According to the code in electra/model/optimization.py (lines 188 to 193 at commit 7911132), if n_layers=24 then key_to_depths["encoder/layer_23/"] = 24, which is the depth of the last encoder layer. But the learning rate for that layer is learning_rate * (layer_decay ** (24 + 2 - 24)) = learning_rate * (layer_decay ** 2).

That's what confused me. Why is the learning rate for the last layer learning_rate * (layer_decay ** 2) rather than learning_rate? Am I missing something?
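Here is a minimal sketch of the computation I mean, paraphrased from the linked lines (the extra embedding keys and exact structure in the real file may differ, so treat this as an approximation rather than a verbatim copy):

```python
import collections

def _get_layer_lrs(learning_rate, layer_decay, n_layers):
    """Map each variable-name prefix to a depth, then decay the LR by depth."""
    key_to_depths = collections.OrderedDict({
        "/embeddings/": 0,               # embeddings get the most-decayed LR
        "task_specific/": n_layers + 2,  # head on top gets the full LR
    })
    for layer in range(n_layers):
        key_to_depths["encoder/layer_" + str(layer) + "/"] = layer + 1
    # lr(depth) = learning_rate * layer_decay ** (n_layers + 2 - depth)
    return {
        key: learning_rate * (layer_decay ** (n_layers + 2 - depth))
        for key, depth in key_to_depths.items()
    }

lrs = _get_layer_lrs(learning_rate=1e-4, layer_decay=0.8, n_layers=24)
# Depth of the last encoder layer is 24, so its exponent is 24 + 2 - 24 = 2:
print(lrs["encoder/layer_23/"])  # 1e-4 * 0.8**2, not 1e-4 * 0.8
print(lrs["task_specific/"])     # 1e-4 (exponent 0)
```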
For the layerwise learning rate decay, we count the task-specific layer added on top of the pre-trained transformer as an additional layer of the model, so the learning rate for the last layer of ELECTRA should be learning_rate * 0.8. But you've still found a bug: instead it is learning_rate * 0.8^2.
The bug happened because there used to be a pooler layer in ELECTRA before we removed the next-sentence-prediction task. In that case the learning rates per layer were:
task-specific softmax: learning_rate
pooler: learning_rate * 0.8
transformer layer 24: learning_rate * 0.8^2
transformer layer 23: learning_rate * 0.8^3
...
However, when we removed the pooler layer, we didn't update the learning rates correspondingly. I suspect that in practice this didn't hurt performance much, so I'm leaving it as-is for now to keep results reproducible.
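For reference, the fix would amount to removing the slot the pooler used to occupy, i.e. computing the exponent as n_layers + 1 - depth (with the task-specific head at depth n_layers + 1). A hedged sketch of that corrected scheme, with illustrative names rather than the actual repo code:

```python
def _get_fixed_layer_lrs(learning_rate, layer_decay, n_layers):
    """Layerwise LR decay without a slot reserved for the removed pooler."""
    depths = {"task_specific/": n_layers + 1}
    for layer in range(n_layers):
        depths["encoder/layer_%d/" % layer] = layer + 1
    return {
        key: learning_rate * (layer_decay ** (n_layers + 1 - depth))
        for key, depth in depths.items()
    }

lrs = _get_fixed_layer_lrs(1e-4, 0.8, 24)
print(lrs["task_specific/"])     # 1e-4           (exponent 0)
print(lrs["encoder/layer_23/"])  # 1e-4 * 0.8     (exponent 1)
print(lrs["encoder/layer_22/"])  # 1e-4 * 0.8**2  (exponent 2)
```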