Hi, I'm training stage-2 of QueST on LIBERO-90, using the autoencoder trained on LIBERO-90 in stage-0.
While monitoring the wandb dashboard, I noticed that the grad norm increases over training and saturates at 100, which is where gradient clipping kicks in.
Why does this happen?
Isn't the grad norm supposed to decrease as training progresses?
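For reference, here's a minimal sketch of what I understand the clipping to be doing (assuming standard `torch.nn.utils.clip_grad_norm_` with `max_norm=100`; I haven't checked the exact config key QueST uses for this):

```python
import torch
import torch.nn as nn

# Toy model standing in for the stage-2 prior network
model = nn.Linear(16, 4)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()

# clip_grad_norm_ returns the total norm computed *before* clipping,
# then rescales the gradients in place so their total norm is at most max_norm.
pre_clip_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=100.0)

# If the dashboard logs the norm recomputed *after* this call, it can never
# exceed 100, so the curve flattens at 100 whenever the raw norm is larger.
post_clip_norm = torch.linalg.vector_norm(
    torch.stack([p.grad.norm() for p in model.parameters() if p.grad is not None])
)
print(pre_clip_norm.item(), post_clip_norm.item())
```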
Thanks!