Hi, I'm training stage-2 of QueST on LIBERO-90, using the autoencoder trained on LIBERO-90 in stage-0.
While monitoring the wandb dashboard, I noticed that the grad norm increases over training and saturates at 100, which is where gradient clipping kicks in.
Why does this happen?
Isn't the grad norm supposed to decrease as training progresses?
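For reference, here's a minimal sketch of what I understand the clipping to be doing (assuming standard `torch.nn.utils.clip_grad_norm_` with `max_norm=100`; I haven't checked the exact config key QueST uses for this):

```python
import torch
import torch.nn as nn

# Toy model standing in for the stage-2 prior network
model = nn.Linear(16, 4)
loss = model(torch.randn(8, 16)).pow(2).mean()
loss.backward()

# clip_grad_norm_ returns the total norm computed *before* clipping,
# then rescales the gradients in place so their total norm is at most max_norm.
pre_clip_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=100.0)

# If the dashboard logs the norm recomputed *after* this call, it can never
# exceed 100, so the curve flattens at 100 whenever the raw norm is larger.
post_clip_norm = torch.linalg.vector_norm(
    torch.stack([p.grad.norm() for p in model.parameters() if p.grad is not None])
)
print(pre_clip_norm.item(), post_clip_norm.item())
```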
Thanks!