Hi,
I am using ncrfpp on my own dataset.
Adam can converge normally in fewer than 20 epochs.
However, optimizing with SGD is extremely hard: I get exploding gradients or non-convergence most of the time.
Removing dropout and L2 regularization and using a very small learning rate makes training converge, but extremely slowly.
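For context, here is roughly the SGD loop I've been trying, with gradient clipping plus per-epoch learning-rate decay (a minimal PyTorch sketch: the model and loader are placeholder stand-ins, and the lr/decay/clip values are my own guesses, not NCRF++ defaults):

```python
import torch

# Placeholder model and data; in practice these are my NCRF++ model and loader.
model = torch.nn.Linear(100, 10)
train_loader = [(torch.randn(32, 100), torch.randint(0, 10, (32,)))
                for _ in range(10)]
loss_fn = torch.nn.CrossEntropyLoss()

base_lr = 0.015       # assumed starting point; tuned per dataset
lr_decay = 0.05       # per-epoch decay factor (assumption)
max_grad_norm = 5.0   # clip threshold against exploding gradients (assumption)

optimizer = torch.optim.SGD(model.parameters(), lr=base_lr,
                            momentum=0, weight_decay=1e-8)

for epoch in range(50):
    # Decay the learning rate each epoch: lr_t = base_lr / (1 + decay * epoch)
    lr = base_lr / (1 + lr_decay * epoch)
    for group in optimizer.param_groups:
        group["lr"] = lr
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Clip the global gradient norm before the update step
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
```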
Could you share the parameters you used for training with SGD?
Many thanks!