Optimize with sgd #52

qiuwei · 2018-07-11T13:44:09Z

Hi,
I am using ncrfpp on my own dataset.
With Adam, training converges normally in fewer than 20 epochs.

However, optimizing with SGD is extremely hard: most of the time I get exploding gradients or no convergence at all.
Removing dropout and L2 regularization and using a very small learning rate makes training converge, but extremely slowly.

Could you share your parameters used for training with SGD?
Many thanks!

jiesutd · 2018-07-11T14:42:44Z

Hi @qiuwei, the advanced optimizers do indeed converge faster.

You can find our hyperparameters in the COLING paper https://arxiv.org/pdf/1806.04470.pdf

The final model performance and convergence speed depend heavily on your dataset.

jiesutd closed this as completed Jul 11, 2018
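
For readers hitting the same gradient explosion with plain SGD, two common remedies are clipping the gradient norm and decaying the learning rate over epochs. The sketch below illustrates both on a toy quadratic loss in pure Python. It is not taken from ncrfpp or the paper: the function names, the toy loss, and every number in it (`base_lr`, `decay`, `max_norm`, the epoch count) are hypothetical and only chosen to make the example run.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale grads so that their L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return list(grads)


def decayed_lr(base_lr, decay, epoch):
    """Inverse-time decay: the learning rate shrinks as epochs go by."""
    return base_lr / (1.0 + decay * epoch)


def sgd_step(params, grads, lr, max_norm):
    """One plain SGD update using clipped gradients."""
    clipped = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)]


# Toy loss (assumption, for illustration only): f(w) = 50*w0^2 + 0.5*w1^2.
# The steep first coordinate produces large gradients, mimicking explosion.
def toy_loss(params):
    return 50.0 * params[0] ** 2 + 0.5 * params[1] ** 2


def toy_grad(params):
    return [100.0 * params[0], 1.0 * params[1]]


def train_toy(epochs=200, base_lr=0.015, decay=0.05, max_norm=5.0):
    """Run clipped SGD with decaying lr; return (initial_loss, final_loss)."""
    params = [1.0, -1.0]
    initial = toy_loss(params)
    for epoch in range(epochs):
        lr = decayed_lr(base_lr, decay, epoch)
        params = sgd_step(params, toy_grad(params), lr, max_norm)
    return initial, toy_loss(params)


if __name__ == "__main__":
    before, after = train_toy()
    print(f"loss before: {before:.4f}, loss after: {after:.4f}")
```

Clipping bounds the size of each update, so a somewhat larger learning rate may stay usable without diverging. Suitable values still depend on the dataset and model, as noted above.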
