
How can I use the Adam optimizer instead of SGD? #50

Closed
SongJeongHyun opened this issue May 28, 2018 · 1 comment

@SongJeongHyun

Hi! First of all, thanks for your code. I have recently been studying your paper "Regularizing and Optimizing LSTM Language Models".

I want to compare the Adam optimizer against the SGD optimizer with the NT-ASGD scheme that you proposed.

I ran your training script with your suggested command plus a small addition to select the optimizer:

"python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save SGD_PTB.pt --optimizer sgd"
"python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save Adam_PTB.pt --optimizer adam"

The first command works fine, but while the second command runs, the loss, ppl, and bpc all come out as nan. I copied the log below. Please suggest any possible solution for this if you don't mind.

| end of epoch 14 | time: 48.42s | valid loss nan | valid ppl nan | valid bpc nan

| epoch 15 | 200/ 663 batches | lr 30.00000 | ms/batch 67.94 | loss nan | ppl nan | bpc nan
| epoch 15 | 400/ 663 batches | lr 30.00000 | ms/batch 68.18 | loss nan | ppl nan | bpc nan
| epoch 15 | 600/ 663 batches | lr 30.00000 | ms/batch 67.13 | loss nan | ppl nan | bpc nan

| end of epoch 15 | time: 48.31s | valid loss nan | valid ppl nan | valid bpc nan

| epoch 16 | 200/ 663 batches | lr 30.00000 | ms/batch 67.27 | loss nan | ppl nan | bpc nan
| epoch 16 | 400/ 663 batches | lr 30.00000 | ms/batch 65.48 | loss nan | ppl nan | bpc nan
| epoch 16 | 600/ 663 batches | lr 30.00000 | ms/batch 67.29 | loss nan | ppl nan | bpc nan

| end of epoch 16 | time: 48.28s | valid loss nan | valid ppl nan | valid bpc nan

| epoch 17 | 200/ 663 batches | lr 30.00000 | ms/batch 67.21 | loss nan | ppl nan | bpc nan
| epoch 17 | 400/ 663 batches | lr 30.00000 | ms/batch 65.92 | loss nan | ppl nan | bpc nan
| epoch 17 | 600/ 663 batches | lr 30.00000 | ms/batch 66.32 | loss nan | ppl nan | bpc nan
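
For reference, here is how I understand the optimizer selection to work. This is only a minimal sketch of my own (the real main.py surely differs in detail), but it shows what I assume is happening: both optimizer choices share the single --lr flag.

import argparse
import torch

parser = argparse.ArgumentParser()
# Hypothetical flag names, mirroring the command-line options used above.
parser.add_argument('--optimizer', type=str, default='sgd', choices=['sgd', 'adam'])
parser.add_argument('--lr', type=float, default=30.0,
                    help='initial learning rate (shared by both optimizer choices)')
args = parser.parse_args()

model = torch.nn.LSTM(400, 1150)  # stand-in for the actual language model

if args.optimizer == 'sgd':
    optimizer = torch.optim.SGD(model.parameters(), lr=args.lr)
elif args.optimizer == 'adam':
    # Adam picks up the same --lr value, so a default of 30 would be applied here too.
    optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)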

@keskarnitish (Contributor) commented May 29, 2018

The LR for Adam is typically 1E-3, while the script's default LR is 30. Add --lr 1E-3 to your command and it should work :)
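
For example, your second command would become something like this (only the --lr flag changes):

python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save Adam_PTB.pt --optimizer adam --lr 1E-3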

Closing now; feel free to reopen if necessary.
