Consider switching from RMSPropOptimizer to AdamOptimizer #27

Closed
Chazzz opened this issue Nov 27, 2018 · 5 comments

Comments

@Chazzz
Contributor

Chazzz commented Nov 27, 2018

I've been consistently getting 68-69% word accuracy with the AdamOptimizer. I like that Adam improves accuracy fairly consistently, whereas the jitter present in RMSProp makes the program more likely to terminate before reaching 68% or higher. I measured a ~25% per-epoch time penalty when using Adam, and it generally takes more epochs to reach a higher accuracy (a good problem to have). A sketch of the change is included after the results below.

I also experimented with various batch sizes without meaningful improvement, though Adam with the default learning rate tends to do better with larger batch sizes.

Results: AdamOptimizer (tuned), batch size 50, three runs with
rate = 0.001 if self.batchesTrained < 10000 else 0.0001 # decay learning rate

Run 1: end result: ('Epoch:', 68)
Character error rate: 13.104371%. Word accuracy: 69.008696%.
Character error rate: 13.082070%. Word accuracy: 69.026087%. (best)
Run 2: end result: ('Epoch:', 46)
Character error rate: 13.577769%. Word accuracy: 68.295652%.
Character error rate: 13.600071%. Word accuracy: 68.452174%. (best)
Run 3: end result: ('Epoch:', 55)
Character error rate: 13.198626%. Word accuracy: 68.782609%.
Character error rate: 12.984522%. Word accuracy: 69.165217%. (best)
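For reference, here's a minimal sketch of the change, assuming TF 1.x graph mode. This is not the actual SimpleHTR code: `w` and `loss` are dummy stand-ins for the real model and CTC loss, and the loop just illustrates where the batch-count-based decay goes.

```python
# Minimal, illustrative sketch (TF 1.x graph mode), not the actual SimpleHTR code.
# It only shows the optimizer swap and the batch-count-based learning-rate decay.
import tensorflow as tf

learning_rate = tf.placeholder(tf.float32, shape=[], name='learning_rate')
w = tf.Variable(5.0)          # stand-in for the model's trainable weights
loss = tf.square(w)           # stand-in for the CTC loss

# before: optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)

batches_trained = 0
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):      # stand-in for looping over real training batches
        # decay the learning rate after 10000 trained batches, as described above
        rate = 0.001 if batches_trained < 10000 else 0.0001
        sess.run(optimizer, feed_dict={learning_rate: rate})
        batches_trained += 1
```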

@githubharald
Owner

githubharald commented Nov 27, 2018

Thank you for your research. Some questions:

  1. Did you use the decaying learning rate or just the constant default value (0.001) provided with the TF Adam implementation?
  2. Any other changes to the default values of the Adam optimizer?
  3. Did I understand this correctly: one epoch takes ~25% more time, and it also takes a larger number of epochs to train the model? What is the overall increase in training time (approximately)?
  4. Just to be sure - you're using the default decoder (best path / greedy)?

@Chazzz
Contributor Author

Chazzz commented Nov 27, 2018

  1. Decaying the learning rate over time improved word accuracy from 66% to 69%. I used rate = 0.001 if self.batchesTrained < 10000 else 0.0001.
  2. I didn't modify the other Adam parameters.
  3. As you can see from my results, even with the exact same parameters the models can take 30%-50% more or less time to train from run to run. I believe the per-epoch improvement is comparable between the optimizers, but the AdamOptimizer takes more epochs on average, mostly because it stops prematurely less often. So a 25% increase in training time is probably a reasonable estimate, but I'm not claiming much precision here. Do you have general performance/training-time data for the baseline build?
  4. I didn't modify the decoder, so I believe I'm using the default (see the best-path decoding sketch below).
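On point 4, a minimal sketch of what the default best-path (greedy) CTC decoding looks like in TF 1.x; `ctc_logits`, `seq_len`, and the class count are illustrative placeholders, not the repo's actual variable names:

```python
# Illustrative only, assuming TF 1.x graph mode: the "default" decoder referred to here
# is best-path (greedy) CTC decoding; a beam search variant would instead use
# tf.nn.ctc_beam_search_decoder.
import tensorflow as tf

num_classes = 80                                                      # assumed charset size + blank
ctc_logits = tf.placeholder(tf.float32, [None, None, num_classes])    # [max_time, batch, classes]
seq_len = tf.placeholder(tf.int32, [None])                            # per-sample sequence lengths

decoded, neg_sum_logits = tf.nn.ctc_greedy_decoder(ctc_logits, seq_len)
```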

@githubharald
Owner

githubharald commented Nov 28, 2018

I think I'll stay with the RMSProp optimizer, because one of my goals for SimpleHTR is that the model can be trained on CPUs in a reasonable amount of time, and I don't want to increase the training time any further.
However, I'll add some information to the README and point to this issue so that other users know about your findings.

@githubharald
Owner

Added your findings to the README.

@Chazzz
Contributor Author

Chazzz commented Nov 28, 2018

This is a reasonable compromise.
