Hey @Sentdex, watching your latest YouTube video got me wondering why your two Titans were training at the same rate. I thought there might be a CPU bottleneck somewhere in the code, so I had a look through, and the only obvious culprit I could see was the TensorBoard output.
After a quick test with short training sessions, changing the verbosity from 2 to 0 got me around a 20x speedup.
The line in alexnet.py:
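The change is in the tflearn.DNN call (a reconstructed sketch; the surrounding arguments here may differ from the repo, but tensorboard_verbose is the one that matters). From:

```python
# 'network' is the AlexNet graph built earlier in alexnet.py.
model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                    max_checkpoints=1,
                    tensorboard_verbose=2,  # also logs gradients and weights each step
                    tensorboard_dir='log')
```

To:

```python
model = tflearn.DNN(network, checkpoint_path='model_alexnet',
                    max_checkpoints=1,
                    tensorboard_verbose=0,  # logs only loss and accuracy (best speed)
                    tensorboard_dir='log')
```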
I can now train a reasonable model in under an hour. My guess is that all the data processing and saving for TensorBoard happens on the CPU, with the added slowdown of writing it to the drive, so it can't keep up. If you don't need the extra verbosity, it's a good idea to turn it down.
If anyone can confirm this, I'd appreciate it; I was a bit shocked and skeptical when I saw the epochs flying by!
Edit: it also results in a log file that is a few megabytes instead of a few hundred, and TensorBoard no longer struggles to load it. Hopefully that will stop those crashes too.
This is a great point. I like the extra data from TensorBoard, but this will indeed make a huge difference for our larger model that takes days/weeks to train. I'll go ahead and change this in the official code. It's good to leave TensorBoard verbose when first validating a model, but a bad idea for the long haul.
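For the official code, a simple way to keep both modes on hand is to gate the verbosity behind a flag. A minimal sketch, assuming tflearn; the VALIDATING flag and the stand-in network are illustrative, not the repo's actual code:

```python
import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.estimator import regression

# Illustrative toggle: True while sanity-checking a new architecture,
# False for long training runs to avoid the TensorBoard CPU/disk cost.
VALIDATING = False

# Stand-in graph; alexnet.py builds the full AlexNet here instead.
net = input_data(shape=[None, 80, 60, 1])
net = fully_connected(net, 9, activation='softmax')
net = regression(net, optimizer='momentum', loss='categorical_crossentropy')

# tensorboard_verbose levels in tflearn: 0 = loss/accuracy only (fastest),
# 2 also records gradients and weights, 3 adds activations and sparsity.
model = tflearn.DNN(net,
                    tensorboard_verbose=2 if VALIDATING else 0,
                    tensorboard_dir='log')
```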