
Conversation

@taylanbil
Collaborator

This will give us the option to disable rate smoothing if we so desire.

0.8 is the default value from xla_model.py
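For context, the change exposes the smoothing weight on the constructor. A minimal sketch of the shape of the API (the constructor body here is illustrative, not the actual xla_model.py implementation):

```python
import time


class RateTracker(object):

  def __init__(self, smooth_factor=0.8):
    # smooth_factor is the exponential-moving-average weight given to the
    # previous rate estimate: 0.0 disables smoothing entirely, values close
    # to 1.0 smooth heavily.  0.8 preserves the old hard-coded default.
    self._smooth_factor = smooth_factor
    self._start_time = time.time()
    self._count = 0.0
    self._rate = 0.0
```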

@taylanbil taylanbil requested review from dlibenzi and jysohn23 August 21, 2019 21:37
@taylanbil
Collaborator Author

taylanbil commented Aug 21, 2019

This covers all the files that use RateTracker:

```
% grep RateTracker xla/test
grep: cpp: Is a directory
test_train_cifar.py:    tracker = xm.RateTracker(smooth_factor=FLAGS.smooth_factor)
test_train_imagenet.py:    tracker = xm.RateTracker(smooth_factor=FLAGS.smooth_factor)
test_train_mnist.py:    tracker = xm.RateTracker(smooth_factor=FLAGS.smooth_factor)
```
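For completeness, the FLAGS.smooth_factor values above would come from each test script's argument parser; a hypothetical sketch of the wiring (flag name taken from the grep output, parser details assumed):

```python
import argparse

import torch_xla.core.xla_model as xm

parser = argparse.ArgumentParser()
# 0.8 matches the previous hard-coded default in xla_model.py.
parser.add_argument('--smooth_factor', type=float, default=0.8)
FLAGS = parser.parse_args()

tracker = xm.RateTracker(smooth_factor=FLAGS.smooth_factor)
```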

@dlibenzi
Collaborator

Can't we choose a sane value instead?
Exposing tunables to models is OK, but IMHO it should be done only for meaningful cases; otherwise, if we expose everything, the model code becomes a bit too complex.

@taylanbil
Collaborator Author

I was thinking one may want to not smooth at all. AFAIK that's what we do on the TF side.

@dlibenzi
Collaborator

> I was thinking one may want to not smooth at all. AFAIK that's what we do on the TF side.

Does a smoothing of 0.0 achieve stable results?
Can we choose a smaller default than 0.8 instead?

@taylanbil
Collaborator Author

I mean, sometimes we will want to smooth and sometimes not, depending on personal preference and also on how variable the rate is w/ the model and dataset at hand. I'm not unhappy with the current value, actually. Happy to drop the PR.

@jysohn23
Collaborator

> Does a smoothing of 0.0 achieve stable results?
> Can we choose a smaller default than 0.8 instead?

Do you think setting smooth_factor to 0.0 is reasonable, then? On the TF side, AFAIK, TPUEstimator logs the average examples/sec since the last logging event, and we would like to compare against that. Based on what I've seen, yes, 0.0 smoothing is stable.

@dlibenzi
Collaborator

That depends on how frequently you call rate().
If you call it at every step, a smoothing of 0.0 might not be good. Every 20 steps, probably yes.
IOW the frequency of calling rate() provides another smoothing factor.
I'd be inclined to choose something between, say, 0.2 and 0.6 and call it a day.
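To make the interaction concrete, here's a toy standalone sketch (not the RateTracker code) of how the smoothing factor damps a noisy sequence of instantaneous rates; note that calling rate() less often already averages each raw sample over a longer window:

```python
def smoothed_rate(samples, smooth_factor):
  # Exponential moving average over instantaneous rate samples.
  # smooth_factor=0.0 just reports the latest raw sample.
  rate = samples[0]
  for sample in samples[1:]:
    rate = smooth_factor * rate + (1.0 - smooth_factor) * sample
  return rate


noisy = [100.0, 40.0, 110.0, 95.0, 30.0, 105.0]  # per-step rates with hiccups
print(smoothed_rate(noisy, 0.0))  # 105.0 -- tracks every hiccup
print(smoothed_rate(noisy, 0.4))  # partially damped
print(smoothed_rate(noisy, 0.8))  # heavily damped, lags real changes
```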

@dlibenzi
Collaborator

Another approach is to leave the models as they are and, instead of the usual 0.8 default, use something like:

```python
xu.getenv_as('RATE_TRACKER_SMOOTHING', float, 0.8)
```
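A sketch of how that could be wired into the tracker so models stay untouched (getenv_as is the existing torch_xla.utils.utils helper; the env var name and constructor shape below are illustrative):

```python
import torch_xla.utils.utils as xu


class RateTracker(object):

  def __init__(self, smooth_factor=None):
    # Use the explicit argument when given, otherwise fall back to an
    # environment-driven default, keeping model code unchanged.
    if smooth_factor is None:
      smooth_factor = xu.getenv_as('RATE_TRACKER_SMOOTHING', float, 0.8)
    self._smooth_factor = smooth_factor
```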

@taylanbil taylanbil merged commit 0cfb181 into pytorch:master Aug 23, 2019
@taylanbil taylanbil deleted the argsmooth branch August 23, 2019 03:25