function to set random seeds for keras session #120

Merged
merged 4 commits into master from feature/set-session-seed on Sep 6, 2017

Conversation

@jjallaire
Member

jjallaire commented Sep 5, 2017

PR for review, not yet ready for merge.

I tried this with the mnist_mlp.R example (https://github.com/rstudio/keras/blob/master/vignettes/examples/mnist_mlp.R) and did indeed get fully identical results!

@topepo Let's incorporate #119 into this PR. (If we set the seeds here, I'm not sure whether we require the explicit seed arguments. Also, even if we do want explicit seed arguments, we could do a missing check for them and then insert the calls to sample there, the idea being that NULL means auto/default; sketched below.)
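To make the NULL-means-auto idea concrete, a minimal sketch (the function name and argument handling here are illustrative, not the final API):

fit_with_seed <- function(..., seed = NULL) {
  # NULL means auto: draw a seed so results still vary run-to-run
  # unless the user pins an explicit value
  if (is.null(seed))
    seed <- sample.int(.Machine$integer.max, 1L)
  set.seed(seed)
  # ... proceed with model fitting using the resolved seed ...
}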

Note that this only covers the TF backend and does disable some parallelism (we might want to note in a message that this is a side effect of the function).

R/seed.R Outdated
if (is_backend("tensorflow")) {
  tf <- tensorflow::tf
  session_conf = tf$ConfigProto(intra_op_parallelism_threads = 1L,
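For context, the overall shape of the approach is roughly the following (a sketch against the TF 1.x API; the actual code in the PR may differ, and the real function would also seed Python's and NumPy's RNGs):

set.seed(seed)                      # seed R's RNG
tf <- tensorflow::tf
tf$set_random_seed(seed)            # seed TensorFlow's graph-level RNG
# force single-threaded execution so op scheduling is deterministic
session_conf <- tf$ConfigProto(intra_op_parallelism_threads = 1L,
                               inter_op_parallelism_threads = 1L)
sess <- tf$Session(graph = tf$get_default_graph(), config = session_conf)
backend()$set_session(sess)         # install as the Keras backend session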


@terrytangyuan

terrytangyuan Sep 5, 2017

Member

I was thinking about whether we should expose these as arguments to users. Even though multiple threads may be a potential source of non-determinism, users may still want to keep the parallelism. Maybe we should note this somewhere, or emit some type of warning so users are aware of it; e.g., they may wonder why the program is much slower.


@jjallaire

jjallaire Sep 5, 2017

Member

Good point. I just added both an option to toggle this behavior and a message notifying the user that GPU and/or CPU parallelism has been disabled.
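Roughly, the toggle and message might look like this (a sketch; the names follow the discussion above and the actual signature in the PR may differ):

set_keras_seed <- function(seed, disable_parallel_cpu = TRUE) {
  tf <- tensorflow::tf
  # 0L lets TensorFlow pick the thread counts; 1L forces determinism
  threads <- if (disable_parallel_cpu) 1L else 0L
  session_conf <- tf$ConfigProto(intra_op_parallelism_threads = threads,
                                 inter_op_parallelism_threads = threads)
  if (disable_parallel_cpu)
    message("CPU parallelism has been disabled for reproducibility ",
            "(training may be slower as a result)")
  # ... seed R/TF and install the session as described above ...
}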

@jjallaire


Member

jjallaire commented Sep 5, 2017

@topepo Let us know if you think this is enough to provide reproducibility or if we additionally need to change the seed default as is done in your other PR.

@topepo


Collaborator

topepo commented Sep 5, 2017

Here's what I did...

I'm calling keras through caret about 30 times. I still see variability (and another trend I'll mention at the end). I ran the same code a bunch of times under different conditions. (EDIT: I haven't enabled the GPU on this computer.)

In every case, I reset the R seed to the same value before every model fit, prior to calling caret::train. The ROC values reported below are for one tuning parameter and are the average of 10 resampled ROC values; the time is the total execution time for train. Code and Rout files are in keras_repro_caret.zip, and a sketch of the test harness follows the results below.

First, with the changes in https://github.com/rstudio/keras/pull/120:

no set_keras_seed:

  • run 1: ROC = 0.801, time = 73.7
  • run 2: ROC = 0.835, time = 103.6

set_keras_seed(, disable_parallel_cpu = FALSE)

  • run 1: ROC = 0.845, time = 153.7
  • run 2: ROC = 0.832, time = 189.8

set_keras_seed(, disable_parallel_cpu = TRUE)

  • run 1: ROC = 0.858, time = 229.1
  • run 2: ROC = 0.827, time = 270.9

Now, the same runs but with the sample.int modifications in my PR:

no set_keras_seed:

  • run 1: ROC = 0.840, time = 75.4
  • run 2: ROC = 0.833, time = 102.9

set_keras_seed(, disable_parallel_cpu = FALSE)

  • run 1: ROC = 0.835, time = 142.4
  • run 2: ROC = 0.847, time = 187.3

set_keras_seed(, disable_parallel_cpu = TRUE)

  • run 1: ROC = 0.859, time = 227.7
  • run 2: ROC = 0.835, time = 266.0
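For reference, the harness was shaped roughly like this (a sketch; the data set and caret method name are placeholders, and the real code is in keras_repro_caret.zip):

library(caret)
ctrl <- trainControl(method = "cv", number = 10, classProbs = TRUE,
                     summaryFunction = twoClassSummary)
for (run in 1:2) {
  set.seed(1234)  # reset the R seed to the same value before each fit
  # set_keras_seed(...) was inserted here for the seeded conditions
  tm <- system.time(
    fit <- train(Class ~ ., data = training_data, method = "mlpKerasDropout",
                 metric = "ROC", trControl = ctrl)
  )
  print(c(ROC = max(fit$results$ROC), time = unname(tm["elapsed"])))
}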

Any thoughts?

Should I insert set_keras_seed into the underlying model code so that it executes before every model fit? I don't think that would be workable for caret, since I don't want to make choices about disabling parallelism on the user's behalf.

===

(Aside: note that the execution times are always increasing, and are pretty close between the overall runs with and without sample.int. I've seen some weird things with execution times when repeatedly calling keras; in some cases, the same model fit takes 6000s to finish (but that didn't happen here).)

@jjallaire


Member

jjallaire commented Sep 5, 2017

If you are going to train over and over again in the same session you should first destroy the Keras/TF session. You can do that with:

K <- backend()
K$clear_session()

That may be something I need to do as well within set_keras_seed, as I'm not sure whether re-setting the random seed within an existing TF session is even valid. This could also explain performance problems if more and more memory is getting consumed and not freed.
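For example, a repeated-fit loop might clear state between fits like this (a sketch; the loop body is illustrative):

library(keras)
K <- backend()
for (i in 1:30) {
  K$clear_session()                  # release the previous TF graph/session
  model <- keras_model_sequential()  # ... define, compile, and fit ...
  # ... run any predictions before the next clear_session() call ...
}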

I also believe that in general the Keras/TF workflows are single-session oriented, so calling over and over again in the same session with different seeds could cause other problems (e.g. caches that were populated with one seed). I'm not 100% sure about this though.

@topepo


Collaborator

topepo commented Sep 5, 2017

I've been running the rBayesianOptimization package with keras and I use the same clear_session code but still see the long training time issue.


@topepo


Collaborator

topepo commented Sep 5, 2017

In caret, I added those lines prior to each model fit in the test file that I sent (I can't add them afterwards because there is always a prediction step or two to be done). The result is that all the calls come in at around 50 seconds. The ROC values are still different, but the training time is consistent.

For the Bayesian work, the keras call follows a previous train call to get some substrate models to start the optimization. I'll split those up into different sessions, clear the session each time, and see how it works.

@topepo


Collaborator

topepo commented Sep 5, 2017

> I also believe that in general the Keras/TF workflows are single-session oriented, so calling over and over again in the same session with different seeds could cause other problems (e.g. caches that were populated with one seed). I'm not 100% sure about this though.

One other thing that I've noticed but have no definitive data on: I've tried using the keras package in separate R sessions running concurrently on the same machine. It doesn't fail, but I feel like it adds some instability (e.g. the long training times) that I don't see otherwise. Again, I could be overfitting here.


@jjallaire


Member

jjallaire commented Sep 6, 2017

I've updated this PR to re-export a function from the tensorflow package (use_session_with_seed()), as I think we'll want this same functionality for tfestimators and other TF front-end packages as well.
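Usage then looks like this (use_session_with_seed() comes from the tensorflow package; the disable_* arguments shown reflect its defaults):

library(keras)
# establish a reproducible session before defining or fitting any models;
# by default this also disables GPU and CPU parallelism
use_session_with_seed(42)
# to keep parallelism (sacrificing full reproducibility):
use_session_with_seed(42, disable_gpu = FALSE, disable_parallel_cpu = FALSE)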

@jjallaire jjallaire merged commit 0c95b04 into master Sep 6, 2017


@jjallaire jjallaire deleted the feature/set-session-seed branch Mar 24, 2018
