reproducible results #42

Closed
NamLQ opened this Issue Jun 15, 2017 · 9 comments

@NamLQ

NamLQ commented Jun 15, 2017

Hi everyone!

Each time I train a model on the same data set, I get different results. I have already set the seed with set.seed(1), but it does not help. How can I make the results identical from run to run?
Thank you very much!

library(keras)
# Generate dummy data
set.seed(1)
data <- matrix(runif(1000*100), nrow = 1000, ncol = 100)
labels <- matrix(round(runif(1000, min = 0, max = 1)), nrow = 1000, ncol = 1)

# repeatedly re-run the code from this point:
model <- keras_model_sequential()

# add layers and compile the model
model %>% 
    layer_dense(units = 32, activation = 'relu', input_shape = c(100)) %>% 
    # layer_dropout(rate = 0.25) %>%
    # layer_batch_normalization() %>%
    layer_dense(units = 1, activation = 'relu') %>% 
    compile(
        optimizer = 'adam',
        loss = 'mean_squared_error',
        metrics = 'mean_squared_error'
    )

# Train the model, iterating on the data in batches of 32 samples
set.seed(1)
model %>% fit(data, labels, epochs = 10, batch_size = 32)
@terrytangyuan

Member

terrytangyuan commented Jun 16, 2017

It's extremely hard to get a deterministic model, since you'd have to control the random seed in every Keras layer: dropout layers, weight initializers, and so on all have randomness built in, and they would have to be controlled in Keras' Python front-end. If you want a robust model, you should actually try to take advantage of this randomness when training. Hope this helps. Closing this for now.
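
For what it's worth, here is a minimal sketch of what controlling those seeds by hand looks like in the R interface. It pins the layer-level randomness via the seed arguments that layer_dropout() and the initializer_* functions expose, but it does not remove non-determinism coming from parallelism in the backend:

library(keras)

# Pin the per-layer randomness explicitly: the dropout mask and the
# weight initializers each accept their own seed argument.
model <- keras_model_sequential() %>%
    layer_dense(units = 32, activation = 'relu', input_shape = c(100),
                kernel_initializer = initializer_glorot_uniform(seed = 1)) %>%
    layer_dropout(rate = 0.25, seed = 1) %>%
    layer_dense(units = 1, activation = 'relu',
                kernel_initializer = initializer_glorot_uniform(seed = 1))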

@topepo

Collaborator

topepo commented Sep 2, 2017

Can we reopen this?

R (and R users) highly prize reproducibility. It is difficult to generate teaching materials, reports, books, and other items if we get different results on every run.

I understand the issue of seeds living in different places. Perhaps we could set each one with something like sample.int(10^5, 1), so that every location that requires a seed pulls from the global random number stream. That way we could set the seed once for the session, and the constituent layers, initializers, and other parts would get deterministic seed values.
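
As an illustration of the pattern using the per-layer seed arguments that already exist (hypothetical wiring, just to show the idea):

set.seed(1)

# Every place that needs a seed draws one from R's global random number
# stream, so a single set.seed() at the top of the session makes all of
# the downstream seeds deterministic.
model <- keras_model_sequential() %>%
    layer_dense(units = 32, activation = 'relu', input_shape = c(100),
                kernel_initializer = initializer_glorot_uniform(seed = sample.int(10^5, 1))) %>%
    layer_dropout(rate = 0.25, seed = sample.int(10^5, 1)) %>%
    layer_dense(units = 1, activation = 'relu',
                kernel_initializer = initializer_glorot_uniform(seed = sample.int(10^5, 1)))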

@terrytangyuan

Member

terrytangyuan commented Sep 2, 2017

Is this happening with the Python API as well? If so, this should really be filed as an issue there.

@jjallaire

Member

jjallaire commented Sep 2, 2017

Here is what we have currently in the FAQ: https://keras.rstudio.com/articles/faq.html#how-can-i-obtain-reproducible-results-using-keras-during-development

If there is something we can do in the package to facilitate better reproducibility, I'm all for it; just let me know what you think we need to do.

@jjallaire jjallaire reopened this Sep 2, 2017

@terrytangyuan

Member

terrytangyuan commented Sep 3, 2017

@jjallaire Sounds good to me! I took a stab at this in rstudio/tensorflow#174. It will be useful for small computations that can run in a single thread, but unfortunately not for multi-threaded ones. I do believe people nowadays deliberately take advantage of this non-reproducibility to build more robust models as well as to achieve model parallelism.

I would say give it a shot once that PR is merged. I think seeding np and random should give pretty good coverage of the functions in Keras that use random generators.
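
Once it's merged, usage would look roughly like this (assuming the helper from that PR is exported as tensorflow::use_session_with_seed(), called before any models are constructed):

library(keras)
library(tensorflow)

# Sets the R, Python, NumPy, and TensorFlow seeds in one call, and disables
# GPU and CPU parallelism, which are additional sources of non-determinism.
use_session_with_seed(42)

model <- keras_model_sequential() %>%
    layer_dense(units = 32, activation = 'relu', input_shape = c(100)) %>%
    layer_dense(units = 1, activation = 'relu')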

@topepo

Collaborator

topepo commented Sep 5, 2017

For tracking this on the keras side: #119

@jjallaire jjallaire closed this Sep 14, 2017
