
reproducible results #42

Closed · NamLQ opened this issue Jun 15, 2017 · 9 comments

Comments

@NamLQ commented Jun 15, 2017

Hi everyone!

Each time I train a model on the same data set, I get different results. I have already set the seed with set.seed(1), but it does not work. How can I make the results the same for each run?
Thank you very much!

library(keras)
# Generate dummy data
set.seed(1)
data <- matrix(runif(1000*100), nrow = 1000, ncol = 100)
labels <- matrix(round(runif(1000, min = 0, max = 1)), nrow = 1000, ncol = 1)

# repeat running the code from this:
model <- keras_model_sequential()

# add layers and compile the model
model %>% 
    layer_dense(units = 32, activation = 'relu', input_shape = c(100)) %>% 
    # layer_dropout(rate = 0.25) %>%
    # layer_batch_normalization() %>%
    layer_dense(units = 1, activation = 'relu') %>% 
    compile(
        optimizer = 'adam',
        loss = 'mean_squared_error',
        metrics = 'mean_squared_error'
    )

# Train the model, iterating on the data in batches of 32 samples
set.seed(1)
model %>% fit(data, labels, epochs=10, batch_size=32)
@terrytangyuan (Contributor) commented

It's extremely hard to get a deterministic model, since you would have to control the random seed properly in every Keras layer: layers like dropout, weight initializers, etc. have randomness built in. These would all have to be controlled in Keras' Python front end. If you really want a robust model, you should actually try to take advantage of this randomness when training. Hope this helps. Closing this for now.
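For illustration, a minimal sketch of what per-layer seed control looks like in the R interface, mirroring the model above. initializer_glorot_uniform() and the seed argument of layer_dropout() are part of the keras package, but even with these set, other sources of nondeterminism (e.g. parallelism in the TensorFlow backend) remain uncontrolled:

library(keras)

model <- keras_model_sequential() %>%
    layer_dense(
        units = 32, activation = 'relu', input_shape = c(100),
        # seed the weight initializer explicitly
        kernel_initializer = initializer_glorot_uniform(seed = 1)
    ) %>%
    # dropout masks are drawn randomly, so they need a seed too
    layer_dropout(rate = 0.25, seed = 1) %>%
    layer_dense(
        units = 1, activation = 'relu',
        kernel_initializer = initializer_glorot_uniform(seed = 1)
    )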

@topepo (Collaborator) commented Sep 2, 2017

Can we reopen this?

R (and R users) highly prize reproducibility. It is difficult to generate teaching materials, reports, books, and other items if we get different results on every run.

I understand the issue about seeds in different places. Perhaps we could set them to something like sample.int(10^5, 1), so that each location where a seed is required pulls one from the global random number stream. That way we could set the seed once for the session, and the constituent layers, initializers, and other parts would get deterministic seed values.
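A sketch of that idea, assuming each seed is drawn from R's global stream at the point it is needed (the new_seed() helper is hypothetical, just to make the pattern explicit):

library(keras)

set.seed(1)  # one seed for the whole session

# hypothetical helper: pull the next seed from the global RNG stream
new_seed <- function() sample.int(10^5, 1)

model <- keras_model_sequential() %>%
    layer_dense(
        units = 32, activation = 'relu', input_shape = c(100),
        kernel_initializer = initializer_glorot_uniform(seed = new_seed())
    ) %>%
    layer_dense(
        units = 1, activation = 'relu',
        kernel_initializer = initializer_glorot_uniform(seed = new_seed())
    )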

@terrytangyuan (Contributor) commented

Is this issue happening in the Python API as well? If so, this should really be filed as an issue there.

@jjallaire (Member) commented

Here is what we have currently in the FAQ: https://keras.rstudio.com/articles/faq.html#how-can-i-obtain-reproducible-results-using-keras-during-development

If there is something we can do in the package to facilitate better reproducibility, I'm all for it; just let me know what you think we need to do.
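The FAQ's advice at the time amounted to seeding every random number generator involved. A rough sketch of that, using reticulate to reach the Python-side generators (tf$set_random_seed() is the TensorFlow 1.x API):

library(keras)
library(reticulate)

set.seed(42)              # R's RNG

np <- import("numpy")
np$random$seed(42L)       # NumPy's RNG, used by Keras weight initializers

random <- import("random")
random$seed(42L)          # Python's built-in RNG

tf <- import("tensorflow")
tf$set_random_seed(42L)   # TensorFlow's graph-level seed (TF 1.x API)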

@terrytangyuan (Contributor) commented Sep 2, 2017 via email

@jjallaire (Member) commented Sep 2, 2017 via email

jjallaire reopened this Sep 2, 2017
@terrytangyuan (Contributor) commented

@jjallaire Sounds good to me! I took a stab at rstudio/tensorflow#174 for this. It will be useful for small computations that can run in a single thread, but unfortunately not for multi-threaded execution. I do believe people nowadays deliberately take advantage of this non-reproducibility to create more robust models, as well as to achieve model parallelism.

I would say give it a shot once that PR gets merged. I think seeding np and random should give pretty good coverage of the functions in Keras that use random generators.
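Assuming the helper from rstudio/tensorflow#174 is merged as use_session_with_seed() in the tensorflow package (the name and arguments could still change), usage from keras would look roughly like:

library(keras)
library(tensorflow)

# seed R, Python, NumPy, and TensorFlow in one call; disabling the GPU
# and parallel CPU threads removes the remaining nondeterminism, at the
# cost of performance
use_session_with_seed(42, disable_gpu = TRUE, disable_parallel_cpu = TRUE)

After this call, building and fitting the model above should give the same result on every run (single-threaded, CPU only).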

@topepo (Collaborator) commented Sep 5, 2017

To get it from the keras side: #119

@jjallaire (Member) commented
