Not reproducible using tensorflow backend #2280

Open
zhuoqiang opened this Issue Apr 12, 2016 · 79 comments

@zhuoqiang

zhuoqiang commented Apr 12, 2016

With the theano backend (CPU, or GPU without cuDNN), I can train a reproducible model with:

fixed_seed_num = 1234
numpy.random.seed(fixed_seed_num)
random.seed(fixed_seed_num)  # not sure if needed or not

In pure tensorflow without the keras wrapper, training is also reproducible via:

tensorflow.set_random_seed(fixed_seed_num)

I don't know why, but with Keras on the tensorflow backend, none of the above yields a reproducible training run.

Environment:

  • Keras: v0.3.2
  • tensorflow: v0.7.1
  • Macbook OS X v10.11.4
  • tensorflow is using CPU

BTW: it would be great if keras could expose a unified API for reproducible training, something like:

keras.set_random_seed(fixed_seed_num)
@fchollet


Collaborator

fchollet commented Apr 12, 2016

BTW: it would be great if keras could expose a unified API for reproducible training

Right. I will look into it. Or does anybody else want to take a look at it?

@bplank


bplank commented Jul 30, 2016

Is there any update on this?

I could train reproducible models on theano (setting the seed before the keras import #439), but not when using the tensorflow backend.

related: #850

@kudkudak


Contributor

kudkudak commented Aug 23, 2016

I ran into this problem in edward; here is the fix we went with after a rather long discussion: blei-lab/edward#184. Long story short: it is pretty hard to seed tensorflow if you have a single shared session. Would be very interested to hear if there is a better solution :)

@fish128


fish128 commented Oct 17, 2016

Any update on this?

@bluelight773


bluelight773 commented Nov 5, 2016

Correct me if I'm wrong, but looks like this issue is still open and there is no way currently in Keras with a TensorFlow backend to get reproducible results. Any update? Workaround?

@kudkudak


Contributor

kudkudak commented Nov 5, 2016

Well, there is this hack: blei-lab/edward#184. I can propose a PR with that to Keras if that makes sense, @fchollet?

The solution is simply to add a set_seed() function that raises an error if someone calls it after a TF variable has been created. You cannot reseed after a Variable exists, because the previous seed was already used to build its initializers.
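
For illustration, a minimal sketch of such a guard using the TF 1.x graph API (the set_seed name and error message are illustrative, not an existing Keras API):

import random
import numpy as np
import tensorflow as tf

def set_seed(seed):
    # Once a Variable exists, its initializers were already built from the
    # previous graph-level seed, so reseeding would silently be incomplete.
    if tf.get_default_graph().get_collection(tf.GraphKeys.GLOBAL_VARIABLES):
        raise RuntimeError('set_seed() must be called before any '
                           'TensorFlow Variable is created')
    random.seed(seed)
    np.random.seed(seed)
    tf.set_random_seed(seed)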

@pibkac


pibkac commented Dec 7, 2016

Any news on that issue? @bluelight773 I think when running it on the CPU it's reproducible - but that is not really an option most of the time

@JacobIsrael123


JacobIsrael123 commented Dec 20, 2016

@fchollet @zhuoqiang Could you confirm this?

Maybe there is a workaround: use both Keras and TensorFlow together, following this post:
https://blog.keras.io/keras-as-a-simplified-interface-to-tensorflow-tutorial.html

Use Keras pre-defined models to speed up building your model, but use TensorFlow for input, output, and optimization.
Take a look at this code; it seems to reproduce the result.
I use a CentOS 7 server with a Tesla K40. It always shows 0.6268 as the result.

>>> keras.__version__
'1.1.1'
>>> tf.__version__
'0.12.0-rc1'

You should seed it by

import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)
"""
Different behaviors during training and testing

Some Keras layers (e.g. Dropout, BatchNormalization) behave differently at training time and testing time.
You can tell whether a layer uses the "learning phase" (train/test) by printing layer.uses_learning_phase,
a boolean: True if the layer has a different behavior in training mode and test mode, False otherwise.

If your model includes such layers, then you need to specify the value of the learning phase as part of feed_dict,
so that your model knows whether to apply dropout/etc or not.

To make use of the learning phase, simply pass the value "1" (training mode) or "0" (test mode) to feed_dict:
"""
import numpy as np
np.random.seed(42)
import tensorflow as tf
tf.set_random_seed(42)
sess = tf.Session()
from keras.layers import Dropout, Dense, LSTM
from keras import backend as K
K.set_session(sess)
from keras.objectives import categorical_crossentropy
from keras.metrics import categorical_accuracy as accuracy

# load data
from tensorflow.examples.tutorials.mnist import input_data
mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)

img = tf.placeholder(tf.float32, shape=(None, 784))
labels = tf.placeholder(tf.float32, shape=(None, 10))

x = Dense(128, activation='relu')(img)
x = Dropout(0.5)(x)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
preds = Dense(10, activation='softmax')(x)

loss = tf.reduce_mean(categorical_crossentropy(labels, preds))

# train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
train_step = tf.train.RMSPropOptimizer(learning_rate=0.001).minimize(loss)
# train_step = tf.train.AdagradOptimizer(learning_rate=0.001).minimize(loss)
# train_step = tf.train.AdadeltaOptimizer(learning_rate=0.001).minimize(loss)

with sess.as_default():
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        batch = mnist_data.train.next_batch(50)
        train_step.run(feed_dict={img: batch[0],
                                  labels: batch[1],
                                  K.learning_phase(): 1})

acc_value = accuracy(labels, preds)
with sess.as_default():
    print(acc_value.eval(feed_dict={img: mnist_data.test.images,
                                    labels: mnist_data.test.labels,
                                    K.learning_phase(): 0}))
@nejumi


nejumi commented Jan 25, 2017

I heard that Keras is going to be merged into TensorFlow. Can I expect the reproducibility problem to be solved at the same time? If so, it will be a great improvement for Kaggle usage!

@brannondorsey


brannondorsey commented Jan 28, 2017

@nejumi, ditto. This lack of support makes it really hard to run experiments with Keras & TF. I appreciate the convos and solutions here but really hoping this gets fixed soon.

@diogoff


Contributor

diogoff commented Mar 7, 2017

In principle, this should do it:

import numpy as np
np.random.seed(...)
import tensorflow as tf
tf.set_random_seed(...)

However, there is still non-determinism in cuDNN.

With theano it is possible to ensure reproducibility of cuDNN by setting dnn.conv flags: #2479 (comment)

With tensorflow, how do we set those flags?

lewfish added a commit to azavea/raster-vision that referenced this issue Mar 10, 2017

Remove random seed init
According to keras-team/keras#2280, experiments aren't
reproducible when using Keras/Tensorflow at the moment. So,  we might as well
remove seeding the random number generator, since it's not doing what we expect.
@pibkac


pibkac commented Mar 10, 2017

For some time, I at least had reproducible results when running the training on the CPU. However, even that no longer seems to work. Has anyone experienced the same?

@iaguas


iaguas commented Mar 30, 2017

I'm looking for a way to make Keras code reproducible, but I suspect it's not possible. Am I right?

@iaguas


iaguas commented Mar 30, 2017

Thanks @diogoff, but my problem is that I have tensorflow as the backend and I also use cuDNN. That is the same case you are looking to solve.

@diogoff


Contributor

diogoff commented Mar 30, 2017

I gave up on reproducibility because I found that when forcing deterministic behavior in cuDNN, training would be much slower (e.g. from 15 secs/epoch to 30 secs/epoch).

@pylang


pylang commented Apr 1, 2017

IMO this is a critical issue that merits a high priority. Running a complex model for several minutes is meaningless unless results can be reproduced.

Running the same cell multiple times has given results that differ by several orders of magnitude. I can confirm the latest suggestion does not work for Keras 2.0.2 / TensorFlow 1.0 backend / Anaconda 4.2 / Windows 7:

import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)
@diogoff


Contributor

diogoff commented Apr 1, 2017

@pylang are you using cuDNN?

@pylang


pylang commented Apr 1, 2017

@diogoff I have not taken extra steps to install cuDNN. My assumption is no, though I am unsure how to verify this absolutely.

@diogoff


Contributor

diogoff commented Apr 1, 2017

Try:
$ ls -las /usr/local/cuda/include/*dnn*
and
$ ls -las /usr/local/cuda/lib64/*dnn*

If you see libcudnn.so installed, you have it and probably tensorflow is using it.

If I remember correctly, tensorflow prints some warning/info messages on startup saying which libraries it has loaded. On my system, libcudnn.so was one of them.
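
On Windows (where the paths above do not apply), one possible programmatic check is a sketch like the following; the exact DLL name is an assumption, since it varies with the installed cuDNN version (cudnn64_5, cudnn64_6, ...):

import ctypes.util

# find_library searches the system path and returns None if the DLL is absent.
print(ctypes.util.find_library('cudnn64_5'))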

@pylang


pylang commented Apr 1, 2017

I searched all files on my Windows machine and found none by that name, nor any system files with "cudnn" (only folders included in Anaconda's TensorFlow site package). I also don't see any warnings aside from the "TensorFlow backend" message upon import. Seeing that I have not directly installed the driver, find no library files under this name, and see no unusual warnings at import, I conclude I do not have cuDNN installed.

@pylang


pylang commented Apr 1, 2017

On another note... I suspect the main issue with non-reproducible results in keras is related to how the weights are randomized on each call.

I did discover (late last night) that kernel_initializer has a number of options for setting up the distribution from which (I assume) the weights are drawn. I have not run substantial tests to draw a conclusion, nor investigated these options further yet, but my initial tests suggest that selecting different initializers influences the reproducibility of results. For instance, the default initializer is called "glorot_uniform". I played with some other distributions and managed to get more reproducible results, although with much higher error.

Since there are many variables, perhaps we should post a simple example here, e.g. a single Dense layer, 1-input linear regression (see the sketch below). The results should be consistent for all implementers. We can then confirm the results across different machines for different users.
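
A possible version of that benchmark (a sketch only; the seeds, data, and hyperparameters are arbitrary, and the Sequential API shown is Keras 2 style):

import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)

from keras.models import Sequential
from keras.layers import Dense

# Fixed synthetic data: y = 2x + 1 plus seeded noise
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = 2.0 * x + 1.0 + 0.1 * np.random.randn(200, 1)

model = Sequential()
model.add(Dense(1, input_dim=1))  # one weight + one bias: a linear regression
model.compile(optimizer='sgd', loss='mse')
history = model.fit(x, y, epochs=20, batch_size=20, verbose=0)

# Run the script twice; identical final losses suggest a reproducible setup.
print(history.history['loss'][-1])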

@diogoff


Contributor

diogoff commented Apr 3, 2017

I picked up mnist_cnn.py from the examples and set up keras.json in this way:

{
    "image_data_format": "channels_first",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}

I ran python mnist_cnn.py a couple of times and the results did not seem to be reproducible.

Then I edited mnist_cnn.py and inserted the following code between from __future__ import print_function (line 8) and import keras (line 9):

import numpy as np
np.random.seed(123)
import tensorflow as tf
tf.set_random_seed(123)

The results now look sufficiently reproducible to me. The small differences I assume are due to the use of cuDNN.

I tried running without cuDNN:
$ TF_USE_CUDNN=0 python mnist_cnn.py
but it seems it's not possible:
UnimplementedError (see above for traceback): Conv2D for GPU is not currently supported without cudnn

@diogoff


Contributor

diogoff commented Apr 3, 2017

If I switch the backend to Theano:

{
    "image_data_format": "channels_first",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "theano"
}

and insert the following code between lines 8-9 in mnist_cnn.py:

import numpy as np
np.random.seed(123)

and then run:
$ THEANO_FLAGS="dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic" python mnist_cnn.py
the results are fully reproducible.

@pylang


pylang commented Apr 4, 2017

@diogoff for clarity, what do you consider fully reproducible? Do you know how close your loss results are between runs? I'd like to compare notes.

@diogoff


Contributor

diogoff commented Apr 4, 2017

With fully reproducible, I mean I always get exactly the same results in every run:

loss: 0.3336 - acc: 0.8981 - val_loss: 0.0788 - val_acc: 0.9759
loss: 0.1214 - acc: 0.9642 - val_loss: 0.0548 - val_acc: 0.9828
loss: 0.0893 - acc: 0.9733 - val_loss: 0.0443 - val_acc: 0.9847
loss: 0.0735 - acc: 0.9783 - val_loss: 0.0391 - val_acc: 0.9871
loss: 0.0666 - acc: 0.9804 - val_loss: 0.0363 - val_acc: 0.9872
loss: 0.0590 - acc: 0.9825 - val_loss: 0.0369 - val_acc: 0.9873
loss: 0.0542 - acc: 0.9836 - val_loss: 0.0338 - val_acc: 0.9889
loss: 0.0505 - acc: 0.9850 - val_loss: 0.0314 - val_acc: 0.9889
loss: 0.0467 - acc: 0.9861 - val_loss: 0.0299 - val_acc: 0.9896
loss: 0.0451 - acc: 0.9867 - val_loss: 0.0319 - val_acc: 0.9898
loss: 0.0421 - acc: 0.9874 - val_loss: 0.0297 - val_acc: 0.9894
loss: 0.0405 - acc: 0.9880 - val_loss: 0.0309 - val_acc: 0.9895
Test loss: 0.0309449151449    <-- exactly the same up to the last digit
Test accuracy: 0.9895

Keras 2.0.2, Theano 0.9.0 with libgpuarray, CUDA 8.0, cuDNN 5.1.10

@nyxjemk


nyxjemk commented Nov 22, 2017

Yes, I'm using it for a ranking task. The small difference unfortunately makes a difference there.
Any further news on this topic?

@VanitarNordic


VanitarNordic commented Nov 22, 2017

Just do it on the CPU; there is no other option. Speed does not make a huge difference in this case.

@nyxjemk


nyxjemk commented Nov 25, 2017

Mh... I'm using the CPU and still having this issue. Do I have to shut off multiprocessing to achieve reproducibility? This seems not very handy, as it would take aeons to complete...

@VanitarNordic


VanitarNordic commented Nov 25, 2017

@nyxjemk


nyxjemk commented Nov 25, 2017

Yes, I only left out the line with the multiprocessing.

# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)

I will try, but I think this will just take too long; not very practical. Using theano as the backend, results are reproducible without any limitations, so I thought there must also be a way to achieve this with tensorflow. But I will try it out.
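
For reference, a fuller sketch of the recipe that line comes from (essentially the Keras FAQ suggestion for TF 1.x; the seed values are arbitrary):

import os
os.environ['PYTHONHASHSEED'] = '0'  # only fully effective if set before the interpreter starts

import numpy as np
import random as rn
import tensorflow as tf

np.random.seed(42)
rn.seed(42)
tf.set_random_seed(42)

# Force single-threaded execution; multiple threads are a known source
# of non-reproducible results.
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)

from keras import backend as K
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)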

@VanitarNordic


VanitarNordic commented Nov 25, 2017

No, you can try this with tensorflow too, not just theano. Test and let us know.

@nyxjemk


nyxjemk commented Nov 25, 2017

With theano, results are reproducible for me, but for some other reasons I'd like to stick with tensorflow.
As expected it takes way longer: ~10x on the system I'm using. I will test it with 1 epoch to see if results are the same after running it twice (even this is very slow...).
I think there must be a better way. As said above, with Theano as the backend I'm getting fully reproducible results, with full multiprocessing support.
There should also be a way to achieve reproducible results with TensorFlow and multiprocessing; from my perspective this is a really huge handicap, and -90% performance cannot be the way to go.

@VanitarNordic


VanitarNordic commented Nov 25, 2017

The bad news is that theano development is being discontinued, so we have no choice but to stick with tensorflow.

@nyxjemk


nyxjemk commented Nov 25, 2017

Yes... therefore I was hoping there might be another way.
Adding the line:
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
does indeed solve the problem, but it is really, really slow compared to normal; too slow, in my opinion, to use often.
Depending on the task, there may also be a more or less significant difference in the results across runs.
So I hope there will be a future solution or maybe some workaround.

Is this problem TensorFlow specific, or is it just the combination of Keras and TF which leads to this issue?

@stgrmks


stgrmks commented Nov 26, 2017

hey guys,

I can reproduce my results with keras + theano by adding np.random.seed(1) and rn.random.seed(1); the latter is probably not even necessary.
Furthermore, I can reproduce my results with pure tensorflow by just adding np.random.seed(1) and tf.set_random_seed(1), and with keras + cntk by just adding np.random.seed(1) and _cntk_py.set_fixed_random_seed(1).
This was done with and without a dropout layer; perfectly reproducible.

I went with the keras FAQ suggestion and tried to reproduce my results with keras + tf, without success.
One interesting observation:
if I use only a small amount of data or a small number of units (LSTM with 10 units), the results are almost reproducible with keras + tf. Almost means the last 5 digits do vary. But as soon as I add more data or units (say 200), the results vary a lot more. I tried with a dropout layer and without; same conclusion.
Also, giving the seed directly to the kernel initializer and/or dropout layer did not make any difference to reproducibility.

In summary:
I tried the keras FAQ suggestion without success for keras + tf. Pure tensorflow, keras + theano, and keras + cntk are perfectly reproducible, even with dropout involved.

theano: 0.9.0.dev
tf: 1.3.0
keras: 2.1.1
cntk: 2.3

All computations were done on the GPU.

@zyavrik


zyavrik commented Nov 26, 2017

I tried the same as you with keras + tensorflow, using the ResNet-v2 model from keras applications, with no success. Results are not reproducible.

@stgrmks


stgrmks commented Jan 8, 2018

Any news?

@ankahira


ankahira commented Jan 24, 2018

Any progress with this, besides using a single thread?

@bhardwajvijay


bhardwajvijay commented Feb 18, 2018

This worked for me

# importing the libraries
import numpy as np
import tensorflow as tf
import random as rn

import os
os.environ['PYTHONHASHSEED'] = '0'

from keras import backend as k
from keras.models import Sequential  # needed for the model below

# Running the below code every time
np.random.seed(27)
rn.seed(27)
tf.set_random_seed(27)

sess = tf.Session(graph=tf.get_default_graph())
k.set_session(sess)

## Creating model
m = Sequential()
m.add(...

## Compiling and fitting
m.compile(...
m.fit(...
@VanitarNordic


VanitarNordic commented Mar 10, 2018

It is not clear when Tensorflow will fix the damn thing, which still exists.

@hainguyenct


hainguyenct commented Mar 23, 2018

This code works for me:

import tensorflow as tf
config = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1,
                        allow_soft_placement=True, device_count={'CPU': 1})
session = tf.Session(config=config)
from keras import backend as K
K.set_session(session)

But might it slow down the learning?

@Sucran


Sucran commented May 21, 2018

@VanitarNordic Hi, the code in the Keras documentation works for me, but I still get a small inconsistency after the third or fourth digit, as you said. Over the first several epochs the inconsistency becomes larger and larger, and two runs of the same code end up with totally different training values after about 10 epochs.
Do you have any idea how to solve this issue, or what raises this problem?
Thank you for sharing your experience.

@VanitarNordic


VanitarNordic commented May 21, 2018

@Sucran

Upgrade to Tensorflow 1.8 and see what happens. You can use this Anaconda package:

conda install -c hesi_m keras

which installs TF 1.8 and Keras 2.1.6 for you on the CPU. Test and let me know the results.

@Sucran


Sucran commented May 21, 2018

@VanitarNordic The GPU software environment of my lab is Ubuntu 14.04, CUDA 8, and cuDNN 6.0, with tensorflow 1.4.0 and keras 2.1.6 installed. I cannot update to tensorflow 1.8 because of environment limitations, and I cannot update the environment myself since others are using it. I am sorry for that...

@VanitarNordic


VanitarNordic commented May 21, 2018

I don't think you will be able to get reproducible results using Tensorflow-GPU. I also have a GPU on my system, but Tensorflow uses the CPU because I installed the CPU version. You can create another anaconda environment for this purpose and test the idea.

@Sucran


Sucran commented May 21, 2018

@VanitarNordic Yes, I agree with you. It is difficult to get reproducible results using TF-GPU. In my case, although two runs of the same code produce loss values with small differences in each epoch, the curves are almost the same, just not identical (maybe we can already call these results reproducible). However, I still hope someone can figure out what causes this issue. I think it may well be identical on the CPU, just as you said. Thank you for the advice.

dmitriydligach pushed a commit to dmitriydligach/Phenotype that referenced this issue Jun 28, 2018

@MartinThoma


Contributor

MartinThoma commented Aug 7, 2018

@abali96 Do you have any idea why setting PYTHONHASHSEED would be necessary?

@ageron


Contributor

ageron commented Aug 8, 2018

@MartinThoma, setting the PYTHONHASHSEED environment variable to 0 ensures that python's built-in hash() function outputs the same result across multiple runs of the program (without this, the hash() function is only stable within a single run of the program). This hash() function is used everywhere, for example when you create a set or a dict. Try running this:

>>> set("abcdefghijklmnopqrstuvwxyz")
{'p', 'f', 'g', 'i', 'n', 'o', 'k', 'c', 'h', 'b', 'v', 'a', 'd', 's', 'u', 'q', 'j', 'z', 'm', 'r', 'w', 'l', 't', 'x', 'y', 'e'}
>>> set("abcdefghijklmnopqrstuvwxyz")
{'p', 'f', 'g', 'i', 'n', 'o', 'k', 'c', 'h', 'b', 'v', 'a', 'd', 's', 'u', 'q', 'j', 'z', 'm', 'r', 'w', 'l', 't', 'x', 'y', 'e'}

If I stop the Python shell and I run the same commands, I get a different result:

>>> set("abcdefghijklmnopqrstuvwxyz")
{'c', 'y', 'q', 'g', 'a', 'u', 'd', 'k', 'w', 'j', 'm', 's', 'e', 'o', 'b', 'h', 'l', 'r', 't', 'x', 'z', 'n', 'p', 'v', 'f', 'i'}
>>> set("abcdefghijklmnopqrstuvwxyz")
{'c', 'y', 'q', 'g', 'a', 'u', 'd', 'k', 'w', 'j', 'm', 's', 'e', 'o', 'b', 'h', 'l', 'r', 't', 'x', 'z', 'n', 'p', 'v', 'f', 'i'}

However, if I start python like this:

PYTHONHASHSEED=0 python

Then I always get the same result, even across multiple restarts of the Python shell:

>>> set("abcdefghijklmnopqrstuvwxyz")
{'x', 'i', 'r', 'p', 'd', 'c', 'l', 'y', 'h', 'm', 'z', 'k', 'o', 'a', 'g', 'f', 'u', 'e', 'w', 'n', 'b', 'q', 'j', 't', 's', 'v'}

However, I noticed that setting this environment variable within the Python program did not have any effect; it only worked when set before the program starts. Looking at Python's source code, it seems this environment variable is read at startup, so I think it's no use setting it afterwards. I'll submit a fix to Keras's documentation.
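
If the variable must be handled from inside the script anyway, one workaround sketch (a general re-exec pattern, an assumption rather than something from this thread) is to restart the interpreter with the variable set:

import os
import sys

# Re-exec the interpreter with a fixed hash seed if it was not set at
# launch; os.execv replaces the current process, so the script restarts.
if os.environ.get('PYTHONHASHSEED') != '0':
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)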

Hope this helps,
Aurélien

gabrielam2018 referenced this issue in gabrielam2018/bmgabi1974-gmail.com Aug 13, 2018

@thanhnguyentang


thanhnguyentang commented Nov 21, 2018

I think this non-deterministic behaviour is largely not due to Keras but to Tensorflow itself (e.g., https://stackoverflow.com/questions/45865665/getting-reproducible-results-using-tensorflow-gpu?newreg=4a6ec43834884576a175961e7f2188db).

I have tried to run the pure Tensorflow code fully_connected_feed.py from Tensorflow repo with the following settings (as recommended above by other responses):

import tensorflow as tf
import numpy as np 
import random 
import os 
os.environ['PYTHONHASHSEED'] = '0'
np.random.seed(2019)
random.seed(2019)
tf.set_random_seed(2019)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1,
                              inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)

and set shuffle=False in line 78 of fully_connected_feed.py, but could not obtain reproducibility.

I could not obtain reproducibility even when running the code on the CPU.
Note: with the Keras + Theano backend, I have obtained perfect reproducibility.

@AE51


AE51 commented Nov 28, 2018

I've also inserted explicit kernel (and bias) initialization:
x = layers.Dense(64, activation='relu', kernel_initializer=keras.initializers.glorot_uniform(seed=123))(x)
and this has worked.
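
A sketch extending that idea to every random component of a small model (layer sizes and seed values are arbitrary; in Keras 2, Dropout also accepts a seed argument):

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.initializers import glorot_uniform

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20,
                kernel_initializer=glorot_uniform(seed=123)))
model.add(Dropout(0.5, seed=123))  # seed the dropout mask as well
model.add(Dense(1, kernel_initializer=glorot_uniform(seed=123)))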
