Not reproducible using tensorflow backend #2280
Right. I will look into it. Or does anybody else want to take a look at it? |
I ran into this problem in edward; here is the fix we went with after a rather long discussion: blei-lab/edward#184. Long story short, it is pretty hard to seed tensorflow if you have a single shared session. Would be very interested to hear if there is a better solution :) |
Any update on this? |
Correct me if I'm wrong, but looks like this issue is still open and there is no way currently in Keras with a TensorFlow backend to get reproducible results. Any update? Workaround? |
Well, there is this hack blei-lab/edward#184; I can propose a PR with that to Keras if that makes sense, @fchollet ? The solution is to simply add a set_seed() function, but raise an error if someone calls it after a TF variable has been created. You cannot reseed after a Variable exists, because the previous seed was already used to build its initializers. |
Any news on that issue? @bluelight773 I think when running it on the CPU it's reproducible - but that is not really an option most of the time |
@fchollet @zhuoqiang Could you confirm this? Maybe there is a workaround: program with both Keras and Tensorflow following this post, i.e. use a Keras pre-defined model to speed up building your model, but use Tensorflow for input, output and optimization.
You should seed it by
|
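The code block appears to have been lost from this comment; the seeding snippet usually suggested at the time (a guess at the intent, not the commenter's exact code) seeded NumPy before importing Keras:

```python
import numpy as np

# seed NumPy *before* importing Keras, so layer initializers drawn from
# np.random are deterministic across runs
np.random.seed(1337)

# from keras.models import Sequential  # import Keras only after seeding
```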
I heard that Keras is going to be merged into TensorFlow. Can I expect the problem of reproducibility to be solved at the same time? If so, it will be a great improvement for Kaggle usage! |
@nejumi, ditto. This lack of support makes it really hard to run experiments with Keras & TF. I appreciate the convos and solutions here but really hoping this gets fixed soon. |
In principle, this should do it:
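The snippet referred to here was stripped out; it was presumably along these lines (a sketch: seed every RNG in play; `tf.set_random_seed` was the graph-level call in TF 1.x, spelled `tf.compat.v1.set_random_seed` today):

```python
import random
import numpy as np

random.seed(42)
np.random.seed(42)
try:
    import tensorflow as tf
    tf.compat.v1.set_random_seed(42)  # graph-level seed
except ImportError:
    pass  # TensorFlow not installed; Python/NumPy seeding above still applies
```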
However, there is still non-determinism in cuDNN. With theano it is possible to ensure cuDNN reproducibility by setting dnn.conv flags: #2479 (comment). With tensorflow, how do we set those flags? |
According to keras-team/keras#2280, experiments aren't reproducible when using Keras/Tensorflow at the moment. So, we might as well remove seeding the random number generator, since it's not doing what we expect.
For some time, I at least had reproducible results when running the training on the CPU. However, even that seems not to work any more. Has anyone experienced the same? |
I'm looking for a way to make Keras code reproducible, but I'm supposing that it's not possible. Am I right? |
Thanks @diogoff, but my problem is that I have tensorflow as the backend and I also utilize cuDNN, which is the same case you are looking to solve. |
I gave up on reproducibility because I found that when forcing deterministic behavior in cuDNN, training would be much slower (e.g. from 15 secs/epoch to 30 secs/epoch). |
IMO this is a critical issue that merits a high priority. Running a complex model for several minutes is meaningless unless results can be reproduced. Running the same cell multiple times has given results that differ by several orders of magnitude. I can confirm the latest suggestion does not work for Keras 2.0.2/ TensorFlow 1.0 backend/Anaconda 4.2/Windows 7
|
@pylang are you using cuDNN? |
@diogoff I have not taken extra steps to install cuDNN. My assumption is no, though I am unsure how to verify this absolutely. |
Try the following: if you see libcudnn.so installed, you have it, and tensorflow is probably using it. If I remember correctly, tensorflow prints some warning/info messages on startup saying which libraries it has loaded. On my system, libcudnn.so was one of them. |
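The command itself seems to have been stripped from the comment; on Linux, something like this (a guess at the intent, with typical paths that may differ per system) would show whether the cuDNN shared library is visible:

```shell
# check the dynamic linker cache for the cuDNN shared library;
# prints a fallback message instead of failing when it is absent
ldconfig -p 2>/dev/null | grep -i libcudnn || echo "libcudnn not found"
```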
I searched all files on my Windows machine and found none by that name, nor any system files with "cudnn" (only folders included in Anaconda's TensorFlow site package). I also don't see any warnings aside from the "TensorFlow backend" warning upon import. Seeing that I have not directly installed the driver, find no library files under this name, and see no unusual warnings at import, I conclude I do not have cudnn installed. |
On another note: I suspect the main issue with non-reproducible results in keras may be related to how the weights are randomized on each call (something I did discover late last night). Since there are many variables, perhaps we should post a simple example here, e.g. a single |
I picked up
I ran it. Then I edited
The results now look sufficiently reproducible to me. The small differences I assume are due to the use of cuDNN. I tried running without cuDNN: |
If I switch the backend to Theano:
and insert the following code between lines 8-9 in
and then run: |
@diogoff for clarity, what do you consider fully reproducible? Do you know how close your loss results are between runs? I'd like to compare notes. |
With fully reproducible, I mean I always get exactly the same results in every run:
Keras 2.0.2, Theano 0.9.0 with libgpuarray, CUDA 8.0, cuDNN 5.1.10 |
I don't think you'd be able to get reproducible results using Tensorflow-GPU. I also have a GPU on the system, but Tensorflow uses the CPU because I installed the CPU version. You can create another anaconda environment for this purpose and test the idea |
@VanitarNordic Yes, I agree with you. It is difficult to get reproducible results using TF-GPU. In my case, although two runs of the same code got loss values with small differences in each epoch, the loss curves are almost the same but not identical (maybe we can already call this reproducible). However, I |
@abali96 Do you have any idea why setting PYTHONHASHSEED makes a difference? |
@MartinThoma, setting the PYTHONHASHSEED environment variable makes Python's hash function deterministic, which fixes the iteration order of sets and dicts. Within a single Python shell, repeated calls give the same order:

>>> set("abcdefghijklmnopqrstuvwxyz")
{'p', 'f', 'g', 'i', 'n', 'o', 'k', 'c', 'h', 'b', 'v', 'a', 'd', 's', 'u', 'q', 'j', 'z', 'm', 'r', 'w', 'l', 't', 'x', 'y', 'e'}
>>> set("abcdefghijklmnopqrstuvwxyz")
{'p', 'f', 'g', 'i', 'n', 'o', 'k', 'c', 'h', 'b', 'v', 'a', 'd', 's', 'u', 'q', 'j', 'z', 'm', 'r', 'w', 'l', 't', 'x', 'y', 'e'}

If I stop the Python shell and run the same commands, I get a different result:

>>> set("abcdefghijklmnopqrstuvwxyz")
{'c', 'y', 'q', 'g', 'a', 'u', 'd', 'k', 'w', 'j', 'm', 's', 'e', 'o', 'b', 'h', 'l', 'r', 't', 'x', 'z', 'n', 'p', 'v', 'f', 'i'}
>>> set("abcdefghijklmnopqrstuvwxyz")
{'c', 'y', 'q', 'g', 'a', 'u', 'd', 'k', 'w', 'j', 'm', 's', 'e', 'o', 'b', 'h', 'l', 'r', 't', 'x', 'z', 'n', 'p', 'v', 'f', 'i'}

However, if I start python like this:

PYTHONHASHSEED=0 python

then I always get the same result, even across multiple restarts of the Python shell:

>>> set("abcdefghijklmnopqrstuvwxyz")
{'x', 'i', 'r', 'p', 'd', 'c', 'l', 'y', 'h', 'm', 'z', 'k', 'o', 'a', 'g', 'f', 'u', 'e', 'w', 'n', 'b', 'q', 'j', 't', 's', 'v'}

However, I noticed that setting this environment variable from within the Python program did not have any effect; it only worked when set before the program starts. Looking at Python's source code, this environment variable is read at startup, so there is no use setting it afterwards. I'll submit a fix to Keras's documentation. Hope this helps, |
I think this non-deterministic behaviour is largely not due to Keras but to Tensorflow itself (e.g., https://stackoverflow.com/questions/45865665/getting-reproducible-results-using-tensorflow-gpu?newreg=4a6ec43834884576a175961e7f2188db). I have tried to run pure Tensorflow code and set the seeds, but I could not obtain reproducibility even when running the code on CPU. |
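The TensorFlow code this comment ran was stripped out; a sketch of the kind of setup that was usually tried (an assumption about the original: seeds plus a single-threaded TF 1.x session, so op scheduling order is fixed at the cost of speed):

```python
import random
import numpy as np

random.seed(0)
np.random.seed(0)
try:
    import tensorflow as tf
    # single-threaded execution makes op scheduling deterministic, but slow
    config = tf.compat.v1.ConfigProto(
        intra_op_parallelism_threads=1,
        inter_op_parallelism_threads=1,
    )
    session = tf.compat.v1.Session(config=config)
except ImportError:
    session = None  # TensorFlow not installed in this environment
```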
I've also inserted the explicit kernel (and bias) initialization: |
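The snippet was lost from this comment; an explicitly seeded kernel initializer in Keras looks roughly like this (a sketch with assumed values, not the commenter's exact code):

```python
# give the layer explicitly seeded initializers so weight init does not
# depend on global RNG state (tf.keras initializer API)
try:
    from tensorflow.keras import layers, initializers

    dense = layers.Dense(
        64,
        kernel_initializer=initializers.GlorotUniform(seed=42),
        bias_initializer=initializers.Zeros(),
    )
except ImportError:
    dense = None  # TensorFlow/Keras not installed in this environment
```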
I tried multiple versions of tensorflow, from 1.2.1 to 1.13.1, and all of them have issues on CPU. However, when I set the backend to CNTK I am able to get perfectly matching results. |
I'm getting this issue. I'm using this example https://www.depends-on-the-definition.com/lstm-with-char-embeddings-for-ner/ with the NER dataset https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus. Every time I run the training, it produces completely different metrics for f-score, precision, and recall. The lowest I had was a 71% f-score and the highest was 83%, which I think is a huge variation. I tried the same on two machines, one with a GPU and one using only the CPU, and the results are the same: unreproducible. I was using keras 2.2.2 + tensorflow 1.10 (unable to use the latest version due to an unresolved bug in keras >= 2.2.3) |
Getting same issue +1 |
any update on this please? |
Still can't get reproducible results even after using this method. Please consider this high priority, as it is becoming really difficult to do research with TF + Keras. |
Hi @alberduris, please read my comment about PYTHONHASHSEED: you cannot set it within your program, you have to set it before starting Python (or Jupyter). Check out my video for more details. |
Putting the following code at the beginning, I can consistently reproduce the result 100% if I only use Dense layers.
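The code block referenced here was lost in formatting; a reconstruction of the preamble commonly circulated in this thread (an assumption about what was posted) is:

```python
import os
import random
import numpy as np

# note: PYTHONHASHSEED only takes effect if set before the interpreter
# starts; setting it here merely documents the intent
os.environ["PYTHONHASHSEED"] = "0"
random.seed(1)
np.random.seed(1)
try:
    import tensorflow as tf
    tf.compat.v1.set_random_seed(1)
except ImportError:
    pass  # TensorFlow unavailable; the remaining seeds still apply
```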
However, I get different results if I insert the single line "model.add(Conv2D(32, 3, activation='relu'))" before "model.add(Flatten())". Input > flatten > dense produces a consistent result, but input > conv2d > flatten > dense produces a different result every time I run the code. I'd appreciate any guidance. |
@jsl303 , it's no use setting PYTHONHASHSEED from inside your program; it must be set before Python starts. To convince yourself, try starting Python multiple times and run this command: $ python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
oeqsytnmfprwbvhldxijzcugak
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
oeqsytnmfprwbvhldxijzcugak
>>> exit()
$ python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
qjufgnolbdewycpitkzvarxsmh
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
qjufgnolbdewycpitkzvarxsmh
>>> exit() As you can see, although the order is consistent within one Python execution, it is not consistent across multiple runs. But if you start Python with the hash seed fixed: $ PYTHONHASHSEED=0 python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> exit()
$ PYTHONHASHSEED=0 python3
...
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> print("".join(set("abcdefghijklmnopqrstuvwxyz")))
xirpdclyhmzkoagfuewnbqjtsv
>>> exit() Hope this helps. |
Thanks for the explanation! I don't use any sets or dictionaries in my code, but I tried what you suggested anyway, just in case. You're right: it did not make any difference. I still get different results whenever I rerun the model with a Conv2D. |
@jsl303 , glad I could help. But even if you don't use any sets or dictionaries in your code, if any library you call iterates over sets or dictionaries, the result will not be deterministic across runs. So I really recommend setting PYTHONHASHSEED=0 outside of Python. Moreover, if you are using a GPU, then it won't be deterministic because some GPU operations used by TensorFlow (through CuDNN and CUDA) are just not perfectly deterministic (such as |
I am getting 100% reproducible results after following Ageron's video. Below are the variables I set. OS: Windows. I first created a PYTHONHASHSEED environment variable in the Windows environment variables and set it to 0. I opened Jupyter notebook through the Anaconda prompt and added the below code at the start of the program.
I then used the seed while splitting the dataset and initializing weights in the NNs. |
Hope this helps. The only problem I see now is that to reproduce the results, I have to restart the kernel every time. I hope someone can help avoid the kernel restarts. Update: with the below in the first cell, I no longer have to restart the kernel.
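The cell contents were lost in formatting; the usual trick for this (an assumption about what was posted) is to clear the Keras/TF session and reseed before each run, so repeated runs in the same kernel start from identical RNG state:

```python
import random
import numpy as np

def reset_seeds(seed=0):
    """Reset every RNG (and the Keras backend session, if TF is installed)."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.keras.backend.clear_session()  # drop the old graph and variables
        tf.compat.v1.set_random_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed; Python/NumPy reseeding still applies

reset_seeds()
```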
Just running this cell before the desired seed iteration helps. -Kiran Varma |
After searching for 2-3 days, a solution that I saw somewhere worked for me: I changed the optimizer from Adam to Adagrad and now I am getting consistent results. Still trying to find the reason for this. |
Has this been fixed? What's the recommended solution? |
@pylang . Here's a summary on how to get 100% reproducibility (at the cost of performance!):
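The checklist itself was lost in formatting. Reconstructed from the points made throughout this thread (a best-effort summary, not the original wording), it amounts to the following sketch:

```python
# 1. set PYTHONHASHSEED=0 in the shell *before* starting Python/Jupyter;
# 2. seed Python's, NumPy's, and TensorFlow's RNGs;
# 3. run on CPU only, since some GPU/cuDNN ops are non-deterministic;
# 4. use a single-threaded session so op scheduling is fixed.
import os
import random
import numpy as np

os.environ["CUDA_VISIBLE_DEVICES"] = ""  # step 3: hide the GPU
random.seed(42)                          # step 2
np.random.seed(42)
try:
    import tensorflow as tf
    tf.compat.v1.set_random_seed(42)
    config = tf.compat.v1.ConfigProto(   # step 4
        intra_op_parallelism_threads=1,
        inter_op_parallelism_threads=1,
    )
    sess = tf.compat.v1.Session(config=config)
except ImportError:
    pass  # TensorFlow not installed in this environment
```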
If you follow these guidelines, you should get 100% reproducible results. But as you can see, it comes at a high cost (especially dropping the GPU!), so you may ask yourself: is it worth the effort? Perhaps instead of perfect reproducibility, you could run the code multiple times and ensure that it produces approximately the same output on average, and the variance is low. Hope this helps. |
I found your comment on here. The way I get reproducible results is to put tf.random.set_seed() inside the function that builds and trains the model. Hope others solve this problem as well! |
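A sketch of the pattern being described (the original code block was lost; `build_and_train` and the seed value are assumed names): reseed TensorFlow at the top of the function, so every call starts from the same RNG state. `tf.random.set_seed` is the TF 2.x API.

```python
try:
    import tensorflow as tf

    def build_and_train():
        tf.random.set_seed(1234)  # same RNG state on every call
        # ... build the model and call model.fit() here ...

except ImportError:
    tf = None  # TensorFlow not installed in this environment
```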
For everyone who has the same issue as described here and still comes across this thread, please have a look at these links:
|
With the theano backend (CPU, or GPU without cuDNN), I could train a reproducible model by:
While in pure tensorflow without the keras wrapper, it could also be made reproducible by:
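The pure-TensorFlow snippet was also stripped out; in TF 1.x the usual pattern (a sketch, not the original code) combined a fresh graph with a graph-level seed:

```python
import numpy as np

np.random.seed(0)
try:
    import tensorflow as tf
    tf.compat.v1.reset_default_graph()  # start from a clean graph
    tf.compat.v1.set_random_seed(0)     # graph-level seed; individual random
                                        # ops can also take an op-level seed
except ImportError:
    pass  # TensorFlow not installed; NumPy seeding above still applies
```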
Don't know why, but in Keras with the tensorflow backend, none of the above gives a reproducible training run.
Environment:
BTW: it would be great if keras could expose a unified API for reproducible training, something like:
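The proposed snippet was lost; as a purely hypothetical sketch of what such a unified entry point might look like (none of these names exist in Keras):

```python
import random
import numpy as np

def set_reproducible_seed(seed):
    """Hypothetical unified seeding helper (not a real Keras API): seed
    Python, NumPy, and TensorFlow (if installed) in one call."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
        tf.compat.v1.set_random_seed(seed)
    except ImportError:
        pass  # backend not installed; Python/NumPy seeding still applies

set_reproducible_seed(42)
```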