
Saved model behaves differently on different machines #7676

Closed
basaldella opened this issue Aug 17, 2017 · 29 comments

@basaldella

basaldella commented Aug 17, 2017

After studying #439, #2228, #2743, #6737 and the new FAQ about reproducibility, I was able to get consistent, reproducible results on my development machines using Theano. If I run my code twice, I get the exact same results.

The problem is that the results are reproducible only on the same machine. In other words, if I

  • Train a model on machine A
  • Evaluate the model using predict
  • Save the model (using save_model, or model_to_json and save_weights)
  • Transfer the model to machine B and load it
  • Evaluate the model again on machine B using predict

The results of the two predicts are different. Using CPU or GPU makes no difference - after I copy the model file(s) from one machine to another, the performance of predict changes dramatically.
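For concreteness, here is a minimal, self-contained sketch of the round trip described above (Keras 2.x; the toy model and file names are illustrative, not the actual project code):

```python
import numpy as np
from keras.layers import Dense
from keras.models import Sequential, load_model

np.random.seed(0)
x_eval = np.random.rand(4, 10)

# --- machine A: build, save, predict ---
model = Sequential([Dense(8, activation="relu", input_shape=(10,)),
                    Dense(3, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.save("model.h5")              # architecture + weights + optimizer state
preds_a = model.predict(x_eval)

# --- machine B: load the transferred file and predict on the same inputs ---
model_b = load_model("model.h5")
preds_b = model_b.predict(x_eval)

# Expected to be ~0; the reports in this thread see large differences
# when the load happens on a different machine.
print(np.abs(preds_a - preds_b).max())
```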

The only differences between the two machines are the hardware (I use my laptop's 980M and a workstation with a Titan X Pascal) and the NVIDIA driver version, which is slightly older on the workstation. Both computers run Ubuntu 16.04 LTS and CUDA 8 with cuDNN. All libraries are on the same versions on both machines, and the Python version is the same as well (3.6.1).

Is this behavior intended? I expect that running a pre-trained model with the same architecture and weights on two different machines yields the same results, but this doesn't seem to be the case.

On a side note, a suggestion: the FAQ about reproducibility should explicitly state that the development version of Theano is needed to get reproducible results.
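For readers landing here, the usual same-machine reproducibility recipe referenced above boils down to seeding every source of randomness up front; a condensed sketch (seed values are arbitrary, and the backend-specific call depends on whether Theano or TensorFlow is used):

```python
# Condensed same-machine reproducibility sketch: seed every source of
# randomness before building the model. This makes repeated runs match on
# one machine; it does not address the cross-machine differences reported
# in this issue.
import os
os.environ["PYTHONHASHSEED"] = "0"   # set before any hashing-dependent code runs

import random
import numpy as np

random.seed(42)
np.random.seed(42)
# plus the backend-specific seed, e.g. tf.set_random_seed(42) on the
# TensorFlow backend (TF 1.x API, matching the era of this thread)
```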

@RunshengSong

RunshengSong commented Aug 30, 2017

Same here with the TensorFlow backend. I trained my models on my local machine (Ubuntu 16.04 LTS). When I tested my model on an AWS EC2 instance, I got different prediction numbers.

Have you solved this?

@basaldella
Author

No, I'm still experiencing this problem.

You can check if any of the solutions posted in #4875 work for you. What versions of the libraries are you running?

@RunshengSong

RunshengSong commented Aug 31, 2017

I am running Keras 2.0.6.

In my experience the package versions are not the problem (at least for now). I did some experiments today and found that if I remove PCA from my modeling pipeline, everything works fine. Did you use sklearn's PCA to reduce the dimensionality of your inputs? If so, you might want to try removing it. That solved my problem for now.

I don't know why this happens. This post says that non-determinism in the weights could cause problems, but that doesn't explain why the neural network behaves consistently on the same machine. Inspired by that post, I guess the differing dimensionality-reduction results lead to this problem.

@rsmith49

Not sure if this will help at all, but I was dealing with this for a day and a half before I realized it was a difference in how the machines handled hashing words before I passed them to my Embedding layer.
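To illustrate the hashing pitfall described here: Python's built-in hash() for strings is randomized per process unless PYTHONHASHSEED is fixed, so any word-to-index mapping derived from it is not stable across sessions or machines. A toy example (the vocabulary and modulus are made up):

```python
# Run this twice (or on two machines) without PYTHONHASHSEED set and the
# indices will generally differ; run with PYTHONHASHSEED=0 and they match.
vocab = ["saved", "model", "behaves", "differently"]
word_to_index = {word: hash(word) % 1000 for word in vocab}
print(word_to_index)
```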

@basaldella
Author

@RunshengSong, I don't use sklearn. @rsmith49, are you talking about setting PYTHONHASHSEED?

@halhenke

halhenke commented Sep 1, 2017

I found the same thing as @rsmith49 - I had a word2vec model that would act as if the weights had been completely re-initialized when I loaded them from disk in a new session. After also saving/pickling the dicts that mapped words to ints, and reloading them from disk when I started a new training session, the model behaved as expected.
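A minimal sketch of what is described here: persist the word-to-int mapping next to the model so a new session reuses exactly the same indices instead of rebuilding them (file and variable names are illustrative):

```python
import pickle

# built during training (illustrative contents)
word_to_int = {"the": 1, "model": 2, "weights": 3}

# save alongside the model weights
with open("word_to_int.pkl", "wb") as f:
    pickle.dump(word_to_int, f)

# in a new session, load the mapping instead of rebuilding it
with open("word_to_int.pkl", "rb") as f:
    word_to_int = pickle.load(f)
```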

@rsmith49

rsmith49 commented Sep 1, 2017

@basaldella Yes, turns out my issue was more along the lines of #4875, and was inconsistent between different Python sessions, not just different machines.

@basaldella
Author

@halhenke I'm also using a word2vec model, but I use GloVe's pre-trained weights following this tutorial, so I guess that shouldn't be the issue. Are you using pre-trained weights as well?

@wangchenouc

@basaldella Have you fixed this issue?
It seems that I have the same problem. I retrained a model by fine-tuning InceptionV3 on my own images on a GPU machine. After training, the accuracy got up to 91%, which I am happy with. During training the improved model was saved with callbacks, so I can load the best retrained model with model.load_model(model_path), and I tested it with one image. The prediction results are always the same and correct (because I know which class this image belongs to).
The result is like this: [[ 0.00197385 0.01141251 0.02262068 0.9121536 0.00810914 0.01657074
0.00370198 0.00617629 0.00972648 0.00531203 0.00224261]]

Now, when I copy the retrained model (HDF5 file) to my laptop, load the model again, and test it with the same image, I get a totally different result:
[[ 0.00373867 0.22160383 0.10066977 0.35440436 0.02839879 0.17799987
0.01744748 0.02645957 0.0299265 0.03026218 0.00908909]]

The Python environments are the same on the two machines, with Keras 2.0.8:
The results are always the same on the same machine.
The weights are the same after I load the model file.
......I checked many things.

Why are the results different on the two machines? Does anybody know about this?

@basaldella
Author

@wangchenouc no, I was absolutely not able to fix this issue. If you have any news, please tag me in your issue as well. I'm actually thinking of switching to a lower-level framework because I'm not able to solve this problem.

@wangchenouc

@basaldella Please look at this #8149

@basaldella
Author

@wangchenouc thanks, but this does not solve my issue. In my case the versions of Keras are the same on both machines.

@wangchenouc

@basaldella Just comparing the Keras version is not enough; you may need to compare every function you use. Try a very simple case like I did, so it's easy to compare your differences step by step. I spent 3 whole days debugging the code step by step and finally solved my problem.

Good luck to you!

@basaldella
Author

@wangchenouc I know. But I'm cloning the same repo on 2 machines and installing the same Python and library versions with a script, and I still have no luck getting the same results.

Thanks for encouraging me though :)

@philiprekers

Any news on this issue? I'm running into the same problem.
Two instances - identical, because it's the same hardware setup and the second was installed from an image of the first.
When I model.save() my model on the first instance and load_model() on the second, the results seem to be random when evaluating on the second instance. Accuracy also drops to unreasonable values (from .97 to .52).

What are the possible causes other than differences in code/setup/hardware? I've been searching for solutions for the last 3 days and nothing seems to work.

@dterg

dterg commented Jan 30, 2018

I've looked at the several potential solutions reported here and in related threads, but no luck either, @Philipduerholt. In my case the last layer is a softmax, and when I predict on the same training data (not even test data), I get equal probabilities across my classes, i.e. the model is completely random.

@philiprekers

philiprekers commented Jan 31, 2018

It worked, finally. And in hindsight it looks simple. I'll try to include all relevant points:
I'll be talking about training instance and production instance.
I use TensorFlow backend.
Python version 3.5.4.
Keras version 2.0.5.

  • I pickle everything I use as input or as ID-map (like word_id_dictionaries).
  • e.g. at the end I have word2index.dict, label2index.dict (most people would use .pkl).
  • For evaluation I also pickle X_test and y_test.
  • Build and train your model in training instance.
  • I use a Sequential model with an Embedding layer (some had problems with that).
  • I use ModelCheckpoint() and save it as a list (callbacks_list), file names end on .hdf5.
  • model.fit has callbacks = callbacks_list.
  • After training, I choose the most promising saved model.
  • I can load_model('models/most_promising.hdf5') in the same instance and evaluate.
  • This works as expected.
  • I transfer the .hdf5 file and all pickled files to production instance.
  • In production I make sure all package versions are equal to training instance.
  • Best to use something like conda env.
  • I import: from keras import backend as K
  • Immediately after the imports I set the learning phase: K.set_learning_phase(0)
  • I initialize/load all the things:
    • model = load_model(model_path)
    • with open(word2index_path, "rb") as f:
      word2index = pickle.load(f)
    • etc.
  • Evaluation works as expected.
  • Predict works as expected.

I hope it helps.
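A minimal sketch of the production-side loading flow from the steps above (the paths and pickle names follow the example and are illustrative):

```python
import pickle

from keras import backend as K
from keras.models import load_model

# set the learning phase to "test" immediately after the imports,
# before the model is loaded
K.set_learning_phase(0)

model = load_model("models/most_promising.hdf5")

with open("word2index.dict", "rb") as f:
    word2index = pickle.load(f)
with open("label2index.dict", "rb") as f:
    label2index = pickle.load(f)

# evaluation / prediction then behave as in the training instance, e.g.:
# scores = model.evaluate(X_test, y_test)
# preds = model.predict(X_new)
```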

@ghost

ghost commented Mar 21, 2018

I had the same problem...
upgrading Keras on both machines to version 2.1.5 solved the problem for me.

@han963xiao


I had the same problem...
upgrading Keras on both machines to version 2.1.5 solved problem for me.
Amazing! The solution is that each machine should have the same Keras version. The same inputs on different versions can produce different outputs...
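Since mismatched versions keep coming up in this thread, a quick sanity check worth running on both machines before comparing predictions (prints the interpreter, Keras, NumPy, and backend versions):

```python
import sys

import keras
import numpy as np

print("python :", sys.version)
print("keras  :", keras.__version__)
print("numpy  :", np.__version__)
print("backend:", keras.backend.backend())
```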

@xiaoleitw

running into the same problem and looking for a solution

@xiaoleitw

(quoting @Philipduerholt's solution steps above)

this is not working for me :(

@shrutimittal90

tf.set_random_seed(0) worked for me

@jewelcai

tf.set_random_seed(0) worked for me

where should this line be placed? Before sess = tf.Session(config=config)?
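For reference, with the TF 1.x API discussed here the graph-level seed is typically set right after importing TensorFlow, before any ops (or the session) are created; a minimal sketch (the config contents are placeholders):

```python
import numpy as np
import tensorflow as tf

# seed both NumPy and the TF graph before building any ops
np.random.seed(0)
tf.set_random_seed(0)

# ... build the graph / Keras model here ...

config = tf.ConfigProto()
sess = tf.Session(config=config)
```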

@urmilanayak

I am facing the same problem in Golang; the following is my approach:

  1. Train a model on Ubuntu 18.04 (using Python, Tensorflow and Keras)
  2. Optimized, froze, and saved the model to be used with the Tensorflow Go API
  3. LoadSavedModel on Ubuntu 18.04 using Tensorflow Go API
  4. LoadSavedModel on Raspberry Pi 4 using Tensorflow Go API

The weights for all layers are different when loaded on Ubuntu (step 3) and on the Raspberry Pi (step 4), which is causing the different softmax predictions.

Sample weights in the different environments:
These are just sample weights; however, all the weights in all layers are different.
Tensorflow API versions used to load the model: Go Tensorflow (r2.0), Tensorflow C (r2.0), Golang (1.13.6)

Loaded weights on Ubuntu:
[0.5031438 -0.062892914 -0.10482144 -0.04192853 0.7127869 0.46121502 -0.3983221 ....]

Loaded weights on the Raspberry Pi for the same layer and same filter as above:
[0.49415612 -0.07188058 -0.11380911 -0.050916195 0.70379925 0.45222735 -0.40730977 ....]

How to solve this?

@alankongfq

I am facing the same issue, but in a somewhat weird fashion.
Long story short: I had set up 3 pipelines.

  1. Training pipeline -- train on Azure ML with TF 2.0 - NC6s V2 (cloud VM) --- training OK
  2. Testing pipeline -- testing on a local machine with TF 2.3, RTX 2070 --- prediction results OK
  3. Deployment pipeline -- NC6s V2 for inference with TF 2.3 (cloud VM) --- erratic behavior of model.predict

For pipelines 2 and 3 the environment is kept the same, with the same code. The only difference is the hardware and GPU.
What baffles me is that the prediction results on the local machine were as expected, but when deployed on the cloud VM it sometimes works and sometimes doesn't. What is even weirder is that if I run inference on a few images in sequence -- say [image1, image2, image3] -- image1 and image3 predict fine, but image2 does not get a complete prediction: most of the prediction works except for the last few tiles of the image.

I am at a loss here because I don't know where to start debugging, and I can't just spin up my VM to test as it costs money. I am not sure if it is related to some memory issue, weight initialization, etc. Does anyone have any pointers?

@tu-curious

tu-curious commented Oct 23, 2020

@alankongfq: Not even going that far, I found out after a whole day of debugging that my TF 2.1 model gives different predictions when run on CPU vs GPU, keeping EVERYTHING else the same (same machine, same OS, fixed saved weights, no randomness anywhere). I knew there are precision differences between the two devices; I just didn't realize they could be so significant. I think it also depends on the particular NN architecture: with lots of parameters, a little error in each parameter can accumulate into a BIG error in the final predictions. The first point of the first answer to this SO question makes the same argument: https://stackoverflow.com/questions/43221730/tensorflow-same-code-but-get-different-result-from-cpu-device-to-gpu-device
That answer also links some closed TensorFlow GitHub issues which conclude that this is expected behavior and not a bug. Hope this helps.
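A small TF 2.x sketch of the kind of check described above: run the same saved weights on CPU and GPU and compare the outputs ("model.h5" and the input shape are placeholders; a visible GPU is required):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("model.h5")
x = np.random.rand(8, 224, 224, 3).astype("float32")   # shape depends on the model

with tf.device("/CPU:0"):
    cpu_pred = model(x, training=False).numpy()
with tf.device("/GPU:0"):
    gpu_pred = model(x, training=False).numpy()

# Per-parameter float differences between devices can accumulate into
# visible prediction differences in deep models.
print("max abs diff:", np.abs(cpu_pred - gpu_pred).max())
```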

@alankongfq

Hi @tu-curious, thanks for the pointers, I will take a closer look at this when I have the time.

@Dave-Vedant

I am experiencing the same problem. I trained a simple 4-dense-layer neural network on an Ubuntu 20.04 system and it gives me a max accuracy of 94.05%, while the same model on Google Colab gives me an accuracy of 99.96%. I am wondering what the reason for that is. I also trained them multiple times; on each machine the accuracy is constant across runs (within ±0.5%), but between the two machines there is a large difference of 4.0%. Why???

@hanzhuangsyr

I just fixed this problem. I thought something was wrong with Keras or TensorFlow, but it turns out there was a bug in my own code. The bug didn't show up on my Windows computer, but it did on my Linux computer. Wasted a lot of time.
