Non-deterministic results on GPU #34
Hi, it's perfectly normal, especially if you run on a GPU. Some optimizations that TensorFlow performs are non-deterministic, so you'll get slightly different results every time. There's usually no need to make it perfectly deterministic, and it can come with a speed penalty, but if you're interested, take a look at this article.
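For what it's worth, seeding the random number generators narrows the run-to-run variation a bit. A minimal TF 1.x sketch (the seed value is arbitrary, and some GPU kernels remain non-deterministic regardless):

```python
import random
import numpy as np
import tensorflow as tf

# Seed every RNG that feeds the graph. This reduces, but does not
# eliminate, run-to-run variation: certain GPU kernels (e.g. some
# cuDNN convolutions and reductions) are still non-deterministic.
SEED = 42  # placeholder value
random.seed(SEED)
np.random.seed(SEED)
tf.set_random_seed(SEED)
```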
Hi @emedvedev, I trained the net on a GPU, froze the model, and I'm running the frozen graph (model) on a CPU. The results are not slightly different, but way off. I'll check out the article you've referred to. Thank you.
Interesting article. @tumusudheer, maybe we can investigate uses of the non-deterministic functions and change them out. I've run into the same issue, and with longer phrases I often get genuinely different predicted text.
Hi @ckirmse, sure, sounds good to me. I'll keep you posted if I find a fix. Please let me know if you find one. Thanks.
I believe the source of the non-deterministic behavior is how the CNN is initialized. Namely, this line: `cnn_model = CNN(self.img_data, True)` should actually be: `cnn_model = CNN(self.img_data, not self.forward_only)`. Otherwise, dropout (which randomly removes connections from the output of the CNN) is performed even during testing.
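To illustrate what that flag controls, here is a minimal sketch of a convolutional block that gates dropout on an is_training argument (the layer sizes and dropout rate are illustrative, not the project's actual CNN code):

```python
import tensorflow as tf

def conv_block(inputs, is_training, dropout_rate=0.5):
    """Toy convolutional block: dropout fires only while training."""
    net = tf.layers.conv2d(inputs, filters=64, kernel_size=3,
                           padding='same', activation=tf.nn.relu)
    # With training=False (i.e. forward_only at test time), this layer
    # is the identity, so repeated predictions stay deterministic.
    net = tf.layers.dropout(net, rate=dropout_rate, training=is_training)
    return net
```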
@reidjohnson oh, good catch! Committed the fix. @tumusudheer @ckirmse, could you verify that this behavior is fixed (or at least significantly reduced) in the latest version?
Oh wow, yeah, that's quite bad! Good catch. I'll test it out tonight.
This would have affected the exported graphs, too, I think.
OK, confirmed that this fixed the non-determinism of prediction for me. That's really good! As for the exported graphs: it should be building a test/prediction graph (no dropout), not using what's in the checkpoint, right @emedvedev?
@ckirmse I think so.
@ckirmse @emedvedev I'm not sure that is the case, and that may actually be our issue here. As far as I can tell, our export code pulls from the checkpoint_state, which is only saved during training, meaning it's likely saving the model as prepared for training. This is probably why we're seeing the issue with exported graphs.
@mattfeury yeah, I agree; that's what I was trying to say, but I now realize my statement was vague. I meant to say: "exporting should be building a test/prediction graph (no dropout), but as of right now it is using what's in the checkpoint, which does have dropout, so that needs to be changed." I'm hopeful that fixing that will fix #25.
OK, I'm going to try to get up to speed with that code and see what I can do.
Hi all, the fix works for me. Here is the approach I've used to freeze the binary graph without weights.
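First, write out the inference graph definition. A minimal TF 1.x sketch of that step (the session variable `sess`, the output directory, and the filename are assumptions; the model is assumed to be built in test mode, i.e. forward_only):

```python
import tensorflow as tf

# Assumes a session `sess` in which the model was built with the same
# parameters as `aocr test` (forward_only, so no dropout in the graph).
# This writes the graph structure only; the weights stay in the checkpoint.
tf.train.write_graph(sess.graph_def, './exported', 'graph.pb', as_text=False)
```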
Just use the same parameters as 'aocr test'. After this, I've used the freeze_graph utility from TensorFlow as follows.
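A sketch of that invocation (the paths, checkpoint name, and the output node name `prediction` are assumptions, not values confirmed by the project):

```bash
python -m tensorflow.python.tools.freeze_graph \
  --input_graph=./exported/graph.pb \
  --input_binary=true \
  --input_checkpoint=./checkpoints/model.ckpt \
  --output_graph=./exported/test_frozen_graph.pb \
  --output_node_names=prediction
```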
Both steps can be combined into a single step, for example as sketched below. The final test_frozen_graph.pb is working well for me. Hope it helps.
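A sketch of the combined variant, again assuming a test-mode session `sess` with the checkpoint already restored and an output node named `prediction`:

```python
import tensorflow as tf

# Fold the restored variables into constants and write the frozen
# graph in one go, without a separate freeze_graph invocation.
frozen = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph_def, ['prediction'])  # output node name is an assumption
with tf.gfile.GFile('./exported/test_frozen_graph.pb', 'wb') as f:
    f.write(frozen.SerializeToString())
```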
Hi @emedvedev, closing this as the fix is working great. Thanks.
Hi @emedvedev,
I ran the test on the same image multiple times using the command from the README:
aocr test ./datasets/testing.tfrecords
Every time I run the command, I get the same predicted word as output, but the inference probabilities change (and the loss as well).
Run1:
Step 1 (1.096s). Accuracy: 100.00%, loss: 0.000364, perplexity: 1.00036, probability: 93.33% 100%
Run2:
Step 1 (0.988s). Accuracy: 100.00%, loss: 0.000260, perplexity: 1.00026, probability: 92.58% 100%
I've observed the same behavior when I used a frozen checkpoint as well (the probabilities change for the same image). Is there any reason why this is happening? It shouldn't. Please let me know how to fix it.