Description
I have a fairly simple CNN (6 conv layers, 3 fc layers). I implemented it once in Keras and once in plain TensorFlow.
With Keras, training at a learning rate of 0.01 works fine (categorical_crossentropy, SGD, TensorFlow backend). With my plain TensorFlow implementation (learning rate 0.01, softmax_cross_entropy_with_logits, MomentumOptimizer) I get an error:
"Invalid argument: ReluGrad input is not finite. : Tensor had Inf values"
I have to lower the learning rate to 0.0001 to make the plain TensorFlow model converge.
I noticed that Keras has its own cross-entropy and SGD implementations. When I use the Keras cross-entropy implementation in my plain TensorFlow code, I no longer get errors at high learning rates, but the model still does not converge at those rates. Why does it work so much better with Keras? Is the custom Keras SGD implementation better than TensorFlow's MomentumOptimizer?
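For illustration, here is a minimal NumPy sketch of the two ways of computing cross-entropy being compared: directly from logits, and from softmax probabilities that are clipped before the log. This is not the actual Keras or TensorFlow code. The function names, the `EPSILON` value of 1e-7, and the example logits are assumptions chosen for the sketch.

```python
import numpy as np

EPSILON = 1e-7  # assumed clipping constant, for illustration only


def cross_entropy_from_logits(logits, labels):
    """Cross-entropy computed from raw logits via a shifted log-softmax."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(labels * log_probs, axis=-1)


def cross_entropy_from_clipped_probs(probs, labels, eps=EPSILON):
    """Cross-entropy on probabilities that are clipped away from 0 and 1."""
    clipped = np.clip(probs, eps, 1.0 - eps)
    return -np.sum(labels * np.log(clipped), axis=-1)


# A confidently wrong prediction: the true class is the one with the lowest logit.
logits = np.array([[100.0, 0.0, -100.0]])
labels = np.array([[0.0, 0.0, 1.0]])

# Softmax probabilities, computed with the usual max-shift for stability.
exp_shifted = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
probs = exp_shifted / np.sum(exp_shifted, axis=-1, keepdims=True)

loss_logits = cross_entropy_from_logits(logits, labels)
loss_clipped = cross_entropy_from_clipped_probs(probs, labels)

print("loss from logits:       ", loss_logits[0])   # grows with the logit gap
print("loss from clipped probs:", loss_clipped[0])  # capped at -log(EPSILON)
```

In this sketch the clipped loss is bounded by -log(EPSILON), while the logits-based loss keeps growing with the logit gap. Where a probability has been clipped, the loss also no longer changes with it. That may be one reason a clipped loss avoids the Inf error without making the model converge, but this is only a possible reading, not a confirmed diagnosis.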
I would like my plain TensorFlow model to converge with a learning rate of 0.01 as well.
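When comparing the two optimizers, it may help to remember that the same learning rate does not mean the same step size once momentum is involved. The pure-Python sketch below compares a plain SGD update with a classical momentum update of the form `velocity = momentum * velocity + grad; w -= lr * velocity`. The momentum value of 0.9 and the constant gradient are assumptions for illustration; the post does not state which momentum was used.

```python
def sgd_step(w, grad, lr):
    """Plain SGD: step size is lr * grad."""
    return w - lr * grad


def momentum_step(w, velocity, grad, lr, momentum):
    """Classical momentum: accumulate gradients, then step along the accumulator."""
    velocity = momentum * velocity + grad
    return w - lr * velocity, velocity


lr = 0.01
momentum = 0.9  # hypothetical value, not taken from the post
grad = 1.0      # constant gradient, to expose the steady-state step size

w_sgd = 0.0
w_mom, velocity = 0.0, 0.0
for _ in range(200):
    w_prev = w_mom
    w_sgd = sgd_step(w_sgd, grad, lr)
    w_mom, velocity = momentum_step(w_mom, velocity, grad, lr, momentum)

last_sgd_step = lr * grad
last_momentum_step = w_prev - w_mom

print("SGD step size:     ", last_sgd_step)
print("momentum step size:", last_momentum_step)
```

Under these assumptions the momentum step settles near lr / (1 - momentum), about ten times the plain SGD step. If the Keras run used SGD without momentum, a learning rate of 0.01 in the two setups would therefore not be directly comparable.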