Fix Keras->ONNX shape and BatchNorm conversion #213
Conversation
Change size handling to use 'None' instead of None
Fix BatchNorm parameter conversion to correctly carry over mean and variance
keras2onnx/common/utils.py
@@ -51,7 +51,7 @@ def set_logger_level(lvl):

 @with_variable('batch_size')
 def get_default_batch_size():
-    return 'N'
+    return 'None'
This 'N' is used to specify the batch size as the symbol 'N'. The output shape then also contains this 'N', so we know it is a match. If we use 'None', we cannot match them.
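For context, a minimal sketch (my own, not from this PR) of how a symbolic dimension such as 'N' shows up in an ONNX model, assuming the standard onnx Python helper API; reusing the same symbol on the input and the output is what makes the dimensions match:

import onnx
from onnx import helper, TensorProto

# The string 'N' becomes a symbolic dim_param; using the same symbol on the
# input and the output tells consumers the two batch dimensions are equal.
inp = helper.make_tensor_value_info('X', TensorProto.FLOAT, ['N', 3])
out = helper.make_tensor_value_info('Y', TensorProto.FLOAT, ['N', 3])
node = helper.make_node('Identity', ['X'], ['Y'])
graph = helper.make_graph([node], 'batch_symbol_demo', [inp], [out])
model = helper.make_model(graph)
onnx.checker.check_model(model)

# dim_param holds the symbol; a concrete size would live in dim_value instead.
print(model.graph.input[0].type.tensor_type.shape.dim[0].dim_param)  # -> N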
I'll revert this change, thanks
 scale_tensor_name = scope.get_unique_variable_name('scale')
-container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, gamma)
+container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, params[0])
Can we have a unit test to cover this? Please check test_layers.py for examples.
Looks like it's already covered by test_batch_normalization_2() in test_layers.py. Interestingly, both the old and new code pass that test: since the unit test only runs once (i.e. against a single input), the moving mean and variance are still at their initial values of 0 and 1. In that case, the resulting values are actually the same:
# Substituting mean = 0 and variance = 1:
gamma := params[0] / sqrt(1 + epsilon) ≈ params[0]   (epsilon is tiny, so this is within test tolerance)
beta  := params[1] - params[0] * 0 / sqrt(1 + epsilon) == params[1]
Added test_batch_normalization_3
 scale_tensor_name = scope.get_unique_variable_name('scale')
-container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, gamma)
+container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, params[0])
The gamma and beta calculation was actually introduced by onnx/onnxmltools#36; I don't have much detail on why that change was made. Can you add some test cases to prove it?
I added test_batch_normalization_3, which covers non-default (non-0/1) mean and variance values, and confirmed it fails with the previous code and passes with these changes.
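For illustration, a hedged sketch (the names and values here are mine; this is not necessarily what test_batch_normalization_3 looks like) of how a Keras BatchNormalization layer can be given non-default moving statistics, so that a conversion bug in the mean/variance handling actually surfaces at inference time:

import numpy as np
from tensorflow import keras

# Build a model whose BatchNormalization layer carries moving statistics
# other than the initial mean=0 / variance=1, then use model.predict(x)
# as the reference output for the converted ONNX model.
layer = keras.layers.BatchNormalization(input_shape=(4,))
model = keras.Sequential([layer])

# Keras stores BatchNormalization weights as [gamma, beta, mean, variance].
gamma = np.full((4,), 1.5, dtype=np.float32)
beta = np.full((4,), 0.3, dtype=np.float32)
moving_mean = np.full((4,), 0.7, dtype=np.float32)       # != 0
moving_variance = np.full((4,), 2.0, dtype=np.float32)   # != 1
layer.set_weights([gamma, beta, moving_mean, moving_variance])

x = np.random.rand(2, 4).astype(np.float32)
expected = model.predict(x)  # compare the ONNX model's output against this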
I just ran test_batch_normalization_3 against the original code, and it actually passes, see here. It seems the original code is also correct (an equivalent way to do this).
Hm, interesting. Perhaps there's something in my local environment making it fail instead, but I'm surprised, as I was able to follow the numerical differences. I'll dig further when I have a chance.
Yep, I reinstalled keras2onnx on my machine (and also tried a virtualenv), and both show the original code passing. No idea how I was seeing failures before; it's very possible the two are equivalent. I'll keep looking.
Please also fix the CI build failure... thanks.
Corrected BatchNorm epsilon conversion
Added BatchNorm test cases for non-default mean/variance values
@wschin, can you still remember why gamma/beta was calculated in this way?
The CI build failure is because of something else; I just fixed it.
@@ -20,7 +20,7 @@
 def _infer_variable_type(tensor):
     tensor_shape = []
     if tensor.shape not in (tf.TensorShape(None), tf.TensorShape([])):
-        tensor_shape = [d.value for d in tensor.shape]
+        tensor_shape = [d.value if d.value is not None else 'None' for d in tensor.shape]
This will break the correct behavior as designed in ONNX. If the correct way cannot work for WinML, we need to add a program option to stay backward compatible with WinML.
@@ -47,26 +47,23 @@ def convert_keras_batch_normalization(scope, operator, container):
     if not op.center:
         params.insert(1, np.zeros(params[1].shape, dtype=float))

-    gamma = params[0] / np.sqrt(params[3] + op.epsilon)
According to the paper (https://arxiv.org/pdf/1502.03167v3.pdf, the formula on p. 3), the original calculation is correct in inference mode. Do you have any example showing the new change would be more accurate?
My understanding is as follows. The Keras op has the following parameters:

params[0] := gamma
params[1] := beta
params[2] := moving_mean
params[3] := moving_variance

These are then applied as described in the paper (under Algorithm 1):

x_norm = (x - moving_mean) / sqrt(moving_variance + epsilon)
y = gamma * x_norm + beta

The ONNX operator has the same parameters, but with slightly different naming:

scale := gamma
B := beta
mean := moving_mean
var := moving_variance

However, the previous code appears to have interpreted these values instead as:

scale = gamma / sqrt(moving_variance + epsilon)                   # Most similar to x_norm's definition?
B = beta - gamma * moving_mean / sqrt(moving_variance + epsilon)  # Unsure how to interpret this
# Perhaps trying to treat the scale and B(ias) tensors as the x_norm and y terms,
# as opposed to as gamma and beta?

My understanding is that it should just be a direct translation of gamma -> scale etc., but it's possible I missed the secret behind these formulas.
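For what it's worth, a small NumPy check (my own sketch; it assumes the original converter emitted the fused scale/B together with mean=0, var=1, and an effectively zero epsilon on the ONNX node) showing why the two parameterizations produce the same result:

import numpy as np

# Keras inference parameters (arbitrary example values).
gamma, beta = 1.5, 0.3
mean, var, eps = 0.7, 2.0, 1e-3
x = np.linspace(-1.0, 1.0, 5)

# Direct translation: the ONNX op applies the paper's formula with the
# original gamma/beta/mean/var passed through unchanged.
y_direct = gamma * (x - mean) / np.sqrt(var + eps) + beta

# Fused translation: pre-fold mean/var into scale and B, then run the op
# as if mean=0 and var=1 (with epsilon assumed folded away as well).
scale = gamma / np.sqrt(var + eps)
B = beta - gamma * mean / np.sqrt(var + eps)
y_fused = scale * (x - 0.0) / np.sqrt(1.0 + 0.0) + B

assert np.allclose(y_direct, y_fused)  # the two mappings agree algebraically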
Closing this since the original code passes the test.
Yep, I see where I was mistaken, sorry for the noise!
Problem: Keras-to-ONNX conversion mishandled unknown shape dimensions and the BatchNorm moving mean and variance.
Solution: Change size handling to use 'None' instead of None; fix BatchNorm parameter conversion to correctly carry over mean and variance.
Validation: Converted YOLOv3 Keras models to ONNX and verified much better output tensor parity between Keras and ONNX (WinML) evaluation.