This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Fix Keras->ONNX shape and BatchNorm conversion #213

Closed
wants to merge 3 commits into from

Conversation

dhung-msft

Problem:

  • Shapes previously used None to denote free dimensions, which were translated to ONNX as 0
  • BatchNorm parameters were incorrectly translated

Solution:

  • Changed shape handling to use 'None' (correctly translated to 'None' in ONNX)
  • Fixed BatchNorm parameter conversion to correctly carry over beta, gamma, mean, and variance

Validation:
Converted YOLOv3 Keras models to ONNX and verified much better output tensor parity between Keras and ONNX (WinML) evaluation.

Change size handling to use 'None' instead of None
Fix BatchNorm parameter conversion to correctly carry over mean and variance

CLAassistant commented Aug 28, 2019

CLA assistant check
All committers have signed the CLA.

@@ -51,7 +51,7 @@ def set_logger_level(lvl):

 @with_variable('batch_size')
 def get_default_batch_size():
-    return 'N'
+    return 'None'
Collaborator

This 'N' specifies the batch size as the symbol 'N'. The output shape then contains the same 'N', so we know the two dimensions match. If we use 'None', we cannot make that match.
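For context, a minimal sketch (illustrative only, not the converter's code) of how a symbolic batch dimension ends up in ONNX: a string shape entry becomes dim_param, so the same symbol 'N' can appear in both input and output shapes and be matched, whereas an integer 0 would be read as a concrete size.

from onnx import TensorProto, helper

# Hypothetical value_info for illustration: the string 'N' is stored as
# dim_param='N', so the same free batch dimension can be recognized when
# it shows up again in the output shape.
value_info = helper.make_tensor_value_info(
    'input_1', TensorProto.FLOAT, ['N', 416, 416, 3])
print(value_info.type.tensor_type.shape.dim[0].dim_param)  # prints: N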

Author

I'll revert this change, thanks

 scale_tensor_name = scope.get_unique_variable_name('scale')
-container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, gamma)
+container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, params[0])
Collaborator
@jiafatom Aug 28, 2019

Can we have a unit test to cover this? Please check test_layers.py for examples.

Author

Looks like this is already covered by test_batch_normalization_2() in test_layers.py. Interestingly, both the old and new code pass that test: because the unit test only runs the layer once (i.e. against a single input, with no training), the moving mean and variance are still their initial values of 0 and 1, and in that case the two computations give effectively the same result:

# Substituting mean = 0, variance = 1
gamma := params[0] / sqrt(1 + epsilon) ≈ params[0]   (within test tolerance)
beta  := params[1] - params[0] * 0 / sqrt(1 + epsilon) == params[1]

Author

Added test_batch_normalization_3

 scale_tensor_name = scope.get_unique_variable_name('scale')
-container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, gamma)
+container.add_initializer(scale_tensor_name, onnx_proto.TensorProto.FLOAT, params[0].shape, params[0])
Member

The gamma and beta computation was actually introduced by this PR:
onnx/onnxmltools#36; I don't have much detail on why that change was made.
Can you add some test cases to verify it?

Author

I added test_batch_normalization_3, which covers non-default mean/variance values (i.e. not 0/1), and confirmed that it fails with the previous code and passes with these changes.

Collaborator

I just tested test_batch_normalization_3 against the original code, and it actually passes (see here). It seems the original code is also correct; it's an equivalent way of doing this.

Author

Hm, interesting. Perhaps something in my local environment is making it fail instead, but I'm surprised, since I was able to follow the numerical differences. I'll dig further when I have a chance.

Author

Yep, I reinstalled keras2onnx on my machine (and also tried a virtualenv), and both show the original code passing. No idea how I was seeing failures before; it's very possible the two are equivalent. I'll keep looking.

@jiafatom
Collaborator

Please also fix the CI build failure... thanks.

Corrected BatchNorm epsilon conversion
Added BatchNorm test cases for non-default mean/variance values
@wenbingl wenbingl requested a review from wschin August 29, 2019 18:52
@wenbingl
Member

@wschin, can you still remember why gamma/beta was calculated this way?

@jiafatom
Collaborator

Please also fix the CI build failure... thanks.

The CI build failure was caused by something else; I just fixed it.

@@ -20,7 +20,7 @@
 def _infer_variable_type(tensor):
     tensor_shape = []
     if tensor.shape not in (tf.TensorShape(None), tf.TensorShape([])):
-        tensor_shape = [d.value for d in tensor.shape]
+        tensor_shape = [d.value if d.value is not None else 'None' for d in tensor.shape]
Member

This will break the correct behavior as designed in ONNX. If the correct way cannot work for WinML, we need to add a program option to stay backward compatible with WinML.
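One way to illustrate the concern (a hedged sketch, not the converter's actual behavior): if every free dimension is emitted with the same placeholder string, ONNX records them all under one dim_param symbol, which by ONNX's symbolic-dimension semantics marks them as equal rather than merely unknown.

from onnx import TensorProto, helper

# Hypothetical tensor with two unrelated free dimensions; reusing the same
# placeholder string gives both the same dim_param, i.e. one shared symbol.
vi = helper.make_tensor_value_info('x', TensorProto.FLOAT, ['None', 'None', 128])
dims = vi.type.tensor_type.shape.dim
print([d.dim_param or d.dim_value for d in dims])  # ['None', 'None', 128]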

@@ -47,26 +47,23 @@ def convert_keras_batch_normalization(scope, operator, container):
     if not op.center:
         params.insert(1, np.zeros(params[1].shape, dtype=float))

-    gamma = params[0] / np.sqrt(params[3] + op.epsilon)
Member

According to the paper (https://arxiv.org/pdf/1502.03167v3.pdf, the formula on p. 3), the original calculation is correct in inference mode. Do you have any example showing that the new change is more accurate?

Author

My understanding is as follows. The Keras op has the following parameters:

param[0] := gamma
param[1] := beta
param[2] := moving_mean
param[3] := moving_variance

These are then applied as described in the paper (under Algorithm 1):

x_norm = (x - moving_mean) / sqrt(moving_variance + epsilon)
y = gamma * x_norm + beta

The ONNX operator has the same parameters, but with slightly different naming:

scale := gamma
B     := beta
mean  := moving_mean
var   := moving_variance

However, the previous change appears to have interpreted these values instead as:

scale = gamma / sqrt(moving_variance + epsilon) # Most similar to x_norm's definition?
B     = beta - gamma * moving_mean / sqrt(moving_variance + epsilon) # Unsure how to interpret this
# Perhaps trying to treat the scale and B(ias) tensors as the x_norm and y terms
# as opposed to as gamma and beta?

My understanding is that it should just be a direct translation of gamma -> scale, etc., but it's possible I'm missing the reasoning behind these formulas.
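For what it's worth, a minimal NumPy sketch (hypothetical values, not code from this repository) suggesting the two readings are numerically equivalent, assuming the folded form is evaluated with mean = 0, variance = 1, and epsilon effectively folded out on the ONNX side:

import numpy as np

# Hypothetical per-channel parameters, for illustration only
x = np.random.randn(4, 8).astype(np.float32)
gamma = np.random.randn(8).astype(np.float32)
beta = np.random.randn(8).astype(np.float32)
mean = np.random.randn(8).astype(np.float32)
var = np.random.rand(8).astype(np.float32) + 0.5
eps = 1e-3

# Direct mapping: scale = gamma, B = beta, mean/var passed through unchanged
y_direct = gamma * (x - mean) / np.sqrt(var + eps) + beta

# Folded mapping: mean/variance pre-folded into scale and B, with the op
# then evaluated as if mean = 0 and var = 1 (assumption about the old code)
scale_folded = gamma / np.sqrt(var + eps)
b_folded = beta - gamma * mean / np.sqrt(var + eps)
y_folded = scale_folded * x + b_folded

print(np.allclose(y_direct, y_folded, atol=1e-5))  # True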

@jiafatom
Collaborator

Closing this, since the original code passes the test.

@jiafatom jiafatom closed this Aug 31, 2019
@dhung-msft
Author

Yep I see where I was mistaken, sorry for the noise!

wenbingl pushed a commit that referenced this pull request Oct 22, 2019
#213)

* fix bilstm/bigru bug when bw lst/gru's output is not followed by reverse nodes

* add check before doing bigru rewrite

* fix missed comma