I saw this in a code comment:
# Because onnx.GroupNorm() need size=group for weight and bias
# But the torch's aten function's input need size=channel, the size mismatched
# So we have to use onnx.InstanceNorm() to simulate
and in the previous discussion, #644 (comment).
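To make the size mismatch concrete, here is a minimal plain-Python sketch (not onnxscript, and not the converter's actual code) of one way it might be bridged: normalize per group with unit scale and zero bias, then apply the per-channel weight and bias as a separate multiply and add. The function names `normalize_groups` and `group_norm_per_channel_affine` are hypothetical, and the input is a single sample laid out as a list of channels. In an ONNX graph, the first step would presumably map to a group-norm op fed with ones and zeros, and the second to one `Mul` and one `Add`; whether that is a net win over the InstanceNorm workaround is an assumption that would need measuring.

```python
import math


def normalize_groups(x, num_groups, eps=1e-5):
    """Normalize x (a list of C channels, each a list of floats) per group.

    No affine step is applied, which is equivalent to a group norm whose
    scale is all ones and whose bias is all zeros.
    """
    channels_per_group = len(x) // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * channels_per_group:(g + 1) * channels_per_group]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in channel] for channel in group])
    return out


def group_norm_per_channel_affine(x, num_groups, weight, bias, eps=1e-5):
    """Group norm with size=channel weight and bias (the aten convention)."""
    normalized = normalize_groups(x, num_groups, eps)
    # Per-channel affine applied after normalization: one multiply, one add.
    return [
        [v * w + b for v in channel]
        for channel, w, b in zip(normalized, weight, bias)
    ]


# 4 channels, 2 groups: weight and bias have one entry per channel.
x = [[1.0, 2.0], [3.0, 4.0], [0.0, 10.0], [5.0, 5.0]]
y = group_norm_per_channel_affine(
    x, num_groups=2, weight=[1.0, 2.0, 3.0, 4.0], bias=[0.0, 0.5, 1.0, 1.5]
)
```

The point of the split is that the normalization step never sees the per-channel parameters, so its scale and bias inputs can have whatever size the op requires.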
Reviving the thread to see whether something can be done to use ONNX GroupNorm. The current workaround with InstanceNorm produces many more nodes, which is not ideal for performance.
E.g.:
cc @justinchuby @xiaowuhu