
STFT op has the wrong expected shape #14316

Open
milesial opened this issue Jan 16, 2023 · 1 comment
Labels
core runtime issues related to core runtime

Comments

@milesial

Describe the issue

The STFT op's "expected" (inferred) output shape doesn't match its actual (correct) output shape, because of an off-by-one error in the shape inference.

2023-01-17 00:16:53.852917009 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {1,3002,201,2} does not match actual shape of {1,3001,201,2} for output Y
onnx output shape (1, 3001, 201, 2)

Originally from pytorch/pytorch#92087 (comment)

This prevents adding new nodes after the STFT node, since there will be an expected-shape mismatch.
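For reference, here is the standard STFT framing arithmetic applied to the numbers from the warning above (signal length 480400, frame length 400, frame step 160, onesided output with 201 bins); this is a sanity-check sketch, not ONNX Runtime code:

```python
# Frame-count and bin-count arithmetic for a onesided STFT,
# using the values from the repro below.
signal_len = 480400
frame_length = 400
frame_step = 160

n_frames = (signal_len - frame_length) // frame_step + 1  # windows that fit in the signal
n_bins = frame_length // 2 + 1                            # unique bins of a onesided DFT

print((1, n_frames, n_bins, 2))  # (1, 3001, 201, 2)
```

This matches the actual runtime output shape, so the actual shape is correct and the inferred shape of 3002 frames is the bug.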

To reproduce

Here is a standalone repro that creates a graph containing only an STFT op; running it makes the runtime emit the warning. The same thing happens with optimizations enabled.

from onnx import numpy_helper, TensorProto
from onnx.helper import make_model, make_node, make_graph, make_tensor_value_info, make_operatorsetid
from onnx.checker import check_model
import numpy as np
import onnxruntime as rt

X = make_tensor_value_info('X', TensorProto.FLOAT, [1, 480400])
frame_step = numpy_helper.from_array(np.array(160, dtype=np.int64), name='frame_step')
window = numpy_helper.from_array(np.ones(400, dtype=np.float32), name='window')
frame_length = numpy_helper.from_array(np.array(400, dtype=np.int64), name='frame_length')

Y = make_tensor_value_info('Y', TensorProto.FLOAT, [None, None, None, None])
node = make_node('STFT', ['X', 'frame_step', 'window', 'frame_length'], ['Y'])

graph = make_graph([node], 'test2', [X], [Y], [frame_step, window, frame_length])
onnx_model = make_model(graph, opset_imports=[make_operatorsetid('', 17)])
check_model(onnx_model)

with open("test.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
    
    
sess_options = rt.SessionOptions()
sess_options.graph_optimization_level = rt.GraphOptimizationLevel.ORT_DISABLE_ALL
session = rt.InferenceSession('test.onnx', sess_options)
print('onnx output shape', session.run(None, {'X': np.random.randn(1, 480400).astype(np.float32)})[0].shape)

2023-01-17 00:16:53.852917009 [W:onnxruntime:, execution_frame.cc:828 VerifyOutputSizes] Expected shape from model of {1,3002,201,2} does not match actual shape of {1,3001,201,2} for output Y

onnx output shape (1, 3001, 201, 2)

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@HectorSVC HectorSVC added the core runtime issues related to core runtime label Jan 17, 2023
@jstoecker
Contributor

The shape inference logic in the ONNX definition looks a bit strange to me:

bool is_onesided = static_cast<bool>(getAttribute(ctx, "onesided", 0));
if (is_onesided) {
  dft_size = is_onesided ? ((dft_size >> 1) + 1) : dft_size;
}

auto n_dfts = static_cast<int64_t>((signal_size - dft_size) / static_cast<float>(frame_step_value)) + 1;

// The output has the following shape: [batch_size][frames][dft_unique_bins][2]
ONNX_NAMESPACE::TensorShapeProto result_shape_proto;
result_shape_proto.add_dim()->set_dim_value(input_shape.dim(0).dim_value()); // batch size
result_shape_proto.add_dim()->set_dim_value(n_dfts);
result_shape_proto.add_dim()->set_dim_value(dft_size);
result_shape_proto.add_dim()->set_dim_value(2);

It looks like the author intended dft_size to represent the window/frame length, but then repurposed it to represent the dft_unique_bins mentioned in the comment. That makes the math for computing n_dfts use the wrong length. I would instead expect something like this:

bool is_onesided = static_cast<bool>(getAttribute(ctx, "onesided", 0));
-if (is_onesided) {
-  dft_size = is_onesided ? ((dft_size >> 1) + 1) : dft_size;
-}
+ int64_t dft_unique_bins = is_onesided ? ((dft_size >> 1) + 1) : dft_size;

auto n_dfts = static_cast<int64_t>((signal_size - dft_size) / static_cast<float>(frame_step_value)) + 1;

// The output has the following shape: [batch_size][frames][dft_unique_bins][2]
ONNX_NAMESPACE::TensorShapeProto result_shape_proto;
result_shape_proto.add_dim()->set_dim_value(input_shape.dim(0).dim_value()); // batch size
result_shape_proto.add_dim()->set_dim_value(n_dfts);
-result_shape_proto.add_dim()->set_dim_value(dft_size);
+result_shape_proto.add_dim()->set_dim_value(dft_unique_bins);
result_shape_proto.add_dim()->set_dim_value(2);
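Plugging the repro's numbers into both variants confirms the off-by-one: the buggy path halves dft_size for the onesided case *before* computing n_dfts, so the frame count is derived from 201 instead of 400. A quick arithmetic sketch (in Python, not the actual C++):

```python
signal_size, dft_size, frame_step = 480400, 400, 160

# Buggy path: dft_size is overwritten with the onesided bin count
# before the frame count is computed.
dft_unique_bins = (dft_size >> 1) + 1                              # 201
buggy_n_dfts = (signal_size - dft_unique_bins) // frame_step + 1   # uses 201 as the frame length

# Fixed path: the frame count uses the full frame length;
# the halving only affects the bin dimension.
fixed_n_dfts = (signal_size - dft_size) // frame_step + 1

print(buggy_n_dfts, fixed_n_dfts)  # 3002 3001
```

The buggy path reproduces the inferred 3002, and the fixed path matches the actual 3001.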
