Summary
Paraformer::Forward in runtime/onnxruntime/src/paraformer.cpp reads an ONNX int32 tensor (outputTensor[1], encoder_out_lens) as int64_t*, dereferencing 8 bytes from a 4-byte allocation. AddressSanitizer catches a heap-buffer-overflow read on every single inference call.
Location
runtime/onnxruntime/src/paraformer.cpp (current main), line 512 (and consumers at lines 529, 531, 538, 540):
auto outputTensor = m_session_->Run(...);
...
auto encoder_out_lens = outputTensor[1].GetTensorMutableData<int64_t>();
...
result = GreedySearch(floatData, *encoder_out_lens, outputShape[2]); // line 538
For at least the paraformer-large-contextual ONNX model (speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx from ModelScope), outputTensor[1] is allocated as a 4-byte int32 tensor. Treating it as int64_t* and dereferencing reads 8 bytes, overflowing 4 bytes past the buffer.
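For reference, the emitted dtype can be checked directly on the returned Ort::Value. This is a diagnostic sketch, not part of the fix, using only the documented ONNX Runtime C++ API; it can be dropped in right after the Run() call:

// Diagnostic only: dump the element type and count of encoder_out_lens.
// The ONNXTensorElementDataType enum streams as a plain int (needs <iostream>).
// With the ModelScope model above, GetElementType() returns the INT32 value.
auto lens_info = outputTensor[1].GetTensorTypeAndShapeInfo();
std::cerr << "encoder_out_lens dtype=" << lens_info.GetElementType()
          << " (INT32=" << ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32
          << ", INT64=" << ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64 << ")"
          << ", elements=" << lens_info.GetElementCount() << std::endl;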
ASAN evidence
Rebuilt with -fsanitize=address and triggered a single inference call:
==1==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x... at pc 0x... thread T...
READ of size 8 at 0x... thread T...
#0 0x... in funasr::Paraformer::Forward[abi:cxx11](float**, int*, bool, ...) /onnxruntime/src/paraformer.cpp:538
#1 0x... in FunTpassInferBuffer ... /onnxruntime/src/funasrruntime.cpp:575
#2 0x... in funasr_tpass_offline_infer ... /onnxruntime/src/funasr_capi.cpp:287
0x...d44 is located 0 bytes to the right of 4-byte region [0x...d40, 0x...d44)
allocated by thread T... here:
#0 0x... in __interceptor_posix_memalign
#1 0x... in libonnxruntime.so.1.14.0
#2 0x... in libonnxruntime.so.1.14.0
The allocation is exactly 4 bytes (the int32 element). The read is 8 bytes (int64_t). Triggered every time the model runs, on the very first inference.
Why production typically doesn't crash
The 4-byte overrun reads into ONNX Runtime's posix_memalign padding / glibc tcache freelist metadata, which usually holds zeros. On a little-endian host, the truncation of *encoder_out_lens back to int at the GreedySearch/BeamSearch call sites then discards exactly the 4 overread bytes, so the result happens to equal the correct value.
This is undefined behavior nonetheless. Any change in glibc allocator behavior (a tcache fill-pattern change in a future glibc version, swapping in jemalloc, ASAN/MSAN/HWASAN builds, or simply different load patterns), or an optimizer that exploits the out-of-bounds read, can make this produce garbage values and wrong decoder output silently, with no crash to flag it.
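To make the truncation mechanism concrete, here is a minimal standalone sketch with hypothetical byte values, assuming a little-endian host (a memcpy from a valid 8-byte buffer stands in for the actual out-of-bounds load):

#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
    // A real int32 length (42) followed by 4 bytes standing in for whatever
    // happens to sit past the end of the 4-byte allocation.
    unsigned char bytes[8] = {0x2A, 0x00, 0x00, 0x00, 0xDE, 0xAD, 0xBE, 0xEF};
    int64_t wide;
    std::memcpy(&wide, bytes, sizeof wide);  // stand-in for the 8-byte load
    int narrow = static_cast<int>(wide);     // the truncation at the call site
    assert(narrow == 42);  // little-endian: the overread bytes are discarded
    return 0;
}

This shows why the bug stays invisible today; it is not an argument that the read is safe, since the load itself remains out of bounds.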
Proposed fix
GreedySearch(float*, int n_len, ...) and BeamSearch(... int len, ...) (declared in the same file) take their length argument as int. Reading the tensor as int32_t* and passing *encoder_out_lens (4 bytes) to a function expecting int is, on little-endian hosts, semantically identical to the current int64_t* → truncate path, but without the UB:
- auto encoder_out_lens = outputTensor[1].GetTensorMutableData<int64_t>();
+ auto encoder_out_lens = outputTensor[1].GetTensorMutableData<int32_t>();
Verified: with the one-line change, ASAN no longer reports the OOB on inference, and the decoded text is byte-identical to what the existing build produces. Tested on paraformer-large-contextual end-to-end with hundreds of requests.
Suggested follow-up
If the ONNX model can in principle emit either int32 or int64 for this output across model variants, the proper fix is to inspect the tensor element type at runtime via GetTensorTypeAndShapeInfo().GetElementType() and branch. But for at least the published ModelScope paraformer-large-contextual model, the emitted dtype is unambiguously int32, and the current code is reading it incorrectly.
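A hedged sketch of that dispatch, using only documented Ort C++ API calls (the helper name ReadEncoderOutLen is illustrative, not from the codebase):

#include <onnxruntime_cxx_api.h>
#include <stdexcept>

// Read the scalar length output as int64_t whether the model emits
// int32 or int64 for encoder_out_lens.
static int64_t ReadEncoderOutLen(const Ort::Value& t) {
    switch (t.GetTensorTypeAndShapeInfo().GetElementType()) {
        case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32:
            return static_cast<int64_t>(*t.GetTensorData<int32_t>());
        case ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64:
            return *t.GetTensorData<int64_t>();
        default:
            throw std::runtime_error("encoder_out_lens: unexpected element type");
    }
}

The call sites would then pass static_cast<int>(ReadEncoderOutLen(outputTensor[1])) into GreedySearch/BeamSearch, leaving their existing int interfaces untouched.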