Describe the bug
When compiling with MSVC and /utf-8, std::println corrupts long UTF-8 strings when formatting arguments are used (e.g. std::println("{}", str)): a multi-byte character that straddles an internal buffer boundary is replaced with U+FFFD replacement characters.
Reproduction Code and Output
#include <print>
int main()
{
std::println("{}", "这是一段超长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长的文本。");
}
Compilation Command: cl.exe /std:c++latest /utf-8 repro.cpp
Compiler Version: Microsoft (R) C/C++ Optimizing Compiler Version 19.50.35718 for x86
Expected Behavior: The UTF-8 string is output completely and correctly.
Observed Behavior: 这是一段超长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长长���的文本。
Possible Cause
The issue is likely caused by the output mechanism splitting the long string into small, fixed-size chunks (e.g., 256 bytes) before sending them to the console.
The failure is suspected to lie in the specialized handler responsible for committing these chunks to the console: _Fmt_iterator_flush<_Print_to_unicode_console_it>.
This handler, which manages the UTF-8-to-console conversion, appears to pass the raw byte range of each chunk (_First to _Last) straight to the underlying write function without checking that the chunk ends on a UTF-8 character boundary:
// https://github.com/microsoft/STL/blob/main/stl/inc/print
template <>
struct _Fmt_iterator_flush<_Print_to_unicode_console_it> {
    static _Print_to_unicode_console_it _Flush(
        const char* const _First, const char* const _Last, _Print_to_unicode_console_it _Output) {
        _STD _Print_noformat_unicode_to_console_nonlocking(_Output._Get_console_handle(), {_First, _Last});
        return _Output;
    }
};
If a chunk ends in the middle of a multi-byte UTF-8 character, the incomplete sequence is committed as-is and the remaining bytes begin the next chunk. The downstream MultiByteToWideChar conversion then sees a truncated sequence at the end of one chunk and stray continuation bytes at the start of the next, and may replace both with U+FFFD, which matches the observed output. This suggests that this specialization may be missing the UTF-8 boundary check needed before the data is written to the console handle.