Fix UTF iterators end too early. #9797

Uhf7 · 2021-04-24T22:07:38Z

Fixes the bug in Utf16_Iter described here: #9599 (comment). I introduced this bug in #9599; it is a regression. Sorry for that. The problem: if the last character of a UTF-16-encoded file is greater than 0x7F, and hence needs more than one byte in UTF-8 encoding, only the first byte of its UTF-8 sequence arrives in the text buffer.
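To illustrate the symptom, here is a minimal sketch of a converter that hands out UTF-8 one byte at a time. It is not the code in Utf8_16.cpp: the class `Utf16ToUtf8Sketch`, its members, and the helper `convert` are hypothetical names invented for this example, and surrogate pairs are left out. It only shows how an end test that looks at the input position alone can drop the trailing bytes of the last character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a UTF-16 -> UTF-8 iterator.
// It hands out one UTF-8 byte per call to next(). Surrogate pairs are
// deliberately ignored to keep the sketch short (BMP input only).
class Utf16ToUtf8Sketch {
public:
    Utf16ToUtf8Sketch(const std::uint16_t* begin, const std::uint16_t* end)
        : read_(begin), end_(end) {}

    // Correct end test: input consumed AND no queued output bytes left.
    bool done() const { return read_ == end_ && pos_ == len_; }

    // Faulty end test: looks at the input position only.
    bool doneTooEarly() const { return read_ == end_; }

    // Returns the next UTF-8 byte; call only while the end test is false.
    std::uint8_t next() {
        if (pos_ == len_) fill();
        return bytes_[pos_++];
    }

private:
    // Consumes one 16-bit unit and queues its 1 to 3 UTF-8 bytes.
    void fill() {
        assert(read_ != end_);
        const std::uint16_t c = *read_++;
        pos_ = 0;
        if (c < 0x80) {
            bytes_[0] = static_cast<std::uint8_t>(c);
            len_ = 1;
        } else if (c < 0x800) {
            bytes_[0] = static_cast<std::uint8_t>(0xC0 | (c >> 6));
            bytes_[1] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
            len_ = 2;
        } else {
            bytes_[0] = static_cast<std::uint8_t>(0xE0 | (c >> 12));
            bytes_[1] = static_cast<std::uint8_t>(0x80 | ((c >> 6) & 0x3F));
            bytes_[2] = static_cast<std::uint8_t>(0x80 | (c & 0x3F));
            len_ = 3;
        }
    }

    const std::uint16_t* read_;
    const std::uint16_t* end_;
    std::uint8_t bytes_[3] = {0, 0, 0};
    std::size_t pos_ = 0;
    std::size_t len_ = 0;
};

// Drains the sketch iterator with either the correct or the faulty end test.
std::string convert(const std::vector<std::uint16_t>& in, bool endTooEarly) {
    Utf16ToUtf8Sketch it(in.data(), in.data() + in.size());
    std::string out;
    while (!(endTooEarly ? it.doneTooEarly() : it.done()))
        out.push_back(static_cast<char>(it.next()));
    return out;
}
```

With the input `{0x61, 0xE9}` ("a" followed by U+00E9), `convert(in, false)` yields the three bytes 61 C3 A9, while `convert(in, true)` stops after 61 C3, which matches the "only the first byte arrives" behaviour described above.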

A similar bug exists in Utf8_Iter and is fixed too. That one, at least, is not a regression, and it is harder to reproduce. When writing a UTF-16-encoded file, if

- the code point of the last character in the text buffer is above 0xFFFF, which means two 16-bit codes need to be written instead of one, and
- the position of the last character in the text buffer is 65536 (or any multiple of it),

then only the first 16-bit code is written to the file.
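For reference, the two 16-bit codes of such a character can be computed as in the sketch below. This is the standard UTF-16 surrogate-pair arithmetic, not code taken from Utf8_16.cpp; the function name `encodeUtf16` is made up for this example, and it assumes the input is a valid code point (no check for the surrogate range itself).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Code points up to 0xFFFF take one unit, everything above takes two.
std::vector<std::uint16_t> encodeUtf16(char32_t cp) {
    assert(cp <= 0x10FFFF);  // assumption: caller passes a valid code point
    if (cp <= 0xFFFF)
        return { static_cast<std::uint16_t>(cp) };

    const char32_t v = cp - 0x10000;  // 20 significant bits remain
    return {
        static_cast<std::uint16_t>(0xD800 + (v >> 10)),   // high surrogate
        static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))  // low surrogate
    };
}
```

For example, U+1F600 becomes the pair D83D DE00. In the failing case described above, only the first of the two units would end up in the file, leaving an unpaired high surrogate at its end.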

The 65536 comes from the size of the intermediate buffer used during conversion:

notepad-plus-plus/PowerEditor/src/Utf8_16.cpp

Line 344 in 6750be3

static const int bufSize = 64*1024;

Fix Utf iterators end too early.

6ad8ec5

donho self-assigned this Apr 25, 2021

chcg added bug regression labels Apr 25, 2021

donho added the accepted label Apr 26, 2021

donho closed this in 9734d81 Apr 26, 2021

sasumner mentioned this pull request May 4, 2021

Opening File Failing (Buffer Boundary) #8966

Closed
