remove iconv dependence in unicode.c #186

JoesCat · 2023-10-24T06:34:21Z

It is best to submit this PR, which already contains a large number of changes, before going further into other functions where unicode.c can also be used.

…ines The simplified 'make check' is based on a similar structure I put together for libspiro (2013-07-22, 'Run "make check" to test spiro.c UNIT TEST'), again for libuninameslist (20170319), and for some other projects. This simplified 'make check' works with the older configure.ac/Makefile.am found in older, mature Linux distros; the testsuite.at used in 2012..2013 had trouble building on some of those distros. There is no testsuite.at here.

ucs2_strlen() could only act as a strlen()-type function and miscounted characters in the range 0x10000..0x10ffff, which are encoded as surrogate pairs {0xd800..0xdbff}:{0xdc00..0xdfff}: one character, but two UTF-16 values. The string is read as UTF-16LE regardless of CPU endianness.
mode=0: acts like the original ucs2_strlen(), counting 16-bit values; this is still useful for sizing buffers.
mode=1: a hybrid that counts one character for each code pair forming a value in 0x10000..0x10ffff, and soft-fails by counting the values separately when a pair is not grouped together. This can be useful in some situations.
mode=2: a strict check that returns length=-1 if a code pair is out of sequence. This gives the correct character count and verifies that the extended-character code pairs are in the right sequence.

Convert UTF-16 coded characters to UTF-8. This also fixes the output buffer size to allow for the worst case of 1024 x 4 bytes of UTF-8, which occurs if all characters are in the range 0x10000..0x10ffff. This is a subset of a converter made for FontForge, based on utf8_idpb(); see the 2013-10-06 change 'Expanded utf8_idpb() to output up to 0x7FFFFFFF'.

Convert UTF-8 coded characters to UTF-16. This also fixes the output buffer size to allow for the worst case of 1024 characters in the range 0x10000..0x10ffff, each of which needs a UTF-16 surrogate pair. This is an optimized subset of a converter made for FontForge, with added checks for bad UTF-8 codes. By default, this function returns a zero-length string on error. Another mode was added that returns NULL instead, since it is possible to receive zero-length strings that are not errors (example: an empty line).

JoesCat added 4 commits October 19, 2023 19:59
