remove iconv dependence in unicode.c #186

Open
wants to merge 4 commits into
base: master

JoesCat
Contributor

@JoesCat JoesCat commented Oct 24, 2023

It seemed best to submit this PR now, since it already contains a large number of changes, before going further into other functions where unicode.c can be used as well.

…ines

The simplified 'make check' is based on a similar structure I put together
for libspiro (2013-07-22, 'Run "make check" to test spiro.c UNIT TEST')
and again for libuninameslist 20170319, plus some other projects.
This simplified 'make check' works with the older configure.ac/Makefile.am
found in older/mature Linux distros. The testsuite.at approach of 2012..2013
had trouble building on some distros, so there is no testsuite.at here.

ucs2_strlen() could only act as a strlen()-style function and was
inaccurate for chars 0x10000..0x10ffff, which are coded as pairs
{0xd800..0xdbff}:{0xdc00..0xdfff}. Such a pair is one char but uses two
utf-16 values. The string is tested as utf-16le regardless of CPU endianness.

mode=0: acts like the original ucs2_strlen(), which is still useful to
get a count of utf-16 values for sizing buffers.

mode=1: is a hybrid. It counts one char for each code pair used to
create values 0x10000..0x10ffff, and does a soft fail by counting the
codes as separate values if a code pair isn't grouped together. This
could be useful in some situations.

mode=2: does a strict check and returns length=-1 if a code pair is
out of sequence. This is useful to get the right char count and verify
that the extended char code pairs are in the right sequence.
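
The three modes above can be sketched roughly as follows. This is not the code from the PR: the name utf16le_strlen_sketch(), its signature, and its NULL handling are assumptions made for illustration. The input is read byte by byte as utf-16le, so the result does not depend on CPU endianness.

```c
#include <stddef.h>

/* Read one utf-16le code unit from two bytes, independent of CPU endianness. */
static unsigned get_u16le(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Hypothetical sketch (not the PR's function): count chars in a
 * 0x0000-terminated utf-16le byte string.
 *   mode 0: count every utf-16 value (buffer-style length)
 *   mode 1: count a well-formed pair as one char, stray codes as one each
 *   mode 2: like mode 1, but return -1 if a pair is out of sequence */
long utf16le_strlen_sketch(const unsigned char *s, int mode)
{
    long n = 0;

    if (s == NULL)
        return 0;                       /* assumption: NULL counts as empty */

    for (;;) {
        unsigned u = get_u16le(s);
        if (u == 0)
            break;
        s += 2;
        if (mode != 0 && u >= 0xD800 && u <= 0xDBFF) {
            /* First half of a pair: peek at the next value. Reading it is
             * safe because at worst it is the terminator. */
            unsigned v = get_u16le(s);
            if (v >= 0xDC00 && v <= 0xDFFF)
                s += 2;                 /* well-formed pair: one char */
            else if (mode == 2)
                return -1;              /* first half without second half */
        } else if (mode == 2 && u >= 0xDC00 && u <= 0xDFFF) {
            return -1;                  /* second half without first half */
        }
        n++;
    }
    return n;
}
```

For the bytes 41 00 3D D8 00 DE 00 00 ("A" followed by the pair d83d:de00), this sketch returns 3 in mode 0 and 2 in modes 1 and 2. If the pair is swapped, mode 1 counts the two halves separately and mode 2 returns -1.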

Convert utf16 coded characters to utf8. This also fixes the output buffer
size to allow for a worst case of 1024 x 4 utf8 chars (bytes), which is
possible if all chars are from range 0x10000..0x10ffff.
This is a subset of a converter made for fontforge, based on utf8_idpb().
See function: 2013-10-06, Expanded utf8_idpb() to output up to 0x7FFFFFFF
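
A minimal sketch of this kind of conversion is shown below. It is not the PR's converter and not fontforge's utf8_idpb(): the name utf16_to_utf8_sketch(), its signature, and its error convention (return -1) are assumptions. It takes utf-16 values already in host order and expects the caller to supply an output buffer sized for the worst case, for example 4 bytes per char plus a terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: convert in_len utf-16 values to NUL-terminated utf-8.
 * Returns the number of bytes written (excluding the NUL), or -1 if a code
 * pair is malformed or the output buffer is too small. */
long utf16_to_utf8_sketch(const uint16_t *in, size_t in_len,
                          char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    if (in == NULL || out == NULL || out_size == 0)
        return -1;

    while (i < in_len) {
        uint32_t cp = in[i++];
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* First half of a pair: the second half must follow. */
            if (i >= in_len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return -1;                  /* second half without first half */
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (o + need >= out_size)       /* keep one byte for the NUL */
            return -1;

        if (need == 1) {
            out[o++] = (char)cp;
        } else if (need == 2) {
            out[o++] = (char)(0xC0 | (cp >> 6));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            out[o++] = (char)(0xE0 | (cp >> 12));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (cp >> 18));
            out[o++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

With the input {0x41, 0xE9, 0x20AC, 0xD83D, 0xDE00} the sketch writes 10 bytes: one for "A", two for 0xE9, three for 0x20AC, and four for the pair, which is the 4-bytes-per-char case that the larger output buffer allows for.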

Convert utf8 coded characters to utf16. This also fixes the output buffer
to allow for a worst case of 1024 utf16 chars in range 0x10000..0x10ffff.
This is an optimized subset of a converter made for fontforge, with added
checks for bad utf-8 codes. By default, this function returns a zero-length
string on error. Another mode was added to return NULL instead, since it's
possible to receive zero-length strings that are not errors (example: an
empty line).
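
The two error behaviours can be sketched as below. This is an illustration only, not the PR's function: the name utf8_to_utf16_sketch(), the null_on_error flag, and the exact set of checks for bad utf-8 (stray continuation bytes, overlong forms, encoded surrogates, values above 0x10ffff) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Report an error in one of two ways: NULL, or a zero-length string. */
static uint16_t *fail(uint16_t *out, int null_on_error)
{
    if (null_on_error)
        return NULL;
    out[0] = 0;
    return out;
}

/* Hypothetical sketch: convert NUL-terminated utf-8 to 0-terminated utf-16
 * (host order). out_units is the capacity of out in 16-bit values. */
uint16_t *utf8_to_utf16_sketch(const char *in, uint16_t *out,
                               size_t out_units, int null_on_error)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t o = 0;

    if (in == NULL || out == NULL || out_units == 0)
        return NULL;

    while (*s) {
        unsigned char c = *s++;
        uint32_t cp, min;
        int extra;
        size_t need;

        if (c < 0x80)                   { cp = c;        extra = 0; min = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; min = 0x80; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; min = 0x800; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; min = 0x10000; }
        else return fail(out, null_on_error);   /* bad lead byte */

        while (extra-- > 0) {
            /* A NUL here also fails the test, so we never read past the end. */
            if ((*s & 0xC0) != 0x80)
                return fail(out, null_on_error);
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return fail(out, null_on_error);

        need = cp >= 0x10000 ? 2 : 1;
        if (o + need >= out_units)      /* keep one slot for the terminator */
            return fail(out, null_on_error);

        if (need == 2) {
            cp -= 0x10000;
            out[o++] = (uint16_t)(0xD800 + (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            out[o++] = (uint16_t)cp;
        }
    }
    out[o] = 0;
    return out;
}
```

With null_on_error set, an empty input line still returns a valid zero-length string, while a bad sequence such as C0 80 returns NULL, so the caller can tell the two cases apart. Without the flag, both cases yield a zero-length string.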