🍄 [[ Builder ]] Further work on encoding/decoding module #1754
In particular, I have:
Not done yet
There are still some unresolved issues when decoding UTF-8, UTF-16 and UTF-32.
Several of these issues could be solved by allowing flags to be passed to the codec functions, at the cost of making the API even more complicated.
Conflicts:
    engine/src/modules.cpp
    libscript/libscript.xcodeproj/project.pbxproj
    libscript/libstdscript-modules.list
    toolchain/lc-compile/lc-compile.xcodeproj/project.pbxproj
    toolchain/lc-compile/src/module-helper.cpp
…when encoding (resp. decoding)
[[ LCB StdLib ]] Return undefined when replacement is unspecified and encoding or decoding fails
Remove both MCStringGetNativeChars() and MCStringGetNativeCharsWithReplacement() from the public libfoundation API.
* Return true on success/false on failure from MCStringGetNativeChars() and MCStringGetNativeCharsWithReplacement().
* Add an in parameter for the size of the character buffer provided and an out parameter for the number of characters used/needed.
* Allow multi-byte replacement sequences for characters that can't be represented in the native encoding, and add an explicit in parameter for the replacement sequence length.
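The calling convention described above can be sketched roughly as follows. This is only an illustration: `GetNativeCharsSketch` and its parameter names are hypothetical, not the real libfoundation signatures; the "native" encoding is assumed to be ISO-8859-1; and "failure" is assumed to mean "the buffer was too small".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the reworked API, NOT the libfoundation signature.
// Assumption: "native" is ISO-8859-1, so code points <= 0xFF map to one byte.
// Returns true if everything fitted in the buffer, false otherwise.
// r_count always receives the number of bytes used (or needed on failure).
bool GetNativeCharsSketch(const std::u32string& input,
                          std::uint8_t* buffer, std::size_t buffer_size,
                          const std::uint8_t* replacement,
                          std::size_t replacement_len,
                          std::size_t& r_count)
{
    std::size_t needed = 0;
    bool fits = true;

    // Store one byte if there is room; keep counting either way.
    auto emit = [&](std::uint8_t byte) {
        if (needed < buffer_size)
            buffer[needed] = byte;
        else
            fits = false;
        ++needed;
    };

    for (char32_t cp : input) {
        if (cp <= 0xFF) {
            emit(static_cast<std::uint8_t>(cp));
        } else {
            // Unrepresentable: emit the whole replacement sequence,
            // which may be zero, one, or several bytes long.
            for (std::size_t i = 0; i < replacement_len; ++i)
                emit(replacement[i]);
        }
    }

    r_count = needed;
    return fits;
}
```

With this shape, a caller can pass a zero-sized buffer first to learn the required size from the out parameter, then allocate and call again.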
Add a new function that allows more efficient conversion to ASCII, without going via the native encoding.
…ding.

The "native" encoding available to script programs is rarely correct for modern systems, e.g.:

* Almost all Linux systems use UTF-8, not ISO-8859-1.
* All Macs use UTF-8, not MacRoman.

Block all Builder programs from using "native" to specify an encoding. If someone's absolutely determined to get "native encoding exactly as in script", they can use "-native" to bypass the check.
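A minimal sketch of such a check, assuming encoding names are matched as exact strings. `LookupEncodingSketch`, the `Encoding` enum, and the "UTF-8"/"ASCII" spellings are hypothetical and only illustrate the "native" vs. "-native" distinction described above.

```cpp
#include <optional>
#include <string_view>

// Hypothetical encoding identifiers, for illustration only.
enum class Encoding { Utf8, Ascii, Native };

// Hypothetical lookup: plain "native" is rejected, while "-native" is an
// explicit opt-in that bypasses the check.
std::optional<Encoding> LookupEncodingSketch(std::string_view name)
{
    if (name == "-native")
        return Encoding::Native;   // deliberate bypass
    if (name == "native")
        return std::nullopt;       // blocked for Builder programs
    if (name == "UTF-8")
        return Encoding::Utf8;
    if (name == "ASCII")
        return Encoding::Ascii;
    return std::nullopt;           // unknown encoding name
}
```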
…acement(). Allow zero-byte and multibyte replacement sequences. Note that it is *not* an error for the replacement array to contain non-native (or non-ASCII) values. **NOTE**: This patch means that "ASCII" encoding **really is** ASCII encoding, not the native encoding discarding bytes > 127.
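To illustrate the strict-ASCII behaviour together with the "undefined when replacement is unspecified" rule, here is a small sketch. `EncodeAsciiSketch` is a hypothetical helper, not a function from libfoundation or the LCB standard library, and `std::nullopt` stands in for "undefined".

```cpp
#include <optional>
#include <string>

// Hypothetical helper: encode code points as strict 7-bit ASCII.
// - With a replacement (possibly empty or multibyte), every code point
//   above 127 is substituted by the replacement bytes, unchecked.
// - Without a replacement, the first such code point fails the conversion.
std::optional<std::string> EncodeAsciiSketch(
    const std::u32string& input,
    const std::optional<std::string>& replacement)
{
    std::string out;
    for (char32_t cp : input) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (replacement) {
            out += *replacement;   // may append zero or many bytes
        } else {
            return std::nullopt;   // stands in for "undefined"
        }
    }
    return out;
}
```

Note that U+00E9 (é) is representable in ISO-8859-1 but is still treated as unrepresentable here, which is the difference between real ASCII and "native encoding discarding bytes > 127".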
Discussion with @runrevmark today (offline).
About the unresolved issues: a decoding error occurs when, while decoding, we encounter something that we can't represent internally. That means:
A couple of additional things were brought up:
We need to decide how far we want to go with all of the above before the next release.