Rework the interface for the iconv.convert FFI bindings#662
Merged
rdw-software merged 7 commits intomainfrom Feb 16, 2025
Merged
Rework the interface for the iconv.convert FFI bindings#662rdw-software merged 7 commits intomainfrom
rdw-software merged 7 commits intomainfrom
Conversation
The JIT may reduce benchmarks to basically zero in some scenarios, which is of course misleading. Seeing both versions side-by-side should help in those cases.
C++ has a bunch of tools for this kind of thing, but none seemed suitable at first glance. I'll revisit it later if needed.
a6cf395 to
b1de00e
Compare
The original version contained a lot of hacks and inefficiencies. I took another stab at the interface and decided to make some changes to how the C++ wrapper works, which may or may not make for a better overall experience (TBD): --- Aggregating `iconv_convert` parameters This is one of two breaking changes to the API, although the functionality is mostly the same: * Instead of taking six primitive arguments, the `iconv.convert` binding now takes a single struct pointer * These represent the status of an ongoing charset conversion request, which is obscured by libiconv's API * For the typical use case, there's no drawback - except for a `ffi.new` that can be cached (see benchmarks) * Advantages are a shorter signature, plus the ability to implement APis on top (for async/streaming, later) --- Error handling updates for `iconv_convert` Previously the error handling was mostly "pray it works or debug iconv error codes". Now it's a little better: * The `iconv_convert` interface always returns a `iconv_result_t` (enum value) with corresponding error strings * This is a lot more granular and matches other APIs with saner behavior, like libuv/luv (see `iconv.strerror`) * Unfortunately not all types of errors can be differentiated, because libiconv only uses three (?) error codes * There's also some cases where libiconv will do nothing or reset the handle, which the runtime can't detect --- `iconv_try_close` moved to the C++ layer This is just because detecting the iconv error code via typecasts is annoying, and doing it twice even more so. --- `CHARSET_CONVERSION_FAILED` shared constant removed This was a hack to more easily detect conversion failures, because libiconv's error handling demands it. Constants shouldn't be stored in the static exports tables, because the runtime checks they're populated with function pointers (that aren't `nullptr`) on startup. This implicitly assumes only pointers are stored in the structure, and all members are uniform. In this case it wasn't a problem as the constant was the very last member. However, it seems like a footgun best avoided. It's also completely redundant if the constants are exported as `cdefs` and can be used from the FFI directly. --- Utility methods for error handling and validation The bindings don't export all of them yet (some use C++ types in their signature/are inlined/NOOPs if optimized): * `iconv_strerror` works exactly like `uv_strerror`; it could become the default way of handling errors everywhere * It allows checking enum values with `ffi.C.ICONV_RESULT_OK` and translating to Lua failure tuples easily * Custom errors are independent of libiconv returns - they cover all `errnor` codes and then some * `iconv_check_result` and `iconv_check_errno` are basically more readable versions of the previous unit test assertions --- NYI: Character encoding types (`iconv_encoding_t` placeholder) The API still uses string (`const char*`) identifiers for the libiconv encodings. Which ones are available depends on the libiconv implementation and the system itself. The API currently makes no guarantees as to which can be used. ideally, it should not only do that for all supported encodings (with unit tests), but also enable specific errors when an encoding isn't supported by the implementation or is completely unknown. That's something to be added in a future patch. --- Performance improvements and benchmarks The benchmarks are an afterthought and clearly there's a lot of work needed in that domain still. For now, I removed a bunch of unneeded allocations and cached the largest time-wasters (I/O buffer allocations) via upvalues, to make sure all of the bindings are roughly equivalent. I'm keeping an eye on this part but it isn't a primary focus right now. --- New `ASSUME` macro for non- `NDEBUG` builds There's not currently any way to easily turn this on (requires Ninja build updates, separate issue). I haven't settled on a way to verify assumptions at runtime because all of the available options seemed a bit meh at best. This approach is an experiment which seemed to work well in my testing. It's just C `assert` with some sugar - we'll see how that goes. --- The actual libiconv interface is of course unchanged; it's still not very user-friendly but I can't help that.
I wouldn't expect any name clashes here, but whatever.
I guess it's not currently exported, so using const is fine.
Surely no one would ever try that... right?
b1de00e to
891e9e2
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The original version contained a lot of hacks and inefficiencies. I took another stab at the interface and decided to make some changes to how the C++ wrapper works, which may or may not make for a better overall experience (TBD):
Aggregating
iconv_convertparametersThis is one of two breaking changes to the API, although the functionality is mostly the same:
iconv.convertbinding now takes a single struct pointerffi.newthat can be cached (see benchmarks)Error handling updates for
iconv_convertPreviously the error handling was mostly "pray it works or debug iconv error codes". Now it's a little better:
iconv_convertinterface always returns aiconv_result_t(enum value) with corresponding error stringsiconv.strerror)iconv_try_closemoved to the C++ layerThis is just because detecting the iconv error code via typecasts is annoying, and doing it twice even more so.
CHARSET_CONVERSION_FAILEDshared constant removedThis was a hack to more easily detect conversion failures, because libiconv's error handling demands it. Constants shouldn't be stored in the static exports tables, because the runtime checks they're populated with function pointers (that aren't
nullptr) on startup. This implicitly assumes only pointers are stored in the structure, and all members are uniform. In this case it wasn't a problem as the constant was the very last member. However, it seems like a footgun best avoided.It's also completely redundant if the constants are exported as
cdefsand can be used from the FFI directly.Utility methods for error handling and validation
The bindings don't export all of them yet (some use C++ types in their signature/are inlined/NOOPs if optimized):
iconv_strerrorworks exactly likeuv_strerror; it could become the default way of handling errors everywhereffi.C.ICONV_RESULT_OKand translating to Lua failure tuples easilyerrnorcodes and then someiconv_check_resultandiconv_check_errnoare basically more readable versions of the previous unit test assertionsNYI: Character encoding types (
iconv_encoding_tplaceholder)The API still uses string (
const char*) identifiers for the libiconv encodings. Which ones are available depends on the libiconv implementation and the system itself. The API currently makes no guarantees as to which can be used. ideally, it should not only do that for all supported encodings (with unit tests), but also enable specific errors when an encoding isn't supported by the implementation or is completely unknown. That's something to be added in a future patch.Performance improvements and benchmarks
The benchmarks are an afterthought and clearly there's a lot of work needed in that domain still. For now, I removed a bunch of unneeded allocations and cached the largest time-wasters (I/O buffer allocations) via upvalues, to make sure all of the bindings are roughly equivalent. I'm keeping an eye on this part but it isn't a primary focus right now.
New
ASSUMEmacro for non-NDEBUGbuildsThere's not currently any way to easily turn this on (requires Ninja build updates, separate issue). I haven't settled on a way to verify assumptions at runtime because all of the available options seemed a bit meh at best. This approach is an experiment which seemed to work well in my testing. It's just C
assertwith some sugar - we'll see how that goes.The actual libiconv interface is of course unchanged; it's still not very user-friendly but I can't help that.