Rework the interface for the iconv.convert FFI bindings by rdw-software · Pull Request #662 · evo-lua/evo-runtime

rdw-software · 2025-02-16T15:24:25Z

The original version contained a lot of hacks and inefficiencies. I took another stab at the interface and decided to make some changes to how the C++ wrapper works, which may or may not make for a better overall experience (TBD):

Aggregating `iconv_convert` parameters

This is one of two breaking changes to the API, although the functionality is mostly the same:

* Instead of taking six primitive arguments, the `iconv.convert` binding now takes a single struct pointer
* The struct's fields represent the state of an ongoing charset conversion request, which libiconv's API obscures
* For the typical use case, there's no drawback - except for a `ffi.new` that can be cached (see benchmarks)
* Advantages are a shorter signature, plus the ability to implement APIs on top (for async/streaming, later)
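To illustrate the idea (not the actual struct from this PR), here is a minimal sketch of what aggregating the conversion state could look like. The names `conversion_request_t`, `convert_step` and `convert_all` are hypothetical, and the byte copy is only a stand-in for the real libiconv call; it advances the cursors in the same way `iconv()` advances its in/out arguments.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical request layout: the cursors that iconv() normally receives as
// separate in/out arguments are kept together in one place.
struct conversion_request_t {
    const char* input;        // next unread input byte
    std::size_t input_left;   // number of unread input bytes
    char* output;             // next free output byte
    std::size_t output_left;  // number of free output bytes
};

// Stand-in for the real conversion (assumption: no libiconv involved here).
// Copies bytes unchanged and advances the cursors stored in the request.
// Returns true once all input has been consumed.
bool convert_step(conversion_request_t* request) {
    const std::size_t count = std::min(request->input_left, request->output_left);
    std::memcpy(request->output, request->input, count);
    request->input += count;
    request->input_left -= count;
    request->output += count;
    request->output_left -= count;
    return request->input_left == 0;
}

// Streams the input through a deliberately tiny output buffer. Each call to
// convert_step resumes from the state that the previous call left behind.
std::string convert_all(const std::string& text) {
    char chunk[4];
    conversion_request_t request{text.data(), text.size(), chunk, sizeof(chunk)};
    std::string result;
    bool done = false;
    while (!done) {
        done = convert_step(&request);
        result.append(chunk, sizeof(chunk) - request.output_left);
        request.output = chunk;  // Hand the same buffer back for the next step
        request.output_left = sizeof(chunk);
    }
    return result;
}
```

Because all progress lives in the request, a caller can stop after any step and continue later, which is what makes async/streaming layers plausible on top of a single-pointer signature.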

Error handling updates for `iconv_convert`

Previously the error handling was mostly "pray it works or debug iconv error codes". Now it's a little better:

* The `iconv_convert` interface always returns an `iconv_result_t` (enum value) with corresponding error strings
* This is a lot more granular and matches other APIs with saner behavior, like libuv/luv (see `iconv.strerror`)
* Unfortunately not all types of errors can be differentiated, because libiconv only uses three (?) error codes
* There are also some cases where libiconv will do nothing or reset the handle, which the runtime can't detect
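A rough sketch of how such a result enum might relate to the `errno` values documented for `iconv()` (`E2BIG`, `EILSEQ`, `EINVAL`). Only `iconv_result_t` and `ICONV_RESULT_OK` are names taken from this PR; the other enum members, `result_from_errno`, `iconv_strerror_sketch` and the message strings are assumptions made up for illustration.

```cpp
#include <cerrno>

// iconv_result_t and ICONV_RESULT_OK appear in the PR. Every other member
// below is hypothetical and only illustrates the approach.
enum iconv_result_t {
    ICONV_RESULT_OK = 0,
    ICONV_RESULT_OUTPUT_BUFFER_FULL,  // hypothetical
    ICONV_RESULT_INVALID_SEQUENCE,    // hypothetical
    ICONV_RESULT_INCOMPLETE_INPUT,    // hypothetical
    ICONV_RESULT_UNKNOWN_ERROR,       // hypothetical
};

// Maps the errno values that iconv() is documented to set onto enum values
iconv_result_t result_from_errno(int error_code) {
    switch (error_code) {
        case 0: return ICONV_RESULT_OK;
        case E2BIG: return ICONV_RESULT_OUTPUT_BUFFER_FULL;  // No room left in the output buffer
        case EILSEQ: return ICONV_RESULT_INVALID_SEQUENCE;   // Invalid multibyte sequence in the input
        case EINVAL: return ICONV_RESULT_INCOMPLETE_INPUT;   // Input ends in the middle of a sequence
        default: return ICONV_RESULT_UNKNOWN_ERROR;
    }
}

// Returns a static message for every enum value, in the spirit of uv_strerror
const char* iconv_strerror_sketch(iconv_result_t result) {
    switch (result) {
        case ICONV_RESULT_OK: return "OK";
        case ICONV_RESULT_OUTPUT_BUFFER_FULL: return "Output buffer is too small";
        case ICONV_RESULT_INVALID_SEQUENCE: return "Invalid multibyte sequence in the input";
        case ICONV_RESULT_INCOMPLETE_INPUT: return "Incomplete multibyte sequence in the input";
        default: return "Unknown error";
    }
}
```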

`iconv_try_close` moved to the C++ layer

This is just because detecting the iconv error code via typecasts is annoying, and doing it twice even more so.

`CHARSET_CONVERSION_FAILED` shared constant removed

This was a hack to more easily detect conversion failures, because libiconv's error handling demands it. Constants shouldn't be stored in the static exports tables, because the runtime checks they're populated with function pointers (that aren't nullptr) on startup. This implicitly assumes only pointers are stored in the structure, and all members are uniform. In this case it wasn't a problem as the constant was the very last member. However, it seems like a footgun best avoided.

It's also completely redundant if the constants are exported as cdefs and can be used from the FFI directly.
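For illustration, here is one way a startup check of this kind could work, and why a constant in the table is risky. This is a sketch under assumptions, not the runtime's actual check: the table layouts, `all_slots_populated` and the stub functions are hypothetical, and it assumes function pointers and `std::uintptr_t` have the same size and that a null function pointer is all zero bits (true on mainstream platforms).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

using generic_function_t = void (*)();

// Hypothetical exports table that contains only function pointers
struct uniform_exports_table {
    int (*convert)(int);
    const char* (*strerror)(int);
};

// Hypothetical table that also stores a constant in its last slot
struct mixed_exports_table {
    int (*convert)(int);
    const char* (*strerror)(int);
    std::uintptr_t conversion_failed;
};

// Treats the table as a sequence of pointer-sized slots and rejects it if
// any slot is null. Nothing here can tell a constant from a pointer.
template <typename Table>
bool all_slots_populated(const Table& table) {
    static_assert(sizeof(Table) % sizeof(generic_function_t) == 0,
                  "table must consist of pointer-sized slots");
    const auto* bytes = reinterpret_cast<const unsigned char*>(&table);
    for (std::size_t offset = 0; offset < sizeof(Table); offset += sizeof(generic_function_t)) {
        generic_function_t slot = nullptr;
        std::memcpy(&slot, bytes + offset, sizeof(slot));
        if (slot == nullptr) return false;
    }
    return true;
}

// Stubs standing in for real exported functions
int convert_stub(int value) { return value; }
const char* strerror_stub(int) { return "OK"; }
```

With a check like this, a constant that happens to be zero would be reported as a missing export, while any nonzero constant would pass as if it were a valid function pointer. Neither outcome says anything useful about the table.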

Utility methods for error handling and validation

The bindings don't export all of them yet (some use C++ types in their signature/are inlined/NOOPs if optimized):

* `iconv_strerror` works exactly like `uv_strerror`; it could become the default way of handling errors everywhere
* It allows checking enum values with `ffi.C.ICONV_RESULT_OK` and translating to Lua failure tuples easily
* Custom errors are independent of libiconv returns - they cover all `errno` codes and then some
* `iconv_check_result` and `iconv_check_errno` are basically more readable versions of the previous unit test assertions

NYI: Character encoding types (`iconv_encoding_t` placeholder)

The API still uses string (`const char*`) identifiers for the libiconv encodings. Which ones are available depends on the libiconv implementation and the system itself. The API currently makes no guarantees as to which can be used. Ideally, it should not only make such guarantees for all supported encodings (with unit tests), but also enable specific errors when an encoding isn't supported by the implementation or is completely unknown. That's something to be added in a future patch.

Performance improvements and benchmarks

The benchmarks are an afterthought and clearly there's a lot of work needed in that domain still. For now, I removed a bunch of unneeded allocations and cached the largest time-wasters (I/O buffer allocations) via upvalues, to make sure all of the bindings are roughly equivalent. I'm keeping an eye on this part but it isn't a primary focus right now.

New `ASSUME` macro for non-`NDEBUG` builds

There's not currently any way to easily turn this on (requires Ninja build updates, separate issue). I haven't settled on a way to verify assumptions at runtime because all of the available options seemed a bit meh at best. This approach is an experiment which seemed to work well in my testing. It's just C assert with some sugar - we'll see how that goes.
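As a rough idea of what "C `assert` with some sugar" can look like, here is a minimal sketch. The two-argument form and the `checked_divide` helper are assumptions for illustration; the macro added in this PR may well have a different shape.

```cpp
#include <cassert>

// Hypothetical shape: attaches a human-readable message to a plain assert.
// The string literal is always truthy, so only the condition decides the
// outcome, while the message still shows up in the failure output.
// When NDEBUG is defined, assert (and therefore ASSUME) expands to nothing.
#define ASSUME(condition, message) assert((condition) && (message))

// Example of a precondition that is only verified when NDEBUG is not defined
int checked_divide(int dividend, int divisor) {
    ASSUME(divisor != 0, "Divisor must not be zero");
    return dividend / divisor;
}
```

Since the check disappears entirely when `NDEBUG` is defined, it should only guard assumptions that never need to be enforced in optimized builds.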

The actual libiconv interface is of course unchanged; it's still not very user-friendly but I can't help that.


rdw-software added 2 commits February 16, 2025 15:52

Perf: Contrast iconv benchmark results with jit.off()

cc43f10

The JIT may reduce benchmarks to basically zero in some scenarios, which is of course misleading. Seeing both versions side-by-side should help in those cases.

Runtime: Create a new macro for runtime asserts

a94d5ef

C++ has a bunch of tools for this kind of thing, but none seemed suitable at first glance. I'll revisit it later if needed.

rdw-software force-pushed the iconv-interface-rework branch 2 times, most recently from a6cf395 to b1de00e Compare February 16, 2025 15:45

rdw-software added 5 commits February 16, 2025 17:40

Refactor: Move iconv APIs into the FFI namespace

3f294ee

I wouldn't expect any name clashes here, but whatever.

QA: Fix a compiler warning when NDEBUG is off

5a7d89a

QA: Fix a cppcheck warning for sanity_check_buffer

9265ca3

I guess it's not currently exported, so using const is fine.

Tests: Add unit tests for iconv.strerror misuse

891e9e2

Surely no one would ever try that... right?

rdw-software force-pushed the iconv-interface-rework branch from b1de00e to 891e9e2 Compare February 16, 2025 16:40

rdw-software merged commit d790270 into main Feb 16, 2025

rdw-software deleted the iconv-interface-rework branch February 16, 2025 17:04

rdw-software mentioned this pull request Feb 17, 2025

Replace CURLVERSION_NOW with a function #664

Merged
