Skip to content

Rework the interface for the iconv.convert FFI bindings#662

Merged
rdw-software merged 7 commits intomainfrom
iconv-interface-rework
Feb 16, 2025
Merged

Rework the interface for the iconv.convert FFI bindings#662
rdw-software merged 7 commits intomainfrom
iconv-interface-rework

Conversation

@rdw-software
Copy link
Copy Markdown
Member

The original version contained a lot of hacks and inefficiencies. I took another stab at the interface and decided to make some changes to how the C++ wrapper works, which may or may not make for a better overall experience (TBD):

Aggregating iconv_convert parameters

This is one of two breaking changes to the API, although the functionality is mostly the same:

  • Instead of taking six primitive arguments, the iconv.convert binding now takes a single struct pointer
  • These represent the status of an ongoing charset conversion request, which is obscured by libiconv's API
  • For the typical use case, there's no drawback - except for a ffi.new that can be cached (see benchmarks)
  • Advantages are a shorter signature, plus the ability to implement APis on top (for async/streaming, later)

Error handling updates for iconv_convert

Previously the error handling was mostly "pray it works or debug iconv error codes". Now it's a little better:

  • The iconv_convert interface always returns a iconv_result_t (enum value) with corresponding error strings
  • This is a lot more granular and matches other APIs with saner behavior, like libuv/luv (see iconv.strerror)
  • Unfortunately not all types of errors can be differentiated, because libiconv only uses three (?) error codes
  • There's also some cases where libiconv will do nothing or reset the handle, which the runtime can't detect

iconv_try_close moved to the C++ layer

This is just because detecting the iconv error code via typecasts is annoying, and doing it twice even more so.

CHARSET_CONVERSION_FAILED shared constant removed

This was a hack to more easily detect conversion failures, because libiconv's error handling demands it. Constants shouldn't be stored in the static exports tables, because the runtime checks they're populated with function pointers (that aren't nullptr) on startup. This implicitly assumes only pointers are stored in the structure, and all members are uniform. In this case it wasn't a problem as the constant was the very last member. However, it seems like a footgun best avoided.

It's also completely redundant if the constants are exported as cdefs and can be used from the FFI directly.

Utility methods for error handling and validation

The bindings don't export all of them yet (some use C++ types in their signature/are inlined/NOOPs if optimized):

  • iconv_strerror works exactly like uv_strerror; it could become the default way of handling errors everywhere
  • It allows checking enum values with ffi.C.ICONV_RESULT_OK and translating to Lua failure tuples easily
  • Custom errors are independent of libiconv returns - they cover all errnor codes and then some
  • iconv_check_result and iconv_check_errno are basically more readable versions of the previous unit test assertions

NYI: Character encoding types (iconv_encoding_t placeholder)

The API still uses string (const char*) identifiers for the libiconv encodings. Which ones are available depends on the libiconv implementation and the system itself. The API currently makes no guarantees as to which can be used. ideally, it should not only do that for all supported encodings (with unit tests), but also enable specific errors when an encoding isn't supported by the implementation or is completely unknown. That's something to be added in a future patch.

Performance improvements and benchmarks

The benchmarks are an afterthought and clearly there's a lot of work needed in that domain still. For now, I removed a bunch of unneeded allocations and cached the largest time-wasters (I/O buffer allocations) via upvalues, to make sure all of the bindings are roughly equivalent. I'm keeping an eye on this part but it isn't a primary focus right now.

New ASSUME macro for non- NDEBUG builds

There's not currently any way to easily turn this on (requires Ninja build updates, separate issue). I haven't settled on a way to verify assumptions at runtime because all of the available options seemed a bit meh at best. This approach is an experiment which seemed to work well in my testing. It's just C assert with some sugar - we'll see how that goes.


The actual libiconv interface is of course unchanged; it's still not very user-friendly but I can't help that.

The JIT may reduce benchmarks to basically zero in some scenarios, which is of course misleading. Seeing both versions side-by-side should help in those cases.
C++ has a bunch of tools for this kind of thing, but none seemed suitable at first glance. I'll revisit it later if needed.
@rdw-software rdw-software force-pushed the iconv-interface-rework branch 2 times, most recently from a6cf395 to b1de00e Compare February 16, 2025 15:45
The original version contained a lot of hacks and inefficiencies. I took another stab at the interface and decided to make some changes to how the C++ wrapper works, which may or may not make for a better overall experience (TBD):

---

Aggregating `iconv_convert` parameters

This is one of two breaking changes to the API, although the functionality is mostly the same:

* Instead of taking six primitive arguments, the `iconv.convert` binding now takes a single struct pointer
* These represent the status of an ongoing charset conversion request, which is obscured by libiconv's API
* For the typical use case, there's no drawback - except for a `ffi.new` that can be cached (see benchmarks)
* Advantages are a shorter signature, plus the ability to implement APis on top (for async/streaming, later)

---

Error handling updates for `iconv_convert`

Previously the error handling was mostly "pray it works or debug iconv error codes". Now it's a little better:

* The `iconv_convert` interface always returns a `iconv_result_t` (enum value) with corresponding error strings
* This is a lot more granular and matches other APIs with saner behavior, like libuv/luv (see `iconv.strerror`)
* Unfortunately not all types of errors can be differentiated, because libiconv only uses three (?) error codes
* There's also some cases where libiconv will do nothing or reset the handle, which the runtime can't detect

---

`iconv_try_close` moved to the C++ layer

This is just because detecting the iconv error code via typecasts is annoying, and doing it twice even more so.

---

`CHARSET_CONVERSION_FAILED` shared constant removed

This was a hack to more easily detect conversion failures, because libiconv's error handling demands it. Constants shouldn't be stored in the static exports tables, because the runtime checks they're populated with function pointers (that aren't `nullptr`) on startup. This implicitly assumes only pointers are stored in the structure, and all members are uniform. In this case it wasn't a problem as the constant was the very last member. However, it seems like a footgun best avoided.

It's also completely redundant if the constants are exported as `cdefs` and can be used from the FFI directly.

---

Utility methods for error handling and validation

The bindings don't export all of them yet (some use C++ types in their signature/are inlined/NOOPs if optimized):

* `iconv_strerror` works exactly like `uv_strerror`; it could become the default way of handling errors everywhere
* It allows checking enum values with `ffi.C.ICONV_RESULT_OK` and translating to Lua failure tuples easily
* Custom errors are independent of libiconv returns - they cover all `errnor` codes and then some
* `iconv_check_result` and `iconv_check_errno` are basically more readable versions of the previous unit test assertions

---

NYI: Character encoding types (`iconv_encoding_t` placeholder)

The API still uses string (`const char*`) identifiers for the libiconv encodings. Which ones are available depends on the libiconv implementation and the system itself. The API currently makes no guarantees as to which can be used. ideally, it should not only do that for all supported encodings (with unit tests), but also enable specific errors when an encoding isn't supported by the implementation or is completely unknown. That's something to be added in a future patch.

---

Performance improvements and benchmarks

The benchmarks are an afterthought and clearly there's a lot of work needed in that domain still. For now, I removed a bunch of unneeded allocations and cached the largest time-wasters (I/O buffer allocations) via upvalues, to make sure all of the bindings are roughly equivalent. I'm keeping an eye on this part but it isn't a primary focus right now.

---

New `ASSUME` macro for non- `NDEBUG` builds

There's not currently any way to easily turn this on (requires Ninja build updates, separate issue). I haven't settled on a way to verify assumptions at runtime because all of the available options seemed a bit meh at best. This approach is an experiment which seemed to work well in my testing. It's just C `assert` with some sugar - we'll see how that goes.

---

The actual libiconv interface is of course unchanged; it's still not very user-friendly but I can't help that.
I wouldn't expect any name clashes here, but whatever.
I guess it's not currently exported, so using const is fine.
Surely no one would ever try that... right?
@rdw-software rdw-software force-pushed the iconv-interface-rework branch from b1de00e to 891e9e2 Compare February 16, 2025 16:40
@rdw-software rdw-software merged commit d790270 into main Feb 16, 2025
@rdw-software rdw-software deleted the iconv-interface-rework branch February 16, 2025 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant