Skip to content

Fix utf16_string_value_ptr() for TruffleRuby#334

Merged
flavorjones merged 1 commit intosparklemotion:masterfrom
eregon:fix-utf16_string_value_ptr-truffleruby
Aug 12, 2022
Merged

Fix utf16_string_value_ptr() for TruffleRuby#334
flavorjones merged 1 commit intosparklemotion:masterfrom
eregon:fix-utf16_string_value_ptr-truffleruby

Conversation

@eregon
Copy link
Copy Markdown
Contributor

@eregon eregon commented Aug 11, 2022

  • TruffleRuby uses TruffleString which requires all UTF-16 strings to
    always have an even bytesize.
  • The code here ensures the UTF-16 string always ends up with 2 \x00
    bytes, so append them explicitly to have an even bytesize instead of
    relying on RSTRING_PTR() adding the second \x00 byte.
  • Fixes Very recent UTF-16 issue? Not.sure. truffleruby/truffleruby#2704

This fixes the CI on TruffleRuby.

@eregon
Copy link
Copy Markdown
Contributor Author

eregon commented Aug 11, 2022

From truffleruby/truffleruby#2704 (comment)
I wonder why sqlite3_open16 is used, instead of just always using sqlite3_open_v2 and transcoding the filename from UTF-16 to UTF-8 (which should works fine as long as the string is not broken).
It feels like we could remove the usage of sqlite3_open16 fairly easily and a bunch of code with it.

Maybe it was added in 1.8 days where there was no good encoding/transcoding support in Ruby?

@eregon
Copy link
Copy Markdown
Contributor Author

eregon commented Aug 11, 2022

Ah, the side effect of sqlite3_open16 is to set the database encoding to UTF-16 instead of UTF-8 (https://www.sqlite.org/c3ref/open.html).
And I'm not sure there is a way to set "the default encoding for databases" after sqlite3_open_v2.
So this PR makes most sense and we can't remove sqlite3_open16 too easily if users want a UTF-16 "default encoding for databases".

@eregon
Copy link
Copy Markdown
Contributor Author

eregon commented Aug 11, 2022

FWIW CRuby master seems to have TERM_LEN logic to ensure there are rb_enc_mbminlen() \x00 bytes at the end, but that's currently not implemented on TruffleRuby, and I'm not sure since which version of CRuby that's done.

@eregon
Copy link
Copy Markdown
Contributor Author

eregon commented Aug 11, 2022

but that's currently not implemented on TruffleRuby

That seems easy to fix, so I have an incoming fix for that in TruffleRuby.
So if we're happy to rely on Ruby having enough \x00 bytes, then we can replace utf16_string_value_ptr() with RSTRING_PTR().
But CRuby 1.9.2 does not seem to have the TERM_LEN logic, e.g. https://github.com/ruby/ruby/blob/ruby_1_9_2/string.c#L410
TERM_LEN was added in CRuby 2.1: ruby/ruby@b271a8c

@flavorjones
Copy link
Copy Markdown
Member

@eregon Can you please add a test for this behavior?

@flavorjones
Copy link
Copy Markdown
Member

flavorjones commented Aug 12, 2022

Ah, I see, it fails when truffleruby uses packaged libraries, but not when it uses system libraries.

I see the commment at truffleruby/truffleruby#2704 (comment) explaining the source of the problem.

No worries, I'll work through it.

* TruffleRuby uses TruffleString which requires all UTF-16 strings to
  always have an even bytesize.
* The code here ensures the UTF-16 string always ends up with 2 \x00
  bytes, so append them explicitly to have an even bytesize instead of
  relying on RSTRING_PTR() adding the second \x00 byte.
* Fixes truffleruby/truffleruby#2704
@flavorjones flavorjones force-pushed the fix-utf16_string_value_ptr-truffleruby branch from 2f67cd0 to 5aa746f Compare August 12, 2022 21:04
@flavorjones flavorjones merged commit fce39d3 into sparklemotion:master Aug 12, 2022
@flavorjones
Copy link
Copy Markdown
Member

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Very recent UTF-16 issue? Not.sure.

2 participants