Unicode Handling Problems with Ruby 1.9.1 #2

lennart · 2009-10-10T16:53:00Z

Within the C extension there's a problem with ID3_Field_get_unicode. The const char pointer it returns points to an array of UTF-8 bytes. With Ruby 1.9.1 this produces a String that contains the literal UTF-8 bytes but is tagged as ASCII-8BIT (BINARY). Converting the encoding directly within Ruby doesn't work. Additionally, there are \x00 bytes between each character. I don't know where they come from.

Solutions may be:

Creating UTF-8 encoded strings directly from the char pointer. I had problems with missing symbols when using rb_enc_str_new(str, size, rb_utf8_encoding()). Compilation works, but calling the method that uses this function raises a runtime error because the symbol rb_utf8_encoding is missing.
Associating the UTF-8 encoding with an already created string. This raised much the same errors, again about missing symbols, this time for rb_enc_find_index.

Solution for now:
I added a quick fix, as mentioned in #1.
The fix reads the wrongly encoded string for a field from the C extension and converts it to a byte array (.bytes). It then loops through the bytes, filters out the \x00 bytes, turns each remaining byte into a UTF-8 character, and joins the characters back together at the end.
Yeah, I know it's a hack, but it works for now.
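A minimal Ruby sketch of the workaround described above, for illustration only (the helper name `strip_nul_hack` is hypothetical, and the actual patch from #1 is not reproduced here). Because it simply throws away the high byte of each 16-bit unit, it can only produce correct text for characters in the range U+0001 to U+00FF, which is why it works for plain ASCII and Latin-1 tags and breaks on anything else. A leading byte order mark would also come through as stray characters.

```ruby
# Hypothetical helper sketching the workaround: drop every \x00 byte and
# treat each remaining byte value as a Unicode code point.
# Only correct for characters U+0001..U+00FF (ASCII/Latin-1).
def strip_nul_hack(raw)
  raw.bytes
     .reject { |b| b.zero? }        # filter out the \x00 bytes
     .map { |b| [b].pack("U") }     # encode the byte value as a UTF-8 character
     .join                          # join the characters back into one string
end

strip_nul_hack("\x00H\x00i".b)  # => "Hi"
```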

I would be grateful for any suggestions on how to solve this adequately.


robinst · 2009-10-18T18:32:44Z

id3lib actually returns UTF-16 there, not UTF-8. That's where the \x00 bytes come from.

The only problem is that a binary string is returned instead of a proper string tagged with a UTF-16 encoding.

lennart · 2009-10-18T21:44:00Z

Um, that explains why I found \xFF\xFE at the start of some strings (GetRawUnicode returns the byte order mark as well); in this case those strings are UTF-16 little endian. Does Ruby have any support for UTF-16, so one could convert them to UTF-8 strings?

lennart · 2009-10-19T12:32:58Z

Oh, I was wrong: id3lib uses UTF-16 big endian internally. When I call force_encoding("UTF-16BE") and then encode("UTF-8"), everything seems fine.

Here's a gist for the updated patch
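The conversion from this comment, sketched in Ruby (the gist itself is not reproduced here, and the helper name `utf16be_to_utf8` is hypothetical). `force_encoding` only changes the encoding tag without touching the bytes, and it does so in place, so the sketch works on a copy; `encode` then performs the actual transcoding. This is only correct when the field really is big endian, which is exactly what the next comment calls into question.

```ruby
# Hypothetical helper: tag the raw bytes from the C extension as UTF-16BE
# (the bytes themselves are unchanged) and transcode the result to UTF-8.
def utf16be_to_utf8(raw)
  raw.dup.force_encoding("UTF-16BE").encode("UTF-8")
end

raw = "\x00H\x00\xE9".b   # "Hé" as big-endian UTF-16 bytes, tagged BINARY
utf16be_to_utf8(raw)      # => "Hé"
```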

lennart · 2009-10-19T13:05:10Z

Um, sorry, still wrong. Some strings are big endian while others are not, so one needs to determine the endianness before encoding. Forget the gist; I'll look into it some more.
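One possible way to determine the byte order, sketched in Ruby under the assumption that the fields whose byte order varies start with a byte order mark (the \xFF\xFE prefix mentioned earlier in the thread is the little-endian one): inspect the first two bytes, strip the BOM, and tag the rest as UTF-16BE or UTF-16LE accordingly. Fields without a BOM fall back to big endian; that fallback is an assumption based on the observation that id3lib uses big endian internally, not something confirmed here. `decode_utf16`, `BOM_BE` and `BOM_LE` are hypothetical names. Checking the field's declared encoding first may well be the more robust route.

```ruby
# Byte order marks as binary (ASCII-8BIT) strings.
BOM_BE = [0xFE, 0xFF].pack("C*")
BOM_LE = [0xFF, 0xFE].pack("C*")

# Hypothetical helper: decode a UTF-16 field whose byte order may differ
# from field to field. A leading BOM decides the byte order and is stripped;
# without a BOM we ASSUME big endian.
def decode_utf16(raw)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  encoding =
    if bytes.start_with?(BOM_BE)
      bytes = bytes[2..-1]   # drop the BOM
      "UTF-16BE"
    elsif bytes.start_with?(BOM_LE)
      bytes = bytes[2..-1]   # drop the BOM
      "UTF-16LE"
    else
      "UTF-16BE"             # assumed default when no BOM is present
    end
  bytes.force_encoding(encoding).encode("UTF-8")
end

decode_utf16("\xFF\xFEH\x00i\x00".b)  # => "Hi" (little endian)
decode_utf16("\xFE\xFF\x00H\x00i".b)  # => "Hi" (big endian)
```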

robinst · 2009-10-19T13:30:30Z

Have a look at what get_encoding returns. (Sorry to be of little help at the moment.)

robinst · 2012-01-16T12:59:23Z

Hey there. I'm closing this issue as id3lib-ruby is no longer developed (because id3lib isn't).

Please migrate to my new project: https://github.com/robinst/taglib-ruby

It has full support for ID3v2.4 and Unicode. I'm glad to help with any issues that pop up.

robinst closed this as completed Jan 16, 2012
