Unicode Handling Problems with Ruby 1.9.1 #2

Closed
lennart opened this issue Oct 10, 2009 · 6 comments

Comments

lennart commented Oct 10, 2009

Within the C extension there's a problem with ID3_Field_get_unicode. The const char pointer it returns points to an array of UTF-8 bytes. With Ruby 1.9.1 this produces a String containing the literal UTF-8 bytes, tagged as ASCII-8BIT (BINARY). Converting the encoding directly within Ruby doesn't work. Additionally, there are \x00 bytes between the characters. I don't know where they come from.

Solutions may be:

  • Creating UTF-8 encoded strings directly from the char pointer. I had problems with missing symbols when using rb_enc_str_new(str, size, rb_utf8_encoding()). Compilation works, but the method using this function raises runtime errors due to the missing symbol for rb_utf8_encoding.
  • Associating the UTF-8 encoding with an already created string. This raised nearly the same errors, this time about a missing symbol for rb_enc_find_index.

Solution for now:
I added a quick fix as mentioned in #1
This fix just hacks around the encoding: it reads the wrongly encoded string for a field from the C extension and converts it to a byte array (.bytes). It then loops through the bytes, filters out the \x00 bytes, and produces UTF-8 encoded chars, which are joined again at the end.
Yeah, I know it's a hack, but it works for now.
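
For readers without access to the patch, here is a minimal sketch of what the workaround described above might look like. The helper name `strip_nul_hack` is hypothetical, and treating every remaining byte as a code point is an assumption about how the original fix built its UTF-8 chars; it is not the code from the patch.

```ruby
# Hypothetical reconstruction of the quick fix, not the actual patch.
# Takes the binary string returned by the C extension and drops the \x00 bytes.
def strip_nul_hack(raw)
  # Keep every non-zero byte and treat it as a Unicode code point (assumption).
  code_points = raw.bytes.reject { |byte| byte.zero? }
  # The "U*" directive encodes each code point as UTF-8 and tags the result UTF-8.
  code_points.pack("U*")
end

# Bytes for "H" and "é" with \x00 bytes in between, tagged as binary.
raw = [0x00, 0x48, 0x00, 0xE9].pack("C*")
puts strip_nul_hack(raw)
```

A sketch like this can only handle characters below U+0100, because it looks at one byte at a time; anything outside that range would come out garbled.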

I would be grateful for any suggestions on how to solve this adequately.

Owner

robinst commented Oct 18, 2009

id3lib actually returns UTF-16 there, not UTF-8. That's where the \x00 bytes come from.

The only problem is that a binary string is returned instead of a real string (marked as being "UTF-16" encoded).

Author

lennart commented Oct 18, 2009

Um, that explains why I found \x00ff\x00fe in front of some strings (GetRawUnicode returns the byte order mark as well). In this case those are UTF-16 little endian. Does Ruby have any support for UTF-16, so one could convert them to UTF-8 strings?

Author

lennart commented Oct 19, 2009

Oh, I was wrong: id3lib uses UTF-16 big endian internally. When I call force_encoding("UTF-16BE") and then encode("UTF-8"), everything seems fine.

Here's a gist for the updated patch
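
The relabel-then-transcode step mentioned above can be sketched as follows. The helper name `utf16be_to_utf8` is made up for illustration, and the sample bytes are invented; only `force_encoding` and `encode` come from the comment itself.

```ruby
# Hypothetical helper illustrating force_encoding followed by encode.
def utf16be_to_utf8(raw)
  # force_encoding only relabels the bytes; dup avoids mutating the caller's string.
  utf16 = raw.dup.force_encoding(Encoding::UTF_16BE)
  # encode actually transcodes the bytes from UTF-16BE to UTF-8.
  utf16.encode(Encoding::UTF_8)
end

# "Hi" as UTF-16BE bytes, tagged as binary like the strings from the C extension.
raw = [0x00, 0x48, 0x00, 0x69].pack("C*")
puts utf16be_to_utf8(raw)
```

This assumes the input really is big endian and carries no byte order mark.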

Author

lennart commented Oct 19, 2009

Um, sorry, still wrong. Some strings are big endian while others are not. So one needs to determine the endianness before encoding... Forget the gist, I'll look into it some more.
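
One possible way to determine the endianness is to check for a byte order mark at the start of the string. The sketch below is only an illustration: the name `decode_utf16` is hypothetical, and falling back to big endian when no mark is present is an assumption, not something confirmed for id3lib.

```ruby
# Byte order marks as binary strings.
BOM_LE = [0xFF, 0xFE].pack("C*")
BOM_BE = [0xFE, 0xFF].pack("C*")

# Hypothetical helper: picks the UTF-16 variant from the BOM, then transcodes to UTF-8.
# The big-endian default for strings without a BOM is an assumption.
def decode_utf16(raw, default = Encoding::UTF_16BE)
  bytes = raw.dup.force_encoding(Encoding::BINARY)
  encoding =
    if bytes.start_with?(BOM_LE)
      bytes = bytes[2..-1] # strip the BOM so it does not end up in the text
      Encoding::UTF_16LE
    elsif bytes.start_with?(BOM_BE)
      bytes = bytes[2..-1]
      Encoding::UTF_16BE
    else
      default
    end
  bytes.force_encoding(encoding).encode(Encoding::UTF_8)
end

# "Hi" as little-endian UTF-16 with a BOM.
puts decode_utf16([0xFF, 0xFE, 0x48, 0x00, 0x69, 0x00].pack("C*"))
```

Strings without a byte order mark remain ambiguous, so the default would need to match whatever the library actually produces.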

Owner

robinst commented Oct 19, 2009

Have a look at what get_encoding returns. (Sorry to be of little help at the moment.)

Owner

robinst commented Jan 16, 2012

Hey there. I'm closing this issue as id3lib-ruby is no longer developed (because id3lib isn't).

Please migrate to my new project: https://github.com/robinst/taglib-ruby

It has full support for ID3v2.4 and Unicode. I'm glad to help with any issues that pop up.

@robinst robinst closed this as completed Jan 16, 2012