Files with UTF-16 TIT2 (and others) have invalid bytes before name #61
Comments
A UTF-16 sample file would be a great start. |
Sure. This file: http://datashat.net/music_for_programming_10-unity_gain_temple.mp3 (from http://musicforprogramming.net/) shows the problem. |
Those non-displayable characters are indeed the byte order mark (BOM) from the UTF-16 text. The ID3 documentation specifies this regarding text encodings:
Your file is tagged with encoding 01 ("UTF-16"), which means the text could be either big-endian or little-endian, as determined by the BOM at the start of the string. Without the BOM there is no way to know how to display (or convert) the text, since the byte order is unknown. With encoding 02 ("UTF-16BE") the order is fixed, so the BOM is not needed. I did make a small change to remove the BOM from empty frame description fields (the description is usually blank). The BOM will remain for non-empty descriptions as well as for the actual data. Normally you would pull the comment data you need from $info['comments']['title'] rather than $info['id3v2']['COMM'][0]['data'], and the data there is (by default) already converted to UTF-8, which intrinsically removes the BOM. If you do need to process your data directly in UTF-16 for whatever reason, then you need the BOM intact; otherwise your string couldn't be handled. |
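To make the difference between the two encodings concrete, here is a minimal sketch in Python (not getID3 code). The function name `decode_id3_text` is hypothetical, and it only covers the two encoding bytes discussed above: 01 (UTF-16, byte order taken from the BOM) and 02 (UTF-16BE, no BOM).

```python
import codecs


def decode_id3_text(encoding_byte: int, raw: bytes) -> str:
    """Decode an ID3v2 text payload for encoding bytes 01 and 02 only.

    Hypothetical helper for illustration; it is not part of getID3.
    """
    if encoding_byte == 0x01:
        # "UTF-16": the BOM is the only clue to the byte order, so refuse
        # to guess when it is missing.
        if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
            raise ValueError("UTF-16 text without a BOM: byte order unknown")
        # Python's "utf-16" codec reads the BOM, picks the byte order,
        # and drops the BOM from the decoded string.
        return raw.decode("utf-16")
    if encoding_byte == 0x02:
        # "UTF-16BE": the byte order is fixed, so no BOM is expected.
        return raw.decode("utf-16-be")
    raise ValueError(f"encoding byte {encoding_byte:#04x} not covered by this sketch")


little_endian = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
big_endian = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")

print(decode_id3_text(0x01, little_endian))  # BOM says little-endian
print(decode_id3_text(0x01, big_endian))     # BOM says big-endian
print(decode_id3_text(0x02, "Hi".encode("utf-16-be")))
```

All three calls decode to the same two-character string, which is why the converted UTF-8 value no longer carries a BOM.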
I have the latest version and I'm still seeing the same as above. I made a fresh checkout of the repo, and at the bottom of the page I see "Powered by getID3() v1.9.10-201511241457" which seems to be the latest version. (Thanks very much for looking into this by the way!) |
Well, I think I know why there are two values: one seems to be coming from the id3v1 tag (the shortened one) and one from the id3v2 tag (with the BOM). You probably already figured that :) But I'm still not sure why you're not seeing that behavior. Could there be something in my PHP settings? I'm on 5.6.4 64-bit. |
My best guess would be that your PHP installation doesn't have native iconv() support, so it's relying on getid3_lib::iconv_fallback(), and there may be an issue in there. Note that this is simply a guess at this point. I'll need to take a look at that tomorrow and see if I can find a problem. I'll let you know. |
Can you save the entire output of demo.browse for that file to a .html file and attach it here please? |
Sure, attached below (as .txt so github would let me). I'll have a look too and see if I can figure anything out with the iconv thing, thanks for the hint. |
If I disable the built-in iconv and use getID3's version it still works correctly. Perhaps there is an issue with your built-in version of iconv? First let's check if it's there, what version if available, and then try a very simple conversion using both PHP's iconv() function and getID3's version:
They should both just say "Hi" with no BOM, 2 chars long. I suspect one of them will be 4-chars with a BOM. |
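As a rough illustration of what that check is looking for, the sketch below reproduces both outcomes in Python rather than PHP. It does not call iconv() or getID3; the `describe` helper and the `SAMPLE` constant are hypothetical names, and the "bad" path only simulates a converter that treats the BOM as ordinary text.

```python
import codecs

# "Hi" as UTF-16 little-endian with a leading BOM (FF FE): 6 bytes in total.
SAMPLE = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")


def describe(converted: bytes) -> str:
    """Report whether a UTF-8 conversion result still starts with a BOM."""
    if converted.startswith(codecs.BOM_UTF8):
        return f"BOM left in place ({len(converted)} bytes)"
    return f"clean ({len(converted)} bytes)"


# A BOM-aware conversion consumes the BOM while decoding.
good = SAMPLE.decode("utf-16").encode("utf-8")

# Simulated faulty conversion: the byte order is assumed up front, so the
# BOM is decoded as the character U+FEFF and survives into the output.
bad = SAMPLE.decode("utf-16-le").encode("utf-8")

print(describe(good))
print(describe(bad))
```

A result that starts with the bytes EF BB BF is the UTF-8 form of the BOM, which browsers tend to render as stray or invisible characters in front of the title.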
Yep, looks like iconv is failing and the builtin one is leaving the BOM:
|
ahh, and iconv error is: "Notice: iconv(): Wrong charset, conversion from |
couple of other notes
|
ohh, and if i run php at the command line, it works. outputting:
So it must be something with my nginx install. Yar. I will keep hunting. |
Okay, it turned out to be an issue with php-fpm, which wasn't loading the iconv shared libraries properly. Thanks for the help pinpointing it! |
I have some files that have UTF-16 titles. When looking at them in demo.browse, the values get prefixed with invalid characters. These show up as ? chars in my browser, but looking at the returned data, they are not valid UTF-16 either. For instance, for one file, the comments_html section contains:
This is for a number of different files, and other tools process the tags correctly.
Let me know if you need more info, or what else I can do to help track down what's wrong. I'm on version 1.9.10-20150914