Unicode support when *reading* tags #28

Closed
lazka opened this issue Jul 4, 2014 · 3 comments

lazka commented Jul 4, 2014

Originally reported by: Christoph Reiter (Bitbucket: lazka, GitHub: lazka)


From exh...@gmail.com on September 26, 2009 17:01:08

This may or may not be a bug. I'd ask first on a mailing list or a forum,
but I can't find one :/

I have a file whose tags are stored as UTF-16 (encoding=1, at least
that's what mutagen tells me). I open the file as follows:

from mutagen import File
meta = File(localpath)  # localpath: path to the MP3 file
title = meta.get('TIT2').text[0]  # first text value of the title frame

This returns a Python unicode object, which (AFAIK) is stored
internally as UCS-2 or UCS-4 (depending on compile options). So if I want the
data in UTF-8, all I should need to do is:

title.encode("utf8")

This works without throwing an exception, but the string I get is not at
all what is stored in the ID3 tag. Instead I get a result with plenty of
Chinese characters and whatnot.

Any ideas what that might be?
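For reference, the reasoning above holds for correctly decoded text: `encode()` only serializes the code points a string already contains, so it cannot introduce wrong characters by itself. A minimal sketch in Python 3 syntax (where `str` plays the role of Python 2's `unicode`), using a made-up sample title rather than the reporter's data:

```python
# str.encode() serializes existing code points; it does not reinterpret them.
title = "Café"  # hypothetical sample value
utf8 = title.encode("utf-8")
print(utf8)  # b'Caf\xc3\xa9'

# The round trip is lossless, so characters that look wrong after encode()
# were already wrong before it, i.e. when the tag bytes were decoded.
assert utf8.decode("utf-8") == title

# Listing code points shows what the string actually holds.
print([hex(ord(c)) for c in title])  # ['0x43', '0x61', '0x66', '0xe9']
```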

Original issue: http://code.google.com/p/mutagen/issues/detail?id=28


lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From exh...@gmail.com on September 27, 2009 03:04:14

Here's what I get from mutagen-inspect (other tagging tools display it correctly):

-- /mp3s/Alice In Chains/Alice in Chains/01 - Grind.mp3
- MPEG 1 layer 3, 192000 bps, 44100 Hz, 288.23 seconds (audio/mp3)
TDRC=1995
TALB=�䄀氀椀挀攀 椀渀 䌀栀愀椀渀猀
COMM=='eng'=YEAR: 1995
COMM=ID3v1 Comment='XXX'=YEAR: 1995
TRCK=1
TPE1=Alice in Chains
TIT2=�䜀爀椀渀搀
TCON=�䜀爀甀渀最攀
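The garbled values follow a recognizable pattern: each character is an ASCII code moved into the high byte (for example `G` is U+0047, and the first garbled TIT2 character after the replacement character corresponds to U+4700). That is what little-endian UTF-16 bytes look like when decoded as big-endian. A minimal sketch under that assumption, with a hypothetical helper `unswap_utf16` that swaps the bytes back; the leading replacement character is left out because its original bytes are not visible in this output:

```python
def unswap_utf16(text: str) -> str:
    """Hypothetical helper: swap the two bytes of every UTF-16 code unit.

    Encoding as UTF-16BE and decoding the same bytes as UTF-16LE reverses
    a big-endian reading of little-endian data.
    """
    return text.encode("utf-16-be").decode("utf-16-le")


# TIT2 and TALB values from the output above, written as escapes and
# without the leading replacement character.
garbled_title = "\u4700\u7200\u6900\u6e00\u6400"
garbled_album = ("\u4100\u6c00\u6900\u6300\u6500\u2000"
                 "\u6900\u6e00\u2000"
                 "\u4300\u6800\u6100\u6900\u6e00\u7300")

print(unswap_utf16(garbled_title))  # Grind
print(unswap_utf16(garbled_album))  # Alice in Chains
```

This only repairs a string that has already been read; the tag in the file still has to be rewritten for other programs to see the right text.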

lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on September 27, 2009 21:36:40

Status: Invalid

lazka commented Jul 4, 2014

Original comment by Christoph Reiter (Bitbucket: lazka, GitHub: lazka):


From joe.wreschnig@gmail.com on September 27, 2009 21:36:33

You need to change the encoding attribute on the frame, not just re-encode the text
to a different string format. I'm not really sure what you did; you should ask on the
Quod Libet (QL) development list. Probably the frame was never UCS-2 in the first
place, but UTF-8 with an invalid encoding marker (likely because you tagged the file
in Mutagen and then went on to edit it with EasyTag or a similar tool).
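To illustrate why the frame's encoding marker matters, here is a simplified sketch of how an ID3v2 reader can map the leading encoding byte of a text frame to a codec (0 = ISO-8859-1, 1 = UTF-16 with BOM, 2 = UTF-16BE, 3 = UTF-8; the last two were added in ID3v2.4). In Mutagen this marker corresponds to the frame's `encoding` attribute. This is not Mutagen's implementation: `decode_text_frame` is a hypothetical helper, and it assumes a single text value, whereas real frames can carry several null-separated values. The sample stores UTF-8 bytes behind a wrong ISO-8859-1 marker:

```python
# Encoding byte -> Python codec name, following the ID3v2 text encodings.
ID3_TEXT_CODECS = {
    0: "latin-1",    # ISO-8859-1
    1: "utf-16",     # UTF-16 with BOM; the BOM sets the byte order
    2: "utf-16-be",  # UTF-16BE without BOM (added in ID3v2.4)
    3: "utf-8",      # UTF-8 (added in ID3v2.4)
}


def decode_text_frame(payload: bytes) -> str:
    """Hypothetical helper: decode a single text value from a frame body.

    payload[0] is the encoding byte; the rest is the encoded text,
    optionally followed by a null terminator.
    """
    codec = ID3_TEXT_CODECS[payload[0]]
    return payload[1:].decode(codec).rstrip("\x00")


stored = "Björk".encode("utf-8")  # sample value: the bytes really in the tag

# Marker says ISO-8859-1, data is UTF-8: each UTF-8 byte becomes its own char.
print(decode_text_frame(bytes([0]) + stored))  # BjÃ¶rk

# Correcting the marker (not re-encoding the decoded text) fixes the result.
print(decode_text_frame(bytes([3]) + stored))  # Björk
```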
