Fix encoding issues with encodingKludge #294

Diaoul · 2011-08-21T12:13:15Z

Fixed all encoding issues I could encounter on both my Synology NAS and Ubuntu 10.04

Main changes

Add encodingKludge from Sick-Beard to deal with encoding issues
Removed irrelevant hardcoded 'UTF-8' strings and replaced with ek-ed ones
The logger is now unicode-aware

Tested with album 4 of Beyoncé

Scan Music Library
Force Post-Process Albums in Download Folder (all post-process options activated)

Any feedback is appreciated

abuisman · 2011-08-21T15:45:20Z

I was having these encoding issues as well, good work :).

pabloalcantara · 2011-08-21T18:28:55Z

Hope this fix every problem...
@Diaoul could you see if my encode.py needs some changes?

rembo10 · 2011-08-21T19:38:31Z

Hey Diaoul - everything looks good - this is the only thing i'm thinking about:

temp_f = headphones.DESTINATION_DIR
for f in folder_list:
    temp_f = ek.ek(os.path.join, temp_f, f)
    ek.ek(os.chmod, temp_f, int(headphones.FOLDER_PERMISSIONS, 8))

Not that familiar with ek, so I'm not sure if this handles it but i was having trouble with this before and i made some changes along the way to where i encoded it. headphones.DESTINATION_DIR is unicode, folder_list is unicode. When you ek.ek temp_f, it becomes bytestring, so when it adds the next folder, temp_f is now bytestring and f is unicode. Does ek handle these mixed encoding arguments alright? If not I think we can take that line out and just ek the last line, and maybe os.path.normpath(temp_f). Does that make sense? Let me know what you think

rembo10 · 2011-08-21T20:21:56Z

I don't know what it is with ek but it just doesn't want to work on OSX. I start getting mad unicodedecodeerrors with it

Diaoul · 2011-08-21T21:09:25Z

@rembo10 AFAIK it works on Sick-Beard so this should work on OSX as Sick-Beard runs well on OSX. Maybe ask @midgetspy how he made it with OSX.

About the chmod thing, I tried several things to make it simplier but without success...

rembo10 · 2011-08-21T21:45:36Z

I'm probably doing this wrong but I think there are some things I want to keep in bytestring. For example, downloaded track paths - those don't really need to be returned as unicode, since filenames in *nix can have multiple encodings in the same filename. if i unicode it with one encoding, an error will probably be thrown. keeping it bytestring means i'm not trying to manipulate it - and since i'm not really storing those values, but using them to embed album art, rename, etc., doesn't it make sense to keep it as a bytestring?

On Aug 21, 2011, at 2:09 PM, Diaoul wrote:

@rembo10 AFAIK it works on Sick-Beard so this should work on OSX as Sick-Beard runs well on OSX. Maybe ask @midgetspy how he made it with OSX.

Reply to this email directly or view it on GitHub:
#294 (comment)

rembo10 · 2011-08-22T01:17:15Z

Maybe it was how I was adapting it to the changes I made? Can you update your version with my master, since I made a few changes that I had to resolve in the merge which may have messed some things up. If it works I'll definitely merge it in because it seems a lot cleaner than all my encoding and decoding all over the place.

On Aug 21, 2011, at 2:09 PM, Diaoul wrote:

@rembo10 AFAIK it works on Sick-Beard so this should work on OSX as Sick-Beard runs well on OSX. Maybe ask @midgetspy how he made it with OSX.

Reply to this email directly or view it on GitHub:
#294 (comment)

midgetspy · 2011-08-22T03:19:13Z

The safe way to deal with encoding is:

decode every bytestring "entering" the application (from disk, database, the internet, etc)
use exclusively unicode string objects inside your application
explicitly encode all strings to bytestrings any time they "exit" your application (file names/content, database entries, etc)

The last one is the most important (but it relies on the first two being done properly) because if you give some functions a unicode string they will encode it with default encoding which may be something ASCII-like which will cause it to choke on any special characters that may be present in the output strings.

Theoretically you should always know what encoding a bytestring you receive is using but unfortunately that is not always the case. Specifically, many NAS boxes have embedded linux with an ASCII-like encoding specified as the system encoding but then their filesystems have UTF-8 (for example) filenames that were written over the network by a different OS.

encodingKludge was written because some functions (like os.*) in python seem to return unicode objects on some OSes and bytestrings on others. I have never reproduced this but that is what I guessed the problem was based on the tracebacks I was seeing. If you're starting from scratch I would encourage you to deal with your strings in a better way than I did (preferably something that doesn't require "kludge" in the name ;-P).

rembo10 · 2011-08-22T05:34:56Z

haha kludge is a great word :-)

i'm keeping all the filenames in the post processor as bytestrings. as soon as i try to unicode them, i face the problem that i may lose the actual filepath on linux/nas systems, since filenames are stored as bytestrings anyways. the values that i pass in (from the database, etc) are all unicode, so i encode those values using their sys encoding when renaming, etc.

one thing i noticed was that config variables (i.e. headphones.CONFIG_OPTION) were bytestrings on startup, but unicode when config options were saved from the UI. this might account for some of the randomness with your os functions. i added encoding to ConfigObj to fix this:

headphones.CFG = ConfigObj(headphones.CONFIG_FILE, encoding='utf-8')

now I know all my headphones.CONFIG_OPTIONS will be utf-8 and can encode them when appropriate.

rembo10 · 2011-08-22T06:02:06Z

ps. hope you don't mind that i basically stole the whole startup script from sickbeard :-)

Diaoul · 2011-08-22T09:43:05Z

I am not sure there is "a better way" to deal with unicode than a kludge given the randomness of some os.* function results...
I'll try to build up a test stript to understand python behaviour on my NAS and my computer aswell, maybe a unittest file or something

Diaoul · 2011-08-22T10:22:55Z

@pabloalcantara: I can't really look at it as I don't use this function and can only correct bugs as I see them. You can try on your own to edit the code using ek until all errors are gone.

rembo10 · 2011-08-22T10:23:37Z

From what i sort of pieced together the os functions arent random, as long as you know what type of strings youre passing in (unicode or byte)

On Aug 22, 2011, at 2:43 AM, Diaoulreply@reply.github.com wrote:

I am not sure there is "a better way" to deal with unicode than a kludge given the randomness of some os.* function results...
I'll try to build up a test stript to understand python behaviour on my NAS and my computer aswell, maybe a unittest file or something

Reply to this email directly or view it on GitHub:
#294 (comment)

Diaoul · 2011-08-29T09:50:09Z

I've found the problem and it's not yours. You should just work with unicode everywhere, including os.*

Fix encoding issues with encodingKludge

430a4c9

Diaoul closed this Aug 29, 2011

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix encoding issues with encodingKludge #294

Fix encoding issues with encodingKludge #294

Diaoul commented Aug 21, 2011

abuisman commented Aug 21, 2011

pabloalcantara commented Aug 21, 2011

rembo10 commented Aug 21, 2011

rembo10 commented Aug 21, 2011

Diaoul commented Aug 21, 2011

rembo10 commented Aug 21, 2011

rembo10 commented Aug 22, 2011

midgetspy commented Aug 22, 2011

rembo10 commented Aug 22, 2011

rembo10 commented Aug 22, 2011

Diaoul commented Aug 22, 2011

Diaoul commented Aug 22, 2011

rembo10 commented Aug 22, 2011

Diaoul commented Aug 29, 2011

Fix encoding issues with encodingKludge #294

Fix encoding issues with encodingKludge #294

Conversation

Diaoul commented Aug 21, 2011

Main changes

Tested with album 4 of Beyoncé

abuisman commented Aug 21, 2011

pabloalcantara commented Aug 21, 2011

rembo10 commented Aug 21, 2011

rembo10 commented Aug 21, 2011

Diaoul commented Aug 21, 2011

rembo10 commented Aug 21, 2011

rembo10 commented Aug 22, 2011

midgetspy commented Aug 22, 2011

rembo10 commented Aug 22, 2011

rembo10 commented Aug 22, 2011

Diaoul commented Aug 22, 2011

Diaoul commented Aug 22, 2011

rembo10 commented Aug 22, 2011

Diaoul commented Aug 29, 2011