New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix encoding issues with encodingKludge #294
Conversation
I was having these encoding issues as well, good work :). |
Hope this fix every problem... |
Hey Diaoul - everything looks good - this is the only thing i'm thinking about:
Not that familiar with ek, so I'm not sure if this handles it but i was having trouble with this before and i made some changes along the way to where i encoded it. headphones.DESTINATION_DIR is unicode, folder_list is unicode. When you ek.ek temp_f, it becomes bytestring, so when it adds the next folder, temp_f is now bytestring and f is unicode. Does ek handle these mixed encoding arguments alright? If not I think we can take that line out and just ek the last line, and maybe os.path.normpath(temp_f). Does that make sense? Let me know what you think |
I don't know what it is with ek but it just doesn't want to work on OSX. I start getting mad unicodedecodeerrors with it |
@rembo10 AFAIK it works on Sick-Beard so this should work on OSX as Sick-Beard runs well on OSX. Maybe ask @midgetspy how he made it with OSX. About the chmod thing, I tried several things to make it simplier but without success... |
I'm probably doing this wrong but I think there are some things I want to keep in bytestring. For example, downloaded track paths - those don't really need to be returned as unicode, since filenames in *nix can have multiple encodings in the same filename. if i unicode it with one encoding, an error will probably be thrown. keeping it bytestring means i'm not trying to manipulate it - and since i'm not really storing those values, but using them to embed album art, rename, etc., doesn't it make sense to keep it as a bytestring? On Aug 21, 2011, at 2:09 PM, Diaoul wrote:
|
Maybe it was how I was adapting it to the changes I made? Can you update your version with my master, since I made a few changes that I had to resolve in the merge which may have messed some things up. If it works I'll definitely merge it in because it seems a lot cleaner than all my encoding and decoding all over the place. On Aug 21, 2011, at 2:09 PM, Diaoul wrote:
|
The safe way to deal with encoding is:
The last one is the most important (but it relies on the first two being done properly) because if you give some functions a unicode string they will encode it with default encoding which may be something ASCII-like which will cause it to choke on any special characters that may be present in the output strings. Theoretically you should always know what encoding a bytestring you receive is using but unfortunately that is not always the case. Specifically, many NAS boxes have embedded linux with an ASCII-like encoding specified as the system encoding but then their filesystems have UTF-8 (for example) filenames that were written over the network by a different OS. encodingKludge was written because some functions (like os.*) in python seem to return unicode objects on some OSes and bytestrings on others. I have never reproduced this but that is what I guessed the problem was based on the tracebacks I was seeing. If you're starting from scratch I would encourage you to deal with your strings in a better way than I did (preferably something that doesn't require "kludge" in the name ;-P). |
haha kludge is a great word :-) i'm keeping all the filenames in the post processor as bytestrings. as soon as i try to unicode them, i face the problem that i may lose the actual filepath on linux/nas systems, since filenames are stored as bytestrings anyways. the values that i pass in (from the database, etc) are all unicode, so i encode those values using their sys encoding when renaming, etc. one thing i noticed was that config variables (i.e. headphones.CONFIG_OPTION) were bytestrings on startup, but unicode when config options were saved from the UI. this might account for some of the randomness with your os functions. i added encoding to ConfigObj to fix this: headphones.CFG = ConfigObj(headphones.CONFIG_FILE, encoding='utf-8') now I know all my headphones.CONFIG_OPTIONS will be utf-8 and can encode them when appropriate. |
ps. hope you don't mind that i basically stole the whole startup script from sickbeard :-) |
I am not sure there is "a better way" to deal with unicode than a kludge given the randomness of some os.* function results... |
@pabloalcantara: I can't really look at it as I don't use this function and can only correct bugs as I see them. You can try on your own to edit the code using ek until all errors are gone. |
From what i sort of pieced together the os functions arent random, as long as you know what type of strings youre passing in (unicode or byte) On Aug 22, 2011, at 2:43 AM, Diaoulreply@reply.github.com wrote:
|
I've found the problem and it's not yours. You should just work with unicode everywhere, including os.* |
Fixed all encoding issues I could encounter on both my Synology NAS and Ubuntu 10.04
Main changes
Tested with album 4 of Beyoncé
Any feedback is appreciated