Way to specify additional encoding for options decoding #6119

Open
MrS0m30n3 opened this issue Jun 28, 2015 · 2 comments

Comments

@MrS0m30n3

Hi @dstftw

In your commit you replaced the utf-8 encoding with the preferredencoding() function. I just wanted to ask whether an existing command line option such as --encoding could be used to specify an additional encoding.

For example, if someone wants to call youtube-dl.exe on Windows using the subprocess module (which does not support unicode on Python 2.x), they have to encode the input passed to subprocess. With your implementation the caller has to encode the data with locale.getpreferredencoding(), otherwise the decoding on the youtube-dl side will fail. But if the encoding returned by locale.getpreferredencoding() can't represent the input, some of the characters get lost.
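To make the loss concrete, here is a minimal sketch. It assumes cp1252 as a stand-in for a Western European Windows locale encoding and a made-up Greek title; neither comes from youtube-dl itself.

```python
# A title containing Greek characters (hypothetical example input).
title = u'\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac video'

# cp1252 is only an assumed locale encoding for illustration.
# It has no Greek letters, so 'replace' turns each one into '?'.
lossy_bytes = title.encode('cp1252', 'replace')
lossy_roundtrip = lossy_bytes.decode('cp1252')

# utf-8 can represent every character, so nothing is lost.
lossless_roundtrip = title.encode('utf-8').decode('utf-8')

print(repr(lossy_roundtrip))
print(lossless_roundtrip == title)
```

The receiving side cannot recover the original characters from the lossy bytes, whichever encoding it decodes with.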

I am currently working on #5527, so it would be helpful if the user could select the encoding for both the encoding and the decoding phase.

We can achieve this behaviour using something like this in options.py:

def compat_conf(conf):
    if sys.version_info < (3,):
+      enc = conf[conf.index(str('--encoding')) + 1] if str('--encoding') in conf else preferredencoding()
+      return [a.decode(enc, 'replace') for a in conf]
    return conf
@dstftw
Collaborator

dstftw commented Jul 4, 2015

The only problem with respecting --encoding is that you forget about configuration files. --encoding can also be provided in the user or system configuration, and handling this case results in rather complicated and clumsy logic.
So, under Python 2, we would have to look through command_line_conf, user_conf and system_conf, find the highest-priority --encoding and decode the byte strings to unicode strings according to it. Fine, but before that we have to take --ignore-config into account, which would need to be adapted for the case where the confs are still lists of byte strings. Then, if we respect --encoding, on every Python version _readOptions should read with this encoding and not with locale.getpreferredencoding(), which is what open uses by default. But to know the encoding we must already have read the configuration file, so obviously we would need to read it with some basic encoding, e.g. the aforementioned preferredencoding, extract --encoding and reread it with the extracted encoding. And more fun follows.
Of course, we can revert to decoding with utf-8, but configurations are currently supposed to be in locale.getpreferredencoding(), so that would be kind of inconsistent.
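The "read, extract --encoding, reread" step could look roughly like the sketch below. It is only an illustration of the two-pass idea: decode_config_two_pass and extract_encoding are hypothetical helpers, not youtube-dl functions, and the sketch works on the raw bytes of a configuration file rather than on _readOptions itself. It also ignores --ignore-config and the priority between the three configurations.

```python
import locale
import shlex


def extract_encoding(args):
    """Return the value that follows --encoding in args, or None."""
    if '--encoding' in args:
        idx = args.index('--encoding')
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


def decode_config_two_pass(raw, bootstrap_encoding=None):
    """Hypothetical helper: split config bytes into options, honouring
    an --encoding option found inside the configuration itself."""
    if bootstrap_encoding is None:
        bootstrap_encoding = locale.getpreferredencoding()
    # Pass 1: decode leniently with the basic encoding, only to find --encoding.
    first = shlex.split(raw.decode(bootstrap_encoding, 'replace'), comments=True)
    declared = extract_encoding(first)
    if declared is None:
        return first
    # Pass 2: decode the same bytes again with the declared encoding.
    return shlex.split(raw.decode(declared), comments=True)


raw = u'--encoding utf-8\n-o "\u0395\u03bb.mp4"\n'.encode('utf-8')
print(decode_config_two_pass(raw, 'ascii'))
```

Note that pass 1 only works if the option name and the encoding name survive the basic decoding, which holds for ASCII-compatible encodings but is an assumption in general.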

@MrS0m30n3
Author

We could define a new command line option that sets the encoding only for decoding command_line_conf, and keep decoding user_conf and system_conf with preferredencoding().

Basically, we search sys.argv for the new option and, if it is present, extract the given encoding. We then use this encoding to decode command_line_conf. If the option is not specified, we fall back to the preferredencoding() function.

Example:

+ def compat_conf(conf, encoding=None):
    if sys.version_info < (3,):
+      # Resolve the fallback at call time, so passing None also works
+      if encoding is None:
+          encoding = preferredencoding()
+      return [a.decode(encoding, 'replace') for a in conf]
    return conf

+ # Try to extract the encoding for the command_line_conf decode process
+ enc = sys.argv[sys.argv.index(str('--new-option')) + 1] if str('--new-option') in sys.argv else None
+ command_line_conf = compat_conf(sys.argv[1:], enc)
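On Python 3 sys.argv is already text, so the proposal cannot be tried there directly. The sketch below imitates the Python 2 situation with an explicit list of byte strings. decode_command_line is a hypothetical name, --new-option is just the placeholder used above, and the encoding name itself is assumed to be plain ASCII.

```python
import locale


def decode_command_line(argv, option=b'--new-option'):
    """Decode a list of byte-string arguments to text.

    `option` is a placeholder flag name, not a real youtube-dl option.
    """
    encoding = None
    if option in argv:
        idx = argv.index(option)
        # Guard against the option being the last argument with no value.
        if idx + 1 < len(argv):
            # Assumption: encoding names are plain ASCII.
            encoding = argv[idx + 1].decode('ascii')
    if encoding is None:
        encoding = locale.getpreferredencoding()
    return [a.decode(encoding, 'replace') for a in argv]


argv = [b'--new-option', b'utf-8', u'\u0395\u03bb.mp4'.encode('utf-8')]
print(decode_command_line(argv))
```

A caller using subprocess would then encode its arguments with the same encoding it passes through the option, instead of being tied to the locale encoding.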
