Skip to content

Ensure config loader tests include unicode#396

Merged
minrk merged 1 commit intoipython:masterfrom
vidartf:unicode_encoding
May 3, 2017
Merged

Ensure config loader tests include unicode#396
minrk merged 1 commit intoipython:masterfrom
vidartf:unicode_encoding

Conversation

@vidartf
Copy link
Contributor

@vidartf vidartf commented Apr 19, 2017

This PR adds unicode characters to the paths of the config loader tests. This reveals an exception on Windows:

self = <traitlets.config.loader.PyFileConfigLoader object at 0x0000019D8035AD68>

    def _read_file_as_dict(self):
        """Load the config file into self.config, with recursive loading."""
        def get_config():
            """Unnecessary now, but a deprecation warning is more trouble than it's worth."""
            return self.config

        namespace = dict(
            c=self.config,
            load_subconfig=self.load_subconfig,
            get_config=get_config,
            __file__=self.full_filename,
        )
        fs_encoding = sys.getfilesystemencoding() or 'ascii'
>       conf_filename = self.full_filename.encode(fs_encoding)
E       UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character

On Windows, sys.getfilesystemencoding() should not really be used to encode file paths.

On Windows NT+, file names are Unicode natively, so no conversion is performed. getfilesystemencoding() still returns 'mbcs', as this is the encoding that applications should use when they explicitly want to convert Unicode strings to byte strings that are equivalent when used as file names.

Not entirely sure how to best solve this in a way that is compatible with all Python/OSes, so no solution suggested yet, but for a pure Windows solution I would just remove the encoding step (conf_filename = self.full_filename passes tests on Win/Python 3.6).

This reveals an exception on Windows:

self = <traitlets.config.loader.PyFileConfigLoader object at
0x0000019D8035AD68>

def _read_file_as_dict(self):
"""Load the config file into self.config, with recursive loading."""
def get_config():
"""Unnecessary now, but a deprecation warning is more trouble than it's
worth."""
return self.config

namespace = dict(
c=self.config,
load_subconfig=self.load_subconfig,
get_config=get_config,
__file__=self.full_filename,
)
fs_encoding = sys.getfilesystemencoding() or 'ascii'
>       conf_filename = self.full_filename.encode(fs_encoding)
E       UnicodeEncodeError: 'mbcs' codec can't encode characters in
position 0--1: invalid character
@minrk minrk merged commit d939e2b into ipython:master May 3, 2017
@vidartf vidartf deleted the unicode_encoding branch May 3, 2017 14:42
@minrk minrk mentioned this pull request May 3, 2017
@Carreau Carreau added this to the 5.0 milestone Jun 4, 2020
@Carreau Carreau added 5.0-re-review Need to re-review for potential API impact changes. 5.0-no-incidence change that has noincidence on 5.0 compat (eg: doc) and removed 5.0-re-review Need to re-review for potential API impact changes. labels Jun 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5.0-no-incidence change that has noincidence on 5.0 compat (eg: doc)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants