Parsing of mixed-cased locale ID. #361
Looks like this issue relates to #283.
@@ -24,15 +25,22 @@
_dirname = os.path.join(os.path.dirname(__file__), 'locale-data')


def normalize_locale(name):
    """Return a normalized locale ID or `None` if the ID is not recognized."""
akx
Mar 3, 2016
Member
Could you explain what "normalization" means in the docstring?
kdeldycke
Mar 3, 2016
Author
Contributor
    if name in _cache:
        return True
    return os.path.exists(os.path.join(_dirname, '%s.dat' % name))
    return True if normalize_locale(name) else False
akx
Mar 3, 2016
Member
return bool(normalize_locale(name))
:)
akx
Mar 3, 2016
Member
To be honest, I'd like to see this normalization thing as a "slow path", if the previous os.path.exists path fails.
kdeldycke
Mar 3, 2016
Author
Contributor
Nice, thanks @kdeldycke! (CI: I'm not waiting for Appveyor, since most builds completed and some failed due to ephemeral errors.)
85239b2
into
python-babel:master
Thanks @akx for the quick merge!
        if name == locale_id.lower():
            return locale_id


def exists(name):
jtwang
Mar 3, 2016
Contributor
Wow, greedy method.
I'm really looking forward to #305 !
(just an observation, please ignore :)
    :param name: the locale identifier string
    """
    if name in _cache:
        return True
    return os.path.exists(os.path.join(_dirname, '%s.dat' % name))
    file_found = os.path.exists(os.path.join(_dirname, '%s.dat' % name))
    return True if file_found else bool(normalize_locale(name))
jtwang
Mar 3, 2016
Contributor
Would iteration over in-mem objects be faster than disk access? If so, please consider performing the in-memory check first. :)
(super duper minor optimization)
akx
Mar 3, 2016
Member
That'd complicate the whole deal unnecessarily, imho. It'd become something like
- (1) name in cache?
- (2) normalized name in cache?
- (3) name in file system?
- (4) normalized name in file system?
which would kinda interleave normalization with this "simple" existence checking...
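For reference, the lookup order in the diff above (exact name first, normalization only as a fallback) can be sketched roughly as follows. This is a minimal illustration, not Babel's actual code: `_files_on_disk` is a hypothetical in-memory stand-in for the `locale-data` directory, and the locale IDs are sample values.

```python
# Hypothetical stand-ins (assumptions): a set of file stems replaces the real
# locale-data directory, and the cache starts empty.
_cache = {}
_files_on_disk = {"en_US", "fi_FI", "zh_Hant_HK"}


def normalize_locale(name):
    """Return the known ID matching `name` case-insensitively, or None."""
    lowered = name.strip().lower()
    for locale_id in sorted(_files_on_disk):
        if lowered == locale_id.lower():
            return locale_id
    return None


def exists(name):
    # Fast path: exact hit in the cache or on the (simulated) file system.
    if name in _cache or name in _files_on_disk:
        return True
    # Slow path: scan every known ID for a case-insensitive match.
    return bool(normalize_locale(name))
```

With this ordering the exact-match case never pays for the scan, and the existence check stays a single fallback call rather than four interleaved steps.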
    all_ids = localedata.locale_identifiers()
    assert len(all_ids) == len(set(all_ids))
    # Check locale IDs don't collide after lower-case normalization.
    lower_case_ids = list(map(methodcaller('lower'), all_ids))
jtwang
Mar 3, 2016
Contributor
IMO
lower_case_ids = [id.lower() for id in all_ids]
is more readable, but don't care much.
akx
Mar 3, 2016
Member
Yeah, it would be, and more Pythonic, but that's kinda minor, and can be trivially corrected later if need be.


def test_mixedcased_locale():
    for l in localedata.locale_identifiers():
jtwang
Mar 3, 2016
Contributor
Is it necessary to iterate through all ids? Would it be sufficient to spot check some deliberately selected hardcoded values? Eg. 'Fi', 'FI_fi', 'fi_FI', 'fI_fI', 'ZH_hanT_HK'
akx
Mar 3, 2016
Member
That's another optimization that could be done (via pytest.mark.parametrize), but I don't think our test suite takes too long to run as it is :)
jtwang
Mar 3, 2016
Contributor
True that. I just get the heebie jeebies when I see random.choice in unit tests, ha.
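The deterministic spot check suggested in this thread could look something like the sketch below. It is only an illustration: `KNOWN_IDS` and `normalize` are hypothetical stand-ins for Babel's locale list and `normalize_locale`, and the expected values assume the canonical IDs are `fi`, `fi_FI` and `zh_Hant_HK`.

```python
# Hypothetical subset of recognized locale IDs (assumption for this sketch).
KNOWN_IDS = ["fi", "fi_FI", "zh_Hant_HK"]


def normalize(name):
    """Case-insensitive lookup against KNOWN_IDS; None when nothing matches."""
    lowered = name.lower()
    for locale_id in KNOWN_IDS:
        if lowered == locale_id.lower():
            return locale_id
    return None


# Hardcoded (input, expected) pairs instead of randomly generated casings.
CASES = [
    ("Fi", "fi"),
    ("FI_fi", "fi_FI"),
    ("fi_FI", "fi_FI"),
    ("fI_fI", "fi_FI"),
    ("ZH_hanT_HK", "zh_Hant_HK"),
]


def failing_cases():
    """Return the (input, actual) pairs whose result differs from the expectation."""
    return [(raw, normalize(raw)) for raw, expected in CASES
            if normalize(raw) != expected]
```

The same table could be fed to pytest.mark.parametrize so that each pair is reported as its own test.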
Ah, crud, didn't notice this was already merged into master. Feel free to ignore all my comments, none of them are major. :)
All comments were duly noted and processed, don't fret :D
Following #351, it seems that mixed-cased locale IDs are not reliably detected as existing, depending on the platform.
The issue lies in babel.localedata.exists(), which relies on filesystem assets. As a result, mixed-cased locale IDs are properly parsed on OSX (case-insensitive filesystem) but not on Linux.
The first commit demonstrates the issue, which shows up or not depending on the system.
The second commit ensures that the locale IDs recognized by Babel are unique, even when lower-cased.
The third commit fixes the issue.
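The uniqueness guarantee from the second commit is what makes case-insensitive matching safe: if two IDs differed only by case, a lower-cased lookup would be ambiguous. A minimal sketch of such a check, assuming a hypothetical helper name `find_case_collisions` and sample IDs rather than Babel's real list:

```python
from collections import Counter


def find_case_collisions(ids):
    """Return the lower-cased IDs that more than one entry in `ids` maps to."""
    counts = Counter(locale_id.lower() for locale_id in ids)
    return sorted(key for key, count in counts.items() if count > 1)


# Sample data only: the second list contains a deliberate collision.
clean = ["en_US", "fi_FI", "zh_Hant_HK"]
clashing = ["en_US", "EN_us", "fi_FI"]
```

An empty result for the full list of recognized IDs means each lower-cased name maps back to exactly one canonical ID.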