New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing of mixed-cased locale ID. #361

Merged
merged 5 commits into from Mar 3, 2016

Conversation

Projects
None yet
5 participants
@kdeldycke
Contributor

kdeldycke commented Mar 2, 2016

Following #351, it seems that mixed-cased locale IDs are not properly checked as existing depending on the platform.

The issue lies in babel.localedata.exists() which rely on filesystem assets. As a result, mixed-cased locale IDs are properly parsed on OSX (case-insensitive filesystem) but not on Linux.

The first commit demonstrate the issue depending on the system.
The second commit ensure that locale IDs recognized by Babel are uniques, even if lower-cased.
The third commit fix to the issue.

@kdeldycke

This comment has been minimized.

Contributor

kdeldycke commented Mar 2, 2016

Looks like this issue relates to #283.

kdeldycke added some commits Mar 2, 2016

@codecov-io

This comment has been minimized.

codecov-io commented Mar 2, 2016

Current coverage is 89.68%

Merging #361 into master will increase coverage by +0.02% as of bb16d60

@@            master    #361   diff @@
======================================
  Files           23      23       
  Stmts         3794    3801     +7
  Branches         0       0       
  Methods          0       0       
======================================
+ Hit           3402    3409     +7
  Partial          0       0       
  Missed         392     392       

Review entire Coverage Diff as of bb16d60

Powered by Codecov. Updated on successful CI builds.

@@ -24,15 +25,22 @@
_dirname = os.path.join(os.path.dirname(__file__), 'locale-data')
def normalize_locale(name):
"""Return a normalized locale ID or `None` if the ID is not recognized."""

This comment has been minimized.

@akx

akx Mar 3, 2016

Member

Could you explain what "normalization" means in the docstring?

This comment has been minimized.

@kdeldycke

kdeldycke Mar 3, 2016

Contributor

Sure. Done in kdeldycke@a5cb831 .

if name in _cache:
return True
return os.path.exists(os.path.join(_dirname, '%s.dat' % name))
return True if normalize_locale(name) else False

This comment has been minimized.

@akx

akx Mar 3, 2016

Member

return bool(normalize_locale(name)) :)

This comment has been minimized.

@akx

akx Mar 3, 2016

Member

To be honest, I'd like to see this normalization thing as a "slow path", if the previous os.path.exists path fails.

This comment has been minimized.

@kdeldycke
@akx

This comment has been minimized.

Member

akx commented Mar 3, 2016

Nice, thanks @kdeldycke!

(CI: I'm not waiting for Appveyor, since most builds completed and some failed due to ephemeral errors.)

akx added a commit that referenced this pull request Mar 3, 2016

Merge pull request #361 from kdeldycke/mixed_cased_locale_id
Parsing of mixed-cased locale ID.

@akx akx merged commit 85239b2 into python-babel:master Mar 3, 2016

3 of 6 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
review/gitmate/commit GitMate is currently running the analysis for this commit.
Details
review/gitmate/manual This commit needs review.
Details
codecov/patch 100.00% of diff hit (target 80.00%)
Details
codecov/project 89.68% (+0.02%) compared to ee0e0c1 at 89.66%
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@kdeldycke

This comment has been minimized.

Contributor

kdeldycke commented Mar 3, 2016

Thanks @akx for the quick merge! 😃

@kdeldycke kdeldycke deleted the kdeldycke:mixed_cased_locale_id branch Mar 3, 2016

if name == locale_id.lower():
return locale_id
def exists(name):

This comment has been minimized.

@jtwang

jtwang Mar 3, 2016

Contributor

Wow, greedy method.
I'm really looking forward to #305 !

(just an observation, please ignore :)

:param name: the locale identifier string
"""
if name in _cache:
return True
return os.path.exists(os.path.join(_dirname, '%s.dat' % name))
file_found = os.path.exists(os.path.join(_dirname, '%s.dat' % name))
return True if file_found else bool(normalize_locale(name))

This comment has been minimized.

@jtwang

jtwang Mar 3, 2016

Contributor

Would iteration over in-mem objects would be faster than disk access? If so, please consider performing the in-memory check first. :)

(super duper minor optimization)

This comment has been minimized.

@akx

akx Mar 3, 2016

Member

That'd complicate the whole deal unnecessarily, imho. It'd become something like

  • (1) name in cache?
  • (2) normalized name in cache?
  • (3) name in file system?
  • (4) normalized name in file system?

which would kinda interleave normalization with this "simple" existence checking...

all_ids = localedata.locale_identifiers()
assert len(all_ids) == len(set(all_ids))
# Check locale IDs don't collide after lower-case normalization.
lower_case_ids = list(map(methodcaller('lower'), all_ids))

This comment has been minimized.

@jtwang

jtwang Mar 3, 2016

Contributor

IMO

lower_case_ids = [id.lower() for id in all_ids] is more readable, but don't care much.

This comment has been minimized.

@akx

akx Mar 3, 2016

Member

Yeah, it would be, and more Pythonic, but that's kinda minor, and can be trivially corrected later if need be.

def test_mixedcased_locale():
for l in localedata.locale_identifiers():

This comment has been minimized.

@jtwang

jtwang Mar 3, 2016

Contributor

Is it necessary to iterate through all ids? Would it be sufficient to spot check some deliberately selected hardcoded values? Eg. 'Fi', 'FI_fi', 'fi_FI', 'fI_fI', 'ZH_hanT_HK'

This comment has been minimized.

@akx

akx Mar 3, 2016

Member

That's another optimization that could be done (via pytest.parametrize), but I don't think our test suite takes too long to run as it is :)

This comment has been minimized.

@jtwang

jtwang Mar 3, 2016

Contributor

True that. I just get the heebie jeebies when I see random.choice in unit tests, ha.

@jtwang

This comment has been minimized.

Contributor

jtwang commented Mar 3, 2016

Ah, crud, didn't notice this was already merged into master. Feel free to ignore all my comments, none of them are major. :)

@akx

This comment has been minimized.

Member

akx commented Mar 3, 2016

All comments were duly noted and processed, don't fret :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment