Don't use default locale when lowercasing #820

Merged
merged 2 commits into from Jun 11, 2017

@cketti (Contributor) commented Jan 31, 2017

String.toLowerCase() uses the default locale for the case conversion. This can lead to undesired results when using the Turkish locale (see the blog post linked in issue #256).
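As a minimal sketch of the problem described above (not code from this pull request), the following compares lowercasing under a Turkish default locale with lowercasing under a fixed locale. The class name `LocaleLowercaseDemo` and the helpers `lowerCase` and `lowerCaseWithDefault` are hypothetical names chosen for illustration, and `Locale.ENGLISH` is an assumed choice of fixed locale.

```java
import java.util.Locale;

class LocaleLowercaseDemo {
    // Hypothetical helper: lowercases with a fixed locale, so the result
    // does not depend on the JVM's default locale.
    static String lowerCase(String input) {
        return input.toLowerCase(Locale.ENGLISH);
    }

    // Lowercases with the no-argument toLowerCase() while the given locale
    // is temporarily installed as the default, then restores the old default.
    static String lowerCaseWithDefault(String input, Locale defaultLocale) {
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(defaultLocale);
            return input.toLowerCase(); // locale-sensitive call
        } finally {
            Locale.setDefault(original);
        }
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Under a Turkish default, 'I' maps to the dotless 'ı', so this
        // comparison against the ASCII string "title" does not hold.
        System.out.println(lowerCaseWithDefault("TITLE", turkish).equals("title"));
        // With an explicit locale the result matches regardless of the default.
        System.out.println(lowerCase("TITLE").equals("title"));
    }
}
```

Routing every normalization through one helper like this keeps the locale decision in a single place instead of at each call site.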

This pull request adds tests to make sure jsoup's behavior is independent of the default locale. MultiLocaleRule runs every test annotated with @MultiLocaleTest twice: once with Locale.ENGLISH as the default locale and once with Turkish.
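The idea of running the same check under two default locales can be sketched in plain Java, without JUnit. This is not the `MultiLocaleRule` from this pull request, which is a JUnit rule; the class `MultiLocaleRunner` and the method `failingLocales` below are hypothetical and only illustrate the set-default, run, restore pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class MultiLocaleRunner {
    // The two default locales to exercise: English and Turkish.
    static final Locale[] LOCALES = { Locale.ENGLISH, Locale.forLanguageTag("tr") };

    // Runs the check once per locale and returns the locales under which it threw.
    static List<Locale> failingLocales(Runnable check) {
        List<Locale> failures = new ArrayList<>();
        Locale original = Locale.getDefault();
        try {
            for (Locale locale : LOCALES) {
                Locale.setDefault(locale);
                try {
                    check.run();
                } catch (RuntimeException | AssertionError e) {
                    failures.add(locale);
                }
            }
        } finally {
            // Always restore the default so later code is unaffected.
            Locale.setDefault(original);
        }
        return failures;
    }

    public static void main(String[] args) {
        // A locale-sensitive check: passes under English, throws under Turkish.
        List<Locale> failures = failingLocales(() -> {
            if (!"ID".toLowerCase().equals("id")) {
                throw new IllegalStateException("lowercasing depends on the default locale");
            }
        });
        System.out.println(failures);
    }
}
```

Restoring the original default in a `finally` block matters because the default locale is global JVM state; a check that leaks a Turkish default would change the behavior of every later test.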

This does not fix content matching issues with non-ASCII data, e.g. the one described in issue #474. However, such matching now fails consistently, regardless of the default locale.

Fixes #256

When the default locale is set to Turkish, "I".toLowerCase() returns "ı",
the dotless I. The method toLowerCase() is used throughout the code to
normalize values. But none of these should be locale-sensitive. That's why
right now all the added tests are failing.
@jhy jhy merged commit 7ba0ee7 into jhy:master Jun 11, 2017

@jhy (Owner) commented Jun 11, 2017

Thanks! Looks great.

@jhy (Owner) commented Jun 12, 2017

BTW @cketti I really appreciate the thoroughness of this implementation, particularly the testing, and the cleanliness of the static import. Thanks again.

Successfully merging this pull request may close these issues.

node.attr & node.hasAttr lowercase without Locale