-
Notifications
You must be signed in to change notification settings - Fork 682
bug 1461350: Reimplement locale-prefixed URLs to mirror Django #4790
Conversation
Copy code from Django 1.8.19 which needs to be modified to work with Kuma language codes, legacy locale support, and 404s: * from django.conf.urls: - i18n_patterns * from django,core.urlresolvers: - LocaleRegexURLResolver (extended) * from django.middleware.locale: - LocaleMiddleware * from django.utils.translation.trans_real: - get_langugage - get_supported_language_variant - get_language_from_path - get_language_from_request No modifications yet, to make it easier to review changes in future Django versions.
Modify the Django 1.8.19 locale code to work with Kuma: * Return Kuma language codes, with overrides from settings.LOCALE_ALIASES, from get_supported_language_variant. * Don't check the session for a language override. * Don't add header Vary: Accept-Language * Handle alternate code paths unused in Kuma, by converting to asserts, or commenting and skipping code coverage.
Add middlewares for Kuma-specific locale features, provided in to LocaleURLMiddleware or elsewhere, to keep LocaleMiddleware close to Django's version. LangSelectorMiddleware handles locale redirects based on the ?lang query parameter. LocaleStandardizerMiddleware redirects to standardize local prefix cases, legacy locales, and overly specific locales, rather than return 404s.
kuma/wiki/tests/test_middleware.py
Outdated
|
||
With the Django 1.8-derived middleware, these are combined into a single | ||
302 temporary redirect, which may also help with retiring DocumentZones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops - this comment should have been removed. Rebase coming.
Replace LocaleURLMiddleware with LangSelectorMiddleware, LocaleStandardizerMiddleware, and Django's modified LocaleMiddleware. Simplify Kuma's reverse function, and drop code that supports the old locale framework. One large change is that urlpatterns are split into those with a language prefix (lang_urlpatterns), and those without (urlpatterns). The customized LocaleRegexURLResolver and i18n_patterns ensure that Kuma's language codes (such as en-US) are used instead of Django's language codes (such as en-us). Kuma's previous locale tools (reverse, Jinja's wiki_url) defaulted to no locale prefix. With the new framework, the default is the currently active locale, and it is retained in request.path_info. These changes require some adjustments, such as in the DocumentZone middleware and tests. Overly-specific locales in the Accept-Language header are processed differenty. Previously, this header: Accept-Language: fr-FR, de;q=0.5 would results in 'de' being selected as the first exact match. The code now uses Django's behaviour, where 'fr' is chosen as the generic locale match for the first parameter.
eb3654d
to
9f7ba92
Compare
kuma/core/i18n.py
Outdated
selecting a more generic variant. Raises LookupError if nothing found. | ||
|
||
If `strict` is False (the default), the function will look for an alternative | ||
country-specific variant when the currently checked is not found. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nit. This portion on strict
seems out of place now that it has been removed from the function signature.
|
||
# Check for known override | ||
if lang_code in settings.LOCALE_ALIASES: | ||
return settings.LOCALE_ALIASES[lang_code] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ I love that the LOCALE_ALIASES
are incorporated here. They can be handled in a much cleaner, more straightforward fashion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a much better fit than the "fix after the fact" method from the last PR.
kuma/core/i18n.py
Outdated
out the main language. | ||
|
||
If check_path is True, the URL path prefix will be checked for a language | ||
code, otherwise this is skipped for backwards compatibility. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nit. This check_path
portion seems out of place now that it has been removed from the function signature.
kuma/core/i18n.py
Outdated
if not language_code_re.search(accept_lang): # pragma: no cover | ||
# Check added with a security fix: | ||
# https://www.djangoproject.com/weblog/2007/oct/26/security-fix/ | ||
# It is unclear how to trigger this branch, so skipping coverage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice research!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. It seems to me that Django's parse_accept_lang_header
will only return values that would satisfy the language_code_re.search
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to replace it with an assertion, and we'll let the internet tell us what string we haven't thought of.
kuma/core/middleware.py
Outdated
query = dict((smart_str(k), v) for | ||
"""Redirect if ?lang query parameter is valid.""" | ||
query_lang = request.GET.get('lang') | ||
if query_lang not in dict(settings.LANGUAGES): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. I was thinking that since this is run for almost every request, and that most requests will not have a lang
query parameter, we could short circuit the need for a dictionary lookup with:
if not (query_lang and (query_lang in dict(settings.LANGUAGES))):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a few cases where we'd like a quick check of the supported Kuma language code, so I'll go a little farther down this path and avoid creating a dict
every time.
kuma/core/i18n.py
Outdated
possible_lang_codes.append(generic_lang_code) | ||
raw_supported_lang_codes = get_languages() | ||
supported_lang_codes = [kuma_language_code_to_django(lang) | ||
for lang in raw_supported_lang_codes] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. I was thinking that when we create the supported_lang_codes
list, it's too bad we lose the OrderedDict
(from get_languages()
) and its efficiency (for the following code in supported_lang_codes
check). What do you think of either one of the following instead?
raw_supported_lang_codes = get_languages()
supported_lang_codes = OrderedDict((kuma_language_code_to_django(lang), None)
for lang in raw_supported_lang_codes)
or, skipping the call to get_languages
altogether:
supported_lang_codes = OrderedDict((kuma_language_code_to_django(lang), None)
for lang, _ in settings.LANGUAGES)
with both options assuming, of course, that we add a from collections import OrderedDict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why an OrderedDict
would be more efficient to iterate over than a list.
I'm not as sensitive to efficiency here, because of the lru_cache
, but it appears we could use similar code in a few places, so it makes sense to calculate it once and skip get_languages
(or call it get_kuma_languages
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant to say is that the OrderedDict
is more efficient for the later code in supported_lang_codes
check within the loop. Good point about the lru_cache
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. We want the set-lookup behavior for testing inclusion, and the ordered-list behavior when iterating through the list. OrderedDict
does both.
kuma/core/middleware.py
Outdated
fixed_locale = settings.LOCALE_ALIASES[literal_from_path.lower()] | ||
elif not match and literal_from_path.startswith(language_from_path): | ||
# Language code is a specific locale (fr vs fr-FR) | ||
fixed_locale = language_from_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Nice, I verified that kuma/core/tests/test_locale_middleware.py
covers all three of these cases.
lang_code = request.COOKIES.get(settings.LANGUAGE_COOKIE_NAME) | ||
|
||
try: | ||
return get_supported_language_variant(lang_code) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This language code should always be mdn locale other than user modify the cookie!
I think we can only check if the locale is supported
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think it is a bad idea to avoid get_supported_language_variant
here.
lang_code
is None
if there is no request cookie, and get_supported_language_variant
will raise a LookupError
.
get_supported_language_variant
checks that the cookie-provided locale is supported, and other stuff. This is where we'll do any locale switching if we remove a previously supported language, or it will also raise a LookupError
and we'll fallback to the Accept-Language
header.
except LookupError: | ||
pass | ||
|
||
accept = request.META.get('HTTP_ACCEPT_LANGUAGE', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know if there are any test for the ordering of locale choice!
Like, first path locale, then cookie, then header!
maybe worth writing one if not exists!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a headless test, but probably not a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a few tests of each piece of this. I couldn't find an explicit test that the cookie beat the Accept-Language
header, so I updated test_locale_middleware_language_cookie
to include a header.
return lang_code | ||
|
||
lang_code = request.COOKIES.get(settings.LANGUAGE_COOKIE_NAME) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be inside a if block.
if lang_code:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear which line you are saying should be in an if block. I've tried to determine a line that should be in an if block, and none seem appropriate to me.
kuma/core/i18n.py
Outdated
# Django supports a case when LANGUAGE_CODE is not in LANGUAGES | ||
# (see https://github.com/django/django/pull/824). but our LANGUAGE_CODE is | ||
# always in LANGUAGES. | ||
assert settings.LANGUAGE_CODE == settings.LANGUAGES[0][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont know if we should check it in every request. maybe add a test where its checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indirectly checked here:
I think it makes sense to add a test and remove the assertion here.
kuma/wiki/middleware.py
Outdated
# Skip slugs that don't have locales, and won't be in a zone | ||
request_slug = request.path_info.lstrip('/') | ||
if any(request_slug.startswith(slug) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. It seems more efficient to move lines 46-48 immediately after the definition of request_slug
, and then start the maybe_lang
and path
stuff.
kuma/wiki/middleware.py
Outdated
path = request.path_info | ||
request_slug = path.lstrip('/') | ||
maybe_lang = request_slug.split(u'/')[0] | ||
if maybe_lang in settings.ENABLED_LOCALES: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. Since the code is run for almost every request, it'd be nice to make this more efficient by using a set
or frozenset
that we create once and use thereafter, so perhaps adding something like ENABLED_LOCALES_SET = frozenset(ENABLED_LOCALES)
within kuma/settings/common.py
and then using that here?
kuma/wiki/urls.py
Outdated
@@ -29,6 +29,9 @@ | |||
url(r'^\$files$', | |||
edit_attachment, | |||
name='attachments.edit_attachment'), | |||
url(r'^\$edit/(?P<revision_id>\d+)$', | |||
views.edit.edit, | |||
name='wiki.new_revision_based_on'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this wiki.new_revision_based_on
url made its way back into our lives, but it needs to go! 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
language_from_path = get_language_from_path(request.path_info) | ||
if response.status_code == 404 and not language_from_path: | ||
language_path = '/%s%s' % (language, request.path_info) | ||
path_valid = is_valid_path(request, language_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fails to redirect locale-free zone URL's (e.g., http://localhost:8000/Firefox
gets a 404 instead of redirecting to the active locale set from get_language_from_request
on the way in), which causes a number of the tests within tests/headless/test_cdn.py
to fail. It seems that is_valid_path
has to consider zones as well? 🤦♂️ I'm tempted to say we should take this opportunity to dump zones now, but that probably is a bridge too far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which tests fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, all of the zone-related slugs (all that start with /Firefox
) with the test_locale_selection_cached
and test_locale_selection_not_cached
tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted my actual results in the main thread below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize the results and discussion below:
- Tests against /es/Firefox and /fr/Firefox failed because only the English page is in the sample database. This code returns a 404, where the previous code returned a 302. This is probably an improvement, but means those tests should be skipped or marked
XFAIL
in local development, until the sample database is updated. - The test for
?lang=de
succeeded with a 302, even though the page is not in the sample database. The quick 404 would be nice, but it isn't a regression, so I'm not considering a change for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jwhitlock. Sorry for my confusion around zones yesterday. I realized it later last night. I don't think we have to change anything for the ?lang=de
case, even in the future. I like the simplicity of the LangSelectorMiddleware
as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice. I really like this approach. The nits you can do with as you please, and the re-emergence of the wiki.new_revision_based_on
endpoint is an easy fix. However, just when I was thinking we were home free, I noticed that some of the headless CDN tests were failing when run against my local dev instance. I think all of the failures are due to the fact that is_valid_path
does not take zones into account (oh man, do I hate zones!).
Here is the result of the headless tests run against my local dev instance (ignore
|
@escattone see if http://localhost:8000/es/Firefox is in your locale dev environment, and if it isn't, |
@jwhitlock You're right on. All of the headless tests passed after I did the following: docker-compose exec web ./manage.py scrape_document https://developer.mozilla.org/es/Firefox
docker-compose exec web ./manage.py scrape_document https://developer.mozilla.org/fr/Firefox |
That's good news! So it seems all that's needed is to |
Also: the |
Update the docstrings for locale functions taken from Django to more accurately describe the behaviour and where it deviates from Django's behaviour.
This was removed in PR mdn#4769, and accidently re-added when copying from an earlier version of this pull request.
In the DocumentZoneMiddleware, skip non-locale paths (such as health checks) as early as possible, before the string manipulation for zoned URL manipulations.
Codecov Report
@@ Coverage Diff @@
## master #4790 +/- ##
==========================================
+ Coverage 95.81% 95.84% +0.02%
==========================================
Files 270 270
Lines 24489 24560 +71
Branches 1748 1754 +6
==========================================
+ Hits 23465 23539 +74
+ Misses 810 808 -2
+ Partials 214 213 -1
Continue to review full report at Codecov.
|
Use settings.LANGUAGES order to pick region-specific locale prefixes, mirroring Django's algorithm. The first region-specific locale prefix is chosen for the generic language, such as zh-CN for zh, and en-US for en. This is identical to the previous order with the notable exception of pt-PT, not pt-BR, being used for pt (Portuguese). This affects other settings: * ACCEPTED_LOCALES is in resolution order, and includes any candidate locales. * ENABLED_LOCALES is also in resolution order * FIRST_LOCALE is removed * LOCALE_ALIASES drops locales that can be derived from resolution order. Only the Chinese customizations (cn, zh-Hans) remain. * SORTED_LANGUAGES (new) contains the locales in preferred order for display and for forms (en-US, then the others in alphabetical order) Consistency checks for locale settings are in the kuma.core.tests.test_settings, avoiding runtime assertions. New functions kuma.core.i18n.get_django_languages and .get_kuma_languages, modeled after Django's get_languages(), return cached OrderedDicts of LANGUAGES, either with Django (en-us) or Kuma (en-US) language codes. These are efficient for both set (lang_code in get_kuma_languages()) and list (for lang_code in get_kuma_languages()) operations, and are now used in a few places.
Update test_locale_middleware_language_cookie to include an Accept-Language header, which is overridden by the cookie.
Previously, a no-locale prefix would get a redirect, whether or not the next link was a 404. Now, the LocaleMiddleware will 404 if the destination with a locale prefix is not valid. This affects the tests for /fr/Firefox and /es/Firefox, which are not yet in the sample database.
I believe I've addressed the review comments. I made a significant change to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice adjustments as well as all of the painstaking work that you've put into this! This is really nicely done. Thanks @jwhitlock!
('sr', 'sr-Latn'), | ||
('zh-CN', 'zh-TW'), | ||
)) | ||
def test_preferred_locale_codes(primary, secondary): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Really nice idea to formalize in a test!
from django.conf import settings | ||
|
||
|
||
def test_accepted_locales(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another great test addition, thanks!
This replaces the
LocaleURLMiddleware
("Based on zamboni.amo.middleware") with one based on Django'sLocaleMiddleware
, plus additional middlewares for Kuma-specific functionality. This replaces PR #4766, which was the work-in-progress version of this code.The Django code is from release 1.8.19, and has several customizations to handle Kuma features like mixed-case locale codes (
en-US
vs Django'sen-us
). It also removes some features we don't use, such as storing a language preference in the session (adding aVary: cookie
to all responses), and some flexibility needed in the Django framework code.The widest code change is the splitting of URL patterns into locale-prefixed patterns and no-locale patterns. In apps, these are grouped into
lang_urlpatterns
andurlpatterns
, and they are applied inkuma/urls.py
with a customizedi18n_patterns
. The patterns now include the active locale, and the active locale needs to be temporarily changed (with language.override(locale)
) to generate URLs to other locales. The default URLs contain the locale, so Kuma is less likely to generate "generic" URLs.Another subtle change is the handling of region-specific languages in the
Accept-Language
header. Previously, a header offr-FR; de=0.5
would selectde
, the first exact match. After this change, the Django-based algorithm picksfr
, the general locale variant offr-FR
. More discussion is at #4766 (comment).There is some known follow-on work that is not included, because this monster is scary enough:
reverse
to drop support for legacy parameters.reverse
to a standard pattern argument.