Skip to content

Commit

Permalink
Switch Bedrock to using Django's own i18n mechanism (#14327)
Browse files Browse the repository at this point in the history
* [Breaking] Drop our custom locale middleware and switch to Django's

* [Breaking] Allow mixed-case lang codes as settings.LANGUAGES and related lang-code lookups

* [Breaking] Use i18n_patterns to lang-code prefix relevant paths in mozorg.urls

This includes splitting mozorg.urls into two groups so we only use
i18n_patterns with paths that need it. We've kept the main, localed, group's
name to be urls.py to reduce the chance of merge conflicts, as this piece
of work is likely to need rebasing against main quite a lot until it's ready.

The decision about which get prefixed and which do not is based on the
settings SUPPORTED_NONLOCALES and SUPPORTED_LOCALE_IGNORE that were used
in our former i18n machinery. At the time of committing, they still exist,
but both of should be able to be removed once this work is done (or at least
moved to power some tests)

* [Breaking] We now have /en-US/ resolving/rendering for the homepage, but not yet /fr/

We use a prefixer for our i18n_urlpatterns so that when we reverse() and
resolve() we allow for a custom prefix that is normalized to our mixed-case
lang codes.

Tests are needed.

* Add middleware to fix up the lang code in the path

This essentially replaces urlresolves.Prefixer.fix()

Tests are needed.

* Lay in custom middleware that lightly wraps LocaleMiddleware to normalize lang codes

* Switch to using django.utils.translation to track the active locale

* Ensure the request is annotated with the relevant lang_code, so that it's
available in templates

* Refactor base test class with simpler locale-activation logic using django core i18n

* Minor fix to modern way to call pytest

* Fix serving of sitemap_none.xml to be without locale

* Small-as-possible changes to get test suite running/not erroring

* Fixup VPN Resource Centre tests - enabling a non-real locale was trickiest part

* Update tests for accepted locales

Note that we no longer have a lowercase-to-mixed-case lang lookup, so this
test changes but might become redundant later in the refactor

* Improve naming in helper

* [Breaking] Add placeholder failing tests

* Extend Django's support for get_language_info to include our three-character lang names, which it doesn't have by default

This is a part of ensuring three-letter country codes still work, but it might
be redundant - TBC

* Use Django's bidi-lang-checker, not ours

* Drop unnecessary work in custom locale middleware

This was useful before we started using i18n_patterns, but now not all of it
is needed. Some still is, though

* Fixup redirects behaviour (using /webvision/ as an example case)

* Redirects don't need url prefixing via i18n_patterns, so we're moving them to
nonlocale_urls instead.
* When detecting a language, en-CA was being picked over en-US, which is
avoided by ensuring en-US is at the start of the languages list
* Note that for the webvision redirect we needed to add:
   * locale_prefix=True
   * prepend_locale=False

* Amend VPNRC tests - tweaking where appropriate; also dropped a fiddly and unrealistic test

* Expand test coverage of redirects in bedrock.mozorg

* Fix need for fallback to settings.LANGUAGE_CODE if no language can be detected/retrieved

* Reinstate logic to handle fallback to appropriate locale based on Accept-Language header

* Move mozorg redirect() calls to redirects.py to avoid the routes having lang codes prefixed to them

* Fixup l10n_utils.render behaviour via minimal tweaks to tests

* Add regression-check tests for utility views

* Improve normalize_locale via tests, including tests for Pocket Mode

* Add tests for i18n helpers

* Improve language normalization + tests

* Move careers redirects to own file, so that they remain lang-code agnostic

* Move foundation redirect out of urls.py so that it remains language agnostic

* Move redirects out of urls.py so that they remain language agnostic - tweak _one_ test related to this

* Avoid unnecessarily adding lang code to a reversed() url in our redirector - we don't need it unless we explicitly ask for it

* Fix lang codes for CANONICAL_LANGS for mozorg mode

* Tidy up tests

* Fixing tests with settings=DEV

* Contentful-related tests needed the dev-urls config wrapping in URLpatterns
to make it work as per the real site; if we just specced dev_urls they did not
work because they were not i18n-patterned

* Default a locale to '' not None - the latter causes Locale page to blow up and the former matches original behaviour better

* Ensure no-JS language changing works

Fixes #5931
See #5931 for context

* Formatting fix after rebasing on main

* Rename our get_language helper to make it clearer what it does

* Monkeypatch Django's check_for_language() because it looks in the wrong place for us

By default check_for_language() seeks gettext-compatible files in locale/ directories
in the project, but we don't have those because our l10n is handled using Fluent
strings in the www-l10n-team directory, cloned from their git repo.

So, instead, we just perform a light check to see whether a given lang is among the
langs that shold be available. The actual decision about whether we have have (enough)
Fluent strings available for that locale happens in lib.l10n_utils.render()

* Expand test coverage of our custom prefixer

* Bring bedrock.base.i18n up to 100% coverage

* Add tests for language-code fixup + slight refactor to ensure it doesn't to _too_ much

* Fix failing startup check for Pocket mode that blocked everything running

* Drop unnecessary-after-all patching of LANG_INFO - we found a different route to making check_for_language work; this was the wrong direction

* Add a little extra logging to the language-fixup redirects we're doing

* Minor test polishing

* Support special-case showing of a locale list at / if no accept-lang header is present

This changeset ensures we retain the existing behaviour where a robot/spider
going to / with no accept-language header gets a 200 OK response but with a
locale-picker page, NOT the homepage.

The fix here was to avoid running Django's LocaleMiddleware when (and only
when) we detect the special case of a request for the root path and no
Accept-Language header - which is basically a call for the root page with
no way to determine the client's locale. (We don't use sessions in Bedrock
so Django will never find a language code in a session anyway)

* Improve naming to avoid confusion between polish (verb) and Polish in split_path_and_polish_lang

* Improve in-code docs for BedrockLangCodeFixupMiddleware

* Test tweak after rebasing on main (which now has Django 4.2)
  • Loading branch information
stevejalim committed Apr 17, 2024
1 parent b54650a commit 66ce747
Show file tree
Hide file tree
Showing 40 changed files with 1,178 additions and 545 deletions.
165 changes: 165 additions & 0 deletions bedrock/base/i18n.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,165 @@
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.

import django.urls
from django.conf import settings
from django.utils.text import capfirst
from django.utils.translation.trans_real import parse_accept_lang_header

from lib.l10n_utils import translation


class LocalePrefixPattern(django.urls.LocalePrefixPattern):
"""
A sub-class of Django's "LocalePrefixPattern" that simply normalizes the language
prefix for Bedrock (e.g., upper-case country codes).
This is an essential piece, since it controls the resolution of the
language-code prefix of incoming URLs as well as the language code prefix
of the URLs generated by Django's "reverse".
"""

@property
def language_prefix(self):
language_code = normalize_language(translation.get_language()) or settings.LANGUAGE_CODE
if language_code == settings.LANGUAGE_CODE and not self.prefix_default_language:
return ""
return f"{language_code}/"


def path_needs_lang_code(path):
"""Convenient way to spot if a path needs a prefix or not"""
if path in settings.SUPPORTED_LOCALE_IGNORE:
return False

if path.lstrip("/").split("/")[0] in settings.SUPPORTED_NONLOCALES:
return False

return True


def bedrock_i18n_patterns(*urls, prefix_default_language=True):
"""
Similar to Django's i18n_patterns but uses our "LocalePrefixPattern", defined above
"""
if not settings.USE_I18N:
return list(urls)
return [
django.urls.URLResolver(
LocalePrefixPattern(prefix_default_language=prefix_default_language),
list(urls),
)
]


def normalize_language(language):
"""
Given a language code, returns the language code supported by Bedrock
in the proper case, for example "eN-us" --> "en-US" - or None if
Bedrock doesn't support the language code.
We also convert it to a specified fallback language if the actual
presented code is not available _and_ we have a fallback specced in
settings.
"""

if not language:
return None

if settings.IS_POCKET_MODE:
lang_code = language.lower()
else:
lang_code = language

if lang_code in settings.LANGUAGE_URL_MAP_WITH_FALLBACKS:
return settings.LANGUAGE_URL_MAP_WITH_FALLBACKS[lang_code]

# Reformat the lang code to be mixed-case, as we expect
# them to be for our lookup
lang, _partition, classifier = language.partition("-")
lang = lang.lower() # this part is _always_ lowercase

_special_cases = ["hant"]

if classifier:
if "-" in classifier or classifier in _special_cases:
_classifier, _partition, rest = classifier.partition("-")
if _classifier.lower() in _special_cases:
# support zh-hant-tw -> zh-Hant-TW
_classifier = capfirst(_classifier)
if rest:
classifier = f"{_classifier}-{rest.upper()}"
else:
classifier = _classifier
else:
# Support patterns like ja-JP-mac
classifier = f"{_classifier.upper()}-{rest}"
else:
classifier = classifier.upper()

lang_code = f"{lang}-{classifier}"
else:
lang_code = lang

try:
return settings.LANGUAGE_URL_MAP_WITH_FALLBACKS[lang_code]
except KeyError:
pre = lang_code.split("-")[0]
return settings.LANGUAGE_URL_MAP_WITH_FALLBACKS.get(pre)


def split_path_and_normalize_language(path_):
"""
Split the requested path into (lang, path) and
switches to a supported lang if available
locale will be empty if it isn't found.
Returns:
- lang code (str, may be empty)
- remaining URL path (str)
- whether or not the lang code changed (bool)
"""
path = path_.lstrip("/")

# Use partition instead of split since it always returns 3 parts
extracted_lang, _, rest = path.partition("/")

supported_lang = normalize_language(extracted_lang)
different = extracted_lang != supported_lang

if extracted_lang and supported_lang:
return supported_lang, rest, different
else:
return "", path, False


def get_language_from_headers(request):
"""
Return a locale code we support on the site using the
user's Accept-Language header to determine which is best. This
mostly follows the RFCs but read bug 439568 for details.
"""
accept_lang = request.headers.get("Accept-Language")
if accept_lang:
best = get_best_language(accept_lang)
if best:
return best

return settings.LANGUAGE_CODE


def get_best_language(accept_lang):
"""Given an Accept-Language header, return the best-matching language."""
ranked = parse_accept_lang_header(accept_lang)
for lang, _ in ranked:
supported = normalize_language(lang)
if supported:
return supported


def check_for_bedrock_language(lang_code):
return lang_code in [x[0] for x in settings.LANGUAGES]
180 changes: 160 additions & 20 deletions bedrock/base/middleware.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,48 +10,188 @@
"""

import base64
import contextlib
import inspect
import time
from functools import wraps

from django.conf import settings
from django.core.exceptions import MiddlewareNotUsed
from django.http import Http404, HttpResponse
from django.http import Http404, HttpResponse, HttpResponseRedirect
from django.middleware.locale import LocaleMiddleware as DjangoLocaleMiddleware
from django.utils import translation
from django.utils.deprecation import MiddlewareMixin
from django.utils.translation import trans_real

from commonware.middleware import FrameOptionsHeader as OldFrameOptionsHeader

from bedrock.base import metrics
from lib.l10n_utils import translation
from bedrock.base.i18n import (
check_for_bedrock_language,
get_language_from_headers,
normalize_language,
path_needs_lang_code,
split_path_and_normalize_language,
)
from lib.l10n_utils import is_root_path_with_no_language_clues

from . import urlresolvers

class BedrockLangCodeFixupMiddleware(MiddlewareMixin):
"""Middleware focused on prepping a viable, Bedrock-compatible language code
in the URL, ready for the rest of the i18n logic.
It:
1) Redirects to a new language-coded path if we detect the lang=XX querystring.
This is basically a no-JS way for us to handle language switching.
2) Normalises language codes that are in the path - eg en-us -> en-US and also
goes to a prefix code if we don't have support (eg de-AT -> de)
3) If no redirect is needed, sets request.locale to be the normalized
lang code we've got from the URL
Querystrings are preserved in GET redirects.
class LocaleURLMiddleware:
"""
This middleware adjusts the `path_info` for reverse URL resolving.

We split `request.path_info` into `locale` and `path`. The `path` portion
is saved back to `request.path_info` for reverse URL resolving, while the
`locale` will either be one we support or empty string.
def _redirect(self, request, lang_code, subpath):
dest = f"/{lang_code}/{subpath}"

# log the redirect without querystrings, in case there's a token or similar we don't want to end up in Markus
metrics.incr("bedrock.langfixup.redirect", tags=[f"from:{request.path}", f"to:{dest}"])

if request.GET:
dest += f"?{request.GET.urlencode()}"

return HttpResponseRedirect(dest)

def process_request(self, request):
# Initially see if we can separate a lang code from the rest of the URL
# path, along with a hint about whether the lang code changed, which
# suggests we will need to redirect using `lang_code` as a new prefix.

lang_code, subpath, lang_code_changed = split_path_and_normalize_language(request.path)

# Check for any no-JS language - doing this eagerly is good, as any further
# fixups due to the actual lang code in the lang=XX querystring will be
# handled by the rest of this function when we come back to it.
# Also fixes https://github.com/mozilla/bedrock/issues/5931
lang_via_querystring = request.GET.get("lang", None)
if lang_via_querystring is not None:
cleaned_lang_via_querystring = normalize_language(lang_via_querystring)
# Drop the lang querystring to avoid a redirect loop;
# request.GET is immutable so we have to edit a copy
if not cleaned_lang_via_querystring:
cleaned_lang_via_querystring = settings.LANGUAGE_CODE
qs = request.GET.copy()
del qs["lang"]
request.GET = qs
return self._redirect(request, cleaned_lang_via_querystring, subpath)

# If we're not redirecting based on a querystring, do we need to find
# a language code? If so, find from the headers and redirect.
if not lang_code and path_needs_lang_code(request.path) and not is_root_path_with_no_language_clues(request):
lang_code = get_language_from_headers(request)
return self._redirect(request, lang_code, subpath)

# Or if we have a lang code, but it's not the one that was in the
# original URL path – e.g. fixing es-mx to es-MX - we also need to redirect
if lang_code and lang_code_changed:
return self._redirect(request, lang_code, subpath)

# If we get this far, the path contains a validly formatted lang code
# OR the path does not need a lang code. We annotate the request appropriately
request.locale = lang_code if lang_code else ""


class BedrockLocaleMiddleware(DjangoLocaleMiddleware):
"""Middleware that usually* wraps Django's own i18n middleware in order to
ensure we normalize language codes - i..e. we ensure they are in
mixed case we use, rather than Django's internal all-lowercase codes.
*for one specific situation, though, we skip Django's LocaleMiddleware entirely:
we have a special SEO-helping page that gets returned to robots/spiders that
don't declare an accept-language header, showing a list of locales to pick.
It needs to be kept super-light so that it doesn't diverge too far from
the stock LocaleMiddleware, lest we wake up dragons when we use
wagtail-localize, which squarely depends on django's LocaleMiddleware.
Note: this is not SUMO's LocaleMiddleware, this just a tribute.
(https://github.com/escattone/kitsune/blob/main/kitsune/sumo/middleware.py#L128)
"""

def __init__(self, get_response):
self.get_response = get_response
def process_request(self, request):
if is_root_path_with_no_language_clues(request):
# Skip using Django's LocaleMiddleware
metrics.incr(
"bedrock.localemiddleware.skipdjangolocalemiddleware",
tags=[f"path:{request.path}"],
)
else:
with normalized_get_language():
with simplified_check_for_language():
return super().process_request(request)

def __call__(self, request):
response = self.process_request(request)
if response:
def process_response(self, request, response):
if is_root_path_with_no_language_clues(request):
# Skip using Django's LocaleMiddleware on the response cycle too
return response
return self.get_response(request)

def process_request(self, request):
prefixer = urlresolvers.Prefixer(request)
urlresolvers.set_url_prefix(prefixer)
with normalized_get_language():
with simplified_check_for_language():
return super().process_response(request, response)


@contextlib.contextmanager
def normalized_get_language():
"""
Ensures that any use of django.utils.translation.get_language()
within its context will return a normalized language code. This
context manager only works when the "get_language" function is
acquired from the "django.utils.translation" module at call time,
so for example, if it's called like "translation.get_language()".
Note that this does not cover every call to translation.get_language()
as there are calls to it on Django startup and more, but this gives us an
extra layer of (idempotent) fixing up within the request/response cycle.
"""
get_language = translation.get_language

@wraps(get_language)
def get_normalized_language():
return normalize_language(get_language())

translation.get_language = get_normalized_language

try:
yield
finally:
translation.get_language = get_language


@contextlib.contextmanager
def simplified_check_for_language():
"""Ensures that calls to trans_real.check_for_language within
its context will not check for the existence of files in
`appname/LANG_CODE/locale` and therefore avoid a false negative about
the langs we support (our lang files are in ./data/l10n-team/LANG_CODE)
but check_for_language is opinionated and expects only the Django directory
pattern for lang files."""

check_for_language = trans_real.check_for_language

@wraps(check_for_language)
def simpler_check_for_language(lang_code):
return check_for_bedrock_language(lang_code)

trans_real.check_for_language = simpler_check_for_language

request.path_info = f"/{prefixer.shortened_path}"
request.locale = prefixer.locale
translation.activate(prefixer.locale or settings.LANGUAGE_CODE)
try:
yield
finally:
trans_real.check_for_language = check_for_language


class BasicAuthMiddleware:
Expand Down

0 comments on commit 66ce747

Please sign in to comment.