feature #52986 [HttpFoundation] Similar locale selection (Spomky)
This PR was merged into the 7.1 branch.

Discussion
----------

[HttpFoundation] Similar locale selection

| Q             | A
| ------------- | ---
| Branch?       | 7.1
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Issues        | none
| License       | MIT

Allow the nearest locale to be selected instead of the default one.
Ping `@welcoMattic`

I noticed that locale selection is not optimal. Let's say I have the following controller and root route:

```php
#[Route('/', name: 'app_root', methods: [Request::METHOD_GET])]
public function indexNoLocale(Request $request): Response
{
    $locale = $request->getPreferredLanguage($this->supportedLocales) ?? $this->defaultLocale;

    return $this->redirectToRoute('app_homepage', [
        '_locale' => $locale,
    ], Response::HTTP_SEE_OTHER);
}
```

And the following parameters:
* `$this->supportedLocales` = `['fr_FR', 'en_US']`
* `$this->defaultLocale` = `'en_US'`

When a user arrives on this page with the header `accept-language: ja-JP,fr_CA;q=0.7,fr;q=0.5`, the default locale `en_US` is returned.
In this situation, I would expect `fr_FR` to be used, since the browser indicates that French is an acceptable choice (`fr_CA` and `fr`).

With this change, if no exact match is found, a similar locale is selected. In this example, `fr_FR` is close enough to `fr_CA` and `fr`, so it is returned.
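
The two-step selection described above can be sketched in plain PHP (8.0 or later, for `str_starts_with()`). This is an illustration only, not the code merged here: `pickLocale()` and `languageVariants()` are hypothetical names, the client's languages are assumed to be already parsed from `Accept-Language`, ordered by quality and normalized to the `fr_CA` form, and script subtags are ignored. The supported locales are listed as `['en_US', 'fr_FR']` so that the last-resort fallback is visibly `en_US`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the locale itself and, if it has a region,
 * its bare language. "fr_CA" gives ["fr_CA", "fr"]; "fr" gives ["fr"].
 */
function languageVariants(string $locale): array
{
    $position = strpos($locale, '_');

    return false === $position ? [$locale] : [$locale, substr($locale, 0, $position)];
}

/**
 * Hypothetical helper: picks a supported locale for the given preferences.
 *
 * @param string[] $preferred client languages, best first, already normalized
 * @param string[] $supported application locales (non-empty); the first one is the fallback
 */
function pickLocale(array $preferred, array $supported): string
{
    // 1. An exact match wins, in the client's order of preference.
    foreach ($preferred as $language) {
        if (in_array($language, $supported, true)) {
            return $language;
        }
    }

    // 2. Otherwise accept a supported locale that starts with a preferred language.
    foreach ($preferred as $language) {
        foreach (languageVariants($language) as $variant) {
            foreach ($supported as $locale) {
                if (str_starts_with($locale, $variant)) {
                    return $locale;
                }
            }
        }
    }

    // 3. Nothing is close: fall back to the first supported locale.
    return $supported[0];
}

// Header from the description: ja-JP,fr_CA;q=0.7,fr;q=0.5
echo pickLocale(['ja_JP', 'fr_CA', 'fr'], ['en_US', 'fr_FR']), "\n"; // prints "fr_FR"
```

Checking exact matches first keeps the client's stated order authoritative; the prefix comparison only runs when none of the preferred languages is supported as-is.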

Commits
-------

ef73505 Allow the nearest locale to be selected instead the default one.
fabpot committed Apr 5, 2024
2 parents a59cf0e + ef73505 commit d1f3ee8
Showing 2 changed files with 146 additions and 73 deletions.
122 changes: 91 additions & 31 deletions src/Symfony/Component/HttpFoundation/Request.php
@@ -1532,24 +1532,25 @@ public function getPreferredLanguage(?array $locales = null): ?string
             return $preferredLanguages[0] ?? null;
         }

+        $locales = array_map($this->formatLocale(...), $locales ?? []);
         if (!$preferredLanguages) {
             return $locales[0];
         }

-        $extendedPreferredLanguages = [];
-        foreach ($preferredLanguages as $language) {
-            $extendedPreferredLanguages[] = $language;
-            if (false !== $position = strpos($language, '_')) {
-                $superLanguage = substr($language, 0, $position);
-                if (!\in_array($superLanguage, $preferredLanguages, true)) {
-                    $extendedPreferredLanguages[] = $superLanguage;
+        if ($matches = array_intersect($preferredLanguages, $locales)) {
+            return current($matches);
+        }
+
+        $combinations = array_merge(...array_map($this->getLanguageCombinations(...), $preferredLanguages));
+        foreach ($combinations as $combination) {
+            foreach ($locales as $locale) {
+                if (str_starts_with($locale, $combination)) {
+                    return $locale;
                 }
             }
         }

-        $preferredLanguages = array_values(array_intersect($extendedPreferredLanguages, $locales));
-
-        return $preferredLanguages[0] ?? $locales[0];
+        return $locales[0];
     }

     /**
@@ -1567,32 +1568,91 @@ public function getLanguages(): array
         $this->languages = [];
         foreach ($languages as $acceptHeaderItem) {
             $lang = $acceptHeaderItem->getValue();
-            if (str_contains($lang, '-')) {
-                $codes = explode('-', $lang);
-                if ('i' === $codes[0]) {
-                    // Language not listed in ISO 639 that are not variants
-                    // of any listed language, which can be registered with the
-                    // i-prefix, such as i-cherokee
-                    if (\count($codes) > 1) {
-                        $lang = $codes[1];
-                    }
-                } else {
-                    for ($i = 0, $max = \count($codes); $i < $max; ++$i) {
-                        if (0 === $i) {
-                            $lang = strtolower($codes[0]);
-                        } else {
-                            $lang .= '_'.strtoupper($codes[$i]);
-                        }
-                    }
-                }
-            }
-
-            $this->languages[] = $lang;
+            $this->languages[] = $this->formatLocale($lang);
         }
         $this->languages = array_unique($this->languages);

         return $this->languages;
     }

+    /**
+     * Strips the locale to only keep the canonicalized language value.
+     *
+     * Depending on the $locale value, this method can return values like :
+     * - language_Script_REGION: "fr_Latn_FR", "zh_Hans_TW"
+     * - language_Script: "fr_Latn", "zh_Hans"
+     * - language_REGION: "fr_FR", "zh_TW"
+     * - language: "fr", "zh"
+     *
+     * Invalid locale values are returned as is.
+     *
+     * @see https://wikipedia.org/wiki/IETF_language_tag
+     * @see https://datatracker.ietf.org/doc/html/rfc5646
+     */
+    private static function formatLocale(string $locale): string
+    {
+        [$language, $script, $region] = self::getLanguageComponents($locale);
+
+        return implode('_', array_filter([$language, $script, $region]));
+    }
+
+    /**
+     * Returns an array of all possible combinations of the language components.
+     *
+     * For instance, if the locale is "fr_Latn_FR", this method will return:
+     * - "fr_Latn_FR"
+     * - "fr_Latn"
+     * - "fr_FR"
+     * - "fr"
+     *
+     * @return string[]
+     */
+    private static function getLanguageCombinations(string $locale): array
+    {
+        [$language, $script, $region] = self::getLanguageComponents($locale);
+
+        return array_unique([
+            implode('_', array_filter([$language, $script, $region])),
+            implode('_', array_filter([$language, $script])),
+            implode('_', array_filter([$language, $region])),
+            $language,
+        ]);
+    }
+
+    /**
+     * Returns an array with the language components of the locale.
+     *
+     * For example:
+     * - If the locale is "fr_Latn_FR", this method will return "fr", "Latn", "FR"
+     * - If the locale is "fr_FR", this method will return "fr", null, "FR"
+     * - If the locale is "zh_Hans", this method will return "zh", "Hans", null
+     *
+     * @see https://wikipedia.org/wiki/IETF_language_tag
+     * @see https://datatracker.ietf.org/doc/html/rfc5646
+     *
+     * @return array{string, string|null, string|null}
+     */
+    private static function getLanguageComponents(string $locale): array
+    {
+        $locale = str_replace('_', '-', strtolower($locale));
+        $pattern = '/^([a-zA-Z]{2,3}|i-[a-zA-Z]{5,})(?:-([a-zA-Z]{4}))?(?:-([a-zA-Z]{2}))?(?:-(.+))?$/';
+        if (!preg_match($pattern, $locale, $matches)) {
+            return [$locale, null, null];
+        }
+        if (str_starts_with($matches[1], 'i-')) {
+            // Language not listed in ISO 639 that are not variants
+            // of any listed language, which can be registered with the
+            // i-prefix, such as i-cherokee
+            $matches[1] = substr($matches[1], 2);
+        }
+
+        return [
+            $matches[1],
+            isset($matches[2]) ? ucfirst(strtolower($matches[2])) : null,
+            isset($matches[3]) ? strtoupper($matches[3]) : null,
+        ];
+    }
+
     /**
      * Gets a list of charsets acceptable by the client browser in preferable order.
      *
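
To make the docblock examples above concrete, here is a standalone sketch of the component-splitting and combination steps as plain functions. It assumes PHP 8.0 or later; `languageComponents()` and `languageCombinations()` are hypothetical free-function stand-ins for the private methods in the diff, and the empty-string check on optional groups is a choice made for this sketch rather than something taken from the patch.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in: splits a language tag into [language, script, region].
 * Variants, extensions and private-use subtags are dropped.
 */
function languageComponents(string $tag): array
{
    $tag = str_replace('_', '-', strtolower($tag));
    $pattern = '/^([a-z]{2,3}|i-[a-z]{5,})(?:-([a-z]{4}))?(?:-([a-z]{2}))?(?:-(.+))?$/';
    if (!preg_match($pattern, $tag, $matches)) {
        // Not a recognizable tag: hand it back unchanged.
        return [$tag, null, null];
    }

    $language = $matches[1];
    if (str_starts_with($language, 'i-')) {
        // Irregular tags such as "i-cherokee": keep only the name.
        $language = substr($language, 2);
    }

    // Optional groups that did not match are either missing or empty strings.
    $script = ($matches[2] ?? '') !== '' ? ucfirst($matches[2]) : null;
    $region = ($matches[3] ?? '') !== '' ? strtoupper($matches[3]) : null;

    return [$language, $script, $region];
}

/**
 * Hypothetical stand-in: lists candidates from most to least specific,
 * with the script taking priority over the region.
 */
function languageCombinations(string $tag): array
{
    [$language, $script, $region] = languageComponents($tag);

    return array_values(array_unique([
        implode('_', array_filter([$language, $script, $region])),
        implode('_', array_filter([$language, $script])),
        implode('_', array_filter([$language, $region])),
        $language,
    ]));
}

echo implode(', ', languageCombinations('fr_Latn_FR')), "\n"; // prints "fr_Latn_FR, fr_Latn, fr_FR, fr"
echo implode(', ', languageCombinations('en-us')), "\n";      // prints "en_US, en"
```

Because the candidates are ordered from most to least specific, a prefix comparison against the supported locales tries `fr_Latn` before `fr_FR`, which is the same script-over-region priority exercised by the `zh_Hans` test case in this commit.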
97 changes: 55 additions & 42 deletions src/Symfony/Component/HttpFoundation/Tests/RequestTest.php
@@ -1502,27 +1502,43 @@ public function testGetPreferredLanguage()
     {
         $request = new Request();
         $this->assertNull($request->getPreferredLanguage());
-        $this->assertNull($request->getPreferredLanguage([]));
-        $this->assertEquals('fr', $request->getPreferredLanguage(['fr']));
-        $this->assertEquals('fr', $request->getPreferredLanguage(['fr', 'en']));
-        $this->assertEquals('en', $request->getPreferredLanguage(['en', 'fr']));
-        $this->assertEquals('fr-ch', $request->getPreferredLanguage(['fr-ch', 'fr-fr']));
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en-us; q=0.8, en; q=0.6');
-        $this->assertEquals('en', $request->getPreferredLanguage(['en', 'en-us']));
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en-us; q=0.8, en; q=0.6');
-        $this->assertEquals('en', $request->getPreferredLanguage(['fr', 'en']));
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en-us; q=0.8');
-        $this->assertEquals('en', $request->getPreferredLanguage(['fr', 'en']));
+    }

+    /**
+     * @dataProvider providePreferredLanguage
+     */
+    public function testPreferredLanguageWithLocales(?string $expectedLocale, ?string $acceptLanguage, array $locales)
+    {
         $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en-us; q=0.8, fr-fr; q=0.6, fr; q=0.5');
-        $this->assertEquals('en', $request->getPreferredLanguage(['fr', 'en']));
+        if ($acceptLanguage) {
+            $request->headers->set('Accept-language', $acceptLanguage);
+        }
+        $this->assertSame($expectedLocale, $request->getPreferredLanguage($locales));
     }

+    public static function providePreferredLanguage(): iterable
+    {
+        yield '"es_PA" is selected as no supported locale is set' => ['es_PA', 'es-pa, en-us; q=0.8, en; q=0.6', []];
+        yield 'No supported locales' => [null, null, []];
+        yield '"fr" selected as first choice when no header is present' => ['fr', null, ['fr', 'en']];
+        yield '"en" selected as first choice when no header is present' => ['en', null, ['en', 'fr']];
+        yield '"fr_CH" selected as first choice when no header is present' => ['fr_CH', null, ['fr-ch', 'fr-fr']];
+        yield '"en_US" is selected as an exact match is found (1)' => ['en_US', 'zh, en-us; q=0.8, en; q=0.6', ['en', 'en-us']];
+        yield '"en_US" is selected as an exact match is found (2)' => ['en_US', 'ja-JP,fr_CA;q=0.7,fr;q=0.5,en_US;q=0.3', ['en_US', 'fr_FR']];
+        yield '"en" is selected as an exact match is found' => ['en', 'zh, en-us; q=0.8, en; q=0.6', ['fr', 'en']];
+        yield '"fr" is selected as an exact match is found' => ['fr', 'zh, en-us; q=0.8, fr-fr; q=0.6, fr; q=0.5', ['fr', 'en']];
+        yield '"en" is selected as "en-us" is a similar dialect' => ['en', 'zh, en-us; q=0.8', ['fr', 'en']];
+        yield '"fr_FR" is selected as "fr_CA" is a similar dialect (1)' => ['fr_FR', 'ja-JP,fr_CA;q=0.7,fr;q=0.5', ['en_US', 'fr_FR']];
+        yield '"fr_FR" is selected as "fr_CA" is a similar dialect (2)' => ['fr_FR', 'ja-JP,fr_CA;q=0.7', ['en_US', 'fr_FR']];
+        yield '"fr_FR" is selected as "fr" is a similar dialect' => ['fr_FR', 'ja-JP,fr;q=0.5', ['en_US', 'fr_FR']];
+        yield '"fr_FR" is selected as "fr_CA" is a similar dialect and has a greater "q" compared to "en_US" (2)' => ['fr_FR', 'ja-JP,fr_CA;q=0.7,ru-ru;q=0.3', ['en_US', 'fr_FR']];
+        yield '"en_US" is selected it is an exact match' => ['en_US', 'ja-JP,fr;q=0.5,en_US;q=0.3', ['en_US', 'fr_FR']];
+        yield '"fr_FR" is selected as "fr_CA" is a similar dialect and has a greater "q" compared to "en"' => ['fr_FR', 'ja-JP,fr_CA;q=0.7,en;q=0.5', ['en_US', 'fr_FR']];
+        yield '"fr_FR" is selected as is is an exact match as well as "en_US", but with a greater "q" parameter' => ['fr_FR', 'en-us;q=0.5,fr-fr', ['en_US', 'fr_FR']];
+        yield '"hi_IN" is selected as "hi_Latn_IN" is a similar dialect' => ['hi_IN', 'fr-fr,hi_Latn_IN;q=0.5', ['hi_IN', 'en_US']];
+        yield '"hi_Latn_IN" is selected as "hi_IN" is a similar dialect' => ['hi_Latn_IN', 'fr-fr,hi_IN;q=0.5', ['hi_Latn_IN', 'en_US']];
+        yield '"en_US" is selected as "en_Latn_US+variants+extensions" is a similar dialect' => ['en_US', 'en-latn-us-fonapi-u-nu-numerical-x-private,fr;q=0.5', ['fr_FR', 'en_US']];
+        yield '"zh_Hans" is selected over "zh_TW" as the script as a greater priority over the region' => ['zh_Hans', 'zh-hans-tw, zh-hant-tw', ['zh_Hans', 'zh_tw']];
+    }
+
     public function testIsXmlHttpRequest()
@@ -1601,30 +1617,28 @@ public function testGetAcceptableContentTypes()
         $this->assertEquals(['application/vnd.wap.wmlscriptc', 'text/vnd.wap.wml', 'application/vnd.wap.xhtml+xml', 'application/xhtml+xml', 'text/html', 'multipart/mixed', '*/*'], $request->getAcceptableContentTypes());
     }

-    public function testGetLanguages()
+    /**
+     * @dataProvider provideLanguages
+     */
+    public function testGetLanguages(array $expectedLocales, ?string $acceptLanguage)
     {
         $request = new Request();
-        $this->assertEquals([], $request->getLanguages());
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en-us; q=0.8, en; q=0.6');
-        $this->assertEquals(['zh', 'en_US', 'en'], $request->getLanguages());
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en-us; q=0.6, en; q=0.8');
-        $this->assertEquals(['zh', 'en', 'en_US'], $request->getLanguages()); // Test out of order qvalues
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, en, en-us');
-        $this->assertEquals(['zh', 'en', 'en_US'], $request->getLanguages()); // Test equal weighting without qvalues
-
-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh; q=0.6, en, en-us; q=0.6');
-        $this->assertEquals(['en', 'zh', 'en_US'], $request->getLanguages()); // Test equal weighting with qvalues
+        if ($acceptLanguage) {
+            $request->headers->set('Accept-language', $acceptLanguage);
+        }
+        $this->assertEquals($expectedLocales, $request->getLanguages());
+    }

-        $request = new Request();
-        $request->headers->set('Accept-language', 'zh, i-cherokee; q=0.6');
-        $this->assertEquals(['zh', 'cherokee'], $request->getLanguages());
+    public static function provideLanguages(): iterable
+    {
+        yield 'empty' => [[], null];
+        yield [['zh', 'en_US', 'en'], 'zh, en-us; q=0.8, en; q=0.6'];
+        yield 'Test out of order qvalues' => [['zh', 'en', 'en_US'], 'zh, en-us; q=0.6, en; q=0.8'];
+        yield 'Test equal weighting without qvalues' => [['zh', 'en', 'en_US'], 'zh, en, en-us'];
+        yield 'Test equal weighting with qvalues' => [['en', 'zh', 'en_US'], 'zh; q=0.6, en, en-us; q=0.6'];
+        yield 'Test irregular locale' => [['zh', 'cherokee'], 'zh, i-cherokee; q=0.6'];
+        yield 'Test with variants, unicode extensions and private information' => [['pt_BR', 'hy_Latn_IT', 'zh_Hans_TW'], 'pt-BR-u-ca-gregory-nu-latn, hy-Latn-IT-arevela, zh-Hans-TW-fonapi-u-islamcal-x-AZE-derbend; q=0.6'];
+        yield 'Test multiple regions' => [['en_US', 'en_CA', 'en_GB', 'en'], 'en-us, en-ca, en-gb, en'];
     }

     public function testGetAcceptHeadersReturnString()
@@ -2199,7 +2213,7 @@ public function testFactory()

     public function testFactoryCallable()
     {
-        $requestFactory = new class {
+        $requestFactory = new class() {
             public function createRequest(): Request
             {
                 return new NewRequest();
@@ -2211,7 +2225,6 @@ public function createRequest(): Request
         $this->assertEquals('foo', Request::create('/')->getFoo());

         Request::setFactory(null);
-
     }

     /**