Language codes are silently lowercased from config #7344

Closed
sergioisidoro opened this issue Jun 1, 2020 · 10 comments

Comments

@sergioisidoro

What version of Hugo are you using (hugo version)?

$ hugo version
Hugo Static Site Generator v0.68.3/extended darwin/amd64 BuildDate: unknown

Does this issue reproduce with the latest release?

Yes. Upgraded and tested with latest : Hugo Static Site Generator v0.72.0/extended darwin/amd64 BuildDate: unknown

Issue

This is a fairly simple thing I found that should be fixable in 5 minutes for people who know the codebase. I was trying to set up localisation with language codes that include regional variants (en-GB, pt-PT, etc.) and was hitting an error.

ERROR 2020/06/01 19:56:59 Site config value "en-GB" for defaultContentLanguage does not match any language definition

After some trial and error, I found that it worked if I had everything in lowercase. Apparently the [languages] keys are lowercased, but the defaultContentLanguage value is not.

An additional thing that might be useful: add an example with these regional languages to the documentation.

Code:

languageCode = "en-GB"
defaultContentLanguage = "en-GB"

.... 
[languages]
  [languages.en-GB]
    title = "Never gonna give you up"
    contentDir = "content/en-gb"
    languageName = "English-GB"
    weight = 10
  [languages.pt-PT]
    title = "Nunca te vou deixar"
    contentDir = "content/pt-pt"
    languageName = "Português-PT"
    weight = 20

Fails with

Change of config file detected, rebuilding site.
2020-06-01 19:56:59.410 +0300
ERROR 2020/06/01 19:56:59 Failed to reload config:

ERROR 2020/06/01 19:56:59 Site config value "en-GB" for defaultContentLanguage does not match any language definition
Rebuilt in 1 ms

But if I change things to:

languageCode = "en-gb"
defaultContentLanguage = "en-gb"

Things work. Since Hugo is silently downcasing the language codes, it makes sense to automatically lowercase the defaultContentLanguage param.
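
To make the mismatch concrete, here is a minimal Go sketch of the behaviour described above. It is not Hugo's actual code: `lowerKeys` and `lookup` are hypothetical helpers, and the assumption is simply that the [languages] keys are lowercased on load while the defaultContentLanguage value is compared as written.

```go
package main

import (
	"fmt"
	"strings"
)

// lowerKeys mimics, as an assumption, the lowercasing applied to the
// keys of the [languages] table when the config is loaded.
func lowerKeys(langs map[string]string) map[string]string {
	out := make(map[string]string, len(langs))
	for key, name := range langs {
		out[strings.ToLower(key)] = name
	}
	return out
}

// lookup finds a language name by key. When normalize is true the key is
// lowercased first, so "en-GB" matches the stored "en-gb".
func lookup(langs map[string]string, key string, normalize bool) (string, bool) {
	if normalize {
		key = strings.ToLower(key)
	}
	name, ok := langs[key]
	return name, ok
}

func main() {
	langs := lowerKeys(map[string]string{
		"en-GB": "English-GB",
		"pt-PT": "Português-PT",
	})

	// Exact lookup: the mixed-case default does not match any stored key.
	_, ok := lookup(langs, "en-GB", false)
	fmt.Println(ok)

	// Normalized lookup: the default is lowercased before matching.
	name, ok := lookup(langs, "en-GB", true)
	fmt.Println(name, ok)
}
```

Lowercasing the lookup key is the smallest change that makes both sides agree; it does not preserve the original casing for output.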

@hasitpbhatt

hasitpbhatt commented Jun 5, 2020

@bep The change mentioned here seems pretty straightforward. We have two options if we do want to make a code change.

  1. Change every cfg.GetString("defaultContentLanguage") to cfg.GetLowerString("defaultContentLanguage"). The drawback is that nothing prevents new diffs from using cfg.GetString("defaultContentLanguage").
  2. Don't explicitly change the string to lowercase (the maps.ToLower(params) call).

To support standard localisation (en-GB etc.) I would tend towards the second approach, but it could potentially break existing projects. So I think the documentation option proposed by @sergioisidoro seems to be the way to go considering everything.

@sergioisidoro
Author

To support standard localisation (en-GB etc.) I would tend towards the second approach, but it could potentially break existing projects.

If there is something worth doing, it's worth overdoing. :D

Here are the IETF guidelines.
https://tools.ietf.org/html/bcp47#page-1-9

langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

region is defined as:

 region        = 2ALPHA              ; ISO 3166-1 code
               / 3DIGIT              ; UN M.49 code

And ISO 3166-1 codes are all uppercase according to Wikipedia - https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 (I did not check the actual standard).

So it seems that the "correct" way of doing this is to not lowercase the languages.
Could this be documented as is right now, but fixed in the next major version?
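
For illustration, the casing conventions from BCP 47 (RFC 5646, section 2.1.1) can be sketched in Go: everything lowercase, except two-letter subtags (regions) in uppercase and four-letter subtags (scripts) in title case, when they are not first and do not follow a single-character subtag. `conventionalCase` is a hypothetical helper, not part of Hugo, and it only adjusts casing; it does not validate the tag.

```go
package main

import (
	"fmt"
	"strings"
)

// isASCIIAlpha reports whether s is non-empty and contains only ASCII letters.
func isASCIIAlpha(s string) bool {
	for i := 0; i < len(s); i++ {
		c := s[i]
		if !(c >= 'a' && c <= 'z') && !(c >= 'A' && c <= 'Z') {
			return false
		}
	}
	return s != ""
}

// conventionalCase applies the BCP 47 casing conventions to a tag.
// It is a simplified sketch and performs no validation.
func conventionalCase(tag string) string {
	parts := strings.Split(tag, "-")
	afterSingleton := false
	for i, p := range parts {
		switch {
		case i == 0 || afterSingleton || len(p) == 1:
			// Primary language, singletons, and everything after a singleton.
			parts[i] = strings.ToLower(p)
		case len(p) == 2 && isASCIIAlpha(p):
			// Two-letter region subtag, e.g. "GB".
			parts[i] = strings.ToUpper(p)
		case len(p) == 4 && isASCIIAlpha(p):
			// Four-letter script subtag, e.g. "Hant".
			parts[i] = strings.ToUpper(p[:1]) + strings.ToLower(p[1:])
		default:
			parts[i] = strings.ToLower(p)
		}
		if len(p) == 1 {
			afterSingleton = true
		}
	}
	return strings.Join(parts, "-")
}

func main() {
	for _, tag := range []string{"en-gb", "PT-pt", "zh-hant-tw"} {
		fmt.Println(tag, "->", conventionalCase(tag))
	}
}
```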

@sergioisidoro
Author

@hasitpbhatt Added some very rudimentary docs here: gohugoio/hugoDocs#1139

@stale

stale bot commented Oct 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help.
If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open.
If this is a feature request, and you feel that it is still relevant and valuable, please tell us why.
This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

@stale stale bot added the Stale label Oct 4, 2020
@sergioisidoro
Author

@hasitpbhatt What is the way forward here? I can try to draft some PR.

@stale stale bot removed the Stale label Oct 5, 2020
@hasitpbhatt

@bep can you please help us out here!!! Thanks

@houmie

houmie commented Jan 9, 2021

This is indeed a problem as it's not using the ISO standard definition. Language code should be lowercase, but country code has to be upper case.

Facebook sharing, for example, expects fr-FR or en-GB, not fr-fr or en-gb.

https://developers.facebook.com/docs/internationalization#locales

<meta property="og:locale" content="{{ .Site.LanguageCode }}"/>
is producing
<meta property="og:locale" content="en-us"/>
which is incorrect.

@techmagus

Just some additional information.

As per W3C, the casing of language codes does not matter as far as HTML and XML are concerned.

See: Language tags in HTML and XML

The entries in the registry follow certain conventions with regard to upper and lower letter-casing. For example, language tags are lower case, alphabetic region subtags are upper case, and script tags begin with an initial capital. This is only a convention! When you use these subtags you are free to do as you like, unless you are constrained by the rules of the system you are working with. For HTML and XML language markup, the case should not matter.

(my emphases)

The aforementioned W3C page was last updated on: 2014-03-03 13:55.

I searched archive.org and found that it has always been that way. See: archive.org: 2005-02-25 13:36:05

I used to understand that language codes should follow the IANA subtag registry casing as well, even though I've read the aforementioned W3C document countless times; for some reason it did not register. It also doesn't appear that they silently added it recently, since archive.org shows it has always been there.

Since the W3C is the authority on HTML and XML markup, and they say the casing does not matter for those, I'm withdrawing my request for this feature. Also, minifiers lowercase it (at least by default) and browsers are fine with it, unless, as the W3C says, you are constrained by the rules of the system you are working with.
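
If casing should not matter, a consumer can compare tags without regard to case. A minimal Go sketch, assuming plain ASCII language tags (`sameTag` is a hypothetical helper name):

```go
package main

import (
	"fmt"
	"strings"
)

// sameTag reports whether two language tags are equal when letter case
// is ignored, so "en-GB" and "en-gb" are treated as the same tag.
func sameTag(a, b string) bool {
	return strings.EqualFold(a, b)
}

func main() {
	fmt.Println(sameTag("en-GB", "en-gb"))
	fmt.Println(sameTag("en-GB", "en-US"))
}
```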

@jmooring
Member

This was resolved in v0.76.4. No problems with this configuration:

defaultContentLanguage = "en-GB"

[languages.en-GB]
weight = 1

[languages.pt-PT]
weight = 2

For those concerned about the inability to use upper case or mixed case in the published URL, see #9404.

For those concerned about upper case and mixed case in the built-in templates (rss, sitemap, and alias), specify a language code in your site configuration. For example:

[languages.en]
languageCode = 'en-US'
weight = 1

[languages.de]
languageCode = 'de-DE'
weight = 2

Use the value in your templates with {{ site.Language.LanguageCode }}. If you don't set languageCode, it will fall back to the language key (e.g., de instead of de-DE).
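
The fallback described above can be sketched in Go as follows. This is only an illustration, not Hugo's implementation; the `Language` type and `EffectiveCode` method are hypothetical names.

```go
package main

import "fmt"

// Language is a hypothetical stand-in for one [languages.*] entry.
type Language struct {
	Key          string // table key, e.g. "de"
	LanguageCode string // optional languageCode value, e.g. "de-DE"
}

// EffectiveCode returns LanguageCode when it is set and falls back to
// the language key otherwise.
func (l Language) EffectiveCode() string {
	if l.LanguageCode != "" {
		return l.LanguageCode
	}
	return l.Key
}

func main() {
	fmt.Println(Language{Key: "de", LanguageCode: "de-DE"}.EffectiveCode())
	fmt.Println(Language{Key: "de"}.EffectiveCode())
}
```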

If you populate the lang attribute of the html element in your templates with {{ site.Language.LanguageCode }}, this will be converted to lower case if you minify your site. This behavior will change shortly when we bump tdewolff/minify to v2.12.6.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 21, 2023