Add multilingual multihost support #4027

Closed
bep opened this Issue Oct 30, 2017 · 23 comments

@bep
Member

bep commented Oct 30, 2017

The current multilingual support in Hugo is restricted to one baseURL: the languages are put into subfolders named after the language code (the default language may be kept at the top level), e.g. https://example.com/en.

This works great and is possibly the most common use case.

It does not, however, allow using the baseURL itself to differentiate the languages, e.g. https://en.example.com, https://jp.example.com and similar variations.

Core Changes

This issue describes a way to define a baseURL per language. The new rule will be:

If a baseURL is set on the language level, then all languages must have one and they must all be different.

Example:

[languages]
[languages.no]
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"

With the above, the two sites will be generated into public with their own root:

public
├── en
└── no

The important part here is that:

All URLs (i.e. .Permalink etc.) will be generated from that root. So the English home page above will have its .Permalink set to https://example.com/.
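
The rule stated above (if any language sets a baseURL, every language must set one and they must all differ) could be checked roughly as follows. This is a minimal Python sketch for illustration, not Hugo's implementation; the function name `validate_language_base_urls` and the plain-dict config shape are assumptions.

```python
def validate_language_base_urls(languages):
    """Check the per-language baseURL rule described in this issue.

    `languages` maps a language code to its config table (a plain dict).
    Returns True if multihost mode applies, False if no language sets a baseURL.
    Raises ValueError if only some languages set one, or if two are equal.
    """
    urls = {code: cfg.get("baseURL") for code, cfg in languages.items()}
    configured = {code: url for code, url in urls.items() if url}
    if not configured:
        return False  # regular single-host mode

    missing = sorted(code for code in urls if code not in configured)
    if missing:
        raise ValueError(f"baseURL missing for languages: {missing}")

    if len(set(configured.values())) != len(configured):
        raise ValueError("every language must have a different baseURL")
    return True


# The two-language example from the description.
languages = {
    "no": {"baseURL": "https://example.no", "weight": 1},
    "en": {"baseURL": "https://example.com", "weight": 2},
}
print(validate_language_base_urls(languages))  # True
```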

Hugo Server

The changes above will work for the regular hugo; the two sites will be ready to be configured as two virtual hosts in Nginx or similar.

But you'd want to test it before you go live, of course. So we need to adapt hugo server to also handle multiple base URLs.

This issue suggests that we start as many HTTP servers as there are languages. With the default port settings, the ports will simply increment from 1313, giving:

http://localhost:1313/
http://localhost:1314/
...

Then you can navigate the sites (e.g. jump from an article to its translations) just as if they were deployed live in your production environment.
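
The port assignment can be pictured as below. This is a hedged Python sketch only: the function name `server_urls` and its `start_port` parameter are hypothetical, and ordering the languages by `weight` is an assumption that the description above does not state.

```python
def server_urls(languages, start_port=1313):
    """Give each language its own local server URL on consecutive ports.

    Assumption: languages are ordered by their `weight` setting.
    """
    ordered = sorted(languages, key=lambda code: languages[code]["weight"])
    return {
        code: f"http://localhost:{start_port + offset}/"
        for offset, code in enumerate(ordered)
    }


languages = {
    "no": {"weight": 1},
    "en": {"weight": 2},
}
print(server_urls(languages))
# {'no': 'http://localhost:1313/', 'en': 'http://localhost:1314/'}
```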

Multiple content and static dirs

Also see #4073 and #3757

With the new deployment topology this new feature creates, you often end up wanting better control of your content and your static files.

We will improve on this in three ways:

  1. staticDir can now be a slice of strings.
  2. Each language can have its own staticDir settings.
  3. Additional staticDir entries can be added by appending an ID from 1 to 10 to the key, e.g. staticDir1. This can be useful if you want to keep the global static dir settings, but have one or more additional directories for a specific language.

All the static directories will create a union filesystem from left to right:

theme static dir, global config static dirs, language static dirs.

Example:

staticDir = ["static1", "static2"]
[languages]
[languages.no]
staticDir = ["staticDir_override", "static_no"]
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
staticDir2 = "static_en"
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"

In the above, with no theme used:

  • the English site will get its static files as a union of "static1", "static2" and "static_en". On file duplicates, the right-most version will win.
  • the Norwegian site will get its static files as a union of "staticDir_override" and "static_no".
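
The "right-most wins" behaviour of the union can be illustrated with a small Python sketch. It models each directory as a list of relative file paths rather than touching a real filesystem; the function name `union_static_files` and the file listing are made up for the example.

```python
def union_static_files(dirs, listing):
    """Resolve which directory provides each file in a left-to-right union.

    `dirs` is the ordered list of static directory names.
    `listing` maps a directory name to the relative paths it contains.
    Returns a dict of relative path -> directory that provides it.
    """
    resolved = {}
    for directory in dirs:
        for rel_path in listing.get(directory, ()):
            # A later (right-most) directory overwrites an earlier one.
            resolved[rel_path] = directory
    return resolved


# Hypothetical directory contents.
listing = {
    "static1": ["css/main.css", "img/logo.png"],
    "static2": ["js/app.js"],
    "static_en": ["img/logo.png"],
}
english = union_static_files(["static1", "static2", "static_en"], listing)
print(english["img/logo.png"])  # static_en
print(english["css/main.css"])  # static1
```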

@bep bep added the Enhancement label Oct 30, 2017

@bep bep self-assigned this Oct 30, 2017

@bep bep added this to the v0.31 milestone Oct 30, 2017

Contributor

RickCogley commented Oct 31, 2017

Thank you for organizing this @bep.

An aside: the .co.jp domain is an interesting one, because the authorities in Japan regulate it so that one corporate entity in Japan can get only one .co.jp domain. (There was a big problem at the beginning of the Internet here, with people "squatting" on .co.jp domains.) Sometimes after a Japan entity gets a .co.jp, they put other languages under say .com or .no etc, whatever is logical for that language.

I can think of these things:

  • the idea that it will start many HTTP servers with a different port per language is great. Hugo server would need to increment up from a CLI-specified port, given multiple baseURLs, as well.
  • Japanese registrars offer kanji domain names now (technically, IDNs), like 日本語.jp etc. It might be an edge case, but does hugo allow a baseURL that is non-ASCII? Reference: https://unicode.org/faq/idn.html

Member

bep commented Oct 31, 2017

but does hugo allow a baseURL that is non-ASCII?

Yes, Hugo is all UTF-8 and I'm pretty sure that we do no "normalization" of the baseURL part.

As to hugo server, there is some existing logic that says that we use localhost by default if --baseURL is not set in the CLI. We probably need something like this here as well. You would normally not use a "real domain" in test, but I use some TypeKit fonts in my sites, so I need a local domain (defined in my hosts file) to get the validation to pass. But let me think about that when I get to it.
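
On the IDN question: for background, a Unicode host name such as 日本語.jp also has an ASCII ("xn--") form that DNS uses. The Python sketch below shows the conversion with the standard library's `idna` codec (which implements IDNA 2003). It says nothing about what Hugo does with the baseURL; it only illustrates the two representations of the same host.

```python
unicode_host = "日本語.jp"

# Encode to the ASCII-compatible form used on the wire by DNS.
ascii_host = unicode_host.encode("idna")
print(ascii_host)

# Decoding the ASCII form gives back the Unicode host name.
print(ascii_host.decode("idna"))
```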

Contributor

RickCogley commented Oct 31, 2017

thanks @bep. I think I confused things. I meant to say, if the port is set on the CLI, then hugo would need to increment up from that.

 hugo ... -p 1317 ...

... would get you 1317, 1318 etc, if you have multiple baseURLs specified in the config.

Regarding typekit needing a local domain, I did not know that, and just specified "localhost" and "127.0.0.1" in the typekit "kit editor" setup, for my sites that use typekit. It seems to work...

Member

bep commented Oct 31, 2017

just specified "localhost" and "127.0.0.1" in the typekit "kit editor" setup, for my sites that use typekit. It seems to work...

Yes, that works too, I guess, but having a "secret domain" prevents others from using my subscription. Not a big thing.

And yes; I agree about the incremental port thing.

Contributor

RickCogley commented Oct 31, 2017

ah, never thought of that. Uh oh!

Member

bep commented Oct 31, 2017

Just to complete that thought. I have something like this in /etc/hosts:

127.0.0.1   somename.local

And then I do hugo server --baseUrl=http://somename.local.

Which I will make sure works also in a multihost setup (http://somename.local:1313, http://somename.local:1314 etc.).
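
The multihost URLs mentioned here can be derived from a single base URL by attaching consecutive ports to the host. A small Python sketch using the standard library's `urllib.parse`; the helper names `with_port` and `multihost_urls` are hypothetical and this is not how Hugo itself computes them.

```python
from urllib.parse import urlsplit, urlunsplit


def with_port(base_url, port):
    """Return base_url with its port replaced by (or set to) `port`."""
    parts = urlsplit(base_url)
    if parts.hostname is None:
        raise ValueError(f"no host in base URL: {base_url!r}")
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def multihost_urls(base_url, language_count, start_port=1313):
    """One URL per language, on consecutive ports starting at start_port."""
    return [with_port(base_url, start_port + i) for i in range(language_count)]


print(multihost_urls("http://somename.local", 2))
# ['http://somename.local:1313', 'http://somename.local:1314']
```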

bep added a commit to bep/hugo that referenced this issue Nov 2, 2017

bep added a commit that referenced this issue Nov 4, 2017

bep added a commit that referenced this issue Nov 10, 2017

Member

bep commented Nov 10, 2017

@RickCogley I'm back from Spain and about to wrap my head around finishing this implementation.

One remark:

We may refine this in the future, but my first take on the static folder will be to duplicate it for the different languages. Which I think makes the most sense.

Contributor

RickCogley commented Nov 10, 2017

Welcome back @bep. Hope you got to relax & enjoy. :-)

Yeah, I can see duplicating static. Just wondering, if I had stuff that was shared between both sites, could I use a symlink between? Say, the main language's static/img is linked into other languages' static/img. Or, doesn't hugo deal well with symlinks...?

Member

bep commented Nov 10, 2017

@RickCogley I have a better idea. I will revise my first post to include this.

Member

bep commented Nov 10, 2017

@RickCogley I have updated the description with a new section about this. I think this will be very valuable, not just for this particular feature.

Contributor

moorereason commented Nov 10, 2017

@bep, the current description lists this config block (with irrelevant elements removed):

[languages]
staticDir = "static_no"
[languages.no]

[languages.en]
staticDir = "static_en"

Did you mean to include staticDir = "static_no" directly under the [languages] table?

Member

bep commented Nov 10, 2017

Did you mean to include staticDir = "static_no" directly under the [languages] table?

Copy and paste mistake. Thanks for spotting.

Contributor

RickCogley commented Nov 11, 2017

@bep that sounds slick. Given the same file in say static_en/img and static_ja/img which one will "win" if the idea is the "right most" will? Will it be alphabetical, or, last defined in the config.toml?

Also, are the rules different if, say, they have the same filename but one is newer?

I have updated the description with a new section about this. I think this will be very valuable, not just for this particular feature.

On file duplicates, the right-most version will win.

Member

bep commented Nov 11, 2017

Given the same file in say static_en/img and static_ja/img which one will "win" if the idea is the "right most" will?

In my head: In the above case, no files in static_ja/img will be visible in the English site and vice versa. That is the foundation of this. So you can have logo.png in both places and it would just work. I don't think it would make sense to mix those two "bags of resources". This becomes even more clear when we start to talk about content.

The "right-most" is static_ja/img for the Japanese site and static_en/img for the English site, with a note to the above: files from static_ja/img will not be visible to the English site.

Also, are the rules different if, say, they have the same filename but one is newer?

No.

Member

bep commented Nov 11, 2017

@RickCogley considering #2699 it may (at least for the content part) make sense to let both "language folders" be visible to both -- but in the static case the current language will always win on duplicates. Will think.

bep added a commit to bep/hugo that referenced this issue Nov 11, 2017

bep added a commit to bep/hugo that referenced this issue Nov 11, 2017

Add multilingual multihost support
This commit adds multihost support when more than one language is configured and `baseURL` is set per language.

Updates gohugoio#4027

bep added a commit to bep/hugo that referenced this issue Nov 11, 2017

Add multilingual multihost support
This commit adds multihost support when more than one language is configured and `baseURL` is set per language.

Updates gohugoio#4027

Contributor

RickCogley commented Nov 12, 2017

@bep If you set:

 staticDir = ["static1", "static2"]

... do you also have to add the language static dirs like static_no into that?

 staticDir = ["static1", "static2", "static_no"]

Or are those set under the language blocks only?

In a multilingual site, in my experience you have images that are language specific, but you also have images that are common to both. So I suppose, common ones go into "global static" and language specific go into "language specific" correct?

Member

bep commented Nov 12, 2017

Or are those set under the language blocks only?

Yes.

So I suppose, common ones go into "global static" and language specific go into "language specific" correct?

Yes. The use case for having a "static_no" is typically for assets that are different between the two languages: Logo (text in the particular language), maybe also CSS files per language, whatever.

Contributor

RickCogley commented Nov 12, 2017

Ok, it seems clean / good to me.
Someone will probably stick their language statics in there, though! :-)

Contributor

RickCogley commented Nov 12, 2017

Member

bep commented Nov 13, 2017

How about staticLangDir?

I have slept on this, and I think I have figured out a good way to differentiate "override these static dirs" from "add these static dirs".

It will behave conceptually a little differently when running in multihost mode vs regular, but I think it should be logical for most people.

If you need more staticDir properties, add an ID as suffix. An ID is an integer between 1 and 10.

So:

staticDir = ["static1", "static2"]
[languages]
[languages.no]
staticDir = ["static1"]
staticDir2 = "static_no"
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
staticDir2 = "static_en"
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"

The above shows a mix of override and additions.

For no: "static1", "static_no"
For en: "static1", "static2", "static_en"

In both of the above, the right-most directory will win on duplicates.
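
The override-versus-add behaviour in this example can be sketched in Python as follows. This is an illustration under stated assumptions, not Hugo's code: the function name `effective_static_dirs` is hypothetical, a language-level `staticDir` is taken to replace the global one, and only the language's numbered keys (`staticDir1` to `staticDir10`) are appended, in ID order. How numbered keys at the global level would combine is not covered by the example above, so the sketch ignores them.

```python
import re

_NUMBERED_KEY = re.compile(r"^staticDir(\d+)$")


def _as_list(value):
    """Accept either a single directory name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def effective_static_dirs(global_cfg, lang_cfg):
    """Return the ordered static dirs for one language (right-most wins)."""
    # A language-level staticDir overrides the global one.
    base = lang_cfg.get("staticDir", global_cfg.get("staticDir", []))
    dirs = _as_list(base)

    # Numbered keys on the language add directories, in ID order.
    numbered = []
    for key, value in lang_cfg.items():
        match = _NUMBERED_KEY.match(key)
        if match and 1 <= int(match.group(1)) <= 10:
            numbered.append((int(match.group(1)), value))
    for _, value in sorted(numbered, key=lambda item: item[0]):
        dirs.extend(_as_list(value))
    return dirs


global_cfg = {"staticDir": ["static1", "static2"]}
no_cfg = {"staticDir": ["static1"], "staticDir2": "static_no"}
en_cfg = {"staticDir2": "static_en"}

print(effective_static_dirs(global_cfg, no_cfg))  # ['static1', 'static_no']
print(effective_static_dirs(global_cfg, en_cfg))  # ['static1', 'static2', 'static_en']
```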

Contributor

RickCogley commented Nov 13, 2017

Oh, smart!

  • Override a global staticDir property by specifying the same property name as the global, under the language.
  • Give a language a unique static directory by specifying a unique property name under the language block.

bep added a commit to bep/hugo that referenced this issue Nov 16, 2017

Add multilingual multihost support
This commit adds multihost support when more than one language is configured and `baseURL` is set per language.

Updates gohugoio#4027

bep added a commit to bep/hugo that referenced this issue Nov 16, 2017

Add multilingual multihost support
This commit adds multihost support when more than one language is configured and `baseURL` is set per language.

Updates gohugoio#4027

bep added a commit to bep/hugo that referenced this issue Nov 16, 2017

Add multilingual multihost support
This commit adds multihost support when more than one language is configured and `baseURL` is set per language.

Updates gohugoio#4027

bep added a commit to bep/hugo that referenced this issue Nov 17, 2017

Add support for multiple staticDirs
This commit adds support for multiple staticDirs both on the global and language level.

A simple `config.toml` example:

```toml
staticDir = ["static1", "static2"]
[languages]
[languages.no]
staticDir = ["staticDir_override", "static_no"]
baseURL = "https://example.no"
languageName = "Norsk"
weight = 1
title = "På norsk"

[languages.en]
staticDir2 = "static_en"
baseURL = "https://example.com"
languageName = "English"
weight = 2
title = "In English"
```

In the above, with no theme used:

the English site will get its static files as a union of "static1", "static2" and "static_en". On file duplicates, the right-most version will win.
the Norwegian site will get its static files as a union of "staticDir_override" and "static_no".

This commit also concludes the Multihost support in gohugoio#4027.

Fixes gohugoio#36
Closes gohugoio#4027

bep added a commit that referenced this issue Nov 17, 2017

Add multilingual multihost support
This commit adds multihost support when more than one language is configured and `baseURL` is set per language.

Updates #4027

@bep bep closed this in 60dfb9a Nov 17, 2017

Contributor

biodranik commented Nov 21, 2017

@bep Is it possible to add a custom Google Analytics ID for each language domain? And if not, when can it be implemented?

hanzei commented Jun 28, 2018

@biodranik you can set site params per language
