Revisiting transliterations #2178

Open
jensscherbl opened this issue Aug 25, 2014 · 24 comments

Comments

@jensscherbl
Member

I wanted to bring this topic up a while ago but forgot about it.

The core function for creating handles uses transliterations. Transliterations can be overwritten per language by extensions.

If I'm not mistaken, which transliterations are used to create a handle always depends on the language your backend currently uses.

Is this correct?

I think that's a problem for multilingual websites and in cases where the backend language doesn't match the frontend language.

Symphony usually doesn't assume anything regarding your frontend, so this seems a bit weird.

Any ideas?

@nitriques
Member

Translation has always been a hot topic!

I think we can assume that handles are:

  1. An ASCII representation of any UTF-8 value
  2. Usable as a URL

Which makes the problem easier to solve.
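
A minimal Python sketch of these two assumptions, written as a check rather than a generator. The pattern and the name `is_valid_handle` are hypothetical and only illustrate the idea; they are not Symphony's API.

```python
import re

# Assumption: a handle is lowercase ASCII letters and digits, with single
# hyphens between words. This pattern is illustrative, not Symphony's rule.
HANDLE_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_handle(candidate: str) -> bool:
    """Return True if the candidate is pure ASCII and safe to use in a URL path."""
    return candidate.isascii() and HANDLE_PATTERN.fullmatch(candidate) is not None
```

Under these assumptions `revisiting-transliterations` passes, while `blödsinn` (non-ASCII) and `two words` (contains a space) do not.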

But I would agree that Symphony could provide extensibility here, since a developer may want to change it.

For that, we would need to figure out a correct way to "distribute" the work. Right now, I do not know of any delegate that needs to return a value when it gets triggered, and this would look weird.

Also, should a handle change its value if the current language of the author who hits the save button is not the same as the last author's? I would not want my URLs to change unless the actual value of the field changes.

@jensscherbl
Member Author

Also, should a handle change its value if the current language of the author who hits the save button is not the same as the last author's? I would not want my URLs to change unless the actual value of the field changes.

Not sure if we're on the same side here. Handles are used in the frontend, so they shouldn't reflect or depend on any author's backend language at all imo.

@nitriques
Member

Not sure if we're on the same side here. Handles are used in the frontend, so they shouldn't reflect or depend on any author's backend language at all imo.

Yes that's totally right.

I was referring to your comment:

I think that's a problem for multilingual websites and in cases where the backend language doesn't match the frontend language.

Handles must not care what the current backend language is. They must be deterministic.

So extensibility is hard, since it must preserve the deterministic nature of handle creation.

@brendo
Member

brendo commented Oct 13, 2014

Just to reply to this: yes. At the moment, if an Author whose backend is set to Chinese creates an entry, the handle will be created with Chinese transliterations. If an Author using German then edits the same entry, the handle will use German transliterations.

This may result in some differences and the handle changing. This also occurs for all resources in the backend, datasources, events, field labels etc.
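
To make the effect concrete, here is a small Python sketch. The two tables and the `create_handle` function are hypothetical stand-ins, not Symphony's real transliteration data or core function; they only show how the same title yields different handles depending on which language table is active.

```python
import re

# Hypothetical per-language tables; not Symphony's actual transliteration data.
TRANSLITERATIONS = {
    "en": {"&": "and", "ä": "a", "ö": "o", "ü": "u", "ß": "ss"},
    "de": {"&": "und", "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}


def create_handle(text: str, language: str) -> str:
    """Apply the table for the given language, then collapse the rest to hyphens."""
    table = TRANSLITERATIONS[language]
    replaced = "".join(table.get(ch, ch) for ch in text.lower())
    return re.sub(r"[^a-z0-9]+", "-", replaced).strip("-")


print(create_handle("Käse & Brot", "en"))  # kase-and-brot
print(create_handle("Käse & Brot", "de"))  # kaese-und-brot
```

The input never changed, yet the handle did, which is exactly the non-deterministic behaviour described above.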

So how can we solve this? I have no idea to be honest.

We can't prevent Authors of different languages editing each other's content. I think it would be strange to always just enforce English transliterations. Can we get by without using them at all for handles? What would be the impact?

Is this actually a problem that needs solving? Has anyone actually experienced an issue with two different languages resulting in the handle changing and breaking the site?

@nilshoerrmann
Contributor

So how can we solve this? I have no idea to be honest.

The system language (not the author's language) should be used as standard.
So the system language should match the frontend language. If you need (or would like to have) a different language for a user, this can be changed on a per-user basis (author setting).

Is this actually a problem that needs solving?

Yes. We normally use an English backend for development (personal habit) but our clients use the German locale. We've often run into the problem of differing handles.

@jensscherbl
Member Author

We can't prevent Authors of different languages editing each other's content.

That's one problem, but not the only one.

One author editing content for multiple languages (multilingual website) is a problem as well. Just because an author uses the backend in one language, we can't assume that the content is in the same language, or that all content is only in one language.

Is this actually a problem that needs solving?

I think so.

Has anyone actually experienced an issue with two different languages resulting in the handle changing and breaking the site?

As mentioned above, this is only one of the issues. The more common problem is with multilingual sites.

http://www.getsymphony.com/discuss/thread/107626/

I think it would be strange to always just enforce English transliterations.

Indeed, this wouldn't solve the problem at all.

Can we get by without using them at all for handles?

I think it's ok for URLs (and even domains) to have unicode characters in them, so we should only strip out (instead of replacing with language specific terms) characters that are not allowed in URLs (like spaces).

The system language (not the author's language) should be used as standard. So the system language should match the frontend language.

Again, what if the frontend is multilingual?

@nilshoerrmann
Contributor

Again, what if the frontend is multilingual?

Ah, right, sorry. A two-step idea:

  • Make the core handle creation accept a language parameter that overrides the system language.
  • Create a multilingual input field that stores multiple handles and switches based on the frontend context.

@jensscherbl
Member Author

Create a multilingual input field that stores multiple handles and switches based on the frontend context.

Not necessary to store multiple handles and switch based on frontend context. A field only holds content in one language (right?), so the handle for a field only needs to be in one language as well.

Unless you're planning to make all core fields multilingual and store content for each field in different languages (like the Multilingual xxx Field-extensions).

Make the core handle creation accept a language parameter that overrides the system language.

Could work, but you'd need to set the language for every field in the section editor.

To be clear, your proposed solution would work as well and setting up sections could remain as it is now, since handles for different languages would be created in the background automatically.

But I think it's a somewhat "dirty" solution, since you're cluttering up the database with lots of unnecessary handles when only one handle is needed.

Also you'd have to take care of datasource filtering etc.

Wouldn't it be much simpler to not replace characters with language specific terms at all?

@nilshoerrmann
Contributor

Wouldn't it be much simpler to not replace characters with language specific terms at all?

Can you elaborate, please?

@nilshoerrmann
Contributor

What about adding a language selector to all core fields that create handles?
This feature could be hidden behind a config flag (defaulting to no localisation, our status quo).

@nilshoerrmann
Contributor

PS: There could also be an additional setting "Handle localisation" in the system preferences right under the language selector providing three options:

  • system language
  • author language
  • field-based language selection

(The latter providing the aforementioned language selector in the field settings.)
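
A rough Python sketch of how such a three-way setting could be resolved. The enum, the function name and the fallback to the system language are all assumptions made for illustration; nothing like this exists in Symphony today.

```python
from enum import Enum
from typing import Optional


class HandleLocalisation(Enum):
    """Hypothetical values for the proposed "Handle localisation" preference."""
    SYSTEM = "system"
    AUTHOR = "author"
    FIELD = "field"


def resolve_handle_language(
    mode: HandleLocalisation,
    system_language: str,
    author_language: str,
    field_language: Optional[str] = None,
) -> str:
    """Pick the language whose transliterations are used for a handle."""
    if mode is HandleLocalisation.AUTHOR:
        return author_language
    if mode is HandleLocalisation.FIELD and field_language is not None:
        return field_language
    # Assumed fallback: a field without its own language uses the system one.
    return system_language
```

Only the `SYSTEM` and `FIELD` modes give the same handle regardless of who saves the entry; the `AUTHOR` mode reproduces the current behaviour.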

@jensscherbl
Member Author

What about adding a language selector to all core fields that create handles?
This feature could be hidden behind a config flag (defaulting to no localisation, our status quo).

PS: There could also be an additional setting "Handle localisation" in the system preferences right under the language selector providing three options:

Sounds complicated and possibly introduces more new issues than it resolves...

Can you elaborate, please?

Simple. If we don't use transliterations and don't replace certain characters like "&" with language specific terms like "and", we don't have a language problem.

HTML5 supports IRIs and IRIs are capable of handling unicode characters, so do we actually still need transliterations? Just strip out reserved characters and everything's fine.

Only thing to keep in mind is to urldecode parameters when filtering datasources.
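
A minimal Python sketch of this idea, assuming the reserved set from RFC 3986 plus "%" and whitespace. The function name `unicode_handle` is hypothetical, and `urllib.parse.unquote` plays the role that `urldecode` would play in PHP.

```python
import re
from urllib.parse import quote, unquote

# Assumption: strip RFC 3986 reserved characters, "%" and whitespace,
# and keep every other character (including non-ASCII letters) as-is.
RESERVED = re.compile(r"[:/?#\[\]@!$&'()*+,;=%\s]+")


def unicode_handle(text: str) -> str:
    """Build a handle without any language specific replacements."""
    return RESERVED.sub("-", text.strip().lower()).strip("-")


handle = unicode_handle("Käse & Brot")
encoded = quote(handle)      # what travels over the wire in a URL
decoded = unquote(encoded)   # what a datasource filter has to compare against

print(handle)   # käse-brot
print(encoded)  # k%C3%A4se-brot
```

The handle no longer depends on any language, but the filter only matches if the percent-encoded parameter is decoded first.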

@nilshoerrmann
Contributor

So you'd like to keep umlauts in the URL?

HTML5 supports IRIs and IRIs are capable of handling unicode characters, so do we actually still need transliterations?

I think browser support is not an issue. The actual problem is an audience without these characters on the keyboard. If I think of German umlauts, they'd be an accessibility problem outside German speaking countries.

@jensscherbl
Member Author

The actual problem is an audience without these characters on the keyboard. If I think of German umlauts, they'd be an accessibility problem outside German speaking countries.

Mhh. Good point.

Would it be possible to have generic (not language specific) transliterations?

Also, are we talking about transliteration or transcription here? Since transliterations still have special characters, I think it's actually the latter and we're not using the term correctly.
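
One possible reading of "generic" is Unicode decomposition: split each character into a base letter plus combining marks and drop the marks. The sketch below uses Python's standard `unicodedata` module; the function name `strip_diacritics` is made up for illustration.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("blödsinn"))  # blodsinn
print(strip_diacritics("garçon"))    # garcon
print(strip_diacritics("Straße"))    # Straße
```

This is deterministic and language independent, but it shows the limits too: "ö" becomes "o" rather than the German "oe", and "ß" has no decomposition at all, so it passes through unchanged.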

Edit

On the other hand, does every country have ASCII characters on their keyboards?

Update

Apparently they do.

All non-Latin computer keyboard layouts can also input Latin letters as well as the script of the language, for example, when typing in URLs or names. This may be done through a special key on the keyboard devoted to this task, or through some special combination of keys, or through software programs that do not interact with the keyboard much.

@jensscherbl
Member Author

The actual problem is an audience without these characters on the keyboard. If I think of German umlauts, they'd be an accessibility problem outside German speaking countries.

Thinking more about it, what about internationalized domain names? Same accessibility issues, I guess? And if this wasn't a concern for domain names (although I'm curious why not), is it really on us to worry about it for URL handles?

@michael-e
Member

Same accessibility issues, I guess?

Sure. How should an Englishman ever type "blödsinn.de" on his keyboard?

@jensscherbl
Member Author

How should an Englishman ever type "blödsinn.de" on his keyboard?

http://xn--bldsinn-b1a.de ;)
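
For reference, that ASCII form is the Punycode encoding used for internationalized domain names. A short sketch with Python's built-in `idna` codec reproduces it:

```python
domain = "blödsinn.de"

# Encode each label to its ASCII-compatible form ("xn--" prefix plus Punycode).
ascii_form = domain.encode("idna").decode("ascii")
print(ascii_form)  # xn--bldsinn-b1a.de

# Decoding restores the original Unicode domain.
round_trip = ascii_form.encode("ascii").decode("idna")
print(round_trip)  # blödsinn.de
```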

@michael-e
Member

Oh, I would love to tell him that on the phone, proudly presenting my new website. :-)

@jensscherbl
Member Author

As I said, I'm wondering why this wasn't a concern for domain names.

I mean, you can always argue to only use a German domain for a German audience. Same for handles. If a URL has German umlauts in it, chances are that it's not the URL you're looking for in the first place if you don't have German umlauts on your keyboard.

But what about the edge cases? What about people living abroad, for example?

@nilshoerrmann
Contributor

There is a good reason why international domain names are so successful ;)

I don't know of any good example of a German organisation using umlauts in their domain besides redirecting to the transliterated equivalent. While umlauts work fine inside German-speaking countries, you kind of exclude people from other countries (Michael's simple example proves that) – and the internet is global, not local.

What would you do if you needed a cedilla for a French domain or link (which is what we are talking about here)? Difficult on a non-French keyboard.

Transliterations are a common solution to this problem – in written form generally, not on the web exclusively.

@nilshoerrmann
Contributor

PS: By the way, even if a user was able to type umlauts, he would have to know them to correctly memorise a link. Think of the German eszett: people from other countries tend to think of it as "a strange b" – which is why letters from abroad are often addressed to "Beispielstrabe" instead of "Beispielstraße". So I really think umlauts in domain names or links are a bad idea.

@jensscherbl
Member Author

Transliterations are a common solution to this problem – in written form generally, not on the web exclusively.

Agreed, so let's get back to solving the problem. Keep in mind, I'm not trying to defend a particular idea, only brainstorming here.

Three possible solutions, imo.

  • Build true multilingual support right into the core (what you proposed).
  • Don't create language specific handles (what I proposed).
  • Don't automatically create handles at all (explained below).

Don't create language specific handles

Would it be possible to have generic (not language specific) transliterations?

Also, are we talking about transliteration or transcription here? Since transliterations still have special characters, I think it's actually the latter and we're not using the term correctly.

Don't automatically create handles at all

Developers could add a dedicated handle field to a section where handles are needed. The handle field would be a core input field. The core input field would get reflection and text formatter capabilities, so the handle field could grab content from another field, and transliterations (even language specific) could be added as text formatters. If an author is unhappy with the result, the handle could be edited.
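
A small Python sketch of that flow, under the assumption that a text formatter is simply a function from string to string. All names here (`build_handle`, `german_terms`, `slugify`) are invented for illustration and do not correspond to existing Symphony classes.

```python
import re
from typing import Callable, Iterable, Optional

Formatter = Callable[[str], str]


def german_terms(text: str) -> str:
    """Hypothetical language specific formatter, chosen by the developer."""
    return text.replace("&", "und").replace("ß", "ss")


def slugify(text: str) -> str:
    """Hypothetical generic formatter: lowercase ASCII words joined by hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_handle(
    source_value: str,
    formatters: Iterable[Formatter],
    manual_override: Optional[str] = None,
) -> str:
    """Reflect the source field through the formatter chain, unless edited by hand."""
    if manual_override:
        return manual_override
    value = source_value
    for formatter in formatters:
        value = formatter(value)
    return value


print(build_handle("Straße & Weg", [german_terms, slugify]))  # strasse-und-weg
```

Because the formatter chain is configured per field rather than taken from the author's backend language, the result stays the same no matter who saves the entry.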

Also, reflection and text formatter capabilities for the core input field would benefit other use cases as well.

Only thing I don't know how to handle yet is how to create handles for structural settings (section names, field names, data sources, events). I think always using English transliterations in the backend would be fine, though.

@nitriques
Member

I am all for

Build true multilingual support right into the core

@andrewminton
Contributor

+1 :)

