Help for translators with interpolation #3939

Open
gravitystorm opened this issue Feb 23, 2023 · 10 comments
Labels
i18n Internationalisation - related to translation into different languages

Comments

@gravitystorm
Collaborator

In a commit comment, @Nikerabbit writes:

These changes are really difficult for translators because they create patchwork messages. Especially without accompanying message documentation.

More docs:

https://translatewiki.net/wiki/Translating:Localisation_for_developers#Message_documentation
https://www.mediawiki.org/wiki/Help:System_message#Avoid_fragmented_or_'patchwork'_messages

To be fair, we have the same issue in MediaWiki. Often it can be addressed by using wikitext, but in some cases that is not possible. Best would be an inline syntax that handles context-dependent escaping. For example:

messages:
signup_message: Please ${elem|sign up}

code using fake PHP syntax as I am not familiar with Ruby:

t( 'signup_message', function ( $label ) { return makeButton( 'someurl', htmlspecialchars( $label ) ); } );

But in the meantime, I hope message doc can be added cross-linking the translatable messages to each other.

I'm opening this issue to explore what we can do to make it easier for our translators to handle translations with interpolations. The most common use of interpolation is for links, where the link text is used by a link helper and then interpolated into the rest of the sentence, for example:

en:
  diary_entries:
    show:
      login_to_leave_a_comment_html: '%{login_link} to leave a comment'
      login: Login

t(".login_to_leave_a_comment_html", :login_link => link_to(t(".login"), login_path(:referer => request.fullpath)))

I'm currently working on a series of PRs to remove all the raw html from the translations, and so this interpolation approach is also being used for html within sentences (bold, emphasis etc) like:

en:
  site:
    welcome:
      whats_on_the_map:
        off_the_map_html: |
          What it %{doesnt} include is opinionated data like ratings, historical or
          hypothetical features, and data from copyrighted sources. Unless you have special
          permission, don't copy from online or paper maps.
        doesnt: doesn't

t ".whats_on_the_map.off_the_map_html", :doesnt => tag.em(t(".whats_on_the_map.doesnt"))
@gravitystorm gravitystorm added the i18n Internationalisation - related to translation into different languages label Feb 23, 2023
@Nikerabbit
Contributor

In terms of severity, this is not the highest. With this approach translators can still control word order to create grammatical sentences (as opposed to hard-coded string concatenation). The challenge is to have them understand how to do it. Like I said, message docs can help to some extent: we can make links to the connected messages and show their current translation.

@tomhughes
Member

I didn't really understand @Nikerabbit's code, not because I don't know PHP (though I don't) but because it's not clear what the t function does with the callback - when does it call it? What does it pass as the argument?

As far as I know the Rails i18n stuff doesn't have anything like that anyway.

@gravitystorm
Collaborator Author

I think @Nikerabbit was trying to illustrate an approach something like this: https://github.com/iGEL/it
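
To make the callback idea concrete in Ruby, here is a minimal sketch of how an inline marker could hand its label to a callback. The `%{name:label}` marker syntax and the `interpolate_elements` and `escape_html` helpers are assumptions made up for illustration; none of this is part of Rails i18n, and it is not claimed to match the API of the gem linked above.

```ruby
HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# Small stand-in for an HTML escaping helper.
def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Hypothetical helper: expands %{name:label} markers by passing the
# escaped label to the callback registered under `name`. Everything
# outside a marker is escaped, so the message itself carries no markup.
def interpolate_elements(template, handlers)
  pieces = template.split(/(%\{\w+:[^}]*\})/)
  pieces.map do |piece|
    if (m = piece.match(/\A%\{(\w+):([^}]*)\}\z/))
      handlers.fetch(m[1].to_sym).call(escape_html(m[2]))
    else
      escape_html(piece)
    end
  end.join
end

message = "%{login_link:Login} to leave a comment"
html = interpolate_elements(
  message,
  login_link: ->(label) { %(<a href="/login">#{label}</a>) }
)
puts html
```

The callback runs once per marker, when the message is rendered, and receives the translated label as its only argument. The translator sees the whole sentence in one message and can move the marker freely.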

@tomhughes
Member

That looks nice - shame it seems to be a bit dormant.

@lonvia
Contributor

lonvia commented Feb 23, 2023

This approach could cause some headache for translators. Using another sentence from the new bold versions:

en:
  site:
    welcome:
      whats_on_the_map:
        on_the_map_html: |
          OpenStreetMap is a place for mapping things that are both %{real_and_current}. ...
        real_and_current: real and current

The German translation would go:

de:
  site:
    welcome:
      whats_on_the_map:
        on_the_map_html: |
          OpenStreetMap ist ein Ort, um Dinge zu erfassen, die sowohl %{real_and_current} sind ....
        real_and_current: echt als auch aktuell

But that's not really what you would want. You would want: OpenStreetMap ist ein Ort, um Dinge zu mappen die sowohl echt als auch aktuell sind. "sowohl ... als auch ..." is a standing expression here. In the German version the bold section would ideally be split. In addition, it might make a difference to the translation whether the interpolated thing is a link or something emphasized (e.g. word order in German can be changed to emphasize), so it would be good to know that.

Have you considered using a simplified markdown? Most people are familiar with the **...** notation nowadays and can follow it. You can also use a simplified version for links: [Login](login_link) to leave a comment.
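
As a rough sketch of what such a simplified markdown could look like on the Rails side, the Ruby below renders only `**bold**` and `[text](name)` and escapes everything else. The `render_inline` and `escape_html` helpers and the idea of resolving link names through a hash are assumptions for illustration, not an existing API. It is deliberately not a CommonMark implementation.

```ruby
HTML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;"
}.freeze

# Small stand-in for an HTML escaping helper.
def escape_html(text)
  text.gsub(/[&<>"']/, HTML_ESCAPES)
end

# Hypothetical renderer for a tiny markdown subset.
# Link targets are names looked up in `links`, so translators never
# write URLs or HTML themselves.
def render_inline(text, links = {})
  tokens = text.split(/(\*\*.+?\*\*|\[[^\]]+\]\(\w+\))/)
  tokens.map do |token|
    if (bold = token.match(/\A\*\*(.+)\*\*\z/))
      "<strong>#{escape_html(bold[1])}</strong>"
    elsif (link = token.match(/\A\[([^\]]+)\]\((\w+)\)\z/))
      url = links.fetch(link[2].to_sym)
      %(<a href="#{escape_html(url)}">#{escape_html(link[1])}</a>)
    else
      escape_html(token)
    end
  end.join
end

german = render_inline("die sowohl **echt** als auch **aktuell** sind")
login = render_inline("[Login](login_link) to leave a comment", login_link: "/login")
puts german
puts login
```

With this shape the German translator can split the bold section around "als auch" without needing a second message key.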

(NB: The English texts are really difficult to translate as is, because they use unusual and complicated sentence structures. Rewording into simpler English might solve some of the problems as well.)

@gravitystorm
Collaborator Author

@lonvia Thanks for the example and the perspective. We could look into introducing inline markdown, which moves away from the zero-markup approach I was aiming for, but doesn't open as many headaches as using full html markup.

I had a brief look to see whether there are any implications for non-Latin languages, and found an edge case (with CJK, punctuation and needing spaces after ** markers), so we'll need to check if there are other limitations in the CommonMark specification that might impact specific languages.

@gravitystorm
Collaborator Author

@Nikerabbit I'm not super expert with the translatewiki interface, so apologies if this is already implemented. Is there some way that, when translating a specific string, you can review the other strings in the same key hierarchy? For example, when translating en.site.welcome.whats_on_the_map.real_and_current you can check which other strings are in the en.site.welcome.whats_on_the_map prefix?

Alternatively, is there any way we can indicate in the en.yml file that two keys are related, and have that show up in the translatewiki interface?

I want to make sure that, for example, if someone gets the real_and_current fragment, that it's easy for them to find the full sentence that it will be included in.
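
One possible way to produce that cross-linking on our side would be a small script that walks the locale file and reports which fragment keys are interpolated into which sibling messages. This is only a sketch: the `related_keys` helper is hypothetical, the YAML is inlined here instead of being read from en.yml, and how the result would reach the translatewiki interface is left open.

```ruby
require "yaml"

SOURCE = <<~YML
  en:
    site:
      welcome:
        whats_on_the_map:
          on_the_map_html: "OpenStreetMap is a place for mapping things that are both %{real_and_current}."
          real_and_current: real and current
YML

# Walks a nested translation hash and collects [fragment_key, sentence_key]
# pairs wherever a message interpolates %{name} and a sibling key `name`
# exists at the same level.
def related_keys(node, prefix = [], found = [])
  node.each do |key, value|
    path = prefix + [key]
    if value.is_a?(Hash)
      related_keys(value, path, found)
    else
      value.to_s.scan(/%\{(\w+)\}/).flatten.each do |name|
        next unless node.key?(name)

        found << [(prefix + [name]).join("."), path.join(".")]
      end
    end
  end
  found
end

pairs = related_keys(YAML.safe_load(SOURCE))
pairs.each { |fragment, sentence| puts "#{fragment} is used in #{sentence}" }
```

This only finds pairs where the interpolation name matches the sibling key. A case like %{login_link} paired with the login key would need either a naming convention or an explicit annotation.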

@Nikerabbit
Contributor

@maro-21

maro-21 commented Apr 5, 2023

What's wrong with HTML tags in messages? Better than a single sentence divided into several fragments. Indecipherable by some translators, and single words translated without context.

@gravitystorm
Collaborator Author

What's wrong with HTML tags in messages? Better than a single sentence divided into several fragments. Indecipherable by some translators, and single words translated without context.

Yes, I'm totally aware that dividing these sentences up makes them harder to translate, and I'm sorry that this is the case. That's why I opened this issue. However, there are some fundamental problems, including security implications, with allowing html in the translations. I can explain in more detail at some point in the future.


5 participants