Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for internationalization #825

Merged
merged 13 commits into from
Nov 17, 2021
Merged

Fixes for internationalization #825

merged 13 commits into from
Nov 17, 2021

Conversation

gcampax
Copy link
Contributor

@gcampax gcampax commented Oct 25, 2021

This is on top of #823

It changes how we extract translations from manifests and how we merge them back. The changes are necessary to actually be able to translate correctly and with proper grammar.

It also implements canonical forms of enum values, which was a missing piece for translation.

@gcampax gcampax added bug Something isn't working i18n Translation / internationalization issues cleanup Clean ups, refactorings, and code improvements that don't fall under other categories labels Oct 25, 2021

export interface TranslatableString {
key : string;
context ?: string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of context?
Is it to distinguish between similar inputs that need to be translated differently depending on the function/ domain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. gettext either takes a single string, or a pair of string and context. documentation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ik I'm trying to understand where we need context. Do you have an example in "it.po" where msgid is the same but msgstr is different?

}
}

function makeChoice(choices : Replaceable[]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this to handle multiple canonical annotations?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we fold everything into a single string, so translators can add/remove variants.

msgid "{them [plural=other]|it [plural=one]}"
msgstr ""
"{loro [plural=other]|lui [plural=one,gender=m]|lei [plural=one,gender=f]}"

#: lib/utils/thingtalk/describe.ts:258 lib/utils/thingtalk/describe.ts:517
#: lib/utils/thingtalk/describe.ts:257 lib/utils/thingtalk/describe.ts:516
msgid "the ${name} [plural=name[plural]]"
msgstr ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what's happening here. Are you adding all possible variations with different gender/ plural forms? Are these two the only attributes that changes the output? How about weird casing of words in German that changes depending on their position in the sentence?
Also, a general question: does PO allows multiple translations for a single input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes you're supposed to add all the variations. In Italian gender/number are the only variations for nouns. In German, IIRC, case only matters for articles, but yeah you would have that too. German will need to disable lower casing because nouns are capitalized, but other than that it should work the same way.

PO as a file format specifies one translation per translatable string. In the way we use PO though, the translation has a specific format (genie template strings) and we can have multiple variants.

Translating canonical forms one by one doesn't make sense. Instead,
collapse all canonical forms for a given parameter/function into
a single translatable string, so translators can add/remove phrases
as they need.
Factor out the code to walk annotations into a common module,
and use (p)gettext consistently to merge the annotations. This
ensures that the translations are actually compatible with the
tools that process PO files.
Using the "enum_value" or "value_enum" keys of the #_[canonical]
annotation of a parameter that uses the type.

Because enums are used interchangeably on all parameters with the
same type, the same canonical must be used on all parameters,
and it is unspecified which one will be used otherwise.
"filter" is a keyword in ThingTalk so we can't actually write
`#_[canonical={ filter= ... }]`
Normalize locale from gettext (POSIX) to BGP 47 format
So we can use it from thingpedia-common-devices
@gcampax
Copy link
Contributor Author

gcampax commented Oct 27, 2021

Btw, one problem with this PR that I noticed from the automatic translation tests is that the automatic translation tries to translate the flags as well. Can we handle that in a smarter way somehow? @Mehrad0711

@Mehrad0711
Copy link
Member

What do you mean by flags?

@gcampax
Copy link
Contributor Author

gcampax commented Oct 27, 2021

The part in the square brackets sets different flags on each phrase, for pos, plural, gender, etc.

@Mehrad0711
Copy link
Member

But we're already doing that:

r"\[plural=.+?(?:\[plural\])?\]" # [plural=xxx]

In previous po files I didn't see "gender". There were a few "pos" but those are the lines we don't translate. If your changes are adding lines with new constructs or flags we should just include it in that regex.

@gcampax
Copy link
Contributor Author

gcampax commented Oct 27, 2021

pos is definitely one, and it's definitely going in a lot of strings that we do need to translate. But we should make this regex generic to any flag.

@Mehrad0711
Copy link
Member

Yeah feel free to fix it and check the rest. There might be other constructs I'm unaware of that we miss there.

@gcampax gcampax merged commit 312bb63 into master Nov 17, 2021
@gcampax gcampax deleted the wip/i18n-fixes branch November 17, 2021 04:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cleanup Clean ups, refactorings, and code improvements that don't fall under other categories i18n Translation / internationalization issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants