Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[i18n, feat] Add basic context (msgctxt) support #5234

Merged
merged 7 commits into from Aug 21, 2019

Conversation

@Frenzie
Copy link
Member

commented Aug 21, 2019

References #5232

Given an entry in the PO file like the following:

msgctxt "systemstat"
msgid "    Total"
msgstr "Totaal"

It can be addressed using:

_("systemstat", "    Total")

This allows to distinguish between separate instances of the same string, for example "Pages" meaning "Number of pages" and "Pages" meaning "Display of pages".

Extraction of this code pattern is not yet implemented by nightswatcher. xgettext didn't yet support Lua back in 2013 when all this was first added to the program, but now it does. Therefore it might make the most sense to replace the current Python extraction script with xgettext itself.

By default it only understands gettext.dgettext(), but that can be addressed by passing some extra command line arguments, for example:

xgettext -l lua -c --keyword=_:1c,2 *.lua
Frenzie added 2 commits Aug 21, 2019
[i18n, feat] Add basic dgettext (msgctxt) support
References #5232

Given an entry in the PO file like the following:

```
msgctxt "systemstat"
msgid "    Total"
msgstr "Totaal"
```

It can be addressed using:

```lua
_("systemstat", "    Total")
```

This allows to distinguish between separate instances of the same string, for example "Pages" meaning "Number of pages" and "Pages" meaning "Display of pages".

Extraction of this code pattern is not yet implemented by nightswatcher. xgettext didn't yet support Lua back in 2013 when all this was first added to the program, but now it does. Therefore it might make the most sense to replace the current Python extraction script with xgettext itself.

By default it only understands gettext.dgettext(), but that can be addressed by passing some extra command line arguments, for example:

```
xgettext -l lua -c --keyword=_:1c,2 *.lua
```

@Frenzie Frenzie added the enhancement label Aug 21, 2019

@Frenzie Frenzie added this to the 2019.09 milestone Aug 21, 2019

@Frenzie Frenzie requested review from NiLuJe, pazos and poire-z Aug 21, 2019

@poire-z

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

I know quite nothing about how gettext and our translation works, how we generate the po, etc...
So I don't know if these fields names and order are some gettext requirements or not.
Just wondering, if for readability and writability, it would be better to have to use:

_("    Total", "systemstat")
_("My english string that I want written here", "that alt context")
_("Pages", "rendermode")
_("Pages", "number")
@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Yes, the order is demanded by gettext.

https://www.mankier.com/3/gettext

Of course we don't actually use gettext, but doing something custom would be confusing at best. Also we could swap out this pure Lua subset of gettext for the real deal any time we wanted in case we wanted to use the fancier features (i.e., ngettext and friends).

@poire-z

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

OK, that order still looks a bit unintuitive to me :)
But may be better in your code to stick with the gettext term of "domain", rather than "context" then (dunno, but it became a bit, just a bit, more understandable when I read "domain" on that manpage...)

Hope it will be just to be used for disambiguation, and that we can stick with our classic _("stuff") for most things.
How will we know it's already used somewhere else, and that we need to think if we're in the same generic blank domain - or that we need to add one?

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Of course you could swap the order of the arguments in Lua but that'd still be confusing. ;-)

The PO file is extracted using this custom Python script: https://github.com/koreader/koreader-misc/blob/cc6a7a0d81a4f59dba237e57be7b259c00f7799d/gettext/lua_xgettext.py

It works fine, but in retrospect adding support for [[]] and such was a bit of a waste of time (if somewhat fun to play around with the regex). And the "upstream" source/inspiration for that script already moved to xgettext back in 2016 or so.

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

But may be better in your code to stick with the gettext term of "domain", rather than "context" then (dunno, but it became a bit, just a bit, more understandable when I read "domain" on that manpage...)

It uses both. dgettext suggests domain, msgctxt suggests context. ;-) I'll change it.

How will we know it's already used somewhere else, and that we need to think if we're in the same generic blank domain - or that we need to add one?

Something along these lines (but that's too inclusive):

find . -name '*.lua' >POTFILES
xgettext -l lua --keyword=_:1c,2 --from-code=utf-8 --files-from=POTFILES

Then either directly in the generated file or in a program like Poedit, Virtaal, Qt Linguist, etc. you can search for the string to see where it occurs:

Screenshot_20190821_171247

I prefer any of those over Transifex tbh but there's a lower barrier to entrance with Transifex.

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

By the by, here's what a domain looks like in Virtaal (it's not very clear):

Screenshot_20190821_172726

The way everything is displayed there it's hard to accidentally miss whitespace. I wonder how many translators accidentally did that wrong just like me. :-P

Note how it purposefully and misleadingly strips whitespace in the overview:

Screenshot_20190821_173235

@poire-z

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

So, in regular dev, I'll keep just using _("my wording") and wait for someone to open an issue and say there are wording conflicts in such language and we need to add a domain to one of the occurence, right? :)

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Most of the time I assume so, yes. There are some instances of single words that might be suspicious all by themselves though. I've noticed some instances in the past, but in Dutch/French/German they worked just well enough to be slightly awkward and not wrong.

Anyway, the full context for this PR stems from a question on Transifex:

https://www.transifex.com/houqp/koreader/translate/#zh_TW/koreader/110323460

cges30901

Does this mean the number of pages or the layout of pages?

Frenzie

@cges30901 Generally the number, but you're right that this lumping together of all strings that are the same (a leftover from the early days) is no longer a good idea. Because one of those mentioned in context is about page style/layout. I'll open a ticket about it on GitHub.

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Hm, I may have gotten domain and context confused in some way. It's not entirely clear to me how to use context in a normal gettext fashion.

glib defines a C_() macro.
https://developer.gnome.org/glib/2.28/glib-I18N.html#C-:CAPS

I seem to be using context effectively correctly, except for the part where I'm relating it to dgettext.

Edit: there seems to be a pgettext().

@Frenzie Frenzie changed the title [i18n, feat] Add basic dgettext (msgctxt) support [i18n, feat] Add basic context (msgctxt) support Aug 21, 2019

Frenzie added 2 commits Aug 21, 2019
Revert "rename context to domain"
This reverts commit 0e0e9bb.
@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

@poire-z Thanks for making me aware of my mistaking context and domain for the same thing.

I've reworked it so that you could do something like this:

local _ = require("gettext")
local C_ = _.pgettext

Or perhaps simply:

local _ = require("gettext")
local pgettext = _.pgettext

We should standardize on one of those, I suppose C_ is a good candidate.

@poire-z
Copy link
Contributor

left a comment

That became quite clearer :)

@poire-z

This comment has been minimized.

Copy link
Contributor

commented Aug 21, 2019

So:

_("Scroll") -- when I don't fear anything
C_("rendering", "Scroll") -- in case I fear Scroll might be used elsewhere
C_("rendering", "Pages")
_C("rendering", "Pages") -- looks nice(r) too

edit: but C_ explictely states the order :) Context, then _translatableString

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Indeed. Where C_ is whatever we pass in xgettext --keyword=C_:1c,2, because the only thing it understands without special instructions is the awfully unwieldy gettext.pgettext().

Having it as _() with two arguments would be ideal imo, but the usual implementation of the _() convenience function uses:

1 arg = gettext
2 args = dgettext
3 args = ngettext
4 args = dngettext

@Frenzie

This comment has been minimized.

Copy link
Member Author

commented Aug 21, 2019

Also, reference for contexts: https://www.gnu.org/software/gettext/manual/html_node/Contexts.html

A context naming example from that document:

pgettext ("Menu|", "File")
pgettext ("Menu|", "Printer")
pgettext ("Menu|File|", "Open")
pgettext ("Menu|File|", "New")
pgettext ("Menu|Printer|", "Select")
pgettext ("Menu|Printer|", "Open")
pgettext ("Menu|Printer|", "Connect")

@Frenzie Frenzie merged commit 0067c8f into koreader:master Aug 21, 2019

1 check passed

ci/circleci Your tests passed on CircleCI!
Details

@Frenzie Frenzie deleted the Frenzie:dgettext branch Aug 21, 2019

@Frenzie Frenzie added the i18n label Aug 26, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.