[2.2] [Translation] Po/Mo file readers do not parse plurization rules #2630

Closed
ghost opened this Issue Nov 12, 2011 · 31 comments
@ghost

I've been experimenting with the new PO/MO file readers that @stealth35 added recently. There is one very unfortunate thing missing from this however, is that they don't parse the plurization rules. In Gettext, these are part of the header so the framework doesn't need to know hard coded values as is required in Symfony2. In order for these readers to be useful, they really need to parse this information. It will require an extra property in the MessageCatalog class I guess.

@stealth35

@drak I just add the Dumpers not Loaders, The loaders has very poors, but I work on it, see #2412

@ghost

bump :-)

@fabpot fabpot added a commit that referenced this issue Jan 4, 2012
@fabpot fabpot merged branch stealth35/fix_2630 (PR #3023)
Commits
-------

f4890c2 [Translation] Po/MoFileLoader parse plurization rules

Discussion
----------

[Translation] Po/MoFileLoader parse plurization rules

Bug fix: yes
Feature addition: no
Backwards compatibility break: no
Symfony2 tests pass: [![Build Status](https://secure.travis-ci.org/stealth35/symfony.png?branch=fix_2630)](http://travis-ci.org/stealth35/symfony)Fixes the following tickets: #2630
Todo: Not happy with the pluralize style

```
msgid "foo"
msgid_plural "foos"
msgstr[0] "bar"
msgstr[1] "bars"
```
to

```
array (
    'foo' => 'bar',
    'foos' => '{0} bar|{1} bars'
)
```
54a8b58
@ghost

@fabpot @stealth35 I am not sure what the associated PR does, but I am not sure it address what I talked about, that is, that .PO and .MO files have a header which gives the formula of the language puralisation. e.g.

https://github.com/zikula-communities/german/blob/master/locale/de/LC_MESSAGES/zikula.po#L14

"Plural-Forms: nplurals=2; plural=(n != 1);\n"

This formula is required to understand whether to use singular or plural rules and also which plural translation to use (since there can be multiple). The formula needs to be parsed somehow, e.g.

https://github.com/zikula/core/blob/master/src/lib/i18n/ZMO.php#L487 although this implementation cannot be used since it's polluted by the GPL.

Parsing the plural formula decouples the Translation component from having to know about pluralisation rules entirely (for gettext).

@mvrhov

drak, but other translation formats don't support this so afterall this has to be hard coded into the symfony. Hell even the native Delphi gettext mo file reader implementation I'm using in my Delphi apps has this information hard coded.

@ghost

@mvrhov For other formats yes, but for the Gettext implementation, it should read the headers because that is part of the Gettext standard - we can't half implement a standard - I don't think being half-baked here does Symfony2 any favours.

There are dozens of pluralization rules and hundred and hundreds of locales - there is simply no way to be complete and .PO and .MO must have a pluralisation header, so it must be used. Symfony2 has a list of pluralisation rules for the other formats because I assume that makes up for the lack of pluralisation standards in those formats, but Gettext is the most mature and the most widely implemented standard of them all so implementing it fully is important.

@stealth35

@drak I got you, left open your issue (I waited for you for the #3023 TODO, but @fabpot merge)

@ghost

@stealth35 - ah, I see. You need to keep [WIP] in the pull request title to tell @fabpot it's not ready for merge, but no problem, we wait for your next PR, I was just worried :)

Thanks!

@ghost

@stealth35 - hey, any update on this PR?

@stealth35

hi @drak, it's an hard thing, the easy way it's to use eval, but I see a security hole here, so actually I don't know how to handle this

@ghost

@stealth35 I am not sure eval is a security hole since the .mo file should not be writable and frankly if someone has access to the FS, you have more to worry about than injecting a header to a mo file. For this reason I'm not sure we really need to lex the nplurals rule.

@sstok

Using eval() is only a problem when its done in production (as it may been disabled).
About the security concern, is there a way to-do some validation (like only allowing normal charters for example)?

@stof
Symfony member

@drak what is still missing here ?

@ghost

@stof - Gettext .mo files have the pluralisation formula in the header. It must be read by the mo reader and used when making plural calculations. If @stealth35 has no time for this, I can take up the task no problem.

@fabpot
Symfony member

Anyone working on this? That would be better if we can ship that with Symfony 2.1.

@jmccaffrey42

CakePHP has this functionality I believe, I think it would be trivial to import it from there. Would anyone object to me doing that and creating a pull request?

@jmccaffrey42

It seems the way they implemented it is incredibly simple, yet because the landscape of languages is finite, pretty effective I would immagine.

https://github.com/cakephp/cakephp/blob/2.1/lib/Cake/I18n/I18n.php#L254

@sstok

Well CakePHP is released under MIT so thats great news! And it does not use eval() 👍

@ghost

I the cake code is missing a case for nplurals=6 (Arabic).

I don't really like that it's guessing - the plural rule is precise: but if it means a quick solution for this ticket, then OK.

I'd like to reiterate that inline evaluation is perfectly ok in this case and makes plurals 100% accurate - there is no security vector that wouldn't exist anyway if the filesystem was writable and the Plural-Forms: header could and should be validated via regex anyway once at parse-time.

So long as any implementation is clean-room, we are OK because the PHP Gettext authors are extremely difficult people to deal with (may people have tried to get them to change the license to LGPL but they refused time and again).

For reference this is a list of rules that need to be covered: http://translate.sourceforge.net/wiki/l10n/pluralforms - you a see it's pretty complex and without real lexing, inline evaluation is really the only way to accurately do this.

@jmccaffrey42

Thanks, Ill go and mess around with the eval path and see what I come up with. That website is great BTW, we can build tests around all those cases to ensure we are supporting the right stuff.

@ghost Unknown referenced this issue May 14, 2012
Closed

Drupal gettext #4235

@clemens-tolboom

In #4249 I added support for using Gettext Headers as just strings.

In the mentioned #3023 by @stealth35 I don't see anything about parsing headers ... the implemented plural parsing is using interval and not indexed plural. (I hope I understood that correctly).

I fixed plural loading/dumping in #4249.

@blaugueux

Any news about this PR for 2.1 ?

@fabpot
Symfony member

@drak any update?

@stof
Symfony member

@drak ping

@ghost
@blaugueux

@drak So the PO/MO translation system shall not be used before 2.2 ?

@ghost

@blaugueux - Gettext implementation is usable, but understanding that the current implementations do not respect the pluralisation headers or support contexts. Plurals are taken from a hard coded list. I'll be making PRs for 2.2 soon.

@mmucklo mmucklo pushed a commit that referenced this issue May 23, 2013
@fabpot fabpot merged branch stealth35/fix_2630 (PR #3023)
Commits
-------

f4890c2 [Translation] Po/MoFileLoader parse plurization rules

Discussion
----------

[Translation] Po/MoFileLoader parse plurization rules

Bug fix: yes
Feature addition: no
Backwards compatibility break: no
Symfony2 tests pass: [![Build Status](https://secure.travis-ci.org/stealth35/symfony.png?branch=fix_2630)](http://travis-ci.org/stealth35/symfony)Fixes the following tickets: #2630
Todo: Not happy with the pluralize style

```
msgid "foo"
msgid_plural "foos"
msgstr[0] "bar"
msgstr[1] "bars"
```
to

```
array (
    'foo' => 'bar',
    'foos' => '{0} bar|{1} bars'
)
```
c93bea6
@stof
Symfony member

@drak do you still plan to work on it sometimes ?

@jiru

If that’s any interest to you, I implemented a Gettext pluralization rules parser as part of a PR to CakePHP: cakephp/cakephp#5348. They ended up keeping hardcoded rules though. You can reuse my code freely if you want.

@javiereguiluz
Symfony member

@aitboudad do you think that this feature is needed and doable without complicating code much? Or do you think we could close it as "won't fix"? Thanks.

@aitboudad
Symfony member

@javiereguiluz Well I think this require many changes, the way we handle pluralization is a bit different than this one which need to define the pluralization rules per files so I'm ok to mark it as wont fix

@javiereguiluz
Symfony member

Closing it as "won't fix" as per @aitboudad's comments. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment