Drupal gettext #4235

Closed
wants to merge 6 commits into
from
@clemens-tolboom

(This patch is mostly intended as a discussion)

This is a contrib back from Drupal to make PO file parsing better and hopefully use the translation component within Drupal into the near future.

It delivers:

Reading PO Headers
Multi-line source / translation

See more @ clemens-tolboom@69b7d25#L1L56

@clemens-tolboom

(This patch is mostly intended as a discussion)

This is a contrib back from Drupal to make PO file parsing better and hopefully use the translation component within Drupal into the near future.

It delivers:

  • Reading PO Headers
  • Multi-line source / translation

It currently has coding style 'problems' as part of the code is copied verbatim from Drupal to make sure those keep in sync for now.

  • this will be fixed within the next commit as that is bogus :p

Problems from a Drupal perspective are:

  • loading large PO files (or slow servers) are handled by a browser oriented batch interface. Can we do that within the Symfony framework?
  • The loaded PO files (or messages) are editable on a Drupal site. I'm not sure how a Symfony workflow could support this right now. This patch has an 'almost verbatim' copy of http://drupal.org/node/1189184 which is a patch in progress.
  • so code style in not ok (but that code needs to be kept in sync with the progress on Drupal patch)

The POLoader produces a parser log of errors for which there is no way to read.

  • Is that only Gettext PO problem or also ie MO and other resources?

It still needs a test for parsing a header successfully.

The header info is not used anywhere within the system.

  • That sounds bad as the header contains credentials about the translation team.
@clemens-tolboom

I'm not sure where to test the multiline parsing.

  • adding a Translator gives Class 'Symfony\Component\Translation\Tests\Loader\Translator' not found.

I want to make sure the texts are joined correctly by the PoFileLoader.

@travisbot

This pull request passes (merged 69b7d25 into e54f4e4).

@stof
Symfony member

@clemens-tolboom this class indeed does not exist. You probably forgot a use statement

@stof
Symfony member

I will let someone else review it. I don't know gettext enough to know if the implementation is right, and my brain's parser gives me too much warnings when reading the code...

@stloyd

@clemens-tolboom First of all you should follow Symfony2 CS.

@Tobion
Symfony member

@stloyd he wrote that the CS is copied from Drupal and will be fixed later. So no need to tell him again... (especially since its intended as RFC)

@clemens-tolboom

@stof : how many symfony people use PO files? Drupal only uses PO so is this worth the effort?!?

@stloyd & @Tobion : I somehow expected travis to kick my CS ass ... Netbeans does not support two PHP CS formats at once afaik :(

Is the problem about batch loading PO file solvable within the symfony framework?

@beberlei

I don't want to crush the party, but this code is GPL licensed.

@fabpot
Symfony member

As @beberlei mentioned, the biggest problem is the license of the code... Symfony being licensed under the MIT license.

@stof
Symfony member

@clemens-tolboom Travis is about running tests, not fixing CS.

@goba

Looks like Symfony already supported multiline string parsing (storing the string value in the buffer when reading the string). So that does not seem to be new with this (mentioned above as "Multi-line source / translation").

@ghost

I would suspect anything ported from Drupal might be hopelessly entangled with the GPL and be uncovertable to another license.

@goba

drak: interesting observation. The code posted above is not yet in Drupal and never was. It is from a work in progress patch from the Drupal issue queues. It is based on code from Drupal 7 but is mostly rewritten from the ground up (I think Clemens could vouch for a better estimate on how much different it is). The Drupal 7 Gettext parser is hopelessly ugly (even though it does work well :).

@stof
Symfony member

@clemens-tolboom how much of this code was already in Drupal previously (and so subject to the GPL) ? For the new code, licensing is not an issue as it can be submitted as MIT

@ghost
@clemens-tolboom

Wow ... thanks for the quick feedback :)

And darn ... you guys are right about license (why didn't I think of that)

Doing a diff on prepared files (locale.inc from Drupal 7 and the PoFileLoader.php) to only check the readLine code the following stats show we have license problems.

Lines that differ ... mostly by having a $this-> in front of state variables
$ diff --suppress-common-lines -F 100 -y -E -b PoFileLoader.php locale-d7.inc | wc -l
137

where total lines counts are
264 PoFileLoader.php
265 locale-d7.inc

Almost 50% are common. Most of the rest has a $this-> and some lines are new related to Batch/state management.

I think I should retract this pull request right?

Sorry for wasting your time :(

@lsmith77

can you figure out the original contributors and ask them for permission?

@goba

This part of the Drupal codebase has a history of about 9 years. It would likely be impossible to track all the people down who wrote it originally and then fixed bugs and added new features to them.

@lsmith77

I see .. in that case there are 2 options:
1) you simply leave this file inside the Drupal code base, but try to make it easy for users to grab standalone
2) you write the code from scratch, while using the original code as inspiration .. this is perfectly legal as we are only talking about copyright and not patents here.

@Crell

We're trying to do option 1 already as part of internal cleanup. Long road, but hopefully. :-) That still keeps it GPL, though.

If clemens is up for option 2, I would have no problem backing us using that in place of the existing code provided it is in fact technically better than what we have now. We'd have to work out how we deal with "Drupal-created 3rd party dependencies", though. That's a discussion better had over on Drupal.org than here.

@clemens-tolboom

@Crell If we go for option 2 we from Drupal only have to add another symfony component 'Translation' to the plate right?

@chx
chx commented May 9, 2012

Gabor, let me challenge that. 9 years or not, we have git blame and this code , as far as I can remember is not a particularly often changed. The likely culprits are you and Gerhard anyways and both of you are still within easy reach. Someone needs to run a bunch of git blames to figure this out. I think it's doable.

@beberlei

git blame is not enough, whitespace changes or coding standard adjustments move the blame away from the copyright holders of the code. You have to walk the history of files back, plus handling potential moves of the file or parts of the code. Its really not that easy :)

@lsmith77

i just clicked through https://github.com/drupal/drupal/commits/7.x/includes/locale.inc?page=8 .. seems like mostly the same names .. though many commits were done on behalf of someone else .. but as @chx points out .. its only a portion of the file ..

@chx
chx commented May 9, 2012

It really is not that hard. We need someone with more time than me to verify this completely but I think only 57c9a13e, 30c2e89c, 9841e29a, 941139bf are responsible. At least that's when the import one was introduced. This suggests the same names as I suspected originally plus kkaefer/timcn.

@goba

Well, for that, we should assume that the commit message surely includes everybody who contributed to the patches involved. Is that legally possible to assume? Eg. how do you parse "New locale module thanks to Gerhard, Goba, Marco, Kristjan and others." for the complete list of people involved? From http://drupalcode.org/project/drupal.git/commit/1831e1b690f02d7f551d38ef88a0ba200f786497 Then how do you dig up the discussion history from 8 years ago (2004) on that commit to figure out who were the "others"?

@chx

Do we have code from that commit? If yes, that's unfortunate.

@goba

Well, you might be surprised how little changed in the actual .po file / plural formula parsing code since then. For example, the much dreaded eval() for formula evaluation was introduced with this 2004 patch and is still used in the 2011 released Drupal 7. But this is just the big early commit. There are more than likely intermediate commits with "and others" mentioned from times where we did not have an issue queue based process to track people down retroactively.

@clemens-tolboom

Puzzled by the multiline problem and other I did another #4249

@ #drupal-i18n we discussed our effort and decided to follow two tracks of which another is to try to use as much as possible of the Translation component:

I'm not sure whether we should close this pull for now. I'm willing to write a new parser when a sponsor is available.

@davidpersson

As one of the original authors of the currently used implementation. I'd also be interested in any developments regarding plural rule parsing and evaluation without using evil eval(). I remember CakePHP wasn't using eval for that task. You may also want to look if anything has changed/improved since you last looked on the implementation.

https://github.com/UnionOfRAD/lithium/blob/master/g11n/catalog/adapter/Gettext.php
https://github.com/UnionOfRAD/lithium/blob/master/tests/cases/g11n/catalog/adapter/GettextTest.php

Full disclosure: I'm part of the Lithium team and CakePHP alumni.

However: I just discovered you're using the implementation and didn't know it before - hope it does a good job.

@chx

http://drupal.org/node/1273968 rips out the eval from Drupal. This code, however, was written by me, and as such, unless someone convinces me otherwise, is GPL.

@davidpersson

Than it can't be used in non-GPL projects :(
But nice work!

@clemens-tolboom

I retract this pull request as it is be replaced by #4249

@clemens-tolboom

From a Symfony perspective we have PluralizationRules which can be extended by ones own pluralization rule.

I hope @chx can interface with that for Drupal as my Drupal patch http://drupal.org/node/1570346 (needs lots of work still) is using MessageSelector already.

@davidpersson tnx for the pointers.

@ghost

@davidpersson - there is nothing evil about eval in this particular circumstance: please see #2630

@ghost
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment