New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update_translation.sh fix #430

Closed
wants to merge 2 commits into
base: master
from

Conversation

Projects
None yet
5 participants
@crunchyjohn
Contributor

crunchyjohn commented Nov 19, 2015

Hello folks,

I would like to request submission of a patch for the postgresql-jdbc project. There seems to be a breakage in the translation package generator script. I can explain a bit in detail below.

Whenever a user runs update-translations.sh, it "looks like" it's working, but it really isn't. This script first generates a list of java files and stores them in translation.filelist, and then invokes "xgettext" on that file list in order to generate org/postgresql/translation/messages.pot. Unfortunately, this task throws an error that looks suspiciously like a warning:
xgettext: Non-ASCII string at ./ubenchmark/src/main/java/org/postgresql/benchmark/encoding/UTF8Encoding.java:50.
Sadly, it's really an error, so a new version of messages.pot never gets generated. Because of this, the translations only run on the current version of messages.pot that is checked into the source code base. This only modifies about three files. So, again, it "looks like" it's running, but it really isn't.

I'm suggesting the following changes:

  1. Change line 50 of ubenchmark/src/main/java/org/postgresql/benchmark/encoding/UTF8Encoding.java, which has some Cyrillic (Russian) characters in it, which are not a part of the standard ASCII character set. These three letters are the cause of the xgettext failure.

  2. Remove *.class files and messages.pot from version control in org/postgresql/translation. These are generated files and probably should never have been tracked with git.

  3. Add these four lines to .gitignore:
    org/postgresql/translation/.class
    org/postgresql/translation/
    .po~
    org/postgresql/translation/messages.pot
    translation.filelist

The .class files and messages.pot are from step 2. The .po~ files seem to be backup files of the .po files that are changed during a translation update, and translation.filelist is there from update_translations.sh just as a safety precaution.

  1. Add one line to update-translations.sh:
    rm org/postgresql/translation/messages.pot

By adding this line before messages.pot is generated, it will force the end of the script to fail in the event that there are problems generating messages.pot (the for loop will be null). This might not be the best solution, but it's adequate to catch such problems again in the future.

  1. Checked-in new versions of the .po scripts after the translation script now successfully runs. These changes are the bulk of the patch-file; if desired, I can leave them out so that the jdbc build engineers can do it on their own and to prevent confusion.

My only question is whether the changes to UTF8Encoding.java are harmful in any way, as I'm not familiar with this code (I'm more of a build engineer). Perhaps someone can take a look and make sure that it's safe before inclusion.

Please consider these for submission.

Thank you!

@vlsi

This comment has been minimized.

Member

vlsi commented Nov 20, 2015

which are not a part of the standard ASCII character set

Is there a way to setup xgettext to support UTF-8 encoding?

My only question is whether the changes to UTF8Encoding.java are harmful in any way

In fact non-ASCII characters are there for the purpose of testing performance of UTF-8 conversion of non-ASCII characters.
So, please use "Hello \u043c\u0438\u0440" instead of "Hello мир".
However, I would prefer to setup UTF-8.

@crunchyjohn, we have long-pending PR to migrate to maven builds: #322
It does change lots of file locations.
As far as I can see, pot files include file paths, thus migration to maven results in lots of merge conflicts on pot files.

Can you please clarify if file locations is really important part of pot files?
Is there a way to "regenerate pot files" instead of manual conflict resolution?

@panchenko

This comment has been minimized.

panchenko commented Nov 20, 2015

Russian characters are only in benchmarks, perhaps that folder could be just excluded from xgettext?

@vlsi

This comment has been minimized.

Member

vlsi commented Nov 20, 2015

Russian characters are only in benchmarks, perhaps that folder could be just excluded from xgettext?

One day I might want to add benchmark for xgettext :)

revert UTF8Encoding.java change and instead add a flag to update-tran…
…slation script; make sure build.xml on ant clean doesn't accidentally wipe out lib/.gitignore
@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Nov 20, 2015

Thanks everyone for the feedback.

I've made a new push, and can explain the changes.

  1. Instead of modifying ubenchmark/src/main/java/org/postgresql/benchmark/encoding/UTF8Encoding.java, I followed the advice of @vlsi and found more-correct answer, which is to simply add "--from-code=UTF-8" to xgettext. I've reverted the UTF8Encoding.java change.

  2. I found an issue on my local Centos 6 build where the "ant clean" at the top of update-translations.sh wiped out the lib/.gitignore file, so I modified build.xml to preserve it.

With respect to the discussion about the pot file impacting the maven branch, that is a good discussion to have. I haven't really taken a look at the content of that pull-request to see how update-translations.sh is invoked there. Perhaps it isn't. It is worth noting that the update-translations.sh script is not actually invoked at all by the ant commands as part of the current build process; it's an extra step that seems optional right now.

Since the messages.pot file itself is a derived object generated from the update-translations.sh script, that file itself wouldn't be in conflict anymore, with the exception that the maven branch would need to have that file git removed to match this pull request. If there are additional conflicts in the .po files due to java code moving around, then the maven-branch submitters can pick either my contents or theirs-- there is no need to merge anything. However, they will need to invoke update-translations.sh script afterwards. This will generate a new messages.pot file and subsequently new .po files with the correct/latest content. Since the messages.pot file content is generated by a liberal "find" command, it should find everything in the new hierarchy without issue and generate the correct content.

Does that make sense?

Thanks
-John

@davecramer

This comment has been minimized.

Member

davecramer commented Nov 21, 2015

On 20 November 2015 at 15:18, crunchyjohn notifications@github.com wrote:

Thanks everyone for the feedback.

I've made a new push, and can explain the changes.

  1. Instead of modifying
    ubenchmark/src/main/java/org/postgresql/benchmark/encoding/UTF8Encoding.java,
    I followed the advice of @vlsi https://github.com/vlsi and found
    more-correct answer, which is to simply add "--from-code=UTF-8" to
    xgettext. I've reverted the UTF8Encoding.java change.

  2. I found an issue on my local Centos 6 build where the "ant clean" at
    the top of update-translations.sh wiped out the lib/.gitignore file, so I
    modified build.xml to preserve it.

With respect to the discussion about the pot file impacting the maven
branch, that is a good discussion to have. I haven't really taken a look at
the content of that pull-request to see how update-translations.sh is
invoked there. Perhaps it isn't. It is worth noting that the
update-translations.sh script is not actually invoked at all by the ant
commands as part of the current build process; it's an extra step that
seems optional right now.

It is not part of the ant build as it rarely changes

Since the messages.pot file itself is a derived object generated from the
update-translations.sh script, that file itself wouldn't be in conflict
anymore, with the exception that the maven branch would need to have that
file git removed to match this pull request. If there are additional
conflicts in the .po files due to java code moving around, then the
maven-branch submitters can pick either my contents or theirs-- there is no
need to merge anything. However, they will need to invoke
update-translations.sh script afterwards. This will generate a new
messages.pot file and subsequently new .po files with the correct/latest
content. Since the messages.pot file content is generated by a liberal
"find" command, it should find everything in the new hierarchy without
issue and generate the correct content.

Does that make sense?

Yes, it makes sense. I apologize for not following this thread but what
problem are we trying to solve here ?

Thanks
-John


Reply to this email directly or view it on GitHub
#430 (comment).

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Nov 21, 2015

Hi @davecramer,

In short, update-translations.sh is broken, and has been since around June when some non-ASCII characters were introduced to a java file-- one of the tools invoked by the script can't handle the characters, so I added a flag to allow it to handle them. Without this change, the script throws a subtle warning but keeps going. Since the script failed to generate messages.pot when invoked (which it uses for the translation .po files), it instead uses a stale messages.pot that's part of the git repo and out-of-date for generating .po files.

My changes fix the script to allow it to handle the non-ASCII characters, check in the generated .po files from the script's invocation, and do some cleanup (removing .class files from the git repository).

Summary: translation files seem to have been broken since June; this commit fixes them and makes them up-to-date.

-John

@davecramer

This comment has been minimized.

Member

davecramer commented Nov 21, 2015

Hi @crunchyjohn,

Awesome, thanks!

@davecramer

This comment has been minimized.

Member

davecramer commented Nov 22, 2015

Hi @crunchyjohn,

Why all the changes in the .po files ?

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Nov 22, 2015

Hi @davecramer,

Good question. After fixing update-translations.sh, I ran it myself, which modified the .po files with all of the outstanding changes that have been missing since June (which looks to be line number adjustments and some translations). If it makes more sense for the project builder to run the script on their own after merging, I'd be happy to remove those from the build request; I just figured I'd save that person a step. I did not modify the files by hand in any way; these changes are just the result of running the script that I fixed. Hope that helps!

Regards,
-John

@davecramer

This comment has been minimized.

Member

davecramer commented Nov 25, 2015

@crunchyjohn

IIRC the reason the .class files are included is because not everyone has the tools on their machines to produce them. Specifically xgettext, msgmerge, and msgfmt.

I did run update-translations.sh with your changes for the 1206 build and it ran successfully; thanks!

I'm open to debate about the .class files though

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Nov 27, 2015

@davecramer To help me understand a bit, what's the worst thing that happens if the class files are not checked in, and a user does not have the proper tools to generate them? Are they just not able to trace certain symbols in intelliJ? How often would they need to access these symbols, since they're not part of the main build? I'm more of a build guy than a java developer, so I'd like to fully understand the impact of leaving them out versus checking them in.

-John

@davecramer

This comment has been minimized.

Member

davecramer commented Nov 27, 2015

John,

The only impact I can see is that if you didn't have the class files and
couldn't generate them, then you wouldn't be able to build the jars.

What issues do you see including them? They rarely change.

Dave Cramer

On 27 November 2015 at 12:20, crunchyjohn notifications@github.com wrote:

@davecramer https://github.com/davecramer To help me understand a bit,
what's the worst thing that happens if the class files are not checked in,
and a user does not have the proper tools to generate them? Are they just
not able to trace certain symbols in intelliJ? How often would they need to
access these symbols, since they're not part of the main build? I'm more of
a build guy than a java developer, so I'd like to fully understand the
impact of leaving them out versus checking them in.

-John


Reply to this email directly or view it on GitHub
#430 (comment).

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Nov 27, 2015

Hi @davecramer

I think I understand the problem now. Since these class files aren't generally built by the consumers of this project, every person building this project with these changes will be missing this content in their generated jar-files. It all comes down to a simple decision-- either making the translation update script become mandatory for all users and having it become part of the build process itself (which requires installation of the extra tools, etc), or checking in the class files, provided that the project builder runs the script periodically. I can see why it isn't necessarily desirable to force installation of those special tools for everyone, as the tools are not necessarily available on all platforms. In the end, I guess it's the build engineer's responsibility to make sure the class files are up-to-date. And, I think that decision makes sense.

With that being said, I'd rather have the responsibility for generating the class files that are checked in fall to the project builder instead of myself. It's my first commit to the project, and I don't have an established relationship with the folks here. I could easily see somebody making the case that my .class files could potentially be harmful and a security risk, as they are hard to verify if they were generated from that same source code.

For this specific case, would it be possible to:

  1. Merge my changes in, and then
  2. Have the build engineer remove the .class line from .gitignore, generate the files, and check them in (along with the .gitignore change)?

I feel like that might be a good compromise that lets this responsibility stay in the build engineer's hand instead of a random contributor.

Regards
-John

@lordnelson

This comment has been minimized.

Contributor

lordnelson commented Nov 27, 2015

Once we move to Maven we could have a build profile which invokes these
tools. In fact I've already done this in my local branch and once the Maven
PR is embedded I'll open a separate one for this.

On Fri, 27 Nov 2015, 19:06 crunchyjohn notifications@github.com wrote:

Hi @davecramer https://github.com/davecramer

I think I understand the problem now. Since these class files aren't
generally built by the consumers of this project, every person building
this project with these changes will be missing this content in their
generated jar-files. It all comes down to a simple decision-- either making
the translation update script become mandatory for all users and having it
become part of the build process itself (which requires installation of the
extra tools, etc), or checking in the class files, provided that the
project builder runs the script periodically. I can see why it isn't
necessarily desirable to force installation of those special tools for
everyone, as the tools are not necessarily available on all platforms. In
the end, I guess it's the build engineer's responsibility to make sure the
class files are up-to-date. And, I think that decision makes sense.

With that being said, I'd rather have the responsibility for generating
the class files that are checked in fall to the project builder instead of
myself. It's my first commit to the project, and I don't have an
established relationship with the folks here. I could easily see somebody
making the case that my .class files could potentially be harmful and a
security risk, as they are hard to verify if they were generated from that
same source code.

For this specific case, would it be possible to:

  1. Merge my changes in, and then
  2. Have the build engineer remove the .class line from .gitignore,
    generate the files, and check them in (along with the .gitignore change)?

I feel like that might be a good compromise that lets this responsibility
stay in the build engineer's hand instead of a random contributor.

Regards
-John


Reply to this email directly or view it on GitHub
#430 (comment).

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Dec 7, 2015

Hello folks,

Is there anything else I need to do with regard to this request?

-John

@davecramer

This comment has been minimized.

Member

davecramer commented Dec 7, 2015

I want to wait til the maven PR is done. Plus I have very limited internet
for the next week

Dave Cramer

On 7 December 2015 at 09:43, crunchyjohn notifications@github.com wrote:

Hello folks,

Is there anything else I need to do with regard to this request?

-John


Reply to this email directly or view it on GitHub
#430 (comment).

@vlsi vlsi force-pushed the pgjdbc:master branch 6 times, most recently from fbedb9f to 5d43712 Dec 29, 2015

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Dec 29, 2015

Hey folks,

Good job on getting the maven PR in! Now that that's addressed, I wanted to spend a little bit of time going over this PR and seeing what else needs to be done. It appears that the update-translations.sh script has already been patched with the --from-code=UTF-8 fix, so the main issue is gone. However, I did consider that it might make sense to try to mavenize this tool at this point in time.

I took a look today at migrating this tool into the maven infrastructure by method of a profile. I have some groundwork laid where a user would basically run:
mvn -Ptranslate clean package
and instead of running the shell script, the code would invoke the gettext-maven-plugin plugin.
A user still has to install the xgettext tool itself in order for things to work, but this seems like a possible convenience for the build engineers to save them a step. Does it make sense for me to take a look at this, or would it be a waste of time?

Thanks,
-John

@vlsi

This comment has been minimized.

Member

vlsi commented Dec 29, 2015

@crunchyjohn , I think update_translations.sh is good enough. However -Ptranslate is also good.

I wonder if we can add Travis check for "missing translations" or "invalid translations" or something like that. Is there an easy way to do that? I do not think it makes sense to spend lots of time for that, but it would be nice if Travis would report invalid translations.

@crunchyjohn

This comment has been minimized.

Contributor

crunchyjohn commented Jan 7, 2016

This can be closed, as its refactor went in with #479

@crunchyjohn crunchyjohn closed this Jan 7, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment