update_translation.sh fix #430
Is there a way to set up xgettext to support UTF-8 encoding?
In fact, the non-ASCII characters are there to test the performance of UTF-8 conversion of non-ASCII characters. @crunchyjohn, we have a long-pending PR to migrate to Maven builds: #322. Can you please clarify whether file locations are really an important part of .pot files?
The Russian characters appear only in the benchmarks. Perhaps that folder could simply be excluded from xgettext?
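A minimal sketch of that exclusion, assuming the script builds its file list with find (the thread only says that a liberal "find" command feeds translation.filelist). The helper name build_filelist and the exact find predicates are hypothetical and are not taken from the actual script.

```shell
#!/bin/sh
# Hypothetical helper: list Java sources for xgettext, skipping the
# benchmark module that contains the non-ASCII test strings.
build_filelist() {
    src_root=$1   # directory to scan
    out_file=$2   # where to write the list, e.g. translation.filelist
    ( cd "$src_root" && find . -name '*.java' ! -path './ubenchmark/*' | sort ) > "$out_file"
}

# Example call (commented out so the sketch has no side effects):
# build_filelist . translation.filelist
```

The trade-off is that strings in the excluded folder can never be picked up for translation, which is presumably acceptable for benchmark code.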
One day I might want to add a benchmark for xgettext :)
…slation script; make sure build.xml on ant clean doesn't accidentally wipe out lib/.gitignore
Thanks everyone for the feedback. I've made a new push, and can explain the changes.
With respect to the discussion about the .pot file affecting the Maven branch, that is a good discussion to have. I haven't yet looked at that pull request to see how update-translations.sh is invoked there; perhaps it isn't. Note that update-translations.sh is not invoked by the ant commands in the current build process at all; it's an extra step that seems optional right now. Since messages.pot is a derived file generated by update-translations.sh, it would no longer be in conflict, except that the Maven branch would need to git-remove that file to match this pull request. If there are additional conflicts in the .po files because Java code has moved around, the Maven-branch submitters can pick either my contents or theirs; there is no need to merge anything. However, they will need to run update-translations.sh afterwards, which will generate a new messages.pot and then new .po files with the correct, latest content. Since the content of messages.pot is generated by a liberal "find" command, it should find everything in the new hierarchy without issue. Does that make sense? Thanks
On 20 November 2015 at 15:18, crunchyjohn notifications@github.com wrote:
Yes, it makes sense. I apologize for not following this thread but what
Hi @davecramer, In short, update-translations.sh is broken, and has been since around June, when some non-ASCII characters were introduced into a Java file. One of the tools invoked by the script can't handle those characters, so I added a flag that allows it to. Without this change, the script prints a subtle warning but keeps going. Because it fails to generate messages.pot (which it uses to update the translation .po files), it falls back to the stale, out-of-date messages.pot checked into the git repo. My changes fix the script so it handles the non-ASCII characters, check in the .po files generated by running it, and do some cleanup (removing .class files from the git repository). Summary: the translation files seem to have been broken since June; this commit fixes them and brings them up to date. -John
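For illustration, here is a sketch of an xgettext call with the source encoding declared. The --from-code=UTF-8 flag is the one named later in this thread; the other options are standard GNU xgettext options, but whether the real script uses them in this form is an assumption. The function name extract_messages is hypothetical, and any keyword options the real script passes are omitted.

```shell
#!/bin/sh
# Sketch: extract messages from the Java sources listed in a file,
# telling xgettext that the sources are UTF-8 encoded.
extract_messages() {
    filelist=$1   # e.g. translation.filelist
    pot_file=$2   # e.g. org/postgresql/translation/messages.pot
    xgettext --from-code=UTF-8 --language=Java \
        --files-from="$filelist" --output="$pot_file"
}

# Example call (commented out; requires GNU gettext to be installed):
# extract_messages translation.filelist org/postgresql/translation/messages.pot
```

Without --from-code, xgettext assumes ASCII input, which is why a single non-ASCII string literal is enough to stop the template from being written.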
Hi @crunchyjohn, Awesome, thanks!
Hi @crunchyjohn, Why all the changes in the .po files?
Hi @davecramer, Good question. After fixing update-translations.sh, I ran it myself, which updated the .po files with all of the outstanding changes that have been missing since June (these look like line-number adjustments and some translations). If it makes more sense for the project builder to run the script themselves after merging, I'd be happy to remove those files from the pull request; I just figured I'd save that person a step. I did not modify the files by hand in any way; these changes are purely the result of running the script that I fixed. Hope that helps! Regards,
IIRC the reason the .class files are included is that not everyone has the tools on their machines to produce them, specifically xgettext, msgmerge, and msgfmt. I did run update-translations.sh with your changes for the 1206 build and it ran successfully; thanks! I'm open to debate about the .class files, though.
@davecramer To help me understand a bit, what's the worst thing that happens if the class files are not checked in and a user does not have the proper tools to generate them? Are they just unable to trace certain symbols in IntelliJ? How often would they need to access these symbols, since they're not part of the main build? I'm more of a build guy than a Java developer, so I'd like to fully understand the impact of leaving them out versus checking them in. -John
John, The only impact I can see is that if you didn't have the class files and What issues do you see including them? They rarely change. Dave Cramer On 27 November 2015 at 12:20, crunchyjohn notifications@github.com wrote:
Hi @davecramer, I think I understand the problem now. Since these class files aren't generally built by the consumers of this project, every person building the project with these changes would be missing this content in their generated jar files. It comes down to a simple decision: either make the translation update script mandatory for all users and part of the build process itself (which requires installing the extra tools), or check in the class files, provided that the project builder runs the script periodically. I can see why it isn't desirable to force everyone to install those special tools, as they are not necessarily available on all platforms. In the end, I guess it's the build engineer's responsibility to make sure the class files are up to date, and I think that decision makes sense. That said, I'd rather have the responsibility for generating the checked-in class files fall to the project builder instead of myself. It's my first commit to the project, and I don't have an established relationship with the folks here. I could easily see somebody making the case that my .class files could be harmful and a security risk, as it is hard to verify that they were generated from that same source code. For this specific case, would it be possible to:
I feel like that might be a good compromise that keeps this responsibility in the build engineer's hands instead of leaving it to a random contributor. Regards
Once we move to Maven we could have a build profile which invokes these On Fri, 27 Nov 2015, 19:06 crunchyjohn notifications@github.com wrote:
Hello folks, Is there anything else I need to do with regard to this request? -John
I want to wait til the maven PR is done. Plus I have very limited internet Dave Cramer On 7 December 2015 at 09:43, crunchyjohn notifications@github.com wrote:
fbedb9f to 5d43712
Hey folks, Good job on getting the Maven PR in! Now that that's addressed, I wanted to spend a little time going over this PR and seeing what else needs to be done. It appears that update-translations.sh has already been patched with the --from-code=UTF-8 fix, so the main issue is gone. However, I did consider that it might make sense to mavenize this tool at this point. I took a look today at migrating it into the Maven infrastructure by means of a profile. I have some groundwork laid where a user would basically run: Thanks,
@crunchyjohn, I think I wonder if we can add a Travis check for "missing translations" or "invalid translations" or something like that. Is there an easy way to do that? I do not think it makes sense to spend lots of time on that, but it would be nice if Travis reported invalid translations.
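One possible shape for such a check, as a sketch only: run msgfmt with --check over every .po file and fail if any file is rejected. The function name check_translations is hypothetical, the directory is passed in rather than assumed, and how this would be wired into Travis is not covered in the thread.

```shell
#!/bin/sh
# Sketch: validate every .po file in a directory with msgfmt.
check_translations() {
    po_dir=$1   # e.g. org/postgresql/translation
    status=0
    for po in "$po_dir"/*.po; do
        [ -e "$po" ] || continue   # no .po files matched the glob
        # --check validates the catalog; the compiled output is discarded.
        if ! msgfmt --check --output-file=/dev/null "$po"; then
            echo "invalid translation file: $po" >&2
            status=1
        fi
    done
    return $status
}

# Example call (commented out; requires GNU gettext to be installed):
# check_translations org/postgresql/translation
```

This covers "invalid translations". For "missing translations", msgfmt also accepts --statistics, which reports translated and untranslated message counts, but turning those counts into a pass/fail rule would be a project decision.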
This can be closed, as its refactor went in with #479
Hello folks,
I would like to request submission of a patch for the postgresql-jdbc project. There seems to be a breakage in the translation package generator script. I can explain a bit in detail below.
Whenever a user runs update-translations.sh, it "looks like" it's working, but it really isn't. This script first generates a list of java files and stores them in translation.filelist, and then invokes "xgettext" on that file list in order to generate org/postgresql/translation/messages.pot. Unfortunately, this task throws an error that looks suspiciously like a warning:
xgettext: Non-ASCII string at ./ubenchmark/src/main/java/org/postgresql/benchmark/encoding/UTF8Encoding.java:50.
Sadly, it's really an error, so a new version of messages.pot never gets generated. Because of this, the translations only run on the current version of messages.pot that is checked into the source code base. This only modifies about three files. So, again, it "looks like" it's running, but it really isn't.
I'm suggesting the following changes:
1. Change line 50 of ubenchmark/src/main/java/org/postgresql/benchmark/encoding/UTF8Encoding.java. It contains some Cyrillic (Russian) characters, which are not part of the standard ASCII character set. These three letters are the cause of the xgettext failure.
2. Remove the *.class files and messages.pot in org/postgresql/translation from version control. These are generated files and probably should never have been tracked with git.
3. Add these four lines to .gitignore:
    org/postgresql/translation/*.class
    org/postgresql/translation/*.po~
    org/postgresql/translation/messages.pot
    translation.filelist
   The .class files and messages.pot are from step 2. The .po~ files seem to be backups of the .po files that are changed during a translation update, and translation.filelist, which update-translations.sh generates, is ignored as a safety precaution.
4. Add this line to update-translations.sh, before messages.pot is generated:
    rm org/postgresql/translation/messages.pot
By adding this line before messages.pot is generated, it will force the end of the script to fail in the event that there are problems generating messages.pot (the for loop will be null). This might not be the best solution, but it's adequate to catch such problems again in the future.
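A slightly more explicit variant of the same idea, as a sketch rather than the patch itself: delete the stale template, run the generator, and stop with an error if no non-empty file appears. The function name regenerate_pot is hypothetical, the generator command is passed in as arguments, and rm -f is used instead of plain rm so that a missing file is not itself treated as an error.

```shell
#!/bin/sh
# Sketch: remove the stale template, regenerate it, and stop if the
# generator did not produce a non-empty file.
regenerate_pot() {
    pot_file=$1   # e.g. org/postgresql/translation/messages.pot
    shift         # remaining arguments: the command that writes $pot_file
    rm -f "$pot_file"
    "$@" || { echo "error: generator exited with a failure status" >&2; return 1; }
    [ -s "$pot_file" ] || { echo "error: $pot_file was not generated" >&2; return 1; }
}

# Example call (commented out; the xgettext arguments are illustrative):
# regenerate_pot messages.pot xgettext --from-code=UTF-8 -f translation.filelist -o messages.pot
```

Checking both the exit status and the presence of the file reports the failure at the point where it happens, instead of relying on a later loop finding nothing to process.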
My only question is whether the changes to UTF8Encoding.java are harmful in any way, as I'm not familiar with this code (I'm more of a build engineer). Perhaps someone can take a look and make sure that it's safe before inclusion.
Please consider these for submission.
Thank you!