
Translation problem, properties files are not read as UTF-8 #2090

Closed
johannilsson opened this Issue Aug 7, 2015 · 10 comments

3 participants
@johannilsson
Contributor

commented Aug 7, 2015

It seems like there's an issue when applying translations.

According to the documentation for Properties.load(InputStream), http://docs.oracle.com/javase/7/docs/api/java/util/Properties.html#load(java.io.InputStream), the input stream is assumed to be ISO 8859-1 encoded, but OTP itself assumes the translation files are UTF-8. The mismatch produces strange characters in the response.

Tested with the Swedish translations.
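A minimal sketch of the mismatch, assuming a UTF-8 encoded properties entry. The key `street` and the value `Gårdsväg` are hypothetical stand-ins for a Swedish translation. Properties.load(InputStream) decodes each byte as ISO 8859-1, while Properties.load(Reader) (available since Java 6) lets the caller choose UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesEncodingDemo {

    // Properties.load(InputStream) always decodes the bytes as ISO 8859-1.
    static String loadWithInputStream(byte[] utf8Bytes, String key) throws IOException {
        Properties props = new Properties();
        props.load(new ByteArrayInputStream(utf8Bytes));
        return props.getProperty(key);
    }

    // Properties.load(Reader) leaves decoding to the Reader, here explicitly UTF-8.
    static String loadWithUtf8Reader(byte[] utf8Bytes, String key) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical entry; written with escapes so the source file encoding does not matter.
        byte[] utf8 = "street=G\u00e5rdsv\u00e4g\n".getBytes(StandardCharsets.UTF_8);
        // Each two-byte UTF-8 sequence becomes two Latin-1 chars: "G\u00c3\u00a5rdsv\u00c3\u00a4g"
        String garbled = loadWithInputStream(utf8, "street");
        // Decoded correctly: "G\u00e5rdsv\u00e4g"
        String correct = loadWithUtf8Reader(utf8, "street");
        System.out.println(garbled.equals(correct)); // false
    }
}
```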

@johannilsson

Contributor Author

commented Aug 7, 2015

I also tried a custom implementation of ResourceBundle.Control, similar to http://stackoverflow.com/questions/4659929/how-to-use-utf-8-in-resource-properties-with-resourcebundle#answer-4660195, which also resolves this issue. Not sure which approach you prefer.
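A sketch of what such a custom ResourceBundle.Control might look like, modeled on the linked Stack Overflow answer rather than on the actual patch. It overrides newBundle so that properties bundles are read through a UTF-8 Reader. Note that since Java 9 (JEP 226) PropertyResourceBundle already reads UTF-8 by default, so this mainly matters on Java 8 and earlier; custom Control instances are also not supported for bundles in named modules.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class Utf8Control extends ResourceBundle.Control {
    @Override
    public ResourceBundle newBundle(String baseName, Locale locale, String format,
                                    ClassLoader loader, boolean reload)
            throws IllegalAccessException, InstantiationException, IOException {
        // Leave class-based bundles to the default implementation.
        if (!"java.properties".equals(format)) {
            return super.newBundle(baseName, locale, format, loader, reload);
        }
        String bundleName = toBundleName(baseName, locale);          // e.g. "messages_sv"
        String resourceName = toResourceName(bundleName, "properties"); // e.g. "messages_sv.properties"
        InputStream stream = null;
        if (reload) {
            // Bypass URL caches so edited files are picked up on reload.
            URL url = loader.getResource(resourceName);
            if (url != null) {
                URLConnection connection = url.openConnection();
                if (connection != null) {
                    connection.setUseCaches(false);
                    stream = connection.getInputStream();
                }
            }
        } else {
            stream = loader.getResourceAsStream(resourceName);
        }
        if (stream == null) {
            return null; // no bundle for this locale; the lookup falls back to a parent locale
        }
        // Decode explicitly as UTF-8 instead of relying on the stream constructor's default.
        try (InputStreamReader reader = new InputStreamReader(stream, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        }
    }
}
```

Usage would then be along the lines of `ResourceBundle.getBundle("messages", new Locale("sv"), new Utf8Control())`, where `messages` is a hypothetical bundle name.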

@abyrd

Member

commented Aug 7, 2015

First, the era of multiple character encodings must come to an end. In this day and age all our files must be in UTF-8 Unicode if there is any solution that permits it.

The custom ResourceBundle.Control seems to work by reading nonstandard UTF-8 encoded Properties files, internally re-encoding them as ISO-8859-1, then reading those bytes into String objects. This does solve the problem, but because of the built-in ISO-8859-1 assumption I'm inclined to see Properties files as a legacy standard to be abandoned or replaced.
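The re-encoding step described above can be sketched as follows. This is an illustration of the general trick, not the code from the patch, and the class and method names are hypothetical. It relies on ISO-8859-1 mapping every byte 0x00-0xFF to exactly one char U+0000-U+00FF, so a string that was wrongly decoded as ISO-8859-1 can be turned back into its original bytes and decoded again as UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Redecode {

    // Repairs a value whose file was UTF-8 but which was decoded as ISO-8859-1.
    // Caveat: chars above U+00FF (e.g. from \\uXXXX escapes in the file) cannot be
    // encoded as ISO-8859-1 and would be replaced by '?', so the round trip is only
    // lossless for values that came straight from the raw bytes.
    static String redecodeAsUtf8(String latin1Decoded) {
        byte[] originalBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = "G\u00c3\u00a5rdsv\u00c3\u00a4g"; // what load(InputStream) returns for UTF-8 "Gårdsväg"
        String fixed = redecodeAsUtf8(garbled);
        System.out.println(fixed.equals("G\u00e5rdsv\u00e4g")); // true
    }
}
```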

@johannilsson

Contributor Author

commented Aug 7, 2015

Yeah, totally with you on that. I'm not sure it helps to force the encoding of the properties files here, though; it seems like IDEs perform some magic to make sure they conform to what Java expects.

I picked this way of dealing with it since it was a bit less code to maintain, and there was no need to copy and maintain the SDK internals here.

Personally, I would rather see translations removed from OTP completely and replaced with error codes and/or constants, letting the clients provide translations instead.

@abyrd

Member

commented Aug 7, 2015

We have recently come to the opposite consensus, that as much translation as possible should be done on the server. The main reason is to avoid duplicated code and translation effort in every OTP client. My initial instinct was the same as yours, but feedback from client developers was definitely in favor of having the server pre-translate all text.

@abyrd

Member

commented Aug 7, 2015

Many translators will also be providing files that were not edited in an IDE, so I'd really like to use plain UTF-8 files to hold the translation text. @buma is working on server-side translation in #2084, he may have some insight on how the libraries he's using handle encodings.

@johannilsson

Contributor Author

commented Aug 7, 2015

Ah, got it. It's not a big deal for us; we only use the translations for street names at the moment, and we proxy OTP through our own API. We do need to do some work on our clients when passing the locale through to OTP, though, to make sure it doesn't return translations for a locale our clients don't support.

@buma

Contributor

commented Aug 7, 2015

I also had a problem because different properties files had different encodings. I used prop2po to convert them to PO files and looked at each one to be sure the encoding was interpreted correctly. The library did some guessing to find the encoding, and I needed to specify some encodings myself, but in the end I got all the properties files correctly converted to PO files.

They are currently used as Gettext resource bundles, which are compiled class files generated from the PO files. Since PO files declare their encoding and are usually UTF-8, there are no encoding problems anymore.
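To illustrate why a self-describing format avoids the guessing, here is a minimal sketch that reads the charset a PO file declares in its header entry (`"Content-Type: text/plain; charset=UTF-8\n"`) and decodes the file with it. The class, the helper, and the UTF-8 fallback are hypothetical; this is not how the Gettext tooling in #2084 is implemented, and it assumes an ASCII-compatible file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PoCharset {

    // Matches the charset parameter of the Content-Type line in the PO header entry.
    private static final Pattern CHARSET =
            Pattern.compile("Content-Type:\\s*text/plain;\\s*charset=([A-Za-z0-9._-]+)");

    // Hypothetical helper: decodes raw PO bytes using the charset the file itself declares.
    static String decodePo(byte[] poBytes) {
        // The header is plain ASCII, so a lossless ISO-8859-1 view is enough to locate it.
        String probe = new String(poBytes, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(probe);
        // Assumption: fall back to UTF-8 when no charset is declared.
        Charset cs = m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
        return new String(poBytes, cs);
    }
}
```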

@johannilsson

Contributor Author

commented Aug 7, 2015

Thanks for the update. We'll run with this patch until the other changes have been merged, then. Feel free to close this issue if you want :)

@abyrd

Member

commented Aug 7, 2015

Since @buma has apparently made this a non-issue, I will close it :)

@abyrd abyrd closed this Aug 7, 2015

@johannilsson

Contributor Author

commented Aug 7, 2015

👍
