Unicode problem in po to JSON conversion #677
Comments
No clue what's going on. I tried a different grunt-po-json module and it seemed to convert the Hebrew and German files fine. This may be an issue in the grunt-po2json module, or maybe there is some additional flag we need to set. It doesn't look like an issue w/ the
The closest I've gotten to fixing this is explicitly setting the "utf8" charset when reading the file:
var data = fs.readFileSync(fs.realpathSync(fileName), 'utf8');
After hacking that in, Hebrew started parsing fine. Not sure if there is some weird buffer bug somewhere in the po2json dependency, or if there is a flag I'm missing. Figure 1: We need to go deeper! |
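For anyone following along, here is a minimal Node.js sketch of why passing 'utf8' to the read can matter. It is not po2json's actual code and does not claim to reproduce the bug there; the temp file name and the sample msgstr are made up for illustration. It only shows that UTF-8 bytes decoded with a single-byte encoding turn into the kind of garbage described in this issue, while an explicit 'utf8' read keeps the text intact.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample file, written as UTF-8 (not a file from the repo).
const tmpFile = path.join(os.tmpdir(), 'po-encoding-demo.po');
fs.writeFileSync(tmpFile, 'msgid "Back"\nmsgstr "Zurück"\n', 'utf8');

// Without an encoding argument, readFileSync returns a Buffer of raw bytes.
const raw = fs.readFileSync(tmpFile);

// Decoding those UTF-8 bytes as latin1 maps each byte to one character,
// so the two bytes of "ü" become two unrelated characters.
const wrong = raw.toString('latin1');

// Passing 'utf8' decodes multi-byte sequences correctly.
const right = fs.readFileSync(fs.realpathSync(tmpFile), 'utf8');

console.log(wrong);
console.log(right);

fs.unlinkSync(tmpFile);
```

If the garbled output in app/i18n looks like the `wrong` string here, that would point at a decoding step somewhere in the pipeline rather than at the .po contents themselves.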
@pdehaan Looks like a bug in po2json for sure. When there's debate, use |
@zaach, @pdehaan - funny, I wrote a blog post about this a month ago - https://shanetomlinson.com/2014/l10n-gotcha-missing-charset-in-content-type-header/ |
Thanks for the tip, @shane-tomlinson! I did a bit more poking and maybe found a workaround. Shane mentioned "expects character encoding" but I couldn't find any params we could pass to set that, so I searched the po2json repo for 'charset' and noticed this in their .po file:
But if I look at our Hebrew locale I see the following:
I did a few quick tests locally and it seems changing the Not sure how to fix this in our source. I can certainly submit a big PR in the mozilla/fxa-content-server-l10n repo if adding the |
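A quick way to audit locale files for this could look like the sketch below. It assumes the standard gettext header form, where the charset is declared on a Content-Type line such as "Content-Type: text/plain; charset=UTF-8\n". The function name findPoCharset and both sample headers are hypothetical, written for this example and not copied from the po2json or l10n repos.

```javascript
// Returns the declared charset (lowercased) from a .po header, or null if
// the Content-Type line is missing or carries no charset parameter.
function findPoCharset(poText) {
  const match = /^"Content-Type:[^\n]*?charset=([^\s\\";]+)/im.exec(poText);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical header that declares a charset.
const withCharset = [
  'msgid ""',
  'msgstr ""',
  '"Content-Type: text/plain; charset=UTF-8\\n"',
].join('\n');

// Hypothetical header with no charset parameter.
const withoutCharset = [
  'msgid ""',
  'msgstr ""',
  '"Content-Type: text/plain\\n"',
].join('\n');

console.log(findPoCharset(withCharset));
console.log(findPoCharset(withoutCharset));
```

Running something like this over every .po file in the l10n repo would list which locales need the header added, without touching the files.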
Thanks for finding this po2json bug! I'll definitely merge in any patches you guys come up with for this. |
Thanks @mikeedwards. I'm not sure if the fix is as simple as adding |
@mathjazz Does verbatim set or overwrite the charset in .po files? Or can we set those ourselves and trust they'll remain so? |
Ah, ok, good to know, @pdehaan . If that seems like the best route for you to take (vs. adding the |
@zaach Verbatim does not change the charset. We should stick to UTF-8. |
@mathjazz, So, should I add the |
Curious, it looks like the .pot files have the charset defined. |
@pdehaan Yes, see the example of a working file here: Please let me know when you're planning to update the files in the repo, so I'll also update them in Verbatim. |
I have a PR that I can submit today; I'll just have to double-check whether I used "utf-8" or "UTF-8" (if we care). I'll also need to ping @zaach about why I was seeing the charset defined in the .pot files but not in the .po files. Not sure if I'm misunderstanding some part of the workflow, or if we need to rerun the scripts that extract strings and regenerate and merge the .po files. |
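On the "utf-8" vs "UTF-8" question: charset names are case-insensitive by convention, and Node itself accepts either spelling as an encoding label. The small sketch below only demonstrates Node's behaviour; whether po2json compares the header value case-sensitively is not something it verifies, so that part remains an assumption to check against the library.

```javascript
const text = 'Zurück';
const bytes = Buffer.from(text, 'utf8');

// Three spellings of the same encoding label.
const labels = ['utf8', 'utf-8', 'UTF-8'];

// Decode the same bytes using each label.
const decoded = labels.map((label) => bytes.toString(label));

// Check that Node recognises each label as a supported encoding.
const allRecognised = labels.every((label) => Buffer.isEncoding(label));

console.log(decoded);
console.log(allRecognised);
```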
PR submitted: mozilla/fxa-content-server-l10n#2 |
Verbatim updated. |
Closing as fixed. |
See @pdehaan's comment: #676 (comment)
The weird characters are present in the generated JSON found in app/i18n.
Another example, from the Back/Zurück button on the /legal/terms page:
Expected: