We have some -- not much -- HTML in our translations. From my point of view, this is not a rare or bad thing. However, this means we must trust the translations since we have to send them unescaped.
In an ideal situation, PO files are considered safe. All locales are committed to the repo and approved via PRs.
However, we've strayed from this pattern: only the source PO files (French) are committed now, and the other translations are downloaded during the release from our translation management system.
Since we can no longer guarantee that the strings are safe (accounts can be compromised), we need to find a way to sanitize the translations, either at compile time or at runtime.
For now, since only a few of these calls return HTML, we sanitize at those specific call sites, at runtime, with https://github.com/rrrene/html_sanitize_ex.
It works fine, but it is somewhat expensive, and since the strings are known in advance it makes little sense to sanitize them again and again.
For comparison (the image is not reproduced here): the 100 visible calls are the sanitized calls, and the empty space after them represents 100 regular calls to gettext without sanitization.
So the idea would be to run this step at compile time. We could even tag them as "to be sanitized" in the POT files.
So my questions are:
1. Do I have an easy way to pre-process the PO files before gettext evaluates them? I see a line in gettext's compiler (`lib/gettext/compiler.ex`) where the lib calls the parser at compile time (I wasn't sure whether this issue should have been opened on the expo repo instead).
2. If I can wire things up myself, would you have any suggestions on how?

The last alternative would be to parse the file, process it, then write it back at compile time before gettext runs, and then let gettext do its thing.
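To make that last alternative more concrete, here is a minimal sketch of a standalone step that could run on the downloaded PO files before the app is compiled. It is written in JavaScript, like the i18next snippet in this issue, so it would live outside of mix. Everything in it is an assumption for illustration: the names `ALLOWED_TAGS`, `extractTranslations`, and `findDisallowedTags` are hypothetical, the PO handling is deliberately simplified (it is not a full PO parser), and it only reports tags outside an allowlist instead of rewriting the file. It does not inspect attributes, so it is not a replacement for a real sanitizer.

```javascript
// Hypothetical allowlist; adjust to the tags your templates actually expect.
const ALLOWED_TAGS = new Set(["a", "b", "br", "em", "i", "strong"]);

// Undo the PO escapes this sketch cares about: \" \n and \\.
function unescapePo(s) {
  return s.replace(/\\(["n\\])/g, (_m, c) => (c === "n" ? "\n" : c));
}

// Collect the translated strings (msgstr / msgstr[n]) of a PO document,
// joining continuation lines. Simplified on purpose: not a full PO parser.
function extractTranslations(poText) {
  const translations = [];
  let current = null; // msgstr being accumulated, or null otherwise

  poText.split(/\r?\n/).forEach((rawLine, index) => {
    const line = rawLine.trim();
    const keyword = line.match(
      /^(msgctxt|msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"$/
    );
    const continuation = line.match(/^"(.*)"$/);

    if (keyword) {
      current = null;
      if (keyword[1].startsWith("msgstr")) {
        current = { line: index + 1, text: unescapePo(keyword[2]) };
        translations.push(current);
      }
    } else if (continuation && current) {
      current.text += unescapePo(continuation[1]);
    } else if (!continuation) {
      current = null; // blank line or comment ends the current string
    }
  });
  return translations;
}

// Report every tag in a translation that is not in the allowlist.
// Attributes are NOT checked here.
function findDisallowedTags(poText, allowedTags = ALLOWED_TAGS) {
  const problems = [];
  for (const { line, text } of extractTranslations(poText)) {
    for (const match of text.matchAll(/<\s*\/?\s*([a-zA-Z][a-zA-Z0-9-]*)/g)) {
      const tag = match[1].toLowerCase();
      if (!allowedTags.has(tag)) {
        problems.push({ line, tag });
      }
    }
  }
  return problems;
}

// Example: the translation sneaks in a <script> tag that the source string lacks.
const sample = [
  'msgid "Read the <a href=\\"%{url}\\">docs</a>"',
  'msgstr ""',
  '"Lisez la <a href=\\"%{url}\\">doc</a>"',
  '"<script>alert(1)</script>"',
].join("\n");

console.log(findDisallowedTags(sample));
```

A CI step could read each downloaded file with `fs.readFileSync`, call `findDisallowedTags`, and fail the build when the result is not empty. Reporting instead of rewriting keeps the sketch small; actually sanitizing and writing the files back would need a proper PO parser and a real HTML sanitizer.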
Last note: I'm approaching this preprocess/transform feature request from the security angle, but in other projects I have pre-processed translations for other reasons as well, for example to fix spacing around punctuation symbols.
```js
await i18next.use(i18nextFsBackend).init({
  backend: {
    loadPath: '/languages/{{lng}}/{{ns}}.json',
    parse: function (string) {
      // Here we can do basically what we want
      return JSON.parse(string, (_key, value) => {
        if (typeof value !== 'string') {
          return value;
        }
        return value
          .replace(/\s([:?!»])/g, '\u202f$1')
          .replace(/([«])\s/g, '$1\u202f');
      });
    },
    // [...]
  },
});
```
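For readers who want to try the transform without i18next, here is a dependency-free sketch of the same reviver logic. The function names `fixFrenchSpacing` and `parseTranslations` are hypothetical and only used for illustration; the regular expressions are the ones from the snippet above, which swap the whitespace next to `:`, `?`, `!`, `»` and `«` for a narrow no-break space (U+202F).

```javascript
// Narrow no-break space, used in French typography next to some punctuation.
const NNBSP = "\u202f";

// Replace the whitespace before : ? ! » and after « with U+202F.
function fixFrenchSpacing(value) {
  // Leave non-string JSON values (numbers, objects, arrays) untouched.
  if (typeof value !== "string") {
    return value;
  }
  return value
    .replace(/\s([:?!»])/g, `${NNBSP}$1`) // whitespace before : ? ! »
    .replace(/([«])\s/g, `$1${NNBSP}`); // whitespace after «
}

// Hypothetical helper: parse a translation file's JSON text and
// apply the fix to every string value through a JSON.parse reviver.
function parseTranslations(jsonText) {
  return JSON.parse(jsonText, (_key, value) => fixFrenchSpacing(value));
}

const parsed = parseTranslations('{"title": "Êtes-vous sûr ?", "count": 3}');
// JSON.stringify makes the otherwise invisible U+202F easier to spot in some terminals.
console.log(JSON.stringify(parsed));
```

Because `\s` in JavaScript also matches U+202F, running the transform twice gives the same result as running it once.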
One last point, I'm open to any suggestions of how to avoid html in translations while keeping some flexibility.
On image build, we pull the translations with the localazy binary in the GHA build.
Then we run docker build (all the Elixir deps download and build happen in Docker).
Here, AFAIK, I can't just add another mix task to run expo, as it would require building the app twice: once for the mix step, and once more after the PO files are modified.
But maybe we could have another repo, cloned and compiled, that does this, and then compile the app.
Will try and report back, but it's not my current focus at the moment.
The line referenced in the first question: gettext/lib/gettext/compiler.ex, line 505 at commit 002dee1.
The punctuation-spacing example above is an implementation using JS's https://github.com/i18next/i18next.
One of our main motivations for HTML in translations is described here: https://elixirforum.com/t/how-to-create-an-i18n-able-link/55030
We ended up with an implementation similar to the one described here: https://gist.github.com/angelikatyborska/cebc3de03c08307edebf6054ed09ff5f#gistcomment-4762617