Allow to transform messages at compile time #380

saveman71 · 2023-11-17T16:14:01Z

We have some -- not much -- HTML in our translations. From my point of view, this is not a rare or bad thing. However, this means we must trust the translations since we have to send them unescaped.

In an ideal situation, PO files are considered safe. All locales are committed to the repo and approved via PRs.

However, we've strayed from this pattern: we now commit only the source PO files (French), and the rest of the translations are downloaded from our translation management system during the release.

Since we can no longer guarantee that strings are safe (accounts can be compromised), we need to find a way to sanitize the translations, either at compile time or at runtime.

For now, since only a few of our gettext calls return HTML, we sanitize at those specific call sites, at runtime, with https://github.com/rrrene/html_sanitize_ex.

It works fine, but it is somewhat expensive, and since the strings are known in advance it makes little sense to sanitize them again and again.

For comparison (screenshot not reproduced here):

The 100 visible calls are the sanitized calls, and the empty space after them represents 100 regular calls to gettext without sanitization.

So the idea would be to run this step at compile time. We could even tag the affected messages as "to be sanitized" in the POT files.

So my questions are:

Is there an easy way to pre-process the PO files before gettext evaluates them? I see this line

gettext/lib/gettext/compiler.ex, line 505 at commit 002dee1:

%Messages{messages: messages, file: file} = messages_struct = PO.parse_file!(path)

where, at compile time, the library calls the parser. (I wasn't sure whether this issue should have been opened on the expo repo instead.)
If I can wire things up myself, would you have any suggestions?
The last alternative would be to parse the file, process it, and write it back at compile time before gettext runs; then gettext does its thing.
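
To make that last alternative concrete, here is a rough sketch of a parse / process / write-back step that runs on the PO text before gettext ever sees it. It is written in JavaScript only to match the i18next snippet in this post; in an Elixir project the same loop would more naturally go through expo's PO parser. Everything here is an assumption for illustration: `transformPo`, `fixSpacing`, `escapePo` and `unescapePo` are hypothetical helpers, not part of gettext or expo, and the line-based handling covers only the common `msgstr "..."` layout. Each quoted segment is transformed on its own, so a transform that needs the whole message at once (such as an HTML sanitizer) would have to join the continuation lines first.

```javascript
// Placeholder transform: swap ordinary whitespace next to French punctuation
// for a narrow no-break space (U+202F). A real pipeline might sanitize here.
function fixSpacing(text) {
  return text
    .replace(/\s([:?!»])/g, '\u202f$1')
    .replace(/([«])\s/g, '$1\u202f');
}

// Minimal PO string escaping helpers (only \n, \t, \r, \" and \\ are handled).
const UNESCAPES = { n: '\n', t: '\t', r: '\r' };
function unescapePo(segment) {
  return segment.replace(/\\(.)/g, (_match, ch) => UNESCAPES[ch] ?? ch);
}
function escapePo(text) {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\t/g, '\\t')
    .replace(/\r/g, '\\r');
}

// Apply `transform` to every msgstr segment of a PO file given as text.
// msgid, msgctxt, msgid_plural and comment lines pass through unchanged.
// Entries whose msgid is empty (the PO header) are left untouched.
function transformPo(poText, transform) {
  let state = 'other'; // 'msgid' | 'msgstr' | 'other'
  let msgid = '';
  const out = [];
  for (const line of poText.split('\n')) {
    const idStart = line.match(/^msgid\s+"(.*)"\s*$/);
    const strStart = line.match(/^(msgstr(?:\[\d+\])?\s+)"(.*)"\s*$/);
    const cont = line.match(/^"(.*)"\s*$/); // continuation line: "..."
    if (idStart) {
      state = 'msgid';
      msgid = idStart[1];
    } else if (strStart) {
      state = 'msgstr';
    } else if (!cont) {
      state = 'other';
    }
    if (state === 'msgid' && cont) msgid += cont[1];
    const segment = strStart
      ? strStart[2]
      : state === 'msgstr' && cont
        ? cont[1]
        : null;
    if (segment === null || msgid === '') {
      out.push(line);
    } else {
      const prefix = strStart ? strStart[1] : '';
      out.push(`${prefix}"${escapePo(transform(unescapePo(segment)))}"`);
    }
  }
  return out.join('\n');
}

// Made-up sample input.
const po = [
  'msgid ""',
  'msgstr ""',
  '"Language: fr\\n"',
  '',
  'msgid "Really?"',
  'msgstr "Vraiment ?"',
].join('\n');

const rewritten = transformPo(po, fixSpacing);
console.log(rewritten);
```

Wiring this to disk would be a matter of reading each downloaded file with `fs.readFileSync`, passing the text through `transformPo`, and writing the result back with `fs.writeFileSync` before the application is compiled.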

Last note: I'm approaching this preprocess/transform feature request from the security angle, but in other projects I have pre-processed translations for other reasons as well, for example to fix spacing around punctuation symbols.

For example, here is an implementation of that using the JS library https://github.com/i18next/i18next:

await i18next
  .use(i18nextFsBackend)
  .init({
    backend: {
      loadPath: '/languages/{{lng}}/{{ns}}.json',
      parse: function (string) {
        // Here we can do basically what we want
        return JSON.parse(string, (_key, value) => {
          if (typeof value !== 'string') {
            return value;
          }
          return value
            .replace(/\s([:?!»])/g, '\u202f$1')
            .replace(/([«])\s/g, '$1\u202f');
        });
      },
      // [...]
    },
  });
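
The `parse` hook above does not depend on i18next itself, so the transformation can be tried in isolation. A minimal sketch, assuming plain Node.js; the names `fixSpacing` and `parseTranslations` and the sample JSON are made up for illustration, while the two regexes are the ones from the snippet above:

```javascript
// Replace ordinary whitespace next to French punctuation with a narrow
// no-break space (U+202F).
function fixSpacing(value) {
  return value
    .replace(/\s([:?!»])/g, '\u202f$1')
    .replace(/([«])\s/g, '$1\u202f');
}

// JSON.parse calls the reviver for every value, nested ones included;
// non-string values are returned untouched.
function parseTranslations(string) {
  return JSON.parse(string, (_key, value) =>
    typeof value === 'string' ? fixSpacing(value) : value
  );
}

// Made-up sample input.
const sample =
  '{"greeting": "Bonjour !", "nested": {"quote": "« oui »"}, "count": 3}';
const parsed = parseTranslations(sample);
console.log(JSON.stringify(parsed));
```

Because the fix-up happens once, when the file is loaded, lookups at render time pay nothing extra, which is the same property the compile-time hook requested here would give gettext.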

One last point, I'm open to any suggestions of how to avoid html in translations while keeping some flexibility.

One of our main motivations is described here: https://elixirforum.com/t/how-to-create-an-i18n-able-link/55030
We ended up with an implementation similar to the one described here: https://gist.github.com/angelikatyborska/cebc3de03c08307edebf6054ed09ff5f#gistcomment-4762617

maennchen · 2023-11-29T18:05:20Z

@saveman71 Is there a reason why this needs to happen inside of gettext and not as a separate step before feeding the file into gettext?

You can use expo to read / manipulate & write .po files.

saveman71 · 2023-11-29T19:04:48Z

Our setup is as follows:

Only the French translations are stored in git
On image build, we pull the translations with the localazy binary in the GHA build
Then we run docker build (all the Elixir deps download / build happens in docker)
Here, AFAIK, I can't just add another mix task to run expo, as it would require building the app twice: once for the mix step, and once more after the PO files are modified
But maybe we could clone and compile a separate repo that does that, and then compile the app

I'll try it and report back, though it's not my current focus.

Thanks for your insight!

maennchen · 2023-11-29T19:08:53Z

@saveman71 Ok. I'll close this issue in the meantime then. Reopen if you think we should follow up on this.

And let me know if you have trouble getting expo to do what you want :)

maennchen added the Kind:Discussion label Nov 29, 2023

maennchen self-assigned this Nov 29, 2023

maennchen closed this as completed Nov 29, 2023
