Allow to transform messages at compile time #380

Closed
saveman71 opened this issue Nov 17, 2023 · 3 comments

@saveman71

We have some -- not much -- HTML in our translations. From my point of view, this is not a rare or bad thing. However, this means we must trust the translations since we have to send them unescaped.

In an ideal situation, PO files are considered safe. All locales are committed to the repo and approved via PRs.

However, we've strayed away from this pattern: now only the source PO files (French) are committed, and the rest of the translations are downloaded from our translation management system during the release.

Since we can no longer guarantee that strings are safe (accounts can be compromised), we need to find a way to sanitize the translations, either at compile time or at runtime.

For now, since we have just a few occurrences of these calls that return HTML, we sanitize at these specific usages, at runtime, with https://github.com/rrrene/html_sanitize_ex.

It works fine, but it is a bit expensive, and since the strings are known in advance it makes little sense to sanitize them again and again.

For comparison:
[image: screenshot comparing 100 sanitized calls with 100 regular gettext calls]

The 100 visible calls are the sanitized calls, and the empty space after them represents 100 regular calls to gettext w/o sanitization.

So the idea would be to run this step at compile time. We could even tag them as "to be sanitized" in the POT files.

So my questions are:

  1. Is there an easy way to pre-process the PO files before gettext evaluates them? I see this line
    %Messages{messages: messages, file: file} = messages_struct = PO.parse_file!(path)
    where the lib calls the parser at compile time (I wasn't sure whether this issue should have been opened on the expo repo instead).
  2. If I can wire things up myself, would you have any suggestions?
  3. The last alternative would be to parse the file, process it, then write it back at compile time before gettext runs; gettext then does its thing.

Last note: I'm approaching this preprocess/transform feature request from the security angle, but in other projects I have pre-processed translations for other purposes as well, e.g. to fix issues such as spacing around punctuation symbols.

For example, here is an implementation of that using the JS library i18next (https://github.com/i18next/i18next):

await i18next
  .use(i18nextFsBackend)
  .init({
    backend: {
      loadPath: '/languages/{{lng}}/{{ns}}.json',
      parse: function (string) {
        // Here we can do basically whatever we want
        return JSON.parse(string, (_key, value) => {
          if (typeof value !== 'string') {
            return value;
          }
          return value
            .replace(/\s([:?!»])/g, '\u202f$1')
            .replace(/([«])\s/g, '$1\u202f');
        });
      },
      // [...]
    },
  });
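
To make the transformation above easier to try in isolation, here is a minimal sketch that pulls the reviver out of the i18next configuration. The regexes are the ones from the snippet above; the names `fixFrenchSpacing` and `parseTranslations` and the sample JSON are hypothetical and only serve the illustration.

```javascript
// Narrow no-break space, the character used in the snippet above.
const NNBSP = '\u202f';

// Hypothetical helper: replaces the whitespace next to French punctuation
// with a narrow no-break space. Non-string values pass through untouched.
function fixFrenchSpacing(value) {
  if (typeof value !== 'string') {
    return value;
  }
  return value
    .replace(/\s([:?!»])/g, NNBSP + '$1') // space before : ? ! »
    .replace(/([«])\s/g, '$1' + NNBSP);   // space after «
}

// Same shape as the `parse` option above: JSON.parse with a reviver
// that is applied to every value in the translation file.
function parseTranslations(string) {
  return JSON.parse(string, (_key, value) => fixFrenchSpacing(value));
}

// Hypothetical sample input, not taken from a real translation file.
const parsed = parseTranslations(
  '{"greeting": "Bonjour !", "quote": "« oui »", "count": 3}'
);
console.log(parsed);
```

Keeping the string transformation in its own function means the same function could be swapped for a sanitizer, which is the use case this issue is about.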

One last point: I'm open to any suggestions on how to avoid HTML in translations while keeping some flexibility.

One of our main motivations is described here: https://elixirforum.com/t/how-to-create-an-i18n-able-link/55030
We ended up with an implementation similar to what's described here: https://gist.github.com/angelikatyborska/cebc3de03c08307edebf6054ed09ff5f#gistcomment-4762617

@maennchen
Member

@saveman71 Is there a reason why this needs to happen inside of gettext and not as a separate step before feeding the file into gettext?

You can use expo to read / manipulate & write .po files.

@saveman71
Author

saveman71 commented Nov 29, 2023

Our setup is as follows:

  • Only French translations are stored in git
  • On image build, we pull the translations with the localazy binary in the GHA build
  • Then we run docker build (all the Elixir deps download / build happens in Docker)
  • Here, AFAIK, I can't just add another mix task to run expo, as it would require building the app twice: once for the mix step, and once more after the PO files are modified
  • But maybe we could indeed have a separate repo that is cloned and compiled to do that, and then compile the app

Will try and report back, but it's not my focus at the moment.

Thanks for your insight!

@maennchen
Member

@saveman71 Ok. I'll close this issue in the meantime then. Reopen if you think we should follow up on this.

And let me know if you have trouble getting expo to do what you want :)
