Translation errors #36

Open
AA-Turner opened this issue Apr 24, 2024 · 6 comments

@AA-Turner
Member

I've had to blacklist Tamil, and zh_CN is newly failing, meaning the translations PRs don't get updated.

https://github.com/sphinx-doc/sphinx/actions/workflows/transifex.yml

Is there a way of ensuring in transifex that these errors don't happen?

A

@rffontenelle
Collaborator

These errors are in the UI translations, not the docs. Consider reporting them there.

Important subject, though.

@rffontenelle
Collaborator

@AA-Turner I already mentioned the Tamil issue in #28

@rffontenelle
Collaborator

> Is there a way of ensuring in transifex that these errors don't happen?

@AA-Turner Sorry for not answering this specific question before.

TL;DR: I don't think there is a straightforward way to avoid it in Transifex.

Translations that do not honor placeholders such as %s are flagged on-screen as errors in Transifex (see the Tamil example), but I haven't found a straightforward way to make tx pull filter out these erroneous strings. I haven't found an API endpoint for it either.

Pulling only reviewed translations would reduce the chance of these errors but not eliminate it, and it would add a big burden on the existing contributors. I don't think it is worth it.
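To make "not honoring placeholders" concrete, here is a rough sketch of such a comparison in plain Python. It is a simplified illustration, not the check that Transifex or msgfmt actually runs: the names `placeholders` and `honors_placeholders` are made up for this example, and the order of positional placeholders is ignored.

```python
import re
from collections import Counter

# printf-style directive: optional (name), flags, width, precision,
# then a single conversion character (possibly missing at end of string).
DIRECTIVE = re.compile(
    r"%(?:\((?P<name>[^)]*)\))?[-#0 +]*(?:\*|\d+)?(?:\.(?:\*|\d+))?[hlL]?(?P<type>.?)",
    re.DOTALL,
)
VALID_TYPES = set("diouxXeEfFgGcrsa")


def placeholders(text):
    """Count the (name, conversion) pairs used in a %-format string."""
    found = Counter()
    for match in DIRECTIVE.finditer(text):
        kind = match.group("type")
        if kind == "%":  # "%%" is a literal percent sign
            continue
        if kind not in VALID_TYPES:
            raise ValueError(f"invalid directive {match.group(0)!r}")
        found[(match.group("name"), kind)] += 1
    return found


def honors_placeholders(msgid, msgstr):
    """True when msgstr uses the same placeholders as msgid."""
    try:
        return placeholders(msgid) == placeholders(msgstr)
    except ValueError:
        return False
```

A translation that turns `%s` into `%S`, or that drops `%(outdir)s` entirely, fails this comparison.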

A manual solution would be to have me editing and fixing, or clearing the problematic translation strings. I can do that if you need, but I need to be made aware (via CI etc.) whenever it happens.

Is it possible to programmatically retrieve the language codes causing the compilation to fail? It occurred to me that the transifex.yml CI workflow could keep going by first clearing these problematic language codes with git checkout <lang>.

@n-peugnet

n-peugnet commented Jun 7, 2024

> A manual solution would be to have me editing and fixing, or clearing the problematic translation strings. I can do that if you need, but I need to be made aware (via CI etc.) whenever it happens.

I just thought about it: maybe the simplest solution would be to mark all failing messages as fuzzy. Fuzzy messages are skipped at compile time, so the rest of the catalog would compile without errors.

> Is it possible to programmatically retrieve the language codes causing the compilation to fail?

It is possible with msgfmt --check (from gettext). I made a pull request on Sphinx to do exactly this for the internal messages.
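As a minimal sketch of how the failing language codes could be collected, assuming gettext's `msgfmt` is on the PATH and the catalogs are laid out as `<lang>/LC_MESSAGES/sphinx.po` (the function names here are hypothetical, not part of Sphinx or of the pull request mentioned above):

```python
import os
import subprocess
from pathlib import Path


def msgfmt_check(po_path):
    """Return True when `msgfmt --check` accepts the file (needs gettext)."""
    result = subprocess.run(
        ["msgfmt", "--check", "-o", os.devnull, str(po_path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def failing_languages(locale_dir, check=msgfmt_check):
    """List the language codes whose sphinx.po fails the given check."""
    failing = []
    for po_path in sorted(Path(locale_dir).glob("*/LC_MESSAGES/sphinx.po")):
        if not check(po_path):
            # <locale_dir>/<lang>/LC_MESSAGES/sphinx.po -> <lang>
            failing.append(po_path.parent.parent.name)
    return failing
```

The `check` parameter is only there so that another checker (for example one based on Babel) can be swapped in; a CI step could then revert or skip every code returned by `failing_languages`.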

Maybe another script could use this information to add the fuzzy tag automatically.

I made this very quick script to add the fuzzy flag to all failing strings using babel. It should probably be tuned a little bit to limit the diffs produced:

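import sys

from babel.messages.pofile import read_po, write_po

# Read the catalog from the PO file given as the first argument.
with open(sys.argv[1], "rb") as file:
    catalog = read_po(file)

# Flag every message that fails Babel's checks as fuzzy.
for message in catalog:
    if message.check():
        message.flags.add("fuzzy")

# Write the updated catalog back in place.
with open(sys.argv[1], "wb") as file:
    write_po(file, catalog)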

But it seems msgfmt still finds errors that Babel doesn't:

msgfmt --check -o /dev/null ta/LC_MESSAGES/sphinx.po
ta/LC_MESSAGES/sphinx.po:504: 'msgid' and 'msgstr' entries do not both end with '\n'
ta/LC_MESSAGES/sphinx.po:805: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:840: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:906: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:924: a format specification for argument 'outdir' doesn't exist in 'msgstr'
ta/LC_MESSAGES/sphinx.po:941: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:962: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:975: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:982: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:1022: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1033: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1038: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1048: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1228: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:1235: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:1409: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:1915: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:2745: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:2882: a format specification for argument 'outdir' doesn't exist in 'msgstr'
ta/LC_MESSAGES/sphinx.po:2940: a format specification for argument 'outdir' doesn't exist in 'msgstr'
ta/LC_MESSAGES/sphinx.po:3463: 'msgid' and 'msgstr' entries do not both begin with '\n'
ta/LC_MESSAGES/sphinx.po:3489: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3494: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3499: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3506: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3694: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The string ends in the middle of a directive.
ta/LC_MESSAGES/sphinx.po:3718: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3732: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3744: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The string ends in the middle of a directive.
ta/LC_MESSAGES/sphinx.po:3751: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3801: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: In the directive number 1, the character 'S' is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3806: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: The character that terminates the directive number 1 is not a valid conversion specifier.
ta/LC_MESSAGES/sphinx.po:3811: 'msgstr' is not a valid Python format string, unlike 'msgid'. Reason: In the directive number 1, the character 'S' is not a valid conversion specifier.
msgfmt: found 36 fatal errors

So maybe the best option would be to parse the stderr of msgfmt --check and combine it with Babel's message.lineno (picking the message with the largest lineno smaller than the line of the error) to add the fuzzy flag.
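A rough sketch of that mapping, using only the standard library. It assumes the `path:line: message` layout seen in the msgfmt output above and works on any objects that expose `lineno` and a `flags` set, as Babel's messages do; `error_lines` and `flag_fuzzy` are hypothetical helper names.

```python
import re
from bisect import bisect_left

# Matches diagnostics such as "ta/LC_MESSAGES/sphinx.po:805: 'msgstr' is not ..."
ERROR_LINE = re.compile(r"^(?P<path>.+?):(?P<line>\d+): ")


def error_lines(stderr_text):
    """Return the sorted line numbers mentioned in msgfmt --check output."""
    lines = set()
    for row in stderr_text.splitlines():
        match = ERROR_LINE.match(row)
        if match:
            lines.add(int(match.group("line")))
    return sorted(lines)


def flag_fuzzy(messages, stderr_text):
    """Add 'fuzzy' to the message that starts closest before each error line."""
    ordered = sorted((m for m in messages if m.lineno), key=lambda m: m.lineno)
    starts = [m.lineno for m in ordered]
    seen = set()
    for line in error_lines(stderr_text):
        # Index of the largest lineno strictly below the error line.
        index = bisect_left(starts, line) - 1
        if index >= 0 and index not in seen:
            ordered[index].flags.add("fuzzy")
            seen.add(index)
    return [ordered[i] for i in sorted(seen)]
```

With a catalog loaded through `read_po`, `flag_fuzzy(catalog, result.stderr)` would flag the affected messages, after which `write_po` can save the file. The summary line (`msgfmt: found 36 fatal errors`) does not match the pattern and is ignored.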

@n-peugnet

A probably simpler possibility would be to add a babel_runner.py check command that only returns the errors found by Babel. This way there is no dependency on gettext, and the Python script I showed earlier could be used to implement some kind of babel_runner.py check --fix command.
The only drawback is that we would miss the errors that only msgfmt --check discovers, but nothing prevents us from adding them to the Python script later.
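One possible shape for such a command, as a sketch only: the `check` subcommand and its `--fix` option are the proposal from this comment, not an existing babel_runner.py feature, and the function names are invented for illustration.

```python
import argparse


def check_catalog(path, fix=False):
    """Return (lineno, error) pairs found by Babel; optionally flag them fuzzy."""
    # Imported lazily so the argument parsing works without Babel installed.
    from babel.messages.pofile import read_po, write_po

    with open(path, "rb") as file:
        catalog = read_po(file)
    problems = []
    for message in catalog:
        if message.fuzzy:  # fuzzy messages are already skipped at compile time
            continue
        for error in message.check(catalog):
            problems.append((message.lineno, str(error)))
            if fix:
                message.flags.add("fuzzy")
    if fix and problems:
        with open(path, "wb") as file:
            write_po(file, catalog)
    return problems


def build_parser():
    parser = argparse.ArgumentParser(prog="babel_runner.py")
    commands = parser.add_subparsers(dest="command", required=True)
    check = commands.add_parser("check", help="report messages failing Babel's checks")
    check.add_argument("po_file")
    check.add_argument("--fix", action="store_true", help="mark failing messages as fuzzy")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    problems = check_catalog(args.po_file, fix=args.fix)
    for lineno, error in problems:
        print(f"{args.po_file}:{lineno}: {error}")
    # Fail only when problems were found and left unfixed.
    return 1 if problems and not args.fix else 0
```

Calling `main(["check", "--fix", "ta/LC_MESSAGES/sphinx.po"])` would flag the failing Tamil messages and return 0, while the same call without `--fix` would return 1 so that CI notices.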

@rffontenelle
Collaborator

Just to mention that the Tamil team fixed the errors reported by Transifex, although I haven't checked the quality of the rest of the docs. Last time I checked, the Changelog translation didn't have the issue number and link, so there's still room for improvement.
