Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't mark formated strings as c-format #57

Open
tsdgeos opened this issue Dec 4, 2018 · 7 comments
Open

Don't mark formated strings as c-format #57

tsdgeos opened this issue Dec 4, 2018 · 7 comments

Comments

@tsdgeos
Copy link

tsdgeos commented Dec 4, 2018

If we have a string like

<plurals name="received_files_title">
    <item quantity="one">Received file from %2$s</item>
    <item quantity="other">Received %1$d files from %2$s</item>
</plurals>

a2po will create

#, c-format
msgctxt "received_files_title"
msgid "Received file from %2$s"
msgid_plural "Received %1$d files from %2$s"
msgstr[0] ""
msgstr[1] ""

But the problem is that adding c-format enforces that the translation strings contain both %1 and %2 since printf c-format can't skip strings while android can.

So I think it's better not to include c-format, does that make sense to you?

@miracle2k
Copy link
Owner

I am not sure. Are you using a particular tool that enforces this? Except for this difference, it is basically c-format, right? If so, by removing the tag, are we losing tooling support that is enabled the the format being recognized?

I guess I see this as a tooling question, if that makes sense.

@tsdgeos
Copy link
Author

tsdgeos commented Dec 23, 2018

msgfmt --check

which is "the tool" to deal with .po files.

Yes by removing the c-format marker the format is not being recognized anymore as c-format, which is a good thing since "android strings" are not really c-format

@miracle2k
Copy link
Owner

I'm not familiar with the details of c-format strings; I think I chose to add c-format because they looked superficially similar/closest.

With tooling I mean validation that might happen in translation UIs, for example. In other words, I am wondering (and I haven't used this tool or gettext in a long time, so I might be ways off):

  • With c-format, the translation has to contain all the original placeholders, and the Android feature of skipping a placeholder (because they are numbered) is not supported.

  • Without c-format, the placeholders are not recognized at all, and even in simple cases such as %s where placeholder numbering is not used, no validation happens.

Is this wrong? I am certainly open to changing this, since c-format is imperfect, but just wanted to investigate whether there are better options, for example making this an option, or being smart about it and recognizing whether the numbering feature is used.

@tsdgeos
Copy link
Author

tsdgeos commented Jan 4, 2019

I agree with you that trying to keep c-format if the string is c-format compatible may be a good idea.

The problem though is that a string being being c-format compatible means


/* C format strings are described in POSIX (IEEE P1003.1 2001), section
   XSH 3 fprintf().  See also Linux fprintf(3) manual page.
   A directive
   - starts with '%' or '%m$' where m is a positive integer,
   - is optionally followed by any of the characters '#', '0', '-', ' ', '+',
     "'", or - only in msgstr strings - the string "I", each of which acts as
     a flag,
   - is optionally followed by a width specification: '*' (reads an argument)
     or '*m$' or a nonempty digit sequence,
   - is optionally followed by '.' and a precision specification: '*' (reads
     an argument) or '*m$' or a nonempty digit sequence,
   - is either continued like this:
       - is optionally followed by a size specifier, one of 'hh' 'h' 'l' 'll'
         'L' 'q' 'j' 'z' 't',
       - is finished by a specifier
           - '%', that needs no argument,
           - 'c', 'C', that need a character argument,
           - 's', 'S', that need a string argument,
           - 'i', 'd', that need a signed integer argument,
           - 'o', 'u', 'x', 'X', that need an unsigned integer argument,
           - 'e', 'E', 'f', 'F', 'g', 'G', 'a', 'A', that need a floating-point
             argument,
           - 'p', that needs a 'void *' argument,
           - 'n', that needs a pointer to integer.
     or is finished by a specifier '<' inttypes-macro '>' where inttypes-macro
     is an ISO C 99 section 7.8.1 format directive.
   Numbered ('%m$' or '*m$') and unnumbered argument specifications cannot
   be used in the same string.  When numbered argument specifications are
   used, specifying the Nth argument requires that all the leading arguments,
   from the first to the (N-1)th, are specified in the format string.
 */

Which i'm not sure how easy is to get right if the string is c-format compatible or not.

@BugsBeGone
Copy link
Contributor

BugsBeGone commented Oct 9, 2019

Probably it should be using java-format rather than c-format: https://www.gnu.org/software/gettext/manual/gettext.html#Translators-for-other-Languages

Edit: the links provided in the gettext manual are long dead, so here is the relevant Java API:
https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html#dpos

In particular:

If there are more arguments than format specifiers, the extra arguments are ignored.

@BugsBeGone
Copy link
Contributor

At first I thought poedit et al. just happened to have terrible support for java-format, so maybe that's why nobody is using it. I made my own test .po files, ran them directly through gettext's msgfmt and got equally disappointing results.

It turns out that Java has two completely different types of format strings: the above mentioned Formatter class which provides printf-like functionality via String.format(), as well as the MessageFormat class (which is the one actually referred to in the gettext documentation).

It is the MessageFormat type that is covered by gettext's java-format, but presumably most Android developers are much more familiar with the Formatter (albeit through String.format()).

Coincidentally, Formatter support has actually been added recently to gettext under the new java-printf-format flag: http://savannah.gnu.org/bugs/?51367.

However, general platform support for an as-of-yet unreleased version of gettext is probably a long way off. Perhaps moving from c-format to java-printf-format would be best suited to a future 2.0 release?

@miracle2k
Copy link
Owner

I think I'd merge a PR that makes it configurable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants