i10n #1292
GuillaumeGomez
added some commits
Sep 24, 2015
killercup
reviewed
Sep 24, 2015
```Shell
rustc --install-lang fr # downloads an official language pack from the server
rustc --install-lang fr=pack.zip # a custom pack can be installed this way
```
killercup
Sep 24, 2015
Member
I'm not sure I like the idea of adding a subcommand that downloads and extracts files to rustc. I'd rather see a separate utility to install these language packs (could also be part of multirust).
Manishearth
Sep 24, 2015
Member
Or a part of install.sh/rustup.sh (or whatever it's called these days).
GuillaumeGomez
Sep 24, 2015
Author
Member
I don't see the issue with adding such a thing to Rust. Since localization will be handled by the compiler directly, why not the language packs too?
Manishearth
Sep 24, 2015
Member
The issue is that rustc will be talking over the network; we don't want that.
So the compiler will have support for language packs, just not for downloading/installing them.
GuillaumeGomez
Sep 24, 2015
Author
Member
Why not talk over the network? I don't really see the issue here.
nagisa
Sep 24, 2015
Contributor
Rustc also won’t be able to save these packs most of the time anyways since you need administrative privileges for that in current versions of all major systems.
GuillaumeGomez
Sep 24, 2015
Author
Member
Just like rust_install.sh. I don't think this is a real issue here. They can just launch the command with sudo if needed.
nagisa
Sep 24, 2015
Contributor
You don’t simply launch compilers as root. If a compiler needs administrative privileges, then something went wrong somewhere.
killercup
reviewed
Sep 24, 2015
##Storage of localizations

The localization files should be stored in a folder called i10n, which is part of the rust installation folder. By default, the english files will be there, but if you put the french files there, the option `--lang fr` will work. The folder will look like this:
killercup
Sep 24, 2015
Member
Are there any plans on a file format? Or is this just an implementation detail at this point?
GuillaumeGomez
Sep 24, 2015
Author
Member
What do you mean by file format? Is it about how the localization file is written or how it is compressed? The first is explained a bit above; the second, not really.
killercup
Sep 24, 2015
Member
> Is it about how the localization file is written

Yes, I thought the text format shown above was just an example.

> Just a text file with key-value pairs as rust strings.

With lots of keys and sub-keys you might want to consider an indexed structure, for example. This could also be added in the future, though, if there was a clear way of versioning these files. (Also, I'm sure there are a lot of existing formats for this kind of thing.)
Manishearth
Sep 24, 2015
Member
> With lots of keys and sub-keys you might want to consider an indexed structure
I don't think we'll need that.
This particular format is basically a copy of Firefox's .properties files for JS l10n (they use DTDs for HTML/XUL l10n).
Firefox has tons of these, but each JS file only loads the necessary ones for performance. They don't have nested keys though.
We don't need to do what Firefox does since these are for errors and warnings -- pretty cheap to do file I/O to fetch this information.
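To make the flat key/value idea concrete, here is a minimal sketch of loading such a file in Rust. The `key=value` line syntax, the `#` comment marker, the function name `parse_messages` and the example keys are all assumptions made for illustration; the RFC's actual format is not reproduced here.

```rust
use std::collections::HashMap;

/// Parses `key=value` lines into a map. The format is hypothetical and only
/// loosely modeled on .properties files: blank lines and lines starting with
/// `#` are skipped, and lines without `=` are ignored.
fn parse_messages(source: &str) -> HashMap<String, String> {
    let mut messages = HashMap::new();
    for line in source.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first `=` only, so values may themselves contain `=`.
        if let Some((key, value)) = line.split_once('=') {
            messages.insert(key.trim().to_string(), value.trim().to_string());
        }
    }
    messages
}
```

Because the whole file is read into one map, a lookup is a single hash access, which fits the point above that file I/O for errors and warnings is cheap.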
Theoretically you could use macros/syntax extension to directly allow
@Kimundi I like the spirit of that idea, but it would make language packs much less portable.
@Kimundi: And if you want to change the key-string, you'll have to change it in all other language files. I don't think that it would be very convenient...
I would really like to see a comparison to other i18n libraries, efforts, and such. This is an incredibly complex topic, and this is a very, very brief RFC. i18n is really important, and we should gain support for it. But it's really easy to do poorly.
I feel like it's still too early to implement these "luxury" features. I think translating now would just lead to a worse experience (= many untranslated errors in the output) as diagnostics are improved and new ones get added. I also don't get how we're supposed to change the structure of existing messages (i.e. add a

That said, even if I'll never use this (I'm much more used to English jargon than German), this does seem like a good feature to have in general.
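One common way to soften the untranslated-message concern is to fall back to the English text when a key is missing from the requested language. The sketch below assumes in-memory catalogs keyed by message name; the `Catalog` alias, the `lookup` function and the keys used with it are hypothetical and not part of the RFC.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory catalog: message key -> message text.
type Catalog = HashMap<&'static str, &'static str>;

/// Looks up `key` in the requested language first and falls back to the
/// English catalog, so a missing translation degrades to English output
/// instead of an empty message.
fn lookup<'a>(key: &str, requested: &'a Catalog, english: &'a Catalog) -> Option<&'a str> {
    requested
        .get(key)
        .or_else(|| english.get(key))
        .copied()
}
```

With this shape, a partially translated pack still produces a complete, if mixed-language, set of diagnostics.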
we should use named arguments as much as possible.
It occurs to me that we forgot about pluralization. That's nontrivial to handle. I think Firefox handles it by asking for two versions of the string.
Do not reinvent the wheel. There’s gettext and lots of infrastructure around it. IMHO this proposal is strongly inferior to implementing a gettext library (one that works equally well on all supported platforms, as opposed to Python’s gettext implementation) in Rust and pulling that into rustc. If gettext is not satisfactory in some way, then at least port something that is known to already work; the Rust project really doesn’t need to solve the already-mostly-solved l10n problem all over again.

P.S. This RFC is proposing infrastructure for l10n (localisation), not i18n (internationalisation). i18n is much more involved and I don’t see how Rust needs it at all.
@Manishearth Pluralization is more complicated than that, as some languages have more intricate rules than just

I agree with @steveklabnik that this needs far more investigation. l20n should certainly be brought up here.
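To show why two versions of a string are not enough, here is a small sketch that picks a plural category per language. The category names follow the CLDR convention, but the rules are simplified and integer-only, only English and Polish are covered, and the `plural_category` function and its fallback to `Other` are assumptions of this example, not an existing API.

```rust
/// Plural categories in the CLDR style (only the ones used in this sketch).
#[derive(Debug, PartialEq, Clone, Copy)]
enum PluralCategory {
    One,
    Few,
    Many,
    Other,
}

/// Picks a plural category for a non-negative integer count.
/// Simplified integer-only rules; unknown languages fall back to `Other`.
fn plural_category(lang: &str, n: u64) -> PluralCategory {
    match lang {
        // English: singular for exactly 1, plural otherwise.
        "en" => {
            if n == 1 {
                PluralCategory::One
            } else {
                PluralCategory::Other
            }
        }
        // Polish: three forms for integers, chosen by the last digits.
        "pl" => {
            let last_digit = n % 10;
            let last_two = n % 100;
            if n == 1 {
                PluralCategory::One
            } else if (2..=4).contains(&last_digit) && !(12..=14).contains(&last_two) {
                PluralCategory::Few
            } else {
                PluralCategory::Many
            }
        }
        // Languages without plural distinctions, or not covered here.
        _ => PluralCategory::Other,
    }
}
```

A message catalog would then need one string per category that the language uses, which is more than a singular/plural pair for languages such as Polish.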
@nagisa: You're absolutely right. I got confused but it's i10n.
GuillaumeGomez
changed the title from i18n to i10n
Sep 24, 2015
Infrastructure for l10n is i18n. i18n is making a piece of software localizable; l10n is creating the translations.
Agreed. I think they have some handling for that, but I haven't looked into it.
olivren
commented
Sep 25, 2015
I agree with the intent of this RFC, but not on the proposed solution. In my experience, translations based on simple key/value formats are a real pain to work with. Finding consistent or meaningful key names is impossible. Developers now have to follow an indirection to know what the content of the string is.

The most important part of translating software lies in the tools that make it easy for the translators to keep the translations up to date, and to do so at their own pace.

So I think this RFC should just be "internationalize
I highly recommend looking at l20n. I don't recommend looking too deeply at this implementation, as it was my first rust code ever, but I feel some absurd urge to include it with my comment about l20n.
I don't know a lot about localization (is it just me or are these acronyms ironic given that this is an accessibility issue?), but wouldn't it make sense for this to be built on top of semantic error values that could also have a machine readable (e.g. JSON) output form, as well as a localized human readable String, similar to this RFC?
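A rough sketch of what such a semantic error value could look like: a stable code plus named arguments, rendered either as JSON or through a per-language template. The `Diagnostic` type, its methods, the `escape` helper and the example code `mismatched_types` are all hypothetical, and the JSON writer is hand-rolled and deliberately minimal.

```rust
/// A hypothetical semantic diagnostic: a stable code plus named arguments,
/// with no English text baked in.
struct Diagnostic {
    code: &'static str,
    args: Vec<(&'static str, String)>,
}

/// Escapes backslashes and double quotes only (enough for this sketch).
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

impl Diagnostic {
    /// Machine-readable form, written by hand to keep the sketch
    /// dependency-free.
    fn to_json(&self) -> String {
        let fields: Vec<String> = self
            .args
            .iter()
            .map(|(k, v)| format!("\"{}\":\"{}\"", k, escape(v)))
            .collect();
        format!("{{\"code\":\"{}\",\"args\":{{{}}}}}", self.code, fields.join(","))
    }

    /// Human-readable form: find the template for this code and replace
    /// each `{name}` with its argument. Naive: values are not protected
    /// against containing further placeholders.
    fn to_localized(&self, templates: &[(&str, &str)]) -> Option<String> {
        let template = templates.iter().find(|(code, _)| *code == self.code)?.1;
        let mut text = template.to_string();
        for (name, value) in &self.args {
            text = text.replace(&format!("{{{}}}", name), value);
        }
        Some(text)
    }
}
```

The same value feeds both outputs, so a tool can consume the JSON while a language pack only has to supply the template list.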
nrc
added the T-compiler label
Sep 26, 2015
fbstj
commented
Sep 29, 2015
I agree with @withoutboats that machine-readable debugging is probably much more easily translatable than embedding the i10n/etc inside everywhere.
Nashenas88
commented
Oct 1, 2015
Let's not forget that pluralization is not the only thing that varies in translated strings. A huge class of languages does noun declension, where the spelling and pronunciation of a noun changes depending on its usage in a sentence, which can also be mixed with genders.

It's not just the messages we have now that would need to change, but the code around how some of those messages are generated too. There are some cases where we programmatically build up parts of the string (I'm not talking about the user's own code, but the messages themselves). This can lead to cases where figuring out how many strings need to be translated will be very difficult to do.

I'd also propose that, when we do these translation files, the original English text include an additional column for the context. This is usually a piece of text that describes more information about the text and the words used in order to help a translator understand the context in which the translation is used. Not everyone coming up with translations is going to completely understand the code it's used in. They can also be used to describe the sections of the string that are replaced with user content. For example, whether the value that appears in
@Nashenas88 That's a good point. l20n makes solving these kinds of grammatical issues relatively simple.
nikomatsakis
self-assigned this
Oct 1, 2015
Nashenas88
commented
Oct 2, 2015
@apasel422, I just looked up l20n, and I'm impressed by what it offers. I haven't had a chance to look at the code yet though; I hope it's something we could take advantage of easily.
Maybe I missed something in the RFC, but how would this actually work? If the translations are regular format strings (with
There is machinery to iterate through format strings at runtime; it's easy to use that.
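As an illustration of what walking a format string at runtime involves, here is a hand-rolled sketch that substitutes named `{name}` placeholders from a map. It is not the compiler's own machinery: the `render` function is hypothetical, and it ignores `{{` escapes and format specifiers such as `{:?}`.

```rust
use std::collections::HashMap;

/// Substitutes `{name}` placeholders in `template` with values from `args`.
/// Unknown names are left in place so that a missing argument stays visible
/// in the output.
fn render(template: &str, args: &HashMap<&str, String>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(open) = rest.find('{') {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..open]);
        let after_open = &rest[open + 1..];
        match after_open.find('}') {
            Some(close) => {
                let name = &after_open[..close];
                match args.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push('{');
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after_open[close + 1..];
            }
            None => {
                // Unterminated placeholder: copy the remainder verbatim.
                out.push_str(&rest[open..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because arguments are looked up by name instead of position, a translation is free to reorder them, which is one reason named arguments were suggested earlier in the thread.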
So there is a change I have been contemplating doing that is related to this RFC. It frequently happens in Rust that we have "multipart" error messages, like an error with several explanatory notes, and maybe a help suggestion as well. Currently, each of these is a distinct message, and each has its own span, and it's kind of a big mess.

Furthermore, many of the messages -- such as those produced by the borrow checker -- involve "program flow". We currently display this as a multipart message highlighting each point in the code, but this must be "cross-referenced" against the original source somehow.

On a related note, our messages often include a lot of terminology that users may not know. Research shows that even simple terms like "function body" can be confusing to new users and so on, to say nothing of things like "object type" or "lvalue". It'd be great if we could find a way to make these terms clearer to people.

I was hoping to address all of these points by allowing us to construct richer errors. The idea would be to have:

Anyway, I bring this up here because while these goals are not directly I10N goals, they are obviously served by some of the measures in this RFC. Furthermore, for I10N purposes, I imagine these "multipart" messages ought to be a unit, since the right breakdown will probably be translated differently. I know that sometimes I have to really torture the phrasing to make it make grammatical sense in English, and I assume it would be near impossible to port that across languages.
I was working on something similar but only for

Giving better access to information for newcomers (and even more experienced Rust users!) should be given a little more consideration.
I've updated l20n so that it now compiles on stable rust. The parsing and resolving works, but locale negotiation is non-existent at the moment. (repo: https://github.com/seanmonstar/l20n.rs)

With some more work, it could be possible to use syntax extensions (or codegen like serde) to compile the l20n templates into rust code at compile time, instead of runtime. (Runtime should stay though, since it's also a possible strategy for an application to download updated language resources and need to compile them at runtime.)
graydon
commented
Nov 12, 2015
I will play community memory here and point out that the current format string syntax in rust was chosen explicitly to be compatible with ICU MessageFormat syntax (itself derived from Java's). This is the standard (and has been worked over very thoroughly to accommodate variations in plural, gender and similar dimensions). http://icu-project.org/apiref/icu4j/com/ibm/icu/text/MessageFormat.html
graydon
commented
Nov 12, 2015
(The most thorough conversation we had about this was in 2013, starting with https://mail.mozilla.org/pipermail/rust-dev/2013-May/003999.html ... There are lots of arguments and informative links in that thread.)
We discussed this RFC in the @rust-lang/compiler meeting yesterday. The rough consensus was that it is too early to "internationalize" the compiler, even though we would like to do so eventually. Even when just considering English, it is difficult to maintain the quality of error messages, and adding other languages into the mix would be a significant burden. It's also hard for us to judge the quality of those error messages.

That said, we are doing some work on overhauling the error reporting infrastructure for IDE integration and better usability, and I expect that this should make internationalization easier longer term (though at the moment we have not been focusing on extracting the text of the messages themselves outside of the compiler).

Therefore, I'm inclined to close this RFC for the time being (and open a corresponding issue), but I'd like to hear feedback on that first.
What @Manishearth and I had in mind was more to provide the structure that allows users to add localizations (so the Rust team would not have to internationalize anything itself). However, I approve of this way of doing it; for now, other steps need to be done before going further into this.
@GuillaumeGomez OK. I will close then for now, but thanks for the interesting discussion.
GuillaumeGomez commented Sep 24, 2015
rendered
cc @Manishearth