Localization team discussion issue #178

Open
sebasmagri opened this issue Jul 12, 2017 · 28 comments

Comments

@sebasmagri

sebasmagri commented Jul 12, 2017

During RustDay in Mexico City, erickt, brson and I discussed the benefits of having a dedicated localization team in the project.

The goals of such a team would include:

  • Orchestrate Rust content i18n
    • Documentation (partnering with -docs)
    • Videos (Amara)
    • Blog posts / Websites
  • Coordinate efforts to bring/improve i18n/l10n tooling into the ecosystem
  • Evaluate feasibility and implement i18n/l10n into the core tools
    • Compiler error messages (LANG/LC_* compliant on *nix?)
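
As a rough illustration of what "LANG/LC_* compliant" could mean for compiler messages, here is a minimal sketch of the usual POSIX precedence (`LC_ALL`, then `LC_MESSAGES`, then `LANG`, with empty values treated as unset). The function names `message_locale` and `strip_codeset` are hypothetical and are not part of rustc or any existing crate.

```rust
use std::collections::HashMap;

/// Pick the locale for messages from POSIX-style variables, in priority
/// order: LC_ALL, then LC_MESSAGES, then LANG. Empty values count as unset.
/// (Hypothetical helper, for illustration only.)
fn message_locale(env: &HashMap<String, String>) -> Option<String> {
    ["LC_ALL", "LC_MESSAGES", "LANG"]
        .iter()
        .filter_map(|key| env.get(*key))
        .find(|value| !value.is_empty())
        .cloned()
}

/// Reduce a value such as "es_MX.UTF-8" or "sr_RS@latin" to "es_MX" / "sr_RS".
fn strip_codeset(locale: &str) -> &str {
    locale
        .split(|c: char| c == '.' || c == '@')
        .next()
        .unwrap_or(locale)
}

fn main() {
    // Only read the three variables this sketch cares about.
    let env: HashMap<String, String> = ["LC_ALL", "LC_MESSAGES", "LANG"]
        .iter()
        .filter_map(|key| std::env::var(key).ok().map(|v| (key.to_string(), v)))
        .collect();

    match message_locale(&env) {
        Some(locale) => println!("messages locale: {}", strip_codeset(&locale)),
        None => println!("no locale set, falling back to English"),
    }
}
```

A real implementation would also need to handle the `C` and `POSIX` locales and decide on a fallback chain (for example `es_MX` to `es` to English).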

What else should we consider? What examples from other communities and open source projects can we use?

@GuillaumeGomez

A long time ago, I opened a PR about adding localization for rustc itself. It was way too early but might be worth being discussed again?

@sebasmagri
Author

@GuillaumeGomez Sure thing... the idea of this issue is to start gathering feedback and ideas to define goals for the l10n team. rustc is definitely one of the targets.

@skade
Contributor

skade commented Jul 20, 2017

I would also try to get a list of currently running translation efforts and get in touch with the people doing them.

@carols10cents
Contributor

Translation efforts for the book are listed in this appendix and they each have an issue labeled Translations.

@carols10cents
Contributor

Also this is the mdBook issue for multilingual support.

@spastorino

I was talking with @sebasmagri and I think I'd focus on educational resources and the tooling ecosystem part for now.

@dvigneshwer

We should also consider reaching out to the existing experienced Mozilla l10n community members for help, contributions, and guidance. They use a lot of cool tools like transifex etc.

@sebasmagri
Author

Thanks for the suggestion @vigneshwerd. We're definitely willing to get in touch with them for this initiative.

I would also like to gather some feedback from the Chinese community. cc @KiChjang @tennix, @wayslog, @3442853561, @zonyitoo

@KiChjang

Also cc @KaiserY.

@KiChjang

KiChjang commented Jul 29, 2017

I believe the most up-to-date Chinese translation efforts are still in https://github.com/ctjhoa/rust-learning/blob/master/zh_CN.md. In particular, I think RustPrimer is being used by quite a lot of people in the Chinese community to learn Rust.

@KaiserY has just told me that they've done translation of the Rust book 2nd edition up to chapter 19. More details in his repo: https://github.com/KaiserY/trpl-zh-cn.

@skade
Contributor

skade commented Jul 29, 2017

@sebasmagri Can #124 and #125 be folded into this?

@sebasmagri
Author

@skade yep, I think those issues should be part of this. Let's fold them in, and then we can reopen issues with the requirements well defined in the localisation repo.

@ariasuni

Two issues were created a long time ago concerning localization crates: rust-lang/rfcs#822 and rust-lang/rust#14495. I would definitely want to contribute to discussion and code about this.

@sebasmagri
Author

cc @hngnaig for Vietnamese efforts. 👍

@3442853561

I thought it would be nice to have a multi-lingual document annotation, although I didn't figure out how to do this

@Manishearth
Contributor

One thing worth mentioning is that i18n of the compiler is really tricky if we want to do it right. Languages are hard, and building systems that support all of them (e.g. supporting the 6 different kinds of pluralization Arabic has) takes a lot of work. If we find a good i18n library in Rust we can use that, but it's likely we'll have to build our own.
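
To make the Arabic example concrete, here is a minimal sketch of selecting a plural category for non-negative integers, following the cardinal rules published in Unicode CLDR for Arabic as I understand them (zero, one, two, few, many, other). The type and function names are hypothetical, and fractional numbers are deliberately left out.

```rust
/// The six plural categories used by CLDR; Arabic uses all of them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PluralCategory {
    Zero,
    One,
    Two,
    Few,
    Many,
    Other,
}

/// Cardinal plural category for a non-negative integer in Arabic.
/// (Hypothetical helper; integers only.)
fn arabic_cardinal(n: u64) -> PluralCategory {
    match (n, n % 100) {
        (0, _) => PluralCategory::Zero,
        (1, _) => PluralCategory::One,
        (2, _) => PluralCategory::Two,
        (_, 3..=10) => PluralCategory::Few,
        (_, 11..=99) => PluralCategory::Many,
        // Covers 100, 101, 102, 200, ... where n % 100 is 0, 1 or 2.
        _ => PluralCategory::Other,
    }
}

fn main() {
    for n in [0u64, 1, 2, 3, 11, 100, 103, 111] {
        println!("{} -> {:?}", n, arabic_cardinal(n));
    }
}
```

A message catalog then needs up to six variants per countable message in Arabic, whereas English only distinguishes two, which is part of why a generic solution is hard.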

We should probably focus on organizing the community to translate docs etc. (and organizing the translated docs themselves; we already have a couple), and once this is bootstrapped we can look into i18ning the compiler.

(It seems like the above proposal is very much in line with this, just wanted to reiterate why we should do it that way)

@skade
Contributor

skade commented Dec 23, 2017 via email

@KiChjang

Indeed, I do see that there have been several discussions about i18n, but we've never moved beyond words, and I think the reason is pretty much that nobody knows how to kick-start an i18n project for the Rust compiler. Coupled with the fact that such a change requires an RFC, it makes the task more daunting.

@Manishearth
Contributor

Manishearth commented Dec 23, 2017 via email

@psychoslave

@sebasmagri invited me to join this discussion following a conversation on the internationalization of Rust itself.

In a nutshell, the proposal is to allow localized source code, identifiers and keywords included. English, or something like "EN-Rust", might still be used as the preferred default locale, while other locales would operate with the same level of integration, especially for debugging/profiling sessions. Tools could also help with quick translexicalisation from one locale to another, for example when migrating, or when posting snippets to ask for help from another linguistic community.

@sebasmagri
Author

Hi! It's awesome to finally get more feedback here. So I'd like to do a quick review for those who are joining the conversation.

The rustc i18n of error messages is something that has been discussed many times. However, a final solution to this has not been identified. There is no centralized evidence of the different opinions on this front, though. It's all scattered in several issues and comments in RFCs and PRs.

OTOH, there are a bunch of initiatives in the ecosystem to provide good quality, standards-based crates for ICU and l10n/i18n. However, there is little communication between them and I think they could take advantage of having a common ground. This common ground will probably be provided by the compiler since it's gonna need it anyway for its internal l10n/i18n support, but it doesn't need to support all the features that fully fledged libraries support. I like to think of this working in a similar way to the log crate: providing base traits and a reference implementation used in rustc and std.
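
To illustrate the log-crate analogy, here is a minimal sketch of what a facade could look like: a small base trait that fully featured crates implement, plus a trivial reference implementation. Everything here (`MessageCatalog`, `InMemoryCatalog`, `localize`, the message key and the Spanish text) is hypothetical and only meant to show the shape of the idea, not an existing or proposed API.

```rust
use std::collections::HashMap;

/// Hypothetical base trait: the only thing callers depend on.
/// Richer crates (ICU bindings, Fluent, gettext, ...) could implement it.
trait MessageCatalog {
    fn lookup(&self, locale: &str, key: &str) -> Option<String>;
}

/// Hypothetical reference implementation backed by an in-memory table.
struct InMemoryCatalog {
    messages: HashMap<(String, String), String>,
}

impl InMemoryCatalog {
    fn new() -> Self {
        InMemoryCatalog { messages: HashMap::new() }
    }

    fn insert(&mut self, locale: &str, key: &str, text: &str) {
        self.messages
            .insert((locale.to_string(), key.to_string()), text.to_string());
    }
}

impl MessageCatalog for InMemoryCatalog {
    fn lookup(&self, locale: &str, key: &str) -> Option<String> {
        self.messages
            .get(&(locale.to_string(), key.to_string()))
            .cloned()
    }
}

/// Callers go through the trait and fall back to the built-in English text.
fn localize(catalog: &dyn MessageCatalog, locale: &str, key: &str, fallback: &str) -> String {
    catalog
        .lookup(locale, key)
        .unwrap_or_else(|| fallback.to_string())
}

fn main() {
    let mut catalog = InMemoryCatalog::new();
    // Illustrative key and translation, not taken from rustc.
    catalog.insert("es", "mismatched-types", "tipos incompatibles");

    println!("{}", localize(&catalog, "es", "mismatched-types", "mismatched types"));
    println!("{}", localize(&catalog, "vi", "mismatched-types", "mismatched types"));
}
```

The point of the design is that the compiler would only depend on the trait, so ecosystem crates could compete on the implementation without rustc having to pick one.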

The other part of this is resources and documentation, namely The Rust Programming Language book, which means improving mdBook's support for i18n and rustdoc's support for multilingual docs. Efforts to translate TRPL, for example, have been tracked by @carols10cents, yet they need more coordination if we want to establish official docs translation as a goal.

So, I'd like to invite you all to provide feedback on an in-progress preRFC for a Localisation Team, which considers all the different fronts on which we'd need to work in l10n/i18n in the broader Rust project.

Thanks!

@psychoslave

@sebasmagri I'm not sure my suggestion regarding enabling internationalisation and localisation of Rust itself would have its place in this preRFC. Would you be kind enough to confirm or deny whether I should add this topic to this conversation?

Apart from that, speaking about communication and common ground, I started a research project on Internationalisation of Programming Languages. I will add a Rust section in the part about the state of the art. Everyone is welcome to join this wiki project to enrich it on Rust or any other programming language, as long as the contribution stays on the topic of the research, of course.

Kind regards

@Manishearth
Contributor

Manishearth commented Dec 23, 2017 via email

@psychoslave

OK, so if this is a topic which belongs on an existing open issue, or if a new issue on this topic is welcome, just let me know. Otherwise I will stop adding further comments on this topic in this repository, and will only follow updates on the Discourse thread.

By the way, you might be interested in
GopherCon 2017: Aditya Mukerjee - Translating Go to Other (Human) Languages, and Back Again - YouTube

I also discovered that Perl 6 slangs open large flexibility regarding what is parsable to feed the underlying interpreter, all through native facilities as far as I can judge from reading the doc.

@feefladder

feefladder commented Jan 24, 2024

Heyy, sooo.... here's my 2 cents.

The link @psychoslave shared answers the question from the forum: translation would happen at the level of the lexer. I don't know exactly how strict cargo fmt is, but it could very well be possible to specify a locale (or verbosity level) there?

Furthermore, it'd be very nice for crate owners to ensure consistent naming, if they have to define naming themselves. Take this random sample of source code:

(GenericFraction::Rational(sb, vb), GenericFraction::Infinity(se)) => {
  if self.is_one() {
    Ok(GenericFraction::NaN)
  } else {
    match (vb < 1, se) {
      (true, Sign::Plus) => Ok(GenericFraction::zero()),
      (false, Sign::Minus) => Ok(GenericFraction::zero()),
      _ if sb.is_positive() => Ok(GenericFraction::Infinity(Sign::Plus)),
      _ => Ok(GenericFraction::NaN),
    }
  }
}

To the writer, it makes a lot of sense that vb means Value of the Base, since we are calculating a power here. However, for a person who just comes into the crate, it is difficult to understand what is going on here. What could be done is: in my enum:

enum Fraction {
  Rational(Sign, Rational),
}


fn some_function_that_is_very_intricate_and_uses_values_a_lot_so_i_have_short_names(input: Fraction, output: Fraction) {
  // naming scheme: TYPE_VAR_SUPER_SHORT = { <type_name>[0]<var_name>[0] }
  match (input, output) {
    (Fraction::Rational(si, ri), Fraction::Rational(so, ro)) => {
      // perform calculations on si ri so ro
    }
  }
  // end TYPE_VAR_SUPER_SHORT
}

Then later, I can change my naming convention if I am confused about my own source code, with something like:
cargo fmt TYPE_VAR_SUPER_SHORT={<type_name>_<var_name>} to get:

enum Fraction {
  Rational(Sign, Rational),
}


fn some_function_that_is_very_intricate_and_uses_values_a_lot_so_i_have_short_names(input: Fraction, output: Fraction) {
  // naming scheme: TYPE_VAR_SUPER_SHORT = { <type_name>_<var_name> }
  match (input, output) {
    (Fraction::Rational(sign_input, ratio_input), Fraction::Rational(sign_output, ratio_output)) => {
      // perform calculations on sign_input ratio_input sign_output ratio_output
    }
  }
  // end TYPE_VAR_SUPER_SHORT
}

which would also write to some file translexications.log:

TYPE_VAR_SUPER_SHORT = { <type_name>[0]<var_name>[0] } -> {<type_name>_<var_name>}

So that when I commit, this gets reverted. (or not and it will be a big mess, bc pre-commit hooks haven't been set up properly)

The important part here is scoping: I promise that all functions in this part (or maybe the file) will adhere to that naming scheme. blabla isomorphic translations blabla

@Manishearth
Contributor

I also think that is out of scope for this issue.

@feefladder

ohw sorry, I thought it could be a stepping-stone for localization of rust source code, but didn't explain it properly. The basic idea is: Once there is a way to define how identifiers are named and formatted by cargo fmt, it should be possible for my Dutch colleague to translate identifiers to Dutch. Thus allowing for multilingual crates. e.g.

// T = "Type"[0]
// name = { "My" + "Structure"[0..4] }
impl<T> MyStruct {
  // some macro doing:
  // function name = {"power"[0..2]+"integer"[0]}
  // arguments = ["base" + "exponent"]
  fn powi(base: T, exponent: T) -> T;
}

then, my Dutch colleague could do cargo fmt --locale=NL to get:

impl<T> MijnStruct {
  // functienaam = {"macht"[0..2] + "heel_getal"[0]} -> "mach"
  // argumenten = ["grondtal" + "exponent"]
  fn mach(grondtal: T, exponent: T) -> T;
}

which would get translexicalized back to English on commit. It's kind of what https://github.com/ChimeraCoder/koro does, but also has huge benefits for consistent naming in English-only environments. The actual translation of identifiers could be offloaded to fluent-rs or something. I'll open an issue for this at fmt, that's maybe a better place?
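
As a rough sketch of the round trip described above, the following renames whole identifiers through a lookup table and reverses the renaming with the inverted table. It is a deliberately naive text-level version under stated assumptions: the function names are hypothetical, the table reuses the identifiers from the example, the mapping must be one-to-one for the round trip to hold, and identifier-like words inside comments and string literals would be renamed too. A real tool would presumably work on the lexer's token stream instead.

```rust
use std::collections::HashMap;

/// Replace whole identifiers found in `table`; everything else is copied as-is.
/// (Hypothetical helper; operates on plain text, not on real tokens.)
fn translexicalize(source: &str, table: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(source.len());
    let mut word = String::new();
    for ch in source.chars() {
        if ch.is_alphanumeric() || ch == '_' {
            word.push(ch);
        } else {
            flush_word(&mut word, &mut out, table);
            out.push(ch);
        }
    }
    flush_word(&mut word, &mut out, table);
    out
}

/// Emit the pending identifier, translated if the table knows it.
fn flush_word(word: &mut String, out: &mut String, table: &HashMap<&str, &str>) {
    if !word.is_empty() {
        let replacement = table.get(word.as_str()).copied().unwrap_or(word.as_str());
        out.push_str(replacement);
        word.clear();
    }
}

/// Swap keys and values; only meaningful when the mapping is one-to-one.
fn invert<'a>(table: &HashMap<&'a str, &'a str>) -> HashMap<&'a str, &'a str> {
    table.iter().map(|(k, v)| (*v, *k)).collect()
}

fn main() {
    let en_to_nl = HashMap::from([
        ("MyStruct", "MijnStruct"),
        ("powi", "mach"),
        ("base", "grondtal"),
    ]);

    let english = "fn powi(base: T, exponent: T) -> T;";
    let dutch = translexicalize(english, &en_to_nl);
    let back = translexicalize(&dutch, &invert(&en_to_nl));

    println!("{}", dutch);
    assert_eq!(back, english);
}
```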

@Manishearth
Contributor

@feefladder Again, this issue is not about localizing source code identifiers and comments. This is for localizing documentation and other resources.

Localizing source code is a worthwhile endeavor, but it is out of scope for this issue.
