Investigate implementing an ECMA 402-based internationalization library #858

Open
steveklabnik opened this Issue Feb 15, 2015 · 52 comments

steveklabnik commented Feb 15, 2015

Issue by alexcrichton
Wednesday May 28, 2014 at 18:11 GMT

For earlier discussion, see rust-lang/rust#14494

This issue was labelled with: A-libs, P-high in the Rust repository


We have been told that internationalization is pretty standardized at this point on the web and many languages are starting to follow along. The authoritative spec for this is located here, and it sounds like we shouldn't deviate from that spec much (as it's what everyone is expecting).

Nominating, but I do not believe this is a 1.0 issue.

listochkin commented Feb 16, 2015

I actually attempted to start work on this as part of my learning Rust. I wasn't successful at that time but would like to get back to this idea soonish.

mrhota commented Feb 25, 2016

@steveklabnik @alexcrichton any thoughts on why this ECMAScript proposal is the preferred model for an i18n API in Rust instead of, for example, ICU, Java, or C#?

The idea of implementing an ECMAScript API in Rust never occurred to me.

mastodonfarm commented Feb 25, 2016

The 2nd edition of ECMA-402 was published in June 2015:

Web
PDF

jan-hudec commented Mar 2, 2016

I'd like to take a stab on this, converting the locale crate (it is basically useless at the moment and that is the most suitable name).

@listochkin, do you have remains of your attempt anywhere for inspiration?

@mrhota, adding a toLocaleString method (to_locale_string under Rust naming conventions) via a trait to the types that have locale-specific formatting seems like the correct Rust way, following the many existing ToSomething traits and to_something methods.

However, there will have to be some differences. I have not read the ECMA-402 standard in much detail, but:

  • It seems to pass around locale as just names everywhere. I would prefer passing objects.
  • The formatting, and parsing, functions need to be actually implemented on such locale object or its subobjects (C++'s std::locale::facet style)

I am also not sure about the NumberFormat and DateTimeFormat objects. I thought just the TR#35 patterns should be enough (they should be able to define everything needed) and it would leave a smaller compatibility surface. However, once we add a feature to the pattern, we have to support it anyway. We can then add setters to the pattern object, and since the fields don't have to be exported, we can add features without breaking compatibility.

And with an impl ToNumberFormat for str there would be no inconvenience in the common use-case.
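A minimal sketch of the trait idea above, assuming a locale object is passed instead of a locale name. `Locale`, `group_separator` and `ToLocaleString` are hypothetical names chosen for illustration, and grouping digits in threes is an assumption that does not hold for every locale.

```rust
/// Hypothetical locale object; a real one would carry CLDR-derived data.
pub struct Locale {
    pub group_separator: char,
}

/// Hypothetical trait in the spirit of ECMA-402's `toLocaleString`.
pub trait ToLocaleString {
    fn to_locale_string(&self, locale: &Locale) -> String;
}

impl ToLocaleString for i64 {
    fn to_locale_string(&self, locale: &Locale) -> String {
        let digits = self.unsigned_abs().to_string();
        let mut out = String::new();
        if *self < 0 {
            out.push('-');
        }
        for (i, ch) in digits.chars().enumerate() {
            // Insert a separator before every group of three digits
            // (assumption: three-digit grouping).
            if i > 0 && (digits.len() - i) % 3 == 0 {
                out.push(locale.group_separator);
            }
            out.push(ch);
        }
        out
    }
}
```

With a locale whose separator is `,`, calling `1234567i64.to_locale_string(&locale)` would go through the locale object rather than a name lookup.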

mrhota commented Mar 3, 2016

@jan-hudec I'll help with this effort. I'm excited someone else has an interest in this!

I gather that the desire here is to have a library inspired by this spec, since a Rust "implementation" would be ... strange and unidiomatic.

jan-hudec commented Mar 3, 2016

Thank you, @mrhota. First thing will be to come up with reasonable design and name things.

My key requirement is that formatting for new types can be easily defined. So we will have a trait similar to Display, but only one, and taking a formatting specification, because localization substitutes into dynamic strings, so compile-time checking like std::fmt is not possible.

Rust already uses separate types for times, so time formatting will be triggered by using them, and I want a similar approach to money. That is, instead of passing a number and a format string indicating monetary format, you will pass something like 5 * USD (or 2.5 * EUR or 22 * CZK etc.) and that will trigger money formatting, including comparing the currency to the local one and fetching a suitable symbol if available (so 5 * GBP will come out as £5.00 in the en_GB locale, but as 5.00 GBP in most others).

I also often work with other kinds of dimensional quantities. CLDR contains unit names and abbreviations for various locales, but I haven't seen it integrated in any localization library. And I don't want to integrate it either. I just want to make it easy for an application that wants it to extract the necessary bits from CLDR, store them in its translation catalogue or somewhere, and define appropriate formatting for its dimensional quantity type, so it can format 5000 * metre with format ###@@ and get 5.0 km back.
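The money idea above could be sketched roughly as follows. All names here (`Currency`, `Money`, `Locale`, `local_symbol`) are hypothetical, the currency list is a toy, and only `f64` amounts are covered.

```rust
use std::ops::Mul;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Currency {
    USD,
    EUR,
    GBP,
    CZK,
}

impl Currency {
    fn code(self) -> &'static str {
        match self {
            Currency::USD => "USD",
            Currency::EUR => "EUR",
            Currency::GBP => "GBP",
            Currency::CZK => "CZK",
        }
    }
}

/// An amount tagged with its currency, so formatting is chosen by type.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Money {
    pub amount: f64,
    pub currency: Currency,
}

/// Allows writing `5.0 * Currency::GBP`.
impl Mul<Currency> for f64 {
    type Output = Money;
    fn mul(self, currency: Currency) -> Money {
        Money { amount: self, currency }
    }
}

/// Hypothetical locale data: the local currency and its symbol.
pub struct Locale {
    pub local_currency: Currency,
    pub local_symbol: &'static str,
}

impl Money {
    /// Use the symbol for the locale's own currency, the ISO code otherwise.
    pub fn format(&self, locale: &Locale) -> String {
        if self.currency == locale.local_currency {
            format!("{}{:.2}", locale.local_symbol, self.amount)
        } else {
            format!("{:.2} {}", self.amount, self.currency.code())
        }
    }
}
```

Symbol placement and the number of decimals are fixed here for brevity; in practice both would come from locale data.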

mrhota commented Mar 3, 2016

jan-hudec commented Mar 3, 2016

Look at the referencing issue (rust-locale/rust-locale#7). I've put some outline (without much names yet) already.

Regarding i18n, I think locale, as the traditional name, is better for this. The backends will probably get long names, since they will be mostly internal matter and I was thinking about msgfmt or message_format for the formatting layer and either gettext or intl (because the gettext runtime part is called libintl) for the gettext component.

jan-hudec commented Mar 4, 2016

The more I think about this, the less I see ECMA-402 as particularly reasonable inspiration for this. I have done internationalization support in work projects, have a pretty good idea of features it should have and I don't see them in the JavaScript version. The C++ locale seems like much better inspiration.

mrhota commented Mar 6, 2016

I saw your post on the user forums about forward compatible locale design. I think github is probably the better place to continue that topic and to combine it with the above discussion. What do you think?

mrhota commented Mar 6, 2016

Ah, never mind.

For future visitors to this issue, it looks like we'll just continue discussions over at rust-locale/rust-locale#7, like @jan-hudec already started doing. 😄

alexreg commented Mar 5, 2017

Have there been any efforts towards this end lately?

jan-hudec commented Mar 6, 2017

@alexreg, I am slowly making some progress on https://github.com/rust-locale/rust-locale/tree/next, but there is still a lot of work left, including the main ECMA-402 bit: a trait adding to_locale_string and similar methods to the various types that need it.

alexreg commented Mar 6, 2017

@jan-hudec Oh, cool. It looks like you're using the C++ stdlib approach of facets + locales then, eh? Let me know if I can help with development!

alexreg commented Mar 6, 2017

(Silly me thought that project was dead just because the master branch was stale, by the way...)

jan-hudec commented Mar 7, 2017

@alexreg, well, it's a mix of the C++, the Java and the JavaScript approach, really. I use the term facet from C++ (which I am most familiar with). But instead of storing them in the locale object as C++ does, I have them in a static map keyed on the Locale and construct them on demand with a factory, which is closest to how Java works. The advantage is that it can be more easily extended by overriding some of the factories (which is not yet implemented, but shouldn't be hard) and by creating new types of facets, because I have an idea that there will be domain-specific extensions (I can think of units and transliterations as possible ones, but there are probably some more).

> (Silly me thought that project was dead just because the master branch was stale, by the way...)

It's not dead, but it's been progressing really slowly. I first started and got stuck halfway through the exponential numbers. Then it slept for some time and then got a redesign that introduced the inverted objects and RFC5646 language tags, which then got reduced to the more appropriate RFC4647 ones. But then I got stuck on the CLDR data and also still had some glitches in the numbers. So around new year I at least published the helper crate locale_config for obtaining the system configuration, and a couple of weeks ago I finally managed to finish the numbers.

To make a release, I would like to add date and time and add a facet (Localised) with the ECMA-402-like methods for formatting via appropriate facet, so the functions currently used in https://github.com/ogham/exa can be replaced with appropriate new version.

I would definitely welcome some help; we can discuss what in an issue on the https://github.com/rust-locale/rust-locale/ project.

alexreg commented Mar 7, 2017

@jan-hudec Okay, sounds fair. I already created an issue on there, if you can see it... we can also speak on IRC, if you're on there.

alexreg commented Mar 7, 2017

Incidentally, I'm not a big fan of this static map/factory design. It sounds like over-engineering. But maybe we could discuss this.

jan-hudec commented Mar 8, 2017

@alexreg

> Incidentally, I'm not a big fan of this static map/factory design. It sounds like over-engineering. But maybe we could discuss this.

What I want here is to have a base locale package providing decimal numbers, dates and times, collation, perhaps money and some basic message catalogue, and then add-on packages providing other catalogues, rule-based numbers (i.e. as words, which is often used on bills as tamper-proofing, and non-decimal systems like Roman numerals), quantities (with automatic conversion between metres, kilometres, feet, miles, litres, pints, gallons, degrees Celsius and Fahrenheit etc.), transliteration and other features, and add-on packages using alternate sources of the locale data (to reuse something that might be already installed in the system).

And the static map/factory design makes it trivial to add new facets and quite simple to replace existing ones with a more advanced version (the latter needs explicit initialisation for now).

alexreg commented Mar 10, 2017

Oh right, I see your motivation better now. Fair enough then.

Just to clarify: when a Locale wants to create its facets, it will look to the factory, and get it to generate the facets. This may just be the default facets (for numbers, currency, date-time, etc.), or they may be "advanced" versions of the facets provided by alternative crates. These alternative crates would override the factory, perhaps by specialisation?

jan-hudec commented Mar 10, 2017

@alexreg, yes.

Overriding factories for existing facets will have to be done by registering function pointers, because there are no new types involved (well, there are the concrete types, but most of the code does not know about them). For new types of facets, providing suitable impl is enough.
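One possible reading of the factory design discussed above, as a rough sketch only. It uses an explicit registry value rather than a static map, keys on a locale tag string, and every name in it (`NumberFacet`, `FacetRegistry`, `grouping_factory`) is hypothetical rather than taken from the rust-locale code.

```rust
use std::collections::HashMap;

/// Hypothetical facet: formats integers for one locale.
pub trait NumberFacet {
    fn format(&self, n: i64) -> String;
}

/// Default facet: no grouping at all.
struct PlainNumbers;

impl NumberFacet for PlainNumbers {
    fn format(&self, n: i64) -> String {
        n.to_string()
    }
}

/// "Advanced" facet an add-on crate might provide: groups digits in threes.
struct Grouped(char);

impl NumberFacet for Grouped {
    fn format(&self, n: i64) -> String {
        let digits = n.unsigned_abs().to_string();
        let mut out = String::new();
        if n < 0 {
            out.push('-');
        }
        for (i, ch) in digits.chars().enumerate() {
            if i > 0 && (digits.len() - i) % 3 == 0 {
                out.push(self.0);
            }
            out.push(ch);
        }
        out
    }
}

/// A factory is a plain function pointer, so it can be swapped at run time.
pub type NumberFactory = fn(&str) -> Box<dyn NumberFacet>;

fn default_factory(_tag: &str) -> Box<dyn NumberFacet> {
    Box::new(PlainNumbers)
}

/// Toy override: picks a separator from the language subtag.
pub fn grouping_factory(tag: &str) -> Box<dyn NumberFacet> {
    let sep = if tag.starts_with("de") { '.' } else { ',' };
    Box::new(Grouped(sep))
}

pub struct FacetRegistry {
    factory: NumberFactory,
    cache: HashMap<String, Box<dyn NumberFacet>>,
}

impl FacetRegistry {
    pub fn new() -> Self {
        FacetRegistry { factory: default_factory, cache: HashMap::new() }
    }

    /// Register a replacement factory; cached facets are dropped.
    pub fn register(&mut self, factory: NumberFactory) {
        self.factory = factory;
        self.cache.clear();
    }

    /// Return the facet for `tag`, constructing it on first use.
    pub fn numbers(&mut self, tag: &str) -> &dyn NumberFacet {
        let factory = self.factory;
        let facet = self.cache.entry(tag.to_string()).or_insert_with(|| factory(tag));
        &**facet
    }
}
```

An add-on crate would call `register(grouping_factory)` once at start-up; code that asks the registry for a facet never needs to know which concrete type it received.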

alexreg commented Mar 11, 2017

Sounds fair enough. I'm sure you've thought this out anyway, and will continue to tweak it where appropriate, to make the developer interface as simple as possible. :)

@petrochenkov petrochenkov added T-libs and removed A-libs labels Jan 28, 2018

kud1ing commented Jun 26, 2018

Not an expert in this matter, but in the meantime there is also https://github.com/projectfluent/fluent-rs

alexreg commented Jun 26, 2018

Thanks @kud1ing. Looks interesting. Does the library support numeric, datetime, everything?

zbraniecki commented Aug 5, 2018

Hi all!

We just released https://crates.io/crates/intl_pluralrules which brings one of the foundations for any decent intl/l10n API. You can read about it here: https://blog.mozilla.org/l10n/2018/08/03/intl_pluralrules-a-rust-crate-for-handling-plural-forms-with-cldr-plural-rules/

As for l10n - I just released fluent-rs 0.3, which gets closer to being a complete l10n solution. It's still raw and doesn't have any syntactic sugar like macros, but it gets the job done the right way.

As for intl - with plural rules ready, we can start talking about CLDR based date/time formatting, number formatting and units. Maybe collation?
I'm one of the core contributors to ECMA402 (JS Intl) and would be happy to help anyone willing to drive the implementation.

zbraniecki commented Aug 5, 2018

> Does the library support numeric, datetime, everything?

It's ready to support it (which is non-trivial - gettext for example will never be on the syntax level). Once Rust gains any intl date/time/number formatting library, we can plug it into Fluent.

alexreg commented Aug 5, 2018

@zbraniecki Sounds good. Does your project have an impedance mismatch of some sort with @jan-hudec's library, or would you consider using it? I believe his might be further along on things like numerics and datetimes.

zbraniecki commented Aug 5, 2018

Not sure yet. I think the ultimate goal is aligned - get ECMA402-like basic internationalization features:

  • Date/Time
  • Number
  • Collation

and with time advanced ones:

  • Units
  • RelativeTime
  • ListFormatting

But based on my experience working on CLDR, ECMA402 and Rust, I'm more prone to follow the rust-unic approach of building an array of specialized crates/APIs rather than trying to follow POSIX approach of building some unified locale crate.
There's another difference I noticed. @jan-hudec's crate follows the (once again - POSIX?) approach of using the OS APIs to internationalize data. In my experience this approach has many shortcomings that make most major projects move away from it. In fact, Firefox used to do that, until around 2 years ago we moved away to bundle CLDR/ICU package and unify our internationalization API around it.

intl_pluralrules follows that approach - it bundles the CLDR data making it universal across platforms and independent of them.

I'll investigate upstreaming the crate to make it part of rust-unic, and I'd be interested in pursuing CLDR based intl_datetime, intl_number, intl_list and so on.

As for fluent-rs - this crate can plug into any formatter, including OS-wrapper like @jan-hudec's crate.

zbraniecki commented Aug 5, 2018

Also, in ECMA402 we're currently standardizing core Intl.Locale crate [0] and in Rust I wrote a very similar crate fluent-locale-rs [1] which handles basic BCP47 language tag manipulation and language negotiation (based on RFC4647, but customized).

I'm interested in upstreaming at least the core manipulation portion of it to unic since it is part of Unicode.

[0] https://github.com/tc39/proposal-intl-locale
[1] https://crates.io/crates/fluent-locale

zbraniecki commented Aug 5, 2018

I submitted a request to update arewewebyet - bashyHQ/arewewebyet#120

jan-hudec commented Aug 5, 2018

@zbraniecki, I basically got into analysis paralysis and got nowhere with the locale crate. Exactly because putting it in one crate is ultimately the wrong way, because there are too many options and most programs won't need much of that anyway.

The plan with locale was changed from using the system API to doing it myself—except it is too much work—and eventually to binding ICU. Which I started, but didn't get around to compiling it when there is no system one available yet (their system for bundling the data is Insane™ and cross-compilation-unfriendly).

There is one part of it that I think is in a useful state, locale_config, which reads the user locale from system configuration (supporting POSIX, Windows (old API) and CGI). It returns basically a BCP47 tag, but extended with category locales to be able to represent the POSIX locale categories and the separate UI language setting.

There are also already two libraries for manipulating the BCP47 tags: language-tag and language-tags; it would perhaps be nice to avoid duplication here. Though I see you did get further—I don't think either of them actually does matching.

alexreg commented Aug 5, 2018

@zbraniecki Sounds like a fair approach. I'm no expert on internationalisation presently, but if you want to get things moving in Rust, I could potentially contribute to this effort (especially intl_number and intl_datetime; I don't know what intl_list is). I had planned to contribute to @jan-hudec's project previously, but sadly got very busy at that time (sorry!). That said, it looks like Jan has just written a good analysis (or brief post-mortem, if you can call it that) of his library, and something can be salvaged from his and others' work. It makes sense to me. Anyway, let me know if you want to discuss this at some point.

zbraniecki commented Aug 6, 2018

> I basically got into analysis paralysis and got nowhere with the locale crate.

Hahah, every software engineer has their own skeleton closet, right? :)

> (their system for bundling the data is Insane™ and cross-compilation-unfriendly).

Yeah, for that reason, both in ECMA402 and in Rust, I'm leaning toward centralizing around CLDR which is a very well maintained database.

> It returns basically a BCP47 tag, but extended with category locales to be able to represent the POSIX locale categories and the separate UI language setting.

That's cool! I'm wondering if it should be two smaller crates then: posix-bcp47-converter and os-locale? I see more scenarios where you may want to convert the POSIX extensions to BCP47 unicode extensions and variants.

On the other hand, os-locale may want to read the POSIX and not convert it to BCP47.

> it would perhaps be nice to avoid duplication here. Though I see you did get further—I don't think either of them actually does matching.

Yeah, I'm not sure what the difference between those two is. I evaluated language-tags and reported a small bug I found. Overall the crate works well and I'm not sure what language-tag brings to the picture.

I would hope that with some sort of intl_locale crate (similar to ECMA402 Intl.Locale) we will be able to reduce this crate (and my fluent-locale) to a language negotiation role.

> I don't know what intl_list is

Internationalized list formatting, think of cases like Anna, John and Mary (standard list), or 2 feet, 10 inches (unit list). It's very useful later in combination with unit formatter, date formatter ("2 hours and 10 minutes") or relative time formatter ("in 2 hours and 22 minutes").
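A minimal sketch of what list formatting means in practice, assuming only two separators per list style. Real CLDR list patterns are richer (separate start, middle, end and two-item patterns), and `ListPatterns`, `EN_STANDARD` and `EN_UNIT` are hypothetical names.

```rust
/// Simplified, hypothetical stand-in for CLDR list patterns.
pub struct ListPatterns {
    /// Placed between all items except the last two.
    pub separator: &'static str,
    /// Placed between the last two items.
    pub last_separator: &'static str,
}

/// English "standard" list: Anna, John and Mary.
pub const EN_STANDARD: ListPatterns = ListPatterns { separator: ", ", last_separator: " and " };

/// English "unit" list: 2 feet, 10 inches.
pub const EN_UNIT: ListPatterns = ListPatterns { separator: ", ", last_separator: ", " };

pub fn format_list(items: &[&str], patterns: &ListPatterns) -> String {
    match items {
        [] => String::new(),
        [only] => only.to_string(),
        // Join everything but the last item, then attach the last one.
        [init @ .., last] => format!(
            "{}{}{}",
            init.join(patterns.separator),
            patterns.last_separator,
            last
        ),
    }
}
```

A date or unit formatter would produce the individual items and hand them to such a function, which is why the list formatter is useful in combination with the others.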

> Anyway, let me know if you want to discuss this at some point.

Sure! I'm mainly focused now on finalizing Fluent 1.0 and its rust port, so not much time for intl, but I think the next step can be one of:

I would recommend to wait with intl_number just because ECMA402 is working on a new revision based on the second iteration of ICU NumberFormat and it may make sense to wait for them to settle things.

I'm unlikely to kick off anything until Fluent 1.0, but I'll be happy to advise and contribute code :)

jan-hudec commented Aug 6, 2018

@zbraniecki, @alexreg, actually I now remember why I didn't select a set of separate crates for each facet (numbers, datetime etc.): to make the formatting extensible, there should be one trait for formatting things (defining the toLocaleString method), but due to the orphan rules the crate that introduces it must immediately implement it for any standard types that should have it, which means all built-in numeric types and also std::time::SystemTime (for date+time).

Of course if the toLocaleString method would have different options for each category (number, time etc.), it can be implemented by separate crates, but then the formatting options can't be taken from the formatting template—which may not be a problem, because they have no business being there, it's just (false) convenience.

jan-hudec commented Aug 6, 2018

@zbraniecki,

> > (their system for bundling the data is Insane™ and cross-compilation-unfriendly).
>
> Yeah, for that reason, both in ECMA402 and in Rust, I'm leaning toward centralizing around CLDR which is a very well maintained database.

Well, ICU is CLDR based and already implements most of the things, so binding it should be faster route to something working. And while their data bundling is pretty weird, I actually think when statically linking with Rust, we can simply include! the .dat file in a Rust source defining appropriate extern symbol and things should work. So it's not that insurmountable.

I have three issues why I see it more as a temporary solution than a final one:

  1. The data pack, even if we discard the unicode tables (which Rust already has elsewhere), is something like 20MB and cutting it further down to drop more obscure functionality is not really possible. 20MB statically linked into everything is a lot (with system ICU it's not a problem though). I suspect a better format taking advantage of the inheritance could do better.

  2. While their message formatting should actually support most of the things fluent aims to (they have choice by plural or by property in the template), it does not seem to be extensible in the way of defining new types of parameters, so we can't make it go through the Rust trait for formatting the parameters. But there is a reimplementation of that already—and you'll want the fluent format anyway.

  3. The number formatting seems to only be able to specify precision via the patterns, but they are unwieldy.


> > It returns basically a BCP47 tag, but extended with category locales to be able to represent the POSIX locale categories and the separate UI language setting.
>
> That's cool! I'm wondering if it should be two smaller crates then: posix-bcp47-converter and os-locale? I see more scenarios where you may want to convert the POSIX extensions to BCP47 unicode extensions and variants.

It is more than BCP47 unicode extensions and variants. In POSIX, you can set e.g. LANG=cs_CZ LC_MESSAGES=en_GB and it will translate it as cs-CZ,messages=en-GB. POSIX has other categories too, but separate messages category exists on Windows too and I believe other systems and users do tend to have mixed locales.
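The LANG/LC_MESSAGES example above can be illustrated with a small sketch. This is not the locale_config API; `posix_to_tag` and `combined_locale` are hypothetical helpers that take the variable values as arguments, drop the codeset and modifier, and ignore special names such as C and POSIX.

```rust
/// Convert a POSIX locale name such as "cs_CZ.UTF-8" into a BCP47-like tag ("cs-CZ").
/// Simplification: the ".codeset" and "@modifier" parts are dropped.
pub fn posix_to_tag(posix: &str) -> String {
    let base = posix
        .split(|c: char| c == '.' || c == '@')
        .next()
        .unwrap_or("");
    base.replace('_', "-")
}

/// Combine the general locale with a differing messages category,
/// producing strings of the form "cs-CZ,messages=en-GB".
pub fn combined_locale(lang: Option<&str>, lc_messages: Option<&str>) -> String {
    let mut out = lang.map(posix_to_tag).unwrap_or_default();
    if let Some(messages) = lc_messages {
        let messages = posix_to_tag(messages);
        // Only mention the category when it differs from the general locale.
        if messages != out {
            if !out.is_empty() {
                out.push(',');
            }
            out.push_str("messages=");
            out.push_str(&messages);
        }
    }
    out
}
```

In a real program the two arguments would come from `std::env::var("LANG").ok()` and `std::env::var("LC_MESSAGES").ok()`; passing them in keeps the sketch independent of the environment.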

It might make sense to extract the little bit that reads the POSIX environment variables, but it's just that—a bunch of environment variables. And they are not really well defined, so you either convert them to BCP47, or you let setlocale("") interpret them and that's about all you can do with them, so I am not sure it's actually worth it.

zbraniecki commented Aug 6, 2018

> Well, ICU is CLDR based and already implements most of the things, so binding it should be faster route to something working.

Sure, if you're good with "something working" :) My point is that ICU is a massive codebase with quite significant technical debt. I'd argue that CLDR has very little of it.

For that reason, I'd prefer to invest resources (hah, I mean, suggest for us to focus on) in implementing similar APIs in Rust, rather than wrapping C.
I understand that there's work to be done, but quite frankly, to get the basics (date/time/number) it feels quite feasible.

The added benefit is that slicing CLDR is relatively easy and we can make our crates handle data selection both by locale selection and table selection, much easier.

alexreg commented Aug 7, 2018

@jan-hudec Sounds fair to me with regards to keeping it in one crate. Not sure what @zbraniecki thinks though...

alexreg commented Aug 7, 2018

@zbraniecki

> Sure! I'm mainly focused now on finalizing Fluent 1.0 and its rust port, so not much time for intl, but I think the next step can be one of:

Fair enough. Let me know when you're done with that though.

>   • take intl_pluralrules and merge it into UNIC

As a new crate you mean? Something like unic-pluralrules? I guess we'd want to get that accepted by the UNIC project developers. Do confirm it belongs there, however.

Can you elaborate on this maybe?

>   • kick off intl_datetime based on ECMA402's Intl.DateTimeFormat

Yep, this is the first big task, once the smaller things are out of the way!

NeverGivinUp commented Sep 6, 2018

I was wondering, if the plans you are making will allow for compiling to WASM and using in the context of a single page web application. This requires a) a pure Rust implementation, that does not wrap a pre-compiled library (say C or C++) b) a manageable download size of the internationalization libraries and c) no reliance on the std library.

zbraniecki commented Sep 6, 2018

intl_pluralrules 1.0 has been released - https://crates.io/crates/intl_pluralrules

> Can you elaborate on this maybe?

It would be great to have a low-level BCP47 language tag manipulation library that can parse, operate on and serialize language tags as locale objects. ECMA402 Intl.Locale is just that, so it's likely a good start.

alexreg commented Sep 7, 2018

@zbraniecki Okay, I'd be happy to work on this, if you think it's easier than one of the above tasks you suggested? Maybe get on Discord/IRC?

zbraniecki commented Sep 7, 2018

> if you think it's easier than one of the above tasks you suggested

I think it is definitely easier than date time formatting, and lays foundation for it. You could start by taking the fluent-locale-rs crate and extracting just the Locale struct and its parsing/serializing. Add the impl methods that ECMA402 Intl.Locale has and call it unic-locale.

I'm not sure how to fit it into unic crate system, but I hope @behnam can help with that.

behnam commented Sep 9, 2018

Sorry for the delays on UNIC-related work. I'm trying to fit the work in my schedule. I'm starting by putting the core data into standalone repos, to make it easier to define a maintainable Locale implementation in UNIC. I'll give more updates on that as soon as I can.

Regarding an ECMA-402-based intl library, I think it would be a great thing to have, especially to be able to backport for older JS environments. But what we're trying to do in UNIC (and other implementations that are not open-source yet) is to create a more modern API. Again, more on that soon.

jan-hudec commented Sep 10, 2018

> It would be great to have a low-level BCP47 language tag manipulation library that can parse/operate and serialize language tags as locale objects.

There are two or three on crates.io already. In various state of completeness.

zbraniecki commented Sep 10, 2018

> There are two or three on crates.io already. In various state of completeness.

I only saw crates handling accepted-headers and nothing related to manipulation and serialization of language tag objects.

In other words, if you mean language-tags crate, then I don't think that's what we're looking for. Same with accept-language.

We're looking for something similar to ECMA402 Intl.Locale and more modern version of ICU's Locale.

Example of usage:

let mut loc1 = Locale::from_string("en-us-u-hc-h24");
assert_eq!(loc1.get_region(), "US");
assert_eq!(loc1.get_extension_value("unicode", "hourCycle"), "h24");
loc1.set_script("latn");
loc1.set_extension_value("unicode", "hourCycle", "h11");
loc1.set_extension_value("unicode", "calendar", "buddhist");
assert_eq!(loc1.to_string(), "en-Latn-US-u-ca-buddhist-hc-h11");
assert_eq!(loc1.matches_language("en-GB"), true);
assert_eq!(loc1.matches_locale("en-US-u-ca-buddhist-hc-12"), true);

and so on.
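To make the usage example above more concrete, here is a rough sketch of a small part of such a type. It is not the fluent-locale or Intl.Locale implementation: it parses only language, script and region subtags, ignores variants and extensions, and returns `Option` where the example above returns plain values.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Locale {
    language: String,
    script: Option<String>,
    region: Option<String>,
}

/// "latn" -> "Latn"
fn title_case(s: &str) -> String {
    let lower = s.to_ascii_lowercase();
    let mut chars = lower.chars();
    match chars.next() {
        Some(first) => first.to_ascii_uppercase().to_string() + chars.as_str(),
        None => String::new(),
    }
}

impl Locale {
    /// Parse `language[-script][-region]`; parsing stops at the first other subtag.
    /// Assumption: only 2-3 letter languages and 2-letter regions are recognised.
    pub fn from_string(tag: &str) -> Option<Locale> {
        let mut parts = tag.split(|c: char| c == '-' || c == '_');
        let language = parts.next()?.to_ascii_lowercase();
        let is_alpha = |s: &str| s.chars().all(|c| c.is_ascii_alphabetic());
        if !(2..=3).contains(&language.len()) || !is_alpha(&language) {
            return None;
        }
        let (mut script, mut region) = (None, None);
        for part in parts {
            if part.len() == 4 && is_alpha(part) && script.is_none() && region.is_none() {
                script = Some(title_case(part));
            } else if part.len() == 2 && is_alpha(part) && region.is_none() {
                region = Some(part.to_ascii_uppercase());
            } else {
                break; // variants and extensions are not handled in this sketch
            }
        }
        Some(Locale { language, script, region })
    }

    pub fn get_region(&self) -> Option<&str> {
        self.region.as_deref()
    }

    pub fn set_script(&mut self, script: &str) {
        self.script = Some(title_case(script));
    }

    /// True when the language subtags are equal, regardless of region or script.
    pub fn matches_language(&self, other: &str) -> bool {
        Locale::from_string(other).map_or(false, |o| o.language == self.language)
    }
}

impl fmt::Display for Locale {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.language)?;
        if let Some(script) = &self.script {
            write!(f, "-{}", script)?;
        }
        if let Some(region) = &self.region {
            write!(f, "-{}", region)?;
        }
        Ok(())
    }
}
```

Storing the subtags in normalised form at parse time means serialization is a simple join, which is the main design point the sketch is meant to show.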

jan-hudec commented Sep 10, 2018

@zbraniecki, no, I mean primarily language-tag with no s at the end. That one does do the parsing, though I think it's not complete as far as matching goes (and there are two kinds of matching and both are needed).

zbraniecki commented Sep 10, 2018

Oh, yeah, this one looks similar. So yep, some merge of this and fluent-locale seems like the right direction. I'd prefer to call the struct locale over langtag, but that's purely a preference.

alexreg commented Sep 14, 2018

So, what we need to do now is the following, regarding a unic-locale crate, as I understand:

  • Include a Locale type.
  • Try to reuse parsing code from fluent-locale crate.
  • Try to reuse parsing code from language-tag crate. (Is this for a different purpose?)
  • "Add the impl methods that ECMA402 Intl.Locale has" -- you mean take the listed operations on locale from the ones listed in the standard?

Correct me if I'm wrong on any of the above. I have a bit of time to work on this now, but I do want to make sure I'm not going to waste effort.

zbraniecki commented Sep 14, 2018

This list looks good! fluent-locale and language-tag crates may have different parsing approaches (one seems to be a parser generator, the other is written by hand), but this should be an implementation detail. I'd expect the hand written to be faster, but you can probably benchmark and use whatever is easier - swapping it later will be backward compatible.

For the Ecma402 Intl.Locale - it's worth checking the API there and applying it onto the class. It'll likely come useful in Rust just as much. :)

Happy to provide feedback and/or review, but you may want to also check with @behnam - he seems to be interested in this effort as well and has some experience from his recent work in Python.

alexreg commented Sep 14, 2018

Okay, sure. Let's see what @behnam has to say, then I can get going hopefully. :-)

behnam commented Oct 12, 2018

Okay, some updates are coming soon to UNIC, but it will take longer to build the Locale crate. To start, I'm first making a Territory model, which will be responsible for the region part of BCP-47 codes, as well as the sd extension. See open-i18n/rust-unic#234 for more details.

Having the Territory model, then the next focus will be the Language model. And then building Locale on top of those.

The general idea is to not use &str/String for representing i18n data. The top reason is to not hide all the problems under the rug. For example, if a country code goes out of use, the code having conditions against that country code should get compile errors. IMHO, this is necessary for getting a solid i18n library that doesn't require playing whack a mole with bugs down the road.

As we have mentioned over the past year or so, UNIC is still experimenting with how to improve i18n architecture. For any application that doesn't want to get into this experiment, I would recommend using ICU-based solutions, which would work with all of these as strings.

What do you think?

NeverGivinUp commented Oct 12, 2018

I like the idea of the compiler catching stuff a lot. Specifically in the example you are giving of a country code going out of use though, I believe it should not refuse compiling. Instead it should compile with a warning, so that code that once did compile but needs a change somewhere (say a critical security fix) will continue to compile and run the way it did before (say interfacing with another system, that has not been updated).
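The warning-instead-of-error behaviour suggested here can be sketched with Rust's `#[deprecated]` attribute on an enum variant. The `Territory` enum below is a hypothetical two-entry illustration, not the UNIC model; DD (the former German Democratic Republic) serves as the example of a code that went out of use.

```rust
/// Hypothetical typed territory codes instead of bare strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Territory {
    De,
    /// Code of the former German Democratic Republic.
    #[deprecated(note = "DD is no longer in use; consider Territory::De")]
    Dd,
}

impl Territory {
    /// The library itself must still know the old code, so the lint is silenced here.
    #[allow(deprecated)]
    pub fn code(self) -> &'static str {
        match self {
            Territory::De => "DE",
            Territory::Dd => "DD",
        }
    }

    #[allow(deprecated)]
    pub fn from_code(code: &str) -> Option<Territory> {
        match code {
            "DE" => Some(Territory::De),
            "DD" => Some(Territory::Dd),
            _ => None,
        }
    }
}
```

Application code that names `Territory::Dd` directly still compiles but gets a deprecation warning, and a project that wants the stricter behaviour can opt in with `#![deny(deprecated)]`.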
