Number localisation for Asian #16683

Open
tribela opened this issue Aug 31, 2021 · 6 comments
Labels
area/web interface Related to the Mastodon web interface i18n Internationalization and localization suggestion Feature suggestion

Comments

@tribela
Contributor

tribela commented Aug 31, 2021

Pitch

Currently, Mastodon displays status counts, user counts, etc. in forms like "30k" and "1.8M".
This is not translatable for most Asian locales, because they use a 10,000-based numbering system instead of the 1,000-based (Western) system.

For example, Twitter has already implemented this behaviour:
Twitter :verified: 1.4만 트윗
Twitter :verified: 14.7k Tweets
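
For illustration, the browser's built-in compact notation already chooses the grouping from locale data. The following is a minimal sketch, assuming a runtime that ships full ICU/CLDR data; the exact output strings depend on that data, so none are asserted here.

```javascript
// Compact notation takes its grouping from locale data:
// Korean groups by 10,000 (만), English by 1,000 (K).
const value = 14_000;

const korean = new Intl.NumberFormat("ko", { notation: "compact" }).format(value);
const english = new Intl.NumberFormat("en", { notation: "compact" }).format(value);

console.log(korean, english);
```

This only shows the number part; attaching a correctly pluralised word to it is a separate problem.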

Motivation

For better localisation.

@tribela
Contributor Author

tribela commented Aug 31, 2021

With TwitterCldr, it looks like this:

[Screenshots: before | after]

@ClearlyClaire
Contributor

This is, I think, supposed to be handled by number_to_human, but we have overridden the precision and other settings because the data from rails-i18n was incorrect for several locales.

It could probably be fixed by using just number_to_human but we'd have to review how it's called as well as each locale definition.

@brawaru
Contributor

brawaru commented Sep 9, 2021

[More of a maintenance note than a contribution to the issue]

My PR #14061 is affected by this issue too and it's a bit tricky to fix because it's nothing but a hack.

The problem with it is that creators of Intl.NumberFormat and Intl.PluralRules browser APIs (which we rely on) did not supply us with methods to correctly pluralise the words based on the number of the compact notation. In some languages, and I can speak for Russian here, plural rules change for compact notation: for example — ‘10 321 пост’, but ‘1.3 тыс. постов’.
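
To make the gap concrete, here is a small sketch using only the standard Intl APIs. The numbers are the ones from the paragraph above. The point is that `select()` only accepts a plain number, so nothing connects it to the compact form.

```javascript
// Plural categories are selected from plain numbers, never from formatted output.
const rules = new Intl.PluralRules("ru");
const compact = new Intl.NumberFormat("ru", { notation: "compact" });

const raw = 10_321;

// format() returns a plain string; its exact text depends on the runtime's CLDR data.
const shown = compact.format(raw);

// Category for the full number: "one" (as in "10 321 пост").
const categoryForRaw = rules.select(raw);

// Category for the bare mantissa of "1.3 тыс.": "other", since fractions map to "other" in Russian.
const categoryForFraction = rules.select(1.3);

console.log(shown, categoryForRaw, categoryForFraction);
```

Neither call yields the form that the compact notation actually needs ("постов"), which is why some workaround is required.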

In comparison, in ICU4J (Java library), you can format a number to short notation and it returns you an object which you can either convert to string (if you need a value right away) or supply to your plural rule select function, which gets you a plural category to use when localizing the string. That is how things should be, actually, but uh oh. We've got tc39/ecma402#397, but there have been no updates for over a year now.

Our current solution manually finds a ‘best way’ to shorten a number based, unfortunately, on this 1,000-based ‘Western’ system. While doing so, it gives us a division, from which we calculate the value that the plural is based on. This is why, if you worked on a translation, you saw two variables, count and counter: count is that ‘pluralisation’ value, and counter is the actual number in compact notation.

Walk-through

Given number 10,321:

  1. 10,321 is less than a million, so we use a ‘thousands’ division. Numbers below ten thousand are allowed up to 1 fraction digit. The result is 10,321 / 1,000 = 10.321 (that is the number we're going to display in counter, after formatting it and appending ‘K’ for ‘thousands’).
  2. To get ‘plural ready’ number:
    1. We check that the division is not less than one hundred (100); otherwise we return the number as is.
    2. We take the division (which is 1,000) and divide it by 10 to get the ‘closest scale’ (100).
    3. Then we divide our number, 10,321, by the closest scale and discard the fractional part (giving us a nice 103).
    4. We then multiply the result by the closest scale, which gets us 10,300.
  3. Then we simply use the ‘plural ready’ number when formatting a message, with the counter placeholder, which holds the real value, placed inside the plural placeholder.
    • {pluralReady, plural, one {{count} user} other {{count} users}} + { pluralReady: 10_300, count: '10K' } → 10.3K users.
    • {pluralReady, plural, one {{count} пользователь} few {{count} пользователя} many {{count} пользователей} other {{count} пользователя}} + { pluralReady: 10_300, count: '10 тыс.' } → 10.3 тыс. пользователей.
    • {count}ユーザー + { pluralReady: 10_300, count: '10.3K' } → 10.3Kユーザー (expected count to equal 1万)
    • {count}사용자 + { pluralReady: 10_300, count: '10.3K' } → 10.3K사용자 (expected count to equal 1만)
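
The steps above can be sketched roughly as follows. The helper names and return shape are hypothetical and chosen for illustration; this follows the walk-through, not necessarily the actual Mastodon source, and the handling of millions is an assumption.

```javascript
// Hypothetical helpers that follow the walk-through; not the actual Mastodon source.

// Step 1: pick a division and the number of fraction digits allowed.
function toShortNumber(n) {
  if (n < 1_000) {
    return { value: n, division: 1, maxFractionDigits: 0 };
  }
  if (n < 1_000_000) {
    return { value: n / 1_000, division: 1_000, maxFractionDigits: n < 10_000 ? 1 : 0 };
  }
  // Assumption: millions follow the same pattern as thousands.
  return { value: n / 1_000_000, division: 1_000_000, maxFractionDigits: n < 10_000_000 ? 1 : 0 };
}

// Step 2: derive the number that plural rules are evaluated against.
function pluralReady(n, division) {
  if (division < 100) return n;            // 2.1: small divisions keep the number as is
  const closestScale = division / 10;      // 2.2: 1,000 / 10 = 100
  return Math.trunc(n / closestScale) * closestScale; // 2.3 and 2.4: 103 * 100 = 10,300
}

const { division } = toShortNumber(10_321);
const ready = pluralReady(10_321, division);

// Step 3: the plural category comes from the 'plural ready' number.
const category = new Intl.PluralRules("ru").select(ready);

console.log(ready, category);
```

Note that the divisions are hard-coded to powers of 1,000 here, which is exactly the limitation this issue is about.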

I will be thinking about a solution to that in my free time, but right now I have no worthy ideas. In the worst case we'll have to wait for someone to propose a solution in TC39.

The showstopper is that we need CLDR data in order to perform smart calculations, in this case we need patterns and thresholds for compact notations (example for Korean, for Japanese, for Russian). Without this data we can't be sure how to exactly get those numbers — one of the steps in CLDR guide basically tells you ‘from a threshold yeet a number of zeroes from the pattern and then you get a divisor’ (e.g. for threshold 10_000 you remove... no zeroes, because one zero is always skipped; but from 100_000 you remove 1 zero (pattern ‘00만’)).
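
As a rough sketch of that step: the divisor is the threshold with (number of zeroes in the pattern minus one) zeroes removed. The function name is hypothetical, the two Korean patterns are the ones mentioned above and are hard-coded here rather than loaded from CLDR, and quoted literals in patterns are ignored for simplicity.

```javascript
// Hypothetical helper: derive the divisor for a compact pattern.
// Simplification: counts every "0" in the pattern and ignores quoted literals.
function compactDivisor(threshold, pattern) {
  // A bare "0" pattern means "no compact form at this threshold".
  if (pattern === "0") return 1;

  const zeroes = (pattern.match(/0/g) || []).length;
  // One zero is always skipped; each further zero removes a zero from the threshold.
  return threshold / 10 ** (zeroes - 1);
}

const tenThousand = compactDivisor(10_000, "0만");      // no zeroes removed
const hundredThousand = compactDivisor(100_000, "00만"); // one zero removed

console.log(tenThousand, hundredThousand, 14_000 / tenThousand);
```

Both thresholds end up with the same divisor, so 14,000 becomes 1.4 and 140,000 becomes 14 before the ‘만’ suffix is attached.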

Even Twitter is prone to this issue:

Screenshot of Twitter user header for user ‘Surma’, who has ‘21.2 тыс. твита’.

Screenshot of React Developer Tools properties (supposedly for an element on screenshot above)

It'd be ‘твита’ for 21,243 (matching ‘few’ per Russian plural rules), but it should be ‘твитов’ for 21.2 thousand tweets (matching ‘many’ for the [visible] 21,200). For 21.2 the category is ‘other’, which results in ‘21.2 тыс. твита’; that reads basically like ‘few’, and so is incorrect.
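
The three cases can be checked with the standard `Intl.PluralRules` API. This is a small sketch; the word table and helper name are hypothetical and only cover the forms quoted in this comment.

```javascript
// Word forms per Russian plural category, as used in the examples above.
const forms = { one: "твит", few: "твита", many: "твитов", other: "твита" };
const russianRules = new Intl.PluralRules("ru");

// Hypothetical helper: pick the word form for a given number.
function tweetWord(n) {
  return forms[russianRules.select(n)];
}

const exact = tweetWord(21_243);   // category for the exact count
const visible = tweetWord(21_200); // category for the visible, rounded count
const mantissa = tweetWord(21.2);  // category for the bare compact mantissa

console.log(exact, visible, mantissa);
```

Only the rounded, visible value selects ‘твитов’; pluralising on the mantissa gives the form seen in the Twitter screenshot.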

So yeah, that's the note. ‘thanks for coming to my TED talk [about how internationalisation can be a headache]’.

@mashirozx
Contributor

Another strange behavior is that it displays something like 1.539k, which seems meaningless…
IMG_20211004_084457
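
The cause is not stated in this thread. One plausible way to get such output, shown purely as an assumption, is dividing by 1,000 without limiting the fraction digits. A sketch of the difference:

```javascript
// Assumption for illustration only: the count is divided by 1,000 and printed as is.
const count = 1_539;

// Without a limit, every fraction digit is shown.
const unbounded = `${count / 1_000}k`;

// Capping fraction digits with the standard Intl.NumberFormat option.
const oneDigit = new Intl.NumberFormat("en", { maximumFractionDigits: 1 });
const capped = `${oneDigit.format(count / 1_000)}k`;

console.log(unbounded, capped); // 1.539k 1.5k
```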

@brawaru
Contributor

brawaru commented Oct 6, 2021

https://codesandbox.io/s/pn5qp?file=/src/main.tsx

This is one hacky solution, it:

  • forcefully replaces browser provided Intl.NumberFormat with a polyfill
  • loads data for required locales (in the example it loads all locales at once but it's pretty possible to load them individually on demand, although that should be happening after injecting Intl.NumberFormat and before actually using it)
  • (ab)uses internal APIs like ComputeExponent, FormatNumericToString of polyfill implementation
  • implements CompactNumber component (similar to ShortNumber we already have)

The prototype code and ‘design’ are a mess, of course; I'm pretty sure the implementation can be made cleaner in several ways.

But yeah, here's that :fundyDead:

GIF taken from demo link, demonstrating compact notation in various languages.

@brawaru
Contributor

brawaru commented Oct 17, 2023

Since Mastodon has been updated to the newest versions of the FormatJS libraries, perhaps you can check out @vintl/compact-number, which I developed based on the solution above. It doesn't require a polyfill, since I generate a slice of CLDR data manually. It's still not the smallest library out there, although some of it will be de-duplicated because it relies on FormatJS APIs.

I don't think a direct fix for this problem will land in browsers any time soon, so we can only resort to all sorts of hacks.

@vmstan vmstan added i18n Internationalization and localization suggestion Feature suggestion area/web interface Related to the Mastodon web interface labels Nov 16, 2023