fix(non-latin): properly handle non-latin locale scripts when initializing money. #167

pshoukry · 2024-04-05T15:12:44Z

Money.new when attempting maybe_decimal always assumes a latn script locale. ( ie: latn is hardcoded) which fails for any other locale script.

pshoukry · 2024-04-05T15:13:19Z

lib/money.ex

@@ -2471,8 +2473,8 @@ defmodule Money do
         {:ok, symbols} <- Cldr.Number.Symbol.number_symbols_for(locale, backend) do
      decimal =
        string
-        |> String.replace(symbols.latn.group, "")
-        |> String.replace(symbols.latn.decimal, ".")
+        |> String.replace(symbol_script(symbols, locale).group, "")


@kipcole9 this is the important part.

kipcole9 · 2024-04-05T18:02:41Z

test/money_test.exs

+    Money.default_backend().put_locale("ar-EG")
+
+    assert Money.new(:EGP, "100")
+           |> Money.to_string() == {:ok, "‏١٠٠٫٠٠ ج.م.‏"}


I don't think this test is exercising the change you've proposed since formatting doesn't pass through the parsing code. I think "100" would need to change to "10,000.00" with the appropriate localised separators.

This opens up the broader issue in that parsing still only accepts latin digits when it should accept digits in other scripts (at least digits that are non-algorithmic).

@kipcole9 I updated the test to include a larger number so see the actual separator.

and yes it only accepts latin digits with no separators.

kipcole9 · 2024-04-05T18:07:41Z

Thanks very much for raising the issue and the PR. Navigating some of the Cldr.Number.Symbol...... code isn't that straight forward or easy so I definitely appreciate you making the effort. Definitely this needs to be fixed.

A couple of questions come to mind where I'd like your thoughts:

Should the script used to resolve the symbols be the locale as specified (either as a :locale param or from Cldr.get_locale/0) or should it be derived from the number itself. Which means doing script detections - which is totally possible but may required adding my unicode lib as a dependency.
Should ex_money parse non-latin digits - not only non-latin separators. It seems logical to me that it should - using latin digits but non-latin separators would seem unusual?

kipcole9 · 2024-04-05T18:12:29Z

lib/money.ex

+  defp symbol_script(symbols, locale) do
+    script =
+      locale
+      |> Cldr.Locale.script_from_locale()


That you have to go through this atom/string munging is horrible. I need to update ex_cldr_numbers to use a canonical form, which will be :Latn. I will fix that in time for the next ex_cldr release cycle which will be April 17th alongside CLDR 45's release.

Actually I created some confusion between a script and a number system. :Latn is the script. :latn is a number system. So here the call should be Cldr.Number.System.number_system_from_locale/1 which will return :arab for the locale ar-EG as expected. I'll do that on my end, no action required on your end. And this will get rid of the horrible atom/string/atom munging.

@kipcole9 sorry I must have confused both .. perfect .. thank you for the solution

pshoukry · 2024-04-05T18:47:11Z

@kipcole9

I think it should detect them since at least in Egypt although non latin numbers are the default they are rarely used. So no guarantee if locale is AR the digits will be non latin.
Agree it should parse both.
Sorry I rethought my answer since I am not very familiar with CLDR! but i am willing to help if i can.

kipcole9 · 2024-04-05T22:47:20Z

Agree it should parse both.

This is very similar to what I already do in Cldr.Number.Parser.parse/2. For example:

iex> Cldr.Number.Parser.parse "١٢٣٤٥", locale: "ar"
{:ok, 12345}

And it also handles the separators correctly for a locale.

I think the code in ex_money here came before the generalised number parsing. The right approach now, I think, is to use Cldr.Number.Parser.parse/2 rather than the implementation in ex_money. It's also much more lenient in respect of separators - Unicode has quite a lot of them that look very similar - much like your example for ar-EG.

Given the need to make changes in both ex_cldr_numbers and ex_money, are you ok if the implementation is published on April 17th?

Lastly, for now, thank you for the engagement. I would really value having a collaborator in the arabic-speaking world. I have no working knowledge of arabic, or right-to-left languages. Would you mind if I ask you questions from time to time as I work on implementation?

kipcole9 · 2024-04-05T22:49:13Z

One question that immediately comes to mind, if I'm parsing "١٢٣٤٥" and there is no direction indicator (LTR or RTL), how do I know if I should be parsing this as 12345 or 54321?

pshoukry · 2024-04-06T05:13:16Z

@kipcole9 not at all, I would love to answer questions and also help with coding if you are ok with that. I worked with MS arabic localization team for a couple of years back in the 2000s so i have some knowledge their ... just needs a bit of brushing :)

For the arabic numbers their direction is still LTR not RTL please check this article https://www.linkedin.com/pulse/why-did-arabs-ditch-arabic-numerals-ayman-el-ghazzawi#:~:text=English%20is%20written%20from%20left,written%20from%20left%20to%20right.

kipcole9 · 2024-04-06T06:34:33Z

Thanks Peter, thats very kind and helpful of you. I look forward to collaborating more.

If I've understood you correctly, it is common to use latin numerals, at least in ar-EG, but with arabic separators. Being 0x066c for the grouping separator and 0x66b for the decimal separator. (wow, they look almost identical on my system!).

Are there cases where the latin separators are used? I'm trying to work out how lenient to be in parsing - as long as there's no ambiguity introduced.

pshoukry · 2024-04-06T10:39:05Z

@kipcole9 sorry maybe i wasn't clear, Its common to use latin numberals with latin separators but if someone uses non-latin ( hindi numerals) they will then use also non-latin separators.

kipcole9 · 2024-04-13T02:31:50Z

I'm back working on this @pshoukry. And will have a fix as part of the CLDR 45 release cycle on April 17th.

kipcole9 · 2024-04-20T22:25:52Z

Closing as fixed. Thanks of the PR, it's much appreciated. In the end I added some slightly different code for the same purpose. I will be publishing today a new release with this fix.

pshoukry · 2024-04-21T12:34:35Z

Thanks @kipcole9

fix(non-latin): properly represent non-latin currency

323a084

pshoukry commented Apr 5, 2024

View reviewed changes

kipcole9 reviewed Apr 5, 2024

View reviewed changes

fix: fix large numbers with separator

7bb5f0b

kipcole9 closed this Apr 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(non-latin): properly handle non-latin locale scripts when initializing money. #167

fix(non-latin): properly handle non-latin locale scripts when initializing money. #167

pshoukry commented Apr 5, 2024 •

edited

Loading

pshoukry Apr 5, 2024

kipcole9 Apr 5, 2024

pshoukry Apr 5, 2024

pshoukry Apr 5, 2024

kipcole9 commented Apr 5, 2024 •

edited

Loading

kipcole9 Apr 5, 2024

pshoukry Apr 5, 2024

kipcole9 Apr 13, 2024

pshoukry Apr 13, 2024

pshoukry commented Apr 5, 2024 •

edited

Loading

kipcole9 commented Apr 5, 2024

kipcole9 commented Apr 5, 2024

pshoukry commented Apr 6, 2024

kipcole9 commented Apr 6, 2024

pshoukry commented Apr 6, 2024 •

edited

Loading

kipcole9 commented Apr 13, 2024

kipcole9 commented Apr 20, 2024

pshoukry commented Apr 21, 2024

fix(non-latin): properly handle non-latin locale scripts when initializing money. #167

fix(non-latin): properly handle non-latin locale scripts when initializing money. #167

Conversation

pshoukry commented Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kipcole9 commented Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pshoukry commented Apr 5, 2024 • edited Loading

kipcole9 commented Apr 5, 2024

kipcole9 commented Apr 5, 2024

pshoukry commented Apr 6, 2024

kipcole9 commented Apr 6, 2024

pshoukry commented Apr 6, 2024 • edited Loading

kipcole9 commented Apr 13, 2024

kipcole9 commented Apr 20, 2024

pshoukry commented Apr 21, 2024

pshoukry commented Apr 5, 2024 •

edited

Loading

kipcole9 commented Apr 5, 2024 •

edited

Loading

pshoukry commented Apr 5, 2024 •

edited

Loading

pshoukry commented Apr 6, 2024 •

edited

Loading