Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(non-latin): properly handle non-latin locale scripts when initializing money. #167

Closed

Conversation

pshoukry
Copy link

@pshoukry pshoukry commented Apr 5, 2024

Money.new when attempting maybe_decimal always assumes a latn script locale. ( ie: latn is hardcoded) which fails for any other locale script.

@@ -2471,8 +2473,8 @@ defmodule Money do
{:ok, symbols} <- Cldr.Number.Symbol.number_symbols_for(locale, backend) do
decimal =
string
|> String.replace(symbols.latn.group, "")
|> String.replace(symbols.latn.decimal, ".")
|> String.replace(symbol_script(symbols, locale).group, "")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kipcole9 this is the important part.

Money.default_backend().put_locale("ar-EG")

assert Money.new(:EGP, "100")
|> Money.to_string() == {:ok, "‏١٠٠٫٠٠ ج.م.‏"}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this test is exercising the change you've proposed since formatting doesn't pass through the parsing code. I think "100" would need to change to "10,000.00" with the appropriate localised separators.

This opens up the broader issue in that parsing still only accepts latin digits when it should accept digits in other scripts (at least digits that are non-algorithmic).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kipcole9 I updated the test to include a larger number so see the actual separator.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and yes it only accepts latin digits with no separators.

@kipcole9
Copy link
Owner

kipcole9 commented Apr 5, 2024

Thanks very much for raising the issue and the PR. Navigating some of the Cldr.Number.Symbol...... code isn't that straight forward or easy so I definitely appreciate you making the effort. Definitely this needs to be fixed.

A couple of questions come to mind where I'd like your thoughts:

  1. Should the script used to resolve the symbols be the locale as specified (either as a :locale param or from Cldr.get_locale/0) or should it be derived from the number itself. Which means doing script detections - which is totally possible but may required adding my unicode lib as a dependency.
  2. Should ex_money parse non-latin digits - not only non-latin separators. It seems logical to me that it should - using latin digits but non-latin separators would seem unusual?

defp symbol_script(symbols, locale) do
script =
locale
|> Cldr.Locale.script_from_locale()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That you have to go through this atom/string munging is horrible. I need to update ex_cldr_numbers to use a canonical form, which will be :Latn. I will fix that in time for the next ex_cldr release cycle which will be April 17th alongside CLDR 45's release.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I created some confusion between a script and a number system. :Latn is the script. :latn is a number system. So here the call should be Cldr.Number.System.number_system_from_locale/1 which will return :arab for the locale ar-EG as expected. I'll do that on my end, no action required on your end. And this will get rid of the horrible atom/string/atom munging.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kipcole9 sorry I must have confused both .. perfect .. thank you for the solution

@pshoukry
Copy link
Author

pshoukry commented Apr 5, 2024

@kipcole9

  1. I think it should detect them since at least in Egypt although non latin numbers are the default they are rarely used. So no guarantee if locale is AR the digits will be non latin.
  2. Agree it should parse both.
    Sorry I rethought my answer since I am not very familiar with CLDR! but i am willing to help if i can.

@kipcole9
Copy link
Owner

kipcole9 commented Apr 5, 2024

Agree it should parse both.

This is very similar to what I already do in Cldr.Number.Parser.parse/2. For example:

iex> Cldr.Number.Parser.parse "١٢٣٤٥", locale: "ar"
{:ok, 12345}

And it also handles the separators correctly for a locale.

I think the code in ex_money here came before the generalised number parsing. The right approach now, I think, is to use Cldr.Number.Parser.parse/2 rather than the implementation in ex_money. It's also much more lenient in respect of separators - Unicode has quite a lot of them that look very similar - much like your example for ar-EG.

Given the need to make changes in both ex_cldr_numbers and ex_money, are you ok if the implementation is published on April 17th?

Lastly, for now, thank you for the engagement. I would really value having a collaborator in the arabic-speaking world. I have no working knowledge of arabic, or right-to-left languages. Would you mind if I ask you questions from time to time as I work on implementation?

@kipcole9
Copy link
Owner

kipcole9 commented Apr 5, 2024

One question that immediately comes to mind, if I'm parsing "١٢٣٤٥" and there is no direction indicator (LTR or RTL), how do I know if I should be parsing this as 12345 or 54321?

@pshoukry
Copy link
Author

pshoukry commented Apr 6, 2024

@kipcole9 not at all, I would love to answer questions and also help with coding if you are ok with that. I worked with MS arabic localization team for a couple of years back in the 2000s so i have some knowledge their ... just needs a bit of brushing :)

For the arabic numbers their direction is still LTR not RTL please check this article https://www.linkedin.com/pulse/why-did-arabs-ditch-arabic-numerals-ayman-el-ghazzawi#:~:text=English%20is%20written%20from%20left,written%20from%20left%20to%20right.

@kipcole9
Copy link
Owner

kipcole9 commented Apr 6, 2024

Thanks Peter, thats very kind and helpful of you. I look forward to collaborating more.

If I've understood you correctly, it is common to use latin numerals, at least in ar-EG, but with arabic separators. Being 0x066c for the grouping separator and 0x66b for the decimal separator. (wow, they look almost identical on my system!).

Are there cases where the latin separators are used? I'm trying to work out how lenient to be in parsing - as long as there's no ambiguity introduced.

@pshoukry
Copy link
Author

pshoukry commented Apr 6, 2024

@kipcole9 sorry maybe i wasn't clear, Its common to use latin numberals with latin separators but if someone uses non-latin ( hindi numerals) they will then use also non-latin separators.

@kipcole9
Copy link
Owner

I'm back working on this @pshoukry. And will have a fix as part of the CLDR 45 release cycle on April 17th.

@kipcole9
Copy link
Owner

Closing as fixed. Thanks of the PR, it's much appreciated. In the end I added some slightly different code for the same purpose. I will be publishing today a new release with this fix.

@kipcole9 kipcole9 closed this Apr 20, 2024
@pshoukry
Copy link
Author

Thanks @kipcole9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants