fn:format-number: relax restrictions on exponent-separator (possibly minus-sign, percent, per-mille) #1048
Comments
Use of multi-character representations in the formatted number is not a problem (as with NaN, Infinity). Use of such symbols in the picture string is potentially much more problematic as we need to be sure that the picture string parses unambiguously. For example using |
True; it would be easier if the picture string was not language-specific. |
A pragmatic albeit ugly solution would be to allow |
I think we should try to avoid manually tweaking decimal formats that are provided by existing languages and libraries. Instead, we should rather try to get closer to what other languages do…

    // JavaScript
    new Intl.NumberFormat('de').format(1234.56);
    // Java, with default picture string
    DecimalFormat.getInstance(Locale.GERMANY).format(1234.56);

…and simplify the most common requests. Indeed, I would guess that a syntax as simple as…

    format-number(1234.56, 'de')

…is what would make most users more than happy.

Related to the original thread, it would possibly be better to introduce a mode in which the default pattern can be used to specify patterns. That’s what Java and other languages offer:

    DecimalFormat df = (DecimalFormat) DecimalFormat.getInstance(Locale.GERMANY);
    df.applyPattern("#,##0.00");
    df.format(1234.56);

The difficult question remains what would be the easiest syntax… |
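As a rough illustration of how much a locale already determines, the following JavaScript sketch reads the grouping and decimal separators from Intl.NumberFormat via formatToParts. The helper name localeSeparators is made up for this example, and the values returned depend on the ICU data shipped with the runtime.

```javascript
// Hypothetical helper (not part of any spec): returns the grouping and
// decimal separators that the runtime's locale data uses for `locale`.
function localeSeparators(locale) {
  // A seven-digit value is used so that locales which only group larger
  // numbers still emit a grouping separator.
  const parts = new Intl.NumberFormat(locale).formatToParts(1234567.5);
  const valueOf = (type) => parts.find((p) => p.type === type)?.value;
  return { group: valueOf('group'), decimal: valueOf('decimal') };
}

console.log(localeSeparators('en-US')); // { group: ',', decimal: '.' }
console.log(localeSeparators('de'));    // result depends on the available ICU data
```

This only covers two symbols; other parts reported by formatToParts (such as the minus sign or the exponent separator) could be read the same way.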
It would be quite a big departure to use locales for number formatting rather than explicit specification; and personally I'm not sure it would be a good move. I've always taken the view that the way you format dates and numbers is more likely to depend on what publisher you are working for than on what country you are in. For example the decision whether to use a period (.) or a middle dot (·) as a decimal separator is a question of editorial house-style, not a question of what language you speak. The notion that Norwegians always write exponential/scientific notation differently from the rest of Europe is clearly absurd. (I'm afraid when it comes to localization, my views have always been a bit maverick. Partly because we live in an increasingly globalised|globalized world; and probably a consequence of growing up in a bilingual family). |
The good thing is that locales don’t only stand for languages, but regions as well. In Germany, I would claim that the grouping and decimal separators are almost always the same, so there are hardly any reasons to deviate from the standard

What I can safely confirm is that many users seem to simply avoid

    string(1.23) ! replace('\.', ',')
    string(1.23) => translate('.,', ',.')

…which works fine, but would probably not be what we would encourage. From the implementor perspective, we can benefit a lot from the extensive work that has been done by the Java and ICU folks. It’s straightforward to apply settings of the predefined locales to

And sorry again for mixing up two topics; my thoughts won’t provide a solution for the original problem of multi-character symbols. An obvious option would be providing an option…

    format-number(1234.56, '#,##0.00', 'de', options := map { 'default-picture': true() })

…but it would certainly be too verbose. I would favor a fixed character or prefix in the picture string (provided it doesn’t render the picture string ambiguous):

    (: 1,234.56 :) format-number(1.2 , '=0.0' , 'de')
    (: ١٢٣٠٫٠٠e :) format-number(1.2e3, '=0.0e0', 'ar')
    (: 10 0/00 :) format-number(.01, '=0 ‰' , 'en-US-posix')

Of course another option is to regard the cases as too specific. I have only encountered them by analysing the pre-defined decimal format symbols of Java and ICU, and I don’t know how many people have ever missed the possibility of outputting correctly formatted Arabic exponential signs. |
That might be true*, but I would also claim that whether you write
*And even then it's not always true. A quick glance at the Frankfurter Allgemeine found a reference to "iOS 17.3". |
Yes, I can sense that.
It’s true: For version numbers of software, I would never use commas either (or any formatting functions at all). My intuitive rationale would be that |
Going off at a tangent here, but yes, with version numbers we would tend to think that 17.10 comes after 17.9. Let's hope we never have an XQuery version 4.10, because the current spec says it is the same thing as version 4.1. |
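The tangent can be made concrete with a small JavaScript sketch: read as decimal numbers, 4.10 and 4.1 are the same value, whereas a component-wise comparison orders 17.10 after 17.9. The helper compareVersions is hypothetical and assumes purely numeric, dot-separated components.

```javascript
// Hypothetical helper: compares dot-separated numeric version strings
// component by component; missing components count as 0.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  const n = Math.max(pa.length, pb.length);
  for (let i = 0; i < n; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

console.log(Number('4.10') === Number('4.1')); // true: identical as decimals
console.log(Number('17.10') > Number('17.9')); // false: 17.1 is less than 17.9
console.log(compareVersions('17.10', '17.9')); // 1: 17.10 comes after 17.9
```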
I'm told that the distinction between "Samstag" and "Sonnabend" for Saturday is traditionally based on religious affiliation, which is only loosely correlated with region; but these days it certainly depends mainly on which publisher's house style you are following. |
Maybe not. I’m guilty of |
Ironically, and as I learned just recently, the term “Sonnabend” is an Anglicism: It was brought to Germany by an Anglo-Saxon missionary centuries ago (while England itself eventually went for “Saturday”), and it became an official term much later in the officially non-religious GDR/DDR, which is why it’s still popular in Eastern/Northern areas of Germany. Today, as far as I know, “Samstag” is the official/recommended term, which I would assume most publishers use (as one advantage is that the short version “Sa” differs from “So”, which is used for “Sonntag”), but… there may be exceptions (indeed, ICU won’t help you here, as it’s not possible to name an exact region for it; it only knows “Samstag”).
;·) Let’s see if we can avoid it. |
Proposal. For decimal format properties that define characters used both in the picture string and the result string, specifically decimal-separator, grouping-separator, exponent-separator, percent and per-mille (but not zero-digit), we allow the format property to take the value "x:y", where x is a single character indicating the marker used in the picture string, and y is an arbitrary string indicating the form used in the result string. For example, if the

For minus-sign, we remove the constraint that the value must be a single character. |
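A minimal JavaScript sketch of how an implementation might split such a property value. The function name parseFormatProperty, the returned field names, and the handling of edge cases (an empty y is accepted, anything else that is neither a single character nor of the form "x:y" is rejected) are assumptions made for illustration, not taken from the proposal.

```javascript
// Hypothetical helper: splits a decimal-format property value into the
// marker used in the picture string and the form used in the result string.
function parseFormatProperty(value) {
  // Array.from iterates by code point, so a character outside the BMP
  // still counts as one character.
  const chars = Array.from(value);
  if (chars.length === 1) {
    // Plain single character: same symbol in picture and result.
    return { marker: chars[0], rendition: chars[0] };
  }
  if (chars.length >= 2 && chars[1] === ':') {
    // "x:y" form: x is the marker, everything after the colon is y.
    return { marker: chars[0], rendition: chars.slice(2).join('') };
  }
  throw new Error('Invalid property value: ' + JSON.stringify(value));
}

console.log(parseFormatProperty('e:·10^')); // { marker: 'e', rendition: '·10^' }
console.log(parseFormatProperty('‰'));      // { marker: '‰', rendition: '‰' }
```

With this shape, the picture string only ever has to be scanned for the single-character marker, so the multi-character result form cannot make the picture ambiguous.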
The PR was accepted so the issue is closed. |
The current rules for decimal formats are too restrictive (i.e., too focused on Anglo-Saxon formatting rules). The most prominent case is the Arabic exponent-separator “character”, which consists of two characters: عر (https://www.localeplanet.com/icu/ar/). The exponent separator of other locales is not restricted to a single character either. For example, se-NO uses ·10^.

When we include the ICU library in the analysis, we also find minus-sign, percent and per-mille properties that are longer than 1 character. Examples:
- minus-sign character for he consists of 200e and 002d (200e is the Left-to-Right Mark).
- percent character consists of 066a and 061c (061c is the “Arabic Letter Mark”).
- per-mille property of en-US-posix is 0/00.
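To make the length problem visible, the JavaScript sketch below lists the code points of the symbols quoted in the examples. The helper codePoints is hypothetical, and the strings are written as escapes because the directional marks are otherwise invisible.

```javascript
// Hypothetical helper: lists the Unicode code points of a string
// as U+XXXX labels.
function codePoints(s) {
  return Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Values copied from the examples in the issue text.
const heMinusSign = '\u200E\u002D';  // Left-to-Right Mark + hyphen-minus
const percentSymbol = '\u066A\u061C'; // 066a followed by the Arabic Letter Mark
const posixPerMille = '0/00';

console.log(codePoints(heMinusSign));   // [ 'U+200E', 'U+002D' ]
console.log(codePoints(percentSymbol)); // [ 'U+066A', 'U+061C' ]
console.log(codePoints(posixPerMille).length); // 4
```

None of these values fits a property that is restricted to a single character.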