fn:format-number: Specifying decimal format #340

Closed
ChristianGruen opened this issue Feb 7, 2023 · 6 comments
Labels
Enhancement A change or improvement to an existing feature Propose for V4.0 The WG should consider this item critical to 4.0 XQuery An issue related to XQuery

Comments

@ChristianGruen
Contributor

ChristianGruen commented Feb 7, 2023

It would be nice if the decimal format for fn:format-number could also be supplied via an additional argument. The current syntax is:

(: result: 12.345,67 :)
declare decimal-format de decimal-separator = ',' grouping-separator = '.';

format-number(
  value := 12345.67,
  picture := '#.##0,00',
  decimal-format-name := 'de'
)

The syntax could be enhanced as follows:

format-number(
  value := 12345.67,
  picture := '#.##0,00',
  format := map { 'decimal-separator': ',', 'grouping-separator': '.' }
)

If both decimal-format-name and format are supplied, an error should be raised.
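The proposed behaviour can be approximated in Java as a rough illustration. This is only a sketch under stated assumptions: `FormatNumberSketch` and `formatNumber` are hypothetical names, `java.text.DecimalFormat` stands in for the XPath picture-string algorithm (whose rules differ in detail), and only the two properties from the example map are handled.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: models the proposed `format` map with java.text.DecimalFormat.
// It does not implement the XPath format-number rules.
final class FormatNumberSketch {

  static String formatNumber(double value, String picture,
      String decimalFormatName, Map<String, String> format) {
    // Proposal: supplying both a format name and a format map is an error.
    if (decimalFormatName != null && format != null) {
      throw new IllegalArgumentException(
          "decimal-format-name and format must not both be supplied");
    }
    if (decimalFormatName != null) {
      // Named formats live in the static context; they are not modelled here.
      throw new UnsupportedOperationException("named formats are not modelled");
    }
    // Start from neutral symbols ('.' decimal, ',' grouping), then apply the map.
    DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
    if (format != null) {
      String decimal = format.get("decimal-separator");
      String grouping = format.get("grouping-separator");
      if (decimal != null) symbols.setDecimalSeparator(decimal.charAt(0));
      if (grouping != null) symbols.setGroupingSeparator(grouping.charAt(0));
    }
    DecimalFormat formatter = new DecimalFormat("0", symbols);
    // Read the picture using the symbols above, e.g. ',' as decimal separator.
    formatter.applyLocalizedPattern(picture);
    return formatter.format(value);
  }

  public static void main(String[] args) {
    // prints 12.345,67
    System.out.println(formatNumber(12345.67, "#.##0,00", null,
        Map.of("decimal-separator", ",", "grouping-separator", ".")));
  }
}
```

Passing the properties as a plain map keeps the format next to the call, which is the point of the proposal; the explicit check at the top mirrors the suggested error for conflicting arguments.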

Edit 2023-05-02, adopted from a comment further below:

In addition, language-specific default settings would be sensible. The existing syntax could be reused, with the language code as the decimal format name:

format-number(12345.67, '#.##0,00', 'de')

As with the other functions for formatting numbers and dates, it could be left to the implementation to decide which languages are supported. The defaults could be overridden by custom decimal-format declarations in the prolog, which would ensure that a setting is applied even if an implementation does not support the language.
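The lookup order described here (a user declaration first, then an implementation-provided language default) could look roughly like the following Java sketch. All names are hypothetical, the set of supported languages is an arbitrary assumption, and `java.text.DecimalFormat` with the JDK's locale data merely stands in for a real XQuery processor.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the lookup order; all names are illustrative.
final class DecimalFormatLookupSketch {
  // Assumption: languages this imaginary implementation supports out of the box.
  private static final Set<String> SUPPORTED_LANGUAGES = Set.of("de", "en");

  // Stand-in for decimal formats declared by the user in the query prolog.
  private final Map<String, DecimalFormatSymbols> declared = new HashMap<>();

  void declare(String name, char decimalSeparator, char groupingSeparator) {
    DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
    symbols.setDecimalSeparator(decimalSeparator);
    symbols.setGroupingSeparator(groupingSeparator);
    declared.put(name, symbols);
  }

  DecimalFormatSymbols resolve(String name) {
    // A user declaration always wins over an implementation default.
    DecimalFormatSymbols user = declared.get(name);
    if (user != null) return user;
    if (SUPPORTED_LANGUAGES.contains(name)) {
      return DecimalFormatSymbols.getInstance(Locale.forLanguageTag(name));
    }
    throw new IllegalArgumentException("Unknown decimal format: " + name);
  }

  String formatNumber(double value, String picture, String name) {
    DecimalFormat formatter = new DecimalFormat("0", resolve(name));
    // Interpret the picture with the resolved separators.
    formatter.applyLocalizedPattern(picture);
    return formatter.format(value);
  }

  public static void main(String[] args) {
    DecimalFormatLookupSketch formats = new DecimalFormatLookupSketch();
    // Uses the built-in default for 'de'.
    System.out.println(formats.formatNumber(12345.67, "#.##0,00", "de"));
    // After a user declaration, the declared separators apply instead.
    formats.declare("de", ',', ' ');
    System.out.println(formats.formatNumber(12345.67, "# ##0,00", "de"));
  }
}
```

Note that the second picture only works because the user declaration changed the grouping separator, so a query that declares its own format does not depend on what the implementation ships.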

@ChristianGruen ChristianGruen added XQuery An issue related to XQuery Enhancement A change or improvement to an existing feature labels Feb 7, 2023
@michaelhkay
Contributor

Yes, the current mechanism is very clumsy. I think the original intent in XSLT 1.0 was probably to define presentation "at arm's length" so that the logic didn't need to change if the output format changed, but that can be achieved perfectly well by putting the options in a global variable.

@ChristianGruen
Contributor Author

I think that language-specific default settings would be sensible:

format-number(123.45, '#.##0,00', 'de')

As with the other functions for formatting numbers and dates, it could be left to the implementation to decide which languages are supported. The defaults could be overridden by custom decimal-format declarations in the prolog, which would ensure that a setting is applied even if an implementation does not support the language.

@michaelhkay
Contributor

I'm not convinced this would give good interoperability. Consider Arabic for example: should it default to using western or eastern decimal digits? Both are in widespread use, and the idea that everyone with a particular (country, language) combination uses the same conventions is fundamentally misguided. This doesn't matter too much if it merely affects the format of the output, but it does matter if it makes a picture string valid in one implementation and invalid in another.
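The last point, that the same picture string can be valid under one decimal format and invalid under another, can be illustrated with a small Java sketch. It uses the localized-pattern rules of `java.text.DecimalFormat` as a stand-in; these are not the XPath picture-string rules, and `PictureValiditySketch` and `isValidPicture` are hypothetical names.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical sketch: whether a picture parses depends on the separators in force.
final class PictureValiditySketch {

  static boolean isValidPicture(String picture, char decimalSeparator,
      char groupingSeparator) {
    DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
    symbols.setDecimalSeparator(decimalSeparator);
    symbols.setGroupingSeparator(groupingSeparator);
    try {
      // applyLocalizedPattern reads the picture with the separators set above.
      new DecimalFormat("0", symbols).applyLocalizedPattern(picture);
      return true;
    } catch (IllegalArgumentException e) {
      // Java rejects the picture as malformed under these separators.
      return false;
    }
  }

  public static void main(String[] args) {
    // German-style separators: ',' is the decimal, '.' the grouping separator.
    System.out.println(isValidPicture("#.##0,00", ',', '.'));
    // Default separators: the same picture now has a grouping separator
    // in the fractional part.
    System.out.println(isValidPicture("#.##0,00", '.', ','));
  }
}
```

If an implementation silently supplies German separators for 'de' and another does not know that name, the first call models the former and the second shows how the picture itself can fail elsewhere.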

@ChristianGruen
Contributor Author

ChristianGruen commented Feb 12, 2023

I agree: some cases are easy to handle, and others are more demanding. I think the same is true for formatting integers and dates: the rules are rich and sophisticated, but for more advanced use cases (such as spelling out correct hiragana for numbers with Japanese counter words, or considering declension of numerals in Russian), you’ll be lost without writing custom code.

With ICU and Java, it’s fairly straightforward to choose language-specific formatting rules. I haven’t checked whether there are flags to control, for example, which digits are used for Arabic, and it could be that ICU has really taken the wrong path. From a German perspective, though, it’s restrictive that an implementation cannot provide sane defaults for non-English users.

This is how ICU formats the integer 1234567 in different locales:

Result Locales
1,234,567 ak, ak_GH, am, am_ET, ar_AE, ar_EH, asa, asa_TZ, bem, bem_ZM, bez, bez_TZ, bm, bm_ML, bo, bo_CN, bo_IN, ce, ce_RU, ceb, ceb_PH, cgg, cgg_UG, chr, chr_US, cy, cy_GB, dav, dav_KE, doi, doi_IN, ebu, ebu_KE, ee, ee_GH, ee_TG, en, en_001, en_150, en_AE, en_AG, en_AI, en_AS, en_AU, en_BB, en_BI, en_BM, en_BS, en_BW, en_BZ, en_CA, en_CC, en_CK, en_CM, en_CX, en_CY, en_DG, en_DM, en_ER, en_FJ, en_FK, en_FM, en_GB, en_GD, en_GG, en_GH, en_GI, en_GM, en_GU, en_GY, en_HK, en_IE, en_IL, en_IM, en_IO, en_JE, en_JM, en_KE, en_KI, en_KN, en_KY, en_LC, en_LR, en_LS, en_MG, en_MH, en_MO, en_MP, en_MS, en_MT, en_MU, en_MV, en_MW, en_MY, en_NA, en_NF, en_NG, en_NR, en_NU, en_NZ, en_PG, en_PH, en_PK, en_PN, en_PR, en_PW, en_RW, en_SB, en_SC, en_SD, en_SG, en_SH, en_SL, en_SS, en_SX, en_SZ, en_TC, en_TK, en_TO, en_TT, en_TV, en_TZ, en_UG, en_UM, en_US, en_VC, en_VG, en_VI, en_VU, en_WS, en_ZM, en_ZW, es_419, es_BR, es_BZ, es_CU, es_DO, es_GT, es_HN, es_MX, es_NI, es_PA, es_PE, es_PR, es_SV, es_US, fil, fil_PH, ga, ga_GB, ga_IE, gd, gd_GB, guz, guz_KE, gv, gv_IM, ha, ha_GH, ha_NE, ha_NG, haw, haw_US, he, he_IL, ig, ig_NG, ii, ii_CN, ja, ja_JP, jmc, jmc_TZ, kam, kam_KE, kde, kde_TZ, ki, ki_KE, kln, kln_KE, kn, kn_IN, ko, ko_KP, ko_KR, kok, kok_IN, ks_Deva, ks_Deva_IN, ksb, ksb_TZ, kw, kw_GB, lag, lag_TZ, lg, lg_UG, lkt, lkt_US, luo, luo_KE, luy, luy_KE, mai, mai_IN, mas, mas_KE, mas_TZ, mer, mer_KE, mg, mg_MG, mgo, mgo_CM, mi, mi_NZ, mn, mn_MN, ms, ms_MY, ms_SG, mt, mt_MT, naq, naq_NA, nd, nd_ZW, nus, nus_SS, nyn, nyn_UG, om, om_ET, om_KE, pcm, pcm_NG, qu, qu_EC, qu_PE, rof, rof_TZ, rwk, rwk_TZ, saq, saq_KE, sbp, sbp_TZ, sd_Deva, sd_Deva_IN, si, si_LK, sn, sn_ZW, so, so_DJ, so_ET, so_KE, so_SO, sw, sw_KE, sw_TZ, sw_UG, ta_MY, ta_SG, teo, teo_KE, teo_UG, th, th_TH, ti, ti_ER, ti_ET, to, to_TO, ug, ug_CN, ur, ur_PK, vai, vai_Latn, vai_Latn_LR, vai_Vaii, vai_Vaii_LR, vun, vun_TZ, xog, xog_UG, yi, yi_001, yo, yo_BJ, yo_NG, yue, yue_Hans, yue_Hans_CN, yue_Hant, yue_Hant_HK, zh, 
zh_Hans, zh_Hans_CN, zh_Hans_HK, zh_Hans_MO, zh_Hans_SG, zh_Hant, zh_Hant_HK, zh_Hant_MO, zh_Hant_TW, zu, zu_ZA
1.234.567 ar_DZ, ar_LY, ar_MA, ar_TN, ast, ast_ES, az, az_Cyrl, az_Cyrl_AZ, az_Latn, az_Latn_AZ, bs, bs_Cyrl, bs_Cyrl_BA, bs_Latn, bs_Latn_BA, ca, ca_AD, ca_ES, ca_FR, ca_IT, da, da_DK, da_GL, de, de_BE, de_DE, de_IT, de_LU, dsb, dsb_DE, el, el_CY, el_GR, en_AT, en_BE, en_DE, en_DK, en_NL, en_SI, es, es_AR, es_BO, es_CL, es_CO, es_EA, es_EC, es_ES, es_GQ, es_IC, es_PH, es_PY, es_UY, es_VE, eu, eu_ES, fo, fo_DK, fo_FO, fr_LU, fr_MA, fur, fur_IT, fy, fy_NL, gl, gl_ES, hr, hr_BA, hr_HR, hsb, hsb_DE, ia, ia_001, id, id_ID, is, is_IS, it, it_IT, it_SM, it_VA, jgo, jgo_CM, jv, jv_ID, kgp, kgp_BR, kkj, kkj_CM, kl, kl_GL, km, km_KH, ku, ku_TR, lb, lb_LU, ln, ln_AO, ln_CD, ln_CF, ln_CG, lo, lo_LA, lu, lu_CD, mgh, mgh_MZ, mk, mk_MK, ms_BN, ms_ID, mua, mua_CM, nl, nl_AW, nl_BE, nl_BQ, nl_CW, nl_NL, nl_SR, nl_SX, nnh, nnh_CM, pt, pt_BR, qu_BO, rn, rn_BI, ro, ro_MD, ro_RO, rw, rw_RW, sc, sc_IT, seh, seh_MZ, sg, sg_CF, sl, sl_SI, sr, sr_Cyrl, sr_Cyrl_BA, sr_Cyrl_ME, sr_Cyrl_RS, sr_Cyrl_XK, sr_Latn, sr_Latn_BA, sr_Latn_ME, sr_Latn_RS, sr_Latn_XK, su, su_Latn, su_Latn_ID, sw_CD, tr, tr_CY, tr_TR, vi, vi_VN, wo, wo_SN, yrl, yrl_BR, yrl_CO, yrl_VE
12,34,567 brx, brx_IN, en_IN, gu, gu_IN, hi, hi_IN, hi_Latn, hi_Latn_IN, ml, ml_IN, or, or_IN, pa, pa_Guru, pa_Guru_IN, ta, ta_IN, ta_LK, te, te_IN
1234567 en_US_POSIX
1 234 567 af, af_NA, af_ZA, agq, agq_CM, bas, bas_CM, be, be_BY, bg, bg_BG, br, br_FR, cs, cs_CZ, cv, cv_RU, de_AT, dje, dje_NE, dua, dua_CM, dyo, dyo_SN, en_FI, en_SE, en_ZA, eo, eo_001, es_CR, et, et_EE, ewo, ewo_CM, ff, ff_Latn, ff_Latn_BF, ff_Latn_CM, ff_Latn_GH, ff_Latn_GM, ff_Latn_GN, ff_Latn_GW, ff_Latn_LR, ff_Latn_MR, ff_Latn_NE, ff_Latn_NG, ff_Latn_SL, ff_Latn_SN, fi, fi_FI, fr_CA, hu, hu_HU, hy, hy_AM, ka, ka_GE, kab, kab_DZ, kea, kea_CV, khq, khq_ML, kk, kk_KZ, ksf, ksf_CM, ksh, ksh_DE, ky, ky_KG, lt, lt_LT, lv, lv_LV, mfe, mfe_MU, nb, nb_NO, nb_SJ, nmg, nmg_CM, nn, nn_NO, no, os, os_GE, os_RU, pl, pl_PL, pt_AO, pt_CH, pt_CV, pt_GQ, pt_GW, pt_LU, pt_MO, pt_MZ, pt_PT, pt_ST, pt_TL, ru, ru_BY, ru_KG, ru_KZ, ru_MD, ru_RU, ru_UA, sah, sah_RU, se, se_FI, se_NO, se_SE, ses, ses_ML, shi, shi_Latn, shi_Latn_MA, shi_Tfng, shi_Tfng_MA, sk, sk_SK, smn, smn_FI, sq, sq_AL, sq_MK, sq_XK, sv, sv_AX, sv_FI, sv_SE, tg, tg_TJ, tk, tk_TM, tt, tt_RU, twq, twq_NE, tzm, tzm_MA, uk, uk_UA, uz, uz_Cyrl, uz_Cyrl_UZ, uz_Latn, uz_Latn_UZ, xh, xh_ZA, yav, yav_CM, zgh, zgh_MA
1’234’567 de_CH, de_LI, en_CH, gsw, gsw_CH, gsw_FR, gsw_LI, it_CH, rm, rm_CH, wae, wae_CH
1 234 567 fr, fr_BE, fr_BF, fr_BI, fr_BJ, fr_BL, fr_CD, fr_CF, fr_CG, fr_CH, fr_CI, fr_CM, fr_DJ, fr_DZ, fr_FR, fr_GA, fr_GF, fr_GN, fr_GP, fr_GQ, fr_HT, fr_KM, fr_MC, fr_MF, fr_MG, fr_ML, fr_MQ, fr_MR, fr_MU, fr_NC, fr_NE, fr_PF, fr_PM, fr_RE, fr_RW, fr_SC, fr_SN, fr_SY, fr_TD, fr_TG, fr_TN, fr_VU, fr_WF, fr_YT
١٬٢٣٤٬٥٦٧ ar, ar_001, ar_BH, ar_DJ, ar_EG, ar_ER, ar_IL, ar_IQ, ar_JO, ar_KM, ar_KW, ar_LB, ar_MR, ar_OM, ar_PS, ar_QA, ar_SA, ar_SD, ar_SO, ar_SS, ar_SY, ar_TD, ar_YE, ckb, ckb_IQ, ckb_IR, sd, sd_Arab, sd_Arab_PK
۱٬۲۳۴٬۵۶۷ fa, fa_AF, fa_IR, ks, ks_Arab, ks_Arab_IN, lrc, lrc_IQ, lrc_IR, mzn, mzn_IR, pa_Arab, pa_Arab_PK, ps, ps_AF, ps_PK, ur_IN, uz_Arab, uz_Arab_AF
१,२३४,५६७ bgc, bgc_IN, bho, bho_IN, raj, raj_IN
१२,३४,५६७ mr, mr_IN, ne, ne_IN, ne_NP, sa, sa_IN
১,২৩৪,৫৬৭ mni, mni_Beng, mni_Beng_IN
১২,৩৪,৫৬৭ as, as_IN, bn, bn_BD, bn_IN
༡༢,༣༤,༥༦༧ dz, dz_BT
၁,၂၃၄,၅၆၇ my, my_MM
᱑,᱒᱓᱔,᱕᱖᱗ sat, sat_Olck, sat_Olck_IN
𑄷𑄸,𑄹𑄺,𑄻𑄼𑄽 ccp, ccp_BD, ccp_IN
𞥑⹁𞥒𞥓𞥔⹁𞥕𞥖𞥗 ff_Adlm, ff_Adlm_BF, ff_Adlm_CM, ff_Adlm_GH, ff_Adlm_GM, ff_Adlm_GN, ff_Adlm_GW, ff_Adlm_LR, ff_Adlm_MR, ff_Adlm_NE, ff_Adlm_NG, ff_Adlm_SL, ff_Adlm_SN

Code:

import java.util.*;
import java.util.Map.*;
import java.util.stream.*;
import com.ibm.icu.number.*;
import com.ibm.icu.util.*;

// Requires ICU4J on the classpath.
public class IcuSpellout {
  public static void main(String... args) {
    // Formatted result -> all locales that produce exactly this result
    Map<String, TreeSet<ULocale>> numbers = new TreeMap<>();

    // Format the same integer with every locale known to ICU
    for(ULocale l : ULocale.getAvailableLocales()) {
      String string = NumberFormatter.withLocale(l).format(1234567).toString();
      numbers.computeIfAbsent(string, k -> new TreeSet<>()).add(l);
    }
    // Print one line per distinct result, followed by its locales
    for(Entry<String, TreeSet<ULocale>> entry : numbers.entrySet()) {
      System.out.println(entry.getKey() + " | " +
          entry.getValue().stream().map(ULocale::toString).collect(Collectors.joining(", ")));
    }
  }
}

@ChristianGruen
Contributor Author

I am not sure if I understand how the spec defines decimal formats in the static context. Is it currently allowed for an implementation to provide default formats (other than the unnamed one) that have not been specified by the user? For example, would it currently be legal for a processor to return a result for format-number(1, '0', 'de')?

In XQFO 4.0, section 4.7.1 (Defining a decimal format) says:

Decimal formats are defined in the static context, and the way they are defined is therefore outside the scope of this specification. XSLT and XQuery both provide custom syntax for creating a decimal format.

In XQuery 4.0, statically known decimal formats are defined as follows:

This is a mapping from QNames to decimal formats, with one default format that has no visible name, referred to as the unnamed decimal format. Each format is available for use when formatting numbers using the fn:format-number function. […]

Section 5.10 (Decimal Format Declaration) says:

A decimal format declaration adds a decimal format to the statically known decimal formats, which define the properties used to format numbers using the fn:format-number() function, as described in XQuery and XPath Functions and Operators 4.0. […]

ChristianGruen added a commit to ChristianGruen/qtspecs that referenced this issue Feb 28, 2024
ChristianGruen added a commit that referenced this issue Mar 5, 2024
* fn:format-number: Specifying decimal format. #340

* Minor revision
@ChristianGruen
Contributor Author

I’m closing this issue, as the PR was accepted, and the last question has been answered in today’s meeting (https://qt4cg.org/meeting/minutes/2024/03-05.html):

  • It’s allowed for an implementation to provide default formats (other than the unnamed one) that have not been specified by the user.
  • For example, it’s legal for an XQuery processor to return results for queries as simple as format-number(1, '0', 'de').

@ChristianGruen ChristianGruen added the Tests Needed Tests need to be written or merged label Mar 6, 2024
@ChristianGruen ChristianGruen removed the Tests Needed Tests need to be written or merged label Mar 18, 2024