Add option to specify the decimal separator character for `number` types in JTS #246

Closed
akariv opened this Issue Mar 4, 2016 · 9 comments

Projects

None yet

4 participants

@akariv
akariv commented Mar 4, 2016

Both , and . are commonly used throughout the world (reference: https://en.wikipedia.org/wiki/Decimal_mark#/media/File:DecimalSeparator.svg), and not specifying which decimal separator is used creates ambiguity (e.g. '1,500' can mean "one and a half" or "fifteen hundred").

We should be able to add to the format definition an indication for which decimal separator is used. I can see two options for that:

  • Create a new attribute (decimalSeparator) which will only be used in case type is number. Its value would be the decimal separator character. In case it's missing, number interpretation will be defined to be implementation dependent.
  • Create new formats in addition to default and currency: default-comma-decimal-separator and currency-comma-decimal-separator (we might consider shorter names). This removes the need for introducing a new attribute, and also doesn't face the backward compatibility issues of the first option.

@rgrp

@akariv akariv referenced this issue in openspending/openspending Mar 4, 2016
Closed

Add 'decimal separator' selection in mapping phase #851

@pwalsh
Member
pwalsh commented Mar 5, 2016

+1, and +1 on new attribute decimalSeparator.

@rufuspollock
Contributor

I think this is a simple useful change.

My only concern is that we are describing "input format parsing" in the schema. We already sort of do this for dates so not a huge change. What is important is to distinguish this from "display format" information e.g. this field is percentage - otherwise we can end up with confusion (IME from recline).

@akariv
akariv commented Mar 7, 2016

I thought this spec only defined constraints on the data itself, without any specific reference to displaying the data. Anyway, opening a PR.

@rufuspollock
Contributor

@akariv yes, exactly - i generally do not want presentation info in the spec.

@danfowler
Contributor

Copying from #249

+1 @akariv. To add to the discussion, I will point out that having format be a string or an object puts JSON Table Schema into greater alignment with W3C CSVW for the number format: http://w3c.github.io/csvw/syntax/#formats-for-numeric-types .

If the datatype base is a numeric type, the datatype format annotation indicates the expected format for that number. Its value MUST be either a single string or an object with one or more of the properties:

We can go further and rename decimalSeparator -> decimalChar and thousandsSeparator -> groupChar*

{ 
  "currency": true,
  "decimalChar": ".",
  "groupChar": ","
}

* http://stackoverflow.com/questions/15286545/why-is-the-thousands-separator-called-the-grouping-separator

@akariv
akariv commented Apr 18, 2016

@rgrp what is the next step here?
This will become a blocker pretty soon.

@rufuspollock rufuspollock added a commit that referenced this issue Apr 18, 2016
@rufuspollock rufuspollock [jts,#246][m]: number lexical form much clearer with new options.
* Much more detailed spec plus support for -INF, +INF etc
* New properties: decimalChar, groupChar etc
c820ce1
@rufuspollock
Contributor
rufuspollock commented Apr 18, 2016 edited

FIXED. See c820ce1

Note: tried to align as much as possible with W3C but kept a bit simpler and e.g. no support of per-mille (just percentage).

Also did not implement pattern stuff. I feel it is quite complex and difficult to grok (based on my experience reading and re-reading). Happy to have +1's for it (probably open a separate issue if wanted). If we did it would be something like allowing this for format string:

* **PATTERN**: A [number format pattern][num-format] as defined in the [Unicode
  Locale Data Markup Language][num-format]. Implementations MUST recognise
  number format patterns containing the symbols 0, #, the specified decimalChar
  (or "." if unspecified), the specified groupChar (or "," if unspecified), E,
  +, % and ‰. Implementations MAY additionally recognise number format patterns
  containing other special pattern characters defined in the [Unicode
  spec][num-format]. If the supplied value is not a string, or if it contains
  an invalid number format pattern or uses special pattern characters that the
  implementation does not recognise, implementations MUST issue a warning and
  proceed as if the property had not been specified.

[num-format]: http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Format_Patterns
@akariv
akariv commented Apr 24, 2016

@rgrp one comment - it's not clear right now from the wording what exactly should currency be - a boolean? a string?
In case we opt to leave this property and not remove it as #258 suggests, we should have a better definition for its value.

@rufuspollock
Contributor

@akariv good question 😉 Can you put comments on currency in #258 - even though that is about removal find to have comments like this plus comments to keep and define better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment