New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to specify the decimal separator character for `number` types in JTS #246

Closed
akariv opened this Issue Mar 4, 2016 · 9 comments

Comments

Projects
None yet
4 participants
@akariv
Contributor

akariv commented Mar 4, 2016

Both , and . are commonly used throughout the world (reference: https://en.wikipedia.org/wiki/Decimal_mark#/media/File:DecimalSeparator.svg), and not specifying which decimal separator is used creates ambiguity (e.g. '1,500' can mean "one and a half" or "fifteen hundred").

We should be able to add to the format definition an indication for which decimal separator is used. I can see two options for that:

  • Create a new attribute (decimalSeparator) which will only be used in case type is number. Its value would be the decimal separator character. In case it's missing, number interpretation will be defined to be implementation dependent.
  • Create new formats in addition to default and currency: default-comma-decimal-separator and currency-comma-decimal-separator (we might consider shorter names). This removes the need for introducing a new attribute, and also doesn't face the backward compatibility issues of the first option.

@rgrp

@pwalsh

This comment has been minimized.

Show comment
Hide comment
@pwalsh

pwalsh Mar 5, 2016

Member

+1, and +1 on new attribute decimalSeparator.

Member

pwalsh commented Mar 5, 2016

+1, and +1 on new attribute decimalSeparator.

@rufuspollock

This comment has been minimized.

Show comment
Hide comment
@rufuspollock

rufuspollock Mar 7, 2016

Contributor

I think this is a simple useful change.

My only concern is that we are describing "input format parsing" in the schema. We already sort of do this for dates so not a huge change. What is important is to distinguish this from "display format" information e.g. this field is percentage - otherwise we can end up with confusion (IME from recline).

Contributor

rufuspollock commented Mar 7, 2016

I think this is a simple useful change.

My only concern is that we are describing "input format parsing" in the schema. We already sort of do this for dates so not a huge change. What is important is to distinguish this from "display format" information e.g. this field is percentage - otherwise we can end up with confusion (IME from recline).

@akariv

This comment has been minimized.

Show comment
Hide comment
@akariv

akariv Mar 7, 2016

Contributor

I thought this spec only defined constraints on the data itself, without any specific reference to displaying the data. Anyway, opening a PR.

Contributor

akariv commented Mar 7, 2016

I thought this spec only defined constraints on the data itself, without any specific reference to displaying the data. Anyway, opening a PR.

@rufuspollock

This comment has been minimized.

Show comment
Hide comment
@rufuspollock

rufuspollock Mar 7, 2016

Contributor

@akariv yes, exactly - i generally do not want presentation info in the spec.

Contributor

rufuspollock commented Mar 7, 2016

@akariv yes, exactly - i generally do not want presentation info in the spec.

@danfowler

This comment has been minimized.

Show comment
Hide comment
@danfowler

danfowler Mar 23, 2016

Contributor

Copying from #249

+1 @akariv. To add to the discussion, I will point out that having format be a string or an object puts JSON Table Schema into greater alignment with W3C CSVW for the number format: http://w3c.github.io/csvw/syntax/#formats-for-numeric-types .

If the datatype base is a numeric type, the datatype format annotation indicates the expected format for that number. Its value MUST be either a single string or an object with one or more of the properties:

We can go further and rename decimalSeparator -> decimalChar and thousandsSeparator -> groupChar*

{ 
  "currency": true,
  "decimalChar": ".",
  "groupChar": ","
}

* http://stackoverflow.com/questions/15286545/why-is-the-thousands-separator-called-the-grouping-separator

Contributor

danfowler commented Mar 23, 2016

Copying from #249

+1 @akariv. To add to the discussion, I will point out that having format be a string or an object puts JSON Table Schema into greater alignment with W3C CSVW for the number format: http://w3c.github.io/csvw/syntax/#formats-for-numeric-types .

If the datatype base is a numeric type, the datatype format annotation indicates the expected format for that number. Its value MUST be either a single string or an object with one or more of the properties:

We can go further and rename decimalSeparator -> decimalChar and thousandsSeparator -> groupChar*

{ 
  "currency": true,
  "decimalChar": ".",
  "groupChar": ","
}

* http://stackoverflow.com/questions/15286545/why-is-the-thousands-separator-called-the-grouping-separator

@akariv

This comment has been minimized.

Show comment
Hide comment
@akariv

akariv Apr 18, 2016

Contributor

@rgrp what is the next step here?
This will become a blocker pretty soon.

Contributor

akariv commented Apr 18, 2016

@rgrp what is the next step here?
This will become a blocker pretty soon.

rufuspollock added a commit that referenced this issue Apr 18, 2016

[jts,#246][m]: number lexical form much clearer with new options.
* Much more detailed spec plus support for -INF, +INF etc
* New properties: decimalChar, groupChar etc
@rufuspollock

This comment has been minimized.

Show comment
Hide comment
@rufuspollock

rufuspollock Apr 18, 2016

Contributor

FIXED. See c820ce1

Note: tried to align as much as possible with W3C but kept a bit simpler and e.g. no support of per-mille (just percentage).

Also did not implement pattern stuff. I feel it is quite complex and difficult to grok (based on my experience reading and re-reading). Happy to have +1's for it (probably open a separate issue if wanted). If we did it would be something like allowing this for format string:

* **PATTERN**: A [number format pattern][num-format] as defined in the [Unicode
  Locale Data Markup Language][num-format]. Implementations MUST recognise
  number format patterns containing the symbols 0, #, the specified decimalChar
  (or "." if unspecified), the specified groupChar (or "," if unspecified), E,
  +, % and ‰. Implementations MAY additionally recognise number format patterns
  containing other special pattern characters defined in the [Unicode
  spec][num-format]. If the supplied value is not a string, or if it contains
  an invalid number format pattern or uses special pattern characters that the
  implementation does not recognise, implementations MUST issue a warning and
  proceed as if the property had not been specified.

[num-format]: http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Format_Patterns
Contributor

rufuspollock commented Apr 18, 2016

FIXED. See c820ce1

Note: tried to align as much as possible with W3C but kept a bit simpler and e.g. no support of per-mille (just percentage).

Also did not implement pattern stuff. I feel it is quite complex and difficult to grok (based on my experience reading and re-reading). Happy to have +1's for it (probably open a separate issue if wanted). If we did it would be something like allowing this for format string:

* **PATTERN**: A [number format pattern][num-format] as defined in the [Unicode
  Locale Data Markup Language][num-format]. Implementations MUST recognise
  number format patterns containing the symbols 0, #, the specified decimalChar
  (or "." if unspecified), the specified groupChar (or "," if unspecified), E,
  +, % and ‰. Implementations MAY additionally recognise number format patterns
  containing other special pattern characters defined in the [Unicode
  spec][num-format]. If the supplied value is not a string, or if it contains
  an invalid number format pattern or uses special pattern characters that the
  implementation does not recognise, implementations MUST issue a warning and
  proceed as if the property had not been specified.

[num-format]: http://www.unicode.org/reports/tr35/tr35-numbers.html#Number_Format_Patterns
@akariv

This comment has been minimized.

Show comment
Hide comment
@akariv

akariv Apr 24, 2016

Contributor

@rgrp one comment - it's not clear right now from the wording what exactly should currency be - a boolean? a string?
In case we opt to leave this property and not remove it as #258 suggests, we should have a better definition for its value.

Contributor

akariv commented Apr 24, 2016

@rgrp one comment - it's not clear right now from the wording what exactly should currency be - a boolean? a string?
In case we opt to leave this property and not remove it as #258 suggests, we should have a better definition for its value.

@rufuspollock

This comment has been minimized.

Show comment
Hide comment
@rufuspollock

rufuspollock Apr 25, 2016

Contributor

@akariv good question 😉 Can you put comments on currency in #258 - even though that is about removal find to have comments like this plus comments to keep and define better.

Contributor

rufuspollock commented Apr 25, 2016

@akariv good question 😉 Can you put comments on currency in #258 - even though that is about removal find to have comments like this plus comments to keep and define better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment