JSON Parsing & Serialization: Numbers #33

Closed
benibela opened this issue Dec 18, 2020 · 12 comments
Labels

  • Enhancement: A change or improvement to an existing feature
  • Propose Closing with No Action: The WG should consider closing this issue with no action
  • Propose for V4.0: The WG should consider this item critical to 4.0
  • XQFO: An issue related to Functions and Operators

Comments

@benibela

benibela commented Dec 18, 2020

To-dos:

  • Add number-formatter to the json method in the serialization spec.

Sometimes people put very large numbers in a JSON file; if these are parsed as xs:double, the numbers are corrupted. Users also become confused when 1000000 from the input becomes 1e6 in the output. Finally, parsing a double is slower than parsing an integer.
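The two problems described above can be reproduced in any language that uses IEEE 754 doubles. The following is a minimal sketch in Python rather than XPath, so the exact output strings are Python's and not those of any XPath processor; the sample values are chosen for illustration only.

```python
# 2**53 + 1 cannot be represented exactly as an IEEE 754 double.
big = "9007199254740993"

as_double = float(big)   # parsed the way an xs:double-only parser would
as_integer = int(big)    # parsed as an arbitrary-precision integer

print(int(as_double))    # 9007199254740992: the last digit has changed
print(as_integer)        # 9007199254740993: exact

# A plain integer in the input can come back in exponential notation.
print(repr(float("10000000000000000")))  # 1e+16
```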

There could be an additional option, number-type, for parse-json/json-doc with the following possible values:

  • double: Parse all numbers as xs:double
  • decimal: Parse all numbers as xs:decimal
  • string: Return numbers as xs:string (so 1e6 stays "1e6" and 1000000 stays "1000000")
  • auto: Parse numbers containing e as xs:double, numbers containing . as xs:decimal, and numbers containing neither as xs:integer
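The four proposed values can be read as a dispatch on the lexical form of each number. Below is a rough Python sketch of that dispatch, assuming float, Decimal, str and int as stand-ins for xs:double, xs:decimal, xs:string and xs:integer. The function name parse_number is hypothetical and is not part of any spec or library.

```python
from decimal import Decimal


def parse_number(lexical, number_type="double"):
    """Convert one JSON number token according to the proposed number-type."""
    if number_type == "double":
        return float(lexical)
    if number_type == "decimal":
        return Decimal(lexical)
    if number_type == "string":
        return lexical  # keep the token exactly as written
    if number_type == "auto":
        if "e" in lexical.lower():   # exponent present: double
            return float(lexical)
        if "." in lexical:           # fraction but no exponent: decimal
            return Decimal(lexical)
        return int(lexical)          # neither: integer
    raise ValueError(f"unknown number-type: {number_type}")


for token in ["1e6", "1000000", "0.100000"]:
    print(token, "->", repr(parse_number(token, "auto")))
```

With "string", 1e6 stays "1e6" and 1000000 stays "1000000", as the proposal requires; with "auto", the type follows the notation used in the input.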
@michaelhkay
Contributor

On the same theme, I've just had a request from a customer for an option on xml-to-json to avoid exponential notation when outputting numbers.

@michaelhkay
Contributor

For fn:xml-to-json (see https://saxonica.plan.io/issues/4860) I propose to add an option

number-formatter as (function(xs:string) as xs:string)?

If present, this function is used to format any <fn:number> elements in the input XML. For example

number-formatter: fn:identity#1

outputs the value "as is", while

number-formatter: ->{format-number(number(.), '#.000')}

outputs the value with three decimal places, and

number-formatter: ->{xs:decimal(.) => string()}

parses and then serializes the value as an xs:decimal.

Note, I feel an urge to allow the last example to be written as

xml-to-json($xml, number-formatter: => xs:decimal() => xs:string())

i.e. allowing a function to be expressed as a pipeline of functions to be composed.
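The idea of a string-to-string formatter callback, and of building one by composing functions, can be sketched outside XPath. The Python below is an illustration only: format_numbers, compose and the three formatters are hypothetical names, and Python's fixed-point formatting is only an approximation of format-number and xs:decimal serialization.

```python
from decimal import Decimal


def compose(*steps):
    """Build one function that applies the given steps left to right."""
    def pipeline(value):
        for step in steps:
            value = step(value)
        return value
    return pipeline


def identity(lexical):
    return lexical  # output the value "as is"


def three_places(lexical):
    return f"{float(lexical):.3f}"  # roughly format-number(number(.), '#.000')


# Parse as a decimal, then serialize without an exponent.
as_plain_decimal = compose(Decimal, lambda d: format(d, "f"))


def format_numbers(lexicals, number_formatter=identity):
    """Write a JSON array, passing every number token through the formatter."""
    return "[" + ",".join(number_formatter(n) for n in lexicals) + "]"


numbers = ["1.0E6", "0.5"]
print(format_numbers(numbers))                    # [1.0E6,0.5]
print(format_numbers(numbers, three_places))      # [1000000.000,0.500]
print(format_numbers(numbers, as_plain_decimal))  # [1000000,0.5]
```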

@michaelhkay
Contributor

For symmetry, I suggest that for parse-json we add a similar function as an option: number-parser: function(xs:string) as xs:anyAtomicType. If the input matches the JSON number production, it is passed as a string to the supplied number-parser function, and the output contains whatever this returns.

@michaelhkay
Contributor

I have implemented the following (in the specs, the test suite, and Saxon):

(a) A new option in parse-json and json-doc: number-parser as function(xs:string) as xs:anyAtomicType which is called to process a number appearing in the JSON, and can return any atomic value.

(b) A new option in xml-to-json: number-formatter as function(xs:string) as xs:string which takes the number as it appears in the XML and generates the representation of the number to appear in the JSON

Note that json-to-xml leaves the representation of the number unchanged.
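A callback that receives each number as a string is the same design that Python's standard json module exposes through its parse_int and parse_float hooks. The following sketch uses those hooks as an analogue of number-parser; it is not the XPath API, and the sample document is made up for illustration.

```python
import json
from decimal import Decimal

text = '{"id": 12345678901234567890123, "price": 0.10000000000000000001}'

# Default behaviour: numbers with a fraction or exponent become doubles.
default = json.loads(text)

# One callback for every number, comparable to number-parser: xs:decimal#1.
number_parser = Decimal
exact = json.loads(text, parse_int=number_parser, parse_float=number_parser)

print(default["price"])  # 0.1: the extra digits are lost
print(exact["price"])    # 0.10000000000000000001
print(exact["id"])       # 12345678901234567890123
```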

@Reino17

Reino17 commented Apr 24, 2021

I have implemented the following in the specs [...] number-parser as function(xs:string) as xs:anyAtomicType

Where exactly can I find this change? What kind of functions do I need to think of for this number-parser key? And what will the default option be?

Although I'm a xidel power-user (Benito's brainchild), I wouldn't say I'm an XPath/XQuery expert, but...

string-join(parse-json("[1.0E6,0.0000001,1000000,0.100000,""string""]")?*," ")
1.0E6 1.0E-7 1.0E6 0.1 string

...this is the current output for parse-json() (with xidel).

I was hoping the default new output would leave the output as is (including the decimals):

string-join(parse-json("[1.0E6,0.0000001,1000000,0.100000,""string""]")?*," ")
1.0E6 0.0000001 1000000 0.100000 string

Like Benito, I'm more in favor of a number-parser option that takes an xs:string value. So for instance:

string-join(parse-json("[1.0E6,0.0000001,1000000,0.100000,""string""]",map{"number-parser":"string"})?*," ")
1.0E6 0.0000001 1000000 0.100000 string

string-join(parse-json("[1.0E6,0.0000001,1000000,0.100000,""string""]",map{"number-parser":"decimal"})?*," ")
1000000 0.0000001 1000000 0.1 string

string-join(parse-json("[1.0E6,0.0000001,1000000,0.100000,""string""]",map{"number-parser":"double"})?*," ")
1.0E6 1.0E-7 1.0E6 0.1 string

string-join(parse-json("[1.0E6,0.0000001,1000000,0.100000,""string""]",map{"number-parser":"auto"})?*," ")
1.0E6 0.0000001 1000000 0.1 string
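For comparison, the "string" behaviour requested above can be obtained in Python by passing str as the number callback of the standard json module. This is an analogue only; the other modes would print differently in Python than in xidel, so only the "string" line is reproduced here.

```python
import json

text = '[1.0E6,0.0000001,1000000,0.100000,"string"]'

# Hand every number token back untouched instead of converting it.
as_strings = json.loads(text, parse_float=str, parse_int=str)
print(" ".join(as_strings))  # 1.0E6 0.0000001 1000000 0.100000 string

# Default parsing, for contrast: fractions and exponents become doubles.
as_default = json.loads(text)
print(as_default)
```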

@Reino17

Reino17 commented May 9, 2021

Any comments?

@ChristianGruen
Contributor

@Reino17 15 months later, if it still matters … Drafts of the QT4 specs can be found here: https://qt4cg.org/

@rhdunn added the XQFO and Enhancement labels on Sep 14, 2022
@ChristianGruen added this to the QT 4.0 milestone on Oct 14, 2022
@ChristianGruen removed this from the QT 4.0 milestone on Apr 27, 2023
@ChristianGruen added the Propose for V4.0 label on Oct 31, 2023
@ChristianGruen
Contributor

It looks to me as if number-parser has already been added to the spec, so I’ve added the »Propose for V4.0« label.
@michaelhkay Do you agree?

@ChristianGruen
Contributor

ChristianGruen commented Jan 29, 2024

For symmetry, number-parser should also be added to fn:json-to-xml.

This issue was initially about the parsing of numbers, but it also touches on the formatting of numbers. I’ve revised the title and the first comment of the issue.

@ChristianGruen changed the title from "json parsing number type option" to "JSON Parsing & Serialization: Numbers" on Jan 29, 2024
@michaelhkay
Contributor

I believe we can close this issue as it has effectively been implemented.

For parsing JSON, we now have a number-parser option that can handle numbers having more digits than an xs:double will accommodate.

Similarly xml-to-json has a number-formatter option.

We haven't changed the serialization spec, which allows the serializer to choose any representation of a number that is legal JSON. That basically puts the responsibility for finding an output format that meets user requirements on the implementor. Saxon is now using exponential notation only for double values outside the range 1e-18 to 1e+18, which is conformant with both the 3.1 and 4.0 specs. I believe this meets user requirements but I wouldn't want to impose it as standard behaviour.
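The range-based rule described above can be sketched as follows. This is not Saxon's code: the function name format_double is hypothetical, treating both bounds as inclusive is an assumption, and rejecting NaN and infinity is a choice made here because neither is legal JSON.

```python
import math
from decimal import Decimal


def format_double(x):
    """Plain notation inside [1e-18, 1e18], exponential notation outside."""
    if not math.isfinite(x):
        raise ValueError("NaN and infinity have no JSON representation")
    if x == 0 or 1e-18 <= abs(x) <= 1e18:
        # repr() gives the shortest round-tripping digits; Decimal expands them.
        text = format(Decimal(repr(x)), "f")
        if "." in text:
            text = text.rstrip("0").rstrip(".")  # 1000000.0 -> 1000000
        return text
    return repr(x)  # e.g. 1e+19, which is still a legal JSON number


for value in [1e6, 1e-7, 0.1, 1e18, 1e19, 1e-19]:
    print(format_double(value))
```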

@michaelhkay added the Propose Closing with No Action label on Mar 16, 2024
@ChristianGruen
Contributor

ChristianGruen commented Mar 17, 2024

Wouldn't it be intuitive to have an analogous serialization parameter? How does Saxon serialize NaN and INF?

serialize(
  { "invalid": xs:double('NaN') },
  { 'method': 'json', 'number-formatter': string#1 }
)
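The NaN question matters because JSON has no token for NaN or infinity, so a serializer has to either emit something non-standard or raise an error. As an illustration from another ecosystem (it says nothing about what Saxon does), Python's standard json module offers both behaviours:

```python
import json

doc = {"invalid": float("nan")}

# Default: emits the JavaScript-style token NaN, which is not legal JSON.
lenient = json.dumps(doc)
print(lenient)  # {"invalid": NaN}

# Strict mode: refuses to serialize the value instead.
try:
    json.dumps(doc, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

print(type(strict_error).__name__)  # ValueError
```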

@ndw
Contributor

ndw commented Mar 19, 2024

The CG agreed to close this issue without further action at meeting 070.

@ndw closed this as completed on Mar 19, 2024

6 participants