-
-
Notifications
You must be signed in to change notification settings - Fork 320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
make JSON more consumable, avoid very large numbers for repeating decimals #1195
Comments
Ooh, good point. I expect that's an irrational number or repeating decimal. In the JSON we don't round these based on display precision, as the UIs do. We want the JSON to be usable by others. We'd like to minimise imprecision. Ideally we'd like hledger -> JSON -> hledger roundtripping to not be lossy. What trade off would you suggest ? |
Right, it's definitely a repeating decimal—and if I'm not mistaken, we should never get irrational numbers because the unit price is computed by dividing two decimal numbers. Because of this, an option would be to give the unit price as a fraction. It could be something like this for the previous example: "aprice": {
"tag": "UnitPrice",
"contents": {
"aprice": null,
"acommodity": "CAD",
"aquantity": {
"isFraction": true,
"numerator": 1397,
"denominator": 2040,
},
"aismultiplier": false
}
} Another option would be to pass numbers as strings, but this is less elegant IMHO. |
I'm not sure what is the best solution here, considering all ramifications. @matteodelabre are you a Mole user ? I can suggest a hledger-web patch or two you could try, for some real world testing. |
print, register, balance etc. in master now support -O json, for easier troubleshooting. Here's an example based on the above:
Output with -x, showing the transaction prices inferred for postings a and b:
(Giant) output with -O json:
The unit prices are rendered with the maximum 255 digits supported by the Decimal library. They also seem to be in scientific notation - And FWIW here's the same example with a very high display precision. Text report rendering seems to max out at 44 decimal digits for some reason (and 88 in the unit price amounts, I'll guess because they're the product of two 44-digit amounts).
Should we replace the 255-digit amounts in JSON with fewer-digit amounts ? Or include both ? What would be the right maximum number of digits ? Should we use the same number as in text reports (display precision), or something more like 10, 20, 30.. ? |
..and/or include rational numbers (numerator & denominator), as suggested ? What could read such JSON ? Can Mole actually read the current JSON successfully, when numbers are normal sized ? |
I think such infinitely-precise numbers arise only with implicit transaction prices - where hledger divides two numbers. |
I am indeed! I would gladly test patches.
Do they? I just tried to build from
If we’re talking about MoLe, it parses the whole JSON input in memory and then uses only the fields it is interested in. At the moment, even though I haven’t read the whole source code, I am fairly confident that the app does not use transaction price-related fields. Therefore, it only crashes during the parsing step because some number it’s not interested in is too large in the input.
I assume it can, because, as far as I can tell, the app only crashes when those large numbers appear in the JSON output.
Good point, although rational numbers are interesting because they allow the user to choose whatever precision they want.
I agree, this is a good compromise. However as you showed, the number of digits in the posting amounts is not limited, so this would require capping the precision to a maximum number of digits anyway.
So, all in all, I think 15 digits is a safe limit for most JSON consumers. |
Amounts in JSON are now rendered as simple Numbers with up to 10 decimal places, instead of Decimal objects which would in some cases have 255 digits, too many for most JSON parsers to handle. A provisional fix, see the comment in Json.hs for more detail.
My apologies, you were right. I've pushed to master now. Great info, thanks. You mention integers - I'm still a bit confused, does this also in effect mean significant digits, including decimal digits ? Or is that a separate limit ? PR #1196 is ready for testing:
|
Amounts in JSON are now rendered as simple Numbers with up to 10 decimal places, instead of Decimal objects which would in some cases have 255 digits, too many for most JSON parsers to handle. A provisional fix, see the comment in Json.hs for more detail.
Apologies for the delay in testing.
Sorry about the confusion in my previous message. I specifically researched the number of digits that can be stored in integers because in the old format all numbers were integers. (A number like In my opinion the fact that all numbers were integers was quite an useful property, because most JSON readers will convert fractional numbers to floating point numbers with no choice to opt for another representation. Consider for example the following Python REPL session: >>> import json
>>> obj = json.loads('{"integerNumber": 204}')
>>> type(obj['integerNumber'])
<class 'int'>
>>>
>>> obj = json.loads('{"fractionalNumber": 20.4}')
>>> type(obj['fractionalNumber'])
<class 'float'> That’s a shame because floating point numbers are ill-suited to representing base-10 fractional numbers. For example, even a simple number like >>> 0.2 + 0.1
0.30000000000000004 As a consequence, if we were to formally consider the question “How many decimal digits can be exactly represented in a floating point number?”, I’d answer “0”.
After testing, it appears that I was mistaken. MoLe does not simply parse the whole JSON payload without consideration of its structure. It actually relies on the precise structure to do its parsing. In particular, it expects transaction amounts to have Therefore, the pull request code makes MoLe crash in a new way, because it can’t find these fields it expects inside transaction amounts. |
Oops. Thanks for the testing and info. Of course we don't like to use floating point for quantities, but I had formed the impression it was a necessary evil in JSON land. Apparently not for Mole! If other JSON clients need a simple floating point representation in future, we could provide it in addition. So what do you think is the best solution ? Keep using Decimal objects, but limit them to integers of 15 significant digits ? And prevent the intermittent/possibly wrong appearance of scientific notation ? Ccing @adept |
Original report says "This notably breaks MoLe because it tries to parse all numbers as Java longs." Java longs are up to 2^63-1 (9223372036854775808), which is 19 digits. Since Decimal objects would essentially cater for MoLe needs only (are there any other known consumers?), this could be the guideline ... Alternatively, we can ship Decimal object AND a float, for maximum flexibility ... |
I think this is a good solution because it would minimize change in the JSON structure. |
Related info: aeson uses the Scientific type when rendering our Decimals, or any numbers, as JSON. And scientific seems a bit buggy and under-maintained. Be aware. |
Here's how including both representations could look: instance ToJSON Decimal where
toJSON d = object
["decimalPlaces" .= toJSON decimalPlaces
,"decimalMantissa" .= toJSON decimalMantissa
,"floatingPoint" .= toJSON (fromRational $ toRational d' :: Double)
]
where d'@Decimal{..} = roundTo 15 d
Issues:
|
Do you think providing both the large integer and a small floating point will still break JSON consumers ? Ie does the presence of a large number somewhere in the JSON break them, even if they don't need to use it ? |
Just tested this with MoLe and it works. Looks like the app—luckily—does not check that no unexpected fields are present (like
I think that this is only an artifact of the JSON printer which is used for the Object
(fromList
[ ( "floatingPoint" , Number (-20.4) )
, ( "decimalPlaces" , Number 15.0 )
, ( "decimalMantissa" , Number (-2.04e16) )
]) When inspecting the JSON object produced by {"floatingPoint":-20.4,"decimalPlaces":15,"decimalMantissa":-20400000000000000} By the way, because of this kind of discrepancies, in my opinion it would be great (if technically possible) to use the same JSON printer in
I used the following piece of code to test whether this was the case (please excuse my non-existent Haskell skills): instance ToJSON Decimal where
toJSON d = object
["decimalPlaces" .= toJSON (decimalPlaces (roundTo 15 d))
,"decimalMantissa" .= toJSON (decimalMantissa (roundTo 15 d))
,"fullDecimalPlaces" .= toJSON (decimalPlaces d)
,"fullDecimalMantissa" .= toJSON (decimalMantissa d)
,"floatingPoint" .= toJSON (fromRational $ toRational d :: Double)
] Which creates a JSON object that looks like this (for the offending example): "aquantity": {
"floatingPoint": 0.6848039215686275,
"fullDecimalPlaces": 255,
"decimalPlaces": 15,
"fullDecimalMantissa": 684803921568627450980392156862745098039215686274509803921568627450980392156862745098039215686274509803921568627450980392156862745098039215686274509803921568627450980392156862745098039215686274509803921568627450980392156862745098039215686274509803921568627,
"decimalMantissa": 684803921568627
} This does not break the app, as it appears that it only converts to numbers the fields that it is interested in. Although I’m certain that there is a JSON parser out there that would not cope as well with this. |
Thanks for the testing @matteodelabre. hledger-web and hledger do use the same JSON output code. I think the scientific notation kicks in above a certain magnitude, I won't say more as I've forgotten the details. I think the question is, do we bother to include the high precision Decimal object in our JSON ? Or do we drop it until an actual need arises, and for now just provide the dreaded floating point, to make the JSON more compatible with the rest of the world ? AFAIK Mole is the only consumer of our JSON so far. I'm a bit confused about its status. @matteodelabre are you a Mole user ? Developer ? Do you know which hledger-web versions it works with today, if any ? |
I concur, I don’t see the need for high precision decimals in the use cases that we studied so far. MoLe just converts decimals to floating points internally anyway. However, simply dropping the
I’m afraid I don’t have much additional information. I’m mainly a MoLe user, having only contributed one patch in the past (also related to the JSON interface with hledger-web). But I don’t exclude the possibility of contributing more substantially to the project in the future. As for the version compatibility, some information is provided on the home page: “MoLe is known to work with hledger-web versions 1.10 and 1.14. Versions 1.11-1.13 are problematic because the HTML has changed compared to 1.10 and they lack the JSON API introduced in 1.14.” Additionally, looking at the source code and the changelog, it also supports versions 1.15 and 1.16. I’ve successfully used the app with hledger-web versions 1.14 through 1.16, provided that I avoid problematic inferred transaction prices. |
Hi, MoLe author here, I can confirm your observation that adding fields to the JSON API doesn't break MoLe. I have in my TODO list to make MoLe work with large values (using java's BigDecimal) for when there is a 20-digit integer in the ledger, entered by the user. This would fix the crash with conversions resulting in periodic decimals. The BigDecimal adoption will happen in the foreseeable future, but not very soon - months, not weeks. (At the moment I am working on commodity support when entering new transactions and time is scarce). The one time I experienced the crash (in a ledger from 2016), I was able to work around it by replacing the inferred price with a fixed conversion (not sure about the terminology):
Yes, that would result in a periodic decimal for the price, which would have the same problem, but since MoLe doesn't try to get a number out of the price JSON fields, so this is not a crasher. I don't feel I can recommend limiting precision or including floats in hledger-web's JSON output for something that I see as a bug in MoLe. |
Hi Damyan, Although I agree with you that it is mostly a bug on MoLe’s side, there is something in this that could be seen as a bug in hledger (or rather some unexpected behavior): we can get very large numbers in the JSON output even for journals that only contain small numbers. If we want to solve this and avoid the loss of precision, there is always the solution of providing the conversion rate as a ratio of two numbers (as mentioned above), which has the advantage of keeping the number of digits proportional to the number of digits in the original journal file. |
Would adding a
JSON doesn't provide a way to transport arbitrary precision integers or decimals, and having them as string might be the best way to avoid unintended rounding or errors due to very large numbers. Obviously you need to specify a cutoff point since strings don't have infinite precision either, but at least:
|
When using hledger’s inferring of transaction prices for unit prices that cannot be exactly represented in decimal, the values that show up in API responses can be very large (up to 255 decimal digits). Consider for example the following transaction:
Running
hledger-web
on this transaction, the following response is produced by the API (excerpt):Although this is technically valid JSON, it is far outside the range of usual numeric datatypes in most languages. (An unsigned long value would only allow for up to 19 decimal digits.) This notably breaks MoLe because it tries to parse all numbers as Java longs.
Do you think this is better addressed in here or that clients should be prepared to handle very large numbers?
The text was updated successfully, but these errors were encountered: