Float decoding problem #69
Unless there's some rounding issue hidden somewhere in UltraJSON's floating point decoder (and/or encoder), I really can't tell if this is an artifact caused by the inherent limitations of floating point numbers. Math geniuses to the rescue, please :) Parts of this code are borrowed from the MODP_ASCII project at http://code.google.com/p/stringencoders/
I hacked around with the code a little and tried using the standard strtod() function inside decode_numeric(), and this passes the test. This is the same function that Python itself uses under the covers for string->float conversion.
This is obviously just a test to prove that the value is correct. More work would be required to support integers, NaN, Infinity, etc. I know this is a departure from using the stringencoders code, but it seems to return the correct value. What do you think about the approach of using the standard C functions instead?
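To see why a correctly-rounded conversion like strtod matters: when both the formatter and the parser are correctly rounded, every finite double survives a text round trip exactly. A stdlib-only illustration (Python's float() and repr() are correctly rounded on CPython; the specific sample values are arbitrary):

```python
# repr() of a Python float is the shortest string that parses back
# to the exact same double, so parse(format(x)) == x must hold.
values = [10.0 / 3.0, 0.1, 1.5e-300, 6.62607015e-34]
for x in values:
    assert float(repr(x)) == x
```

A decoder that accumulates digits with its own multiply-and-add loop can be off by an ulp or two, which is exactly the kind of failure the round-trip test surfaces.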
+1 for using strtod(). I'm trying to migrate a whole bunch of code from cjson to ujson, and this is the only blocker. -Joe I work with nashg482.
I think I have solved this problem and, if we're lucky, also further sped up decoding of integers and decimals. Needs a bit more testing. I'll check back shortly.
Resolved in master branch
Thanks! When will it be available via pip?
Now ☺
It looks like it cuts off floating point numbers after 5 decimal places:
There's a kwarg for that on encode/dumps called double_precision.
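The double_precision kwarg belongs to ujson and is not exercised here, but the five-decimal-place behavior reported above resembles fixed "%.*f"-style formatting, which is different from the shortest-round-trip encoding the stdlib uses. A stdlib-only sketch of the two behaviors (the sample value is arbitrary):

```python
import json

x = 1.2345678901
# Fixed decimal-place formatting (what a precision setting of 5
# would produce) rounds away the tail of the value:
assert f"{x:.5f}" == "1.23457"
# The stdlib encoder instead emits the shortest string that
# parses back to the exact same double:
assert json.dumps(x) == "1.2345678901"
```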
I isolated the following case that breaks using the kwarg:
For reference I have been using this for testing:
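The test code itself did not survive in the thread. A minimal reconstruction in the same spirit, under stated assumptions: the test name test_random_range comes from a later comment, the value range and iteration count are guesses, and the stdlib json module stands in for ujson so the snippet is self-contained:

```python
import json
import random
import unittest


class TestFloatRoundTrip(unittest.TestCase):
    def test_random_range(self):
        # Replace json with ujson to exercise the decoder under discussion.
        random.seed(69)
        for _ in range(1000):
            x = random.uniform(-1e15, 1e15)
            self.assertEqual(x, json.loads(json.dumps(x)))
```

Run with `python -m unittest` against whichever module defines it.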
Interesting error, I'll look into it shortly. In the meantime I'm pulling 1.28 from PyPI, as I deem the error serious. //JT
I've hopefully fixed the issues now. I did some redesign of the mantissa decoder. The expected functionality at the moment is that it should decode all numeric values without decimals or exponents as 32-bit signed integers if they are 9 digits long or less, and as 64-bit signed integers for all numbers up to LLONG_MIN and LLONG_MAX. Looking forward to you guys tearing it apart for the sake of further narrowing down and defining its capabilities from a can and cannot do perspective. I would be especially interested in test cases where the decoding to double precision floats would be considered failing, possibly based on comparison. Big thanks to @gmnash. Keep it up!
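The width rule described above can be sketched as a small classifier. This is a hypothetical Python rendering of the described C behavior, not the actual decoder (which works on the byte stream, not on str); the fallback for out-of-range integers is an assumption:

```python
INT32_DIGITS = 9                      # <= 9 digits fits 32-bit signed per the rule
LLONG_MIN, LLONG_MAX = -2**63, 2**63 - 1

def numeric_kind(token: str) -> str:
    """Classify a JSON numeric token per the decoding rule described."""
    if "." in token or "e" in token or "E" in token:
        return "double"
    if len(token.lstrip("-")) <= INT32_DIGITS:
        return "int32"
    if LLONG_MIN <= int(token) <= LLONG_MAX:
        return "int64"
    return "double"  # out-of-range integers fall back to double (assumption)

assert numeric_kind("123456789") == "int32"
assert numeric_kind("-9223372036854775808") == "int64"
assert numeric_kind("3.14") == "double"
```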
The test I posted previously, test_random_range, still fails after a few iterations on the latest origin/master. Is there a reason not to try strtod? Are there performance problems, and if so, are there any benchmarks available? As floating point is notoriously hard, it might be a good idea to start with a battle-hardened solution that we know works.
I wouldn't want to rely on external functions since we can't be sure how they perform. Also, I'd rather have a fast and consistent numeric decoder in ujson than a potentially slow one which supports everything.
Wouldn't it be prudent to empirically test how strtod performs before deciding against it?
Valid questions. I'm concerned about the speed of strtod and its individual implementations. //JT
Added a precise_float=True (default False) option to the decoder to override the default behavior and use strtod for all numbers that have decimals or exponents. Integer behavior is still retained.
My tests now pass. I don't see a consistent performance difference using precise_float=True. Have you noticed a difference? If not, wouldn't it make more sense from a usability standpoint (and code maintenance) to make the code path of precise_float=True the only path? Also, when do you plan on doing a release to PyPI? Thanks again. Looking forward to switching off of cjson and json!
I'll consider your suggestion. My biggest fear is users of the existing "imprecise" floating point decoder taking a completely unnecessary performance hit for an improvement they didn't ask for. I'm going to do some benchmarks on this later on. The PyPI release is basically due next week, I guess. I just don't feel the need to rush this one.
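A micro-benchmark is the quickest way to settle the speed question. A stdlib-only sketch (substitute ujson.loads(payload) and ujson.loads(payload, precise_float=True) for the parse function to compare the two ujson code paths; payload size and iteration count are arbitrary):

```python
import json
import random
import timeit

random.seed(1)
payload = json.dumps([random.uniform(-1e6, 1e6) for _ in range(10_000)])

def parse():
    # Swap in the parser variant under test here.
    return json.loads(payload)

elapsed = timeit.timeit(parse, number=50)
print(f"50 parses of 10,000 floats: {elapsed:.3f}s")
```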
Sounds good. Let me know how those benchmarks turn out. Thanks again.
How did the benchmarks turn out? Are we close to a release? Thanks. |
I haven't been able to devote the time needed for this, sorry. //JT
So, quick question: why does "precision" here seem to mean "places after the decimal point"? Shouldn't it just be the total number of significant digits, no matter where they are?
You are probably right; we've attacked this problem more from a performance perspective than from what's mathematically a correct definition of precision. Feel free to contribute a name change for the argument, but for backward compatibility reasons we're going to have to keep the old name as well.
I was under the impression that relative scale independence was at the heart of the very idea of "floating point". Otherwise, how will ujson handle 1.4590808237e100, or 1.45909280972e-50? What does "precision", as ujson is using it, even mean in such cases? Indeed, if I do ujson.dumps(1e-40) ==> '0.0'. Contrast this to json.dumps(1e-40) ==> '1e-40'.
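The distinction matters because fixed decimal places are scale-dependent while significant digits are not. Stdlib formatting shows the difference without involving ujson at all:

```python
import json

# Fixed decimal places annihilate small magnitudes entirely...
assert f"{1e-40:.10f}" == "0.0000000000"
# ...while significant-digit ("%g"-style) formatting preserves scale:
assert f"{1e-40:.10g}" == "1e-40"
assert f"{1.4590808237e100:.10g}" == "1.459080824e+100"
# The stdlib json encoder keeps the value intact:
assert json.dumps(1e-40) == "1e-40"
```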
There are lots of reasons that strtod should not be used for parsing JSON numbers (or really any numbers), and it has nothing to do with performance. If you want a fully conforming JSON number parsing implementation, you should check out:
I guess this is a discussion which ultimately depends on how closely you consider JSON tied to JavaScript. Any numeric parsing implementation widely surpassing the capabilities of what JavaScript could handle is, from our perspective, not worth pursuing, as we would expect most JSON to end up in JavaScript land anyhow. If the intention is to marshal fixed-point precision numerals outside of the JavaScript world, maybe there are other alternatives to JSON or UltraJSON.
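For context on "what JavaScript could handle": JavaScript numbers are IEEE-754 doubles, so integers are exact only up to 2^53. A quick check of that boundary in Python (whose floats are the same doubles):

```python
MAX_SAFE_INTEGER = 2**53 - 1  # JavaScript's Number.MAX_SAFE_INTEGER

# Doubles represent every integer up to 2**53 exactly...
assert float(2**53) == 2**53
# ...but 2**53 + 1 collapses to the nearest representable double:
assert float(2**53 + 1) == float(2**53)
```

This is why decoding 64-bit integers beyond that range, as the fix above does, already exceeds what a JavaScript consumer can represent losslessly.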
UltraJSON is in the C/C++/Python world outside of JavaScript.
to make sure data (almost) round-trips. Still seeing slight differences.
I am seeing a floating point problem that I do not see in json or cjson. The first two tests below pass while the one using ujson fails. I am running Ubuntu 12.04 if that helps.