Float decoding problem #69

gmnash · 2012-12-19T22:03:22Z

I am seeing a floating point problem that I do not see in json or cjson. The first two tests below pass while the one using ujson fails. I am running Ubuntu 12.04 if that helps.

import unittest
import json
import cjson
import ujson


class TestUJsonFloat(unittest.TestCase):
    def test_json(self):
        sut = {u'a': 4.56}
        encoded = json.dumps(sut)
        decoded = json.loads(encoded)
        self.assertEqual(sut, decoded)

    def test_cjson(self):
        sut = {u'a': 4.56}
        encoded = cjson.encode(sut)
        decoded = cjson.decode(encoded)
        self.assertEqual(sut, decoded)

    def test_ujson(self):
        sut = {u'a': 4.56}
        encoded = ujson.encode(sut)
        decoded = ujson.decode(encoded)
        self.assertEqual(sut, decoded)


jskorpan · 2012-12-20T12:35:27Z

Unless there's a rounding issue hidden somewhere in UltraJSON's floating point decoder (and/or encoder), I really can't tell whether this is just an artifact of the inherent technical limitations of floating point numbers.

Math geniuses to the rescue please :)

Parts of this code are borrowed from the MODP_ASCII project at http://code.google.com/p/stringencoders/

gmnash · 2012-12-21T16:35:16Z

I hacked around with the code a little and tried using the standard strtod() function inside decode_numeric(), and this passes the test. This is the same function that Python uses under the covers too, for string->float conversion.

FASTCALL_ATTR JSOBJ FASTCALL_MSVC decode_numeric ( struct DecoderState *ds)
{   
    char* end = 0;
    double d = strtod(ds->start, &end);
    ds->lastType = JT_DOUBLE;
    ds->start = end;
    RETURN_JSOBJ_NULLCHECK(ds->dec->newDouble (d));
}

This is obviously just a test to prove that the value is correct. More work would be required to support integers, NaN, Infinity, etc.

I know this is a departure from using the stringencoders code, but it seems to return the correct value. What do you think about the approach of using the standard C functions instead?
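To illustrate why a hand-rolled decoder can disagree with strtod(), here is a minimal Python sketch. It is not ujson's actual C code: the hypothetical naive_decode() accumulates fraction digits using a running power of 0.1, which introduces rounding at every step. CPython's float(), like a correctly rounded strtod(), rounds only once. The helper names and the sampling range are assumptions chosen for illustration.

```python
import random


def naive_decode(token):
    """Hypothetical digit-by-digit decoder for plain decimals like '-123.456'.

    Each fraction digit is scaled by a running power of 0.1 and added to the
    result, so rounding error can creep in at every step.
    """
    negative = token.startswith('-')
    int_part, _, frac_part = token.lstrip('-').partition('.')
    value = float(int(int_part))
    scale = 0.1                   # 0.1 is not exactly representable as a double
    for ch in frac_part:
        value += int(ch) * scale  # each addition rounds to the nearest double
        scale *= 0.1              # and so does each new power of ten
    return -value if negative else value


def count_mismatches(n=1000, seed=0):
    """Count values whose shortest repr() the two decoders read differently."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(n):
        # magnitudes >= 1 keep repr() in plain decimal notation (no exponent)
        x = rng.uniform(1.0, 1e9) * rng.choice((-1, 1))
        token = repr(x)
        if naive_decode(token) != float(token):
            mismatches += 1
    return mismatches


print(count_mismatches())
```

Short inputs such as 4.56 can still come out right, which is why a single hand-picked test passes while a random sweep exposes the drift.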

joewalnes · 2012-12-21T19:22:10Z

+1 for using strtod(). I'm trying to migrate a whole bunch of code from cjson to ujson, and this is the only blocker.

-Joe

I work with nashg482.

jskorpan · 2013-01-07T17:24:32Z

I think I have solved this problem and, if we're lucky, also sped up decoding of integers and decimals further. It needs a bit more testing; I'll check back shortly.

jskorpan · 2013-01-16T15:33:35Z

Resolved in master branch

gmnash · 2013-01-16T16:22:30Z

Thanks! When will it be available via pip?

jskorpan · 2013-01-16T16:42:57Z

Now ☺
//JT


gmnash · 2013-01-16T17:12:11Z

It looks like it cuts off floating point numbers after 5 decimal places:

import unittest
import json
import ujson
import cjson


class TestUJsonFloat(unittest.TestCase):
    def test_json(self):
        sut = {u'a': 4.567891}
        encoded = json.dumps(sut)
        decoded = json.loads(encoded)
        self.assertEqual(sut, decoded)

    def test_cjson(self):
        sut = {u'a': 4.567891}
        encoded = cjson.encode(sut)
        decoded = cjson.decode(encoded)
        self.assertEqual(sut, decoded)

    def test_ujson(self):
        sut = {u'a': 4.567891}
        encoded = ujson.encode(sut)
        decoded = ujson.decode(encoded)
        self.assertEqual(sut, decoded)

jskorpan · 2013-01-17T14:50:35Z

encode/dumps has a keyword argument for that called double_precision.
The default is 5; I'm fine with changing that default if that seems more reasonable.
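Based on the 4.567891 example, double_precision appears to count digits after the decimal point. The sketch below approximates that behaviour using Python's own %-formatting. This is an assumption made for illustration, not ujson's encoder, and fixed_decimal_roundtrips() is a hypothetical helper.

```python
def fixed_decimal_roundtrips(x, places):
    """Return True if printing x with `places` digits after the decimal point
    (a stand-in for a decimal-place cap such as double_precision) and parsing
    the text back yields exactly the same double."""
    text = '%.*f' % (places, x)
    return float(text) == x


for places in (5, 6):
    print(places, '%.*f' % (places, 4.567891), fixed_decimal_roundtrips(4.567891, places))
# 5 4.56789 False
# 6 4.567891 True
```

With 5 places the last digit is dropped. Six places is enough for this particular value, but the number of decimal places a double needs varies with its magnitude, so no small fixed cap works for every input.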

gmnash · 2013-01-17T15:35:27Z

I isolated the following case that breaks using the kwArg:

import unittest
import json
import ujson


class TestUJsonFloat(unittest.TestCase):
    def test_random(self):
        sut = {u'a': -528656961.4399388}
        encoded = json.dumps(sut)
        decoded = json.loads(encoded)
        self.assertEqual(sut, decoded)

        encoded = ujson.encode(sut, double_precision=100)
        decoded = ujson.decode(encoded)
        self.assertEqual(sut, decoded)

For reference I have been using this for testing:

import random
import unittest
import json
import ujson


class TestUJsonFloat(unittest.TestCase):
    def test_random_range(self):
        random.seed(0)
        JSON_MAX = pow(2, 53)
        for i in range(0, 100000):
            value = random.uniform(-JSON_MAX, JSON_MAX)
            sut = {u'a': value}

            try:
                encoded = json.dumps(sut)
                decoded = json.loads(encoded)
                self.assertEqual(sut, decoded)
            except Exception as e:
                print "json: i={}, value={}, error={}".format(i, value, e)
                raise

            try:
                encoded = ujson.encode(sut, double_precision=100)
                decoded = ujson.decode(encoded)
                self.assertEqual(sut, decoded)
            except Exception as e:
                print "ujson: i={}, value={}, error={}".format(i, value, e)
                raise

jskorpan · 2013-01-17T15:57:49Z

Interesting error, I'll look into it shortly.

In the meantime I'm pulling 1.28 from PyPI, as I consider the error serious.

//JT


jskorpan · 2013-01-18T15:21:40Z

I've hopefully fixed the issues now. I did some redesign of the mantissa decoder.

The expected behavior at the moment is that numeric values without decimals or exponents are decoded as 32-bit signed integers if they are 9 digits long or less, and as 64-bit signed integers for all numbers within LLONG_MIN and LLONG_MAX.
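The integer rule described above can be sketched as follows. This is an illustrative Python model, not the C decoder; classify_integer() and its return labels are hypothetical. Nine decimal digits always fit in a signed 32-bit integer because 999,999,999 is below 2^31 - 1 = 2,147,483,647, so the digit count alone serves as a cheap range check.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
LLONG_MIN, LLONG_MAX = -2**63, 2**63 - 1  # C's long long on common platforms


def classify_integer(token):
    """Pick a storage width for a JSON integer token (no '.', 'e' or 'E')."""
    digits = token.lstrip('-')
    value = int(token)
    if len(digits) <= 9:
        # at most 999,999,999 in magnitude, which always lies within int32
        return 'int32', value
    if LLONG_MIN <= value <= LLONG_MAX:
        return 'int64', value
    raise OverflowError('integer does not fit in 64 bits: ' + token)


print(classify_integer('-123456789'))  # ('int32', -123456789)
print(classify_integer('2147483648'))  # ('int64', 2147483648)
```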

I'm looking forward to you guys tearing it apart, to further narrow down and define what it can and cannot do. I would be especially interested in test cases where decoding to double precision floats should be considered failing, possibly based on comparison.
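One way to generate such test cases is to fuzz with random bit patterns rather than random values in a fixed range. Reinterpreting 64 random bits as a double exercises every exponent, including subnormals. The following is a minimal sketch that assumes the encoder and decoder are passed in as plain Python callables. Comparing bit patterns also catches cases where -0.0 is decoded as 0.0. roundtrip_failures() is a hypothetical helper, and repr/float stand in for a known-good encoder/decoder pair.

```python
import math
import random
import struct


def bits(x):
    """The raw IEEE 754 bit pattern of a double, as an unsigned integer."""
    return struct.unpack('<Q', struct.pack('<d', x))[0]


def random_double(rng):
    """Reinterpret 64 random bits as a double, covering every exponent."""
    return struct.unpack('<d', struct.pack('<Q', rng.getrandbits(64)))[0]


def roundtrip_failures(encode, decode, n=10000, seed=0):
    """Return (original, decoded) pairs that did not survive encode/decode."""
    rng = random.Random(seed)
    failures = []
    for _ in range(n):
        x = random_double(rng)
        if not math.isfinite(x):
            continue  # JSON has no NaN or Infinity
        y = decode(encode(x))
        if bits(y) != bits(x):
            failures.append((x, y))
    return failures


print(len(roundtrip_failures(repr, float)))                 # 0
print(len(roundtrip_failures(lambda x: '%.5f' % x, float)))
```

A real encoder/decoder pair could be plugged in the same way, and each reported pair is a ready-made regression test.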

Big thanks to @gmnash Keep it up!

gmnash · 2013-01-22T16:11:07Z

The test I posted previously, test_random_range, still fails after a few iterations on the latest origin/master. Is there a reason to not try strtod? Are there performance problems, and if so are there any benchmarks available? As floating point is notoriously hard, it might be a good idea to start with a battle hardened solution that we know works first.

jskorpan · 2013-01-22T16:55:24Z

I wouldn't want to rely on external functions, since we can't be sure how they perform. Also, I'd rather have a fast and consistent numeric decoder in ujson than a potentially slow one that supports everything.


gmnash · 2013-01-22T17:20:28Z

Wouldn't it be prudent to empirically test how strtod performs before
dismissing it out of hand? Wouldn't it be inconsistent to have asymmetry
in encode/decode? By supports everything do you mean all valid JSON
numbers? Do you plan on listing the values for which the asymmetry occurs
in the README? Do you feel confident you can find them all?


jskorpan · 2013-01-23T08:41:56Z

Valid questions. I'm concerned about the speed of strtod and its various implementations.
I'll do some trials in a day or two. Thanks for your engagement; I see your point.

//JT
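For trials like these, a Python-level harness such as the one below can compare decoders on identical inputs. This is a sketch: bench() is a hypothetical helper, and timings at this level include interpreter overhead. A fair comparison with strtod would ultimately need to be measured inside the C decoder itself.

```python
import random
import timeit


def bench(decoder, tokens, repeat=5, number=10):
    """Best-of-`repeat` wall time (seconds) for decoding every token `number` times.

    Taking the minimum filters out noise from other processes.
    """
    timer = timeit.Timer(lambda: [decoder(t) for t in tokens])
    return min(timer.repeat(repeat=repeat, number=number))


rng = random.Random(0)
tokens = [repr(rng.uniform(-2.0**53, 2.0**53)) for _ in range(10000)]
print('float():', bench(float, tokens))
```

Any candidate decoder taking a string and returning a float can be passed in place of float, using the same token list so the numbers are comparable.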


jskorpan · 2013-02-12T16:24:34Z

Added a precise_float=True option (default False) to the decoder, which overrides the default behavior and uses strtod for all numbers that have decimals or exponents.

Integer behavior is retained.
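The dispatch described above might look roughly like the following Python sketch. Several assumptions apply: decode_number() is hypothetical, float() stands in for strtod on the precise path, and the fast path shown (integer mantissa times a power of ten) is just one plausible shortcut, not ujson's actual code. Because both 10.0 ** -k and the conversion of a long mantissa are themselves rounded, the fast path can round more than once and occasionally miss the correctly rounded result.

```python
def decode_number(token, precise_float=False):
    """Decode a JSON number token (hypothetical sketch of the decoder's dispatch)."""
    if not any(c in token for c in '.eE'):
        return int(token)  # integer path: unaffected by precise_float

    if precise_float:
        return float(token)  # single, correctly rounded conversion (strtod's role)

    # Fast path (assumed for illustration): fold all digits into an integer
    # mantissa, then scale by a power of ten. Converting a long mantissa to
    # float and computing 10.0 ** exponent can each round, so the result may
    # differ from the correctly rounded value.
    mantissa_text, _, exp_text = token.lower().partition('e')
    int_part, _, frac_part = mantissa_text.partition('.')
    mantissa = int(int_part + frac_part)  # the sign travels with int_part
    exponent = int(exp_text or '0') - len(frac_part)
    return mantissa * 10.0 ** exponent


print(decode_number('42'), decode_number('4.56', precise_float=True))  # 42 4.56
```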

gmnash · 2013-02-14T15:45:41Z

My tests now pass. I don't see a consistent performance difference using precise_float=True. Have you noticed a difference? If not, wouldn't it make more sense from a usability standpoint (and code maintenance) to make the code path of precise_float=True the only path?

Also when do you plan on doing a release to PyPI?

Thanks again. Looking forward to switching off of cjson and json!

jskorpan · 2013-02-15T08:34:30Z

I'll consider your suggestion. My biggest worry is that users of the existing “imprecise” floating point decoder would take a completely unnecessary performance hit for an improvement they didn't ask for. I'm going to do some benchmarks on this later.

The PyPI release is basically due next week, I guess. I'm just not in a rush with this one.
//JT


gmnash · 2013-02-15T18:56:34Z

Sounds good. Let me know how those benchmarks turn out. Thanks again.

gmnash · 2013-03-15T21:56:03Z

How did the benchmarks turn out? Are we close to a release? Thanks.

jskorpan · 2013-03-18T09:05:12Z

I haven't been able to devote the time needed for this, sorry.
Please use the kwarg switch if that suits you, or patch the code's behavior.

//JT


trottier · 2013-05-03T18:21:31Z

So, quick question: why does "precision" here seem to mean "places after the decimal point"? Shouldn't it just be the total number of significant digits, no matter where they are?

jskorpan · 2013-05-06T09:13:16Z

You are probably right. We've approached this problem more from a performance perspective than from what's mathematically the correct definition of precision.

Feel free to contribute a name change for the argument, but for backward compatibility reasons we're going to have to keep the old name as well.


trottier · 2013-05-06T17:35:15Z

I was under the impression that relative scale independence was at the heart of the very idea of "floating point".

Otherwise, how will ujson handle 1.4590808237e100 or 1.45909280972e-50? What does "precision", as ujson uses it, even mean in such cases?

Indeed, if I do:

ujson.dumps(1e-40) ==> '0.0'

Contrast this to:

json.dumps(1e-40) ==> '1e-40'
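The distinction being drawn here is between counting digits after the decimal point and counting significant digits. Python's %-formatting can show both conventions side by side; this illustrates the two approaches, not ujson's code. Seventeen significant digits are always enough to round-trip an IEEE 754 double, regardless of its magnitude. The helper names are hypothetical.

```python
def fixed_places(x, places=5):
    """Digits after the decimal point: tiny values collapse to zero."""
    return '%.*f' % (places, x)


def significant(x, digits=17):
    """Significant digits: the exponent moves, the relative precision stays."""
    return '%.*g' % (digits, x)


for x in (1e-40, 1.45909280972e-50, 1.4590808237e100):
    print(repr(fixed_places(x)[:12]), significant(x, 5), float(significant(x)) == x)
```

With five decimal places both small values print as 0.00000, while five significant digits keep 1e-40 and 1.4591e-50. With 17 significant digits, each value parses back exactly.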

vlovich · 2014-10-23T01:04:51Z

There are lots of reasons that strtod should not be used for parsing JSON numbers (or really any numbers), & it has nothing to do with performance.

The API only has ERANGE for reporting overflow/underflow. You can't know which exactly.
64-bit integers (which are allowed in JSON) are broken: you only get 53 bits. At one point I remember there was a huge bug where Facebook was using 64-bit numbers in their IDs.
Integers are broken: you can't expose in the API if you had to truncate to get an integer value.
Infinities & NaN are unhandled (which is supported by the standard)
It supports a hex-notation which is unsupported by JSON
It can't tell you if you lost precision due to floating-point representation.

If you want a fully conforming JSON number parsing implementation, you should check out:
https://github.com/openwebos/libpbnjson/blob/e80704d2f1f36a4dc666926a6b5e1be09959c009/src/pbnjson_c/jvalue/num_conversion.c
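Several of these points can be addressed on top of a correctly rounded conversion by first validating the token against the JSON number grammar, then checking the result. The JSON grammar (RFC 8259) excludes hex notation, NaN, and Infinity. Below is a minimal Python sketch under stated assumptions: parse_json_number() and UnderflowError are hypothetical names, the int64 limit mirrors the integer rule discussed earlier in this thread, and exactness is checked using the decimal module. This is not a port of the linked libpbnjson code.

```python
import math
import re
from decimal import Decimal

# JSON number grammar: no hex, no leading '+', no leading zeros, no NaN/Infinity.
_JSON_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?')
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1


class UnderflowError(ArithmeticError):
    """Raised when a non-zero literal rounds to zero as a double."""


def parse_json_number(token):
    """Return (value, exact); exact is False if the double differs from the literal."""
    if not _JSON_NUMBER.fullmatch(token):
        raise ValueError('not a JSON number: %r' % token)

    if not any(c in token for c in '.eE'):
        value = int(token)  # arbitrary precision, so nothing is truncated here
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError('integer outside int64 range: ' + token)
        return value, True

    literal = Decimal(token)  # exact decimal value of the text
    value = float(token)      # correctly rounded double
    if math.isinf(value):
        raise OverflowError('too large for a double: ' + token)
    if value == 0.0 and literal != 0:
        raise UnderflowError('too small for a double: ' + token)
    return value, Decimal(value) == literal
```

Overflow and underflow are reported as separate exception types, and the `exact` flag indicates when the literal could not be represented exactly (for example, '0.1').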

jskorpan · 2014-10-23T07:30:30Z

I guess this is a discussion which ultimately depends on how closely you consider JSON tied to JavaScript. Any numeric parsing implementation widely surpassing the capabilities of what JavaScript could handle is from our perspective not worth pursuing as we would expect most JSON to end up in JavaScript land anyhow.

If the intention is to marshal fixed-point precision numerals outside the JavaScript world, maybe there are other alternatives to JSON or UltraJSON.
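For reference, here is the JavaScript limit under discussion. Every JavaScript number is an IEEE 754 double, and by default JSON.parse returns numbers as such doubles, so integers above 2^53 can no longer all be represented exactly. This is the source of the 64-bit ID problem mentioned above. survives_double() is a hypothetical helper.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def survives_double(n):
    """True if integer n comes back unchanged after conversion to a double."""
    return int(float(n)) == n


for n in (MAX_SAFE_INTEGER, 2**53, 2**53 + 1, 2**63 - 1):
    print(n, survives_double(n))
```

2**53 itself survives, but it is not considered "safe" because 2**53 + 1 rounds to the same double. 2**63 - 1 (LLONG_MAX) is changed as well.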

vlovich · 2014-10-23T15:12:20Z

UltraJSON is in the C/C++/Python world outside of JavaScript.


jskorpan closed this as completed Jan 16, 2013

jskorpan reopened this Jan 17, 2013

jskorpan closed this as completed Feb 12, 2013

This was referenced Jun 26, 2013

ENH: Add JSON export option for DataFrame (take 2) pandas-dev/pandas#1263

Closed

Ensure accurate encoding/decoding of big and small floats pandas-dev/pandas#4042

Closed

ghost referenced this issue in tacaswell/metadataservice-1 Jan 19, 2016

FIX: use stupid precision on doubles

19585ca

to make sure data (almost) round-trips. Still seeing slight differences.

creslinux mentioned this issue Jul 12, 2018

ujson to load ticker files 30% faster in BT freqtrade/freqtrade#1023

Closed

vincentmele mentioned this issue Apr 6, 2020

parse_float support to allow decimal decoding #401

Open

hellais mentioned this issue Sep 14, 2022

Benchmark the performance of measurement loading and validation ooni/data#4

Closed
