Output NUMERIC type as string in JSON #245

Closed
makkarpov opened this issue Jun 6, 2022 · 8 comments · Fixed by #255

Comments

@makkarpov

Currently numeric values are output as-is, e.g.

{
   "action": "U",
   "schema": "public",
   "table": "type_test",
   "columns": [
      {
         "name": "id",
         "value": 1
      },
      // ...
      {
         "name": "f_numeric",
         "value": 1012345000999912345001230123445566.00000000
      }
   ],
   "identity": [
      {
         "name": "id",
         "value": 1
      }
   ]
}

This is explicitly discouraged by RFC 7159, section 6:

Since software that implements IEEE 754-2008 binary64 (double precision) numbers is generally available and widely used, good interoperability can be achieved by implementations that expect no more precision or range than these provide, in the sense that implementations will approximate JSON numbers within the expected precision. A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate potential interoperability problems, since it suggests that the software that created it expects receiving software to have greater capabilities for numeric magnitude and precision than is widely available.

Not every language (even ones that support bignums) and not every library will correctly parse such JSON, and silent data corruption can occur when such numerics are implicitly cast to double.

A much safer approach is to output arbitrary-precision numbers as strings, so the user can handle them without risk of losing precision.
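
For illustration, the same change record with arbitrary-precision values emitted as JSON strings might look like this (a sketch of the proposed behavior, not current wal2json output):

{
   "action": "U",
   "schema": "public",
   "table": "type_test",
   "columns": [
      {
         "name": "f_numeric",
         "value": "1012345000999912345001230123445566.00000000"
      }
   ]
}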

@eulerto (Owner) commented Jun 25, 2022

The JSON spec says nothing about this. The exact number representation is the one provided by Postgres. You didn't show the data types here, but I bet that column "f_numeric" has a numeric data type. The data types real and double precision follow the IEEE 754 standard, so they provide a compact representation for such numbers.

postgres=# create table bar (a double precision);
CREATE TABLE
postgres=# insert into bar (a) values(1012345000999912345001230123445566.00000000);
INSERT 0 1
postgres=# select a from bar;
           a            
------------------------
 1.0123450009999124e+33
(1 row)

postgres=# create table baz (a numeric(50,8));
CREATE TABLE
postgres=# insert into baz (a) values(1012345000999912345001230123445566.00000000);
INSERT 0 1
postgres=# select a from baz;
                      a                      
---------------------------------------------
 1012345000999912345001230123445566.00000000
(1 row)

wal2json won't represent numbers as strings. That boat has sailed. Besides that, the JSON spec (ECMA-404) defines number as a valid JSON value, so we should use it. What you are suggesting is that nobody will do math with numbers and that JSON libraries are not prepared for big numbers. For the former, you are wrong or you are using the wrong data type in your database. For the latter, it is a weak argument; if the issue is in the JSON library that you are using, you should complain to the JSON library developers, not to wal2json or Postgres. The JSON format provides a strongly typed system that avoids issues with typing rules and unpredictable or erroneous results.

@makkarpov (Author) commented Jun 25, 2022

What you are suggesting is that nobody will do math with numbers

You can do any math with numbers, but only after casting the number to some bignum/bigdec type, which is not native in most languages.

For the latter, it is a weak argument; if the issue is in the JSON library that you are using, you should complain to the JSON library developers

E.g. JavaScript: JSON.parse coerces everything to double. Python does the same for fractional numbers, even though it has a native bignum type (but not bigdec). Sure, you could complain about the whole surrounding world: raise an issue with Node, with V8, with the Python developers, but complaining about everything is not a practical way to deal with issues like this.
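
For example, a quick check with Python's standard json module shows the silent loss (the value is the f_numeric value from the example above):

import json

doc = '{"name": "f_numeric", "value": 1012345000999912345001230123445566.00000000}'
parsed = json.loads(doc)
# The fractional literal is coerced to a binary64 float, so the low-order digits are gone:
print(parsed["value"])   # prints roughly 1.0123450009999124e+33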

Even in strongly typed languages, most libraries will coerce the number to double unless they have type information telling them not to, simply because parsing everything as a bignum would mean a huge performance hit on the 99.9999999% of JSON documents that don't contain such bignums. This means that such JSON will not survive being parsed as some generic JsonNode. If your solution does not work with most of the ecosystem, this is not an 'issue in the JSON library'. And this is why the RFC has that paragraph.
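
To put it another way: without string output, every consumer needs a library-specific opt-in just to stay correct. A minimal Python sketch of such a workaround, using the json module's parse_float hook:

import json
from decimal import Decimal

doc = '{"name": "f_numeric", "value": 1012345000999912345001230123445566.00000000}'
# Every fractional number in the document is now parsed as Decimal; this keeps
# the digits, but it is slower and has to be configured explicitly by each consumer:
parsed = json.loads(doc, parse_float=Decimal)
print(parsed["value"])   # 1012345000999912345001230123445566.00000000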

The JSON format provides a strongly typed system that avoid issues with typing rules and unpredictable or erroneous results.

This is exactly why the issue exists: that 'strongly typed system' is unable to differentiate IEEE numbers from big numbers, and many JSON libraries will misinterpret the latter as the former.

You can make it opt-in via a slot option or a new format version, but best practice is to explicitly output big numbers as strings, because there is a lot of real-world software that will interpret any number as an IEEE number, leading to silent precision loss and data corruption.
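
If it were made opt-in, the consumer could request it per call, since pg_logical_slot_get_changes already accepts per-call options. The sketch below assumes a hypothetical option name; whatever name the eventual change uses, the shape of the call would be the same:

SELECT data
  FROM pg_logical_slot_get_changes('test_slot', NULL, NULL,
       'format-version', '2',
       'numeric-data-types-as-string', '1');   -- option name is illustrative only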

@DimCitus commented Nov 2, 2022

Hi all,

Please also consider NaN and Infinity values (positive, negative), which are converted to NULL by wal2json at the moment. See also dimitri/pgcopydb#127, where we're having trouble with the current number processing done in wal2json. I still need to dive into the situation more; my first reaction is that an option in wal2json to send numbers as their Postgres string representation would be good.
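
For context, a minimal reproduction of such values (this sketch uses double precision; numeric also accepts NaN, and Infinity on recent Postgres versions). JSON has no literal for NaN or Infinity, so a numbers-only output cannot represent them, while a string representation could carry them verbatim:

postgres=# create table qux (a double precision);
CREATE TABLE
postgres=# insert into qux (a) values ('NaN'), ('Infinity'), ('-Infinity');
INSERT 0 3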

What do you think @eulerto ?

@eulerto (Owner) commented Nov 4, 2022

@DimCitus Hmm. I wasn't sold on the "losing precision" argument, but I hadn't considered the NaN and Infinity cases. I will review PR #255 soon.

@DimCitus commented Nov 4, 2022

Also, given the variety of JSON parsers available in the wild, I must admit I would feel comfortable with an option that always outputs numeric values as JSON strings, using the exact string representation that Postgres itself would use in pg_dump and pg_restore, so that my replication client using wal2json is known to be safe even against parsing bugs or incompleteness in whatever JSON library I happen to have found with the right license and the right language...
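
With string output, the consumer side stays trivial and parser-independent. A minimal Python sketch of what a replication client could do with such a value (column layout as in the wal2json example at the top of this issue):

import json
from decimal import Decimal

col = json.loads('{"name": "f_numeric", "value": "1012345000999912345001230123445566.00000000"}')
# The value arrives as a plain JSON string; converting it is explicit and lossless:
exact = Decimal(col["value"])
print(exact)   # 1012345000999912345001230123445566.00000000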

@eulerto (Owner) commented Nov 4, 2022

Are you referring to all numeric data types?

@DimCitus commented Nov 4, 2022

Are you referring to all numeric data types?

I must admit that in the context of replication, a way to bypass parsing numbers entirely, with the guarantee that the number representation is the one that Postgres expects, would be a tremendous feature. So yes, all numeric data types actually... so that I don't even have to think about it...

@eulerto (Owner) commented Nov 4, 2022

It does make sense. I wouldn't imagine enabling it only for numeric but not bigint, because you can have the same issue.
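
For completeness, the same hazard applies to bigint once values exceed 2^53, because a binary64 consumer (e.g. JavaScript's JSON.parse) cannot represent every 64-bit integer. A short Python illustration of the underlying rounding:

# 2**53 + 1 is a valid bigint value but has no exact binary64 representation:
big = 2**53 + 1
print(big)               # 9007199254740993
print(int(float(big)))   # 9007199254740992 -- rounded to the nearest double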
