Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer schemas silently change the value of large ints, overflow to negative values #2378

Closed
aphyr opened this issue May 8, 2018 · 3 comments
Assignees
Labels
kind/maintenance Maintenance tasks, such as refactoring, with no impact in features.

Comments

@aphyr
Copy link

aphyr commented May 8, 2018

Dgraph's tour explains that the int schema is a 64-bit signed integer. However, on Dgraph 5b93fb4, int attributes have some strange behavior that suggests mayyyybe it's being converted to a floating-point intermediate representation. For instance, here's a table of values written to dgraph, followed by that same value as it is read back:

    9007199254740989 9007199254740989,
    9007199254740990 9007199254740990,
    9007199254740991 9007199254740991,
    9007199254740992 9007199254740992,
    9007199254740993 9007199254740992,
    9007199254740994 9007199254740994,
    9007199254740995 9007199254740996,
    9007199254740996 9007199254740996,
    9007199254740997 9007199254740996,
    9007199254740998 9007199254740998,
    9007199254740999 9007199254741000,

Notice that numbers above 9007199254740992N--2^53+1--are not exactly representable, and are remapped to numbers that would fit in an IEEE 754 double precision float. I think this behavior also interacts with a later coercion to signed ints, because...

    9223372036854775293 9223372036854774784,
    9223372036854775294 9223372036854774784,
    9223372036854775295 9223372036854774784,
    9223372036854775296 -9223372036854775808,
    9223372036854775297 -9223372036854775808,
    9223372036854775298 -9223372036854775808,
    9223372036854775299 -9223372036854775808,

Numbers above 9223372036854775296 are pinned to -9223372036854775809, which is -2^63. I thiiiink what's happening here is that the floating-point coercion is pushing these values up to 2^64, which causes an unchecked signed integer overflow, and wraps over to -2^63.

You can reproduce this with 56dce4d5b875bc2eec841564f865b72168c91938 by running

lein run test --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz --force-download --workload types --nemesis none --sequencing server --time-limit 500
@manishrjain
Copy link
Contributor

Just like #2377 , this is due to JSON parsing in Go. To deal well with huge integers, the recommended way would be to set the int schema, and then send data as hex encoded strings via JSON. Or, use RDFs.

Case in point: https://play.golang.org/p/rdh0UbgsDlM

@manishrjain
Copy link
Contributor

manishrjain commented May 17, 2018

So, a bit of backstory, @aphyr went and posted this on Twitter after my comments here: https://twitter.com/aphyr/status/997148938028318720.

And found that there's thing called json.Number -- which I didn't know about. So, it looks like by using Json.Number, we might be able to correctly tell if it's an int64 or a float64. I'll work on implementing that today, to see if we can now correctly parse numbers.

This is the Go Play code, which can correctly determine int64 vs float64:
https://play.golang.org/p/KgZzEbxA7e7

@manishrjain manishrjain reopened this May 17, 2018
@manishrjain manishrjain self-assigned this May 17, 2018
@manishrjain manishrjain added kind/maintenance Maintenance tasks, such as refactoring, with no impact in features. and removed wontfix labels May 17, 2018
manishrjain added a commit that referenced this issue May 18, 2018
With JSON mutations, we previously couldn't distinguish an int from float, using the standard json.Unmarshal. Now, we use a json.Decoder, with `UseNumber`, which allows us to correctly distinguish int from float.

This fixes #2377 and #2378 .
@manishrjain
Copy link
Contributor

Tested this with the above change. Works.

manishrjain added a commit that referenced this issue May 18, 2018
With JSON mutations, we previously couldn't distinguish an int from float, using the standard json.Unmarshal. Now, we use a json.Decoder, with `UseNumber`, which allows us to correctly distinguish int from float.

This fixes #2377 and #2378 .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/maintenance Maintenance tasks, such as refactoring, with no impact in features.
Development

No branches or pull requests

2 participants