test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131
Conversation
…cision

When casting a float or double column to a string with `CAST(f AS TEXT)`, Scylla is expected to print the number with enough digits that reading the string back as a float or double restores the original number exactly. This expectation isn't documented anywhere, but it makes sense, and it is what Cassandra does.

Before commit 71bbd74, this wasn't the case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision, which is slightly too few for a float (which can carry 7 decimal digits of precision), and far too few for a double (which can need 15 digits). The origin of this magic "6 digits" is that Scylla uses seastar::to_sstring() to print float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision. After that commit, to_sstring() uses a different approach (based on fmt) that prints all significant digits.

This patch adds a regression test for this bug: we write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test *fails* before the aforementioned commit, and passes after it. It also passes on Cassandra.

Refs scylladb#15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb#15131
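The round-trip requirement described above can be sketched in plain Python, with no Scylla involved. This is an illustration only; the `to_float32` helper is defined here to emulate what a 32-bit float column stores:

```python
import struct

def to_float32(x):
    # Round a Python float (a C double) to the nearest 32-bit float.
    return struct.unpack('f', struct.pack('f', x))[0]

pi = 3.141592653589793       # a full-precision double

# 6 significant digits (the old "%g" behavior) cannot round-trip a double...
assert float(f"{pi:.6g}") != pi
# ...but 17 significant digits always can.
assert float(f"{pi:.17g}") == pi

# A 32-bit float needs fewer digits: 9 always suffice, 6 do not.
pi32 = to_float32(pi)
assert to_float32(float(f"{pi32:.9g}")) == pi32
assert to_float32(float(f"{pi32:.6g}")) != pi32
```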
# expected_f. If the cast printed too few digits, it would not be
# equal.
ftext = list(cql.execute(f"SELECT CAST(f AS text) FROM {table3} WHERE p={p}"))[0][0]
read_f = c_float(float(ftext))
i don't quite understand this part. it is c_float() which casts/truncates the 64-bit floating-point number parsed by float(). i think what we are testing here is the functionality of c_float(), which is implemented by CPython. in other words, it can be reduced to something like

for n in numbers:
    cql.execute("... {n}")
    f64 = float(next(cql.execute("... "))[0])
    assert c_float(n) == c_float(f64)

and the 2nd test is something like:

for n in numbers:
    cql.execute("... {n}")
    f64 = float(next(cql.execute("... "))[0])
    assert n == f64

so i think they are duplicated if we don't intend to test CPython or its standard library. probably we can just keep the 2nd test?
No, both tests are important. Let me give you an example. Imagine that the Scylla fix had been to change the sprintf format from "%g" to "%.8g" (for both float and double). With that "fix", the first test would still pass: we set a float (32-bit) column to whatever precision of PI it can carry, and then CAST prints it out with 8 digits of precision. The float(ftext) call parses this number to a double (yes, it's confusing - Python's "float" is C's and Scylla's "double"). We then truncate this to a 32-bit float (c_float), and expect the result to be identical to the 32-bit truncation of the original number we started with. And it will be, because 8 digits of precision is enough for a 32-bit float. But the second test would fail: a 64-bit double (Python's "float" type) can have 15 digits of precision, so printing only 8 of them and then scanning them back results in a different number.

You can add print(ftext) (and run with "cql-pytest/run --vs test_cast.py" to see this output) to see what these CASTs actually generate, if you're curious. Before your Seastar patch, they generated only 6 digits of precision for both float and double types - which wasn't enough for either (the float was missing more or less one decimal digit of precision, the double was missing much more).
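The hypothetical "%.8g" fix described above can be demonstrated in a few lines of standalone Python. This is a sketch, not code from the test; the `to_float32` helper stands in for the c_float truncation the test performs:

```python
import struct

def to_float32(x):
    # Truncate a Python float (a C/Scylla "double") to 32-bit
    # precision, like c_float(...).value does in the test.
    return struct.unpack('f', struct.pack('f', x))[0]

pi = 3.141592653589793        # what a Scylla "double" column stores
pi32 = to_float32(pi)         # what a Scylla "float" column stores

# The hypothetical fix: always print 8 significant digits.
text32 = f"{pi32:.8g}"
text64 = f"{pi:.8g}"

# The first test (float column) would pass: for this value,
# 8 digits round-trip the 32-bit float...
assert to_float32(float(text32)) == pi32
# ...but the second test (double column) would fail: 8 digits
# are far too few for a 64-bit double.
assert float(text64) != pi
```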
I think what you missed in your summary is that the two tests use different Scylla columns - one uses the float "f" and the second uses the double "db". So they exercise different Scylla "CAST ... AS text" code paths.
ahh, true. "db" (double) versus "f" (float). sorry, i completely missed it!
just two nits: c_float has its __eq__, so probably could use it - no need to materialize the returned result set using list; next() would suffice.
I didn't know how - I couldn't find any documentation for c_float :-( The ".value" thing seemed to work, so I used it...
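For what it's worth, a quick standalone check (an illustration, not code from the test, assuming CPython's ctypes behavior) suggests the `.value` comparison is indeed the safe route: ctypes simple types like `c_float` fall back to object identity for `==`, so two equal-valued instances do not compare equal directly:

```python
from ctypes import c_float

a = c_float(0.1)
b = c_float(0.1)

# Distinct instances: "==" falls back to identity for ctypes
# simple types, so equal-valued wrappers are not "==" ...
assert a is not b
assert (a == b) is False
# ...while comparing the wrapped values works as expected.
assert a.value == b.value
```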
Interesting. I always use the same
my bad. was inspecting its
yeah. probably just a difference in the tastes, i would prefer