test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131

nyh · 2023-08-23T08:17:32Z

…cision

When casting a float or double column to a string with CAST(f AS TEXT), Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does.

Before commit 71bbd74, this wasn't the case in Scylla: CAST(f AS TEXT) always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test fails before the aforementioned commit, and passes after it. It also passes on Cassandra.

Refs #15127

…cision When casting a float or double column to a string with `CAST(f AS TEXT)`, Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does. Before commit 71bbd74, this wasn't the case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits. This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test *fails* before the aforementioned commit, and passes after it. It also passes on Cassandra. Refs scylladb#15127 Signed-off-by: Nadav Har'El <nyh@scylladb.com>

tchaikov · 2023-08-23T10:13:16Z

test/cql-pytest/test_cast_data.py

+        # expected_f. If the cast printed too few digits, it would not be
+        # equal.
+        ftext = list(cql.execute(f"SELECT CAST(f AS text) FROM {table3} WHERE p={p}"))[0][0]
+        read_f = c_float(float(ftext))


i don't quite understand this part. it is c_float() which casts/truncates the 64-bit float-pointer number parsed by float to. i think what we are testing is here the functionality of c_float(), which is implemented by CPython. in other words, it can be reduced to something like

for n in numbers: cql.execute("... {n}") f64 = float(next(cql.execute("... "))[0]) assert c_float(n) == c_float(f64)

and the 2nd test is something like:

for n in numbers: cql.execute("... {n}") f64 = float(next(cql.execute("... "))[0]) assert n == f64

so i think they are duplicated if we don't intend to test CPython or its standard library. probably we can just keep the 2nd test?

No, both tests are important. Let me give you an example. Imagine that the Scylla fix was to change the sprintf format from "%g" to "%.8g" (for both float and double). With that "fix", the first test would pass: We will set a float (32-bit) column to whatever precision of PI it can carry, and then CAST will print it out in 8 digits of precision. The float(ftext) will parse this number to a double (yes, it's confusing - Python's "float" is C's and Scylla's "double"). We then trucate this to a 32-bit float (c_float). And expect that the result will be identical to the 32-bit truncation of the original number we started with. And it will, because 8 digits of precision is enough for a 32-bit float. But the second test will fail: 64-bit double ("float" type in Python) can have 15 digits of precision, so printing out only 8 of them and then scanning them will result in a different number.

You can add print(ftext) (and run with "cql-pytest/run --vs test_cast.py" to see this output) to see what these CASTS actually generate if you're curious. Before your Seastar patch, they generated only 6 digits of precision for both float and double types - which wasn't enough for either (the float was missing more-or-less one decimal digit of precision, double was missing much more precision).

I think what you missed in your summary is that the two tests use a different Scylla column - one uses the float "f" and the second uses the double "db". So they exercise different Scylla "CAST ... AS text" code paths.

ahh, true. "db" (double) versus "f" (float). sorry, i completely missed it!

tchaikov

just two nits,

c_float has its __eq__ so probably could use it
no need to materialize the returned result set using list. next() would suffice.

nyh · 2023-08-23T10:37:37Z

just two nits,

* `c_float` has its `__eq__` so probably could use it

I didn't know how - I couldn't find any documentation for c_float :-( The ".value" thing seemed to work, so I used it...

* no need to materialize the returned result set using `list`. `next()` would suffice.

Interesting. I always use the same list(cql.execute(....) idiom because I always want it in the form of a list, not an iterator, so I am just used to doing that. In any case, there is no performance difference, it's just one result anyway - it's not like I'm building a very long list and then only reading the first result.

tchaikov · 2023-08-23T10:55:19Z

just two nits,
* `c_float` has its `__eq__` so probably could use it
I didn't know how - I couldn't find any documentation for c_float :-( The ".value" thing seemed to work, so I used it...

my bad. was inspecting its __eq__, but i forgot that object's default implementation is equivalent to is .

* no need to materialize the returned result set using `list`. `next()` would suffice.
Interesting. I always use the same list(cql.execute(....) idiom because I always want it in the form of a list, not an iterator, so I am just used to doing that. In any case, there is no performance difference, it's just one result anyway - it's not like I'm building a very long list and then only reading the first result.

yeah. probably just a difference in the tastes, i would prefer foo(bar(baz())) =)

scylladb-promoter · 2023-08-23T13:00:15Z

CI state SUCCESS - https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/3209/

…cision When casting a float or double column to a string with `CAST(f AS TEXT)`, Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does. Before commit 71bbd74, this wasn't the case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits. This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test *fails* before the aforementioned commit, and passes after it. It also passes on Cassandra. Refs scylladb#15127 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb#15131

nyh requested review from tchaikov, fruch and eliransin August 23, 2023 08:17

github-actions bot deleted a comment from aws-amplify-us-east-2 bot Aug 23, 2023

github-actions bot deleted a comment from nyh Aug 23, 2023

denesb approved these changes Aug 23, 2023

View reviewed changes

tchaikov reviewed Aug 23, 2023

View reviewed changes

tchaikov approved these changes Aug 23, 2023

View reviewed changes

scylladb-promoter closed this in 5530c52 Aug 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131

test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131

nyh commented Aug 23, 2023

tchaikov Aug 23, 2023 •

edited

nyh Aug 23, 2023

nyh Aug 23, 2023 •

edited

tchaikov Aug 23, 2023

tchaikov left a comment

nyh commented Aug 23, 2023

tchaikov commented Aug 23, 2023

scylladb-promoter commented Aug 23, 2023

test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131

test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131

Conversation

nyh commented Aug 23, 2023

tchaikov Aug 23, 2023 • edited

Choose a reason for hiding this comment

nyh Aug 23, 2023

Choose a reason for hiding this comment

nyh Aug 23, 2023 • edited

Choose a reason for hiding this comment

tchaikov Aug 23, 2023

Choose a reason for hiding this comment

tchaikov left a comment

Choose a reason for hiding this comment

nyh commented Aug 23, 2023

tchaikov commented Aug 23, 2023

scylladb-promoter commented Aug 23, 2023

tchaikov Aug 23, 2023 •

edited

nyh Aug 23, 2023 •

edited