Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/cql-pytest: regression test for old bug with CAST(f AS TEXT) pre… #15131

Closed
wants to merge 1 commit into from

Conversation

nyh
Copy link
Contributor

@nyh nyh commented Aug 23, 2023

…cision

When casting a float or double column to a string with CAST(f AS TEXT), Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does.

Before commit 71bbd74, this wasn't the case in Scylla: CAST(f AS TEXT) always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test fails before the aforementioned commit, and passes after it. It also passes on Cassandra.

Refs #15127

…cision

When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd74, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs scylladb#15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@github-actions github-actions bot deleted a comment from aws-amplify-us-east-2 bot Aug 23, 2023
@github-actions github-actions bot deleted a comment from nyh Aug 23, 2023
# expected_f. If the cast printed too few digits, it would not be
# equal.
ftext = list(cql.execute(f"SELECT CAST(f AS text) FROM {table3} WHERE p={p}"))[0][0]
read_f = c_float(float(ftext))
Copy link
Contributor

@tchaikov tchaikov Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't quite understand this part. it is c_float() which casts/truncates the 64-bit float-pointer number parsed by float to. i think what we are testing is here the functionality of c_float(), which is implemented by CPython. in other words, it can be reduced to something like

for n in numbers:
    cql.execute("... {n}")
    f64 = float(next(cql.execute("... "))[0])
    assert c_float(n) == c_float(f64)

and the 2nd test is something like:

for n in numbers:
    cql.execute("... {n}")
    f64 = float(next(cql.execute("... "))[0])
    assert n == f64

so i think they are duplicated if we don't intend to test CPython or its standard library. probably we can just keep the 2nd test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, both tests are important. Let me give you an example. Imagine that the Scylla fix was to change the sprintf format from "%g" to "%.8g" (for both float and double). With that "fix", the first test would pass: We will set a float (32-bit) column to whatever precision of PI it can carry, and then CAST will print it out in 8 digits of precision. The float(ftext) will parse this number to a double (yes, it's confusing - Python's "float" is C's and Scylla's "double"). We then trucate this to a 32-bit float (c_float). And expect that the result will be identical to the 32-bit truncation of the original number we started with. And it will, because 8 digits of precision is enough for a 32-bit float. But the second test will fail: 64-bit double ("float" type in Python) can have 15 digits of precision, so printing out only 8 of them and then scanning them will result in a different number.

You can add print(ftext) (and run with "cql-pytest/run --vs test_cast.py" to see this output) to see what these CASTS actually generate if you're curious. Before your Seastar patch, they generated only 6 digits of precision for both float and double types - which wasn't enough for either (the float was missing more-or-less one decimal digit of precision, double was missing much more precision).

Copy link
Contributor Author

@nyh nyh Aug 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you missed in your summary is that the two tests use a different Scylla column - one uses the float "f" and the second uses the double "db". So they exercise different Scylla "CAST ... AS text" code paths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, true. "db" (double) versus "f" (float). sorry, i completely missed it!

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just two nits,

  • c_float has its __eq__ so probably could use it
  • no need to materialize the returned result set using list. next() would suffice.

@nyh
Copy link
Contributor Author

nyh commented Aug 23, 2023

just two nits,

* `c_float` has its `__eq__` so probably could use it

I didn't know how - I couldn't find any documentation for c_float :-( The ".value" thing seemed to work, so I used it...

* no need to materialize the returned result set using `list`. `next()` would suffice.

Interesting. I always use the same list(cql.execute(....) idiom because I always want it in the form of a list, not an iterator, so I am just used to doing that. In any case, there is no performance difference, it's just one result anyway - it's not like I'm building a very long list and then only reading the first result.

@tchaikov
Copy link
Contributor

just two nits,

* `c_float` has its `__eq__` so probably could use it

I didn't know how - I couldn't find any documentation for c_float :-( The ".value" thing seemed to work, so I used it...

my bad. was inspecting its __eq__, but i forgot that object's default implementation is equivalent to is .

* no need to materialize the returned result set using `list`. `next()` would suffice.

Interesting. I always use the same list(cql.execute(....) idiom because I always want it in the form of a list, not an iterator, so I am just used to doing that. In any case, there is no performance difference, it's just one result anyway - it's not like I'm building a very long list and then only reading the first result.

yeah. probably just a difference in the tastes, i would prefer foo(bar(baz())) =)

@scylladb-promoter
Copy link
Contributor

raphaelsc pushed a commit to raphaelsc/scylla that referenced this pull request Aug 29, 2023
…cision

When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd74, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs scylladb#15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb#15131
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants