(Already-fixed bug) casting float to text resulted in too few digits. #15127
Comments
Two of the dtest changes are https://github.com/scylladb/scylla-dtest/pull/3130 and https://github.com/scylladb/scylla-dtest/pull/3284. These patches added more or fewer digits to the test's gold truth without investigating why this even changed across Scylla versions, and whether this change was for the better or for the worse (I claim now it was for the better, so at least that).
I confirmed that cherry-picking the Seastar patch 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f makes the test above pass, so this is the Seastar patch that should be backported. I also know why this issue existed before this patch. Before this patch, float (and double!) was printed with "%g". According to the sprintf() documentation, "%g" uses 6 digits of precision, which is one or two fewer than the actual digits of precision in a "float", and far fewer than the precision of a "double".
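A small sketch of the "%g" effect described above, written in Python because its `%` formatting follows the same default precision of 6 significant digits as C's sprintf(). The helper names `old_style` and `full_precision` are hypothetical, and the 17-digit format is only one way to guarantee a round trip for a double; it is not meant to show how the fmt-based to_sstring() works.

```python
def old_style(x: float) -> str:
    # "%g" with no explicit precision keeps 6 significant digits.
    return "%g" % x


def full_precision(x: float) -> str:
    # 17 significant digits are always enough to recover a 64-bit double.
    return "%.17g" % x


d = 3.141592653589793  # a 64-bit double

print(old_style(d))                    # 3.14159
print(float(old_style(d)) == d)        # False: digits were lost
print(float(full_precision(d)) == d)   # True: the double is recovered exactly
```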
…cision When casting a float or double column to a string with `CAST(f AS TEXT)`, Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does. Before commit 71bbd74, this wasn't the case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits. This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test *fails* before the aforementioned commit, and passes after it. It also passes on Cassandra. Refs scylladb#15127 Signed-off-by: Nadav Har'El <nyh@scylladb.com>
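The round-trip property that this regression test checks can be sketched without a database. This is not the actual cql-pytest: `six_digit_cast` and `nine_digit_cast` are hypothetical stand-ins for `CAST(f AS TEXT)`, and `to_float32` emulates a 32-bit float column by rounding Python's 64-bit floats through the `struct` module.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_trip_failures(cast_to_text, values, is_float32=False):
    """Return the values that are not recovered exactly from their text form."""
    failures = []
    for v in values:
        parsed = float(cast_to_text(v))
        if is_float32:
            parsed = to_float32(parsed)
        if parsed != v:
            failures.append(v)
    return failures


# Hypothetical stand-ins for the database's CAST(f AS TEXT):
def six_digit_cast(v: float) -> str:
    return "%g" % v        # mimics the old 6-digit behavior


def nine_digit_cast(v: float) -> str:
    return "%.9g" % v      # 9 significant digits suffice for any 32-bit float


floats = [to_float32(x) for x in (0.5, 1.2345678, 123456.789)]
doubles = [0.5, 1 / 3, 123456789.123]

# Values with few digits (0.5) survive the old format; the others do not.
print(round_trip_failures(six_digit_cast, floats, is_float32=True))
print(round_trip_failures(nine_digit_cast, floats, is_float32=True))
print(round_trip_failures(six_digit_cast, doubles))
print(round_trip_failures(repr, doubles))
```

Note that a value such as 0.5 passes even with the old format, which is why a test of this kind needs inputs that actually use all the available digits.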
@fruch @eliransin I don't know what to do about backporting this fix to earlier branches like 5.2. The code backport itself is trivial (it just requires cherry-picking Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f), but the problem will be dtest: the 5.2 and 5.1 cast dtests enshrine the old, incorrect behavior, so it will be messy to fix. Perhaps we need to begin by backporting the new dtest to 5.1 and 5.2 and qualify it with "requires" on this issue, and then the dtest will come alive only when this issue is closed? I don't know how exactly to do that correctly.
@nyh , I think you are right.
@bhalevy any objections? Do you think we should do it differently?
On branches, we never "update the Seastar module" like we do in master. Instead, we have a separate Seastar branch, and we cherry-pick to it specific Seastar patches (not all the Seastar patches). I noted above the exact Seastar commit that needs to be backported, and I even tested that a cherry-pick of that commit works trivially (no manual changes needed) and makes my test pass.
I wish our dtest Otherwise, the test could be made backward compatible so as to accept both the old and new formats, as a first step, and a following patch can remove the backward compatibility and allow only the new format - to be applied once the Scylla side is fixed.
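The backward-compatible first step suggested here could look roughly like the following. This is a hypothetical helper, not code from scylla-dtest: `acceptable_cast_text` and its arguments are invented for illustration, and it assumes a double column whose old output matches "%g" formatting.

```python
def acceptable_cast_text(text: str, value: float) -> bool:
    """Accept the old 6-digit "%g" output, or any text that parses back exactly.

    Hypothetical transitional check: the first clause would be dropped once
    every tested branch prints all significant digits.
    """
    old_format_ok = text == "%g" % value
    new_format_ok = float(text) == value
    return old_format_ok or new_format_ok


value = 1 / 3
print(acceptable_cast_text("0.333333", value))    # old format: accepted
print(acceptable_cast_text(repr(value), value))   # full precision: accepted
print(acceptable_cast_text("0.33", value))        # neither: rejected
```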
…cision When casting a float or double column to a string with `CAST(f AS TEXT)`, Scylla is expected to print the number with enough digits so that reading that string back to a float or double restores the original number exactly. This expectation isn't documented anywhere, but makes sense, and is what Cassandra does. Before commit 71bbd74, this wasn't the case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision, which was a bit under enough for a float (which can have 7 decimal digits of precision), but very much not enough for a double (which can need 15 digits). The origin of this magic "6 digits" number was that Scylla uses seastar::to_sstring() to print the float and double values, and before the aforementioned commit those functions used sprintf with the "%g" format - which always prints 6 decimal digits of precision! After that commit, to_sstring() now uses a different approach (based on fmt) to print the float and double values, that prints all significant digits. This patch adds a regression test for this bug: We write float and double values to the database, cast them to text, and then recover the float or double number from that text - and check that we get back exactly the same float or double object. The test *fails* before the aforementioned commit, and passes after it. It also passes on Cassandra. Refs #15127 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #15131
Right, this needs to be fixed in two branches (5.1 and 5.2) or even more, but when we close this issue all of those branches will need to already work. We can open multiple issues, one for each backport, but it's ugly :-(
There's an even easier approach:
If you find this approach acceptable, I can do it. I hope step 3 doesn't find additional CAST bugs we fixed recently - if it does, those should probably be backported too instead of hiding it in the test.
I won't skip the tests in 1, just remove them from gating if they are part of gating, and reintroduce them at the end (part 3). Also, we'll need to follow the same process on 2022.2 and 2023.1.
@bhalevy , I think that for a start we should eliminate the branching in dtest, and always run the master (for individual tests we will mark the minimum version or version range that the test can run on). That way a failure of some dtest on a specific Scylla branch would mean either changed behaviour on later versions or a forgotten backport (which we should apply).
It won't solve the issue completely: you'll still need to update the master branch again and again with the targets of the test, or of a part of the test. The problem starts when it's not a specific test but some logic inside the test that needs to be changed based on a version; in places where we tried doing so, it got out of hand and out of sync. For future release branches maybe it would be better, since the next gating is going to run most of the dtests. Anyhow, even if we push for such a suggestion, I would go at trying to apply it backwards on older branches.
as fmtlib claims that it is > faster than common standard library implementations of (s)printf and the new implementation is also simpler than the old one. Signed-off-by: Kefu Chai <tchaikov@gmail.com> (cherry picked from commit 4f4e84b) Refs scylladb/scylladb#15127
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f: > sstring: refactor to_sstring() using fmt::format_to() Refs scylladb#15127
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f: * seastar 04a39f448...06bb98796 (1): > sstring: refactor to_sstring() using fmt::format_to() Refs scylladb#15127
Committed to next-5.2, cb7e7f1
Committed to next-5.1, eaf93b3. We're done backporting to live open-source branches. Closing this issue and removing the backport-candidate tag.
This issue was discovered by @eliransin because of changes to CAST dtests that did not refer to any specific change in Scylla and were never explained. The dtests for CAST of float to text expected a certain precision for this conversion, and one day the precision in the test was simply changed to match the latest Scylla, with no explanation and no discussion on which one, the old or the new, is the correct precision (CC @fruch)
It turns out that luckily, as I explain below, the new behavior is the correct one, so we don't need to fix this bug, but we do need to backport the fix we have to older releases. Importantly, @eliransin noticed that Scylla 5.2 does not have this fix, and still casts float and double to text with too few digits.
After my investigation, I discovered that before commit 71bbd74,
SELECT CAST(f AS text)
for a float column (32-bit floating-point) printed too few digits - if one casts a float value to text using Scylla, and then converts that text back to a 32-bit floating point, the result is a different number. Same for 64-bit floating point (double column type). Starting in commit 71bbd74 this problem was solved - the text cast result now has more digits, and converting it to a float restores the original number. But this was never explained in any commit message, let alone any GitHub issue, and we never considered backporting it. In particular, @eliransin noticed that Scylla 5.2 does not have this fix, and still casts float to string with too few digits. To find this commit, I wrote a cql-pytest that reproduces this issue - it passes on current Cassandra and Scylla, but fails on Scylla 5.2:
I used "git bisect" to find when this test started to succeed. The result is commit 71bbd74, which is a Seastar module update, with the description:
I suspect what fixed the float-to-text casting was Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f by @tchaikov:
I suspect the new code is not only faster and simpler, it's more correct (not missing digits) for the "float" type.
In particular, the old code had
and the new code has a generic thing which only god and @tchaikov knows what it does ;-)
What happened here is bad on several fronts:
Let's consider now: