Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Already-fixed bug) casting float to text resulted in too few digits. #15127

Closed
nyh opened this issue Aug 22, 2023 · 13 comments
Closed

(Already-fixed bug) casting float to text resulted in too few digits. #15127

nyh opened this issue Aug 22, 2023 · 13 comments

Comments

@nyh
Copy link
Contributor

nyh commented Aug 22, 2023

This issue was discovered by @eliransin because of changes to CAST dtests that did not refer to any specific change in Scylla and were never explained. The dtests for CAST of float to text expected a certain precision for this conversion, and one day the precision in the test was simply changed to match the latest Scylla, with no explanation and no discussion on which one, the old or the new, is the correct precision (CC @fruch)

It turns out that luckily, as I explain below, the new behavior is the correct one, so we don't need to fix this bug, but we do need to backport the fix we have to older releases. Importantly, @eliransin noticed that Scylla 5.2 does not have this fix, and still casts float and double to text with too few digits.

After my investigation, I discovered that before commit 71bbd74, SELECT CAST(f AS text) for a float column (32-bit floating-point) printed too few digits - if one casts a float value to text using Scylla, and then converts that text back to a 32-bit floating point, the result is a different number. Same for 64-bit floating point (double column type). Starting in commit 71bbd74 this problem was solved - the text cast result now has more digits, and converting it to a float restores the original number. But this was never explained in any commit message, let alone any github issue, and we never considered backporting it. In particular, @eliransin noticed that Scylla 5.2 does not have this fix, and still casts float to string with too few digits.

To find this commit, I wrote a cql-pytest that reproduces this issue - it passes on current Cassandra and Scylla, but fails on Scylla 5.2:

# Test casting 32-bit floating-point to text. The output string should have
# enough digits to faithfully reproduce the original floating-point value,
# not fewer digits (we don't check in this test whether it outputs too many
# digits).
def test_cast_float_to_text(cql, table3):
    p = unique_key_int()
    # pi's precision below is excessive - it is truncated to Python's 64-bit
    # float type, and later below further to a 32-bit float.
    pi = 3.141592653589793238462643383279502884197
    cql.execute(f'INSERT INTO {table3} (p, f) VALUES ({p}, {pi})')
    # If we read "f" back, we get pi truncated to 32-bit precision.
    # Python's "float" is a 64-bit floating-point number. To match Scylla's
    # 32-bit "float" type, we need to use c_float.
    expected_f = c_float(pi)
    read_f = c_float(list(cql.execute(f"SELECT f FROM {table3} WHERE p={p}"))[0][0])
    # c_float values cannot be compared, you need to compare their "value"
    assert expected_f.value == read_f.value
    # If we ask Scylla to cast f to text, and then convert this text back
    # to c_float, the result should be equal again to expected_f. If the cast
    # printed too few digits, it would not be equal.
    ftext = list(cql.execute(f"SELECT CAST(f AS text) FROM {table3} WHERE p={p}"))[0][0]
    read_f = c_float(float(ftext))
    assert expected_f.value == read_f.value

I used "git bisect" to find when this test started to succeed. The result is commit 71bbd74 is a Seastar module update, with the description:

    Update seastar submodule
    
    * seastar 8889cbc198...d41af8b592 (14):
      > Merge 'Perf stall detector related improvements' from Travis Downs
    Ref #8828, #7882, #11582 (may help make progress)
      > build: pass HEAPPROF definition to src/core/reactor.cc too
      > Limit memory address space per core to 64GB when hwloc is not available
      > build: revert use pkg_search_module(.. IMPORTED_TARGET ..) changes
      > Fix missing newlines in seastar-addr2line
      > Use an integral type for uniform_int_distribution
      > Merge 'tls_test: use a dedicated https server for testing' from Kefu Chai
      > build: use ${CMAKE_BINARY_DIR} when running 'cmake --build ..'
      > build: do not set c-ares_FOUND with PARENT_SCOPE
      > reactor: drop unused member function declaration
      > sstring: refactor to_sstring() using fmt::format_to()
      > http: delay input stream close until responses sent
      > build: enable non-library targets using default option value
      > Merge 'sstring: specialize uninitialize_string() and use resize_and_overwrite if available' from Kefu Chai

I suspect what fixed the float-to-text casting was Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f by @tchaikov:

    sstring: refactor to_sstring() using fmt::format_to() 
    as fmtlib claims that it is
    > faster than common standard library implementations of (s)printf
    and the new implementation is also simpler than the old one.

I suspect the new code is not only faster and simpler, it's more correct (not missing digits) for the "float" type.
In particular, the old code had

-string_type to_sstring(float value) {
-    return to_sstring_sprintf<string_type>(value, "%g");
-}

and the new code has a generic thing which only god and @tchaikov knows what it does ;-)

+string_type to_sstring(T value) {
+    auto size = fmt::formatted_size("{}", value);
+    auto formatted = uninitialized_string<string_type>(size);
+    fmt::format_to(formatted.data(), "{}", value);
+    return formatted;
+}

What happened here is bad on several fronts:

  1. It is bad we fixed a bug without realizing it - and in particular never backported this fix.
  2. It is bad that Scylla's cast implementation uses some Seastar to_sstring() function, which was never document what exactly it is supposed to be doing :-(
  3. It is bad that we noticed in dtest that something changed, but just changed the test to match the current situation without realizing this means we fixed a bug. To be fair, we had a barrage of weird CAST dtest failures around the same time - that changed behavior for different reasons (not just the one described here), so it's likely that this change was lost in the noise and the test was just modified to enshrine the new behavior - without understanding why it changed.

Let's consider now:

  1. Backporting this fix
  2. Reconsider why Scylla's cast uses Seastar's to_sstring().
  3. Insert my test into the test suite (I'll send a PR for that).
@nyh
Copy link
Contributor Author

nyh commented Aug 23, 2023

Two of the dtest changes are https://github.com/scylladb/scylla-dtest/pull/3130 are https://github.com/scylladb/scylla-dtest/pull/3284. These patches added more or fewer digits to the test's gold truth without investigating why this even changed across Scylla versions, and whether this change was for the better or for the worse (I claim now it was for the better, so at least that).

@nyh
Copy link
Contributor Author

nyh commented Aug 23, 2023

I confirmed that cherry-picking the Seastar patch 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f makes the test above pass, so this is the Seastar patch that should be backported.

I also know why this issue existed before this patch. Before this patch, float (and double!) was printed with "%g". According to sprintf() documention, "%g" uses 6 digits of precision, which is one-two less than the actual digits of precision in a "float", and much less than the precision of a "double".

nyh added a commit to nyh/scylla that referenced this issue Aug 23, 2023
…cision

When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd74, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs scylladb#15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@nyh
Copy link
Contributor Author

nyh commented Aug 23, 2023

@fruch @eliransin I don't know what to do about backporting this fix to earlier branches like 5.2. The code backport itself is trivial (just requires cherry-picking Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f), but the problem will be dtest: The 5.2 and 5.1 cast dtests enshrine the old, incorrect, behavior, so it will be messy to fix. Perhaps we need to begin by backporting new dtest to 5.1 and 5.2 and qualify it with "requires" on this issue, and then the dtest will come alive only when this issue is closed? I don't know how exactly to do that correctly.

@eliransin
Copy link
Contributor

@nyh , I think you are right.
We should:

  1. revert + add requires for the branches that got the "fix" in dtest
  2. Actually fix the branches and close this issue. ( This should be done by updating the seastar module which I see that was recently done two weeks ago for those branches - @avikivity is there a particular reason not to update the seastar modules on 5.2 and 5.1 branches? it will fix a casting bug in Scylla)

@bhalevy any objections? Do you think we should do it differently?

@nyh
Copy link
Contributor Author

nyh commented Aug 23, 2023

Actually fix the branches and close this issue. ( This should be done by updating the seastar module which I see that was recently done two weeks ago for those branches - @avikivity is there a particular reason not to update the seastar modules on 5.2 and 5.1 branches? it will fix a casting bug in Scylla)

On branches, we never "update the Seastar module" like we do in master. Instead, we have a separate Seastar branch, and we cherry-pick to it specific Seastar patches (not all the Seastar patches). I noted above the exact Seastar commit that needs to be backported, and I even tested that a cherry-pick of that commit works trivially (no manual changes needed) and makes my test pass.

@bhalevy
Copy link
Member

bhalevy commented Aug 23, 2023

I wish our dtest require marker was smarter so it could know in which branches an issue is fixed in, or if issues were cloned per branch they occur in, we could require a particular GH issue that corresponds to the relevant branch, and then the test would start running once the issue is fixed in the respective branch.

Otherwise, the test could be made to be backward compatible so to accept both old and new formats, as a first step, and a following patch can remove backward compatibility and allow only the new format - to be applied once the scylla side is fixed.

denesb pushed a commit that referenced this issue Aug 23, 2023
…cision

When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd74, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs #15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15131
@nyh
Copy link
Contributor Author

nyh commented Aug 27, 2023

I wish our dtest require marker was smarter so it could know in which branches an issue is fixed in

Right, this needs to be fixed in two branches (5.1 and 5.2) or even more, but when we close this issue all of those branches will need to already work. We can open multiple issues, one for each backport, but it's ugly :-(

Otherwise, the test could be made to be backward compatible so to accept both old and new formats, as a first step, and a following patch can remove backward compatibility and allow only the new format - to be applied once the scylla side is fixed.

There's an even easier approach:

  1. Delete the file cql_cast_test.py from branches 5.2 and 5.1 in dtest, or alternatively, @skip all the tests in it.
  2. Fix Scylla in branches 5.2 and 5.1 (a trivial Seastar backport)
  3. Bring back cql_cast_test.py to branches 5.2 and 5.1 in dtest - copy the code from the master version, not the old 5.2/5.1 version.

If you find this approach acceptable, I can do it. I hope step 3 doesn't find additional CAST bugs we fixed recently - if it does, those should probably be backported too instead of hiding it in the test.

@fruch
Copy link
Contributor

fruch commented Aug 27, 2023

I wish our dtest require marker was smarter so it could know in which branches an issue is fixed in

Right, this needs to be fixed in two branches (5.1 and 5.2) or even more, but when we close this issue all of those branches will need to already work. We can open multiple issues, one for each backport, but it's ugly :-(

Otherwise, the test could be made to be backward compatible so to accept both old and new formats, as a first step, and a following patch can remove backward compatibility and allow only the new format - to be applied once the scylla side is fixed.

There's an even easier approach:

  1. Delete the file cql_cast_test.py from branches 5.2 and 5.1 in dtest, or alternatively, @skip all the tests in it.
  2. Fix Scylla in branches 5.2 and 5.1 (a trivial Seastar backport)
  3. Bring back cql_cast_test.py to branches 5.2 and 5.1 in dtest - copy the code from the master version, not the old 5.2/5.1 version.

If you find this approach acceptable, I can do it. I hope step 3 doesn't find additional CAST bugs we fixed recently - if it does, those should probably be backported too instead of hiding it in the test.

I won't skip the tests in 1, just remove them from gating if they are part of gating, and introduce them at the end (part 3)

Also we'll need to follow up the same process on 2022.2 and 2023.1

raphaelsc pushed a commit to raphaelsc/scylla that referenced this issue Aug 29, 2023
…cision

When casting a float or double column to a string with `CAST(f AS TEXT)`,
Scylla is expected to print the number with enough digits so that reading
that string back to a float or double restores the original number
exactly. This expectation isn't documented anywhere, but makes sense,
and is what Cassandra does.

Before commit 71bbd74, this wasn't the
case in Scylla: `CAST(f AS TEXT)` always printed 6 digits of precision,
which was a bit under enough for a float (which can have 7 decimal digits
of precision), but very much not enough for a double (which can need 15
digits). The origin of this magic "6 digits" number was that Scylla uses
seastar::to_sstring() to print the float and double values, and before
the aforementioned commit those functions used sprintf with the "%g"
format - which always prints 6 decimal digits of precision! After that
commit, to_sstring() now uses a different approach (based on fmt) to
print the float and double values, that prints all significant digits.

This patch adds a regression test for this bug: We write float and double
values to the database, cast them to text, and then recover the float
or double number from that text - and check that we get back exactly the
same float or double object. The test *fails* before the aforementioned
commit, and passes after it. It also passes on Cassandra.

Refs scylladb#15127

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb#15131
@eliransin
Copy link
Contributor

@bhalevy , I think that for start we should eliminate the branching in dtest, and always run the master (for individual tests we will mark minimum/range version that the test can run on). That way a failure of some dtest on a specific scylla branch would mean either, changed behaviour on later versions or a forgotten backport which we should apply).
We might also extend the requires with something similar so every version can have it's own requires (which is the backport request issue).

@fruch
Copy link
Contributor

fruch commented Sep 3, 2023

@bhalevy , I think that for start we should eliminate the branching in dtest, and always run the master (for individual tests we will mark minimum/range version that the test can run on). That way a failure of some dtest on a specific scylla branch would mean either, changed behaviour on later versions or a forgotten backport which we should apply). We might also extend the requires with something similar so every version can have it's own requires (which is the backport request issue).

it won't solve the issue completely, you'll still need to update again and again the master branch with the targets of the test of a part of the test, the problem start when it's not a specific test is some logic inside the test that is need to be changed base on a version, in places we tried doing so, it got out of hand and out sync.

for future release branch maybe it would be better, since the next gating gonna run most of the dtest.

anyhow even if we push for such a suggestion, I would go at trying to apply it backwards on older branches.
also the definition of what you'll expected out of requires marker, should be carefully defined before you can start such an effort

nyh pushed a commit to scylladb/scylla-seastar that referenced this issue Oct 8, 2023
as fmtlib claims that it is

> faster than common standard library implementations of (s)printf

and the new implementation is also simpler than the old one.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 4f4e84b)

Refs scylladb/scylladb#15127
nyh added a commit to nyh/scylla that referenced this issue Oct 8, 2023
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f:

  > sstring: refactor to_sstring() using fmt::format_to()

Refs scylladb#15127
nyh pushed a commit to scylladb/scylla-seastar that referenced this issue Oct 8, 2023
as fmtlib claims that it is

> faster than common standard library implementations of (s)printf

and the new implementation is also simpler than the old one.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
(cherry picked from commit 4f4e84b)

Refs scylladb/scylladb#15127
nyh added a commit to nyh/scylla that referenced this issue Oct 8, 2023
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f:

* seastar 04a39f448...06bb98796 (1):
  > sstring: refactor to_sstring() using fmt::format_to()

Refs scylladb#15127
@nyh
Copy link
Contributor Author

nyh commented Oct 8, 2023

Opened two PRs to fix this in 5.2 and 5.1:
5.2: #15663
5.1: #15664

nyh added a commit that referenced this issue Oct 9, 2023
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f:

  > sstring: refactor to_sstring() using fmt::format_to()

Refs #15127

Closes #15663
@nyh
Copy link
Contributor Author

nyh commented Oct 9, 2023

Committed to next-5.2, cb7e7f1

nyh added a commit that referenced this issue Oct 9, 2023
Backported Seastar commit 4f4e84bb2cec5f11b4742396da7fc40dbb3f162f:

* seastar 04a39f448...06bb98796 (1):
  > sstring: refactor to_sstring() using fmt::format_to()

Refs #15127

Closes #15664
@nyh
Copy link
Contributor Author

nyh commented Oct 9, 2023

Committed to next-5.1, eaf93b3.

We're done backporting to live open-source branches. Closing this issue and removing the backport-candidate tag.

@nyh nyh closed this as completed Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants