Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot Cast Counter To Double #14501

Closed
1 task done
jonny7 opened this issue Jul 4, 2023 · 26 comments · Fixed by #14745
Closed
1 task done

Cannot Cast Counter To Double #14501

jonny7 opened this issue Jul 4, 2023 · 26 comments · Fixed by #14745

Comments

@jonny7
Copy link

jonny7 commented Jul 4, 2023

This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

  • I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

Installation details
Scylla version (or git commit hash): 5.2.2
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI):

Hardware details (for performance issues) Delete if unneeded
Platform (physical/VM/cloud instance type/docker): Docker / Kubernetes

Trying to CAST a Counter to a double fails with

InvalidRequest: Error from server: code=2200 [Invalid query] message="org.apache.cassandra.db.marshal.CounterColumnType cannot be cast to org.apache.cassandra.db.marshal.DoubleType"

I've replicated this in Docker with version 5.1.7 + 5.2.2.

I've just tested this with Cassandra's latest Docker image too and it returns fine.

Thanks

@michoecho
Copy link
Contributor

cc @cvybhu

@cvybhu
Copy link
Contributor

cvybhu commented Jul 4, 2023

Thank you for the bug report.
It looks like there's no implementation for converting counter values to double, we need to implement this feature.

A switch case needs to be added here:

case cast_switch_case_val(kind::double_kind, kind::byte):
return castas_fctn_simple<double, int8_t>;
case cast_switch_case_val(kind::double_kind, kind::short_kind):
return castas_fctn_simple<double, int16_t>;
case cast_switch_case_val(kind::double_kind, kind::int32):
return castas_fctn_simple<double, int32_t>;
case cast_switch_case_val(kind::double_kind, kind::long_kind):
return castas_fctn_simple<double, int64_t>;
case cast_switch_case_val(kind::double_kind, kind::float_kind):
return castas_fctn_simple<double, float>;
case cast_switch_case_val(kind::double_kind, kind::varint):
return castas_fctn_simple<double, utils::multiprecision_int>;
case cast_switch_case_val(kind::double_kind, kind::decimal):
return castas_fctn_from_decimal_to_float<double>;

As a temporary workaround, could you convert the value to double on the driver side?

@jonny7
Copy link
Author

jonny7 commented Jul 4, 2023

Thank you for the bug report. It looks like there's no implementation for converting counter values to double, we need to implement this feature.

A switch case needs to be added here:

case cast_switch_case_val(kind::double_kind, kind::byte):
return castas_fctn_simple<double, int8_t>;
case cast_switch_case_val(kind::double_kind, kind::short_kind):
return castas_fctn_simple<double, int16_t>;
case cast_switch_case_val(kind::double_kind, kind::int32):
return castas_fctn_simple<double, int32_t>;
case cast_switch_case_val(kind::double_kind, kind::long_kind):
return castas_fctn_simple<double, int64_t>;
case cast_switch_case_val(kind::double_kind, kind::float_kind):
return castas_fctn_simple<double, float>;
case cast_switch_case_val(kind::double_kind, kind::varint):
return castas_fctn_simple<double, utils::multiprecision_int>;
case cast_switch_case_val(kind::double_kind, kind::decimal):
return castas_fctn_from_decimal_to_float<double>;

As a temporary workaround, could you convert the value to double on the driver side?

Thanks for this, it's Grafana that is doing the queries so in that case unfortunately not.

@nyh
Copy link
Contributor

nyh commented Jul 4, 2023

In cql3/functions/castas_fcts.cc we have a four-year-old comment:

// FIXME: Add conversions for counters, after they are fully implemented...

Maybe it's time to do that...

@mykaul @eliransin we also need to do some soul-searching why we didn't find this bug much earlier:

  1. It appears the author of dtest's cql_cast_test.py was aware of this missing implementation and left behind a TODO: counters are not supported. should be added later instead of creating xfailing tests.
  2. The author of test/cql-pytest/test_cast_data.py (that's me...) created this file to check one specific bug, and was too much in a hurry to write tests for all types and left a # TODO: test casts from more types..
  3. Cassandra has a unit test for these casts, which would have caught this bug - functions/CastFctsTest.java. If the author of the test translation project Translate Cassandra's CQL unit tests into Scylla's cql-pytest framework #7722 (again, that's me...) put more effort into it, we would have known about this bug already.

Of course, now that we do know about this bug, we also need to fix it.

@nyh
Copy link
Contributor

nyh commented Jul 4, 2023

I'm removing the "enhancement" tag - as the user noted, missing small details in CQL can break existing applications, and will be perceived by users as a bug, not a request for a new feature.

@mykaul
Copy link
Contributor

mykaul commented Jul 4, 2023

Maybe they aren't a popular use case. I wonder what Grafana input is.

@nyh
Copy link
Contributor

nyh commented Jul 4, 2023

@mykaul neither counters not casts are very "popular" features, so their combination is probably even less popular - but they do exist, and Grafana (which I didn't even know could use Cassandra as a backend, nice!) might not be the only Cassandra application that uses them.

@nyh
Copy link
Contributor

nyh commented Jul 5, 2023

This bug is reproduced by the translated Cassandra unit test:

  • cassandra_tests/functions/cast_fcts_test.py::testCounterCastsInSelectionClause

nyh added a commit to nyh/scylla that referenced this issue Jul 5, 2023
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs scylladb#14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs scylladb#14508: SELECT CAST column names should match Cassandra's
Refs scylladb#14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs scylladb#14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb pushed a commit that referenced this issue Jul 12, 2023
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs #14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs #14508: SELECT CAST column names should match Cassandra's
Refs #14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs #14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14528
@nyh nyh self-assigned this Jul 17, 2023
@nyh
Copy link
Contributor

nyh commented Jul 17, 2023

I could easily make the Cassandra test for counters work, but I think I opened Pandora's box :-( Even after I get CAST(c AS double) et al. to work, the silly CAST(c as counter) doesn't work - it invokes one of several fail(unimplemented::cause::COUNTERS) in type.cc.

So now I think that all these unimplemented::cause::COUNTERS are bugs - we do have counters implemented so we shouldn't have any of those "unimplemented" things left.

Recently in b25ee62 @wmitros also needed to deserialize counters and had to use weird workarounds because of these "unimplemented" things.

I'm actually really worried that some of this code that reads counter, including the one that @wmitros wrote and I'm writing now, might be wrong. The code assumes that a counter is just a 64-bit integer. But isn't counter actually a complicated CRDT object, and needs to be "evaluated" to figure out the real count? I'm worried that my test (which increments a counter in memtable, so it might be short-circuited?) doesn't actually exercise this case. Or, maybe it's fine to read a counter as a 64-bit integer because we already have some other code which implicitly does the calculation when a counter object is converted to a 64-bit integer?

The lua.cc has a comment saying "No data_value ever has a counter type, it is represented with long_type instead." I don't know if this is true and I can treat counter as a 64-bit integer. And if so, why is it an "abstract_type" instead of a "concrete_type" like the rest of the integer types?

@cvybhu
Copy link
Contributor

cvybhu commented Jul 17, 2023

I ran into the same thing. I thought it's just a quick fix, but the counter_type_impl isn't fully implemented :C
The people who originally introduced the counter type didn't implement all the methods for this type.

They have to be implemented, but this requires much more work.
For example, can counter values be empty?

@cvybhu
Copy link
Contributor

cvybhu commented Jul 17, 2023

The good news is that once I implemented all the methods by calling the long_type_impl ones, the test started working. I could cast something to counter and back. Still I'm not sure how correct this is due to empty values.

@nyh
Copy link
Contributor

nyh commented Jul 18, 2023

There's something basic I don't understand. On-disk, a counter is some complicated thing with "shards" from different sstables, and so on. Am I correct in understanding that the representation in memory is always the sum of everything as just a uint64_type? If that's true, why isn't counter_type simply an integer_type() like long_type()? Why is counter_type a subclass of abstract_type instead? Was there a reason why it had to be complicated this way?

@avikivity @mykaul who is the expert in the counters column type in the company?

@nyh
Copy link
Contributor

nyh commented Jul 18, 2023

They have to be implemented, but this requires much more work. For example, can counter values be empty?

Anything can be empty through ugly tricks (see test_empty.py::test_empty_int for example of empty int) but I wouldn't shed any tears if it doesn't work correctly. I'm much more worried that there's something basic I don't understand about how counters work - how can I be sure that the counter is really just one int64_t when the cast code reaches it and not multiple "shards" or something. I'm guessing it is (that whoever wrote counters did the SELECT part correctly), but I don't understand where exactly this happens.

@denesb
Copy link
Contributor

denesb commented Jul 18, 2023

Am I correct in understanding that the representation in memory is always the sum of everything as just a uint64_type?

No, counters are also represented as shards in memory too. But counters are conceptually a 64 bit integer, to calculate their value, you simply sum up all the shards.

We also have another type of (legacy?) counter, that I found traces of in the code. I have no idea what that is, maybe that is the remains of an older implementation.

@avikivity @mykaul who is the expert in the counters column type in the company?

Sadly, all counter experts have left the company. So we are left with reverse-engineering the code. Some people (Avi) might remember why things are they way they are.

I was always irked by how the counters were different then all the other types, with most regular type methods being unimplemented. My best guess is that this is simply due to we never getting around to properly finish this feature. I think it is time to remove all those nasty "unimplemented" traps.

@avikivity
Copy link
Member

Counters are just bigints with a funny name on the coordinator side (there's a back door via CQL SCYLLA_COUNTER_SHARD_LIST) . They're only special on the replica.

@nyh
Copy link
Contributor

nyh commented Jul 18, 2023

Counters are just bigints with a funny name on the coordinator side (there's a back door via CQL SCYLLA_COUNTER_SHARD_LIST) . They're only special on the replica.

So Avi, is there a reason why counter_type cannot be an integer_type() like long_type and inherit all the types.cc functions, instead of being an unimplemented special case everywhere?
However, I guess it will be simpler for me to just implement those special cases using the long_type implementation (just like @cvybhu said) instead of trying to think what I might ruin in other places in the code by changing counter_type's defintion...

@avikivity
Copy link
Member

Counters are just bigints with a funny name on the coordinator side (there's a back door via CQL SCYLLA_COUNTER_SHARD_LIST) . They're only special on the replica.

So Avi, is there a reason why counter_type cannot be an integer_type() like long_type and inherit all the types.cc functions, instead of being an unimplemented special case everywhere?

The actual type in the schema has to be different, otherwise we don't know it's a counter.

However, I guess it will be simpler for me to just implement those special cases using the long_type implementation (just like @cvybhu said) instead of trying to think what I might ruin in other places in the code by changing counter_type's defintion...

Yes.

@nyh
Copy link
Contributor

nyh commented Jul 18, 2023

They have to be implemented, but this requires much more work. For example, can counter values be empty?

Anything can be empty through ugly tricks (see test_empty.py::test_empty_int for example of empty int)

On second thought, I don't think a real counter can be "empty". Counters are written differently, and I don't see how an "empty" value can be written there. I wonder what happens, though, if you cast an "empty integer" (silly, but existing, concept) to other types, including counters. I guess yet another thing to test...

@avikivity
Copy link
Member

We should deprecate emptys and auto-convert them to NULLs.

@nyh
Copy link
Contributor

nyh commented Jul 18, 2023

I already have a working implementation that works well (and not too ugly) and will send a PR soon.

As usual, as G. I. Joe said, "knowing is half the battle" - tests (which are the vast majority of the content of my patch) exposed weird corner cases and even more unimplemented functions I didn't find in the first try.

@nyh
Copy link
Contributor

nyh commented Jul 18, 2023

On second thought, I don't think a real counter can be "empty". Counters are written differently, and I don't see how an "empty" value can be written there. I wonder what happens, though, if you cast an "empty integer" (silly, but existing, concept) to other types, including counters. I guess yet another thing to test...

It turns out (I wrote a test for this as well...) that neither Cassandra nor Scylla allow casting other types to "counter", so you can't cast an empty integer to a counter. Moreover, because of various limitations of CQL, I couldn't check what blobAsCounter(0x) (this is the trick often used to create an "empty integer") actually does, so I don't really know if it works as expected or not - there is no statement I could think of where this construct can actually be used.

nyh added a commit to nyh/scylla that referenced this issue Jul 18, 2023
We were missing support in the "CAST(x AS type)" function for the counter
type. This patch adds this support, as well as extensive testing that it
works in Scylla the same as Cassandra.

We also un-xfail an existing test translated from Cassandra's unit
test. But note that this old test did not cover all the edge-cases that
the new test checks - some missing cases in the implementation were
not caught by the old test.

Fixes scylladb#14501

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
denesb added a commit that referenced this issue Jul 31, 2023
…adav Har'El

We have had support for COUNTER columns for quite some time now, but some functionality was left unimplemented - various internal and CQL functions resulted in "unimplemented" messages when used, and the goal of this series is to fix those issues. The primary goal was to add the missing support for CASTing counters to other types in CQL (issue #14501), but we also add the missing CQL  `counterasblob()` and `blobascounter()` functions (issue #14742).

As usual, the series includes extensive functional tests for these features, and one pre-existing test for CAST that used to fail now begins to pass.

Fixes #14501
Fixes #14742

Closes #14745

* github.com:scylladb/scylladb:
  test/cql-pytest: test confirming that casting to counter doesn't work
  cql: support casting of counter to other types
  cql: implement missing counterasblob() and blobascounter() functions
  cql: implement missing type functions for "counters" type
@jonny7
Copy link
Author

jonny7 commented Aug 11, 2023

Any idea @nyh what release/when this fix will be released?

@nyh
Copy link
Contributor

nyh commented Aug 13, 2023

@jonny7 it is already available in the nightly image so you can try that out right away. But it's not in any released version: It will be in the next open-source version but there is not schedule for when this will happen (it hasn't been branched yet).

As I noted in #14501 (comment), I think we should consider this a compatibility bug, not a feature, so I want to backport it to the 5.2 and 5.1 branches. When I do that (stay tuned), you'll need to wait for the next 5.2.* or 5.1.* point releases to get this fix.

nyh added a commit that referenced this issue Aug 13, 2023
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs #14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs #14508: SELECT CAST column names should match Cassandra's
Refs #14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs #14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14528

(cherry picked from commit f08bc83)
nyh pushed a commit that referenced this issue Aug 13, 2023
…adav Har'El

We have had support for COUNTER columns for quite some time now, but some functionality was left unimplemented - various internal and CQL functions resulted in "unimplemented" messages when used, and the goal of this series is to fix those issues. The primary goal was to add the missing support for CASTing counters to other types in CQL (issue #14501), but we also add the missing CQL  `counterasblob()` and `blobascounter()` functions (issue #14742).

As usual, the series includes extensive functional tests for these features, and one pre-existing test for CAST that used to fail now begins to pass.

Fixes #14501
Fixes #14742

Closes #14745

* github.com:scylladb/scylladb:
  test/cql-pytest: test confirming that casting to counter doesn't work
  cql: support casting of counter to other types
  cql: implement missing counterasblob() and blobascounter() functions
  cql: implement missing type functions for "counters" type

(cherry picked from commit a637ddd)
@nyh
Copy link
Contributor

nyh commented Aug 13, 2023

Backported to 5.2:
The backport of a637ddd was clean only after also backporting three test-only patches that created the tests and test functions needed for the tests included in the backported patch: 78555ba, f08bc83, 328cdb2.
The backported version includes the tests, and I verified that they (and all cql-pytest in general) passes.

nyh added a commit that referenced this issue Aug 13, 2023
This is a translation of Cassandra's CQL unit test source file
functions/CastFctsTest.java into our cql-pytest framework.

There are 13 tests, 9 of them currently xfail.

The failures are caused by one recently-discovered issue:

Refs #14501: Cannot Cast Counter To Double

and by three previously unknown or undocumented issues:

Refs #14508: SELECT CAST column names should match Cassandra's
Refs #14518: CAST from timestamp to string not same as Cassandra on zero
             milliseconds
Refs #14522: Support CAST function not only in SELECT

Curiously, the careful translation of this test also caused me to
find a bug in Cassandra https://issues.apache.org/jira/browse/CASSANDRA-18647
which the test in Java missed because it made the same mistake as the
implementation.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #14528

(cherry picked from commit f08bc83)
(cherry picked from commit e03c21a)
nyh pushed a commit that referenced this issue Aug 13, 2023
…adav Har'El

We have had support for COUNTER columns for quite some time now, but some functionality was left unimplemented - various internal and CQL functions resulted in "unimplemented" messages when used, and the goal of this series is to fix those issues. The primary goal was to add the missing support for CASTing counters to other types in CQL (issue #14501), but we also add the missing CQL  `counterasblob()` and `blobascounter()` functions (issue #14742).

As usual, the series includes extensive functional tests for these features, and one pre-existing test for CAST that used to fail now begins to pass.

Fixes #14501
Fixes #14742

Closes #14745

* github.com:scylladb/scylladb:
  test/cql-pytest: test confirming that casting to counter doesn't work
  cql: support casting of counter to other types
  cql: implement missing counterasblob() and blobascounter() functions
  cql: implement missing type functions for "counters" type

(cherry picked from commit a637ddd)
Small modification was needed to validate_visitor API for the patch to
apply.
@nyh
Copy link
Contributor

nyh commented Aug 13, 2023

Backported to 5.1:
Again required backporting the same three test patches and yet another test patch 0c26032 for more missing test functions, and finally a637ddd (there was one small modification I needed to do in one of the function calls in that patch where the API changed a big).
Unfortunately, one of the backported tests - an irrelevant test that "unset" values in a request produce an error - doesn't pass on this old version, so I manually commented it out. This test it's not relevant to this issue (casting counters) and it's just an error handling case, not a bugfix we would backport anyway so I don't fill compelled to figure out which patch fixed that bug and backport it too.

Again I verified manually that all cql-pytest tests pass after the backport.

After backporting to both 5.2 and 5.1, I will remove the "backport candidate" tag.

@tzach
Copy link
Contributor

tzach commented Aug 21, 2023

Doc explicitly mention counter to double cast is supported https://opensource.docs.scylladb.com/stable/cql/functions.html#cast

@nyh do we need to fix the docs in more places?

@DoronArazii DoronArazii added this to the 5.4 milestone Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants