Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of table names with colon characters #227

Merged
merged 2 commits into from
Nov 21, 2023

Conversation

nyh
Copy link
Contributor

@nyh nyh commented Nov 20, 2023

As noted in #226, nodetool breaks if a table name contains the colon character. This normally can't happen - CQL table names can only contain alphanumerics and underscores, and the DynamoDB API adds also dashes and dots - but neither allows a colon. But unfortunately, when Alternator has a secondary index (GSI or LSI), the name of the view does get a colon (it looks like "tablename:viewname"). And this confuses JMX, which addresses objects using MBean names (see https://docs.oracle.com/javase/8/docs/api/javax/management/ObjectName.html), where a name that contains a colon needs to be quoted.

This small series fixes two things: First, some internal list of ObjectName objects needs to quote the names it holds to allow the colon character. Second, we should allow the external client - namely, nodetool - to send us a quoted name.

To complete this series I already have patches to two other repositories which I'll send shortly:

  1. A patch to scylla-tools-java (i.e., nodetool) to use quoting when sending a table name with a colon to the JMX server.
  2. A dtest patch to verify that before these patches, neither "nodetool info" or "nodetool tablestats" work, and after these patches, both work.

ColumnFamilyStore builds for its own uses a list of Scylla's tables using
the REST API. It holds each table's name in a formant meant for MBean,
in an ObjectName object. As the ObjectName documentation explains, when
a value contains a comma, equals, colon or quote - it must be quoted,
otherwise the ObjectName cannot be constructed.

Scylla table names can't ever contain commas, equals or quotes - but
they can contain colons, in the case of Alternator GSIs as discovered
in issue scylladb#226. So in this patch we quote the columnfamily value (using
the ObjectName.quote()) function, to allow it to contain a colon.

Our code also wants to read back the ObjectName, so we unquote() it
to restore the unquoted name.

We could have quoted the keyspace name as well, not just the columnfamily
name, but this isn't necessary because even in Alternator, keyspace names
cannot contain colons (or commas or equals or quotes).

After this patch, "notetool info" starts to work even if an Alternator
GSI exists. "nodetool info" doesn't need to access any specific table,
but it still causes our JMX implementation to build that list of tables
which failed before this patch and now works.

Fixes scylladb#226.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
After the previous patch, our JMX implementation no longer fails when
there is a table whose name contains the colon character (as in scylladb#226).
However, the JMX client (like nodetool) still can't *address* this
table - in order to be used in a MBean object name it must be quoted
(as explained in the documentation of ObjectName), but our server
doesn't understand the quoted name.

So in this patch we check if the "scope" (i.e., columnfamily name)
is quoted, and if it is, unquote it while searching for the table.
We don't do the same treatment for the other components of the bean
name (like keyspace) because those can't have a colon character.

After this patch, "nodetool tablestats" almost begins to work when
there are Alternator GSIs (with a colon in the table's name). It's
"almost" because we also need a patch in nodetool, so it will quote
the table name before sending it. The patch *here* is what will allow
this quoted name to be understood by the JMX server.

Refs scylladb#226.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@nyh nyh requested review from elcallio and mykaul November 20, 2023 14:32
@nyh
Copy link
Contributor Author

nyh commented Nov 20, 2023

I need help from the JMX or packaging gurus: How do we confirm that this patch doesn't break anything in the dtest suite? (it's supposed to fix a dtest which isn't yet in, but we need to confirm it doesn't break any existing test).

nyh added a commit to nyh/scylla-tools-java that referenced this pull request Nov 20, 2023
As the ObjectName documentation explains, when an addressing an MBean
in JMX, when a value contains a comma, equals, colon or quote - it must
be quoted, otherwise the ObjectName cannot be constructed.

CQL or Alternator table names cannot contain any of these characters,
but in Alternator the names of secondary indexes *do* contain a colon,
so before this patch nodetool cannot address them - it tries to
construct an ObjectName with that name, and fails.

Note that "nodetool tablestats" is also broken by this bug, because even
if the user doesn't try to specify one specific table, nodetool iterates
on all tables and fails when it reaches the view with the colon in its
name.

So this patch adds the missing quoting if the table name contains a
colon.  We don't quote table names that don't have a colon to preserve
backward compatibility with old implmentation of JMX that don't know
how to unquote these names. A separate patch to scylla-jmx added
support for unquoting - see scylladb/scylla-jmx#227.

After this patch "nodetool tablestats" can work even if there is an
Alternator GSI or LSI.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@nyh
Copy link
Contributor Author

nyh commented Nov 20, 2023

See also pull requests in other repositories:

  1. NodeProbe: allow addressing table name with colon in it scylla-tools-java#355 - fix nodetool to be able to quote a table name with a colon in it (which the fix in the second patch in this PR can unquote).
  2. https://github.com/scylladb/scylla-dtest/pull/3738 - two dtests that reproduce the original bug, and check that it's fixed. This PR only fixes the nodetool info test, the nodetool pull request mentioned above is also needed for the nodetool tablestats test to work.

Copy link
Contributor

@mykaul mykaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Was surprised that you need checks before 'unquote()' or it might throw an exception, but apparently that's the API.

@nyh
Copy link
Contributor Author

nyh commented Nov 20, 2023

LGTM. Was surprised that you need checks before 'unquote()' or it might throw an exception, but apparently that's the API.

Even after beeing knee-deep in this code for many hours, I'm not sure I fully understood what is really happening here or why it was designed this way. Why does unquote() even need to exist? After you used quote() to create a valid ObjectName, why doesn't the getKeyProperty() method retrieve the original unquoted string - why does the caller need to call unquote()? Anyway, experiments shows that this is what happens - if you used quote() to create the ObjectName, the quote characters are saved and returned by GetKeyProperty() :-(

All I can say is that I'm happy we are planning to get of this Java code eventually...

@denesb denesb merged commit 70e278b into scylladb:master Nov 21, 2023
@denesb
Copy link
Contributor

denesb commented Nov 21, 2023

Submodule update: scylladb/scylladb@282dc0a.

@nyh
Copy link
Contributor Author

nyh commented Nov 22, 2023

I need help from the JMX or packaging gurus: How do we confirm that this patch doesn't break anything in the dtest suite? (it's supposed to fix a dtest which isn't yet in, but we need to confirm it doesn't break any existing test).

@denesb did you do any such confirmation before merging this PR?
I see now next promotion failing on nodetool tests https://jenkins.scylladb.com/job/scylla-master/job/next/6846/#showFailuresLink - might be related :-(

@denesb
Copy link
Contributor

denesb commented Nov 22, 2023 via email

@nyh
Copy link
Contributor Author

nyh commented Nov 22, 2023

I can reproduce a failure of

PYTHONPATH=../scylla-ccm pytest --cassandra-dir=$HOME/scylla/build/dev nodetool_additional_test.py::TestNodetool::test_get_sstable

I'll try to debug it. The curious thing is that obviously my patch did not kill en-masse all the nodetool tests, it just broke two specific tests. I'll need to debug now why.

Do you want to revert the Scylla patch that updates the submodule and leave the JMX patch in and I'll send a followup patch? Or do you want to revert the scylla-jmx patch as well?

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

@denesb
Copy link
Contributor

denesb commented Nov 22, 2023

Do you want to revert the Scylla patch that updates the submodule and leave the JMX patch in and I'll send a followup patch? Or do you want to revert the scylla-jmx patch as well?

The submodule update is is only in next, it can simply be dequeued.

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

There is no CI for tools/* submodules, we merge to master directly. Also, there is usually no CI for submodule updates, unless the maintainer thinks the submodule update is risky, and opens a PR for CI. As you can see, the maintainer will sometimes misjudge the situation.

@nyh
Copy link
Contributor Author

nyh commented Nov 22, 2023

Sorry about the mess. Please dequeue the Scylla submodule update, and I'll send a followup PR to this PR to fix the bug (I hope I can do this quickly, I'm debugging it now).

@denesb
Copy link
Contributor

denesb commented Nov 22, 2023

Sorry about the mess. Please dequeue the Scylla submodule update, and I'll send a followup PR to this PR to fix the bug (I hope I can do this quickly, I'm debugging it now).

Submodule update dequeued.

@mykaul
Copy link
Contributor

mykaul commented Nov 22, 2023

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

I suspect we'll get rid of JMX sooner than we'll have tests for it :(

@nyh
Copy link
Contributor Author

nyh commented Nov 22, 2023

Sent a followup: #230

@fruch
Copy link

fruch commented Nov 22, 2023

I can reproduce a failure of

PYTHONPATH=../scylla-ccm pytest --cassandra-dir=$HOME/scylla/build/dev nodetool_additional_test.py::TestNodetool::test_get_sstable

I'll try to debug it. The curious thing is that obviously my patch did not kill en-masse all the nodetool tests, it just broke two specific tests. I'll need to debug now why.

Do you want to revert the Scylla patch that updates the submodule and leave the JMX patch in and I'll send a followup patch? Or do you want to revert the scylla-jmx patch as well?

Also I repeat my question for the JMX and CI gurus - don't we have CI runs for JMX patches? How do we prevent these cases from happening in the future?

There is a way to run all of dtest tests, but you first build using the byo a unified package and then give to a dtest job to run all tests.

Other option is to run just the gating do a PR to Scylla with submodule change and let it run, I do it multiple time for cqlsh submodule, and it's mostly enough

@nyh
Copy link
Contributor Author

nyh commented Nov 23, 2023

There is a way to run all of dtest tests, but you first build using the byo a unified package and then give to a dtest job to run all tests.

My problem is that because I don't do those things often, I have no idea how to do what you just said, and no idea where there is documentation now how to do it.

Other option is to run just the gating do a PR to Scylla with submodule change and let it run, I do it multiple time for cqlsh submodule, and it's mostly enough

Yes, this is a good thing to do before merging the submodule change. But in some sense it's a bit too late - it means we already merged the broken patch into scylla-jmx (for example), and yes, we won't have Scylla use the broken patch but it's still in the repository. But on second thought - maybe it's fine and we don't need to improve (and complicate) the process. We don't need to be fanatics on perfect code and bisectability on every small repository - maybe it's not a disaster that scylla-jmx got a buggy patch, and then a day later it got a second patch to fix it. Just as long as the buggy version isn't actually used in the Scylla parent module (and it isn't).

@fruch
Copy link

fruch commented Nov 23, 2023

There is a way to run all of dtest tests, but you first build using the byo a unified package and then give to a dtest job to run all tests.

My problem is that because I don't do those things often, I have no idea how to do what you just said, and no idea where there is documentation now how to do it.

Other option is to run just the gating do a PR to Scylla with submodule change and let it run, I do it multiple time for cqlsh submodule, and it's mostly enough

Yes, this is a good thing to do before merging the submodule change. But in some sense it's a bit too late - it means we already merged the broken patch into scylla-jmx (for example), and yes, we won't have Scylla use the broken patch but it's still in the repository. But on second thought - maybe it's fine and we don't need to improve (and complicate) the process. We don't need to be fanatics on perfect code and bisectability on every small repository - maybe it's not a disaster that scylla-jmx got a buggy patch, and then a day later it got a second patch to fix it. Just as long as the buggy version isn't actually used in the Scylla parent module (and it isn't).

In some case I'm opening a PR pointing to my fork of the submodule, before merging, and opening a PR just for testing, and point it back to the actual repo/branch, when I confirmed it's working.

again maybe a bit convoluted and long, and can confuse the maintainers, but it get me a CI run ontop of my changes, with the least amount of effort. (but with not much control, since I can't change dtest/ccm in such run, or the exact test that would run)

denesb pushed a commit to scylladb/scylla-tools-java that referenced this pull request Nov 23, 2023
As the ObjectName documentation explains, when an addressing an MBean
in JMX, when a value contains a comma, equals, colon or quote - it must
be quoted, otherwise the ObjectName cannot be constructed.

CQL or Alternator table names cannot contain any of these characters,
but in Alternator the names of secondary indexes *do* contain a colon,
so before this patch nodetool cannot address them - it tries to
construct an ObjectName with that name, and fails.

Note that "nodetool tablestats" is also broken by this bug, because even
if the user doesn't try to specify one specific table, nodetool iterates
on all tables and fails when it reaches the view with the colon in its
name.

So this patch adds the missing quoting if the table name contains a
colon.  We don't quote table names that don't have a colon to preserve
backward compatibility with old implmentation of JMX that don't know
how to unquote these names. A separate patch to scylla-jmx added
support for unquoting - see scylladb/scylla-jmx#227.

After this patch "nodetool tablestats" can work even if there is an
Alternator GSI or LSI.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes: #355
@denesb
Copy link
Contributor

denesb commented Nov 23, 2023

Submodule update PR: scylladb/scylladb#16146

nyh added a commit to scylladb/scylla-tools-java that referenced this pull request Dec 5, 2023
As the ObjectName documentation explains, when an addressing an MBean
in JMX, when a value contains a comma, equals, colon or quote - it must
be quoted, otherwise the ObjectName cannot be constructed.

CQL or Alternator table names cannot contain any of these characters,
but in Alternator the names of secondary indexes *do* contain a colon,
so before this patch nodetool cannot address them - it tries to
construct an ObjectName with that name, and fails.

Note that "nodetool tablestats" is also broken by this bug, because even
if the user doesn't try to specify one specific table, nodetool iterates
on all tables and fails when it reaches the view with the colon in its
name.

So this patch adds the missing quoting if the table name contains a
colon.  We don't quote table names that don't have a colon to preserve
backward compatibility with old implmentation of JMX that don't know
how to unquote these names. A separate patch to scylla-jmx added
support for unquoting - see scylladb/scylla-jmx#227.

After this patch "nodetool tablestats" can work even if there is an
Alternator GSI or LSI.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes: #355
(cherry picked from commit 26f5f71)
nyh added a commit to scylladb/scylla-tools-java that referenced this pull request Dec 5, 2023
As the ObjectName documentation explains, when an addressing an MBean
in JMX, when a value contains a comma, equals, colon or quote - it must
be quoted, otherwise the ObjectName cannot be constructed.

CQL or Alternator table names cannot contain any of these characters,
but in Alternator the names of secondary indexes *do* contain a colon,
so before this patch nodetool cannot address them - it tries to
construct an ObjectName with that name, and fails.

Note that "nodetool tablestats" is also broken by this bug, because even
if the user doesn't try to specify one specific table, nodetool iterates
on all tables and fails when it reaches the view with the colon in its
name.

So this patch adds the missing quoting if the table name contains a
colon.  We don't quote table names that don't have a colon to preserve
backward compatibility with old implmentation of JMX that don't know
how to unquote these names. A separate patch to scylla-jmx added
support for unquoting - see scylladb/scylla-jmx#227.

After this patch "nodetool tablestats" can work even if there is an
Alternator GSI or LSI.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes: #355
(cherry picked from commit 26f5f71)
(cherry picked from commit 3764ae9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants