-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
column_family: Make toppartitions queries more generic #7864
Conversation
This requires a change in the nodetool repo as well to wholly fix the #4520 issue (which is coming and I'll link it here) |
From my quick test it looks like this change isn't backward compatible, i.e. it's no longer possible to pass a single column family name to this interface. Is it hard to maintain compatibility by allowing both a single name and a list? This change would become way more user friendly that way. From my quick local tests it looks like parameters for this API are not validated anyway, so it should be easy enough to keep accepting both |
The problem with backwards compatibility is that previously I can rework this, but it's always gonna be awkward if we don't want the |
So, can't you just leave the old API ( |
You're absolutely right, for some reason I forgot that they can coexist |
1df3c52
to
9df0511
Compare
Addressed the feedback and also added an information from which column family the listed partition is, as with many CFs we can have partitions with exact same names |
Looks good to me, but @amnonh should probably take a look as the API owner. |
Nothing as a shorthand for everything is bad practice. Nothing should stand for nothing, not everything. |
api/api-doc/column_family.json
Outdated
@@ -663,8 +711,8 @@ | |||
"parameters":[ | |||
{ | |||
"name":"name", | |||
"description":"The column family name in keyspace:name format", | |||
"required":true, | |||
"description":"The column family name in keyspace:name format. Omitting the name (i.e. 'keyspace:') will run the query on the whole keyspace", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct, the path parameter should be required, if the name is omitted, you would fall to your new implementation that does not have a query name parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out, forgot to change it back to required
api/column_family.cc
Outdated
apilog.info("toppartitions query: #names={} duration={} list_size={} capacity={}", | ||
names.size(), duration.param, list_size.param, capacity.param); | ||
|
||
return seastar::do_with(db::toppartitions_query(ctx.db, std::move(names), duration.value, list_size, capacity), [&ctx](auto& q) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we unified the implamentation for the specific and non specific CF?
if not anyting else, you can always call the general with the specific names, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not exactly sure what you mean, can you elaborate a bit more?
Are you referring to the distinction between toppartitions
and toppartitions_generic
? Or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it seems that there will be two implementation, one for a specific table and one for multiple tables
I would have loved to see #7797 maybe you can do add that? |
It depends how you phrase it - "all unless specified" is not uncommon and is the same thing. I can change the description to reflect that more. I can also replace that with something like an "ALL" keyword, but that seems to me like overcomplication. Do you have a simpler solution in mind @avikivity? |
Not in this issue, but I could take a look at that once I'm done with current toppartitions and nodetool stuff I'm working on. |
9df0511
to
5dacf02
Compare
If it's an optional parameter, then if it's not included it can mean all. But an empty value shouldn't mean all. However, I have some memory of the same pattern used in other APIs. If that's correct, then we can follow the existing pattern. |
the operation documentation is just confusing, it's a path parameter, so it's mandatory, if it's not there then it's a different url (i.e different end point). |
I mean that we already have some violations of "empty should mean nothing" in our REST API. Is this correct? |
I'm not sure I follow, having to end points which one is more specific like: Is a common RESTFull definition, but those are two different endpoints, so the documentation of the more specific one, should not say leave empty for everything, even if it happen to be the case, it's a mandatory parameter, if you leave it empty, it will not be the same endpoint. |
5dacf02
to
301db48
Compare
Ok, I restructured the code a bit. Now, not providing the cf list means to query "all", while providing an empty one is exactly that (empty list, so no families are getting included). |
301db48
to
4949e5c
Compare
As per my discussion with @amnonh, I moved the generic toppartitions endpoint to |
4949e5c
to
01c5873
Compare
@amnonh please review again |
db/data_listeners.cc
Outdated
@@ -59,11 +59,11 @@ void data_listeners::on_write(const schema_ptr& s, const frozen_mutation& m) { | |||
|
|||
toppartitions_item_key::operator sstring() const { | |||
std::ostringstream oss; | |||
oss << key.key().with_schema(*schema); | |||
oss << "(" << schema->ks_name() << ":" << schema->cf_name() << ") " << key.key().with_schema(*schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it change current behaviour? it would now add ks/table for users who call nodetool toppartitions for a specific table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the goal, as we can now have partitions from different tables that have the same names - it's good to have a way to differentiate between them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is true for someone that uses the new functionality, what about backwards compatibility for users who use the old functionality?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it to include backwards compatibility
db/data_listeners.cc
Outdated
if (zis) { | ||
zis->_top_k_read.append(toppartitions_item_key{s, dk}); | ||
|
||
for (const auto& [ks, cf] : _families.value()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't expect it to be too big, because you'll do this loop for any operation and you don't have that quick bale-out in the original code where you compare the ks/table and return imidiately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't be big, but I'll change it to a map as you pointed out. It's a good point.
db/data_listeners.hh
Outdated
@@ -155,15 +154,14 @@ public: | |||
|
|||
class toppartitions_query { | |||
distributed<database>& _xdb; | |||
sstring _ks; | |||
sstring _cf; | |||
std::optional<std::vector<std::tuple<sstring, sstring>>> _families; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wny making a vector optional?
BTW if it was unorderd_map you could have do a quick bale-out in the implementation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's optional to reflect a difference between an empty set of filters and no filters. The former will match against no tables, while the latter will match against all of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yeah, changing to a hashmap is a good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't get it, if you know there are none, why using it in the first place? or is it something that is always there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user can omit that parameter or they can input an empty list - they mean different things and have to be handled accordingly. Once we pass that information down to the toppartitions query object and all the filtering happens, we need to be able to tell which of the described situations happened, because they give different results.
Using std::optional
seemed like the most logical way to do it, as we're literally describing an "optional" parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly the two option:
- the user passes an empty container, which means they want everything
- the user didn't pass anything, so they want nothing - so do nothing, at the topest level, just return imidiatly with an empty result
Even if you mean the logic is the other way around, I don't see why you need to implement the empty case, if a user request something that by definition returns nothing, return imidiatly with nohting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the optionals and check for empty requests before instantiating the query, as asked
Right now toppartitions can only be invoked on one column family at a time. This change introduces a natural extension to this functionality, allowing to specify a list of families. We provide three ways for filtering in the query parameter "name_list": 1. A specific column family to include in the form "ks:cf" 2. A keyspace, telling the server to include all column families in it. Specified by omitting the cf name, i.e. "ks:" 3. All column families, which is represented by an empty list The list can include any amount of one or both of the 1. and 2. option. Fixes scylladb#4520
01c5873
to
ae63d26
Compare
Addressed all the above feedback |
"paramType":"query" | ||
}, | ||
{ | ||
"name":"keyspace_filters", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it should be just keyspace (or ks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why? Many keyspaces can be provided, so a singular here seems weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you tested it? I don't think we support array of query parameters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a nitpick, but from looking at the code this is a comma seperated string, the API currently does not enforce it, but it's better be correct anyhow, but I'll not fight over it
], | ||
"parameters":[ | ||
{ | ||
"name":"table_filters", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it should be table or cf, if you are adding keyspace then it's only table on not keyspace:table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there can be multiples of both, you can specify a couple of tables, which are identified by "ks:cf" and a couple of whole keyspaces. For example, you can request two tables: "ks:t, ks:simple" and one keyspace "system". The result would be toppartitions for the two tables in ks keyspace and all tables in the system keyspace
} | ||
|
||
// when the query is empty return immediately | ||
if (filters_provided && table_filters.empty() && keyspace_filters.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how can that be? someone either table_filters was true or keyspace_filters was true or filter_pfovided was false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filters_provided means that there were some filters included in the request, ergo the request is not the default one running on all tables
Then, the only situation where we can safely return without running the query is if some filters were included (e.g. the optional query parameters were provided), but both of them were empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still not completely understand, they added at least one of the query parameters (keyspace or table), but they added it without a value?
I'm not even sure if we support it, did you check that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
For simplicity, let's pretend there is no keyspace_filters
parameter, only the table ones. Now, one of the two situations can happen:
1. The user doesn't use the table_filters
option -> the option is not in the params
2. The user provides the 'table_filters' option -> the option is in the params and contains a list of table names
In the first case, the query runs on all tables and keyspaces as this is the default
In the second case, the query runs only on the tables provided in the 'table_filters' option. A list can be empty, of course, and that means the query runs on a empty set - therefore, it gives empty results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the code, my question was about the query parameter, it does not hurt to add the check like you did, I wasn't sure you can pass an empty parameter, so it looked redundant.
@amnonh ping |
1 similar comment
@amnonh ping |
} | ||
|
||
// when the query is empty return immediately | ||
if (filters_provided && table_filters.empty() && keyspace_filters.empty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the code, my question was about the query parameter, it does not hurt to add the check like you did, I wasn't sure you can pass an empty parameter, so it looked redundant.
"paramType":"query" | ||
}, | ||
{ | ||
"name":"keyspace_filters", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a nitpick, but from looking at the code this is a comma seperated string, the API currently does not enforce it, but it's better be correct anyhow, but I'll not fight over it
Right now toppartitions can only be invoked on one column family at a time.
This change introduces a natural extension to this functionality,
allowing to specify a list of families.
We provide three ways for filtering in the query parameter "name_list":
1. A specific column family to include in the form "ks:cf"
2. A keyspace, telling the server to include all column families in it.
Specified by omitting the cf name, i.e. "ks:"
3. All column families, which is represented by an empty list
The list can include any amount of one or both of the 1. and 2. option.
Fixes #4520