-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor schema, introduce schema_static_props and move several properties into it #13170
Refactor schema, introduce schema_static_props and move several properties into it #13170
Conversation
There was some logic to call mark_as_populate at the appropriate places, but the _populated field and the ensure_populated function were not used by anyone.
Our goal (scylladb#12642) is to mark raft tables to use schema commitlog. There are two similar cases in code right now - with_null_sharder and set_wait_for_sync_to_commitlog schema_builder methods. The problem is that if we need to mark some new schema with one of these methods we need to do this twice - first in a method describing the schema (e.g. system_keyspace::raft()) and second in the function create_table_from_mutations, which is not obvious and easy to forget. create_table_from_mutations is called when schema object is reconstructed from mutations, with_null_sharder and set_wait_for_sync_to_commitlog must be called from it since the schema properties they describe are not included in the mutation representation of the schema. This patch proposes to distinguish between the schema properties that get into mutations and those that do not. The former are described with schema_builder, while for the latter we introduce schema_static_props struct and the schema_builder::register_static_configurator method. This way we can formulate a rule once in the code about which schemas should have a null sharder, and it will be enforced in all cases.
This patch continues the refactoring, now we move wait_for_sync_to_commitlog property from schema_builder to schema_static_props. The patch replaces schema_builder::set_wait_for_sync_to_commitlog and is_extra_durable with two register_static_configurator, one in system_keyspace and another in system_distributed_keyspace. They correspond to the two parts of the original disjunction in schema_tables::is_extra_durable.
This patch finishes the refactoring. We introduce the use_schema_commitlog flag in schema_static_props and use it to choose the commitlog in database::add_column_family. The only configurator added declares what was originally in database::add_column_family - all tables from schema_tables keyspace should use schema_commitlog.
CI state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we replaced one hack with another. Now one has to look up 2 or 3 places to piece together the real structure of a schema.
We should just byte the bullet and figure out a way to store these flags on the disk, instead of coming up with increasingly inventive ways to avoid it.
And do we really want to? What if we want to change the property after upgrade? My feeling that this should not be a schema property at all, but a table property, but this is pre-existing. |
After which upgrade? This "temporary" hack now survived several major versions and these properties are not going anywhere. There is no such thing as table property, table == schema on the very low level. We could have added these to the already existing generic table properties, the reason we didn't AFAIK is because we didn't want users to see it in |
Upgrade where you want the property to be changed. Use system commitlog instead of regular one for instance. If the info is on the disk it will have to be re-written.
That's exactly what I lament about. Schema should be something that can be changed by a DDL statement only IMO.
We did not want all those things because it does not belong to a schema :) |
This is already not the case. You cannot change whether a table is a materialized view or cdc table. Yet it is stored in the schema tables and in the schema. These flags are yet more like that, we just didn't bother to create a proper place for it, so we employed a "temporary" solution. And -- who would have thought -- instead of solving it later, we just piled on it. |
Those properties are really table object properties. Each replica can have different values, so they are not part of global table schema. They could live in the table object, and you could have access them like s->table()->schema_commit_log(). But we don't have the access to the table object via schema_ptr, and we create tables based on properties in schema_ptr, so we have this static props as a stop-gap solution. |
I know this is not the case and complain about it :) |
Whether something can be changed or not is not a determinant if it should
live in the schema object. If it's an information which needs to propagate
to other nodes, it must live in the schema. How other nodes would know if
the table is a view otherwise?
…On Wed, Mar 15, 2023 at 12:34 PM Botond Dénes ***@***.***> wrote:
That's exactly what I lament about. Schema should be something that can be
changed by a DDL statement only IMO.
This is already not the case. You cannot change whether a table is a
materialized view or cdc table. Yet it is stored in the schema tables and
in the schema. These flags are yet more like that, we just didn't bother to
create a proper place for it, so we employed a "temporary" solution. And --
who would have thought -- instead of solving it later, we just piled on it.
—
Reply to this email directly, view it on GitHub
<#13170 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AACFIL5OLQGPZ6SKFBWU67LW4GSLTANCNFSM6AAAAAAV2UEM7I>
.
You are receiving this because your review was requested.Message ID:
***@***.***>
|
With raft it can be a part of the command that creates it and while snapshot transfer it may be transferred separately from schema mutations. Having the info in schema mutation is just easier, but it does not have to be that way. But I like your way to separate properties as well - those that can be different on each node should not be in schema while others should. |
I didn't know these properties can change from replica to replica. Of course if this is the case this cannot be in the schema. |
They should be the same, but they may be different while upgrading. How to shard a table it a local property, not global. |
I see, makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we replaced one hack with another. Now one has to look up 2 or 3 places to piece together the real structure of a schema.
This new solution is strictly better, in multiple reviews I reminded people that attaching with_null_sharder()
and friend to the schema builder is not enough, they also have to update a list of tables somewhere in another file like db/schema_tables.cc.
Now each table has a single place where you specify that it uses null sharding (or other options), so you can't do this mistake any more. Furthermore, the place is in the same file that the schema definition for this table.
Qd |
Our end goal (#12642) is to mark raft tables to use
schema commitlog. There are two similar
cases in code right now -
with_null_sharder
and
set_wait_for_sync_to_commitlog
schema_builder
methods. The problem is that if we need to
mark some new schema with one of these methods
we need to do this twice - first in
a method describing the schema
(e.g.
system_keyspace::raft()
) and second in thefunction
create_table_from_mutations
, which is notobvious and easy to forget.
create_table_from_mutations
is called when schema objectis reconstructed from mutations,
with_null_sharder
and
set_wait_for_sync_to_commitlog
must be called from itsince the schema properties they describe are
not included in the mutation representation of the schema.
This series proposes to distinguish between the schema
properties that get into mutations and those that do not.
The former are described with
schema_builder
, while forthe latter we introduce
schema_static_props
struct andthe
schema_builder::register_static_configurator
method.This way we can formulate a rule once in the code about
which schemas should have a null sharder/be synced, and it will
be enforced in all cases.