Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor schema, introduce schema_static_props and move several properties into it #13170

Merged
merged 4 commits into from
Mar 15, 2023

Conversation

gusev-p
Copy link

@gusev-p gusev-p commented Mar 14, 2023

Our end goal (#12642) is to mark raft tables to use
schema commitlog. There are two similar
cases in code right now - with_null_sharder
and set_wait_for_sync_to_commitlog schema_builder
methods. The problem is that if we need to
mark some new schema with one of these methods
we need to do this twice - first in
a method describing the schema
(e.g. system_keyspace::raft()) and second in the
function create_table_from_mutations, which is not
obvious and easy to forget.

create_table_from_mutations is called when schema object
is reconstructed from mutations, with_null_sharder
and set_wait_for_sync_to_commitlog must be called from it
since the schema properties they describe are
not included in the mutation representation of the schema.

This series proposes to distinguish between the schema
properties that get into mutations and those that do not.
The former are described with schema_builder, while for
the latter we introduce schema_static_props struct and
the schema_builder::register_static_configurator method.
This way we can formulate a rule once in the code about
which schemas should have a null sharder/be synced, and it will
be enforced in all cases.

Petr Gusev added 4 commits March 14, 2023 13:32
There was some logic to call mark_as_populate at
the appropriate places, but the _populated field
and the ensure_populated function were
not used by anyone.
Our goal (scylladb#12642) is to mark raft tables to use
schema commitlog. There are two similar
cases in code right now - with_null_sharder
and set_wait_for_sync_to_commitlog schema_builder
methods. The problem is that if we need to
mark some new schema with one of these methods
we need to do this twice - first in
a method describing the schema
(e.g. system_keyspace::raft()) and second in the
function create_table_from_mutations, which is not
obvious and easy to forget.

create_table_from_mutations is called when schema object
is reconstructed from mutations, with_null_sharder
and set_wait_for_sync_to_commitlog must be called from it
since the schema properties they describe are
not included in the mutation representation of the schema.

This patch proposes to distinguish between the schema
properties that get into mutations and those that do not.
The former are described with schema_builder, while for
the latter we introduce schema_static_props struct and
the schema_builder::register_static_configurator method.
This way we can formulate a rule once in the code about
which schemas should have a null sharder, and it will
be enforced in all cases.
This patch continues the refactoring, now we move
wait_for_sync_to_commitlog property from schema_builder to
schema_static_props.

The patch replaces schema_builder::set_wait_for_sync_to_commitlog
and is_extra_durable with two register_static_configurator,
one in system_keyspace and another in system_distributed_keyspace.
They correspond to the two parts of the original disjunction
in schema_tables::is_extra_durable.
This patch finishes the refactoring. We introduce the
use_schema_commitlog flag in schema_static_props
and use it to choose the commitlog in
database::add_column_family. The only
configurator added declares what was originally in
database::add_column_family - all
tables from schema_tables keyspace
should use schema_commitlog.
@scylladb-promoter
Copy link
Contributor

Copy link
Contributor

@denesb denesb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we replaced one hack with another. Now one has to look up 2 or 3 places to piece together the real structure of a schema.
We should just byte the bullet and figure out a way to store these flags on the disk, instead of coming up with increasingly inventive ways to avoid it.

@gleb-cloudius
Copy link
Contributor

It feels like we replaced one hack with another. Now one has to look up 2 or 3 places to piece together the real structure of a schema. We should just byte the bullet and figure out a way to store these flags on the disk, instead of coming up with increasingly inventive ways to avoid it.

And do we really want to? What if we want to change the property after upgrade? My feeling that this should not be a schema property at all, but a table property, but this is pre-existing.

@denesb
Copy link
Contributor

denesb commented Mar 15, 2023

After which upgrade? This "temporary" hack now survived several major versions and these properties are not going anywhere.

There is no such thing as table property, table == schema on the very low level.
We have compaction parameters in the schema, you can argue it has nothing to do with the schema, only with the table.

We could have added these to the already existing generic table properties, the reason we didn't AFAIK is because we didn't want users to see it in DESCRIBE TABLE output, and more importantly, we didn't want them to be able to change it.
Maybe we can add another hidden_properties generic property map, which doesn't show up in DESCRIBE TABLE output and it is not possible to provide these via CQL at all.

@gleb-cloudius
Copy link
Contributor

After which upgrade? This "temporary" hack now survived several major versions and these properties are not going anywhere.

Upgrade where you want the property to be changed. Use system commitlog instead of regular one for instance. If the info is on the disk it will have to be re-written.

There is no such thing as table property, table == schema on the very low level. We have compaction parameters in the schema, you can argue it has nothing to do with the schema, only with the table.

That's exactly what I lament about. Schema should be something that can be changed by a DDL statement only IMO.

We could have added these to the already existing generic table properties, the reason we didn't AFAIK is because we didn't want users to see it in DESCRIBE TABLE output, and more importantly, we didn't want them to be able to change it. Maybe we can add another hidden_properties generic property map, which doesn't show up in DESCRIBE TABLE output and it is not possible to provide these via CQL at all.

We did not want all those things because it does not belong to a schema :)

@denesb
Copy link
Contributor

denesb commented Mar 15, 2023

That's exactly what I lament about. Schema should be something that can be changed by a DDL statement only IMO.

This is already not the case. You cannot change whether a table is a materialized view or cdc table. Yet it is stored in the schema tables and in the schema. These flags are yet more like that, we just didn't bother to create a proper place for it, so we employed a "temporary" solution. And -- who would have thought -- instead of solving it later, we just piled on it.

@tgrabiec
Copy link
Contributor

Those properties are really table object properties. Each replica can have different values, so they are not part of global table schema. They could live in the table object, and you could have access them like s->table()->schema_commit_log(). But we don't have the access to the table object via schema_ptr, and we create tables based on properties in schema_ptr, so we have this static props as a stop-gap solution.

@gleb-cloudius
Copy link
Contributor

That's exactly what I lament about. Schema should be something that can be changed by a DDL statement only IMO.

This is already not the case. You cannot change whether a table is a materialized view or cdc table. Yet it is stored in the schema tables and in the schema. These flags are yet more like that, we just didn't bother to create a proper place for it, so we employed a "temporary" solution. And -- who would have thought -- instead of solving it later, we just piled on it.

I know this is not the case and complain about it :)

@tgrabiec
Copy link
Contributor

tgrabiec commented Mar 15, 2023 via email

@gleb-cloudius
Copy link
Contributor

Whether something can be changed or not is not a determinant if it should live in the schema object. If it's an information which needs to propagate to other nodes, it must live in the schema. How other nodes would know if the table is a view otherwise?

With raft it can be a part of the command that creates it and while snapshot transfer it may be transferred separately from schema mutations. Having the info in schema mutation is just easier, but it does not have to be that way. But I like your way to separate properties as well - those that can be different on each node should not be in schema while others should.

@denesb
Copy link
Contributor

denesb commented Mar 15, 2023

I didn't know these properties can change from replica to replica. Of course if this is the case this cannot be in the schema.
It would be nice if this would be mentioned somewhere. BTW how is this even possible if the properties are hardcoded in a static container?

@gleb-cloudius
Copy link
Contributor

I didn't know these properties can change from replica to replica. Of course if this is the case this cannot be in the schema. It would be nice if this would be mentioned somewhere. BTW how is this even possible if the properties are hardcoded in a static container?

They should be the same, but they may be different while upgrading. How to shard a table it a local property, not global.

@denesb
Copy link
Contributor

denesb commented Mar 15, 2023

They should be the same, but they may be different while upgrading. How to shard a table it a local property, not global.

I see, makes sense.

Copy link
Contributor

@kbr-scylla kbr-scylla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like we replaced one hack with another. Now one has to look up 2 or 3 places to piece together the real structure of a schema.

This new solution is strictly better, in multiple reviews I reminded people that attaching with_null_sharder() and friend to the schema builder is not enough, they also have to update a list of tables somewhere in another file like db/schema_tables.cc.

Now each table has a single place where you specify that it uses null sharding (or other options), so you can't do this mistake any more. Furthermore, the place is in the same file that the schema definition for this table.

@kbr-scylla
Copy link
Contributor

Qd

@scylladb-promoter scylladb-promoter merged commit 5705df7 into scylladb:master Mar 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants