Adding a new node with authentication set - causes system_auth keyspace RF to be set to 1 #2129
Comments
FYI: Another user has just reported this on our public slack. |
@slivne Schema sync works as expected, the latest change wins. Perhaps the problem is that we create the auth table before we do the sync with existing nodes? |
@tgrabiec - it could be ...
Yet that will be an issue when upgrading if there are differences in the schema.
I know we had a discussion lately about the id generation of the auth and traces tables, and that you found that they provide a "static" id to make sure there is no conflict ....
Can you check what Cassandra does? Maybe we missed something.
|
@slivne I am not able to reproduce. Do you have logs for that run? |
scylla ~/scylla/build/release/scylla --version
|
@slivne In your logs on node3 I can see:
Bootstrap variables: 0 0 0 0
Which means auto_bootstrap is false, and that's the reason why the node doesn't perform the schema sync. |
You are right ... I didn't change that in this run. Changing it now and trying again.
|
With auto_bootstrap: true the issue does not occur, so it seems there is no bug ....
Yet maybe the customer also started their node this way (I think the AMI sets it to false by default). |
the customer is using the ami
I am checking if we have an ability in the ami to change the default bootstrap value. |
So there is a way to start the AMI and tell it to set auto_bootstrap to true, in the user data setting:
--bootstrap
will change the default https://github.com/scylladb/scylla-ami/blob/master/ds2_configure.py#L216
We need to update our documentation to reflect that: http://docs.scylladb.com/kb/add-nodes-on-ec2/ currently only mentions seeds; we also need to specify bootstrap. |
I would double check to make sure that bootstrap indeed only sets this
parameter at first boot.
If we weren't even aware of that, imagine how well tested this is?
Imagine an AMI set with this being bootstrapped every time it boots...
|
I think the right way is to use an AMI parameter for not starting Scylla. |
This procedure is missing more things, which are present in [1], like
running nodetool cleanup. Since that page is EC2-specific, it should link
to the general procedure.
[1]
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html
|
The generic add-node procedure does include cleanup. The KB section is out of date and needs to be revisited and moved under procedures. @nirmaayan FYI |
The problem doesn't reproduce on Cassandra 2.2, because they create default keyspace metadata with timestamp=0, so it always loses to manual changes. |
Actually, it seems to be like that since 1.2.2 (https://issues.apache.org/jira/browse/CASSANDRA-5112) |
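The timestamp=0 behaviour described above can be illustrated with a minimal sketch of last-write-wins reconciliation. This is not Cassandra or Scylla code: `KeyspaceMutation`, `reconcile`, and the timestamps are hypothetical names and values chosen for illustration, and real reconciliation also has tie-breaking rules that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyspaceMutation:
    """Hypothetical stand-in for one write to a keyspace's replication settings."""
    replication_factor: int
    timestamp: int  # larger timestamp wins during reconciliation


def reconcile(a: KeyspaceMutation, b: KeyspaceMutation) -> KeyspaceMutation:
    """Simplified last-write-wins merge: keep the mutation with the higher timestamp."""
    return a if a.timestamp >= b.timestamp else b


# The user ran ALTER KEYSPACE at (illustrative) time 1000, setting RF=2.
manual = KeyspaceMutation(replication_factor=2, timestamp=1000)

# A node that boots later and stamps the default keyspace with the current time.
default_now = KeyspaceMutation(replication_factor=1, timestamp=2000)

# The same default, but written with timestamp 0.
default_zero = KeyspaceMutation(replication_factor=1, timestamp=0)

print(reconcile(manual, default_now).replication_factor)   # 1: the default overwrites the manual change
print(reconcile(manual, default_zero).replication_factor)  # 2: the manual change survives
```

With a current-time stamp the default RF=1 wins over the earlier ALTER; with timestamp 0 it can never win against any manual change.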
Before we can apply a similar solution, we have to fix |
…ications There is a workaround for the notification race, which attaches keyspace mutations to other schema changes in case the target node missed the keyspace creation. Currently it generates the keyspace mutations on the spot instead of using the ones stored in schema tables. Those mutations have the current timestamp, as if the keyspace had just been modified. This is problematic because it may overwrite keyspace parameters with a newer timestamp but stale values if the node is not up to date with keyspace metadata. That's especially the case when booting up a node without enabling auto_bootstrap. In that case the node will not wait for schema sync before creating auth tables. Such table creation will attach potentially out-of-date mutations for keyspace metadata, which may overwrite changes made to keyspace parameters earlier in the cluster. Refs #2129.
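The difference between regenerating the keyspace mutation on the spot and reusing the stored one can be sketched as follows. This is a simplified model under stated assumptions, not the migration_manager code: `KsMeta`, `merge`, `attach_regenerated`, `attach_stored`, and all timestamps are hypothetical.

```python
from typing import NamedTuple


class KsMeta(NamedTuple):
    """Hypothetical keyspace metadata: a value plus the timestamp it was written with."""
    replication_factor: int
    timestamp: int


def merge(current: KsMeta, incoming: KsMeta) -> KsMeta:
    """Simplified last-write-wins merge on the receiving node."""
    return incoming if incoming.timestamp > current.timestamp else current


def attach_regenerated(local: KsMeta, now: int) -> KsMeta:
    """Old behaviour: rebuild the mutation from local values, stamped with the current time."""
    return KsMeta(local.replication_factor, now)


def attach_stored(local: KsMeta) -> KsMeta:
    """Fixed behaviour: reuse the stored mutation, keeping its original timestamp."""
    return local


cluster = KsMeta(replication_factor=2, timestamp=200)      # user altered RF to 2 at time 200
stale_local = KsMeta(replication_factor=1, timestamp=100)  # sender still holds the old RF=1

after_old = merge(cluster, attach_regenerated(stale_local, now=300))
after_new = merge(cluster, attach_stored(stale_local))

print(after_old)  # KsMeta(replication_factor=1, timestamp=300): stale value wins
print(after_new)  # KsMeta(replication_factor=2, timestamp=200): user change is kept
```

In this model the stale value only wins because it was re-stamped with a fresh timestamp; keeping the original timestamp lets the receiver reject it.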
…d tracing keyspaces" from Tomek "If a node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have a higher timestamp. To prevent that, let's use the minimum timestamp so that the default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129." * tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev: db: Create default auth and tracing keyspaces using lowest timestamp migration_manager: Append actual keyspace mutations with schema notifications
…d tracing keyspaces" from Tomek "If a node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have a higher timestamp. To prevent that, let's use the minimum timestamp so that the default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129." * tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev: db: Create default auth and tracing keyspaces using lowest timestamp migration_manager: Append actual keyspace mutations with schema notifications (cherry picked from commit 6db6d25)
…stamp This was needed to fix issue scylladb#2129, which only manifested itself with auto_bootstrap set to false. The option is ignored now and we always wait for schema to sync during boot.
…ions in `start()` When creating internal distributed tables in `system_distributed_keyspace::start()`, hardcoded timestamps were used. This was to protect against issue scylladb#2129, where nodes would start without synchronizing schema with the existing cluster, creating the tables again, which would override any manual user changes to these tables. The solution was to use small timestamps (like api::min_timestamp) - the user-created schema mutations would always 'win' (because when they were created, they used current time). This workaround is no longer necessary: when nodes start they always have to sync schema with existing nodes; we also don't allow bootstrapping nodes in parallel. When schema changes are performed by Raft group 0, certain constraints are placed on the timestamps used for mutations. For this we'll need to be able to use timestamps which are generated based on current time.
…ions in `start()` When creating or updating internal distributed tables in `system_distributed_keyspace::start()`, hardcoded timestamps were used. There were two reasons for this:
- to protect against issue scylladb#2129, where nodes would start without synchronizing schema with the existing cluster, creating the tables again, which would override any manual user changes to these tables. The solution was to use small timestamps (like api::min_timestamp) - the user-created schema mutations would always 'win' (because when they were created, they used current time).
- to eliminate unnecessary schema sync. If two nodes created these tables concurrently with different timestamps, the schemas would formally be different and would need to merge. This could happen during upgrades when we upgraded from a version which doesn't have these tables or doesn't have some columns.
The scylladb#2129 workaround is no longer necessary: when nodes start they always have to sync schema with existing nodes; we also don't allow bootstrapping nodes in parallel.
The second problem would happen during parallel bootstrap, which we don't allow, or during parallel upgrade. The procedure we recommend is rolling upgrade - where nodes are upgraded one by one. In this case only one node is going to create/update the tables; following upgraded nodes will sync schema first and notice they don't need to do anything. So if procedures are followed correctly, the workaround is not needed. If someone doesn't follow the procedures and upgrades nodes in parallel, these additional schema synchronizations are not a big cost, so the workaround doesn't give us much in this case either.
When schema changes are performed by Raft group 0, certain constraints are placed on the timestamps used for mutations. For this we'll need to be able to use timestamps which are generated based on current time.
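The rolling-upgrade argument above (only the first upgraded node creates the tables; later nodes sync first and find nothing to do) can be sketched with a toy model. This is an illustration under assumptions, not Scylla's startup code: `INTERNAL_TABLES`, `start_node`, the table names, and the timestamps are all hypothetical.

```python
# Hypothetical names for the internal distributed tables a new version needs.
INTERNAL_TABLES = ("internal_a", "internal_b")


def start_node(cluster_schema: dict, now: int) -> list:
    """Sync schema first, then create only the tables that are still missing.

    cluster_schema maps table name -> creation timestamp and stands in for the
    schema shared by the cluster. Returns the names of the tables this node created.
    """
    local = dict(cluster_schema)  # schema sync: pull what the cluster already has
    created = []
    for name in INTERNAL_TABLES:
        if name not in local:
            local[name] = now  # created with a current-time timestamp
            created.append(name)
    cluster_schema.update(local)  # announce any new tables to the cluster
    return created


cluster = {}
first = start_node(cluster, now=1000)   # first upgraded node creates both tables
second = start_node(cluster, now=2000)  # next node syncs first and creates nothing

print(first)    # ['internal_a', 'internal_b']
print(second)   # []
print(cluster)  # {'internal_a': 1000, 'internal_b': 1000}
```

Because the second node sees the tables after syncing, it produces no new schema mutations, so current-time timestamps cannot overwrite anything in this model.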
Add a background fiber that works to free memory using spare cycles, so that allocations don't have to evict cache synchronously. The shares for the fiber are increased the closer we are to running out of memory, preferring to steal cycles from the workload rather than encountering stalls. The last patch is not strictly related but is a good idea. See backport notes in the first patch. The others were trivial. Test: unit (dev) Ref scylladb#2113 Ref scylladb#2106 Ref scylladb#2071 Ref scylladb#2039 Closes scylladb#2129 * github.com:scylladb/scylla-enterprise: lsa: Mark compact_segment_locked() as noexcept lsa: Avoid excessive eviction if region is not compactible logalloc: fix quadratic behaviour of reclaim_from_evictable logalloc: reduce minimum lsa reserve in allocating_section to 1 main: start background reclaim before bootstrap Merge 'lsa: background reclaim' from Avi Kivity logalloc: background reclaim
Installation details
Scylla version (or git commit hash):
Cluster size:
OS (RHEL/CentOS/Ubuntu/AWS AMI):
It seems that when a new node boots up, we create the system_auth tables with RF=1 and only then allow the node to connect to the cluster.
Somehow the schema sync code then converges on the schema that the last node created, which has RF=1.
To reproduce:
authenticator: PasswordAuthenticator
ALTER KEYSPACE "system_auth" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
describe KEYSPACE system_auth;
authenticator: PasswordAuthenticator
nodetool describecluster
- the new node will not be synced
describe KEYSPACE system_auth;
- the old nodes should have RF=2, the added node RF=1
nodetool describecluster
describe KEYSPACE system_auth;
will show RF=1