Adding a new node with authentication set - causes system_auth keyspace RF to be set to 1 #2129

slivne · 2017-03-03T22:59:10Z

Installation details
Scylla version (or git commit hash):
Cluster size:
OS (RHEL/CentOS/Ubuntu/AWS AMI):

It seems that when a new node boots up we create the system_auth keyspace and tables with RF=1, and only then allow the node to connect to the cluster.

Somehow the schema sync code then aligns on the schema that the last node created, which has RF=1.

To reproduce:

create a cluster of 2 nodes
enable auth (in scylla.yaml insert authenticator: PasswordAuthenticator)
boot the cluster
alter system_auth via cqlsh: ALTER KEYSPACE "system_auth" WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
verify via describe KEYSPACE system_auth;
add a node (with authenticator: PasswordAuthenticator)
run nodetool describecluster - the new node will not be synced
connect with cqlsh to an old node and to the new node and run describe KEYSPACE system_auth; - the old nodes should have RF=2, the added node RF=1
wait for all nodes to sync on schema (nodetool describecluster)
connect with cqlsh to an old node and to the new node - describe KEYSPACE system_auth; will show RF=1


glommer · 2017-03-04T05:27:32Z

FYI: Another user has just reported this on our public slack.

tgrabiec · 2017-03-06T08:20:00Z

@slivne Schema sync works as expected, the latest change wins. Perhaps the problem is that we create the auth table before we do the sync with existing nodes?
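To make "the latest change wins" concrete, here is a minimal sketch of a last-write-wins merge. It is not Scylla code: the function name lww_merge and the "<timestamp>:<replication_factor>" encoding are assumptions made up for illustration. It only shows why a default keyspace definition written later than the user's ALTER would override it.

```shell
# Illustrative last-write-wins merge; NOT Scylla's implementation.
# Each version is encoded as "<timestamp>:<replication_factor>" (hypothetical format).
lww_merge() {
    ts_a="${1%%:*}"   # timestamp of the first version
    ts_b="${2%%:*}"   # timestamp of the second version
    if [ "$ts_b" -gt "$ts_a" ]; then
        echo "${2#*:}"
    else
        echo "${1#*:}"
    fi
}

# User's ALTER (RF=2) written at time 100; new node's default (RF=1) written at time 200.
lww_merge "100:2" "200:1"   # prints 1
```

With these made-up timestamps the later default wins, which matches the RF=1 result reported above.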

slivne · 2017-03-06T09:05:53Z

@tgrabiec - it could be ... Yet that will be an issue when upgrading and there are differences in the schema. I know we had a discussion lately about the id generation of the auth and traces and that you found that to make sure there is no conflict they provide a "static" id .... can you check what cassandra does - maybe we missed something


tgrabiec · 2017-03-06T13:08:54Z

@slivne I am not able to reproduce. Do you have logs for that run?

slivne · 2017-03-06T14:08:29Z

scylla

~/scylla/build/release/scylla --version
666.development-0.20170227.e20b804

ccm create scylla-2 --scylla --vnodes -n 2 --install-dir=/home/shlomi/scylla-ccm/../scylla
 vi ~/.ccm/scylla-2/node1/conf/scylla.yaml 
 vi ~/.ccm/scylla-2/node2/conf/scylla.yaml 
 ccm node1 start --jvm_arg="--logger-log-level" --jvm_arg="migration_manager=trace" --jvm_arg="--logger-log-level" --jvm_arg="schema_tables=trace" --jvm_arg="--logger-log-level" --jvm_arg="storage_service=trace"
 ccm node2 start --jvm_arg="--logger-log-level" --jvm_arg="migration_manager=trace" --jvm_arg="--logger-log-level" --jvm_arg="schema_tables=trace" --jvm_arg="--logger-log-level" --jvm_arg="storage_service=trace"
 ccm node1 cqlsh -u cassandra -p cassandra
ALTER KEYSPACE "system_auth"    WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
describe KEYSPACE system_auth;

CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;
.
.

 ccm add --scylla node3
 vi ~/.ccm/scylla-2/node3/conf/scylla.yaml 
 ccm node3 start --jvm_arg="--logger-log-level" --jvm_arg="migration_manager=trace" --jvm_arg="--logger-log-level" --jvm_arg="schema_tables=trace" --jvm_arg="--logger-log-level" --jvm_arg="storage_service=trace"
 ccm node1 cqlsh -u cassandra -p cassandra
describe KEYSPACE system_auth;

CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;

.
.

 ccm node3 cqlsh -u cassandra -p cassandra
describe KEYSPACE system_auth;

CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

.
.
 ccm node1 nodetool describecluster
(wait till they are in synch)
 ccm node1 cqlsh -u cassandra -p cassandra
describe KEYSPACE system_auth;

CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;
.
.
.

all_logs.tar.gz

tgrabiec · 2017-03-06T14:33:49Z

@slivne In your logs on node3 I can see:

Bootstrap variables: 0 0 0 0

Which means auto_bootstrap is false, and that's the reason why the node doesn't perform the schema sync.
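A quick way to check this before starting a node is to look at the setting in its scylla.yaml. The helper below is a hedged sketch: the name auto_bootstrap_setting is hypothetical, and it does a simple line match rather than real YAML parsing, so it assumes the key sits at the top level on its own line.

```shell
# Print the auto_bootstrap value found in a scylla.yaml-style file,
# or "unset" when the key is absent or commented out.
# Simplistic line match (assumption): not a full YAML parser.
auto_bootstrap_setting() {
    value=$(sed -n 's/^auto_bootstrap:[[:space:]]*\([A-Za-z]*\).*/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}

# Example call (path taken from the run above):
#   auto_bootstrap_setting ~/.ccm/scylla-2/node3/conf/scylla.yaml
```

If this prints false for the node being added, the node will behave as described in this comment and skip the schema sync.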

slivne · 2017-03-06T14:35:11Z

You are right ... I didn't change that in this run. Changing it now and trying again.


slivne · 2017-03-06T15:09:45Z

With auto_bootstrap: true the issue does not occur, so it seems there is no bug.

ccm create scylla-2 --scylla --vnodes -n 2 --install-dir=/home/shlomi/scylla-ccm/../scylla
echo "authenticator: PasswordAuthenticator" >> ~/.ccm/scylla-2/node1/conf/scylla.yaml 
echo "authenticator: PasswordAuthenticator" >> ~/.ccm/scylla-2/node2/conf/scylla.yaml 
ccm node1 start --jvm_arg="--logger-log-level" --jvm_arg="migration_manager=trace" --jvm_arg="--logger-log-level" --jvm_arg="schema_tables=trace" --jvm_arg="--logger-log-level" --jvm_arg="storage_service=trace"
ccm node2 start --jvm_arg="--logger-log-level" --jvm_arg="migration_manager=trace" --jvm_arg="--logger-log-level" --jvm_arg="schema_tables=trace" --jvm_arg="--logger-log-level" --jvm_arg="storage_service=trace"
ccm node1 nodetool describecluster
ccm node1 cqlsh -u cassandra -p cassandra -e "ALTER KEYSPACE system_auth    WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };"
ccm node1 cqlsh -u cassandra -p cassandra -e "describe keyspace system_auth"
ccm add --scylla node3 -b
echo "authenticator: PasswordAuthenticator" >> ~/.ccm/scylla-2/node3/conf/scylla.yaml 
ccm node3 start --jvm_arg="--logger-log-level" --jvm_arg="migration_manager=trace" --jvm_arg="--logger-log-level" --jvm_arg="schema_tables=trace" --jvm_arg="--logger-log-level" --jvm_arg="storage_service=trace"
ccm node1 cqlsh -u cassandra -p cassandra -e "describe keyspace system_auth"
while true 
do 
   ccm node3 cqlsh -u cassandra -p cassandra -e "describe keyspace system_auth"
   sleep 1
done

Yet maybe the customer also started their node in this manner (I think the AMI sets it to false by default).
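The polling loop in the run above can be given a pass/fail condition by extracting the replication factor from the describe output. This is a sketch under assumptions: the helper names replication_factor and wait_for_rf are hypothetical, and the pattern only matches the quoted 'replication_factor': 'N' form that cqlsh prints in the output shown earlier in this thread.

```shell
# Extract the first replication_factor value from "describe keyspace" output on stdin.
replication_factor() {
    sed -n "s/.*'replication_factor': '\([0-9][0-9]*\)'.*/\1/p" | head -n 1
}

# Poll a command until its output reports the expected RF, or give up.
# Usage: wait_for_rf EXPECTED ATTEMPTS COMMAND [ARGS...]
wait_for_rf() {
    expected="$1"; attempts="$2"; shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        rf=$("$@" 2>/dev/null | replication_factor)
        [ "$rf" = "$expected" ] && return 0
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Example call, reusing the command from the loop above:
#   wait_for_rf 2 60 ccm node3 cqlsh -u cassandra -p cassandra -e "describe keyspace system_auth"
```

A non-zero exit status after the given number of attempts would indicate that the new node never reported the expected RF.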

slivne · 2017-03-06T16:26:32Z

the customer is using the ami

by default the ami sets auto_bootstrap to false.
Once the instance booted, they stopped the instance and changed the setting to true, yet by that time it was too late.

I am checking if we have an ability in the ami to change the default bootstrap value.

slivne · 2017-03-06T16:30:35Z

So there is a way to start the AMI with auto_bootstrap set to true: passing --bootstrap in the user data setting will change the default.

https://github.com/scylladb/scylla-ami/blob/master/ds2_configure.py#L216

We need to update our documentation to reflect that

http://docs.scylladb.com/kb/add-nodes-on-ec2/

currently it only mentions seeds - we also need to specify bootstrap.

glommer · 2017-03-06T17:58:41Z

I would double check to make sure that bootstrap indeed only sets this parameter at first boot. If we weren't even aware of that, imagine how well tested this is? Imagine an AMI set with this being bootstrapped every time it boots...


tzach · 2017-03-06T18:20:55Z

I think the right way is to use an AMI parameter for not starting Scylla.
Then you can update the bootstrap, cluster name etc without affecting the running cluster.

tgrabiec · 2017-03-06T18:21:10Z

The EC2 procedure at http://docs.scylladb.com/kb/add-nodes-on-ec2/ is also missing other steps that are present in [1], like running nodetool cleanup. Since that page is EC2-specific, it should link to the general procedure.

[1] https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_node_to_cluster_t.html

tzach · 2017-03-06T18:41:10Z

The generic add node does include cleanup
http://docs.scylladb.com/procedures/add_node_to_cluster/

The KB section is out of date and needs to be revisited and moved under procedures.
https://github.com/scylladb/scylla-docs/issues/176

@nirmaayan FYI

tgrabiec · 2017-03-06T19:15:34Z

The problem doesn't reproduce on Cassandra 2.2, because they create the default keyspace metadata with timestamp=0, so it always loses to manual changes.

Since https://issues.apache.org/jira/browse/CASSANDRA-8853

tgrabiec · 2017-03-07T09:33:27Z

Actually, it seems to be like that since 1.2.2 (https://issues.apache.org/jira/browse/CASSANDRA-5112)
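The effect of writing the default keyspace metadata with timestamp 0 can be sketched as follows. This is an illustration only, not Cassandra or Scylla code: the function winning_rf and the "<timestamp> <replication_factor>" line format are assumptions, and the timestamps are made up.

```shell
# Illustrative reconciliation: given "<timestamp> <replication_factor>" lines on stdin,
# print the replication factor of the write with the highest timestamp.
winning_rf() {
    sort -n -k1,1 | tail -n 1 | cut -d' ' -f2
}

# Default written with the current time (200) after the user's ALTER (100): default wins.
printf '%s\n' '100 2' '200 1' | winning_rf   # prints 1

# Default written with timestamp 0: the user's ALTER always wins.
printf '%s\n' '100 2' '0 1' | winning_rf     # prints 2
```

Any timestamp lower than every user-generated one would behave the same way in this model.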

tgrabiec · 2017-03-07T12:04:18Z

Before we can apply a similar solution, we have to fix schema_tables::make_create_table_mutations() to not generate keyspace mutations using make_create_keyspace_mutations(), which generates keyspace mutations with current timestamp.

…ications There is a workaround for the notification race, which attaches keyspace mutations to other schema changes in case the target node missed the keyspace creation. Currently that workaround generates keyspace mutations on the spot instead of using the ones stored in schema tables. Those mutations would have the current timestamp, as if the keyspace had just been modified. This is problematic because it may overwrite keyspace parameters with a newer timestamp but stale values, if the node is not up to date with keyspace metadata. That's especially the case when booting up a node without enabling auto_bootstrap. In such a case the node will not wait for schema sync before creating auth tables. Such table creation will attach potentially out-of-date mutations for keyspace metadata, which may overwrite changes made to keyspace parameters earlier in the cluster. Refs #2129.

…d tracing keyspaces" from Tomek "If a node is bootstrapped with auto_bootstrap disabled, it will not wait for schema sync before creating global keyspaces for auth and tracing. When such schema changes are then reconciled with schema on other nodes, they may overwrite changes made by the user before the node was started, because they will have a higher timestamp. To prevent that, let's use the minimum timestamp so that the default schema always loses to manual modifications. This is what Cassandra does. Fixes #2129."
* tag 'tgrabiec/prevent-keyspace-metadata-loss-v1' of github.com:scylladb/seastar-dev:
  db: Create default auth and tracing keyspaces using lowest timestamp
  migration_manager: Append actual keyspace mutations with schema notifications

The same merge message is also referenced by a second commit (cherry picked from commit 6db6d25).

…stamp This was needed to fix issue scylladb#2129, which only manifested itself with auto_bootstrap set to false. The option is ignored now and we always wait for schema to sync during boot.

…ions in `start()` When creating internal distributed tables in `system_distributed_keyspace::start()`, hardcoded timestamps were used. This was to protect against issue scylladb#2129, where nodes would start without synchronizing schema with the existing cluster, creating the tables again, which would override any manual user changes to these tables. The solution was to use small timestamps (like api::min_timestamp) - the user-created schema mutations would always 'win' (because when they were created, they used current time). This workaround is no longer necessary: when nodes start they always have to sync schema with existing nodes; we also don't allow bootstrapping nodes in parallel. When schema changes are performed by Raft group 0, certain constraints are placed on the timestamps used for mutations. For this we'll need to be able to use timestamps which are generated based on current time.



…ions in `start()` When creating or updating internal distributed tables in `system_distributed_keyspace::start()`, hardcoded timestamps were used. There were two reasons for this:
- to protect against issue scylladb#2129, where nodes would start without synchronizing schema with the existing cluster, creating the tables again, which would override any manual user changes to these tables. The solution was to use small timestamps (like api::min_timestamp) - the user-created schema mutations would always 'win' (because when they were created, they used current time).
- to eliminate unnecessary schema sync. If two nodes created these tables concurrently with different timestamps, the schemas would formally be different and would need to merge. This could happen during upgrades when we upgraded from a version which doesn't have these tables or doesn't have some columns.

The scylladb#2129 workaround is no longer necessary: when nodes start they always have to sync schema with existing nodes; we also don't allow bootstrapping nodes in parallel.

The second problem would happen during parallel bootstrap, which we don't allow, or during parallel upgrade. The procedure we recommend is rolling upgrade, where nodes are upgraded one by one. In this case only one node is going to create/update the tables; the nodes upgraded afterwards will sync schema first and notice they don't need to do anything. So if procedures are followed correctly, the workaround is not needed. If someone doesn't follow the procedures and upgrades nodes in parallel, these additional schema synchronizations are not a big cost, so the workaround doesn't give us much in this case either.

When schema changes are performed by Raft group 0, certain constraints are placed on the timestamps used for mutations. For this we'll need to be able to use timestamps which are generated based on current time.


slivne assigned tgrabiec Mar 3, 2017

slivne added the type/bug label Mar 3, 2017

pdziepak closed this as completed in d6425e7 Mar 8, 2017
