Add replace-node-first-boot option #12316

Merged 13 commits on Jan 18, 2023
7 changes: 4 additions & 3 deletions db/config.cc
@@ -788,9 +788,10 @@ db::config::config(std::shared_ptr<db::extensions> exts)
, consistent_rangemovement(this, "consistent_rangemovement", value_status::Used, true, "When set to true, range movements will be consistent. It means: 1) it will refuse to bootstrap a new node if other bootstrapping/leaving/moving nodes detected. 2) data will be streamed to a new node only from the node which is no longer responsible for the token range. Same as -Dcassandra.consistent.rangemovement in cassandra")
, join_ring(this, "join_ring", value_status::Unused, true, "When set to true, a node will join the token ring. When set to false, a node will not join the token ring. User can use nodetool join to initiate ring joinging later. Same as -Dcassandra.join_ring in cassandra.")
, load_ring_state(this, "load_ring_state", value_status::Used, true, "When set to true, load tokens and host_ids previously saved. Same as -Dcassandra.load_ring_state in cassandra.")
- , replace_address(this, "replace_address", value_status::Used, "", "The listen_address or broadcast_address of the dead node to replace. Same as -Dcassandra.replace_address.")
- , replace_address_first_boot(this, "replace_address_first_boot", value_status::Used, "", "Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.")
- , ignore_dead_nodes_for_replace(this, "ignore_dead_nodes_for_replace", value_status::Used, "", "List dead nodes to ingore for replace operation using a comman-separated list of either host IDs or ip addresses. E.g., scylla --ignore-dead-nodes-for-replace 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c,125ed9f4-7777-1dbn-mac8-43fddce9123e")
+ , replace_node_first_boot(this, "replace_node_first_boot", value_status::Used, "", "The Host ID of a dead node to replace. If the replacing node has already been bootstrapped successfully, this option will be ignored.")
+ , replace_address(this, "replace_address", value_status::Used, "", "[[deprecated]] The listen_address or broadcast_address of the dead node to replace. Same as -Dcassandra.replace_address.")
+ , replace_address_first_boot(this, "replace_address_first_boot", value_status::Used, "", "[[deprecated]] Like replace_address option, but if the node has been bootstrapped successfully it will be ignored. Same as -Dcassandra.replace_address_first_boot.")
+ , ignore_dead_nodes_for_replace(this, "ignore_dead_nodes_for_replace", value_status::Used, "", "List dead nodes to ingore for replace operation using a comma-separated list of host IDs. E.g., scylla --ignore-dead-nodes-for-replace 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c,125ed9f4-7777-1dbn-mac8-43fddce9123e")
, override_decommission(this, "override_decommission", value_status::Used, false, "Set true to force a decommissioned node to join the cluster")
, enable_repair_based_node_ops(this, "enable_repair_based_node_ops", liveness::LiveUpdate, value_status::Used, true, "Set true to use enable repair based node operations instead of streaming based")
, allowed_repair_based_node_ops(this, "allowed_repair_based_node_ops", liveness::LiveUpdate, value_status::Used, "replace", "A comma separated list of node operations which are allowed to enable repair based node operations. The operations can be bootstrap, replace, removenode, decommission and rebuild")
1 change: 1 addition & 0 deletions db/config.hh
@@ -296,6 +296,7 @@ public:
named_value<bool> consistent_rangemovement;
named_value<bool> join_ring;
named_value<bool> load_ring_state;
+ named_value<sstring> replace_node_first_boot;
named_value<sstring> replace_address;
named_value<sstring> replace_address_first_boot;
named_value<sstring> ignore_dead_nodes_for_replace;
3 changes: 2 additions & 1 deletion dist/docker/commandlineparser.py
@@ -27,5 +27,6 @@ def parse():
parser.add_argument('--authorizer', default=None, dest='authorizer', help="Set authorizer class")
parser.add_argument('--cluster-name', default=None, dest='clusterName', help="Set cluster name")
parser.add_argument('--endpoint-snitch', default=None, dest='endpointSnitch', help="Set endpoint snitch")
- parser.add_argument('--replace-address-first-boot', default=None, dest='replaceAddressFirstBoot', help="IP address of a dead node to replace.")
+ parser.add_argument('--replace-node-first-boot', default=None, dest='replaceNodeFirstBoot', help="Host ID of a dead node to replace.")
+ parser.add_argument('--replace-address-first-boot', default=None, dest='replaceAddressFirstBoot', help="[[deprecated]] IP address of a dead node to replace.")
return parser.parse_known_args()
5 changes: 4 additions & 1 deletion dist/docker/scyllasetup.py
@@ -29,6 +29,7 @@ def __init__(self, arguments, extra_arguments):
self._clusterName = arguments.clusterName
self._endpointSnitch = arguments.endpointSnitch
self._replaceAddressFirstBoot = arguments.replaceAddressFirstBoot
+ self._replaceNodeFirstBoot = arguments.replaceNodeFirstBoot
self._io_setup = arguments.io_setup
self._extra_args = extra_arguments

@@ -153,7 +154,9 @@ def arguments(self):
if self._endpointSnitch is not None:
args += ["--endpoint-snitch %s" % self._endpointSnitch]

- if self._replaceAddressFirstBoot is not None:
+ if self._replaceNodeFirstBoot is not None:
+ args += ["--replace-node-first-boot %s" % self._replaceNodeFirstBoot]
+ elif self._replaceAddressFirstBoot is not None:
args += ["--replace-address-first-boot %s" % self._replaceAddressFirstBoot]

args += ["--blocked-reactor-notify-ms 999999999"]
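
The precedence applied in `scyllasetup.py` above (the Host ID option wins when both options are given, and the deprecated address option is used only as a fallback) can be sketched as a standalone script. This is a minimal sketch rather than the shipped code: `parse` takes an explicit `argv` list, and `replace_arguments` is a hypothetical helper name introduced here for illustration. Only the option names, `dest` values, and argument strings are taken from the diff.

```python
import argparse


def parse(argv):
    """Parse only the two replace options (sketch; the real parser has more)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--replace-node-first-boot', default=None,
                        dest='replaceNodeFirstBoot',
                        help="Host ID of a dead node to replace.")
    parser.add_argument('--replace-address-first-boot', default=None,
                        dest='replaceAddressFirstBoot',
                        help="[[deprecated]] IP address of a dead node to replace.")
    # parse_known_args returns (namespace, leftover_args), as in the diff.
    return parser.parse_known_args(argv)


def replace_arguments(arguments):
    """Hypothetical helper: build the replace-related Scylla arguments.

    The Host ID option takes precedence; the deprecated address option
    is only consulted when no Host ID was supplied.
    """
    if arguments.replaceNodeFirstBoot is not None:
        return ["--replace-node-first-boot %s" % arguments.replaceNodeFirstBoot]
    elif arguments.replaceAddressFirstBoot is not None:
        return ["--replace-address-first-boot %s" % arguments.replaceAddressFirstBoot]
    return []


if __name__ == "__main__":
    ns, _ = parse(["--replace-node-first-boot",
                   "8d5ed9f4-7764-4dbd-bad8-43fddce94b7c",
                   "--replace-address-first-boot", "192.168.1.203"])
    print(replace_arguments(ns))
```

Because of the `if`/`elif` ordering, passing both options never produces both arguments; the address value is silently dropped in favor of the Host ID.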
4 changes: 2 additions & 2 deletions docs/operating-scylla/admin.rst
@@ -80,8 +80,8 @@ The following addresses can be configured in scylla.yaml:
  - Address for REST API requests. See api_port in the :ref:`Networking <cqlsh-networking>` parameters.
  * - prometheus_address
  - Address for Prometheus queries. See prometheus_port in the :ref:`Networking <cqlsh-networking>` parameters and `ScyllaDB Monitoring Stack <https://monitoring.docs.scylladb.com/stable/>`_ for more details.
- * - replace_address_first_boot
- - Address of the node this Scylla instance is meant to replace. Refer to :doc:`Replace a Dead Node in a Scylla Cluster </operating-scylla/procedures/cluster-management/replace-dead-node>` for more details.
+ * - replace_node_first_boot
+ - Host ID of a dead node this Scylla node is replacing. Refer to :doc:`Replace a Dead Node in a Scylla Cluster </operating-scylla/procedures/cluster-management/replace-dead-node>` for more details.

.. note:: When the listen_address, rpc_address, broadcast_address, and broadcast_rpc_address parameters are not set correctly, Scylla does not work as expected.

@@ -12,8 +12,8 @@ To recover the data and rebuild the node, follow this procedure:

#. Open the ``/etc/scylla/scylla.yaml`` file.

- #. Add, if not present,else edit, the ``replace_address_first_boot`` parameter and change it to the
- IP of the node before it restarted it might be the same IP after restart.
+ #. Add, if not present, else edit, the ``replace_node_first_boot`` parameter and change it to the
+ Host ID of the node before it restarted.
#. Stop Scylla Server

.. include:: /rst_include/scylla-commands-stop-index.rst
@@ -23,6 +23,6 @@ To recover the data and rebuild the node, follow this procedure:

.. include:: /rst_include/scylla-commands-start-index.rst

- #. Revert the ``replace_address_first_boot`` setting to what they were before you ran this procedure.
- For ease of use, you can comment out the ``replace_address_first_boot`` parameter.
+ #. Revert the ``replace_node_first_boot`` setting to what they were before you ran this procedure.
+ For ease of use, you can comment out the ``replace_node_first_boot`` parameter.

@@ -66,11 +66,11 @@ Procedure

  - **rpc_address** - Address for client connection (Thrift, CQL)

- #. Add the ``replace_address_first_boot`` parameter to the ``scylla.yaml`` config file on the new node. This line can be added to any place in the config file. After a successful node replacement, there is no need to remove it from the ``scylla.yaml`` file. (Note: The obsolete parameter "replace_address" is not supported and should not be used). The value of the ``replace_address_first_boot`` parameter should be the IP address of the node to be replaced.
+ #. Add the ``replace_node_first_boot`` parameter to the ``scylla.yaml`` config file on the new node. This line can be added to any place in the config file. After a successful node replacement, there is no need to remove it from the ``scylla.yaml`` file. (Note: The obsolete parameters "replace_address" and "replace_address_first_boot" are not supported and should not be used). The value of the ``replace_node_first_boot`` parameter should be the Host ID of the node to be replaced.

- For example (using the address of the failed node from above):
+ For example (using the Host ID of the failed node from above):

- ``replace_address_first_boot: 192.168.1.203``
+ ``replace_node_first_boot: 675ed9f4-6564-6dbd-can8-43fddce952gy``

#. Start Scylla node.

@@ -171,11 +171,11 @@ In this case, the node's data will be cleaned after restart. To remedy this, you

sudo sed -e '/.*scylla/s/^/#/g' -i /etc/fstab

- #. Run the following command, replacing 172.30.0.186 with the listen_address / rpc_address of the node that you are restarting:
+ #. Run the following command to replace the instance whose ephemeral volumes were erased (previously known by the Host ID of the node you are restarting) with the restarted instance. The restarted node will be assigned a new random Host ID.

.. code-block:: none

- echo 'replace_address_first_boot: 172.30.0.186' | sudo tee --append /etc/scylla/scylla.yaml
+ echo 'replace_node_first_boot: 675ed9f4-6564-6dbd-can8-43fddce952gy' | sudo tee --append /etc/scylla/scylla.yaml

#. Run the following command to re-setup RAID

4 changes: 4 additions & 0 deletions init.cc
@@ -50,6 +50,10 @@ std::set<gms::inet_address> get_seeds_from_db_config(const db::config& cfg) {
startlog.error("Use broadcast_address instead of listen_address for seeds list");
throw std::runtime_error("Use broadcast_address for seeds list");
}
+ if (!cfg.replace_node_first_boot().empty() && seeds.contains(broadcast_address)) {
+ startlog.error("Bad configuration: replace-node-first-boot is not allowed for seed nodes");
+ throw bad_configuration_error();
+ }
if ((!cfg.replace_address_first_boot().empty() || !cfg.replace_address().empty()) && seeds.contains(broadcast_address)) {
startlog.error("Bad configuration: replace-address and replace-address-first-boot are not allowed for seed nodes");
throw bad_configuration_error();
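
The seed-node rule enforced in this hunk (a node that lists its own broadcast address among the seeds may not use any of the replace options) can be restated as a short Python sketch. Everything in it is illustrative: the `cfg` dictionary, `BadConfigurationError`, and `check_replace_options` are hypothetical stand-ins for `db::config`, `bad_configuration_error`, and the C++ function body. Only the option names and error messages are taken from the diff.

```python
class BadConfigurationError(Exception):
    """Hypothetical stand-in for the C++ bad_configuration_error."""


def check_replace_options(cfg, seeds, broadcast_address):
    """Reject replace options on a node that is itself a seed.

    cfg is a plain dict of option name -> string; an empty or missing
    value means the option is unset, mirroring the .empty() checks.
    """
    is_seed = broadcast_address in seeds

    if cfg.get("replace_node_first_boot", "") and is_seed:
        raise BadConfigurationError(
            "replace-node-first-boot is not allowed for seed nodes")

    if (cfg.get("replace_address_first_boot", "")
            or cfg.get("replace_address", "")) and is_seed:
        raise BadConfigurationError(
            "replace-address and replace-address-first-boot "
            "are not allowed for seed nodes")


if __name__ == "__main__":
    seeds = {"192.168.1.201", "192.168.1.202"}
    # A non-seed replacing node passes the check.
    check_replace_options(
        {"replace_node_first_boot": "8d5ed9f4-7764-4dbd-bad8-43fddce94b7c"},
        seeds, "192.168.1.204")
    print("non-seed replacing node accepted")
```

The new check is placed before the existing address-based one, so a seed node that sets both kinds of option reports the `replace-node-first-boot` error first.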