refactor(test configuration): network interface configuration
We need to configure Scylla networking with multiple NIC/IP combinations.
Scylla connections are configured through several addresses: rpc_address, listen_address,
broadcast_address, and broadcast_rpc_address.

We want to be able to use a different NIC/IP for each address, at least for rpc_address and
listen_address:
- rpc_address: ipv4, private, nic 0
- listen_address: ipv4, public, nic 1

This commit introduces the SCT configuration changes needed to support this.

According to issue scylladb/scylla-manager#3411
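
As a rough illustration of the idea, the per-address settings can be thought of as small records looked up by address name. The sketch below is not SCT code: `AddressSettings` and `settings_for` are hypothetical names, and the values only mirror the rpc_address/listen_address example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressSettings:
    """One entry of a network configuration (hypothetical model)."""
    address: str   # e.g. "rpc_address"
    ip_type: str   # "ipv4" or "ipv6"
    public: bool   # True for a public IP, False for a private one
    nic: int       # network interface device index


# The combination described in the commit message.
EXAMPLE = (
    AddressSettings(address="rpc_address", ip_type="ipv4", public=False, nic=0),
    AddressSettings(address="listen_address", ip_type="ipv4", public=True, nic=1),
)


def settings_for(entries, address):
    """Return the settings entry for the given address name."""
    for entry in entries:
        if entry.address == address:
            return entry
    raise KeyError(f"no settings defined for {address}")
```

With this model, `settings_for(EXAMPLE, "listen_address").nic` gives the interface index that the listen address is expected to use.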
juliayakovlev committed Nov 29, 2023
1 parent e798bb6 commit 1505d6f
Showing 52 changed files with 1,010 additions and 145 deletions.
28 changes: 28 additions & 0 deletions configurations/network_config/all_ipv6_public.yaml
@@ -0,0 +1,28 @@
scylla_network_config:
  - address: listen_address # Address Scylla listens for connections from other nodes. See storage_port and ssl_storage_ports.
    ip_type: ipv6
    public: true
    listen_all: false # Should be True when multiple interfaces - Scylla should be listening on all interfaces
    use_dns: false
    nic: 0
  - address: rpc_address # Address on which Scylla is going to expect Thrift and CQL client connections.
    ip_type: ipv6
    public: true
    listen_all: false # Should be True when multiple interfaces - Scylla should be listening on all interfaces
    use_dns: false
    nic: 0
  - address: broadcast_rpc_address # Address that is broadcasted to tell the clients to connect to. Related to rpc_address.
    ip_type: ipv6
    public: true # Should be False when multiple interfaces
    use_dns: false
    nic: 0
  - address: broadcast_address # Address that is broadcasted to tell other Scylla nodes to connect to. Related to listen_address above.
    ip_type: ipv6
    public: true # Should be False when multiple interfaces
    use_dns: false
    nic: 0 # If ipv4 and public is True it has to be primary network interface (device index is 0)
  - address: test_communication # Type of IP used to connect to machine instances
    ip_type: ipv6
    public: true
    use_dns: false
    nic: 0 # If ipv4 and public is True it has to be primary network interface (device index is 0)
28 changes: 28 additions & 0 deletions configurations/network_config/test_communication_public.yaml
@@ -0,0 +1,28 @@
scylla_network_config:
  - address: listen_address # Address Scylla listens for connections from other nodes. See storage_port and ssl_storage_ports.
    ip_type: ipv4
    public: false
    listen_all: false # Should be True when multiple interfaces - Scylla should be listening on all interfaces
    use_dns: false
    nic: 0
  - address: rpc_address # Address on which Scylla is going to expect Thrift and CQL client connections.
    ip_type: ipv4
    public: false
    listen_all: false # Should be True when multiple interfaces - Scylla should be listening on all interfaces
    use_dns: false
    nic: 0
  - address: broadcast_rpc_address # Address that is broadcasted to tell the clients to connect to. Related to rpc_address.
    ip_type: ipv4
    public: false # Should be False when multiple interfaces
    use_dns: false
    nic: 0
  - address: broadcast_address # Address that is broadcasted to tell other Scylla nodes to connect to. Related to listen_address above.
    ip_type: ipv4
    public: false # Should be False when multiple interfaces
    use_dns: false
    nic: 0 # If ipv4 and public is True it has to be primary network interface (device index is 0)
  - address: test_communication # Type of IP used to connect to machine instances
    ip_type: ipv4
    public: true
    use_dns: false
    nic: 0 # If ipv4 and public is True it has to be primary network interface (device index is 0)
28 changes: 28 additions & 0 deletions configurations/network_config/two_interfaces.yaml
@@ -0,0 +1,28 @@
scylla_network_config:
  - address: listen_address # Address Scylla listens for connections from other nodes. See storage_port and ssl_storage_ports.
    ip_type: ipv4
    public: false
    listen_all: true # Should be True when multiple interfaces - Scylla should be listening on all interfaces
    use_dns: false
    nic: 1
  - address: rpc_address # Address on which Scylla is going to expect Thrift and CQL client connections.
    ip_type: ipv4
    public: false
    listen_all: true # Should be True when multiple interfaces - Scylla should be listening on all interfaces
    use_dns: false
    nic: 1
  - address: broadcast_rpc_address # Address that is broadcasted to tell the clients to connect to. Related to rpc_address.
    ip_type: ipv4
    public: false # Should be False when multiple interfaces
    use_dns: false
    nic: 1
  - address: broadcast_address # Address that is broadcasted to tell other Scylla nodes to connect to. Related to listen_address above.
    ip_type: ipv4
    public: false # Should be False when multiple interfaces
    use_dns: false
    nic: 1 # If ipv4 and public is True it has to be primary network interface (device index is 0)
  - address: test_communication # Type of IP used to connect to machine instances
    ip_type: ipv4
    public: true
    use_dns: false
    nic: 0 # If ipv4 and public is True it has to be primary network interface (device index is 0)
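
The inline comments in these configuration files state a few constraints: `listen_all` should be true when multiple interfaces are used, the broadcast addresses should not be public in that case, and a public ipv4 address has to sit on the primary interface (device index 0). The following is a minimal sketch of how such rules could be checked, assuming the entries have already been loaded into plain dictionaries. `check_network_config` is a hypothetical helper, not part of SCT, and the real validation may differ.

```python
def check_network_config(entries):
    """Return a list of problems found in a scylla_network_config-style list.

    The rules are taken from the comments in the YAML files; this is an
    illustrative check only, not the validation SCT performs.
    """
    problems = []
    # "Multiple interfaces" is assumed to mean that more than one nic index is used.
    multiple_nics = len({entry["nic"] for entry in entries}) > 1
    for entry in entries:
        name = entry["address"]
        if entry["ip_type"] == "ipv4" and entry["public"] and entry["nic"] != 0:
            problems.append(f"{name}: public ipv4 must use nic 0")
        if multiple_nics and entry.get("listen_all") is False:
            problems.append(f"{name}: listen_all should be true with multiple interfaces")
        if multiple_nics and name.startswith("broadcast_") and entry["public"]:
            problems.append(f"{name}: public should be false with multiple interfaces")
    return problems


# Values copied from configurations/network_config/two_interfaces.yaml (use_dns omitted).
TWO_INTERFACES = [
    {"address": "listen_address", "ip_type": "ipv4", "public": False, "listen_all": True, "nic": 1},
    {"address": "rpc_address", "ip_type": "ipv4", "public": False, "listen_all": True, "nic": 1},
    {"address": "broadcast_rpc_address", "ip_type": "ipv4", "public": False, "nic": 1},
    {"address": "broadcast_address", "ip_type": "ipv4", "public": False, "nic": 1},
    {"address": "test_communication", "ip_type": "ipv4", "public": True, "nic": 0},
]

print(check_network_config(TWO_INTERFACES))
```

Returning a list of messages rather than raising on the first violation lets a caller report every inconsistency in one pass.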
34 changes: 34 additions & 0 deletions defaults/aws_config.yaml
@@ -37,3 +37,37 @@ data_volume_disk_size: 500
 data_volume_disk_iops: 10000 # depend on type iops could be 100-16000 for io2|io3 and 3000-16000 for gp3

 kms_key_rotation_interval: 60
+
+# TODO: this part should be moved to defaults/test_default.yaml when network interfaces configuration is supported for all backends
+# NOT SUPPORTED CASES:
+# 1. Multi network interfaces with IPv6 type
+
+scylla_network_config:
+  - address: listen_address # Address Scylla listens for connections from other nodes. See storage_port and ssl_storage_ports.
+    listen_all: false # Should be True when multiple interfaces - Scylla should be listening on all interfaces
+    ip_type: ipv4
+    public: false # Only public IPv6 is supported by AWS
+    use_dns: false
+    nic: 0
+  - address: rpc_address # Address on which Scylla is going to expect Thrift and CQL client connections.
+    listen_all: false # Should be True when multiple interfaces - Scylla should be listening on all interfaces.
+    ip_type: ipv4
+    public: false # Only public IPv6 is supported by AWS
+    use_dns: false
+    nic: 0
+  - address: broadcast_rpc_address # Address that is broadcasted to tell the clients to connect to. Related to rpc_address.
+    ip_type: ipv4
+    public: false # Should be False when multiple interfaces; Only public IPv6 is supported by AWS
+    use_dns: false
+    nic: 0
+  - address: broadcast_address # Address that is broadcasted to tell other Scylla nodes to connect to. Related to listen_address above.
+    ip_type: ipv4
+    public: false # Should be False when multiple interfaces; Only public IPv6 is supported by AWS
+    use_dns: false
+    nic: 0
+  - address: test_communication # Type of IP used to connect from test to DB/monitor instances
+    ip_type: ipv4
+    public: false
+    use_dns: false
+    nic: 0 # If ipv4 and public is True it has to be primary network interface (device index is 0)
+# TODO: end
24 changes: 24 additions & 0 deletions internal_test_data/network_config_interface_not_defined.yaml
@@ -0,0 +1,24 @@
test_duration: 5

n_db_nodes: 3
n_loaders: 1
n_monitor_nodes: 1
user_prefix: 'fruch-testing'

stress_cmd: ["cassandra-stress mixed cl=QUORUM duration=10m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=SizeTieredCompactionStrategy)' -mode cql3 native -rate threads=2 -pop seq=1..3000 -log interval=5" ]

stress_read_cmd: ["cassandra-stress user profile=/tmp/c-s_profile_4mv_5queries.yaml ops'(insert=15,read1=1,read2=1,read3=1,read4=1,read5=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10",
"cassandra-stress user profile=/tmp/c-s_profile_2mv_2queries.yaml ops'(insert=6,mv_p_read1=1,mv_p_read2=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10",
"cassandra-stress user profile=/tmp/c-s_profile_3si_5queries.yaml ops'(insert=25,si_read1=1,si_read2=1,si_read3=1,si_read4=1,si_read5=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10",
"cassandra-stress user profile=/tmp/c-s_profile_2si_2queries.yaml ops'(insert=10,si_p_read1=1,si_p_read2=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10"
]

db_type: scylla
instance_type_db: 'i4i.large'

scylla_network_config:
  - address: "listen_address"
    ip_type: "ipv4"
    public: false
    use_dns: false
    nic: 0
28 changes: 28 additions & 0 deletions internal_test_data/network_config_interface_param_not_defined.yaml
@@ -0,0 +1,28 @@
test_duration: 5

n_db_nodes: 3
n_loaders: 1
n_monitor_nodes: 1
user_prefix: 'fruch-testing'

stress_cmd: ["cassandra-stress mixed cl=QUORUM duration=10m -schema 'replication(strategy=NetworkTopologyStrategy,replication_factor=3) compaction(strategy=SizeTieredCompactionStrategy)' -mode cql3 native -rate threads=2 -pop seq=1..3000 -log interval=5" ]

stress_read_cmd: ["cassandra-stress user profile=/tmp/c-s_profile_4mv_5queries.yaml ops'(insert=15,read1=1,read2=1,read3=1,read4=1,read5=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10",
"cassandra-stress user profile=/tmp/c-s_profile_2mv_2queries.yaml ops'(insert=6,mv_p_read1=1,mv_p_read2=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10",
"cassandra-stress user profile=/tmp/c-s_profile_3si_5queries.yaml ops'(insert=25,si_read1=1,si_read2=1,si_read3=1,si_read4=1,si_read5=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10",
"cassandra-stress user profile=/tmp/c-s_profile_2si_2queries.yaml ops'(insert=10,si_p_read1=1,si_p_read2=1)' cl=QUORUM duration=5760m -mode cql3 native -rate threads=10"
]

db_type: scylla
instance_type_db: 'i4i.large'

scylla_network_config:
  - address: "listen_address"
    ip_type: "ipv4"
    use_dns: false
    nic: 0
  - address: "rpc_address"
    ip_type: "ipv6"
    public: true
    use_dns: false
    nic: 0
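
Judging by their file names, the two internal_test_data files above exercise incomplete configurations: one defines only a single address, the other omits a parameter (`public`) from an entry. A minimal sketch of how such gaps could be detected follows. The required lists are an assumption taken from the entries in defaults/aws_config.yaml, and `find_missing` is a hypothetical helper rather than the check SCT actually runs.

```python
# Assumed to be required because every default entry defines them.
REQUIRED_ADDRESSES = (
    "listen_address",
    "rpc_address",
    "broadcast_rpc_address",
    "broadcast_address",
    "test_communication",
)
REQUIRED_PARAMS = ("ip_type", "public", "use_dns", "nic")


def find_missing(entries):
    """Return (missing address names, {address: [missing parameter names]})."""
    defined = {entry.get("address") for entry in entries}
    missing_addresses = [name for name in REQUIRED_ADDRESSES if name not in defined]
    missing_params = {}
    for entry in entries:
        absent = [param for param in REQUIRED_PARAMS if param not in entry]
        if absent:
            missing_params[entry.get("address")] = absent
    return missing_addresses, missing_params


# Entries mirroring network_config_interface_param_not_defined.yaml.
PARAM_NOT_DEFINED = [
    {"address": "listen_address", "ip_type": "ipv4", "use_dns": False, "nic": 0},
    {"address": "rpc_address", "ip_type": "ipv6", "public": True, "use_dns": False, "nic": 0},
]

missing_addresses, missing_params = find_missing(PARAM_NOT_DEFINED)
print(missing_addresses)
print(missing_params)
```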
3 changes: 1 addition & 2 deletions jenkins-pipelines/longevity-10gb-3h-ipv6.jenkinsfile
@@ -6,8 +6,7 @@ def lib = library identifier: 'sct@snapshot', retriever: legacySCM(scm)
 longevityPipeline(
     backend: 'aws',
     region: 'eu-west-1',
-    ip_ssh_connections: 'ipv6',
     test_name: 'longevity_test.LongevityTest.test_custom_time',
-    test_config: 'test-cases/longevity/longevity-10gb-3h.yaml'
+    test_config: '''["test-cases/longevity/longevity-10gb-3h.yaml", "configurations/network_config/all_ipv6_public.yaml"]'''

 )
@@ -7,6 +7,6 @@ longevityPipeline(
     backend: 'aws',
     region: 'eu-west-1',
     test_name: 'longevity_test.LongevityTest.test_custom_time',
-    test_config: 'test-cases/longevity/longevity-200GB-48h-network-monkey.yaml',
+    test_config: '''["test-cases/longevity/longevity-200GB-48h-network-monkey.yaml", "configurations/network_config/two_interfaces.yaml"]''',
     ip_ssh_connections: 'public'
 )
@@ -5,10 +5,9 @@ def lib = library identifier: 'sct@snapshot', retriever: legacySCM(scm)

 managerPipeline(
     backend: 'aws',
-    ip_ssh_connections: 'ipv6',
     region: 'us-east-1',
     test_name: 'mgmt_cli_test.MgmtCliTest.test_manager_sanity',
-    test_config: 'test-cases/manager/manager-regression-ipv6.yaml',
+    test_config: '''["test-cases/manager/manager-regression-ipv6.yaml", "configurations/network_config/all_ipv6_public.yaml"]''',

     post_behavior_db_nodes: 'destroy',
     post_behavior_loader_nodes: 'destroy',
@@ -7,6 +7,6 @@ longevityPipeline(
     backend: 'aws',
     region: 'eu-west-1',
     test_name: 'longevity_test.LongevityTest.test_custom_time',
-    test_config: '''["test-cases/longevity/longevity-200GB-48h-network-monkey.yaml", "configurations/raft/enable_raft_experimental.yaml"]''',
+    test_config: '''["test-cases/longevity/longevity-200GB-48h-network-monkey.yaml", "configurations/raft/enable_raft_experimental.yaml", "configurations/network_config/two_interfaces.yaml"]''',
     ip_ssh_connections: 'public'
 )
39 changes: 36 additions & 3 deletions sdcm/cluster.py
@@ -60,6 +60,7 @@
 from sdcm.prometheus import start_metrics_server, PrometheusAlertManagerListener, AlertSilencer
 from sdcm.log import SDCMAdapter
 from sdcm.provision.common.configuration_script import ConfigurationScriptBuilder
+from sdcm.provision.network_configuration import ssh_connection_ip_type
 from sdcm.provision.scylla_yaml import ScyllaYamlNodeAttrBuilder
 from sdcm.provision.scylla_yaml.certificate_builder import ScyllaYamlCertificateAttrBuilder

@@ -233,8 +234,8 @@ class BaseNode(AutoSshContainerMixin, WebDriverContainerMixin): # pylint: disab
     GOSSIP_STATUSES_FILTER_OUT = ['LEFT', # in case the node was decommissioned
                                   'removed', # in case the node was removed by nodetool removenode
                                   'BOOT', # node during boot and not exists in the cluster yet and they will remain
-                                  # in the gossipinfo 3 days.
-                                  # It's expected behaviour and we won't send the error in this case
+                                          # in the gossipinfo 3 days.
+                                          # It's expected behaviour and we won't send the error in this case
                                   'shutdown' # when node was removed it may take more time to update the gossip info
                                   ]

@@ -292,6 +293,11 @@ def __init__(self, name, parent_cluster, ssh_login_info=None, base_logdir=None,

         self._kernel_version = None
         self._uuid = None
+        self.scylla_network_configuration = None
+
+    @property
+    def network_interfaces(self):
+        raise NotImplementedError()

     def init(self) -> None:
         if self.logdir:
@@ -725,6 +731,7 @@ def is_enterprise(self):

     @property
     def public_ip_address(self) -> Optional[str]:
+        # Primary network interface public IP
         if self._public_ip_address_cached is None:
             self._public_ip_address_cached = self._get_public_ip_address()
         return self._public_ip_address_cached
@@ -746,6 +753,7 @@ def _get_public_ip_address(self) -> Optional[str]:

     @property
     def private_ip_address(self) -> Optional[str]:
+        # Primary network interface private IP
         if self._private_ip_address_cached is None:
             self._private_ip_address_cached = self._get_private_ip_address()
         return self._private_ip_address_cached
@@ -759,6 +767,7 @@ def _get_private_ip_address(self) -> Optional[str]:

     @property
     def ipv6_ip_address(self) -> Optional[str]:
+        # Primary network interface public IPv6
         if self._ipv6_ip_address_cached is None:
             self._ipv6_ip_address_cached = self._get_ipv6_ip_address()
         return self._ipv6_ip_address_cached
@@ -782,18 +791,28 @@ def _wait_private_ip(self):
             time.sleep(1)
             _, private_ips = self._refresh_instance_state()

+    def refresh_network_interfaces_info(self):
+        raise NotImplementedError()
+
     def _refresh_instance_state(self):
         raise NotImplementedError()

     @cached_property
     def cql_address(self):
+        # TODO: when new network configuration will be supported by all backends, take `cql_address` function from
+        # `sdcm.cluster_aws.AWSNode.cql_address`, move it here and remove from all cluster modules
         if self.test_config.IP_SSH_CONNECTIONS == 'public':
+            self.log.debug("cql_address is: %s", self.external_address)
             return self.external_address
         with self.remote_scylla_yaml() as scylla_yaml:
-            return scylla_yaml.broadcast_rpc_address if scylla_yaml.broadcast_rpc_address else self.ip_address
+            cql_address = scylla_yaml.broadcast_rpc_address if scylla_yaml.broadcast_rpc_address else self.ip_address
+            self.log.debug("cql_address is: %s", cql_address)
+            return cql_address

     @property
     def ip_address(self):
+        # TODO: when new network configuration will be supported by all backends, take `ip_address` function from
+        # `sdcm.cluster_aws.AWSNode.ip_address`, move it here and remove from all cluster modules
         if self.test_config.IP_SSH_CONNECTIONS == "ipv6":
             return self.ipv6_ip_address
         elif self.test_config.INTRA_NODE_COMM_PUBLIC:
@@ -803,6 +822,8 @@ def ip_address(self):

     @property
     def external_address(self):
+        # TODO: when new network configuration will be supported by all backends, take `external_address` function from
+        # `sdcm.cluster_aws.AWSNode.external_address`, move it here and remove from all cluster modules
         """
         the communication address for usage between the test and the nodes
         :return:
@@ -1265,6 +1286,8 @@ def wait_db_up(self, verbose=True, timeout=3600):
     def is_manager_agent_up(self, port=None):
         port = port if port else self.MANAGER_AGENT_PORT
         # When the agent is IP, it should answer an https request of https://NODE_IP:10001/ping with status code 204
+        url = f"https://{normalize_ipv6_url(self.external_address)}:{port}/ping"
+        self.log.debug("Manager agent URL: %s", url)
         response = requests.get(f"https://{normalize_ipv6_url(self.external_address)}:{port}/ping", verify=False)
         return response.status_code == 204

@@ -5373,6 +5396,8 @@ def reconfigure_scylla_monitoring(self):
             for db_node in self.targets["db_cluster"].nodes:
                 monitoring_targets.append(f"{normalize_ipv6_url(getattr(db_node, self.DB_NODES_IP_ADDRESS))}:9180")
             monitoring_targets = " ".join(monitoring_targets)
+            if ssh_connection_ip_type(self.params) != "ipv6":
+                monitoring_targets = monitoring_targets.replace("[", "").replace("]", "")
             node.remoter.sudo(shell_script_cmd(f"""\
                 cd {self.monitor_install_path}
                 mkdir -p {self.monitoring_conf_dir}
@@ -5610,6 +5635,10 @@ def __init__(self, name, parent_cluster, # pylint: disable=too-many-arguments,
         super().__init__(name=name, parent_cluster=parent_cluster, ssh_login_info=ssh_login_info,
                          base_logdir=base_logdir, node_prefix=node_prefix, dc_idx=dc_idx, rack=rack)

+    @property
+    def network_interfaces(self):
+        raise NotImplementedError()
+
     def _init_port_mapping(self):
         pass

@@ -5655,3 +5684,7 @@ def private_dns_name(self):
     @property
     def public_dns_name(self) -> str:
         raise NotImplementedError()
+
+    @property
+    def network_interfaces(self):
+        raise NotImplementedError()
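
The reconfigure_scylla_monitoring hunk above strips square brackets from the joined monitoring targets whenever the SSH connection IP type is not ipv6. An alternative way to think about the same concern is to decide per address whether brackets are needed when building a `host:port` string. The sketch below uses only the standard library; `format_target` is a hypothetical helper and is not the `normalize_ipv6_url` function used in SCT, whose exact behaviour is not shown here.

```python
import ipaddress


def format_target(host: str, port: int) -> str:
    """Build a host:port string, bracketing the host only when it is an IPv6 literal."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a DNS name): no brackets needed.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


# 9180 is the port used for the monitoring targets in the hunk above.
targets = " ".join(format_target(host, 9180) for host in ("10.0.0.1", "2001:db8::1"))
print(targets)
```

Deciding per address avoids a blanket string replacement on the joined list, at the cost of parsing each host once.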
