Skip to content

Commit

Permalink
Suppress IPv6 metadata DAD failure and delete address
Browse files Browse the repository at this point in the history
IPv4 DAD is non-existent in Linux or its failure is silent, so we
never needed to catch and ignore it. On the other hand IPv6 DAD
failure is explicit, hence comes this change.

This of course leaves the metadata service dead on hosts where
duplicate address detection failed. But if we catch the
DADFailed exception and delete the address, at least other
functions of the dhcp-agent should not be affected.

With this the IPv6 isolated metadata service is not redundant, which
is the best we can do without a redesign.

Also document the promised service level of isolated metadata.

Added additional tests for the metadata driver as well.

Conflicts:
    neutron/tests/unit/agent/linux/test_dhcp.py
        conflict with 74224e7
    neutron/tests/unit/agent/metadata/test_driver.py
        conflict with 3d575f8

Change-Id: I6b544c5528cb22e5e8846fc47dfb8b05f70f975c
Partial-Bug: #1953165
(cherry picked from commit 2aee961)
(cherry picked from commit 071255f)
  • Loading branch information
rubasov committed May 2, 2023
1 parent 92cfdb4 commit 1c61528
Show file tree
Hide file tree
Showing 11 changed files with 139 additions and 27 deletions.
32 changes: 32 additions & 0 deletions doc/source/admin/config-dhcp-ha.rst
Expand Up @@ -441,6 +441,38 @@ To test the HA of DHCP agent:

#. Start DHCP agent on HostB. The VM gets the wanted IP again.

No HA for metadata service on isolated networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All Neutron backends using the DHCP agent can also provide `metadata service
<https://docs.openstack.org/nova/latest/user/metadata.html>`_ in isolated
networks (i.e. networks without a router). In this case the DHCP agent manages
the metadata service (see config option `enable_isolated_metadata
<https://docs.openstack.org/neutron/latest/configuration/dhcp-agent.html#DEFAULT.enable_isolated_metadata>`_).

Note however that the metadata service is only redundant for IPv4, and not
IPv6, even when the DHCP service is configured to be highly available
(config option `dhcp_agents_per_network
<https://docs.openstack.org/neutron/latest/configuration/neutron.html#DEFAULT.dhcp_agents_per_network>`_
> 1). This is because the DHCP agent will insert a route to the well known
metadata IPv4 address (`169.254.169.254`) via its own IP address, so it will
be reachable as long as the DHCP service is available at that IP address.
This also means that recovery after a failure is tied to the renewal of the
DHCP lease, since that route will only change if the DHCP server for a VM
changes.

With IPv6, the well known metadata IPv6 address (`fe80::a9fe:a9fe`) is used,
but directly configured in the DHCP agent network namespace.
Due to the enforcement of duplicate address detection (DAD), this address
can only be configured in at most one DHCP network namespaces at any time.
See `RFC 4862 <https://www.rfc-editor.org/rfc/rfc4862#section-5.4>`_ for
details on the DAD process.

For this reason, even when you have multiple DHCP agents, an arbitrary one
(where the metadata IPv6 address is not in `dadfailed` state) will serve all
metadata requests over IPv6. When that metadata service instance becomes
unreachable there is no failover and the service will become unreachable.

Disabling and removing an agent
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Expand Down
3 changes: 2 additions & 1 deletion neutron/agent/linux/dhcp.py
Expand Up @@ -40,6 +40,7 @@
from neutron.agent.linux import ip_lib
from neutron.agent.linux import iptables_manager
from neutron.cmd import runtime_checks as checks
from neutron.common import _constants as common_constants
from neutron.common import utils as common_utils
from neutron.ipam import utils as ipam_utils
from neutron.privileged.agent.linux import dhcp as priv_dhcp
Expand Down Expand Up @@ -1773,7 +1774,7 @@ def setup(self, network):
if self.conf.force_metadata or self.conf.enable_isolated_metadata:
ip_cidrs.append(constants.METADATA_CIDR)
if netutils.is_ipv6_enabled():
ip_cidrs.append(constants.METADATA_V6_CIDR)
ip_cidrs.append(common_constants.METADATA_V6_CIDR)

self.driver.init_l3(interface_name, ip_cidrs,
namespace=network.namespace)
Expand Down
8 changes: 6 additions & 2 deletions neutron/agent/linux/ip_lib.py
Expand Up @@ -103,6 +103,10 @@ class AddressNotReady(exceptions.NeutronException):
"become ready: %(reason)s")


class DADFailed(AddressNotReady):
pass


InvalidArgument = privileged.InvalidArgument


Expand Down Expand Up @@ -581,7 +585,7 @@ def wait_until_address_ready(self, address, wait_time=30):
"""Wait until an address is no longer marked 'tentative' or 'dadfailed'
raises AddressNotReady if times out, address not present on interface
or DAD fails
raises DADFailed if Duplicate Address Detection fails
"""
def is_address_ready():
try:
Expand All @@ -593,7 +597,7 @@ def is_address_ready():
# Since both 'dadfailed' and 'tentative' will be set if DAD fails,
# check 'dadfailed' first just to be explicit
if addr_info['dadfailed']:
raise AddressNotReady(
raise DADFailed(
address=address, reason=_('Duplicate address detected'))
if addr_info['tentative']:
return False
Expand Down
30 changes: 26 additions & 4 deletions neutron/agent/metadata/driver.py
Expand Up @@ -33,6 +33,7 @@
from neutron.agent.linux import external_process
from neutron.agent.linux import ip_lib
from neutron.agent.linux import utils as linux_utils
from neutron.common import _constants as common_constants
from neutron.common import coordination
from neutron.common import utils as common_utils

Expand Down Expand Up @@ -266,9 +267,30 @@ def spawn_monitored_metadata_proxy(cls, monitor, ns_name, port, conf,
# HAProxy cannot bind() until IPv6 Duplicate Address Detection
# completes. We must wait until the address leaves its 'tentative'
# state.
ip_lib.IpAddrCommand(
parent=ip_lib.IPDevice(name=bind_interface, namespace=ns_name)
).wait_until_address_ready(address=bind_address_v6)
try:
ip_lib.IpAddrCommand(
parent=ip_lib.IPDevice(name=bind_interface,
namespace=ns_name)
).wait_until_address_ready(address=bind_address_v6)
except ip_lib.DADFailed as exc:
# This failure means that another DHCP agent has already
# configured this metadata address, so all requests will
# be via that single agent.
LOG.info('DAD failed for address %(address)s on interface '
'%(interface)s in namespace %(namespace)s on network '
'%(network)s, deleting it. Exception: %(exception)s',
{'address': bind_address_v6,
'interface': bind_interface,
'namespace': ns_name,
'network': network_id,
'exception': str(exc)})
try:
ip_lib.delete_ip_address(bind_address_v6, bind_interface,
namespace=ns_name)
except Exception as exc:
# do not re-raise a delete failure, just log
LOG.info('Address deletion failure: %s', str(exc))
return
pm.enable()
monitor.register(uuid, METADATA_SERVICE_NAME, pm)
cls.monitors[router_id] = pm
Expand Down Expand Up @@ -363,6 +385,6 @@ def apply_metadata_nat_rules(router, proxy):
if netutils.is_ipv6_enabled():
for c, r in proxy.metadata_nat_rules(
proxy.metadata_port,
metadata_address=(constants.METADATA_V6_IP + '/128')):
metadata_address=(common_constants.METADATA_V6_CIDR)):
router.iptables_manager.ipv6['nat'].add_rule(c, r)
router.iptables_manager.apply()
3 changes: 3 additions & 0 deletions neutron/common/_constants.py
Expand Up @@ -81,3 +81,6 @@

# The lowest binding index for L3 agents and DHCP agents.
LOWEST_AGENT_BINDING_INDEX = 1

# Neutron-lib defines this with a /64 but it should be /128
METADATA_V6_CIDR = constants.METADATA_V6_IP + '/128'
4 changes: 3 additions & 1 deletion neutron/conf/agent/database/agentschedulers_db.py
Expand Up @@ -32,7 +32,9 @@
'network. If this number is greater than 1, the '
'scheduler automatically assigns multiple DHCP agents '
'for a given tenant network, providing high '
'availability for DHCP service.')),
'availability for the DHCP service. However this does '
'not provide high availability for the IPv6 metadata '
'service in isolated networks.')),
cfg.BoolOpt('enable_services_on_agents_with_admin_state_down',
default=False,
help=_('Enable services on an agent with admin_state_up '
Expand Down
3 changes: 2 additions & 1 deletion neutron/tests/unit/agent/dhcp/test_agent.py
Expand Up @@ -37,6 +37,7 @@
from neutron.agent.linux import interface
from neutron.agent.linux import utils as linux_utils
from neutron.agent.metadata import driver as metadata_driver
from neutron.common import _constants as common_constants
from neutron.common import config as common_config
from neutron.common import utils
from neutron.conf.agent import common as config
Expand Down Expand Up @@ -1902,7 +1903,7 @@ def _test_setup_helper(self, device_is_ready, ipv6_enabled=True,
expected_ips = ['172.9.9.9/24', const.METADATA_CIDR]

if ipv6_enabled:
expected_ips.append(const.METADATA_V6_CIDR)
expected_ips.append(common_constants.METADATA_V6_CIDR)

expected = [mock.call.get_device_name(port)]

Expand Down
3 changes: 2 additions & 1 deletion neutron/tests/unit/agent/linux/test_dhcp.py
Expand Up @@ -32,6 +32,7 @@
from neutron.agent.linux import dhcp
from neutron.agent.linux import ip_lib
from neutron.cmd import runtime_checks as checks
from neutron.common import _constants as common_constants
from neutron.conf.agent import common as config
from neutron.conf.agent import dhcp as dhcp_config
from neutron.conf import common as base_config
Expand Down Expand Up @@ -3258,7 +3259,7 @@ def mock_update(port_id, dict):
if enable_isolated_metadata or force_metadata:
expect_ips.extend([
constants.METADATA_CIDR,
constants.METADATA_V6_CIDR])
common_constants.METADATA_V6_CIDR])
mgr.driver.init_l3.assert_called_with('ns-XXX',
expect_ips,
namespace='qdhcp-ns')
Expand Down
2 changes: 1 addition & 1 deletion neutron/tests/unit/agent/linux/test_ip_lib.py
Expand Up @@ -791,7 +791,7 @@ def test_wait_until_address_ready_non_existent_address(self):
def test_wait_until_address_dadfailed(self):
self.addr_cmd.list = mock.Mock(
return_value=[{'tentative': True, 'dadfailed': True}])
with testtools.ExpectedException(ip_lib.AddressNotReady):
with testtools.ExpectedException(ip_lib.DADFailed):
self.addr_cmd.wait_until_address_ready('abcd::1234')

@mock.patch.object(common_utils, 'wait_until_true')
Expand Down
62 changes: 46 additions & 16 deletions neutron/tests/unit/agent/metadata/test_driver.py
Expand Up @@ -24,6 +24,7 @@

from neutron.agent.l3 import agent as l3_agent
from neutron.agent.l3 import router_info
from neutron.agent.linux import ip_lib
from neutron.agent.linux import iptables_manager
from neutron.agent.linux import utils as linux_utils
from neutron.agent.metadata import driver as metadata_driver
Expand Down Expand Up @@ -74,6 +75,7 @@ class TestMetadataDriverProcess(base.BaseTestCase):
EUNAME = 'neutron'
EGNAME = 'neutron'
METADATA_DEFAULT_IP = '169.254.169.254'
METADATA_DEFAULT_IPV6 = 'fe80::a9fe:a9fe'
METADATA_PORT = 8080
METADATA_SOCKET = '/socket/path'
PIDFILE = 'pidfile'
Expand Down Expand Up @@ -138,7 +140,7 @@ def test_after_router_updated_should_not_call_add_metadata_rules(self):
agent._process_updated_router(router)
f.assert_not_called()

def test_spawn_metadata_proxy(self):
def _test_spawn_metadata_proxy(self, dad_failed=False):
router_id = _uuid()
router_ns = 'qrouter-%s' % router_id
ip_class_path = 'neutron.agent.linux.ip_lib.IPWrapper'
Expand All @@ -162,29 +164,41 @@ def test_spawn_metadata_proxy(self):
'NamespaceManager.list_all', return_value={}),\
mock.patch(
'neutron.agent.linux.ip_lib.'
'IpAddrCommand.wait_until_address_ready') as mock_wait:
'IpAddrCommand.wait_until_address_ready') as mock_wait,\
mock.patch(
'neutron.agent.linux.ip_lib.'
'delete_ip_address') as mock_del:
agent = l3_agent.L3NATAgent('localhost')
agent.process_monitor = mock.Mock()
cfg_file = os.path.join(
metadata_driver.HaproxyConfigurator.get_config_path(
agent.conf.state_path),
"%s.conf" % router_id)
mock_open = self.useFixture(
lib_fixtures.OpenFixture(cfg_file)).mock_open
mock_wait.return_value = True
if dad_failed:
mock_wait.side_effect = ip_lib.DADFailed(
address=self.METADATA_DEFAULT_IP, reason='DAD failed')
else:
mock_wait.return_value = True
agent.metadata_driver.spawn_monitored_metadata_proxy(
agent.process_monitor,
router_ns,
self.METADATA_PORT,
agent.conf,
bind_address=self.METADATA_DEFAULT_IP,
router_id=router_id)
router_id=router_id,
bind_address_v6=self.METADATA_DEFAULT_IPV6,
bind_interface='fake-if')

netns_execute_args = [
'haproxy',
'-f', cfg_file]

log_tag = ("haproxy-" + metadata_driver.METADATA_SERVICE_NAME +
"-" + router_id)
bind_v6_line = 'bind %s:%s interface %s' % (
self.METADATA_DEFAULT_IPV6, self.METADATA_PORT, 'fake-if')
cfg_contents = metadata_driver._HAPROXY_CONFIG_TEMPLATE % {
'user': self.EUNAME,
'group': self.EGNAME,
Expand All @@ -197,18 +211,34 @@ def test_spawn_metadata_proxy(self):
'pidfile': self.PIDFILE,
'log_level': 'debug',
'log_tag': log_tag,
'bind_v6_line': ''}

mock_open.assert_has_calls([
mock.call(cfg_file, 'w'),
mock.call().write(cfg_contents)],
any_order=True)

ip_mock.assert_has_calls([
mock.call(namespace=router_ns),
mock.call().netns.execute(netns_execute_args, addl_env=None,
run_as_root=True)
])
'bind_v6_line': bind_v6_line}

if dad_failed:
agent.process_monitor.register.assert_not_called()
mock_del.assert_called_once_with(self.METADATA_DEFAULT_IPV6,
'fake-if',
namespace=router_ns)
else:
mock_open.assert_has_calls([
mock.call(cfg_file, 'w'),
mock.call().write(cfg_contents)], any_order=True)

ip_mock.assert_has_calls([
mock.call(namespace=router_ns),
mock.call().netns.execute(netns_execute_args,
addl_env=None, run_as_root=True)
])

agent.process_monitor.register.assert_called_once_with(
router_id, metadata_driver.METADATA_SERVICE_NAME,
mock.ANY)
mock_del.assert_not_called()

def test_spawn_metadata_proxy(self):
self._test_spawn_metadata_proxy()

def test_spawn_metadata_proxy_dad_failed(self):
self._test_spawn_metadata_proxy(dad_failed=True)

def test_create_config_file_wrong_user(self):
with mock.patch('pwd.getpwnam', side_effect=KeyError):
Expand Down
16 changes: 16 additions & 0 deletions releasenotes/notes/bug-1953165-6e848ea2c0398f56.yaml
@@ -0,0 +1,16 @@
---
issues:
- |
The high availability of metadata service on isolated networks is limited
or non-existent. IPv4 metadata is redundant when the DHCP agent managing
it is redundant, but recovery is tied to the renewal of the DHCP lease,
making most recoveries very slow. IPv6 metadata is not redundant at all
as the IPv6 metadata address can only be configured in a single place at
a time as it is link-local. Multiple agents trying to configure it will
generate an IPv6 duplicate address detection failure.
Administrators may observe the IPv6 metadata address in "dadfailed" state
in the DHCP namespace for this reason, which is only an indication it is
not highly available. Until a redesign is made to the isolated metadata
service there is not a better deployment option. See `bug 1953165
<https://bugs.launchpad.net/neutron/+bug/1953165>`_ for information.

0 comments on commit 1c61528

Please sign in to comment.