openstack: octavia --provider ovn healthmon leaks ports #921
Instructions to reproduce:

for net in 1 2 3; do openstack network create net$net; done
#[...]
for net in 1 2 3; do
openstack subnet create --network net$net --subnet-range 192.168.$net.0/24 subnet$net
done

Boot three servers (it might be enough to allocate three ports, actually).

openstack keypair create testkey > ~/.ssh/testkey.pem
chmod 0600 ~/.ssh/testkey.pem
DEB12=$(openstack image list --name="Debian 12" -f value -c ID)
echo $DEB12
#[...]
for net in 1 2 3; do
openstack server create --network net$net --key-name testkey --flavor SCS-1L-1 \
--block-device boot_index=0,uuid=$DEB12,source_type=image,volume_size=8,destination_type=volume,delete_on_termination=true \
vm$net
done
#[...]

And now create a loadbalancer with listener, pool, healthmonitor and three members:

openstack loadbalancer create --provider ovn --vip-subnet-id=subnet3 --name testlb --wait
#[...]
openstack loadbalancer pool create --loadbalancer testlb --name testpool --protocol TCP --lb-algorithm SOURCE_IP_PORT --wait
#[...]
openstack loadbalancer listener create --name testlistener --default-pool testpool --protocol TCP --protocol-port 80 testlb --wait
#[...]
openstack loadbalancer healthmonitor create --name testhm --delay 3 --timeout 2 --max-retries 1 --max-retries-down 1 --type TCP testpool --wait
#[...]
for net in 1 2 3; do
FIXED=$(openstack server list --name vm$net -f value -c Networks | sed "s/^[^\[]*\['\([0-9]*\.[0-9]*\.[0-9]*\.[0-9]*\)'.*\$/\1/")
echo $FIXED
openstack loadbalancer member create --name member$net --address $FIXED --subnet-id subnet$net --protocol-port 80 testpool --wait
done
#[...]

So now we have 3 servers in 3 networks, with a loadbalancer member for each of the 3 and a healthmonitor.

openstack port list --fixed-ip subnet=subnet3
+--------------------------------------+-----------------------------------------+-------------------+-------------------------------------------+--------+
| ID | Name | MAC Address | Fixed IP Addresses | Status |
+--------------------------------------+-----------------------------------------+-------------------+-------------------------------------------+--------+
| 0737eb6b-305b-4174-a3b1-6b8055dd9527 | | fa:16:3e:bb:17:18 | ip_address='192.168.3.190', subnet_id='ee | ACTIVE |
| | | | bc5377-7c93-4012-87a6-37d8ba19d9a5' | |
| 65645d99-909f-4c72-937d-a2561af53561 | | fa:16:3e:88:68:a0 | ip_address='192.168.3.2', subnet_id='eebc | DOWN |
| | | | 5377-7c93-4012-87a6-37d8ba19d9a5' | |
| 78d1cbba-b549-41e2-849a-4453dfba7bd0 | ovn-lb-vip-b9c9c4a3-3d31-4028-bb4e- | fa:16:3e:17:f1:0b | ip_address='192.168.3.148', subnet_id='ee | DOWN |
| | e595b66a55d9 | | bc5377-7c93-4012-87a6-37d8ba19d9a5' | |
| 8873b3f2-f414-492a-9d30-0a870620dce1 | ovn-lb-hm- | fa:16:3e:88:cc:4e | ip_address='192.168.3.24', subnet_id='eeb | DOWN |
| | eebc5377-7c93-4012-87a6-37d8ba19d9a5 | | c5377-7c93-4012-87a6-37d8ba19d9a5' | |
+--------------------------------------+-----------------------------------------+-------------------+-------------------------------------------+--------+

There is the DHCP port (.2) in each network, the server port (.190 in this example), a port for the healthmonitor, and a port for the virtual IP of the listener. The latter only exists in subnet3, as we have chosen it during the LB creation. So subnet1 and subnet2 have one less port.

for net in 1 2 3; do openstack loadbalancer member delete testpool member$net --wait; done
# No output, takes a few seconds
openstack loadbalancer healthmonitor delete testhm --wait
# No output

If we now look at the ports in the subnets, we can see that in subnet3 the ovn-lb-hm port has been cleaned up, while subnet1 and subnet2 still have theirs.
For completeness, let's do the cleanup here.

openstack loadbalancer delete --cascade testlb --wait
# No output
openstack server delete vm1 vm2 vm3 --wait
# No output

subnet3 is clean now, but subnet1 and subnet2 still have the leaked port from the healthmon.

for net in 1 2 3; do
PORT=$(openstack port list --fixed-ip subnet=subnet$net -f value -c ID -c Name | grep ovn-lb-hm | cut -f1 -d" ")
if test -n "$PORT"; then openstack port delete $PORT; fi
done

Now we can proceed normally again and clean up the rest:

for net in 1 2 3; do openstack subnet delete subnet$net; openstack network delete net$net; done

And also remove the generated key pair:

openstack keypair delete testkey
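The manual cleanup loop above can also be done programmatically. A minimal sketch (the function name and the fake port records are mine; with openstacksdk you would feed it the result of conn.network.ports() and delete the matches with conn.network.delete_port()):

```python
def find_leaked_hm_ports(ports):
    """Filter a port listing down to OVN LB health-monitor ports.

    Ports created by the OVN Octavia provider for health monitoring are
    named "ovn-lb-hm-<subnet_id>"; once no loadbalancer uses the subnet
    anymore, any such port that is still around has leaked.
    """
    return [p for p in ports if p.get("name", "").startswith("ovn-lb-hm-")]

# Fake port records, shaped like "openstack port list -f json" output:
ports = [
    {"id": "8873b3f2", "name": "ovn-lb-hm-eebc5377-7c93-4012-87a6-37d8ba19d9a5"},
    {"id": "0737eb6b", "name": ""},
]
leaked = find_leaked_hm_ports(ports)
```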
If we delete the lb healthmonitor first, things are fine and all ovn-lb-hm ports are cleaned up.
And another valid workaround:
ovn_octavia_provider/helper.py: member_delete() only removes the hm port if the whole pool has gone offline.
Code should probably look similar to this:

def _member_list(self):
    existing_members = ovn_lb.external_ids.get(
        ovn_const.OVN_MEMBER_STATUS_KEY)
    try:
        existing_members = jsonutils.loads(existing_members)
    except TypeError:
        LOG.debug("no member status on external_ids: %s",
                  str(existing_members))
        existing_members = {}
    return existing_members

def member_delete(self, member):
    error_deleting_member = False
    try:
        pool_key, ovn_lb = self._find_ovn_lb_by_pool_id(
            member[constants.POOL_ID])
        pool_status = self._remove_member(member, ovn_lb, pool_key)
        if ovn_lb.health_check:
            mysubnet = member[constants.SUBNET_ID]
            member_in_subnet = False
            for amember in self._member_list():
                if (amember[constants.SUBNET_ID] == mysubnet and
                        amember[constants.PROVISIONING_STATUS] !=
                        constants.DELETED):
                    member_in_subnet = True
                    break
            if not member_in_subnet:
                # NOTE(garloff): if we were the last member in the subnet
                # we should clean up the hm-port.
                # (froyo) We need to do this call after the cleaning of the
                # ip_port_mappings for the ovn LB.
                self._clean_up_hm_port(member[constants.SUBNET_ID])
            [...]
I will submit the bug and fix upstream as soon as I have a suitable test environment.
With #890 out of the way, I'll have a development environment again to tackle this one...
Testcase script:

#!/bin/bash
# Testcase to demonstrate ovn hm port leak (OSISM issue 921)
# Usage: ./testcase_ovn_osism921.sh [CLEANUP]
# CLEANUP is to remove remaining resources from previous runs,
# should not normally be needed.
# Script displays errors and has a non-zero exit code in case there are errors.
# (c) Kurt Garloff <scs@garloff.de>, 3/2024
# SPDX-License-Identifier: Apache-2.0
BOLD="\e[0;1m"
RED="\e[0;31m"
GREEN="\e[0;32m"
NORM="\e[0;0m"
NETS="1 2"
NONETS=$(echo $NETS | wc -w)
declare -i cnt=0
# Input: $@ => networks to look at (optional, default: $NETS), testsubnet will be prepended
# Output: $cnt has number of found ovn-lb-hm ports
count_hm_ports()
{
nets="$@"
if test -z "$nets"; then nets="$NETS"; fi
cnt=0
for net in $nets; do
let cnt+=$(openstack port list --fixed-ip subnet=testsubnet$net | grep 'ovn\-lb\-hm\-' | wc -l)
done
return 0
}
declare -i err=0
# Input: $1: Expected number of ports
# $2 - ...: networks to look at (optional, see count_hm_ports)
# Output: None (but err will be increased on error)
expect_hm_ports()
{
expected=$1
shift
count_hm_ports "$@"
echo -en "${BOLD}Found $cnt ovn-lb-hm ports: "
if test $cnt != $expected; then
echo -e "${RED}BAD, expected ${expected}${NORM}"
if test $cnt -gt $expected; then let err+=$((cnt-expected)); else let err+=$((expected-cnt)); fi
else
echo -e "${GREEN}GOOD${NORM}"
fi
}
if test "$1" = "-d" -o "$1" = "--debug"; then set -x; shift; fi
if test "$1" != "CLEANUP"; then
set -e
echo -e "${BOLD}Create $NONETS networks and ports${NORM}"
for net in $NETS; do
openstack network create testnet$net >/dev/null
openstack subnet create --network testnet$net --subnet-range 192.168.$net.0/24 testsubnet$net >/dev/null
for port in 1 2; do
openstack port create --network testnet$net testport$net.$port >/dev/null
done
done
echo -e "${BOLD}Setup loadbalancer with ${NONETS}x2 members${NORM}"
openstack loadbalancer create --name testlb --provider ovn --vip-subnet-id testsubnet1 --wait >/dev/null
openstack loadbalancer pool create --loadbalancer testlb --name testpool --protocol TCP --lb-algorithm SOURCE_IP_PORT --wait >/dev/null
openstack loadbalancer listener create --name testlistener --default-pool testpool --protocol TCP --protocol-port 80 testlb --wait >/dev/null
openstack loadbalancer healthmonitor create --name testhm --delay 3 --timeout 2 --max-retries 1 --max-retries-down 1 --type TCP testpool --wait >/dev/null
echo -en " ${BOLD}Add the ${NONETS}x2 ports as members${NORM}"
for net in $NETS; do
for port in 1 2; do
FIXED=$(openstack port show testport$net.$port -f json | jq '.fixed_ips[] | .ip_address' | tr -d '"')
echo -n " $FIXED"
openstack loadbalancer member create --name testmember$net.$port --address $FIXED --subnet-id testsubnet$net --protocol-port 80 testpool --wait >/dev/null
done
done
echo
# We should have 2 ovn-lb-hm ports now, one in each subnet
expect_hm_ports $NONETS
set +e
fi # CLEANUP
# Now start deleting members
echo -e "${BOLD}Deleting one member from each subnet${NORM}"
for net in $NETS; do
openstack loadbalancer member delete testpool testmember$net.2 --wait
done
expect_hm_ports $NONETS
echo -e "${BOLD}Deleting last members from each subnet${NORM}"
for net in $NETS; do
openstack loadbalancer member delete testpool testmember$net.1 --wait
done
openstack loadbalancer healthmonitor delete testhm --wait
expect_hm_ports 0
echo -e "${BOLD}Cleaning up the rest of the LB${NORM}"
openstack loadbalancer listener delete testlistener --wait
openstack loadbalancer pool delete testpool --wait
openstack loadbalancer delete testlb --wait
echo -e "${BOLD}Cleaning up ports and networks ${NORM}"
for net in $NETS; do
for port in 1 2; do
openstack port delete testport$net.$port
done
# Extra cleanup ovn-lb-hm
OVNHMPORTS=$(openstack port list --fixed-ip subnet=testsubnet$net | grep 'ovn\-lb\-hm' | sed 's/^[^o]*\(ovn\-lb\-hm\-[0-9a-f\-]*\).*$/\1/')
if test -n "$OVNHMPORTS"; then
echo -e "${RED}ERROR: $OVNHMPORTS still present, cleaning up${NORM}"
openstack port delete $OVNHMPORTS
fi
openstack subnet delete testsubnet$net
if test $? != 0; then
SUBS=$(openstack subnet list -f value -c ID -c Name | grep testsubnet$net | awk '{ print $1; }')
if test -n "$SUBS"; then openstack subnet delete $SUBS; fi
fi
openstack network delete testnet$net
if test $? != 0; then
NETS=$(openstack network list -f value -c ID -c Name | grep testnet$net | awk '{ print $1; }')
if test -n "$NETS"; then openstack network delete $NETS; fi
fi
done
echo -e "${BOLD}There were $err errors${NORM}"
exit $err
Issue is still present with 7.0.1, but I can now test and finalize the fix on my CiaB.
Working patch for ovn_octavia_provider:

--- ovn_octavia_provider/helper.py.orig	2024-03-27 20:17:00.000000000 +0000
+++ ovn_octavia_provider/helper.py	2024-03-31 23:19:13.747183391 +0000
@@ -2046,20 +2046,33 @@ class OvnProviderHelper():
                 user_fault_string=msg,
                 operator_fault_string=msg)
+    def _members_in_subnet(self, ovn_lb, subnet_id):
+        members = []
+        for key, value in ovn_lb.external_ids.items():
+            if key.startswith(ovn_const.LB_EXT_IDS_MEMBER_PREFIX):
+                mem_id, mem_ip_port, mem_subnet = key.split('_')[1:]
+                # mem_ip, mem_port = mem_ip_port.split(':')
+                if mem_subnet == subnet_id:
+                    members.append(mem_id)
+        return members
+
     def member_delete(self, member):
         error_deleting_member = False
         try:
-            pool_key, ovn_lb = self._find_ovn_lb_by_pool_id(
-                member[constants.POOL_ID])
+            pool_id = member[constants.POOL_ID]
+            pool_key, ovn_lb = self._find_ovn_lb_by_pool_id(pool_id)
             pool_status = self._remove_member(member, ovn_lb, pool_key)
-            if ovn_lb.health_check and pool_status == constants.OFFLINE:
-                # NOTE(froyo): if the pool status is OFFLINE there are no more
-                # members. So we should ensure the hm-port is deleted if no
-                # more LB are using it. We need to do this call after the
-                # cleaning of the ip_port_mappings for the ovn LB.
-                self._clean_up_hm_port(member[constants.SUBNET_ID])
+            if ovn_lb.health_check:
+                mem_subnet = member[constants.SUBNET_ID]
+                member_list = self._members_in_subnet(ovn_lb, mem_subnet)
+                if not member_list:
+                    # NOTE(garloff): if we were the last member in the subnet
+                    # we should clean up the hm-port.
+                    # (froyo) We need to do this call after the cleaning of the
+                    # ip_port_mappings for the ovn LB.
+                    self._clean_up_hm_port(member[constants.SUBNET_ID])
         except Exception:
             LOG.exception(ovn_const.EXCEPTION_MSG, "deletion of member")
             error_deleting_member = True
Next step is to report the issue upstream and provide the reproducing script and patch ...
Created an upstream issue: https://bugs.launchpad.net/octavia/+bug/2062965
Merged. Prepared backports:
Testing on gx-scs (which runs OSISM 7.0.0d aka SCS IaaS R6 RC3):
Creating an octavia loadbalancer with --provider ovn and a healthmonitor with members in several subnets results in a port in each subnet with the name ovn-lb-hm-$SUBNETID.
When deleting the members and the healthmonitor again, only the port in the first subnet is cleaned up; the ports in the other subnets remain. This leaks ports and prevents the deletion of the subnets (and the networks that contain them).
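The difference between the old pool-wide rule and the fixed per-subnet rule can be shown with a toy model (function name and data are mine, not from the provider code; each member is reduced to just its subnet id, and one hm port exists per subnet with members):

```python
def simulate_member_deletion(members, per_subnet_cleanup):
    """Toy model of hm-port cleanup during member deletion.

    members: list of subnet ids, one entry per pool member.
    per_subnet_cleanup: True for the fixed rule (drop the hm port when
    the last member of that subnet goes), False for the old rule (drop
    the current member's hm port only once the whole pool is empty).
    Returns the set of subnets whose hm port is left over, i.e. leaked.
    """
    hm_ports = set(members)          # one hm port per populated subnet
    remaining = list(members)
    for subnet in members:           # delete members one by one
        remaining.remove(subnet)
        if per_subnet_cleanup:
            if subnet not in remaining:   # last member in this subnet
                hm_ports.discard(subnet)
        else:
            if not remaining:             # whole pool is now empty, but
                hm_ports.discard(subnet)  # only this subnet's port goes
    return hm_ports
```

With three members in three subnets, the old rule leaves two hm ports behind (only the last-deleted member's subnet is cleaned), while the per-subnet rule leaves none, matching the behavior observed above.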