Adding nic to a vm on vLan 1 causes network issues on host node #1670

Closed
nick-fox opened this issue Apr 12, 2022 · 8 comments · Fixed by #1706

@nick-fox

nick-fox commented Apr 12, 2022

Scenario

  • Management interface is in xen-br0, with untagged traffic
  • The default vLan on the network is vLan 1

Issue

If you add a NIC to an instance (mode=bridged, vlan=1), the host network becomes inaccessible:
gnt-instance modify --net add:mode=bridged,vlan=1 server.name

I feel this should be raised as a critical issue as this will cause network issues with the node.

@rbott
Member

rbott commented Apr 12, 2022

Hey @nick-fox,

Could you please add some more information here?

  • what OS type/version are you running Ganeti on?
  • what Ganeti version do you use?
  • do you use distribution packages or is it custom built?
  • which hypervisor do you use (kvm, xen-hvm, xen-pvm)?
  • which interfaces are attached to the bridge xen-br0? (e.g. what does brctl show xen-br0 show?)

Thanks!

@nick-fox
Author

nick-fox commented Apr 13, 2022

Hello @rbott

Sure

  • what OS type/version are you running Ganeti on?
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.3 LTS
Release:	20.04
Codename:	focal
  • what Ganeti version do you use?
3.0.1-1~ubuntu20.04+1
  • do you use distribution packages or is it custom built?
deb http://ppa.launchpad.net/pkg-ganeti-devel/lts/ubuntu focal main
  • which hypervisor do you use (kvm, xen-hvm, xen-pvm)?
kvm
  • which interfaces are attached to the bridge xen-br0? (e.g. what does brctl show xen-br0 show?)
bridge name	bridge id		STP enabled	interfaces
xen-br0		8000.2cea7fcc0e32	no		bond0

@rbott
Member

rbott commented Apr 13, 2022

Is the bridge configured to be vlan aware? In my case (Debian Bullseye) it is configured like this:

auto gnt-bridge
iface gnt-bridge inet manual
 bridge_ports gnt-bond
 bridge_vlan_aware yes
 bridge_stp off
 bridge_waitport 0
 bridge_fd 0

gnt-bond is a regular 802.3ad bonding interface with two members (plain untagged ethernet interfaces).

@saschalucas
Member

@nick-fox could you also supply the output of bridge vlan show and cat /proc/net/vlan/config if it exists?

I assume you tried to use VLAN-aware bridging? Because "network becomes inaccessible" is what happens when VLAN-aware parts meet legacy VLAN.

Generally, VLAN 1 should be avoided. It is the default native VLAN for most, if not all, network devices and also for the Linux bridge. For a secure VLAN-aware bridge setup, VLAN 1 should be explicitly removed: @rbott

# https://vincent.bernat.ch/en/blog/2017-linux-bridge-isolation
post-up bridge vlan del dev gnt-bridge vid 1 self

For KVM, the ifup script tries to detect the "in use" case, but it does not take VLAN 1 into account, which is present by default but not listed in /proc/net/vlan/config:

if [ -r /proc/net/vlan/config ]; then
  local vlan_interface="$(awk -F '[| ]*' -v vlan="^${VID}$" 'match($2, vlan) { print $1 }' /proc/net/vlan/config)"
  local lower_devs="$(awk -F '[| ]*' -v vlan="^${VID}$" 'match($2, vlan) { print $3 }' /proc/net/vlan/config)"
  if [ -n "${vlan_interface}" ]; then
    for i in ${lower_devs}; do
      # allow bridge stacking and vlan overlap for veth devices
      if [ "$(ip -o link show dev ${i} type veth | wc -l)" -ne 1 ]; then
        echo "VLAN ${VID} is in use by lower interface ${i}"
        exit 1
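
One way to close that gap might be to also compare the requested VID against the PVIDs that `bridge vlan show` reports. The following is only a sketch, not the change that was merged for this issue: `vid_is_pvid` is a hypothetical helper name, and it assumes the plain-text output format of `bridge vlan show`, where the PVID flag directly follows the VLAN ID.

```shell
# Hypothetical helper (not part of Ganeti): reads `bridge vlan show` text on
# stdin and succeeds if the given VLAN ID is flagged as PVID on any port.
vid_is_pvid() {
    awk -v vid="$1" '
        {
            # The VLAN ID is the field directly before the PVID flag.
            for (i = 2; i <= NF; i++)
                if ($i == "PVID" && $(i - 1) == vid) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Possible use inside a hook, with VID set by the caller:
#   if bridge vlan show | vid_is_pvid "${VID}"; then
#       echo "VLAN ${VID} is the untagged PVID of a bridge port" >&2
#       exit 1
#   fi
```

Reading the table from stdin keeps the helper independent of the host it runs on, so it can be tried against pasted output before being wired into a real script.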

@nick-fox
Author

nick-fox commented Apr 14, 2022

@rbott, Ubuntu 20.04 provides ifupdown 0.8 as an alternative to netplan. I configure the VLAN-aware bridge with

post-up ip link set xen-br0 type bridge vlan_filtering 1

cat /sys/class/net/xen-br0/bridge/vlan_filtering

1
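
The same sysfs flag can be checked from a script. A minimal sketch, assuming the `/sys/class/net/<bridge>/bridge/vlan_filtering` path shown above; `bridge_is_vlan_aware` is a hypothetical helper name, and its optional second argument exists only so the function can be tried against a fake directory tree.

```shell
# Hypothetical helper: succeed if the bridge has VLAN filtering enabled.
# The optional second argument overrides the sysfs root (for testing only).
bridge_is_vlan_aware() {
    flag="${2:-/sys/class/net}/$1/bridge/vlan_filtering"
    # A missing file means the device is not a bridge or does not exist.
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}

# Possible use:
#   bridge_is_vlan_aware xen-br0 || echo "xen-br0 is not VLAN aware" >&2
```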

@saschalucas, I agree, and the default VLAN for VMs is not 1.

bridge vlan show

port	vlan ids
ens1f0np0
ens1f1np1
bond0	 1 PVID Egress Untagged
bond1	 1 PVID Egress Untagged
xen-br0	 1 PVID Egress Untagged
xen-br1	 1 PVID Egress Untagged

cat /proc/net/vlan/config

VLAN Dev name	 | VLAN ID
Name-Type: VLAN_NAME_TYPE_RAW_PLUS_VID_NO_PAD
eno1.1034      | 1034  | eno1

"For a secure VLAN aware bridge setup, VLAN 1 should be removed". Thanks for the heads up. I'll implement this, test breaking my platform & report back.

Maybe throw an error in ganeti/tools/net-common if VID == 1

Nick

@rbott
Member

rbott commented Sep 2, 2022

Hi @nick-fox, have your tests worked out with a different VLAN ID? If so, I think we should update the docs to cover this specific case.

Maybe we should also have the CLI warn (refuse?) when someone tries to set vlan ID 1 for an instance?

@nick-fox
Author

Hi @rbott.

My default VLAN ID for untagged traffic is 1. I suspect this is the same for many other users. The issue occurs when I set net (mode=bridged, vlan=1): the VLAN of the VM's interface is then the same as the one carrying untagged traffic on my network.

What happens if the default VLAN for untagged traffic on a network is 255? If I set net (mode=bridged, vlan=255), I would expect the host network to become inaccessible.

Ganeti is unaware of the default VLAN for untagged traffic on a network, and the proposed fix will not work in cases where the default VLAN for untagged traffic is not 1.

@rbott do you have any thoughts on this?

Nick
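
The switch-side default VLAN is indeed invisible to Ganeti, but the PVID configured on each port of the local bridge is listed by `bridge vlan show`, whatever its value. As a rough sketch, assuming the plain-text output format pasted earlier in this thread, a hypothetical helper `list_pvids` could print the IDs an operator may want to avoid as instance VLANs:

```shell
# Hypothetical helper: print "<port> <pvid>" for each port found in
# `bridge vlan show` text read from stdin.
list_pvids() {
    awk '
        NR == 1   { next }                    # assumes the first line is the header
        /^[^ \t]/ { port = $1; first = 2 }    # a new port starts in column one
        /^[ \t]/  { first = 1 }               # continuation line of the same port
        {
            # Print the VLAN ID that directly precedes the PVID flag.
            for (i = first; i < NF; i++)
                if ($(i + 1) == "PVID") print port, $i
        }
    '
}

# Possible use:
#   bridge vlan show | list_pvids
```

This only covers the host side. Whether the upstream switch treats the same ID as its native VLAN still has to be checked on the network equipment.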

@rbott
Member

rbott commented May 25, 2023

Hi @nick-fox,

Sorry for being quiet for so long :-) My general advice from a network point of view would be "never use VLAN 1/default VLAN for anything, neither explicitly nor implicitly". A switchport should only use tagged VLANs or one untagged access VLAN. Combinations of both almost always cause headaches sooner or later :-)

Of course you are right, this can't be detected/prevented by Ganeti. I will nevertheless add a warning to the manpage.

rbott added a commit to rbott/ganeti that referenced this issue May 25, 2023
As discussed in ganeti#1670 this commit adds a warning to not set the PVID
(port vlan ID) as an instance VLAN ID. Network equipment might drop
frames which explicitly have the VLAN ID set which is configured as the
default VLAN ID/port VLAN ID on the network side.

Signed-off-by: Rudolph Bott <r@spam.wtf>
rbott added a commit that referenced this issue May 28, 2023
rbott added a commit to rbott/ganeti that referenced this issue Jun 20, 2023