DHCP isn't available in our bare-metal data centers #4

Closed
jeffaco opened this issue May 8, 2018 · 11 comments

Comments

@jeffaco
Collaborator

jeffaco commented May 8, 2018

It turns out that DHCP servers are not available in our bare-metal data centers, and the latest YAML specification assumed that DHCP was available.

I suggest updating the YAML specification as follows:

REMOVE

nics: 2

ADD

  networking:
    -
      interface: eth0
      description: Client vlan-10
      ip: 10.250.10.51
      gateway: 10.250.10.1
      subnet mask: 255.255.255.0
    -
      interface: eth1
      description: NFS vlan-51
      ip: 10.250.51.51
      subnet mask: 255.255.255.0
    -
      interface: eth2
      description: B2B vlan-52
      ip: 10.250.52.51
      subnet mask: 255.255.255.0

In this particular case, you can infer that there are three NICs. Description is for Microsoft purposes only, and can be ignored by azure-li-services. A few things to note:

  1. If no interface is defined, can you just pick any available interface (not otherwise specified elsewhere in the YAML specification)?
  2. The gateway should be optional; whether it is needed depends on how that interface is used.
  3. We might want to leave the option open for DHCP in the future. Perhaps add something like: dhcp: true or dhcp: false. If DHCP is false, then it should follow with IP/subnet mask[/gateway]. This is purely optional.
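
As a rough illustration of notes 2 and 3, the check below treats gateway as optional and skips the static settings when dhcp is true. This is only a sketch: the function name check_nic_entry is hypothetical, the dhcp key is the suggestion from note 3 rather than an existing setting, and the key names (including "subnet mask" with a space) are taken from the proposal above.

```python
import ipaddress


def check_nic_entry(entry):
    """Return a list of problems found in one 'networking' entry (empty if none)."""
    # Assumption: an entry with 'dhcp: true' needs no static settings at all.
    if entry.get("dhcp", False):
        return []

    problems = []
    # Static setup: ip and subnet mask are required, gateway stays optional.
    for key in ("ip", "subnet mask"):
        if key not in entry:
            problems.append(f"missing '{key}'")

    # Every address that is present must at least parse as IPv4.
    for key in ("ip", "subnet mask", "gateway"):
        if key in entry:
            try:
                ipaddress.IPv4Address(entry[key])
            except ValueError:
                problems.append(f"'{key}' is not a valid IPv4 address: {entry[key]}")
    return problems
```

An entry without a gateway passes this check, while an entry that has neither dhcp: true nor an ip is reported.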

The latest complete YAML specification, with this proposed change, can be found here.

@jeffaco jeffaco changed the title DHCP isn't available in our data centers DHCP isn't available in our bare-metal data centers May 8, 2018
@rjschwei
Contributor

rjschwei commented May 8, 2018

This proposal will not work in that we will probably not assign the address to the expected interface, unless on the backend that doesn't matter.

Example:
When the config file specifies eth0 and, based on the physical connection of the card, this interface has access only to the network 10.250.10.0 (to stick with the example), there is no guarantee that this network device is actually assigned the name "eth0" on the system. Names are assigned based on the order of discovery, or by using predictable names [1].

If we want to guarantee that the "right" device gets the "right" IP address as provided in the config file, and I suspect we need to, then we have to use predictable device names. That is not a problem, as we know the machine configuration w.r.t. network devices, and where they show up in the device tree, ahead of time.

This implies that the config would specify something like this:

interface: enp0s20u2

[1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
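
To make the two naming styles concrete, here is a small sketch that classifies an interface name by its pattern. It is deliberately simplified: naming_scheme is a hypothetical helper, the regular expressions cover only the common Ethernet forms described in [1], and a name such as eth0 may also have been pinned by a udev rule, which the pattern alone cannot tell.

```python
import re

# Kernel enumeration names: "eth" plus a counter assigned in discovery order.
KERNEL_NAME = re.compile(r"^eth\d+$")

# Predictable Ethernet names start with "en" followed by a location code:
# o<index> (onboard), s<slot> (hotplug slot), p<bus>s<slot> (PCI location),
# or x<MAC address>. Further suffixes (e.g. a USB port "u2") may follow.
PREDICTABLE_NAME = re.compile(r"^en(o\d+|s\d+|p\d+s\d+|x[0-9a-f]{12})")


def naming_scheme(name):
    """Classify an interface name as 'kernel', 'predictable' or 'unknown'."""
    if KERNEL_NAME.match(name):
        return "kernel"
    if PREDICTABLE_NAME.match(name):
        return "predictable"
    return "unknown"


print(naming_scheme("eth0"))       # kernel
print(naming_scheme("enp0s20u2"))  # predictable
```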

@jeffaco
Collaborator Author

jeffaco commented May 9, 2018

It appears that the interfaces that are used don't matter. Even though six NICs are exposed to the O/S, I believe there are only two networking cables to a blade chassis.

Here's the message I got back from our operations team:


We have total six Network Interfaces those Interfaces are virtual NIC’s Created in Cisco UCS.

Assigning IP to Any Interfaces will work.

We are following below Standards:

  1. Eth0 is for Client Network (In your Case use this Interface to configure IP, Gateway, and Subnet)
  2. Eth1 is for nfs traffic
  3. Eth2 is for blade to blade traffic
  4. Eth3 is for iSCSI traffic.

For now eth4 and eth5 are not using.

@rjschwei
Contributor

rjschwei commented May 9, 2018

OK, that would imply that we cannot use persistent names but need to use predictable names. If I interpret the response correctly we can end up in a situation where "Eth1" is discovered first and thus will get "eth0" assigned by the kernel. Then based on the proposed config we would set that device up for client network traffic. This will obviously not work.

@jeffaco
Collaborator Author

jeffaco commented May 9, 2018

Operations specifically said: "Assigning IP to any interfaces will work". This implies that, due to underlying wiring, traffic can be on any interface.

I'll validate this once I get the network up manually. I can verify proper connectivity, shift to a different ethernet interface, then verify if I still have connectivity. That will answer the question.

However, I'm having issues getting the network up manually based on the guidelines I was given. Once that is resolved, I'll verify this and then we'll know for sure, one way or the other.

@rjschwei
Contributor

rjschwei commented May 9, 2018

Sorry, but one statement contradicts the other. Stating that

"Assigning IP to any interfaces will work"

and then providing information about which interface is used for what type of traffic just doesn't fit together.

@jeffaco
Collaborator Author

jeffaco commented May 11, 2018

Okay, I have details on this.

  1. Once the VLAN is configured in hardware, the setup on the system is somewhat different from what you described.
  2. The ethernet interface truly does not matter. I had a system with ethernet connectivity on eth0. After reconfiguration to eth5, the system worked just fine.

I needed to modify the YAML file slightly for this, as we needed a VLAN number. The new YAML now contains this segment:

interface: eth0
vlan: 10
ip: 10.250.10.51
gateway: 10.250.10.1
subnet mask: 255.255.255.0

There are three files of interest that must be set up given that segment:

  • ifcfg-eth0
BOOTPROTO=static
BROADCAST=10.250.10.255
NETMASK=255.255.255.0
STARTMODE=auto
  • ifcfg-eth0.<vlan> (in my example, ifcfg-eth0.10)
BOOTPROTO=static
DEVICE=eth0.10
ETHERDEVICE=eth0
IPADDR=10.250.10.51
NETMASK=255.255.255.0
ONBOOT=yes
STARTMODE=auto
VLAN=yes
VLAN_ID=10

If a gateway is defined in the YAML, then you need:

  • ifroute-eth0.10
default 10.250.10.1 - eth0.10

Once these files were set up on the destination system, I had proper network connectivity.
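
The mapping from the YAML segment to those three files could be sketched roughly as follows. This is not the azure-li-services implementation: render_network_files is a hypothetical name, the input is assumed to be the already-parsed segment as a dict with the keys exactly as written above, and the file contents simply mirror the listings in this comment.

```python
import ipaddress


def render_network_files(nic):
    """Return {file name: file content} for one networking entry."""
    base = nic["interface"]
    vlan_dev = f"{base}.{nic['vlan']}"
    # The broadcast address follows from ip and subnet mask.
    network = ipaddress.IPv4Interface(f"{nic['ip']}/{nic['subnet mask']}").network

    files = {
        # Base interface: carries no IP address of its own.
        f"ifcfg-{base}": (
            "BOOTPROTO=static\n"
            f"BROADCAST={network.broadcast_address}\n"
            f"NETMASK={network.netmask}\n"
            "STARTMODE=auto\n"
        ),
        # VLAN interface: the VLAN id shows up in the file name and the content.
        f"ifcfg-{vlan_dev}": (
            "BOOTPROTO=static\n"
            f"DEVICE={vlan_dev}\n"
            f"ETHERDEVICE={base}\n"
            f"IPADDR={nic['ip']}\n"
            f"NETMASK={network.netmask}\n"
            "ONBOOT=yes\n"
            "STARTMODE=auto\n"
            "VLAN=yes\n"
            f"VLAN_ID={nic['vlan']}\n"
        ),
    }
    # The route file is only written when a gateway is given.
    if "gateway" in nic:
        files[f"ifroute-{vlan_dev}"] = f"default {nic['gateway']} - {vlan_dev}\n"
    return files
```

Called with the segment above, this yields ifcfg-eth0, ifcfg-eth0.10 and ifroute-eth0.10; without a gateway key, only the two ifcfg files are produced.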

Note that the VLANs appear both in filenames and the files themselves. For example, in ifcfg-eth0.10, I have the following lines that contain VLAN information:

DEVICE=eth0.10
VLAN_ID=10

Let me know if you have any questions on this, thanks!

@rjschwei
Contributor

Thanks for the data. Based on this, we can indeed stick to persistent names in the config.

schaefi added a commit that referenced this issue May 15, 2018
Work through the networking section of the Azure Li/Vli config
file and setup the network configuration for this instance
type. This Fixes #4

!!! WIP don't merge !!!
@schaefi
Collaborator

schaefi commented May 15, 2018

@jeffaco thanks for the details, please note I will use 'subnet_mask' as the key not 'subnet mask' with a space character in the key. That space thing is a source for trouble which we could easily avoid :)

Other than that I started the implementation as you can see in the reference. Will finish this by tomorrow

@jeffaco
Collaborator Author

jeffaco commented May 15, 2018

No problem, I've updated the reference YAML to reflect this change. Thanks.

It looks like you expect a broadcast: setting in the YAML, based on your WIP changes. Is this intended, or is it a mistake? I can add it if necessary. I think I asked about that at one point, but I don't see it in this issue; it might have been in email, and I don't believe I heard back. Please let me know.

@schaefi
Collaborator

schaefi commented May 15, 2018

The broadcast address gets calculated from the given ip and subnet_mask, so no further information is needed. I set BROADCAST because your example ifcfg-eth0 contains it, so I thought having it set explicitly was important in your environment for some reason.

@schaefi
Collaborator

schaefi commented May 15, 2018

Python has a very nice module called 'ipaddress', which does the broadcast calculation if you give it the ip and the subnet_mask.
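
For reference, a minimal sketch of that calculation with the standard-library ipaddress module. The helper name broadcast_address is made up for illustration, and the values are the ones from the example earlier in this issue.

```python
import ipaddress


def broadcast_address(ip, subnet_mask):
    """Compute the broadcast address for an IPv4 address and its netmask."""
    # IPv4Interface accepts "address/netmask" and exposes the enclosing network.
    interface = ipaddress.IPv4Interface(f"{ip}/{subnet_mask}")
    return str(interface.network.broadcast_address)


print(broadcast_address("10.250.10.51", "255.255.255.0"))  # 10.250.10.255
```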

schaefi added a commit that referenced this issue May 15, 2018
Work through the networking section of the Azure Li/Vli config
file and setup the network configuration for this instance
type. This Fixes #4