This is a hotfix for a user-reported issue in which the execs defined on a container could run before its network interfaces were present. This is only a problem when the execs are supposed to configure these interfaces (`ip link set` or `ip address add`).

To understand the issue, you have to know how we currently deploy nodes. We schedule the nodes and make them go through the different stages (create, create-links, configure, healthy, exit). Not all of these stages always apply: healthy and exit only apply, and are only waited for, if some other node defines them as a wait-for stage.
The create-links stage follows a special workflow. If a node has no dependency on the create-links stage, it tries to create all of its network interfaces. For veths between clab nodes, the peer container may not have been created yet, in which case the network namespace for the peer end of the veth cannot be set. For veth interfaces we therefore chose a strategy in which the second node of a link creates the link and assigns the endpoints to itself and to the peer node.
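The "second node creates the link" strategy can be illustrated with a small model. This is only a sketch: `Link` and `node_reached_create_links` are hypothetical names for illustration, not containerlab types or functions.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A veth link between two nodes (hypothetical model, not containerlab's types)."""
    a: str
    b: str
    ready: set = field(default_factory=set)  # nodes whose namespace already exists
    created: bool = False

    def node_reached_create_links(self, node: str) -> bool:
        """Record that `node` exists; create the veth only once both ends exist.

        Returns True if this call created the link.
        """
        self.ready.add(node)
        if not self.created and {self.a, self.b} <= self.ready:
            # Both namespaces exist, so both endpoints can be assigned.
            self.created = True
            return True
        return False


link = Link("node1", "node2")
first = link.node_reached_create_links("node1")   # peer missing: nothing is created
second = link.node_reached_create_links("node2")  # second node creates both ends
print(first, second, link.created)
```

In this model the first node returns immediately without an interface, which is the gap the rest of this description is about.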
In the basic case, however, the first node does not wait; it simply continues with the configure stage, after which the execs were run. This could lead to race conditions in which exec commands tried to configure interfaces that did not exist yet.
This PR moves the exec commands into a post-node-deployment phase, by which point all nodes have definitely been created and therefore all links are in place.
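The effect of moving the execs can be sketched with a simplified model. It assumes nodes are deployed one after another in the worst-case order (the real scheduler is concurrent), and `deploy` and its parameters are hypothetical names, not containerlab code.

```python
def deploy(nodes, links, defer_execs):
    """Deploy nodes in order; return the interfaces each node's execs could see."""
    created = set()
    interfaces = {n: set() for n in nodes}
    seen_by_exec = {}

    def run_execs(node):
        # Snapshot the interfaces present at the moment the execs run.
        seen_by_exec[node] = set(interfaces[node])

    for node in nodes:
        created.add(node)
        # create-links: a veth is only created once both endpoint nodes exist.
        for (a, a_if), (b, b_if) in links:
            if node in (a, b) and a in created and b in created:
                interfaces[a].add(a_if)
                interfaces[b].add(b_if)
        if not defer_execs:
            run_execs(node)  # old behaviour: execs right after configure

    if defer_execs:
        for node in nodes:
            run_execs(node)  # new behaviour: execs after all nodes are deployed
    return seen_by_exec


links = [(("node1", "eth1"), ("node2", "eth1"))]
before = deploy(["node1", "node2"], links, defer_execs=False)
after = deploy(["node1", "node2"], links, defer_execs=True)
print(before["node1"], after["node1"])
```

With the execs run per node, `node1` sees no interfaces; with the execs deferred, both nodes see `eth1`.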
To fix this properly, however, we will have to create the veth link on the first node's call to create it, park the peer end in the host namespace or in a dedicated parking network namespace, and have the second node pull the interface in when it runs its create-links stage.
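One way the parking approach could look is sketched below. This is a rough model under stated assumptions: the host namespace is used as the parking spot, and `create_links_stage`, `PARKING_NS`, and the temporary interface naming are all hypothetical, not an existing containerlab API.

```python
PARKING_NS = "host"  # assumption: the host namespace serves as the parking spot


def create_links_stage(node, links, namespaces, parked):
    """Run a hypothetical create-links stage for `node`.

    namespaces: dict mapping namespace name -> set of interface names
    parked:     dict mapping (node, ifname) -> temporary name in PARKING_NS
    """
    namespaces.setdefault(node, set())
    for end_a, end_b in links:
        if node not in (end_a[0], end_b[0]):
            continue
        local, peer = (end_a, end_b) if node == end_a[0] else (end_b, end_a)
        if local in parked:
            # The peer already created the veth: pull our end out of parking.
            tmp = parked.pop(local)
            namespaces[PARKING_NS].remove(tmp)
            namespaces[node].add(local[1])
        else:
            # First caller: create the veth, keep our end, park the peer end.
            namespaces[node].add(local[1])
            tmp = f"{peer[0]}-{peer[1]}"
            namespaces[PARKING_NS].add(tmp)
            parked[peer] = tmp


namespaces = {PARKING_NS: set()}
parked = {}
links = [(("node1", "eth1"), ("node2", "eth1"))]

create_links_stage("node1", links, namespaces, parked)
after_first = {ns: set(ifs) for ns, ifs in namespaces.items()}
create_links_stage("node2", links, namespaces, parked)
print(after_first, namespaces)
```

In this model the first node has its interface as soon as its own create-links stage finishes, so its execs would no longer depend on the peer having been created.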