Fixing Exec race condition #1907

steiler · 2024-02-22T09:58:42Z

This is a hotfix for an issue a user reported, where the defined execs on container would run without the network interfaces being present already. Not a problem, as long as the execs are supposed to configure (ip link set or ip address add) these interfaces.

To understand the issue you have to know how we deploy nodes at the moment. We schedule nodes and make them go through the different stages. (create, create-links, configure, healthy, exit) not all of these stages are always applicable. healthy or exited will only be applicable and waited for if some other node did define that as a wait-for stage.
For the create-link stage, there is another special workflow. If there is no dependency on the create-links phase, then the node will try to create all its network interfaces. For veths between clab nodes, it might be that the peer container is not yet created and hence the veth interface peer namespace can not be set. Hence we opted in case of veth interface for a strategy, that the second node of a link will create and assign the link endpoints to itself and the peer node.
The first node however will in the basic case not stop and simply continue with the configure phase after which the execs where being run. This could lead to race conditions where via the exec commands interfaces where being tried to be configured that are not already there.

This PR drags the exec commands into a post-node deployment phase, where all the nodes will definitely be created and therefor all the links will be there.

Anyways to fix this properly we will have to create the links on the first nodes call to create a veth link and somehow park the peer link in the host or another parking network namespace and make the second node pull the interface in when running the create-link stage.

tests/01-smoke/stages.clab.yml

codecov · 2024-02-22T14:57:08Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (f2b4a62) 48.59% compared to head (be18fb5) 53.74%.
Report is 223 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1907      +/-   ##
==========================================
+ Coverage   48.59%   53.74%   +5.15%     
==========================================
  Files         133      155      +22     
  Lines       12789    11264    -1525     
==========================================
- Hits         6215     6054     -161     
+ Misses       5870     4351    -1519     
- Partials      704      859     +155

Files	Coverage Δ
clab/clab.go	`76.12% <ø> (-1.45%)`	⬇️
clab/dependency_manager/dependency_node.go	`95.55% <100.00%> (ø)`
nodes/k8s_kind/logger.go	`66.66% <ø> (ø)`
cmd/deploy.go	`67.36% <50.00%> (+1.63%)`	⬆️

... and 124 files with indirect coverage changes

steiler added 2 commits February 22, 2024 10:58

Fixing Exec race condition

d8e64d3

move to ubuntu for millisecond date precision

6dbe675

hellt reviewed Feb 22, 2024

View reviewed changes

tests/01-smoke/stages.clab.yml Outdated Show resolved Hide resolved

steiler added 4 commits February 22, 2024 13:06

update

49f63e4

update image

4b4cb29

update

a464ce8

Deepsource

e2fcaba

hellt marked this pull request as ready for review February 22, 2024 14:29

hellt added 3 commits February 22, 2024 16:34

add more info in the comments

942508a

include test case 19 and add some comments

1704a34

added rn

be18fb5

hellt merged commit ff671fb into main Feb 22, 2024
61 of 62 checks passed

hellt deleted the fixExecRace branch February 22, 2024 14:57

steiler mentioned this pull request Feb 23, 2024

veth creation: make first node create the veth endpoints #1914

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing Exec race condition #1907

Fixing Exec race condition #1907

steiler commented Feb 22, 2024 •

edited

Loading

codecov bot commented Feb 22, 2024

Fixing Exec race condition #1907

Fixing Exec race condition #1907

Conversation

steiler commented Feb 22, 2024 • edited Loading

codecov bot commented Feb 22, 2024

Codecov Report

steiler commented Feb 22, 2024 •

edited

Loading