Bootstrap fails if trying to get the Salt Master container while two exist #2434

gdemonet · 2020-04-20T12:53:13Z

Component: salt, scripts

What happened:

During execution of bootstrap.sh (observed at least on 2.6):

> Syncing Utility modules on Salt master...
> time="2020-04-20T07:14:03-04:00" level=fatal msg="execing command in container failed: rpc error: code = Unknown desc = failed to find container \"112e206b7a22bbfd7eb49ccad329db2558622326a7961e28f91d9530131b6175\\n173aa24cfd67ce8ab440becf5f8fe753b027f02c48230a3888699e761bed3227\" in store: does not exist

What was expected: The bootstrap script shouldn't fail when two Salt Master containers briefly coexist.

Steps to reproduce: It's a timing issue and happens only rarely.

Resolution proposal (optional):

In the get_salt_container function, defined in scripts/common.sh, wait until the crictl ps query returns exactly one container. If two are Running, we can't know for sure which one we should use, so it is better to wait than to guess.
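A minimal sketch of the wait loop proposed above, not the actual scripts/common.sh code. The helper name `list_salt_master_containers`, the label selector, the retry count, and the delay are all assumptions made for illustration; only the use of `crictl ps` and the idea of waiting for a single Running container come from this report.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the IDs of running salt-master containers,
# one per line. The label selector is an assumption, not the project's own.
list_salt_master_containers() {
    crictl ps -q \
        --label io.kubernetes.container.name=salt-master \
        --state running
}

# Print the container ID once exactly one candidate exists.
# Arguments (hypothetical defaults): max attempts, seconds between attempts.
wait_for_single_salt_container() {
    local -r max_attempts="${1:-10}" delay="${2:-3}"
    local ids count=0 attempt

    for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
        ids="$(list_salt_master_containers 2>/dev/null || true)"
        # Count non-empty lines; grep exits non-zero when nothing matches.
        count="$(grep -c . <<< "$ids" || true)"
        if [ "$count" -eq 1 ]; then
            printf '%s\n' "$ids"
            return 0
        fi
        # Zero or several candidates: wait instead of guessing.
        sleep "$delay"
    done

    echo "Expected exactly one salt-master container, last saw $count" >&2
    return 1
}
```

A caller could then run something like `salt_container="$(wait_for_single_salt_container)" || exit 1`, so that a persistent ambiguity fails early with a clear message instead of passing two IDs to a later `crictl exec`.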

Sometimes, if kubelet restarts the `salt-master` static Pod after an operation, two containers matching the usual selector co-exist for a short time window. If we call the `scripts/common.sh:get_salt_container` function at that point, it may return a string with two container IDs instead of just one, and subsequent commands will fail. Instead, we now wait for a single container to exist (and also add a sleep between two attempts, which we didn't have before). Fixes: #2434
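The failure mode described above can be reproduced without a cluster. This is only an illustration: `fake_crictl_ps` is a hypothetical stub standing in for the real query during the restart window, and the IDs are placeholders.

```shell
#!/usr/bin/env bash

# Hypothetical stub: pretend two matching containers exist at once.
fake_crictl_ps() {
    printf '%s\n' aaa111 bbb222
}

# Command substitution strips only the trailing newline, so both IDs end
# up in one string separated by an embedded newline.
container_id="$(fake_crictl_ps)"

# %q prints the value in shell-quoted form, making the newline visible.
printf 'container ID argument: %q\n' "$container_id"
```

Passing such a value as a single quoted argument gives the runtime one "ID" containing a newline, which is consistent with the `\n` between the two IDs in the error message reported in this issue.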

gdemonet added the kind:bug, topic:deployment, topic:flakiness, and complexity:easy labels Apr 20, 2020

gdemonet added this to the MetalK8s 2.5.1 milestone Apr 20, 2020

gdemonet added this to To Do in Flakiness Investigations via automation Apr 20, 2020

gdemonet mentioned this issue Apr 20, 2020

scripts: Wait for a single Salt Master container #2435

Merged

bert-e closed this as completed in 17183dd Apr 21, 2020

Flakiness Investigations automation moved this from To Do to Done Apr 21, 2020
