[BUG] salt.utils.cloud connections hang indefinitely. #60216

Closed
dwoz opened this issue May 19, 2021 · 0 comments
Labels: Bug (broken, incorrect, or confusing behavior); bugfix-bckport (will be back-ported to an older release branch by creating a PR against that branch); Salt-Cloud; severity-high (2nd top severity, seen by most users, causes major problems)


dwoz commented May 19, 2021

Description

If a host or network goes away during a cloud deployment, SSH connections to that host can be left hung indefinitely.

Steps to Reproduce the behavior

Run 50 or more deployments with the saltify cloud provider while some of the hosts may go away before their deployment finishes.

Expected behavior

We should add ServerAliveInterval and ServerAliveCountMax options to all connections in salt.utils.cloud. Add these options anywhere we are setting the StrictHostKeyChecking option.

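Values for these options can be something like ServerAliveInterval=10 and ServerAliveCountMax=3. With those values the client sends a keepalive probe every 10 seconds and gives up after 3 unanswered probes, so a network failure is detected and the connection times out after roughly 30 seconds.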
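
As a rough illustration of the proposal above, here is a minimal sketch of passing the two keepalive options next to StrictHostKeyChecking. The helper names `ssh_keepalive_args` and `detection_window` are hypothetical and are not taken from salt.utils.cloud; the `StrictHostKeyChecking=no` value is also only an assumption for the example. ServerAliveInterval and ServerAliveCountMax themselves are standard OpenSSH client options.

```python
def ssh_keepalive_args(interval=10, count_max=3):
    """Build ssh -o arguments (hypothetical helper, not Salt's actual code).

    interval:  seconds between keepalive probes (ServerAliveInterval)
    count_max: unanswered probes tolerated before ssh disconnects
               (ServerAliveCountMax)
    """
    return [
        # Assumed value; shown only to mark where the new options would sit.
        "-oStrictHostKeyChecking=no",
        "-oServerAliveInterval={}".format(interval),
        "-oServerAliveCountMax={}".format(count_max),
    ]


def detection_window(interval=10, count_max=3):
    """Approximate seconds before a dead connection is dropped."""
    return interval * count_max


if __name__ == "__main__":
    print(" ".join(["ssh"] + ssh_keepalive_args() + ["user@host"]))
    print(detection_window())
```

With the suggested values, the approximate detection window is 10 * 3 = 30 seconds.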

We can make these values configurable, but we should at least have sane defaults for them. Making them configurable is not necessary to close this issue.
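
If the values were made configurable later, one possible shape is a lookup that falls back to the defaults suggested here. This is only a sketch: the setting names `server_alive_interval` and `server_alive_count_max`, the constants, and the `keepalive_options` function are all hypothetical and do not describe existing Salt configuration.

```python
# Defaults suggested in this issue.
DEFAULT_SERVER_ALIVE_INTERVAL = 10
DEFAULT_SERVER_ALIVE_COUNT_MAX = 3


def keepalive_options(opts=None):
    """Return ssh keepalive arguments, using defaults for missing settings.

    `opts` is a plain dict standing in for whatever configuration
    source would be used; the key names are assumptions.
    """
    opts = opts or {}
    interval = int(opts.get("server_alive_interval", DEFAULT_SERVER_ALIVE_INTERVAL))
    count_max = int(opts.get("server_alive_count_max", DEFAULT_SERVER_ALIVE_COUNT_MAX))
    return [
        "-oServerAliveInterval={}".format(interval),
        "-oServerAliveCountMax={}".format(count_max),
    ]
```

Calling `keepalive_options()` with no settings yields the defaults, while `keepalive_options({"server_alive_interval": 5})` overrides only the interval.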


Versions Report

Observed on 3002.5

This issue came from debugging #59903

@dwoz added the Bug, needs-triage, Silicon (v3004.0 release code name), and severity-high labels and removed the needs-triage label on May 19, 2021
@sagetherage added this to the Silicon milestone on May 19, 2021
@sagetherage added the bugfix-bckport label (will be back-ported to an older release branch by creating a PR against that branch) on Jun 25, 2021
@sagetherage removed the Silicon label on Jun 25, 2021
@cmcmarrow cmcmarrow mentioned this issue Jul 1, 2021
3 tasks