Improve errors if the ssh server is not accessible #1160

Merged on Nov 12, 2020 (2 commits)

pchico83
Contributor

Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>

Proposed changes

  • Added a new phase to check container connectivity
  • Improved the error shown when the ssh server is not accessible
  • Retry the first connection to the ssh server for up to 5s

[Screenshot: 2020-11-11 at 22:25:57]


codecov bot commented Nov 11, 2020

Codecov Report

Merging #1160 (afd2826) into master (8bc4ea1) will decrease coverage by 0.06%.
The diff coverage is 14.54%.

@@            Coverage Diff             @@
##           master    #1160      +/-   ##
==========================================
- Coverage   34.65%   34.59%   -0.07%     
==========================================
  Files          65       65              
  Lines        5399     5429      +30     
==========================================
+ Hits         1871     1878       +7     
- Misses       3324     3347      +23     
  Partials      204      204              
Impacted Files     Coverage Δ
cmd/up/up.go       3.39% <0.00%> (-0.09%) ⬇️
pkg/ssh/pool.go    50.70% <42.10%> (-3.00%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

if err == errors.ErrSSHConnectError {
	return up.checkOktetoStartError(ctx, "Failed to connect to your development container")
}
return fmt.Errorf("couldn't connect to your development container: %s", err.Error())
Member

Is the output of err.Error() useful? We could print just the text and send the full error to log.Info.

Contributor Author

I find it useful. That is how it was before, and people can usually report this error without needing to send a doctor file.

}
}
} else {
if pods.OktetoDevPodMustBeRecreated(ctx, up.Dev, up.Client) {
Member

It's odd that a function called checkOktetoStartError changes the system. Can we split the destruction into a separate stage (or return a specific error, and then the main loop does the recreation)?

Contributor Author

ok

@@ -54,6 +56,33 @@ func startPool(ctx context.Context, serverAddr string, config *ssh.ClientConfig)
	return p, nil
}

func retryNewClientConn(ctx context.Context, c net.Conn, addr string, conf *ssh.ClientConfig) (ssh.Conn, <-chan ssh.NewChannel, <-chan *ssh.Request, error) {
	ticker := time.NewTicker(300 * time.Millisecond)
	to := config.GetTimeout() / 6 // 5 seconds
Member

Could we just keep retrying for the full 30s? That's the worst-case scenario, right?

Contributor Author

It is better to fail fast. The ssh server starts very quickly, but I have seen the first connection fail a few times.

case <-ticker.C:
	continue
case <-ctx.Done():
	log.Infof("ssh.retryNewClientConn cancelled")
Member

Once Done fires you can get the error from the ctx (ctx.Err()). I find it useful to log it to know the reason for the cancellation.

Contributor Author

ok

Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>
Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>
pchico83 merged commit 5965d0a into master on Nov 12, 2020
pchico83 deleted the init-error branch on November 12, 2020 12:05