Improve errors if the ssh server is not accessible #1160
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1160 +/- ##
==========================================
- Coverage 34.65% 34.59% -0.07%
==========================================
Files 65 65
Lines 5399 5429 +30
==========================================
+ Hits 1871 1878 +7
- Misses 3324 3347 +23
Partials 204 204
Continue to review full report at Codecov.
if err == errors.ErrSSHConnectError {
    return up.checkOktetoStartError(ctx, "Failed to connect to your development container")
}
return fmt.Errorf("couldn't connect to your development container: %s", err.Error())
Is the output of err.Error() useful here? We could show just the text to the user and throw the full error in a log.Infof.
I find it useful. It is how it was before and people usually report this error without needing to send a doctor file
        }
    }
} else {
    if pods.OktetoDevPodMustBeRecreated(ctx, up.Dev, up.Client) {
It's odd that a function called checkOktetoStartError
changes the system. Can we split the destruction into a separate stage? (Or return a specific error, and then the main loop does the recreation?)
ok
@@ -54,6 +56,33 @@ func startPool(ctx context.Context, serverAddr string, config *ssh.ClientConfig)
    return p, nil
}

func retryNewClientConn(ctx context.Context, c net.Conn, addr string, conf *ssh.ClientConfig) (ssh.Conn, <-chan ssh.NewChannel, <-chan *ssh.Request, error) {
    ticker := time.NewTicker(300 * time.Millisecond)
    to := config.GetTimeout() / 6 // 5 seconds
Could we just keep retrying for the full 30s? It's the worst-case scenario, right?
It is better to fail fast. The ssh server starts very fast, but I have found the first connection to fail a few times.
case <-ticker.C:
    continue
case <-ctx.Done():
    log.Infof("ssh.retryNewClientConn cancelled")
After Done fires, ctx.Err() returns the error; I find it useful to log it to know the reason for the cancellation.
ok
Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>
Proposed changes