Improve errors if the ssh server is not accessible #1160

pchico83 · 2020-11-11T21:27:02Z

Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>

Proposed changes

Add a new phase to check container connectivity
Improve the error shown when the ssh server is not accessible
Retry the first connection to the ssh server for up to 5s

codecov · 2020-11-11T21:30:00Z

Codecov Report

Merging #1160 (afd2826) into master (8bc4ea1) will decrease coverage by 0.06%.
The diff coverage is 14.54%.

@@            Coverage Diff             @@
##           master    #1160      +/-   ##
==========================================
- Coverage   34.65%   34.59%   -0.07%     
==========================================
  Files          65       65              
  Lines        5399     5429      +30     
==========================================
+ Hits         1871     1878       +7     
- Misses       3324     3347      +23     
  Partials      204      204

Impacted Files	Coverage Δ
cmd/up/up.go	`3.39% <0.00%> (-0.09%)`	⬇️
pkg/ssh/pool.go	`50.70% <42.10%> (-3.00%)`	⬇️


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8bc4ea1...afd2826.

rberrelleza · 2020-11-11T21:30:01Z

cmd/up/up.go

+		if err == errors.ErrSSHConnectError {
+			return up.checkOktetoStartError(ctx, "Failed to connect to your development container")
+		}
+		return fmt.Errorf("couldn't connect to your development container: %s", err.Error())


Is the output of err.Error() useful? We could show just the message and send the full error to log.Info.

pchico83 · Nov 12, 2020

I find it useful. It is how it worked before, and it lets people report this error without needing to send a doctor file.

rberrelleza · 2020-11-11T21:31:14Z

cmd/up/up.go

+			}
+		}
+	} else {
+		if pods.OktetoDevPodMustBeRecreated(ctx, up.Dev, up.Client) {


It's odd that a function called checkOktetoStartError changes the system. Can we split the destruction into a separate stage? (Or return a specific error and let the main loop do the recreation?)

rberrelleza · 2020-11-11T21:32:15Z

pkg/ssh/pool.go

@@ -54,6 +56,33 @@ func startPool(ctx context.Context, serverAddr string, config *ssh.ClientConfig)
 	return p, nil
 }

+func retryNewClientConn(ctx context.Context, c net.Conn, addr string, conf *ssh.ClientConfig) (ssh.Conn, <-chan ssh.NewChannel, <-chan *ssh.Request, error) {
+	ticker := time.NewTicker(300 * time.Millisecond)
+	to := config.GetTimeout() / 6 // 5 seconds


Could we just keep retrying for the full 30s? That's the worst-case scenario, right?

pchico83 · Nov 12, 2020

It is better to fail fast. The ssh server starts very quickly, but I have seen the first connection fail a few times.

rberrelleza · 2020-11-11T21:45:12Z

pkg/ssh/pool.go

+		case <-ticker.C:
+			continue
+		case <-ctx.Done():
+			log.Infof("ssh.retryNewClientConn cancelled")


Once Done fires you can get the error from ctx.Err(); I find it useful to log it to know the reason for the cancellation.


pchico83 requested review from rberrelleza and rlamana as code owners November 11, 2020 21:27

rberrelleza reviewed Nov 11, 2020


rberrelleza approved these changes Nov 11, 2020


pchico83 added 2 commits November 12, 2020 07:19

Improve errors if the ssh server is not accessible

24d8583

Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>

Address review comments

afd2826

Signed-off-by: Pablo Chico de Guzman <pchico83@gmail.com>

pchico83 force-pushed the init-error branch from 13a1193 to afd2826 November 12, 2020 11:56

pchico83 merged commit 5965d0a into master Nov 12, 2020

pchico83 deleted the init-error branch November 12, 2020 12:05


