Devspace Disconnections #639

Closed · dqfan2012 opened this issue Aug 15, 2019 · 7 comments
Labels: area/sync (Issues related to the real-time code synchronization) · kind/enhancement (Enhancement of an existing feature / improvement)

dqfan2012 commented Aug 15, 2019

What happened?

I get disconnected a few times per day with the following error:

www-data@iaapps-7fb5bdbc5c-8pmjx:/var/www/html$ [fatal]  Fatal sync error: upstream: apply changes: apply removes: after deletes: rpc error: code = Unavailable desc = transport is closing. For more information check .devspace/logs/sync.log

What did you expect to happen instead?

I expect to stay connected.

How can we reproduce the bug? (as minimally and precisely as possible)

I normally get disconnected from devspace if I let the terminal (cmd) idle for too long. I think the error occurs when I do work that triggers a sync after such an idle disconnection.

Local Environment:

  • Operating System: Windows
  • Deployment method: helm

Kubernetes Cluster:

  • Cloud Provider: bare-metal cluster
  • Kubernetes Version: 1.13.5

Anything else we need to know?

I can include the error log and sync log if you need them.

/kind bug

@KaelBaldwin

FYI, dqfan2012 is one of our developers who is using devspace on our cluster and running into this issue. I can answer any questions about the cluster setup.

My theory is that the connection is being closed when no communication passes through it for a while; then, once devspace tries to send something through the closed connection, it throws an error.

[fatal] Fatal sync error: upstream: apply changes: apply removes: after deletes: rpc error: code = Unavailable desc = transport is closing. For more information check .devspace/logs/sync.log

I would suggest detecting the error when it occurs and attempting to reconnect, then continuing from there if the reconnection succeeds (see the sketch below).

If the reconnect fails, that indicates a connectivity problem larger than just a tunnel timeout, and I think it would then be appropriate to fail.
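
To make that concrete, here is a minimal Go sketch of the detect-and-reconnect idea (not devspace's actual code). It assumes the sync tunnel surfaces gRPC status codes like the Unavailable one in the error above; op and reconnect are hypothetical stand-ins for a sync operation and a tunnel re-dial.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// retryOnTransportClose runs op and, whenever it fails with the gRPC
// Unavailable code (the "transport is closing" case above), calls the
// reconnect callback and tries again, up to `attempts` times.
func retryOnTransportClose(op, reconnect func() error, attempts int) error {
	err := op()
	for i := 0; i < attempts && status.Code(err) == codes.Unavailable; i++ {
		if rerr := reconnect(); rerr != nil {
			// Reconnecting failed too: the problem is bigger than a
			// tunnel timeout, so give up instead of retrying forever.
			return fmt.Errorf("reconnect failed (%v) after sync error: %w", rerr, err)
		}
		time.Sleep(time.Second) // brief pause before retrying
		err = op()
	}
	return err
}

func main() {
	// Hypothetical stand-ins for a sync operation and a tunnel re-dial.
	calls := 0
	op := func() error {
		calls++
		if calls < 3 {
			return status.Error(codes.Unavailable, "transport is closing")
		}
		return nil
	}
	reconnect := func() error { return nil }

	if err := retryOnTransportClose(op, reconnect, 5); err != nil {
		log.Fatal(err)
	}
	log.Printf("sync resumed after %d attempts", calls)
}
```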


FabianKramm commented Aug 15, 2019

@dqfan2012 @KaelBaldwin thanks for reporting this issue! Yes, this sounds like Kubernetes is closing the idle connections, which leads to the described errors.

This is a tricky issue for the following reasons:

  • for the sync itself, it will be hard to recover from such an error in the middle of a sync action
  • the terminal and port-forwarding will also be closed after a while (as described in 'devspace enter' session is interrupted #511)
  • there are many different error messages and error types that could indicate that a connection was suddenly lost or disconnected

I think the best approach to tackle this issue would be to restart the sync, terminal, and port-forwarding completely if any of them encounters a closed connection; a rough sketch of that pattern follows.
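
A minimal sketch of that restart-everything idea, assuming each component can be modeled as a function that returns an error when its connection dies. The service stub and its two-second failure are illustrative, not devspace's actual internals:

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// service simulates one long-running component (sync, terminal, or
// port-forwarding) that eventually fails with a closed connection.
func service(ctx context.Context, name string) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(2 * time.Second): // stand-in for the tunnel dying
		return errors.New(name + ": transport is closing")
	}
}

// superviseAll runs all components together; as soon as one reports a
// failure, it stops the others and restarts the whole group.
func superviseAll(ctx context.Context) {
	names := []string{"sync", "terminal", "port-forwarding"}
	for ctx.Err() == nil {
		groupCtx, cancel := context.WithCancel(ctx)
		errs := make(chan error, len(names))
		for _, name := range names {
			go func(name string) { errs <- service(groupCtx, name) }(name)
		}
		log.Printf("restarting all components after: %v", <-errs)
		cancel() // tear down the remaining components
		for i := 1; i < len(names); i++ {
			<-errs // wait for them to exit before restarting
		}
	}
}

func main() {
	// Run the supervisor for a few restart cycles, then stop.
	ctx, cancel := context.WithTimeout(context.Background(), 7*time.Second)
	defer cancel()
	superviseAll(ctx)
}
```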

FabianKramm added the area/sync and kind/enhancement labels on Aug 15, 2019
@KaelBaldwin

That makes sense, thanks @FabianKramm !

FabianKramm added a commit that referenced this issue Aug 19, 2019

FabianKramm commented Aug 19, 2019

@KaelBaldwin @dqfan2012 quick update on this issue. I investigated a little and found that the timeout after which Kubernetes kills idle connections can be configured on the kubelet via the '--streaming-connection-idle-timeout' flag (see the Kubernetes docs). I could easily reproduce this issue in minikube by setting this flag to 30s (repro command below).
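
For reference, the reproduction presumably looks something like this, assuming minikube's --extra-config passthrough of kubelet flags; then let a sync or terminal session sit idle for more than 30s:

```sh
# Start minikube with a 30s kubelet idle-connection timeout
minikube start --extra-config=kubelet.streaming-connection-idle-timeout=30s
```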

Furthermore, I found a way to periodically ping the underlying connection to the Kubernetes API server and thereby prevent idle connections from being closed (essentially, they will never be idle with the constant pinging). I will implement this for the sync, terminal, and port-forwarding, which should solve this issue without needing a complete devspace restart.
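
The described fix pings the Kubernetes API connection itself; as a loose analogy at the gRPC layer (where the sync error above originates), client-side keepalive achieves the same never-idle effect. A sketch with grpc-go; the target address is a placeholder, and devspace actually tunnels its sync protocol through an exec stream, so this is only an analogy:

```go
package main

import (
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

// dialWithKeepalive dials a gRPC endpoint with client-side keepalive
// pings so the connection never looks idle to whatever might reap it.
func dialWithKeepalive(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithInsecure(), // plaintext, fine for a local tunnel sketch
		grpc.WithKeepaliveParams(keepalive.ClientParameters{
			Time:                20 * time.Second, // ping after 20s without activity
			Timeout:             5 * time.Second,  // fail if the ping ack takes >5s
			PermitWithoutStream: true,             // keep pinging even with no active RPCs
		}),
	)
}

func main() {
	conn, err := dialWithKeepalive("localhost:10400") // placeholder address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```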

@FabianKramm

@dqfan2012 @KaelBaldwin was this issue solved with v3.5.17?

@KaelBaldwin

@dqfan2012 started having a new issue after upgrading to v3.5.17 and had to revert. I'm going to try it out myself and see whether I run into the same problem.

But I think that would belong in a new issue, and this one can be closed.

@KaelBaldwin

thanks!
