Orphaned agents stuck in reconnecting state #2335

Closed
alena1108 opened this issue Oct 13, 2015 · 6 comments
Labels
kind/bug: Issues that are defects reported by users or that we know have reached a real release
kind/question: Issues that just require an answer. No code change needed

Comments

@alena1108

Seen on a user's setup. There are 3 agents stuck in the Reconnecting state, and those agents don't have corresponding hosts.

https://gist.github.com/tobowers/366d3822c793555c1a3b

@cjellick @ibuildthecloud any idea what might have caused this issue? Could it be that during initial agent registration the agent was created but the host wasn't, and the agent wasn't cleaned up afterwards?

alena1108 added the kind/bug label Oct 13, 2015
@cusspvz

cusspvz commented Oct 16, 2015

@alena1108 could this somehow be related to #2196?

@alena1108
Author

@cusspvz it looks like #2196 is a bit different: there, a host entry was created for the agent (as you can see from the host being stuck in the Reconnecting state in the UI). In this particular bug, the agent connect process got stuck (or was aborted) before the host entry was populated, so we ended up with an orphaned agent entry.
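
To make the distinction concrete, here is a minimal sketch of how one might flag orphaned agents, i.e. agents in the Reconnecting state that no host entry points to. The `Agent` and `Host` records and their field names (`state`, `agent_id`) are hypothetical stand-ins for illustration only; they are not Rancher's actual schema or API, and the records would have to be loaded from wherever your setup exposes them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Agent:
    # Hypothetical record shape, not Rancher's real schema.
    id: str
    state: str


@dataclass
class Host:
    # Hypothetical record shape; agent_id links a host to its agent.
    id: str
    agent_id: Optional[str]


def find_orphaned_agents(
    agents: Iterable[Agent],
    hosts: Iterable[Host],
    state: str = "reconnecting",
) -> List[Agent]:
    """Return agents in the given state that no host entry references."""
    # Collect the agent ids that at least one host points to.
    agent_ids_with_host = {h.agent_id for h in hosts if h.agent_id is not None}
    # An agent is orphaned if it is in the target state and has no host.
    return [
        a for a in agents
        if a.state == state and a.id not in agent_ids_with_host
    ]
```

Under these assumptions, a host stuck in Reconnecting (the #2196 case) still has its agent referenced by a host entry, so that agent would not be returned here; only agents with no host at all are.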

@cusspvz

cusspvz commented Oct 17, 2015

I don't know if @fernandoneto has reported it already, but it seems that on your latest version, docker-machine based host creation (Others on UI) isn't fully working: it gets stuck waiting for the agent. It feels like it is related to this.

Also, thanks for the explanation @alena1108.

@rokka-n

rokka-n commented Dec 17, 2015

I just installed Vagrant and VirtualBox and checked out the rancher repo, all on a vanilla MacBook.

I ran vagrant up (the only change being $number_of_nodes = 3), and after a few minutes of running OK, the hosts in the UI go into the "reconnecting" state.

Restarting the VMs reconnects them, but after a few minutes they fail again.

@cusspvz

cusspvz commented Dec 17, 2015

@rokka-n, as @alena1108 has described, this issue covers orphaned agents; I suppose they get stuck indefinitely here:

@alena1108
In this particular bug, the agent connect process was stuck (or aborted) before host entry was populated, so we ended up having orphaned agent entry.

@rokka-n
Restarting VMs makes it connected again, but after few mins they fail again.

@rokka-n it seems that you're describing the same behavior we noticed in #2196.

You are most likely experiencing the same thing we are: agent scripts get stuck and don't ping back, or they simply time out.

What Docker version do you have on the hosts and the server?
What Rancher version are you running?
Have you deployed any stacks/services/containers so far?
If so, could you please post reproducible steps, starting from scratch, on issue #2196?

Thanks!

@deniseschannon

As of late December, we've introduced new networking in Rancher. If these issues are still being seen on the latest releases with the new networking changes, please re-open.

5 participants