Intermittent "context deadline exceeded" when no deadline set #138

Closed
parkan opened this issue Oct 24, 2016 · 7 comments

Comments

@parkan

parkan commented Oct 24, 2016

We are seeing intermittent "dial attempt failed: context deadline exceeded" errors even though no deadline is set. This happens with local or nearby nodes and no signs of network trouble.

The error happens here: https://github.com/mediachain/concat/blob/3a07b2cb8d7104b570fd5bcccb5c8f48f429cc38/mcnode/proto.go#L359

The context is a plain Context, with no deadline or timeout attached.

@vyzo can provide more details; it happens every few restarts. Any clue where to look?

@whyrusleeping
Contributor

@parkan, roughly how long are the timeouts? An approximate number would help narrow this down significantly.

@vyzo
Contributor

vyzo commented Oct 25, 2016

The timeouts appeared to be quite short on the occasions I observed, maybe a couple of seconds?

A few more specifics about what I have observed:

  • All our connections use the context we created for the swarm: a background context with a cancel function, which we call when we want to shut down networking. No timeout is attached.
  • When I hit the problem, closing the Host and creating a new instance also fails to connect. But if I let it retry (after 5 minutes; in this particular case it is a heartbeat for directory registrations), it connects successfully. So the condition does not appear to be terminal.

@vyzo
Contributor

vyzo commented Oct 27, 2016

Another data point, this time with an actual timeout measurement.
I am seeing repeated connection failures with "dial attempt failed: context deadline exceeded", so I timed one attempt, and it comes to around 10 seconds:

```
$ time mcclient id QmeiY2eHMwK92Zt6X4kUUC3MsjMmVb2VnGZ17DhnhRPCEQ
Not Found
Error: dial attempt failed: context deadline exceeded

real    0m10.498s
user    0m0.320s
sys     0m0.018s
```

All of this happens while the nodes involved answer ICMP pings normally and network conditions are otherwise unremarkable.

@vyzo
Contributor

vyzo commented Oct 27, 2016

I also verified that I can connect to the node with telnet and get the go-multistream header back.

@vyzo
Contributor

vyzo commented Oct 27, 2016

I also restarted the node I am trying to connect to, and I still can't connect.

@vyzo
Contributor

vyzo commented Oct 27, 2016

Digging further, there seems to be a legitimate networking problem with the node in this case: ssh works, but neither pings nor telnet to the multistream port get through from my network location.

@vyzo
Contributor

vyzo commented Oct 27, 2016

I think we can attribute my problems today to legitimate networking issues.


5 participants