Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Wrap connections in an N minute timeout to ensure they get reaped correctly #1725

Merged
merged 6 commits into from Dec 29, 2016

Conversation

Projects
None yet
4 participants
Owner

erikjohnston commented Dec 29, 2016

No description provided.

erikjohnston added some commits Dec 16, 2016

synapse/http/endpoint.py
+
+ # Time this connection out if we haven't send a request in the last
+ # N minutes
+ reactor.callLater(3 * 60, self._time_things_out_maybe)
@ara4n

ara4n Dec 29, 2016

Owner

why is it 3 mins here, but 2 mins in the _time_things_out_maybe above?

synapse/http/endpoint.py
+ return _WrappingEndointFac(transport_endpoint(reactor, domain, port, **endpoint_kw_args))
+
+
+class _WrappingEndointFac(object):
@ara4n

ara4n Dec 29, 2016

Owner

s/Endoint/Endpoint/ probably ;)

synapse/http/matrixfederationclient.py
@@ -61,6 +61,11 @@
MAX_SHORT_RETRIES = 3
+def test(conn):
@ara4n

ara4n Dec 29, 2016

Owner

hm, is this meant to be here?

synapse/http/endpoint.py
+ reactor.callLater(3 * 60, self._time_things_out_maybe)
+ return res
+
+ d.addCallback(update_request_time)
@ara4n

ara4n Dec 29, 2016

Owner

why do we do this by both a callback and a reactor.callLater()?

@erikjohnston

erikjohnston Dec 29, 2016

Owner

We schedule a timeout in 3 minutes both before we send the request, and after we received the response.

synapse/http/endpoint.py
+
+
+class _WrappedConnection(object):
+ """Wraps a connection and calls abort on it if it hasn't seen any actio
@uhoreg

uhoreg Dec 29, 2016

missing "n" in "action". Also, the comment says 5 minutes, but the code says 3 minutes?

Manually abort the underlying TLS connection.
The abort() method calls loseConnection() which tries to shutdown the
TLS connection cleanly. We now call abortConnection() directly which
should promptly close both the TLS connection and the underlying TCP
connection.

I also added some TODO markers to consider cancelling the old previous
timeout rather than checking time.time(). But given how urgently we want
to get this code released I'd rather leave the existing code with the
duplicate timeouts and the time.time() check.

@NegativeMjark NegativeMjark changed the base branch from develop to release-v0.18.6 Dec 29, 2016

@NegativeMjark NegativeMjark merged commit 828c585 into release-v0.18.6 Dec 29, 2016

5 of 7 checks passed

Sytest Dendron (Merged PR) Build finished.
Details
Sytest Dendron (Commit) Build #1276 origin/erikj/timeout_conn in progress...
Details
Sytest Postgres (Commit) Build #2119 origin/erikj/timeout_conn succeeded in 9 min 25 sec
Details
Sytest Postgres (Merged PR) Build finished.
Details
Sytest SQLite (Commit) Build #2161 origin/erikj/timeout_conn succeeded in 8 min 22 sec
Details
Sytest SQLite (Merged PR) Build finished.
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@erikjohnston erikjohnston deleted the erikj/timeout_conn branch Mar 29, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment