Wrap connections in an N minute timeout to ensure they get reaped correctly #1725

Merged
merged 6 commits into from Dec 29, 2016

Projects

None yet

4 participants

@erikjohnston
Member

No description provided.

erikjohnston added some commits Dec 16, 2016
@erikjohnston erikjohnston Merge branch 'release-v0.18.5' of github.com:matrix-org/synapse
f5a4001
@erikjohnston erikjohnston Wrap connections in an N minute timeout to ensure they get reaped cor…
…rectly
5b6672c
synapse/http/endpoint.py
+
+ # Time this connection out if we haven't send a request in the last
+ # N minutes
+ reactor.callLater(3 * 60, self._time_things_out_maybe)
@ara4n
ara4n Dec 29, 2016 Member

why is it 3 mins here, but 2 mins in the _time_things_out_maybe above?

synapse/http/endpoint.py
+ return _WrappingEndointFac(transport_endpoint(reactor, domain, port, **endpoint_kw_args))
+
+
+class _WrappingEndointFac(object):
@ara4n
ara4n Dec 29, 2016 Member

s/Endoint/Endpoint/ probably ;)

@erikjohnston erikjohnston Clean up
b7336ff
synapse/http/matrixfederationclient.py
@@ -61,6 +61,11 @@
MAX_SHORT_RETRIES = 3
+def test(conn):
@ara4n
ara4n Dec 29, 2016 Member

hm, is this meant to be here?

@erikjohnston erikjohnston Spelling and comments
68030fd
synapse/http/endpoint.py
+ reactor.callLater(3 * 60, self._time_things_out_maybe)
+ return res
+
+ d.addCallback(update_request_time)
@ara4n
ara4n Dec 29, 2016 Member

why do we do this by both a callback and a reactor.callLater()?

@erikjohnston
erikjohnston Dec 29, 2016 Member

We schedule a timeout in 3 minutes both before we send the request, and after we received the response.

@erikjohnston erikjohnston Respect long_retries param and default to off
b4bc6fe
synapse/http/endpoint.py
+
+
+class _WrappedConnection(object):
+ """Wraps a connection and calls abort on it if it hasn't seen any actio
@uhoreg
uhoreg Dec 29, 2016

missing "n" in "action". Also, the comment says 5 minutes, but the code says 3 minutes?

@NegativeMjark NegativeMjark Manually abort the underlying TLS connection.
The abort() method calls loseConnection() which tries to shutdown the
TLS connection cleanly. We now call abortConnection() directly which
should promptly close both the TLS connection and the underlying TCP
connection.

I also added some TODO markers to consider cancelling the old previous
timeout rather than checking time.time(). But given how urgently we want
to get this code released I'd rather leave the existing code with the
duplicate timeouts and the time.time() check.
97ffc56
@NegativeMjark NegativeMjark changed the base branch to release-v0.18.6 from develop Dec 29, 2016
@NegativeMjark NegativeMjark merged commit 828c585 into release-v0.18.6 Dec 29, 2016

5 of 7 checks passed

Sytest Dendron (Merged PR) Build finished.
Details
Sytest Dendron (Commit) Build #1276 origin/erikj/timeout_conn in progress...
Details
Sytest Postgres (Commit) Build #2119 origin/erikj/timeout_conn succeeded in 9 min 25 sec
Details
Sytest Postgres (Merged PR) Build finished.
Details
Sytest SQLite (Commit) Build #2161 origin/erikj/timeout_conn succeeded in 8 min 22 sec
Details
Sytest SQLite (Merged PR) Build finished.
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment