Skip to content

Increase socket timeout#247

Merged
adamchainz merged 1 commit intomasterfrom
increase_socket_timeout
Nov 4, 2019
Merged

Increase socket timeout#247
adamchainz merged 1 commit intomasterfrom
increase_socket_timeout

Conversation

@adamchainz
Copy link
Copy Markdown
Contributor

Both @naohiro-t and I have seen the socket time out during startup only to reconnect shortly after:

2019-08-30T09:25:10+0100 DEBUG Parsing Core Agent manifest path: /tmp/scout_apm_core/scout_apm_core-v1.1.8-x86_64-apple-darwin/manifest.json
2019-08-30T09:25:10+0100 DEBUG Core Agent manifest json: {u'core_agent_version': u'1.1.8', u'core_agent_binary': u'core-agent', u'version': u'1.1.8', u'core_agent_binary_sha256': u'2896f584cb373c596164805510531a34b29e4eb34a13685edd4c6df364161414'}
[2019-08-30T08:25:10][core_agent][INFO] Initializing logger with log level: Info
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket thread exception: timeout('timed out',)
Traceback (most recent call last):
  File "/Users/chainz/Documents/Projects/scout_apm_python/src/scout_apm/core/socket.py", line 105, in run
    result = self._send(body)
  File "/Users/chainz/Documents/Projects/scout_apm_python/src/scout_apm/core/socket.py", line 174, in _send
    self._read_response()
  File "/Users/chainz/Documents/Projects/scout_apm_python/src/scout_apm/core/socket.py", line 184, in _read_response
    raw_size = self.socket.recv(4)
timeout: timed out
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket thread stopped.
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket attempt 1, connecting to /tmp/scout_apm_core/scout_apm_core-v1.1.8-x86_64-apple-darwin/scout-agent.sock, PID: 90625, Thread: <CoreAgentSocket(Thread-1, started daemon 123145439891456)>
2019-08-30T09:25:11+0100 DEBUG Monkey patched SQL
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket is connected

A bump up to a 3 second timeout seems to fix it - at least on my machine.

I guess the biggest risk of a too-fast timeout is continually reconnecting and re-registering when a server is low on CPU cycles, which would amplify the CPU usage of the library/agent. A longer timeout seems appropriate and unlikely to cause problems because this happens in a background thread.

@dlanderson
Copy link
Copy Markdown
Contributor

This may be papering over an issue with the core agent. Since the socket is already connected, communication with the core agent should be very fast. It's possible the core agent is blocked somewhere it shouldn't be. @cschneid

@adamchainz
Copy link
Copy Markdown
Contributor Author

Agree it's papering over an issue, but after debugging on Slack we don't seem to have found a cause? @cschneid what do you think about merging this?

Both @naohiro-t and I have seen the socket time out during startup only to reconnect shortly after:

```
2019-08-30T09:25:10+0100 DEBUG Parsing Core Agent manifest path: /tmp/scout_apm_core/scout_apm_core-v1.1.8-x86_64-apple-darwin/manifest.json
2019-08-30T09:25:10+0100 DEBUG Core Agent manifest json: {u'core_agent_version': u'1.1.8', u'core_agent_binary': u'core-agent', u'version': u'1.1.8', u'core_agent_binary_sha256': u'2896f584cb373c596164805510531a34b29e4eb34a13685edd4c6df364161414'}
[2019-08-30T08:25:10][core_agent][INFO] Initializing logger with log level: Info
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket thread exception: timeout('timed out',)
Traceback (most recent call last):
  File "/Users/chainz/Documents/Projects/scout_apm_python/src/scout_apm/core/socket.py", line 105, in run
    result = self._send(body)
  File "/Users/chainz/Documents/Projects/scout_apm_python/src/scout_apm/core/socket.py", line 174, in _send
    self._read_response()
  File "/Users/chainz/Documents/Projects/scout_apm_python/src/scout_apm/core/socket.py", line 184, in _read_response
    raw_size = self.socket.recv(4)
timeout: timed out
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket thread stopped.
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket attempt 1, connecting to /tmp/scout_apm_core/scout_apm_core-v1.1.8-x86_64-apple-darwin/scout-agent.sock, PID: 90625, Thread: <CoreAgentSocket(Thread-1, started daemon 123145439891456)>
2019-08-30T09:25:11+0100 DEBUG Monkey patched SQL
2019-08-30T09:25:11+0100 DEBUG CoreAgentSocket is connected
```

A bump up to a 3 second timeout seems to fix it - at least on my machine.

I guess the biggest risk of a too-fast timeout is continually reconnecting and re-registering when a server is low on CPU cycles, which would amplify the CPU usage of the library/agent. A longer timeout seems appropriate and unlikely to cause problems because this happens in a background thread.
@adamchainz adamchainz force-pushed the increase_socket_timeout branch from 74688f2 to c1b55b6 Compare November 4, 2019 16:48
@adamchainz adamchainz merged commit b2f21a0 into master Nov 4, 2019
@adamchainz adamchainz deleted the increase_socket_timeout branch November 4, 2019 17:06
@adamchainz
Copy link
Copy Markdown
Contributor Author

Merged after discussion on Slack - there may still be some sticking points in the core agent as @cschneid said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants