Environment
- Python version: 3.8.0
- OS (
uname -a): Darwin io 18.7.0 Darwin Kernel Version 18.7.0
- Thespian version: 3.10.0
- Transport:
TCPTransport (on a single machine)
Error
When running an actor system on a single machine we get the following stack trace after a while:
2020-04-17 10:51:26.012604 p24308 ERR Actor esrally.driver.driver.Worker @ ActorAddr-(T|:53459) transport run exception: Traceback (most recent call last):
File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/actorManager.py", line 87, in run
r = self.transport.run(self.handleMessages)
File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 74, in run
rval = self._run_subtransport(incomingHandler, max_runtime)
File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
rval = self._runWithExpiry(incomingHandler)
File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1143, in _runWithExpiry
rrecv, rsend, rerr = select.select(wrecv, wsend,
ValueError: file descriptor cannot be a negative integer (-1)
In the error log we also see:
2020-04-17 10:51:26.016014 p24308 ERR ValueError on select(#0: [], #3: [-1, 14, 13], #3: {13, -1, 14}, 0.166675)
I saw that the code in TCPTransport has handling for invalid file descriptors but it expects OSError to be raised.
I can reproduce this in our application when an actor sends large log messages so I wonder whether it is related. I also managed to capture a complete debug log output of Thespian (~ 13MB expanded; the error is towards the end of the log file).
The error is pretty well reproducible but the reproduction scenario is a bit involved (it requires running an application that is somewhat complex to setup) so I'm not sure whether it is helpful to describe it here. I am of course happy to replicate the steps here, provide further details and / or participate in testing but I wanted to get the issue out here first.
Environment
uname -a): Darwin io 18.7.0 Darwin Kernel Version 18.7.0TCPTransport(on a single machine)Error
When running an actor system on a single machine we get the following stack trace after a while:
In the error log we also see:
I saw that the code in
TCPTransporthas handling for invalid file descriptors but it expectsOSErrorto be raised.I can reproduce this in our application when an actor sends large log messages so I wonder whether it is related. I also managed to capture a complete debug log output of Thespian (~ 13MB expanded; the error is towards the end of the log file).
The error is pretty well reproducible but the reproduction scenario is a bit involved (it requires running an application that is somewhat complex to setup) so I'm not sure whether it is helpful to describe it here. I am of course happy to replicate the steps here, provide further details and / or participate in testing but I wanted to get the issue out here first.