
ValueError: file descriptor cannot be a negative integer (-1) #63

@danielmitterdorfer


Environment

  • Python version: 3.8.0
  • OS (uname -a): Darwin io 18.7.0 Darwin Kernel Version 18.7.0
  • Thespian version: 3.10.0
  • Transport: TCPTransport (on a single machine)

Error

When running an actor system on a single machine, we get the following stack trace after a while:

2020-04-17 10:51:26.012604 p24308 ERR  Actor esrally.driver.driver.Worker @ ActorAddr-(T|:53459) transport run exception: Traceback (most recent call last):
  File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/actorManager.py", line 87, in run
    r = self.transport.run(self.handleMessages)
  File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 74, in run
    rval = self._run_subtransport(incomingHandler, max_runtime)
  File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/wakeupTransportBase.py", line 80, in _run_subtransport
    rval = self._runWithExpiry(incomingHandler)
  File "/Users/daniel/Projects/rally/.venv/lib/python3.8/site-packages/thespian/system/transport/TCPTransport.py", line 1143, in _runWithExpiry
    rrecv, rsend, rerr = select.select(wrecv, wsend,
ValueError: file descriptor cannot be a negative integer (-1)

In the error log we also see:

2020-04-17 10:51:26.016014 p24308 ERR  ValueError on select(#0: [], #3: [-1, 14, 13], #3: {13, -1, 14}, 0.166675)

I saw that the code in TCPTransport has handling for invalid file descriptors, but it expects an OSError to be raised, whereas here select.select raises a ValueError because one of the descriptors is -1.
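The difference between the two exception types can be reproduced with the standard library alone. The following is a minimal sketch, not Thespian code: `classify_select_failure` is a hypothetical helper, and the assumption (unconfirmed for this issue) is that the -1 entries in the logged select lists come from sockets that were already closed, since a closed socket reports `fileno() == -1`. It assumes a POSIX system such as macOS or Linux, where select accepts pipe descriptors.

```python
import os
import select
import socket


def classify_select_failure(fds):
    """Return the name of the exception select.select raises for fds, or None.

    Hypothetical helper for illustration only; it is not part of Thespian.
    """
    try:
        select.select(fds, [], [], 0)
    except (ValueError, OSError) as exc:
        return type(exc).__name__
    return None


# A closed socket object reports -1 as its file descriptor.
sock = socket.socket()
sock.close()
closed_socket_fd = sock.fileno()

# A closed pipe end keeps its old non-negative number, but the number
# no longer refers to an open descriptor.
read_end, write_end = os.pipe()
os.close(read_end)
os.close(write_end)

# Negative descriptor: rejected by Python before the system call is made.
negative_fd_error = classify_select_failure([closed_socket_fd])

# Stale non-negative descriptor: rejected by the operating system (EBADF).
stale_fd_error = classify_select_failure([read_end])

print(closed_socket_fd, negative_fd_error, stale_fd_error)
```

If that assumption holds, a handler that only catches OSError would cover the stale-descriptor case but miss the negative one, which matches the traceback above.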

I can reproduce this in our application when an actor sends large log messages, so I wonder whether the two are related. I also managed to capture a complete Thespian debug log (~13 MB expanded; the error is towards the end of the log file).

The error reproduces fairly reliably, but the reproduction scenario is a bit involved (it requires running an application that is somewhat complex to set up), so I'm not sure whether describing it here would be helpful. I am of course happy to write up the steps, provide further details, and/or participate in testing, but I wanted to get the issue out here first.
