BF: allow enough time for all ioHub device._close() to finish #6540

Merged
peircej merged 2 commits into psychopy:release from
mh105:release-iohubserver-shutdown
Jun 10, 2024

Conversation

@mh105
Contributor

@mh105 mh105 commented Jun 8, 2024

TLDR:

  1. This BF fixes the problem where a gevent.sleep() call inside a greenlet unknown to gevent.joinall() lets the server process return during shutdown --- this is achieved by adding a generic gevent.wait() at the end of the server process script.

  2. This BF gives all ioHub devices unlimited time to finish their processing --- this is achieved by removing the timeout=5 when waiting for Computer.iohub_process to finish during ioHubConnection._shutDownServer().

I believe this change is in line with how PsychoPy waits for threads during core.quit(). When things hang, people can always force-quit the Python session anyway.


Detailed explanation for reference:

There is currently a race condition at the end of an experiment when closing down ioHubConnection. ioHubServer is started on a subprocess and uses greenlets via gevent so that different C stacks can be cooperatively scheduled. The problem is that gevent.joinall(glets) here only joins the explicitly spawned greenlets. When psychopy-eyetracker-sr-research calls gevent.sleep(0.01) during setRecordingState(), it is in a strange position: it is not on the MAIN greenlet, yet the sleep immediately yields control from the greenlet executing _close() back to the gevent hub, and the ioHub server process script continues --- likely because gevent.joinall(glets) doesn't know about that greenlet. Since no other joined greenlet runs longer than the eyetracker's _close() processing, the gevent.sleep() call lets start_iohub_process.py finish execution right away and Computer.iohub_process.wait() returns. core.quit() then kills the poor greenlets without mercy, before they could call for help, so to speak.
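The joinall() gap can be illustrated with plain threads standing in for greenlets (an analogy only: gevent schedules cooperatively, threads preemptively; none of these names are from PsychoPy). Joining only the workers you tracked says nothing about work started elsewhere:

```python
import threading
import time

done = []

def tracked_worker():
    # stands in for a greenlet that gevent.joinall() knows about
    time.sleep(0.1)
    done.append("tracked")

def untracked_worker():
    # stands in for the device _close() work running in a greenlet
    # that joinall() never sees
    time.sleep(0.5)
    done.append("untracked")

tracked = threading.Thread(target=tracked_worker)
threading.Thread(target=untracked_worker, daemon=True).start()
tracked.start()
tracked.join()  # analogous to gevent.joinall(glets)

# the "server script" would now fall off the end and exit,
# even though the untracked worker is still running
print(done)  # ['tracked']
```

A generic gevent.wait() with no argument, by contrast, waits for the hub to run out of pending work, which is why the fix adds it at the end of the server script.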

Since we now rely on the ioHub server's _shutDownServer() to close all the ioHub devices properly and finish remaining processing (such as setting eyetracker recording and connection states to False, downloading the eyetracking data files, etc.), the use of timeout=5 in Computer.iohub_process.wait() is no longer appropriate.

This problem has likely existed in ioHub from the beginning. In the demo coder scripts (here and here), we do

    # Stop eye data recording
    tracker.setRecordingState(False)
    t += 1

    # All Trials are done
    # End experiment
    win.close()
    tracker.setConnectionState(False)
    core.quit()

at the end of an experiment. I don't think the iohub_process subprocess would wait for tracker.setRecordingState(False) to finish if gevent.sleep() is called as part of it. NOTE: this doesn't happen in the setRecordingState() for mouseGaze.

Presumably

    win.close()
    tracker.setConnectionState(False)

would take a bit longer than 500 ms. There is no gevent.sleep() called during tracker.setConnectionState(False), so the subprocess will wait for it to return. However, if .setConnectionState(False) takes more than 5 seconds (such as when downloading a large eye tracking data file), core.quit() will still kill it.
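The effect of that timeout can be sketched with a plain subprocess and a deliberately short cap (the 0.2 s / 2 s values here are illustrative, not PsychoPy's):

```python
import subprocess
import sys

# child process standing in for the ioHub server, whose shutdown
# work (e.g. downloading a data file) outlasts the timeout
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(2)"])

try:
    # analogous to Computer.iohub_process.wait(timeout=5)
    child.wait(timeout=0.2)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
    child.kill()   # cleanup is cut short, like core.quit() killing ioHub
child.wait()       # reap the killed child

print(timed_out)  # True
```

Removing the timeout corresponds to calling child.wait() with no argument, which blocks until the child exits on its own, the behavior this PR adopts for Computer.iohub_process.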

Currently in Builder we don't make an explicit call to close eye trackers as in the example coder scripts. There was an attempt to do so, which I removed in #6526, since deviceManager.removeDevice('eyetracker') does not actually close the eye tracker without a .close() defined for the ioHub EyeTrackerDevice. Of course, even if we were to expose a .close() function, it would run as one greenlet, which the gevent.sleep(0.01) during .setRecordingState(False) would yield away from, i.e., the ioHub subprocess would not wait for .close() to return.

The problem is therefore twofold:

  1. The ioHub subprocess doesn't wait for anonymous greenlets;
  2. The PsychoPy main process imposes a 5 s timeout on the ioHub subprocess.

The reason timeout=5 is problematic is that ioHub devices in deviceManager.devices are exposed as ioHubDeviceView instances and communicated with via requests over UDP sockets. Whenever we call a method on tracker, it is handled by sending a request to the ioHub subprocess, and that line returns immediately; the main PsychoPy process carries on. So even if the ioHub subprocess waits (by making additional UDP requests such as .setConnectionState(False), or even just adding time.sleep() in _close()), when things take longer than 5 s, ioHubConnection._shutDownServer() will still kill it because of the Computer.iohub_process.wait(timeout=5).
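The fire-and-return nature of such requests can be shown with a minimal UDP pair (plain sockets and a thread, not ioHub's actual transport; the message text is just a label): the send returns long before the "server" finishes handling it.

```python
import socket
import threading
import time

handled = []

# a minimal UDP "server" standing in for the ioHub subprocess
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
port = server.getsockname()[1]

def handle_one_request():
    data, _ = server.recvfrom(1024)
    time.sleep(0.3)  # pretend handling the request takes a while
    handled.append(data)

worker = threading.Thread(target=handle_one_request)
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
start = time.monotonic()
client.sendto(b"setConnectionState(False)", ("127.0.0.1", port))
elapsed = time.monotonic() - start

# the caller moved on before the server finished handling the request
print(elapsed < 0.1, handled)

worker.join()
client.close()
server.close()
```

Because the caller gets control back immediately, only the subprocess itself knows when the work is actually done, which is why the main process must wait for the subprocess rather than time it out.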

In summary, I don't think it is feasible to close down eye trackers safely during core.quit() with Computer.iohub_process.wait(timeout=5) in there. All function calls to the eyetracker as an ioHubDeviceView instance are made through UDP requests to the ioHub server running on the iohub_process subprocess, so anything running longer than 5 seconds will get killed by the timeout. The cleanest fix is to remove the timeout and have the main PsychoPy process wait for iohub_process to return, just as it waits for other threads to close.

@TEParsons TEParsons requested a review from mdcutone June 10, 2024 10:54
@peircej peircej merged commit b0bb4ab into psychopy:release Jun 10, 2024
@peircej
Member

peircej commented Jun 10, 2024

You make a convincing argument and I'm pulling this in. We have a few weeks of testing ahead before a target release date for 2024.2.0 of 15 July, so we should have a little time to work out if this new approach is causing trouble (e.g. taking a long time to close etc).

@mh105 mh105 deleted the release-iohubserver-shutdown branch June 14, 2024 03:53
@peircej peircej added the 🐞 bug Issue describes a bug (crash or error) or undefined behavior. label Jul 19, 2024