Description
Steps to reproduce:
- Pair a Bluetooth serial braille display with the machine, e.g. an Orbit Reader 40.
- Ensure the braille display is off / not in range of Bluetooth.
- Ensure NVDA's braille display selection is set to 'Automatic'.
- Restart NVDA. Note: for the best chance of reproducing this, quit NVDA by pressing NVDA+q and then Enter, then quickly start NVDA from the Run dialog or a terminal.
Actual behavior:
When the new NVDA instance starts:
- Alt+tabbing reports nothing.
- NVDA seems to occasionally freeze for up to 10 seconds at a time.
- NVDA cannot provide virtual buffers or other features that rely on process injection.
Expected behavior:
The new instance of NVDA should function normally.
NVDA logs, crash dumps and other attachments:
In nvda.log, with debug logging enabled, after the normal final line "NVDA exit", we see the following:
```
DEBUG - hwIo.base.Serial.__init__ (16:42:40.778) - ThreadPoolExecutor-0_0 (12472):
Open failed: could not open port 'COM13': OSError(22, 'The semaphore timeout period has expired.', None, 121)
DEBUGWARNING - brailleDisplayDrivers.baum.BrailleDisplayDriver.__init__ (16:42:40.778) - ThreadPoolExecutor-0_0 (12472):
Traceback (most recent call last):
  File "brailleDisplayDrivers\baum.py", line 172, in __init__
    self._dev = hwIo.Serial(
                ^^^^^^^^^^^^
  File "hwIo\base.py", line 253, in __init__
    self._ser = serial.Serial(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\mick\programming\git\nvda\.venv\Lib\site-packages\serial\serialwin32.py", line 33, in __init__
    super(Serial, self).__init__(*args, **kwargs)
  File "C:\Users\mick\programming\git\nvda\.venv\Lib\site-packages\serial\serialutil.py", line 244, in __init__
    self.open()
  File "C:\Users\mick\programming\git\nvda\.venv\Lib\site-packages\serial\serialwin32.py", line 64, in open
    raise SerialException("could not open port {!r}: {!r}".format(self.portstr, ctypes.WinError()))
serial.serialutil.SerialException: could not open port 'COM13': OSError(22, 'The semaphore timeout period has expired.', None, 121)
DEBUGWARNING - braille.BrailleHandler.setDisplayByName (16:42:40.788) - ThreadPoolExecutor-0_0 (12472):
Couldn't initialize display driver 'baum'
Traceback (most recent call last):
  File "braille.py", line 2482, in setDisplayByName
    self._setDisplay(newDisplayClass, isFallback=isFallback, detected=detected)
  File "braille.py", line 2552, in _setDisplay
    newDisplay = self._switchDisplay(oldDisplay, newDisplayClass, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "braille.py", line 2522, in _switchDisplay
    extensionPoints.callWithSupportedKwargs(newDisplay.__init__, **kwargs)
  File "extensionPoints\util.py", line 216, in callWithSupportedKwargs
    return func(*boundArguments.args, **boundArguments.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "brailleDisplayDrivers\baum.py", line 218, in __init__
    raise RuntimeError("No Baum display found")
RuntimeError: No Baum display found
DEBUG - autoSettingsUtils.autoSettings.AutoSettings._registerConfigSaveAction (16:42:40.788) - ThreadPoolExecutor-0_0 (12472):
registering pre_configSave action: <class 'brailleDisplayDrivers.noBraille.BrailleDisplayDriver'>
DEBUG - autoSettingsUtils.autoSettings.AutoSettings._loadSpecificSettings (16:42:40.794) - ThreadPoolExecutor-0_0 (12472):
loading braille noBraille
INFO - braille.BrailleHandler._setDisplay (16:42:40.794) - ThreadPoolExecutor-0_0 (12472):
Loaded braille display driver 'noBraille', current display has 0 cells.
```
Technical findings
It is best if all background threads in NVDA are daemon threads, so that we can guarantee they will be killed off if still running when NVDA's main thread dies.
However, NVDA's braille display detector (which runs _bgScan) is implemented on a ThreadPoolExecutor. ThreadPoolExecutor starts non-daemon worker threads which are never joined before NVDA exits. Thus, if NVDA is in the middle of a Bluetooth scan when exiting, this thread can keep the NVDA process alive until the scan completes.
When a Bluetooth serial device is out of range or switched off, Windows can block in CreateFile for around 5 seconds while it tries to open the COM port. The only way to interrupt this is to kill the thread currently blocked on the call, or kill the entire process.
Because this thread is neither joined nor killed before NVDA exits, NVDA releases its mutex, which allows a new instance of NVDA to start while the old process is still blocked on the Bluetooth scan. In theory this should be okay - everything else in NVDA has been correctly finalized. However, it causes some significant issues in nvdaHelper which are very hard to work around.
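The non-daemon behaviour is easy to confirm in isolation. A minimal sketch (on CPython 3.9+, where ThreadPoolExecutor creates its workers as non-daemon threads; the sleep merely stands in for a blocking Bluetooth scan):

```python
import concurrent.futures
import threading
import time

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Simulate a blocking Bluetooth scan running in a worker thread.
executor.submit(time.sleep, 0.5)

# Worker threads are named "ThreadPoolExecutor-N_M" by default.
worker = next(
	t for t in threading.enumerate()
	if t.name.startswith("ThreadPoolExecutor")
)
print(worker.daemon)  # False: the interpreter will wait for this thread at exit
```

Because the worker is non-daemon, interpreter shutdown blocks until the submitted call returns, which is exactly the window in which the mutex has already been released but the old process is still alive.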
@jcsteh also came up with steps to reproduce this without requiring Braille display auto detection:
1. Open the Python console.
2. Paste this code:

```python
import atexit, time, winsound

def e():
    time.sleep(10)
    winsound.PlaySound("waves/browseMode.wav", 0)

atexit.register(e)
```

3. Exit NVDA.
4. Before 10 seconds elapse, start NVDA again from the Run dialog. Note that using the restart option from the Exit dialog sometimes results in the old NVDA getting killed off, perhaps because the old NVDA is still cleaning up the mutex and GUI when the new one starts. If this happens, the browse mode sound won't play 10 seconds after you run the Python code.
The first issue is a possible deadlock in the UI thread of a process if a new nvdaHelperRemote is injected while the old nvdaHelperRemote is terminating. Specifically, the winEvent callback which starts the new inproc manager thread if needed blocks on that thread for up to 10 seconds while it initializes all subsystems. However, when we terminate IAccessible2 support in the old inproc manager thread, in order to correctly unregister IAccessible2 from the UI thread, we post a window message to the UI thread and wait for it to be handled (for up to 10 seconds). But in this case, the UI thread is itself waiting on the new inproc manager thread to initialize, which in turn is waiting on the inproc manager thread mutex.
This particular bug is fairly easy to fix: we should simply not wait in the UI thread for the inproc manager thread to initialize. True, we may miss a focus/foreground event going to the inproc manager thread callbacks, but that is not as bad as deadlocking.
The second issue is that our current RPC implementation for nvdaController / nvdaControllerInternal does not support being replaced by a new instance of NVDA while the old NVDA process is still alive, even if the old NVDA process correctly terminated nvdaHelperLocal / RPC support. This is because we use rpcUseProtSeqEp to tell RPC to listen on a particular well-known ncalrpc endpoint, but there is no function available to unbind from this endpoint, even if we do correctly unregister all our individual RPC interfaces. We could instead use functions like rpcEpRegister and register nvdaControllerInternal with a unique UUID, or we could continue to use rpcUseProtSeqEp but with a unique string. Either way, we'd need a way to communicate this new UUID or string to the in-process code beforehand. There are lots of possibilities here, but all would involve some significant rewriting of our injection code.
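As a rough illustration of the per-instance endpoint idea (the endpoint name format and the publication mechanism here are hypothetical, not NVDA's actual implementation):

```python
import uuid

# Hypothetical: derive a unique ncalrpc endpoint name for this NVDA instance.
# The new instance would register its RPC interfaces on this endpoint, then
# publish the name somewhere injected in-process code can read it before
# connecting (e.g. a window property, shared memory, or the registry), so a
# stale endpoint left behind by a dying process can never be reused.
endpoint = "NvdaCtlr." + uuid.uuid4().hex
print(endpoint)
```

The key point is not the naming scheme but the handshake: the endpoint must be communicated to the in-process side before injection, which is why this approach requires reworking the injection code.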
Possible solution
At an absolute minimum, just before releasing our mutex in nvda.pyw, we should log and join any remaining non-daemon threads.
This ensures we continue to hold the mutex while any braille auto-detection Bluetooth scan, and/or any other non-daemon background thread (such as one started by a badly behaved add-on), completes.
It is possible that the new instance of NVDA will eventually kill off the old process before the background thread completes, but at least the thread will either complete or be killed, rather than the process staying around after the mutex is released.
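A minimal sketch of that join step (the function and logger names here are illustrative, not NVDA's actual code):

```python
import logging
import threading

log = logging.getLogger("nvda")


def joinRemainingNonDaemonThreads(timeoutPerThread=None):
	"""Log and join all live non-daemon threads other than the current one,
	so the mutex is not released while background work is still running.
	"""
	current = threading.current_thread()
	for thread in threading.enumerate():
		if thread is current or thread.daemon or not thread.is_alive():
			continue
		log.warning(f"Waiting for non-daemon thread to complete: {thread.name}")
		thread.join(timeoutPerThread)
```

Calling this immediately before releasing the mutex means a new instance cannot start until every lingering scan (or add-on thread) has finished or been given its timeout.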
Further to this, we could rewrite the bgScan thread to be a daemon thread. This would have the added advantage that Python can kill off this thread at exit, thereby speeding up NVDA's exit when auto detection is active.
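Running the scan on an explicit daemon thread rather than a ThreadPoolExecutor might look like this (a sketch; `_bgScan` here is a placeholder for the real detector loop, and the thread name is illustrative):

```python
import threading
import time


def _bgScan():
	# Placeholder for the real detection loop, which would poll for
	# USB/Bluetooth braille displays until told to stop.
	time.sleep(0.1)


scanThread = threading.Thread(target=_bgScan, name="brailleDetector", daemon=True)
scanThread.start()
print(scanThread.daemon)  # True: this thread will not keep the process alive
```

A daemon thread is abruptly abandoned at interpreter shutdown, so this only works if the scan holds no state that needs orderly cleanup; that trade-off is exactly why joining before releasing the mutex remains the safer minimum.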
Another alternative we considered, but which I don't think is appropriate, is to simply set wait=True on ThreadPoolExecutor.shutdown when terminating the detector. This is bad because, although NVDA would wait, it has not yet terminated many other subsystems at that point, and would most likely be killed off by the new instance.
Thoughts on all this @jcsteh and @LeonarddeR ?