Follow up of #14312: ensure IO completion routines and waitable timers are safe #14627

LeonarddeR · 2023-02-11T08:19:46Z

Link to issue number:

Follow up of #14312

Summary of the issue:

In #14312, we decoupled the background I/O thread from the braille module. However, the thread was still bound to IoBase in its _initialRead method unless you overrode it.
I hoped the changes in #14312 would decrease braille-related crashes, and this was indeed the case. However, crashes are still occurring here and there. I'm fairly sure the cause lies in the I/O completion routines, which are still executed on the I/O thread without ensuring that the instances owning the routines still exist. I've seen this cause several access violation errors.
Furthermore, #14312 saved bound methods in a dictionary on the IoThread. While this shouldn't be a big problem, it could cause issues with garbage collection (i.e. instances being kept alive forever because the APC of an instance was never called, leaving it stuck in the cached APC dictionary).
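
The garbage collection concern can be illustrated with a minimal sketch. It uses only the standard library; `Display`, `strongCache` and `weakCache` are hypothetical names for illustration and are not part of NVDA's code.

```python
import gc
import weakref


class Display:
	"""Hypothetical stand-in for a driver instance that owns an APC callback."""

	def onIoDone(self) -> None:
		pass


# Strong cache: a bound method references its instance,
# so the instance stays alive for as long as the entry remains in the dict.
strongCache = {}
display = Display()
probe = weakref.ref(display)
strongCache[1] = display.onIoDone
del display
gc.collect()
strongAlive = probe() is not None  # the cached bound method keeps the instance alive

# Weak cache: a WeakMethod does not keep the instance alive.
strongCache.clear()
weakCache = {}
display = Display()
probe = weakref.ref(display)
weakCache[1] = weakref.WeakMethod(display.onIoDone)
del display
gc.collect()
weakAlive = probe() is not None
resolved = weakCache[1]()  # resolves to None once the instance is gone
```

With the strong cache, the instance only goes away when its entry is removed, which never happens if the APC is never called.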

Description of user facing changes

Hopefully, more stability.

Description of development approach

Added ioThread as an optional constructor parameter to IoBase and its derivatives. If not provided (the default), the default IoThread is used. It is stored as a weak reference on the IoBase instance. It is therefore no longer necessary to subclass IoBase or a derivative and override _initialRead in order to use another thread.
Added IoThread.setWaitableTimer and IoThread.getCompletionRoutine, which also ensure that the function pointers are still available when the background thread tries to call them. The function wrapper keeps a weak reference to the method it wraps, which ensures that a method of a dead instance is no longer executed.
IoThread.queueAsApc still stores a strong reference to the wrapped method on its APC, to preserve backwards compatibility. This will be changed to weak references starting in NVDA 2024.1. API deprecation logic is in place.
Removed the pre_IoThreadStop extension point. It was added in 2023.1 but never implemented or announced, so I think it is safe to remove.
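
As a rough illustration of the two ideas above (an optional thread argument stored weakly, and a wrapper that skips methods of dead instances), here is a minimal sketch. `SketchThread`, `SketchIo` and `makeSafeRoutine` are hypothetical names; this is not the actual IoBase or IoThread code.

```python
import weakref
from typing import Callable, Optional


class SketchThread:
	"""Hypothetical stand-in for an I/O thread; not NVDA's IoThread."""


_defaultThread = SketchThread()


class SketchIo:
	"""Hypothetical stand-in for an IoBase-like class."""

	def __init__(self, ioThread: Optional[SketchThread] = None):
		if ioThread is None:
			ioThread = _defaultThread
		# Only a weak reference is kept, so the I/O object never keeps the thread alive.
		self._ioThreadRef = weakref.ref(ioThread)
		self.received = []

	def onRead(self, data: bytes) -> None:
		self.received.append(data)


def makeSafeRoutine(method: Callable[..., None]) -> Callable[..., bool]:
	"""Wrap a bound method so that calling it after its instance died is a no-op."""
	methodRef = weakref.WeakMethod(method)

	def routine(*args) -> bool:
		target = methodRef()
		if target is None:
			return False  # the instance is gone; skip the call
		target(*args)
		return True

	return routine


io = SketchIo()
routine = makeSafeRoutine(io.onRead)
calledWhileAlive = routine(b"\x01")
receivedWhileAlive = list(io.received)
usesDefaultThread = io._ioThreadRef() is _defaultThread
del io
calledAfterDeath = routine(b"\x02")
```

The wrapper itself stays valid after the instance dies, so a late call from the background thread turns into a harmless no-op instead of touching a destroyed object.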

Testing strategy:

Test that hardware I/O still works with an APH Mantis Q40. Even the most basic form of testing proves that IoThread.queueAsApc and IoThread.getCompletionRoutine work, as the queued routines are executed correctly.
Set receivesAckPackets on the Brailliant driver and observe that debug warnings are logged, indicating that the ACK timer is reset correctly when no ACK packets can be handled in time. This proves that IoThread.setWaitableTimer works as expected.
Test the points above both with NVDAState._allowDeprecatedAPI set to True and to False.

Known issues with pull request:

This creates a new CFunc instance for every called completion routine and therefore for every async read. I think this is the safest method to both avoid crashes and ensure that routines won't be left behind.
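
To make the trade-off concrete, here is a minimal ctypes sketch of creating one callback object per pending read. It uses `ctypes.CFUNCTYPE` so it runs on any platform; the real completion routine prototype on Windows would differ (that is an assumption, as are the names `CompletionRoutine` and `makeRoutine`).

```python
import ctypes

# Hypothetical prototype: a routine taking one int and returning nothing.
CompletionRoutine = ctypes.CFUNCTYPE(None, ctypes.c_int)

results = []


def makeRoutine(tag: str):
	"""Create a fresh C-callable object for a single read."""

	def onDone(errorCode: int) -> None:
		results.append((tag, errorCode))

	return CompletionRoutine(onDone)


# One callback object per pending read. Each object must stay referenced
# for as long as native code may still call it.
pending = [makeRoutine("read1"), makeRoutine("read2")]
for routine in pending:
	routine(0)  # invoked through the C thunk, as native code would do
distinct = pending[0] is not pending[1]
```

Creating an object per read costs a little on every call, but no pointer is shared between reads, so nothing can be left behind in a cache.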

Change log entries:

Bug fixes:

Several stability fixes to input/output for braille displays, resulting in less frequent access violations or even crashes of NVDA. (#14627)
For Developers:
hwIo.base.IoBase and its derivatives now have a new constructor parameter to take a hwIo.IoThread. If not provided, the default thread is used.
hwIo.ioThread.IoThread now has a setWaitableTimer method to set a waitable timer using a python function. Similarly, the new getCompletionRoutine method allows you to convert a python method into a completion routine safely.
API deprecations:
Passing lambda functions to hwIo.ioThread.IoThread.queueAsApc is deprecated. Instead, functions should be weakly referenceable.
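
A short sketch of why an inline lambda is a poor fit for weak references, assuming CPython's reference counting; `Driver` and `handleAck` are hypothetical names.

```python
import weakref

# An inline lambda has no other owner, so CPython collects it right away
# and the weak reference is dead before it could ever be called.
inlineRef = weakref.ref(lambda: None)
inlineAlive = inlineRef() is not None


class Driver:
	"""Hypothetical driver whose method is owned by a long-lived instance."""

	def __init__(self):
		self.handled = 0

	def handleAck(self) -> None:
		self.handled += 1


driver = Driver()
# WeakMethod is needed for bound methods: a plain weakref.ref to
# driver.handleAck would die immediately, like the lambda above.
methodRef = weakref.WeakMethod(driver.handleAck)
method = methodRef()
methodAlive = method is not None
if method is not None:
	method()
```

A method on an instance that the caller keeps alive remains callable through the weak reference for as long as that instance exists.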

Code Review Checklist:

Pull Request description:
- description is up to date
- change log entries
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
API is compatible with existing add-ons.
Documentation:
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
Security precautions taken.

AppVeyorBot · 2023-02-11T13:42:33Z

PASS: Translation comments check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/o8brurgtwltoh0sq/artifacts/output/nvda_snapshot_pr14627-27667,37c02035.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 22.8,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 22.7,
FINISH_END 0.2

See test results for failed build of commit 37c02035ea

LeonarddeR · 2023-02-11T18:56:26Z

Never mind, I think it's too much for 2023.1. I really want to kill this issue with APCs once and for all, and it's probably best to bundle that.

AppVeyorBot · 2023-02-13T08:50:54Z

Build (for testing PR): https://ci.appveyor.com/api/buildjobs/fg6s2965mhiaiooq/artifacts/output/
CI timing (mins):
INIT 0.0,
INSTALL_START 0.9,
INSTALL_END 0.9,
BUILD_START 0.0,
FINISH_END 21.1

See test results for failed build of commit f66ba17393

AppVeyorBot · 2023-02-13T12:29:43Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: System tests (tags: installer NVDA). See test results for more information.
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 23.4,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 20.2,
FINISH_END 0.4
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/7vx2kaei68aapj2e/artifacts/output/nvda_snapshot_pr14627-27673,61f2cba8.exe
FAIL: Lint check. See test results for more information.

See test results for failed build of commit 61f2cba87a

AppVeyorBot · 2023-02-14T13:02:16Z

PASS: Translation comments check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/59onus6ax07jicyp/artifacts/output/nvda_snapshot_pr14627-27696,366b48ec.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.9,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 23.6,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 22.7,
FINISH_END 0.2

See test results for failed build of commit 366b48eca7

LeonarddeR · 2023-02-14T13:38:15Z

@michaelDCurran and @seanbudd I leave it up to you whether this can go into 2023.1. I think there are some aspects that count against delaying this to 2023.2.

I've heard reports from people like @dkager that current alpha versions still crash here and there, especially when going in and out of standby with a Bluetooth display connected.
In comparison with #14312, this now stores weak references instead of strong references to functions. Strictly speaking, this adds an additional API requirement, since references to functions need to be kept alive by the caller.

AppVeyorBot · 2023-02-15T07:53:59Z

PASS: Translation comments check.
PASS: Unit tests.
PASS: Lint check.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/9teekygife13i81t/artifacts/output/nvda_snapshot_pr14627-27706,400e7464.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.9,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 22.4,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 19.8,
FINISH_END 0.3

See test results for failed build of commit 400e746420

burmancomp · 2023-03-16T14:07:06Z

I'm somewhat unsure whether this is the right place to write this, but I wanted to ask whether you have encountered the following errors. I'm using an Albatross display. The build is based on the current main branch code. I think this is a rarely occurring issue. I may have encountered it some time ago, but this time I investigated the log file. When NVDA has been running for several days (which is common in my case), investigating the log file can be quite hard due to its size, especially when the debug log level is in use.

I can attach more detailed log entries from this session if needed.

Log excerpt ("..." means skipped lines):

...
ERROR - stderr (13:09:11.080) - hwIo.ioThread.IoThread (10916):
Exception in thread hwIo.ioThread.IoThread:
Traceback (most recent call last):
File "threading.pyc", line 926, in _bootstrap_inner
File "hwIo\ioThread.pyc", line 91, in run
OSError: exception: access violation reading 0x053B0058
...
ERROR - eventHandler.executeEvent (13:09:13.063) - MainThread (11860):
error executing event: UIA_elementSelected on <NVDAObjects.Dynamic_UIItemListItemUIA object at 0x013ABF30> with extra args of {}
Traceback (most recent call last):
File "eventHandler.pyc", line 300, in executeEvent
File "eventHandler.pyc", line 101, in __init__
File "eventHandler.pyc", line 110, in next
File "NVDAObjects\UIA\__init__.pyc", line 2089, in event_UIA_elementSelected
File "NVDAObjects\__init__.pyc", line 1248, in event_selection
File "NVDAObjects\UIA\__init__.pyc", line 2289, in event_stateChange
File "NVDAObjects\__init__.pyc", line 1264, in event_stateChange
File "braille.pyc", line 2487, in handleUpdate
File "braille.pyc", line 2274, in update
File "braille.pyc", line 2195, in _updateDisplay
File "braille.pyc", line 2259, in _displayWithCursor
File "braille.pyc", line 2243, in _writeCells
File "braille.pyc", line 2248, in _writeCellsInBackground
File "hwIo\ioThread.pyc", line 58, in queueAsApc
RuntimeError: Thread is not running
...

LeonarddeR · 2023-03-16T14:56:08Z

These access violation errors are indeed exactly the type of error this PR aims to fix.

LeonarddeR · 2023-03-27T06:29:58Z

@michaelDCurran and @seanbudd I went over this again, ensuring that it is now backwards compatible. I would love it if this could go into 2023.2.

AppVeyorBot · 2023-03-28T08:12:02Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: Lint check. See test results for more information.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/b8tb75y3ut3cdaga/artifacts/output/nvda_snapshot_pr14627-27930,9e9742ec.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 0.8,
BUILD_START 0.0,
BUILD_END 23.7,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 21.9,
FINISH_END 0.1

See test results for failed build of commit 9e9742ec38

AppVeyorBot · 2023-03-29T07:11:37Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: Lint check. See test results for more information.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/2athgd0umiptm7rp/artifacts/output/nvda_snapshot_pr14627-27951,157546c9.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 0.8,
INSTALL_END 0.8,
BUILD_START 0.0,
BUILD_END 23.5,
TESTSETUP_START 0.0,
TESTSETUP_END 0.3,
TEST_START 0.0,
TEST_END 21.9,
FINISH_END 0.2

See test results for failed build of commit 157546c997

Related to #14899, #14312, #14627
Fixes #14895

Summary of the issue:

Despite several attempts to fix this, NVDA's IoThread can crash without a clear cause.

Description of user facing changes

Most likely fewer crashes, as tests indicate.

Description of development approach

As proposed by @jcsteh, rather than creating a new function pointer for every APC or completion routine call, use a single internal APC and a single completion routine, with an internal cache that stores the Python functions rather than the actual APC functions.
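
The single-routine idea can be sketched in pure Python as follows. This is only an illustration under assumed names (`ApcStore`, `queue`, `internalApc`); it stores plain strong references for brevity and does not call any Windows API.

```python
import itertools
import threading
from typing import Callable, Dict, Tuple


class ApcStore:
	"""Hypothetical cache mapping an integer key to a Python function and its parameter."""

	def __init__(self) -> None:
		self._lock = threading.Lock()
		self._counter = itertools.count(1)
		self._store: Dict[int, Tuple[Callable[[int], None], int]] = {}

	def queue(self, func: Callable[[int], None], param: int = 0) -> int:
		"""Remember func and return the key that would travel as the APC parameter."""
		with self._lock:
			key = next(self._counter)
			self._store[key] = (func, param)
		return key

	def internalApc(self, key: int) -> bool:
		"""The one routine handed to the OS; it looks up and runs the Python function."""
		with self._lock:
			entry = self._store.pop(key, None)
		if entry is None:
			return False  # unknown or already handled key
		func, param = entry
		func(param)
		return True


store = ApcStore()
seen = []
key = store.queue(seen.append, 5)
firstCall = store.internalApc(key)
secondCall = store.internalApc(key)
```

Because only one native function pointer ever exists, its lifetime is trivial to guarantee; the per-call state lives entirely in the Python-side cache.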

Fixup of #14924

Summary of the issue:

In #14627, we introduced weak references for APCs called as part of a waitable timer. In #14924, this was made more robust by using a single internal APC function. However, in the porting process a part of the logic was reversed, so the internal APC store still held strong rather than weak references.

Description of user facing changes

None.

Description of development approach

Store references instead of functions in the APC store.

hwIo: better way to allow another ioThread for IoBase

4e8b7ed

LeonarddeR requested a review from a team as a code owner February 11, 2023 08:19

LeonarddeR requested a review from michaelDCurran February 11, 2023 08:19

LeonarddeR added 2 commits February 11, 2023 09:22

Perform initial read

e659e50

Obsolete comment

7b03697

LeonarddeR marked this pull request as draft February 11, 2023 18:56

LeonarddeR and others added 3 commits February 13, 2023 08:54

Use weak references for completion routines and APCs

dd96647

DebugWarning level

acf953e

Ensure lambda's don't go out of scope

3aa608a

LeonarddeR changed the title ~~Follow up of #14312: Better way to allow another ioThread for IoBase~~ Follow up of #14312: Only save weak references of APC methods and ensure IO completion routines are safe Feb 13, 2023

Typo

aa6fa73

Add waitable timer support

66246c3

LeonarddeR marked this pull request as ready for review February 14, 2023 13:34

LeonarddeR added 2 commits February 15, 2023 08:07

Merge remote-tracking branch 'origin/master' into hwIoMore

2894618

Add annotations import

bfd9840

LeonarddeR and others added 2 commits February 16, 2023 08:52

Merge remote-tracking branch 'origin/master' into hwIoMore

23be9cc

Merge remote-tracking branch 'origin/master' into hwIoMore

ca79752

LeonarddeR mentioned this pull request Mar 23, 2023

Test client side support for Vmware Horizon LeonarddeR/rdAccess#2

Closed

Preserve backwards compat

78c14f8

LeonarddeR marked this pull request as draft March 25, 2023 08:10

LeonarddeR changed the title ~~Follow up of #14312: Only save weak references of APC methods and ensure IO completion routines are safe~~ Follow up of #14312: ensure IO completion routines and waitable timers are safe Mar 25, 2023

LeonarddeR and others added 2 commits March 25, 2023 19:21

Remove unnecessary import

45fcfd4

Merge remote-tracking branch 'origin/master' into hwIoMore

16035b2

LeonarddeR marked this pull request as ready for review March 27, 2023 06:28

seanbudd added the merge-early Merge Early in a developer cycle label Mar 28, 2023

seanbudd reviewed Mar 28, 2023

source/extensionPoints/util.py

source/hwIo/base.py

source/hwIo/base.py Outdated

source/hwIo/base.py Outdated

tests/unit/test_hwIo.py Outdated

LeonarddeR and others added 5 commits March 28, 2023 08:34

Add depr logic for hwIo.base.LPOVERLAPPED_COMPLETION_ROUTINE

998d37e

Merge remote-tracking branch 'origin/master' into hwIoMore

81c43c1

Update source/hwIo/base.py

a79d8a7

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Add type annotation

1c0b460

Merge branch 'hwIoMore' of https://github.com/leonardder/nvda into hwIoMore

95ec538

seanbudd reviewed Mar 29, 2023

tests/unit/test_hwIo.py Outdated

LeonarddeR and others added 4 commits March 29, 2023 07:54

Merge remote-tracking branch 'origin/master' into hwIoMore

0ce0fe9

Add comments

8e582c0

Update tests/unit/test_hwIo.py

a6f39dc

Co-authored-by: Sean Budd <seanbudd123@gmail.com>

Merge branch 'hwIoMore' of https://github.com/leonardder/nvda into hwIoMore

5a1a485

Linting

e23271d

seanbudd approved these changes Mar 30, 2023

seanbudd added 2 commits March 30, 2023 13:35

Merge remote-tracking branch 'origin/master' into hwIoMore

8e78785

update changes

67a3dc7

seanbudd merged commit e0ba12c into nvaccess:master Mar 30, 2023
1 check was pending

nvaccessAuto added this to the 2023.2 milestone Mar 30, 2023

This was referenced May 9, 2023

Error in hardware support with some events in Thunderbird and other places #14895

Closed

Use a single APC and completion routine #14924

Merged

LeonarddeR mentioned this pull request Jul 29, 2023

Fixup #14924: again use weak references for waitable timer functions #15215

Merged

7 tasks

LeonarddeR mentioned this pull request Sep 6, 2023

Remove scheduled deprecations for 2024.1 #15385

Merged

7 tasks
