fix(winml): null EpCatalog handle after enumeration to prevent QNN NPU crash on exit #701
Merged
…U crash on exit

WinMLEpCatalogRelease crashes with ACCESS_VIOLATION (0xC0000005) on some QNN NPU driver configurations during process cleanup. The crash is a Windows SEH exception that Python's try/except cannot catch, causing every non-cached winml build to exit with STATUS_ACCESS_VIOLATION instead of 0.

Two independent singletons, WinML (winml.py) and WinMLEPRegistry (ep_registry.py), each create an EpCatalog and hold its native handle live until interpreter shutdown. Both are initialised during the Optimize stage, and their __del__ methods call WinMLEpCatalogRelease at process exit, which crashes on affected systems.

Fix: null out _handle on both EpCatalog instances immediately after find_all_providers() returns. All EP library paths have been extracted by that point, so the handle is no longer needed. EpCatalog.close() checks `if self._handle` before calling WinMLEpCatalogRelease, so the call becomes a no-op for the rest of the process lifetime regardless of when or which thread triggers cleanup. The OS reclaims native resources when the process exits.

Verified: Intel/dpt-hybrid-midas NPU build previously crashed 2/2 times at exit; passes 2/2 times after this fix.
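A minimal sketch of the workaround pattern described above. It does not import `windowsml`: `FakeEpCatalog`, its `release_calls` counter, the placeholder dict returned by `find_all_providers()`, and the helper `load_provider_paths` are all hypothetical stand-ins, assumed only to mirror the `if self._handle` guard that the description attributes to `EpCatalog.close()`.

```python
class FakeEpCatalog:
    """Hypothetical stand-in for windowsml.EpCatalog (not the real class)."""

    release_calls = 0  # counts simulated WinMLEpCatalogRelease calls

    def __init__(self):
        self._handle = object()  # placeholder for the native handle

    def find_all_providers(self):
        # Assumption: enumeration yields plain Python data that no longer
        # depends on the native handle once it has been returned.
        return {"ExampleProvider": "<ep library path>"}

    def close(self):
        # Same guard the PR relies on: release only while a handle is set.
        if self._handle:
            FakeEpCatalog.release_calls += 1  # stands in for the native release
            self._handle = None

    def __del__(self):
        self.close()


def load_provider_paths(catalog):
    """Copy the provider paths, then drop the handle so close() is a no-op."""
    paths = dict(catalog.find_all_providers())
    # Workaround: null the handle so __del__/close() never reach the release call.
    # TODO: remove once the upstream release function is fixed.
    catalog._handle = None
    return paths


catalog = FakeEpCatalog()
paths = load_provider_paths(catalog)
catalog.close()  # explicit cleanup is now a no-op
print(FakeEpCatalog.release_calls)  # 0
```

The trade-off in this sketch is the one the description states: the native handle is deliberately never released by Python, and cleanup is left to the OS at process exit.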
xieofxie approved these changes May 22, 2026
Add Workaround:/TODO: markers to both workaround sites so reviewers and future maintainers can easily identify and remove the code once windowsml fixes WinMLEpCatalogRelease upstream.
chinazhangchao approved these changes May 22, 2026
DingmaomaoBJTU added a commit that referenced this pull request May 22, 2026
…U crash on exit (#701)

## Summary

`WinMLEpCatalogRelease` crashes with `ACCESS_VIOLATION` (0xC0000005) on some QNN NPU driver configurations during Python interpreter shutdown, causing every non-cached `winml build` to exit with `STATUS_ACCESS_VIOLATION` instead of 0.

## Root Cause

Two independent singletons, `WinML` (in `winml.py`) and `WinMLEPRegistry` (in `session/ep_registry.py`), each create a `windowsml.EpCatalog` instance to enumerate EP library paths. After use, Python's garbage collector eventually calls `EpCatalog.__del__` → `close()` → `WinMLEpCatalogRelease(self._handle)`. On affected QNN NPU driver configurations this native call raises a Windows SEH exception (`STATUS_ACCESS_VIOLATION`), which Python's `try/except Exception` cannot catch.

The crash fires on a background thread during interpreter shutdown, after all build stages complete successfully, so exit code 3221225477 (0xC0000005) is observed even though `quantized.onnx` was written correctly. This explains why 19/22 non-cached models failed in the eval run (cache hits never initialize `EpCatalog`).

Stack trace captured by `faulthandler`:

```
Thread 0x0000bc98:
  File "windowsml/__init__.py", line 428 in close   # WinMLEpCatalogRelease(self._handle)
  File "windowsml/__init__.py", line 439 in __del__
```

## Fix

After `find_all_providers()` returns, all EP library paths have been extracted into a plain Python dict. The `EpCatalog` handle is no longer needed for the rest of the process lifetime. Setting `self._catalog._handle = None` makes `EpCatalog.close()` a no-op (it guards on `if self._handle:`), preventing the crash regardless of when or which thread triggers `__del__`. OS reclaims native resources on process exit.

Applied to both `EpCatalog`-holding singletons:

- `WinML.__init__` in `winml.py`
- `WinMLEPRegistry._load_ep_catalog` in `session/ep_registry.py`

## Verification

Ran `Intel/dpt-hybrid-midas` NPU build (the model that crashed most reliably):

- **Before fix**: 2/2 runs exit with code 3221225477 (0xC0000005)
- **After fix**: 2/2 runs exit with code 0

🤖 Generated with [Claude Code](https://claude.com/claude-code)
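The exit codes quoted under Verification are the same value in two notations: 3221225477 is 0xC0000005 read as an unsigned 32-bit integer. The sketch below shows one way to decode such codes and to repeat a command while collecting them. The helper names (`as_ntstatus`, `describe_exit`, `run_repeatedly`) are hypothetical, and the command is a placeholder Python one-liner rather than the real `winml build` invocation, whose arguments are not given here.

```python
import subprocess
import sys

STATUS_ACCESS_VIOLATION = 0xC0000005  # 3221225477 as an unsigned 32-bit value


def as_ntstatus(returncode):
    """Normalise an exit code to unsigned 32 bits (signed values wrap around)."""
    return returncode & 0xFFFFFFFF


def describe_exit(returncode):
    """Format an exit code in decimal and hex, naming the access violation."""
    code = as_ntstatus(returncode)
    if code == STATUS_ACCESS_VIOLATION:
        return f"{code} (0x{code:08X}, STATUS_ACCESS_VIOLATION)"
    return f"{code} (0x{code:08X})"


def run_repeatedly(cmd, runs=2):
    """Run cmd `runs` times and return the normalised exit codes."""
    return [as_ntstatus(subprocess.run(cmd).returncode) for _ in range(runs)]


if __name__ == "__main__":
    # Placeholder command that exits cleanly; substitute the real build command.
    cmd = [sys.executable, "-c", "raise SystemExit(0)"]
    for code in run_repeatedly(cmd):
        print(describe_exit(code))
```

Checking the numeric exit code, rather than only whether the output file exists, matters here because the crash described above happens after `quantized.onnx` has already been written.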