Add on-device OCR engine (RapidOCR + ONNX Runtime + PP-OCRv5) by mattheliu · Pull Request #19787 · nvaccess/nvda

mattheliu · 2026-03-13T12:12:14Z

Summary

Integrate RapidOCR + ONNX Runtime as an on-device OCR engine for NVDA, providing offline text recognition with significantly better CJK accuracy than Windows UWP OCR
Add settings panel, keyboard shortcut (NVDA+Shift+R), language cycling, and auto-refresh toggle
Modular engine architecture (OcrEngine ABC) allows swapping the underlying OCR backend without modifying the recognizer
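A minimal sketch of how an `OcrEngine` abstract base class could decouple the recognizer from the backend. It assumes the recognizer only needs a `recognize()` method that returns words with bounding boxes, plus a `shutdown()` hook for NVDA exit. `OcrWord`, `FakeEngine`, `OnDeviceRecognizer` and the method signatures are hypothetical names for illustration, not the code in this PR.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """A recognized text fragment and its bounding box in image pixels (hypothetical)."""
    text: str
    x: int
    y: int
    width: int
    height: int


class OcrEngine(ABC):
    """Hypothetical backend interface; the recognizer depends only on this class."""

    @abstractmethod
    def recognize(self, rgb: bytes, width: int, height: int, language: str) -> list[OcrWord]:
        """Return the words found in a raw RGB image, in reading order."""

    def shutdown(self) -> None:
        """Release models or inference sessions; backends override as needed."""


class FakeEngine(OcrEngine):
    """Stand-in backend for tests; a RapidOCR-based engine would subclass OcrEngine the same way."""

    def __init__(self, words: list[OcrWord]) -> None:
        self._words = words
        self.closed = False

    def recognize(self, rgb: bytes, width: int, height: int, language: str) -> list[OcrWord]:
        return list(self._words)

    def shutdown(self) -> None:
        self.closed = True


class OnDeviceRecognizer:
    """Recognizer that accepts any OcrEngine, so the backend can be swapped."""

    def __init__(self, engine: OcrEngine) -> None:
        self._engine = engine

    def recognize_text(self, rgb: bytes, width: int, height: int, language: str) -> str:
        words = self._engine.recognize(rgb, width, height, language)
        return " ".join(word.text for word in words)

    def terminate(self) -> None:
        # Called on NVDA exit so the backend can free its resources.
        self._engine.shutdown()
```

Under this design, swapping backends means passing a different `OcrEngine` subclass to the recognizer; the recognizer itself does not change.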

New files

source/contentRecog/onDeviceOcr/ — engine package (4 modules)
tests/unit/contentRecog/test_onDeviceOcr.py — unit tests

Modified files

configSpec.py — [onDeviceOcr] config section
settingsDialogs.py — OnDeviceOcrPanel + registration
globalCommands.py — 3 new scripts (recognize, cycle language, toggle auto-refresh)
core.py — engine shutdown on NVDA exit
pyproject.toml — rapidocr>=3.3.0, onnxruntime>=1.17.0
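The language-cycle script presumably steps through a fixed list of supported languages and wraps around at the end. The following is a minimal sketch under that assumption. `next_language` and the language codes are hypothetical. A real script would likely also persist the choice to the `[onDeviceOcr]` config section and announce it.

```python
from collections.abc import Sequence


def next_language(current: str, available: Sequence[str]) -> str:
    """Return the language after `current`, wrapping back to the first entry.

    An unknown `current` (for example a stale config value) restarts at the first entry.
    """
    if not available:
        raise ValueError("no OCR languages available")
    try:
        index = available.index(current)
    except ValueError:
        return available[0]
    return available[(index + 1) % len(available)]


# Hypothetical language codes; the real list depends on the bundled PP-OCRv5 models.
LANGUAGES = ("zh", "en", "ja", "ko")
```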

Pre-merge TODO

Run uv lock on Windows to update uv.lock
Integration test on Windows with NVDA running

Test plan

Unit tests: python -m pytest tests/unit/contentRecog/test_onDeviceOcr.py -v
NVDA+Shift+R triggers on-device OCR recognition
Settings panel appears and saves language/auto-refresh preferences
Language cycle script works
Engine shuts down cleanly on NVDA exit

🤖 Generated with Claude Code

Integrate RapidOCR with ONNX Runtime as an on-device OCR engine for NVDA, providing offline text recognition with significantly better CJK accuracy than Windows UWP OCR. The engine runs entirely on-device with no cloud dependency.

New files:
- source/contentRecog/onDeviceOcr/ - OCR engine package with abstract engine interface, RapidOCR implementation, and result coordinate converter
- tests/unit/contentRecog/test_onDeviceOcr.py - unit tests

Modified files:
- configSpec.py: [onDeviceOcr] configuration section
- settingsDialogs.py: OnDeviceOcrPanel settings UI
- globalCommands.py: NVDA+Shift+R recognition, language cycle, auto-refresh toggle
- core.py: engine shutdown on exit
- pyproject.toml: rapidocr>=3.3.0, onnxruntime>=1.17.0 dependencies

Note: uv.lock not updated - must run `uv lock` on Windows before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
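The commit message mentions a result coordinate converter. A minimal sketch of one plausible approach follows. It assumes the detector reports each text box as four corner points in the captured image, as PP-OCR-style detectors typically do. It also assumes the capture may have been upscaled by a resize factor before recognition. `quad_to_screen_rect` and `ScreenRect` are hypothetical names, and the converter in this PR may work differently.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenRect:
    left: int
    top: int
    width: int
    height: int


def quad_to_screen_rect(
    quad: Sequence[tuple[float, float]],
    screen_left: int,
    screen_top: int,
    resize_factor: float = 1.0,
) -> ScreenRect:
    """Map a four-point box in image pixels to an axis-aligned screen rectangle.

    Rotated or skewed boxes are replaced by their axis-aligned bounding box.
    """
    if len(quad) != 4:
        raise ValueError("expected four corner points")
    if resize_factor <= 0:
        raise ValueError("resize_factor must be positive")
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    # Undo any capture-time upscaling, then offset by the capture origin on screen.
    left = screen_left + int(min(xs) / resize_factor)
    top = screen_top + int(min(ys) / resize_factor)
    width = round((max(xs) - min(xs)) / resize_factor)
    height = round((max(ys) - min(ys)) / resize_factor)
    return ScreenRect(left, top, width, height)
```

Collapsing each quadrilateral to its bounding box loses rotation. However, it gives a simple rectangle that review-cursor or mouse routing can target.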

josephsl · 2026-03-13T12:43:35Z

Hi,

Thanks for the pull request. However, I would like you to consider the following before continuing:

We had recent pull requests that were closed for an important reason: AI generated AND not filling out the pull request template specified by the NVDA project. Part of the thing about writing pull requests is that we are trying to submit code that makes a difference for a project or an organization in charge of a collaborative software development project. However, some projects and organizations expect pull request writers (including members of the project) to fill out a pull request template. This means anyone wishing to submit a pull request must think about more than just code or the AI engine used to generate the pull request (in this case, Anthropic/Claude), specifically the purpose of creating a PR template.

So why did NVDA project and similar projects create a pull request template in the first place? For several reasons:

For the project and/or the organization to communicate its expectations. The PR template for the NVDA screen reader project helps you gather and report information such as the issue the pull request addresses, testing plan and expectations, actual testing strategies, development notes, and user impact. The NVDA project expects PR writers to fulfill expectations such as these, and the template serves as a guide to help you perform these checks.
To help you organize your thoughts and information. An open-source project without a set pull request template can give you freedom to describe the pull request in whatever style you are most comfortable with. This also means an AI system such as Claude can generate a template-like structure to describe the pull request. However, because the NVDA project has a template serving as a guide to fulfilling project expectations, the PR template can help you organize your thoughts (and the generated AI output) in a structured way.
Speculation: to guard against freeform use of the pull request facility by AI systems. AI can generate pull request template content, and this PR is a good example. However, a project can define its own template to guard against generalized PR content written by AI systems, more so if the template includes project-specific questions. Following that, the pull request content written by Claude shows how an AI system developed by one organization with data pulled from various sources can create problems when another organization uses a pull request template that enforces specific expectations for anyone proposing a change, including both human pull request writers and generative/agentic AI systems.

In addition to "admission" by Claude, this pull request creates a major problem with keyboard command assignment: NVDA+Shift+R is actually unavailable, because it is already defined in the Excel support module to set row headers. This fact should not be glossed over: what an AI believes is an available gesture may already be taken, unless the code-generating AI system parses ALL of NVDA's source code (which takes time and resources).

Thanks.

mattheliu · 2026-03-13T12:49:51Z

Hi,

Thank you for the detailed explanation and for pointing out these issues.

You are absolutely right that the NVDA project’s pull request template exists to ensure contributors clearly describe the problem being addressed, testing approach, development considerations, and user impact. I apologize for not following the template in the initial submission.

For now, I will mark this pull request as a draft while I continue testing and refining the implementation. This will give me time to properly review the NVDA PR template requirements, verify gesture availability (including avoiding conflicts like NVDA+Shift+R in the Excel module), and update the description with the required information such as testing plans and development notes.

Thank you again for the guidance and for taking the time to review this.

Best regards.

mattheliu requested a review from a team as a code owner March 13, 2026 12:12

mattheliu requested a review from seanbudd March 13, 2026 12:12

Pre-commit auto-fix

0bcb87d

mattheliu marked this pull request as draft March 13, 2026 12:49

mattheliu closed this Mar 13, 2026

mattheliu wants to merge 2 commits into nvaccess:master from mattheliu:feature/on-device-ocr

mattheliu commented Mar 13, 2026

Uh oh!

josephsl commented Mar 13, 2026

Uh oh!

mattheliu commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

mattheliu commented Mar 13, 2026

Summary

New files

Modified files

Pre-merge TODO

Test plan

Uh oh!

josephsl commented Mar 13, 2026

Uh oh!

mattheliu commented Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants