Add on-device OCR engine (RapidOCR + ONNX Runtime + PP-OCRv5) #19787

Closed
mattheliu wants to merge 2 commits into nvaccess:master from mattheliu:feature/on-device-ocr

Conversation

@mattheliu

Summary

  • Integrate RapidOCR + ONNX Runtime as an on-device OCR engine for NVDA, providing offline text recognition with significantly better CJK accuracy than Windows UWP OCR
  • Add settings panel, keyboard shortcut (NVDA+Shift+R), language cycling, and auto-refresh toggle
  • Modular engine architecture (OcrEngine ABC) allows swapping the underlying OCR backend without modifying the recognizer
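
The PR text does not show the interface itself, so the following is only a minimal sketch of what an abstract engine of this kind might look like. The method names (`recognize`, `shutdown`), the `OcrLine` result type, and the `FakeEngine` stand-in are assumptions made for illustration, not the code in this branch.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrLine:
    """One recognized line: text plus a bounding box in image pixels (hypothetical shape)."""
    text: str
    left: int
    top: int
    width: int
    height: int


class OcrEngine(ABC):
    """Backend-agnostic interface; the method names are illustrative assumptions."""

    @abstractmethod
    def recognize(self, pixels: bytes, width: int, height: int) -> list[OcrLine]:
        """Run recognition on a raw image buffer and return the lines found."""

    @abstractmethod
    def shutdown(self) -> None:
        """Release any resources held by the backend."""


class FakeEngine(OcrEngine):
    """Stand-in backend used instead of RapidOCR so the sketch runs anywhere."""

    def __init__(self) -> None:
        self.closed = False

    def recognize(self, pixels: bytes, width: int, height: int) -> list[OcrLine]:
        # Always reports a single line covering the whole image.
        return [OcrLine("hello", 0, 0, width, height)]

    def shutdown(self) -> None:
        self.closed = True


def recognize_text(engine: OcrEngine, pixels: bytes, width: int, height: int) -> str:
    """The caller depends only on the OcrEngine interface, not on a concrete backend."""
    return "\n".join(line.text for line in engine.recognize(pixels, width, height))
```

Because `recognize_text` only sees the abstract type, a different backend could be passed in without changing the caller, which is the property the bullet above describes.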

New files

  • source/contentRecog/onDeviceOcr/ — engine package (4 modules)
  • tests/unit/contentRecog/test_onDeviceOcr.py — unit tests

Modified files

  • configSpec.py: [onDeviceOcr] config section
  • settingsDialogs.py: OnDeviceOcrPanel + registration
  • globalCommands.py: 3 new scripts (recognize, cycle language, toggle auto-refresh)
  • core.py: engine shutdown on NVDA exit
  • pyproject.toml: rapidocr>=3.3.0, onnxruntime>=1.17.0
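
As a rough illustration of the language cycle script listed above, the core logic could be a wrap-around lookup like the one below. The language codes and the `next_language` helper are hypothetical; the PR does not list the languages it supports or show how the script is written.

```python
# Hypothetical language codes, chosen only for the example.
LANGUAGES = ("ch", "en", "japan", "korean")


def next_language(current: str, languages: tuple[str, ...] = LANGUAGES) -> str:
    """Return the language after `current`, wrapping around at the end.

    An unknown or stale config value falls back to the first language.
    """
    try:
        index = languages.index(current)
    except ValueError:
        return languages[0]
    return languages[(index + 1) % len(languages)]
```

A script bound to a gesture would presumably read the current value from the [onDeviceOcr] config section, call a helper like this, store the result, and announce the new language.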

Pre-merge TODO

  • Run uv lock on Windows to update uv.lock
  • Integration test on Windows with NVDA running

Test plan

  • Unit tests: python -m pytest tests/unit/contentRecog/test_onDeviceOcr.py -v
  • NVDA+Shift+R triggers on-device OCR recognition
  • Settings panel appears and saves language/auto-refresh preferences
  • Language cycle script works
  • Engine shuts down cleanly on NVDA exit

🤖 Generated with Claude Code

Integrate RapidOCR with ONNX Runtime as an on-device OCR engine for NVDA,
providing offline text recognition with significantly better CJK accuracy
than Windows UWP OCR. The engine runs entirely on-device with no cloud dependency.

New files:
- source/contentRecog/onDeviceOcr/ - OCR engine package with abstract engine
  interface, RapidOCR implementation, and result coordinate converter
- tests/unit/contentRecog/test_onDeviceOcr.py - unit tests

Modified files:
- configSpec.py: [onDeviceOcr] configuration section
- settingsDialogs.py: OnDeviceOcrPanel settings UI
- globalCommands.py: NVDA+Shift+R recognition, language cycle, auto-refresh toggle
- core.py: engine shutdown on exit
- pyproject.toml: rapidocr>=3.3.0, onnxruntime>=1.17.0 dependencies

Note: uv.lock not updated - must run `uv lock` on Windows before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattheliu mattheliu requested a review from a team as a code owner March 13, 2026 12:12
@mattheliu mattheliu requested a review from seanbudd March 13, 2026 12:12
@josephsl
Contributor

Hi,

Thanks for the pull request. However, I would like you to consider the following before continuing:

Recent pull requests were closed for an important reason: they were AI generated and did not fill out the pull request template specified by the NVDA project. Writing a pull request means submitting code that makes a difference for a project, or for the organization in charge of a collaborative software development project. However, some projects and organizations expect pull request writers (including members of the project) to fill out a pull request template. This means anyone wishing to submit a pull request must think about more than just the code or the AI engine used to generate the pull request (in this case, Anthropic/Claude), specifically about the purpose of a PR template.

So why did the NVDA project and similar projects create a pull request template in the first place? For several reasons:

  1. For the project and/or the organization to communicate its expectations. The PR template for the NVDA screen reader project helps you gather and report information such as the issue the pull request addresses, testing plan and expectations, actual testing strategies, development notes, and user impact. The NVDA project expects PR writers to fulfill expectations such as these, and the template serves as a guide to help you perform these checks.
  2. To help you organize your thoughts and information. An open-source project without a set pull request template gives you the freedom to describe the pull request in whatever style you are most comfortable with. This also means an AI system such as Claude can generate a template-like structure to describe the pull request. However, because the NVDA project has a template serving as a guide to fulfilling project expectations, the PR template can help you organize your thoughts (and any AI-generated output) in a structured way.
  3. Speculation: to guard against freeform use of the pull request facility by AI systems. AI can generate pull request template content, and this PR is a good example. However, a project can define its own template to guard against generalized PR content written by AI systems, more so if the template includes project-specific questions. Following that, the pull request content written by Claude shows how an AI system developed by one organization, with data pulled from various sources, can create problems when another organization uses a pull request template that enforces specific expectations for anyone proposing a change, including both human pull request writers and generative/agentic AI systems.

In addition to the "admission" by Claude, this pull request creates a major problem with keyboard command assignment: NVDA+Shift+R is actually unavailable, because it is a command defined in the Excel support module to set row headers. This fact should not be glossed over: a gesture that an AI thinks is available may not be, unless the code-generating AI system can parse ALL of NVDA's source code (which takes time and resources).

Thanks.

@mattheliu
Author

Hi,

Thank you for the detailed explanation and for pointing out these issues.

You are absolutely right that the NVDA project’s pull request template exists to ensure contributors clearly describe the problem being addressed, testing approach, development considerations, and user impact. I apologize for not following the template in the initial submission.

For now, I will mark this pull request as a draft while I continue testing and refining the implementation. This will give me time to properly review the NVDA PR template requirements, verify gesture availability (including avoiding conflicts like NVDA+Shift+R in the Excel module), and update the description with the required information such as testing plans and development notes.
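
A gesture availability check of this kind could be sketched as below. This is a simplified illustration, not NVDA's own gesture handling: the `normalize` and `find_conflicts` helpers and the shape of the `bindings` table are assumptions, and the script names in the sample data are placeholders.

```python
def normalize(gesture: str) -> tuple[str, frozenset[str]]:
    """Lowercase a gesture such as "kb:NVDA+shift+r" and ignore key order.

    Simplified stand-in for real gesture normalization.
    """
    source, _, keys = gesture.lower().partition(":")
    return source, frozenset(keys.split("+"))


def find_conflicts(candidate: str, bindings: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (module, script) pairs that already use the candidate gesture."""
    wanted = normalize(candidate)
    return sorted(
        (module, script)
        for module, gestures in bindings.items()
        for gesture, script in gestures.items()
        if normalize(gesture) == wanted
    )


# Illustrative table only; a real check would have to collect bindings
# from every module in the NVDA source tree.
bindings = {
    "excel": {"kb:NVDA+shift+r": "setRowHeader"},
    "global": {"kb:NVDA+r": "recognizeWithUwpOcr"},
}
```

With this sample table, `find_conflicts("kb:NVDA+shift+r", bindings)` reports the Excel binding, so the proposed shortcut would be rejected before it reaches review.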

Thank you again for the guidance and for taking the time to review this.

Best regards.

@mattheliu mattheliu marked this pull request as draft March 13, 2026 12:49
@mattheliu mattheliu closed this Mar 13, 2026