Add on-device OCR engine (RapidOCR + ONNX Runtime + PP-OCRv5) #19787
mattheliu wants to merge 2 commits into nvaccess:master
Integrate RapidOCR with ONNX Runtime as an on-device OCR engine for NVDA, providing offline text recognition with significantly better CJK accuracy than Windows UWP OCR. The engine runs entirely on-device with no cloud dependency.

New files:
- source/contentRecog/onDeviceOcr/ - OCR engine package with abstract engine interface, RapidOCR implementation, and result coordinate converter
- tests/unit/contentRecog/test_onDeviceOcr.py - unit tests

Modified files:
- configSpec.py: [onDeviceOcr] configuration section
- settingsDialogs.py: OnDeviceOcrPanel settings UI
- globalCommands.py: NVDA+Shift+R recognition, language cycle, auto-refresh toggle
- core.py: engine shutdown on exit
- pyproject.toml: rapidocr>=3.3.0, onnxruntime>=1.17.0 dependencies

Note: uv.lock not updated - must run `uv lock` on Windows before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
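For readers who have not opened the diff, the split between an abstract engine interface and a result coordinate converter might look roughly like the sketch below. Every name in it (`OcrWord`, `OcrEngine`, `quad_to_rect`, `to_screen`) is a hypothetical placeholder and not taken from the PR. It also assumes that the backend reports each text region as four corner points in image pixels and that the capture may have been resized by a known scale factor before recognition.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    """One recognized fragment with its box in image pixel coordinates (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


class OcrEngine(ABC):
    """Hypothetical abstract engine; a concrete subclass would wrap one OCR backend."""

    @abstractmethod
    def recognize(self, pixels: bytes, width: int, height: int) -> list[OcrWord]:
        """Return the fragments found in a raw image buffer."""

    def shutdown(self) -> None:
        """Release backend resources; a no-op unless a subclass overrides it."""


def quad_to_rect(quad):
    """Collapse four (x, y) corner points into an axis-aligned (left, top, width, height)."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    left, top = min(xs), min(ys)
    return int(left), int(top), int(max(xs) - left), int(max(ys) - top)


def to_screen(rect, origin_x, origin_y, scale=1.0):
    """Map an image-space rect back to screen coordinates.

    origin_x / origin_y: screen position of the captured region's top-left corner.
    scale: factor by which the capture was resized before recognition (assumption).
    """
    left, top, width, height = rect
    return (
        origin_x + round(left / scale),
        origin_y + round(top / scale),
        round(width / scale),
        round(height / scale),
    )
```

Keeping the converter as plain functions, separate from the engine class, means it can be unit-tested without loading any model.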
Hi,

Thanks for the pull request. However, I would like you to consider the following before continuing.

Recent pull requests were closed for an important reason: they were AI-generated AND did not fill out the pull request template specified by the NVDA project. Writing a pull request means submitting code that makes a difference for a project or for the organization in charge of a collaborative software development effort. Some projects and organizations expect pull request authors (including project members) to fill out a pull request template. Anyone wishing to submit a pull request must therefore think about more than just the code or the AI engine used to generate it (in this case, Anthropic/Claude), and specifically about the purpose of the PR template.

So why did the NVDA project and similar projects create a pull request template in the first place? For several reasons:
In addition to the "admission" by Claude, this pull request creates a major problem with keyboard command assignment: NVDA+Shift+R is actually unavailable, because it is already defined in the Excel support module as the command to set row headers. This should not be glossed over: a gesture that an AI thinks is available may not be, unless the code-generating AI system can parse ALL of NVDA's source code (which takes time and resources).

Thanks.
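A gesture clash of the kind described above can be caught mechanically before a binding is proposed. The sketch below is illustrative only: the `bindings` mapping, the module labels, and the script names are hypothetical placeholders, and it assumes gesture identifiers of the form `kb:NVDA+shift+r` that should compare equal regardless of letter case or modifier order. It does not read NVDA's real gesture maps.

```python
from collections import defaultdict


def normalize(gesture: str) -> str:
    """Lower-case a gesture identifier and sort its keys so equivalent spellings match."""
    source, _, keys = gesture.partition(":")
    parts = sorted(key.strip().lower() for key in keys.split("+"))
    return f"{source.lower()}:{'+'.join(parts)}"


def find_conflicts(bindings):
    """Return {normalized gesture: [(module, script), ...]} for gestures bound more than once."""
    owners = defaultdict(list)
    for module, gestures in bindings.items():
        for gesture, script in gestures.items():
            owners[normalize(gesture)].append((module, script))
    return {gesture: found for gesture, found in owners.items() if len(found) > 1}


# Hypothetical data: module labels and script names are placeholders, not NVDA identifiers.
bindings = {
    "excel support": {"kb:NVDA+shift+r": "setRowHeaders"},
    "global commands": {
        "kb:nvda+Shift+R": "recognizeWithOnDeviceOcr",
        "kb:NVDA+r": "recognizeWithUwpOcr",
    },
}

for gesture, found in find_conflicts(bindings).items():
    print(gesture, "is bound by", found)
```

The sketch only reports collisions. Whether a given collision is acceptable depends on how the project resolves gesture precedence between contexts.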
Hi,

Thank you for the detailed explanation and for pointing out these issues.

You are absolutely right that the NVDA project’s pull request template exists to ensure contributors clearly describe the problem being addressed, testing approach, development considerations, and user impact. I apologize for not following the template in the initial submission.

For now, I will mark this pull request as a draft while I continue testing and refining the implementation. This will give me time to properly review the NVDA PR template requirements, verify gesture availability (including avoiding conflicts like NVDA+Shift+R in the Excel module), and update the description with the required information such as testing plans and development notes.

Thank you again for the guidance and for taking the time to review this.

Best regards.
Summary
New files
- `source/contentRecog/onDeviceOcr/`: engine package (4 modules)
- `tests/unit/contentRecog/test_onDeviceOcr.py`: unit tests

Modified files
- `configSpec.py`: `[onDeviceOcr]` config section
- `settingsDialogs.py`: `OnDeviceOcrPanel` + registration
- `globalCommands.py`: 3 new scripts (recognize, cycle language, toggle auto-refresh)
- `core.py`: engine shutdown on NVDA exit
- `pyproject.toml`: `rapidocr>=3.3.0`, `onnxruntime>=1.17.0`

Pre-merge TODO
- `uv lock` on Windows to update `uv.lock`

Test plan
- `python -m pytest tests/unit/contentRecog/test_onDeviceOcr.py -v`

🤖 Generated with Claude Code