
Support for the OCR engine included in Windows 10. #7361

Merged
merged 11 commits into master from uwpOcr on Aug 1, 2017

Conversation

jcsteh
Contributor

@jcsteh jcsteh commented Jul 6, 2017

Link to issue number:

None. This is a new feature I started experimenting with a few months ago.

Edit by @LeonarddeR: Fixes #3050, since this adds the json module.

Summary of the issue:

There are many cases where text is inaccessible and it is useful to be able to use OCR to access it. There is already an OCR add-on for NVDA, but this requires a separate download. We could never include it in NVDA because of the size of the dependencies. Also, it uses the open source Tesseract engine, which has some quality problems. Windows 10 includes an OCR engine with support for 25+ languages. Having support for this out-of-the-box means users can use it without any additional downloads. Because it's a commercial engine, it's possible that it is better quality (though I haven't done much in the way of comparison).

Description of how this pull request fixes the issue:

This accesses the UWP OCR API via code in the nvdaHelperLocalWin10 C++/CX dll.
Users press NVDA+r to recognize the text of the current navigator object. Once recognition is complete, the result is presented in a document which can be read with the cursor keys, etc. Enter can also be pressed to click the text at the cursor.
Much of the base content recognition functionality has been abstracted into the new contentRecog framework, allowing other recognizers to be easily implemented in future (in both NVDA core and add-ons).
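The plug-in pattern described above can be roughly illustrated as follows (class and method names here are invented for illustration, not NVDA's actual contentRecog API):

```python
# Minimal sketch of a content recognition framework: a base class that
# recognizers subclass, reporting their result via a callback.
class ContentRecognizer:
    """Base class: subclasses implement recognize()."""
    def recognize(self, pixels, width, height, onResult):
        raise NotImplementedError

class EchoRecognizer(ContentRecognizer):
    """Toy recognizer that 'recognizes' by returning a fixed string."""
    def recognize(self, pixels, width, height, onResult):
        # A real engine (UWP OCR, an add-on, etc.) would process the
        # pixel buffer here, possibly asynchronously.
        onResult("recognized text")

results = []
EchoRecognizer().recognize(b"\x00" * 16, 4, 4, results.append)
```

Because recognizers only need to honour this small interface, both core engines and add-ons can plug in without touching the presentation layer.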

Testing performed:

  1. Tested in the Run dialog, both the entire dialog and individual controls.
  2. Tested on the nvaccess.org website, both on the logo and various paragraphs of text.

Known issues with pull request:

  1. Obviously, this only works in Windows 10.
  2. Content that is visually separate but positioned horizontally gets rendered on separate lines; e.g. the buttons in the Run dialog. The OCR engine is choosing to treat this as separate lines. Whether this is desirable is controversial. We could probably compare y coordinates in future if there is sufficient demand.

Change log entry:

New Features:

- NVDA can now use the OCR functionality included in Windows 10 to recognize the text of images or inaccessible applications. (#7361)
 - The language can be set from the new Windows 10 OCR dialog in NVDA Preferences.
 - To recognize the content of the current navigator object, press NVDA+r.
 - See the Content Recognition section of the User Guide for further details.

Changes for Developers:

- Support for content recognizers such as OCR and image description tools can be easily implemented using the new contentRecog package. (#7361)
- The Python json package is now included in NVDA binary builds. (#3050)

@jcsteh jcsteh requested a review from feerrenrut July 6, 2017 06:42
@jcsteh
Contributor Author

jcsteh commented Jul 6, 2017

Forgot to note that I tested clicking buttons in the Run dialog by pressing enter on them. This is an important piece of the functionality.

@derekriemer
Collaborator

Does the review cursor support now go away? (Originally it was doing this with review.)

@LeonarddeR
Collaborator

I wonder whether it would be possible to have the OCR language tied to the NVDA language by default, with a fallback to English when the language is not available for OCR? Setting the default to English means that in many cases, one will have to change the default.

@PratikP1

PratikP1 commented Jul 6, 2017

Once incubated, I'll do comparison testing with the Tesseract engine output. I probably won't have time to do testing with multiple languages. If there's a call for it, I'll put testing Spanish, French, German, and Italian on my list.

@Brian1Gaff

Brian1Gaff commented Jul 6, 2017 via email

@josephsl
Collaborator

josephsl commented Jul 6, 2017 via email

@josephsl
Collaborator

josephsl commented Jul 6, 2017 via email

@LeonarddeR
Collaborator

@jcsteh: Maybe this is the right time to add appropriate headers to screenBitmap.py?

@jcsteh
Contributor Author

jcsteh commented Jul 7, 2017

@derekriemer commented on 6 Jul 2017, 19:24 GMT+10:

Does the review cursor support now go away? (Originally it was doing this with review.)

You can still use the review cursor, but you may as well use the system cursor, since you get thrown into a fake document with the result. I had three reasons for doing this:

  1. Some users seemed to be confused by having to use the review cursor after doing OCR.
  2. The approach used by the OCR add-on overrode the object's TextInfo. This means that, for example, if you ran this on an editable text control, pressing the arrow keys afterwards would throw exceptions.
  3. It's nice to just be able to press enter on a piece of text to activate it, just like browse mode. :)

@leonardder commented on 6 Jul 2017, 19:29 GMT+10:

I wonder whether it would be possible to have the OCR language tied to the NVDA language by default, with a fallback to English when the language is not available for OCR? Setting the default to English means that in many cases, one will have to change the default.

Originally, I couldn't think of a decent way to do this; the UWP OCR code doesn't get loaded until it's needed, so there's no obvious place to put this. However, I think I might have some other ideas, so I'll see what I can do.

@josephsl commented on 7 Jul 2017, 01:17 GMT+10:

NVDA language, Windows display language, and then to English as a last resort.

The NVDA language is set based on the Windows display language in most cases unless the user explicitly overrides it.

@josephsl commented on 7 Jul 2017, 01:03 GMT+10:

Hi, would it be possible for someone to include an online, AI-based contentRecog recognizer later?

The framework does allow for that, yes. Note that most cloud image description APIs are not free.

@leonardder commented on 7 Jul 2017, 01:31 GMT+10:

Maybe this is the right time to add appropriate headers to screenBitmap.py?

Shall do.

@jcsteh
Contributor Author

jcsteh commented Jul 7, 2017

I believe I've now addressed these comments. Thanks for the pre-review feedback. :)

@LeonarddeR
Collaborator

You might want to fix the merge conflict before the review takes place.

Another thing I'd like to bring up: the script in globalCommands is now specific to Windows 10 OCR. How about a system where new OCR engines can register themselves? You could change the Windows 10 OCR dialog into a more generic dialog where the OCR engine to use can be set when multiple engines are available. This means that, on versions of Windows earlier than 10, an add-on could provide an engine and people would be able to use NVDA+r out of the box. Thus, NVDA+r would OCR with the currently selected engine.

@jcsteh
Contributor Author

jcsteh commented Jul 7, 2017 via email

@jcsteh
Contributor Author

jcsteh commented Jul 9, 2017 via email

Contributor

@feerrenrut feerrenrut left a comment

I haven't made it all the way through this. But here are my comments so far.


#include <robuffer.h>

byte* getBytes(Windows::Storage::Streams::IBuffer^ buffer);
Contributor

Some documentation?

*/

#pragma once
#define export __declspec(dllexport)
Contributor

In case I don't notice later, how does this become an import?

Contributor Author

In case I don't notice later, how does this become an import?

This exports the symbol from the dll so that a caller loading the dll can access it. We then access it using ctypes (which internally uses GetProcAddress).
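The same load-and-call pattern can be sketched against the standard C library (nvdaHelperLocalWin10.dll itself is Windows-only, so libc's exported strlen stands in here for the uwpOcr_* symbols; this is an analogy, not NVDA's actual wrapper code):

```python
import ctypes
import ctypes.util

# Load a shared library and look up an exported symbol by name.
# ctypes resolves the symbol via GetProcAddress (or dlsym on POSIX),
# exactly the mechanism described for the dll exports above.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.strlen.restype = ctypes.c_size_t
libc.strlen.argtypes = [ctypes.c_char_p]

length = libc.strlen(b"uwpOcr")  # -> 6
```

Declaring restype and argtypes up front, as NVDA does for its helper dll functions, keeps ctypes from silently mis-marshalling arguments.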

export UwpOcr* __stdcall uwpOcr_initialize(const char16* language, uwpOcr_Callback callback);
export void __stdcall uwpOcr_terminate(UwpOcr* instance);
export void __stdcall uwpOcr_recognize(UwpOcr* instance, const RGBQUAD* image, unsigned int width, unsigned int height);
export BSTR __stdcall uwpOcr_getLanguages();
Contributor

Might be handy to know what format the languages come back in?

Collaborator

A COM string.

Contributor

I mean, since this gets several languages. Presumably each language is in the form en-gb and there is some separator?

UwpOcr* __stdcall uwpOcr_initialize(const char16* language, uwpOcr_Callback callback) {
auto engine = OcrEngine::TryCreateFromLanguage(ref new Language(ref new String(language)));
if (!engine)
return NULL;
Contributor

Can you use nullptr instead of NULL?

self.language = getConfigLanguage()
self._dll = NVDAHelper.getHelperLocalWin10Dll()

def recognize(self, pixels, width, height, coordConv, onResult):
Contributor

Should this protect against width or height of zero or less?
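A guard along the lines the comment suggests might look like this (the helper name is hypothetical; only the recognize() signature comes from the diff):

```python
# Hypothetical validation helper: raising early avoids handing a
# zero- or negative-sized buffer to the native OCR call.
def validateImageSize(width, height):
    if width <= 0 or height <= 0:
        raise ValueError(
            "image dimensions must be positive, got %dx%d" % (width, height))

validateImageSize(640, 480)  # OK; validateImageSize(0, 10) would raise
```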

raise NotImplementedError

# Used by LinesWordsResult.
LwrWord = namedtuple("LwrWord", ("offset", "left", "top"))
Contributor

I'm a little confused by the name LwrWord; does it stand for "Lines Words Result Word"?
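For context, the diff suggests LwrWord abbreviates "LinesWordsResult word": a word's offset into the flattened result text plus its top-left position (a reading, not confirmed in this thread):

```python
from collections import namedtuple

# Mirrors the definition in the diff: offset into the result text,
# plus the word's left/top coordinates for click-to-activate.
LwrWord = namedtuple("LwrWord", ("offset", "left", "top"))

word = LwrWord(offset=6, left=52, top=5)
```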

word = nextWord
return textInfos.Point(word.left, word.top)

class SimpleTextResult(RecognitionResult):
Contributor

This might become clear later in the review and probably needs to be handled higher up than here, but at the very least, we would know the area of the screen that the image used for OCR came from (assuming NVDA did the screen grab). Obviously, this is not true for an image obtained from elsewhere.

def recognizeNavigatorObject(recognizer):
"""User interface function to recognize content in the navigator object.
This should be called from a script or in response to a GUI action.
@param recognizer: The content recognizer ot use.
Contributor

typo ot -> to

left, top, width, height = nav.location
resize = recognizer.getResizeFactor(width, height)
coordConv = ResultCoordConverter(left, top, resize)
destWidth = int(width * resize)
Contributor

I think it would be nice if the caller did not need to know which direction the resize works, or about the offsets, with the ResultCoordConverter.

self.assertEqual(actual, (110, 220))

def test_noOffsetWithResize(self):
conv = contentRecog.ResultCoordConverter(0, 0, 2)
Contributor

As discussed, it's a little confusing to me that a resize factor of 2 results in a halving of the coordinates.
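The behaviour being questioned can be reconstructed as follows: the image is scaled up by the resize factor before recognition, so converting result coordinates back to the screen divides by that factor and then applies the capture origin (a hypothetical reimplementation, not the actual class):

```python
# Sketch of a result coordinate converter: image coords are divided
# by the resize factor (the capture was scaled up before OCR),
# then offset by the screen position of the captured region.
class ResultCoordConverter:
    def __init__(self, left, top, resize):
        self.left, self.top, self.resize = left, top, resize

    def convert(self, x, y):
        return (self.left + int(x / self.resize),
                self.top + int(y / self.resize))

ResultCoordConverter(0, 0, 2).convert(110, 220)  # -> (55, 110)
```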

pass

class TestLinesWordsResult(unittest.TestCase):
"""Tests contentRecog.LinesWordsResult and contentRecog.LwrTextInfo.
Contributor

I think it's worth expanding on this a little bit. This is essentially testing that we are able to parse and interpret the JSON we expect to be returned from an OCR library, right?
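The kind of lines/words JSON being exercised might look roughly like this (field names here are illustrative assumptions, not the engine's exact schema):

```python
import json

# A plausible lines/words result: lines containing positioned words.
data = json.loads("""
{"lines": [
    {"words": [
        {"text": "Hello", "x": 10, "y": 5},
        {"text": "world", "x": 52, "y": 5}
    ]}
]}
""")

# Flatten the structure into readable text, as a TextInfo over the
# result would need to.
text = " ".join(w["text"] for line in data["lines"] for w in line["words"])
```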

@jcsteh jcsteh requested a review from feerrenrut July 11, 2017 04:54
in the supplied image to screen coordinates.
This should be used when returning coordinates to NVDA.
@type coordConverter: L{ResultCoordConverter}
@param imageInfo: Informationabout the image for recognition.
Contributor

missing space between Information and about

@derekriemer
Collaborator

What's with the assignment of @jcsteh here by nvaccessAuto?

@jcsteh
Contributor Author

jcsteh commented Jul 11, 2017 via email

@LeonarddeR
Collaborator

LeonarddeR commented Jul 12, 2017

Two remarks:

  1. It is not possible to use the find functionality in the resulting cursor manager. You can open the find dialog and even find something, but as soon as you return, focus will go back to the original focus object.
  2. Would it be an option to also add the space bar as a default binding for activate?

…g it on the current thread.

UWP OCR calls the result callback in a background thread, but NVDA events must never be handled in a background thread.
This was causing MSHTML vbufs to die due to COM queries failing because they were on the wrong thread.
Added a debugWarning in IAccessibleRole which would have made this easier to catch.
Fixes #7399.
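The threading fix described in this commit message can be sketched with a plain queue (illustrative only; NVDA's actual mechanism differs, but the principle is the same: the background callback only enqueues, and the main thread does all event handling):

```python
import queue
import threading

eventQueue = queue.Queue()

def ocrCallback(result):
    # Fired on the engine's background thread: do no event handling
    # here, just hand the result off to the main thread.
    eventQueue.put(result)

def mainThreadPump(handled):
    # Runs on the main thread, where COM objects and vbufs live;
    # drains queued results and processes them safely.
    while not eventQueue.empty():
        handled.append(eventQueue.get())

t = threading.Thread(target=ocrCallback, args=("recognized text",))
t.start()
t.join()
handled = []
mainThreadPump(handled)
```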
@jcsteh
Contributor Author

jcsteh commented Jul 18, 2017

@leonardder commented on 12 Jul 2017, 20:52 GMT+12:

  1. It is not possible to use the find functionality in the resulting cursor manager. You can open the find dialog and even find something, but as soon as you return, focus will go back to the original focus object.

For now, I've disabled the find commands; the user gets a message indicating that they aren't supported. This is tricky to support because the dialog causes the focus to move and we thus lose our fake focus. There are ways we can work around this, but I'd like to get this into 2017.3 and there isn't time to explore them. I think the functionality is sufficient as it is to be worth releasing. I filed #7415 as a feature request for this.

  1. Would it be an option to also add the space bar as a default binding for activate?

Done.

Thanks for the feedback.

@jcsteh jcsteh requested a review from feerrenrut July 18, 2017 04:15
@jcsteh
Contributor Author

jcsteh commented Jul 18, 2017

@feerrenrut, could you please review the latest three commits ASAP? As discussed, we want to get this into 2017.3 if possible, hence the time constraint. Thanks.

@jcsteh
Contributor Author

jcsteh commented Jul 18, 2017

Sorry @feerrenrut; just pushed one commit because I forgot to commit the User Guide update. :(

@@ -74,7 +74,7 @@ def setFocus(self):
# we want the cursor to move to the focus (#3145).
# However, we don't want this for recognition results, as these aren't focusable.
ti._enteringFromOutside = True
eventHandler.executeEvent("gainFocus", self)
eventHandler.queueEvent("gainFocus", self)
Contributor

Might be worth having a comment here to say why we do this. You can probably pretty much copy your commit message.

jcsteh added a commit that referenced this pull request Jul 19, 2017
@tmthywynn8
Sponsor

I assume that at this point pressing Enter/Space will perform the default action (NVDA+Enter) on the coordinates of the given text? What about a left double click, a right click, moving the mouse pointer, etc?

@jcsteh
Contributor Author

jcsteh commented Jul 20, 2017 via email

@tmthywynn8
Sponsor

tmthywynn8 commented Jul 20, 2017 via email

@LeonarddeR
Collaborator

It seems that, in the current situation, one is able to run multiple OCR actions at once; e.g. when the OCR result is shown, you can do another OCR on the result object, and it will show another result.

@derekriemer
Collaborator

This isn't really a bug.

@jcsteh
Contributor Author

jcsteh commented Jul 31, 2017

@leonardder commented on 24 Jul 2017, 16:06 GMT+10:

It seems that, in the current situation, one is able to run multiple OCR actions at once; e.g. when the OCR result is shown, you can do another OCR on the result object, and it will show another result.

I don't think this should delay this from merging to master or going into 2017.3, but I'll file a follow up PR to deal with it.

@jcsteh jcsteh merged commit 5527f27 into master Aug 1, 2017
@nvaccessAuto nvaccessAuto added this to the 2017.3 milestone Aug 1, 2017
@jcsteh jcsteh deleted the uwpOcr branch August 1, 2017 23:17
@fisher729

I suggest the string used when pressing the shortcut in input help say 'recognizes' rather than 'recognize', to fall in line with other descriptors.


Successfully merging this pull request may close these issues.

Add JSon module to NVDA distribution