New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for the OCR engine included in Windows 10. #7361

Merged
merged 11 commits into from Aug 1, 2017

Conversation

Projects
None yet
10 participants
@jcsteh
Contributor

jcsteh commented Jul 6, 2017

Link to issue number:

None. This is a new feature I started experimenting with a few months ago.

Edit by @leonardder: Fixes #3050, since this adds the json module.

Summary of the issue:

There are many cases where text is inaccessible and it is useful to be able to use OCR to access it. There is already an OCR add-on for NVDA, but this requires a separate download. We could never include it in NVDA because of the size of the dependencies. Also, it uses the open source Tesseract engine, which has some quality problems. Windows 10 includes an OCR engine with support for 25+ languages. Having support for this out-of-the-box means users can use it without any additional downloads. Because it's a commercial engine, it's possible that it is better quality (though I haven't done much in the way of comparison).

Description of how this pull request fixes the issue:

This accesses the UWP OCR API via code in the nvdaHelperLocalWin10 C++/CX dll.
Users press NVDA+r to recognize the text of the current navigator object. Once recognition is complete, the result is presented in a document which can be read with the cursor keys, etc. Enter can also be pressed to click the text at the cursor.
Much of the base content recognition functionality has been abstracted into the new contentRecog framework, allowing other recognizers to be easily implemented in future (in both NVDA core and add-ons).

Testing performed:

  1. Tested in the Run dialog, both the entire dialog and individual controls.
  2. Tested on the nvaccess.org website, both on the logo and various paragraphs of text.

Known issues with pull request:

  1. Obviously, this only works in Windows 10.
  2. Content that is visually separate but positioned horizontally gets rendered on separate lines; e.g. the buttons in the Run dialog. The OCR engine is choosing to treat this as separate lines. Whether this is desirable is controversial. We could probably compare y coordinates in future if there is sufficient demand.

Change log entry:

New Features:

- NVDA can now use the OCR functionality included in Windows 10 to recognize the text of images or inaccessible applications. (#7361)
 - The language can be set from the new Windows 10 OCR dialog in NVDA Preferences.
 - To recognize the content of the current navigator object, press NVDA+r.
 - See the Content Recognition section of the User Guide for further details.

Changes for Developers:

- Support for content recognizers such as OCR and image description tools can be easily implemented using the new contentRecog package. (#7361)
- The Python json package is now included in NVDA binary builds. (#3050)
Support for the OCR engine included in Windows 10.
This accesses the UWP OCR API via code in the nvdaHelperLocalWin10 C++/CX dll.
Users press NVDA+r to recognize the text of the current navigator object. Once recognition is complete, the result is presented in a document which can be read with the cursor keys, etc. Enter can also be pressed to click the text at the cursor.
Much of the base content recognition functionality has been abstracted into the new contentRecog framework, allowing other recognizers to be easily implemented in future (in both NVDA core and add-ons).

@jcsteh jcsteh requested a review from feerrenrut Jul 6, 2017

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 6, 2017

Contributor

Forgot to note that I tested clicking buttons in the Run dialog by pressing enter on them. This is an important piece of the functionality.

Contributor

jcsteh commented Jul 6, 2017

Forgot to note that I tested clicking buttons in the Run dialog by pressing enter on them. This is an important piece of the functionality.

@derekriemer

This comment has been minimized.

Show comment
Hide comment
@derekriemer

derekriemer Jul 6, 2017

Collaborator

does the review cursor support now go away (Originally it was doing this with review).

Collaborator

derekriemer commented Jul 6, 2017

does the review cursor support now go away (Originally it was doing this with review).

@leonardder

This comment has been minimized.

Show comment
Hide comment
@leonardder

leonardder Jul 6, 2017

Collaborator

I wonder whether it would be possible to have the OCR language tied to the NVDA language by default, with a fallback to English when the language is not available for OCR? Setting the default to English means that in many cases, one will have to change the default.

Collaborator

leonardder commented Jul 6, 2017

I wonder whether it would be possible to have the OCR language tied to the NVDA language by default, with a fallback to English when the language is not available for OCR? Setting the default to English means that in many cases, one will have to change the default.

@PratikP1

This comment has been minimized.

Show comment
Hide comment
@PratikP1

PratikP1 Jul 6, 2017

Once incubated, I'll do comparison testing with the Tesseract engine output. I probably won't have time to do testing with multiple languages. If there's a call for it, I'll put testing Spanish, French, German, and Italian on my list.

PratikP1 commented Jul 6, 2017

Once incubated, I'll do comparison testing with the Tesseract engine output. I probably won't have time to do testing with multiple languages. If there's a call for it, I'll put testing Spanish, French, German, and Italian on my list.

@Brian1Gaff

This comment has been minimized.

Show comment
Hide comment
@Brian1Gaff

Brian1Gaff Jul 6, 2017

Brian1Gaff commented Jul 6, 2017

@josephsl

This comment has been minimized.

Show comment
Hide comment
@josephsl

josephsl Jul 6, 2017

Collaborator
Collaborator

josephsl commented Jul 6, 2017

@josephsl

This comment has been minimized.

Show comment
Hide comment
@josephsl

josephsl Jul 6, 2017

Collaborator
Collaborator

josephsl commented Jul 6, 2017

@leonardder

This comment has been minimized.

Show comment
Hide comment
@leonardder

leonardder Jul 6, 2017

Collaborator

@jcsteh: May be this is the right time to add appropriate headers to screenBitmap.py?

Collaborator

leonardder commented Jul 6, 2017

@jcsteh: May be this is the right time to add appropriate headers to screenBitmap.py?

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 7, 2017

Contributor

@derekriemer commented on 6 Jul 2017, 19:24 GMT+10:

does the review cursor support now go away (Originally it was doing this with review).

You can still use the review cursor, but you may as well use the system cursor, since you get thrown into a fake document with the result. I had three reasons for doing this:

  1. Some users seemed to be confused by having to use the review cursor after doing OCR.
  2. The approach used by the OCR add-on overrode the object's TextInfo. This means that, for example, if you ran this on an editable text control, pressing the arrow keys afterwards would throw exceptions.
  3. It's nice to just be able to press enter on a piece of text to activate it, just like browse mode. :)

@leonardder commented on 6 Jul 2017, 19:29 GMT+10:

I wonder whether it would be possible to have the OCR language tied to the NVDA language by default, with a fallback to English when the language is not available for OCR? Setting the default to English means that in many cases, one will have to change the default.

Originally, I couldn't think of a decent way to do this; the UWP OCR code doesn't get loaded until it's needed, so there's no obvious place to put this. However, I think I might have some other ideas, so I'll see what I can do.

@josephsl commented on 7 Jul 2017, 01:17 GMT+10:

NVDA language, Windows display language, and then to English as a last resort.

The NVDA language is set based on the Windows display language in most cases unless the user explicitly overrides it.

@josephsl commented on 7 Jul 2017, 01:03 GMT+10:

Hi, would it be possible for someone to include online - AI-based contentRecog into this later?

The framework does allow for that, yes. Note that most cloud image description APIs are not free.

@leonardder commented on 7 Jul 2017, 01:31 GMT+10:

May be this is the right time to add appropriate headers to screenBitmap.py?

Shall do.

Contributor

jcsteh commented Jul 7, 2017

@derekriemer commented on 6 Jul 2017, 19:24 GMT+10:

does the review cursor support now go away (Originally it was doing this with review).

You can still use the review cursor, but you may as well use the system cursor, since you get thrown into a fake document with the result. I had three reasons for doing this:

  1. Some users seemed to be confused by having to use the review cursor after doing OCR.
  2. The approach used by the OCR add-on overrode the object's TextInfo. This means that, for example, if you ran this on an editable text control, pressing the arrow keys afterwards would throw exceptions.
  3. It's nice to just be able to press enter on a piece of text to activate it, just like browse mode. :)

@leonardder commented on 6 Jul 2017, 19:29 GMT+10:

I wonder whether it would be possible to have the OCR language tied to the NVDA language by default, with a fallback to English when the language is not available for OCR? Setting the default to English means that in many cases, one will have to change the default.

Originally, I couldn't think of a decent way to do this; the UWP OCR code doesn't get loaded until it's needed, so there's no obvious place to put this. However, I think I might have some other ideas, so I'll see what I can do.

@josephsl commented on 7 Jul 2017, 01:17 GMT+10:

NVDA language, Windows display language, and then to English as a last resort.

The NVDA language is set based on the Windows display language in most cases unless the user explicitly overrides it.

@josephsl commented on 7 Jul 2017, 01:03 GMT+10:

Hi, would it be possible for someone to include online - AI-based contentRecog into this later?

The framework does allow for that, yes. Note that most cloud image description APIs are not free.

@leonardder commented on 7 Jul 2017, 01:31 GMT+10:

May be this is the right time to add appropriate headers to screenBitmap.py?

Shall do.

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 7, 2017

Contributor

I believe I've now addressed these comments. Thanks for the pre-review feedback. :)

Contributor

jcsteh commented Jul 7, 2017

I believe I've now addressed these comments. Thanks for the pre-review feedback. :)

@leonardder

This comment has been minimized.

Show comment
Hide comment
@leonardder

leonardder Jul 7, 2017

Collaborator

You might want to fix the merge conflict before the review takes place.

Another thing I'd like to bring up: The script in globalCommands is now specific for windows 10 ocr. How about a system where new OCR engines can register themselves? You could change the windows 10 ocr dialog to a more generic dialog where you can set the OCR engine to use if multiple engines are available. This means that, on windows<10, an add-on can provide an engine and people will be able to use nvda+r out of the box. Thus, NVDA+r will OCR with the currently selected OCR engine.

Collaborator

leonardder commented Jul 7, 2017

You might want to fix the merge conflict before the review takes place.

Another thing I'd like to bring up: The script in globalCommands is now specific for windows 10 ocr. How about a system where new OCR engines can register themselves? You could change the windows 10 ocr dialog to a more generic dialog where you can set the OCR engine to use if multiple engines are available. This means that, on windows<10, an add-on can provide an engine and people will be able to use nvda+r out of the box. Thus, NVDA+r will OCR with the currently selected OCR engine.

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 7, 2017

Contributor
Contributor

jcsteh commented Jul 7, 2017

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 9, 2017

Contributor
Contributor

jcsteh commented Jul 9, 2017

@feerrenrut

I haven't made it all the way through this. But here are my comments so far.

Show outdated Hide outdated nvdaHelper/localWin10/utils.h Outdated
*/
#pragma once
#define export __declspec(dllexport)

This comment has been minimized.

@feerrenrut

feerrenrut Jul 10, 2017

Contributor

In case I dont notice later, how does this become an import?

@feerrenrut

feerrenrut Jul 10, 2017

Contributor

In case I dont notice later, how does this become an import?

This comment has been minimized.

@jcsteh

jcsteh Jul 11, 2017

Contributor

In case I dont notice later, how does this become an import?

This exports the symbol from the dll so that a caller loading the dll can access it. We then access it using ctypes (which internally uses GetProcAddress).

@jcsteh

jcsteh Jul 11, 2017

Contributor

In case I dont notice later, how does this become an import?

This exports the symbol from the dll so that a caller loading the dll can access it. We then access it using ctypes (which internally uses GetProcAddress).

Show outdated Hide outdated nvdaHelper/localWin10/uwpOcr.h Outdated
Show outdated Hide outdated nvdaHelper/localWin10/uwpOcr.cpp Outdated
Show outdated Hide outdated source/contentRecog/uwpOcr.py Outdated
Show outdated Hide outdated source/contentRecog/__init__.py Outdated
word = nextWord
return textInfos.Point(word.left, word.top)
class SimpleTextResult(RecognitionResult):

This comment has been minimized.

@feerrenrut

feerrenrut Jul 10, 2017

Contributor

This might become clear later in the review and probably needs to be handled higher up than here, but at the very least we would know the area of the screen that the image used for the OCR came from (assuming nvda did the screen grab). Obviously this is not true for an image obtained from elsewhere.

@feerrenrut

feerrenrut Jul 10, 2017

Contributor

This might become clear later in the review and probably needs to be handled higher up than here, but at the very least we would know the area of the screen that the image used for the OCR came from (assuming nvda did the screen grab). Obviously this is not true for an image obtained from elsewhere.

Show outdated Hide outdated source/contentRecog/recogUi.py Outdated
Show outdated Hide outdated source/contentRecog/recogUi.py Outdated
Show outdated Hide outdated tests/unit/contentRecog/test_contentRecog.py Outdated
Show outdated Hide outdated tests/unit/contentRecog/test_contentRecog.py Outdated

@jcsteh jcsteh requested a review from feerrenrut Jul 11, 2017

Show outdated Hide outdated source/contentRecog/__init__.py Outdated
@derekriemer

This comment has been minimized.

Show comment
Hide comment
@derekriemer

derekriemer Jul 11, 2017

Collaborator

what's with the assignment of @jcsteh here by nvaccessAuto?

Collaborator

derekriemer commented Jul 11, 2017

what's with the assignment of @jcsteh here by nvaccessAuto?

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 11, 2017

Contributor
Contributor

jcsteh commented Jul 11, 2017

@leonardder

This comment has been minimized.

Show comment
Hide comment
@leonardder

leonardder Jul 12, 2017

Collaborator

Two remarks:

  1. It is not possible to use the find functionality for the resulting cursor manager. You can open the find dialog and even find something, but as soon as you return, the focus will return back to the original focus object.
  2. Would it be an option to also add the space bar as a default binding for activate?
Collaborator

leonardder commented Jul 12, 2017

Two remarks:

  1. It is not possible to use the find functionality for the resulting cursor manager. You can open the find dialog and even find something, but as soon as you return, the focus will return back to the original focus object.
  2. Would it be an option to also add the space bar as a default binding for activate?

jcsteh added some commits Jul 18, 2017

recogUi: Queue the focus event to the main thread, rather than runnin…
…g it on the current thread.

UWP OCR calls the result callback in a background thread, but NVDA events must never be handled in a background thread.
This was causing MSHTML vbufs to die due to COM queries failing because they were on the wrong thread.
Added a debugWarning in IAccessibleRole which would have made this easier to catch.
Fixes #7399.
@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 18, 2017

Contributor

@leonardder commented on 12 Jul 2017, 20:52 GMT+12:

  1. It is not possible to use the find functionality for the resulting cursor manager. You can open the find dialog and even find something, but as soon as you return, the focus will return back to the original focus object.

For now, I've disabled the find commands; the user gets a message indicating that they aren't supported. This is tricky to support because the dialog causes the focus to move and we thus lose our fake focus. There are ways we can work around this, but I'd like to get this into 2017.3 and there isn't time ot explore them. I think the functionality is sufficient as it is to be worth releasing. I filed #7415 as a feature request for this.

  1. Would it be an option to also add the space bar as a default binding for activate?

Done.

Thanks for the feedback.

Contributor

jcsteh commented Jul 18, 2017

@leonardder commented on 12 Jul 2017, 20:52 GMT+12:

  1. It is not possible to use the find functionality for the resulting cursor manager. You can open the find dialog and even find something, but as soon as you return, the focus will return back to the original focus object.

For now, I've disabled the find commands; the user gets a message indicating that they aren't supported. This is tricky to support because the dialog causes the focus to move and we thus lose our fake focus. There are ways we can work around this, but I'd like to get this into 2017.3 and there isn't time ot explore them. I think the functionality is sufficient as it is to be worth releasing. I filed #7415 as a feature request for this.

  1. Would it be an option to also add the space bar as a default binding for activate?

Done.

Thanks for the feedback.

@jcsteh jcsteh requested a review from feerrenrut Jul 18, 2017

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 18, 2017

Contributor

@feerrenrut, could you please review the latest three commits ASAP? As discussed, we want to get this into 2017.3 if possible, hence the time constraint. Thanks.

Contributor

jcsteh commented Jul 18, 2017

@feerrenrut, could you please review the latest three commits ASAP? As discussed, we want to get this into 2017.3 if possible, hence the time constraint. Thanks.

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 18, 2017

Contributor

Sorry @feerrenrut; just pushed one commit because I forgot to commit the User Guide update. :(

Contributor

jcsteh commented Jul 18, 2017

Sorry @feerrenrut; just pushed one commit because I forgot to commit the User Guide update. :(

Show outdated Hide outdated source/contentRecog/recogUi.py Outdated

jcsteh added a commit that referenced this pull request Jul 19, 2017

@tmthywynn8

This comment has been minimized.

Show comment
Hide comment
@tmthywynn8

tmthywynn8 Jul 20, 2017

I assume that at this point pressing Enter/Space will perform the default action (NVDA+Enter) on the coordinates of the given text? What about a left double click, a right click, moving the mouse pointer, etc?

tmthywynn8 commented Jul 20, 2017

I assume that at this point pressing Enter/Space will perform the default action (NVDA+Enter) on the coordinates of the given text? What about a left double click, a right click, moving the mouse pointer, etc?

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 20, 2017

Contributor
Contributor

jcsteh commented Jul 20, 2017

@tmthywynn8

This comment has been minimized.

Show comment
Hide comment
@tmthywynn8

tmthywynn8 Jul 20, 2017

tmthywynn8 commented Jul 20, 2017

@leonardder

This comment has been minimized.

Show comment
Hide comment
@leonardder

leonardder Jul 24, 2017

Collaborator

It seems that in the current situation, one is able to run multiple ocr actions at once, e.g. when the OCR result is shown, you can do another ocr on the result object, and it will show another result.

Collaborator

leonardder commented Jul 24, 2017

It seems that in the current situation, one is able to run multiple ocr actions at once, e.g. when the OCR result is shown, you can do another ocr on the result object, and it will show another result.

@derekriemer

This comment has been minimized.

Show comment
Hide comment
@derekriemer

derekriemer Jul 31, 2017

Collaborator

This isn't really a bug.

Collaborator

derekriemer commented Jul 31, 2017

This isn't really a bug.

@jcsteh

This comment has been minimized.

Show comment
Hide comment
@jcsteh

jcsteh Jul 31, 2017

Contributor

@leonardder commented on 24 Jul 2017, 16:06 GMT+10:

It seems that in the current situation, one is able to run multiple ocr actions at once, e.g. when the OCR result is shown, you can do another ocr on the result object, and it will show another result.

I don't think this should delay this from merging to master or going into 2017.3, but I'll file a follow up PR to deal with it.

Contributor

jcsteh commented Jul 31, 2017

@leonardder commented on 24 Jul 2017, 16:06 GMT+10:

It seems that in the current situation, one is able to run multiple ocr actions at once, e.g. when the OCR result is shown, you can do another ocr on the result object, and it will show another result.

I don't think this should delay this from merging to master or going into 2017.3, but I'll file a follow up PR to deal with it.

@jcsteh jcsteh merged commit 5527f27 into master Aug 1, 2017

@nvaccessAuto nvaccessAuto removed the incubating label Aug 1, 2017

@nvaccessAuto nvaccessAuto added this to the 2017.3 milestone Aug 1, 2017

@jcsteh jcsteh deleted the uwpOcr branch Aug 1, 2017

@fisher729

This comment has been minimized.

Show comment
Hide comment
@fisher729

fisher729 Aug 2, 2017

I suggest the string(?) used when pressing the shortcut in input help says 'recognizes rather than recognize' to fall in line with other descriptors.

fisher729 commented Aug 2, 2017

I suggest the string(?) used when pressing the shortcut in input help says 'recognizes rather than recognize' to fall in line with other descriptors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment