NVDA with OCR #1791

Closed
nvaccessAuto opened this Issue Sep 15, 2011 · 16 comments

@nvaccessAuto

Reported by KevanGC on 2011-09-15 03:20
This suggestion is fairly simple. I don't think you'll be able to integrate OCR into NVDA soon; I'm not even sure if you've started working on it.
However, I think we would all benefit from this. JAWS 13 now has it, and I don't feel it's worth the $1000 to purchase JFW just for OCR.

@nvaccessAuto

Comment 1 by kevinchao89 on 2011-11-11 10:18
Via @jcsteh (3 days ago)
Experimental #NVDASR OCR global plugin. Uses Tesseract. Only works in English. NVDA+r to OCR navigator obj. http://dl.dropbox.com/u/28976681/nvda_ocr.zip
Can route mouse to words using move mouse to current navigator object command. If it breaks, you keep both pieces.
Copy the ocr folder itself (not just its content) into NVDA globalPlugins folder.
then review commands.
Minor update, incl newer training data, remove extraneous whitespace in output.

@nvaccessAuto

Comment 2 by lpintes on 2011-11-11 12:03
Interesting plugin. I tested it in an application where NVDA is unable to see text in flat review (maybe another opportunity for me to hack displayModel). The application, LiveCode, is totally inaccessible; I mention it only for interest.
However, the recognized text contains so many errors that it is unusable, in my opinion. I am not criticizing; please treat this as a comment.

Is Tesseract really so bad? Or are the screen images of low quality? Or could this be system dependent?

If you have any ideas about what to change and test, let me know. I am very interested in this.

@nvaccessAuto

Comment 3 by jteh on 2011-11-11 13:16
It could be many things. As I understand it, screen images are lower resolution than printed text and Tesseract hasn't been optimised to handle them. From what I've seen, Tesseract probably has a way to go before it is as good as some of the paid-for engines. It could also be that the application you are using uses a weird font or perhaps it is not displaying text at all. For best results, make sure the application or window you are trying to OCR is maximised.

It sometimes does produce very good results. Even when it doesn't, the text doesn't have to be entirely readable to make this plugin useful. Sometimes, it's enough to give you the general idea of what's on the screen. Even when it's complete rubbish, the important thing is that there are now characters you can click on, which might help you click something you need with some trial and error.

@nvaccessAuto

Comment 4 by vortex on 2011-11-12 01:05
Did you try CuneiForm OCR? In my tests it has better recognition accuracy than Tesseract. It is open source too and seems to have a special mode for low-resolution images, though I haven't tried it.
It can be found at:
http://en.openocr.org

@nvaccessAuto

Comment 5 by jteh on 2011-11-12 01:36
I did try to investigate Cuneiform. However, I had a lot of trouble finding a Windows command-line executable to download. The page I eventually got to was in Russian, which I can't read. Also, the OCR package we use needs to be able to communicate the location information for each word (at least) so the user can route the mouse to individual words. I believe Cuneiform supports HOCR, which does provide for this, but a quick search suggested that there were problems in Cuneiform's HOCR implementation.
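
To illustrate the kind of per-word location information HOCR provides, here is a minimal sketch that pulls each word and its bounding box out of an HOCR fragment and computes a centre point a mouse could be routed to. It is written in current Python using only the standard library, not the Python 2.7 the plugin ran on. The sample markup, the `ocr_word`/`ocrx_word` class names as used here, and the helper names `HocrWords` and `word_centres` are assumptions for illustration; real engine output may differ, and this is not the plugin's actual code.

```python
import re
from html.parser import HTMLParser

# Hypothetical HOCR fragment; real engine output will differ in detail.
SAMPLE_HOCR = """
<span class='ocr_line' title='bbox 10 12 180 30'>
  <span class='ocr_word' title='bbox 10 12 58 30'>File</span>
  <span class='ocr_word' title='bbox 70 12 180 30'>Options</span>
</span>
"""

# HOCR stores coordinates in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
# Assumed class names marking word-level elements.
WORD_CLASSES = {"ocr_word", "ocrx_word"}


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every word-level span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._stack = []  # one entry per open span: a bbox tuple or None

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attrs = dict(attrs)
        bbox = None
        classes = set((attrs.get("class") or "").split())
        if classes & WORD_CLASSES:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                bbox = tuple(int(g) for g in match.groups())
        self._stack.append(bbox)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        # Only keep text that sits directly inside a word-level span.
        if text and self._stack and self._stack[-1] is not None:
            self.words.append((text, self._stack[-1]))


def word_centres(hocr):
    """Return [(word, (centre_x, centre_y)), ...] in document order."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    return [
        (text, ((x0 + x1) // 2, (y0 + y1) // 2))
        for text, (x0, y0, x1, y1) in parser.words
    ]
```

The coordinates are relative to the captured image, so a caller would still need to add the screen offset of the captured region before moving the mouse.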

@nvaccessAuto

Comment 6 by vortex on 2011-11-13 00:44
Here's a CuneiForm command-line build for Windows, if you're interested:
http://www.vortex.IM/download/cuneiform cli win.rar
Documentation for the command line can be found at:
http://manpages.ubuntu.com/manpages/natty/man1/cuneiform.1.html
I found some discussions about CuneiForm's poor HOCR implementation, but I'm not sure this is a problem for getting word coordinates.

@nvaccessAuto

Comment 7 by pvagner on 2011-11-15 10:08
Guys I may have missed this.
On my system the subprocess module does not include most of the flags this plugin uses.
Perhaps it's because I'm on Windows XP.
Just a note for someone else who might start playing with this.

@nvaccessAuto

Comment 8 by jteh on 2011-11-15 10:35
A few people have had this issue. Please try with Python 2.7.2 if you haven't already. I suspect they were only added in 2.7.1 or 2.7.2, but I can't confirm that and it isn't mentioned in the changelog.
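
For anyone hitting the same problem, a defensive sketch follows. The thread does not say which flags are missing; this assumes they are the Windows window-hiding constants `STARTF_USESHOWWINDOW` and `SW_HIDE`, and falls back to their Win32 numeric values (1 and 0) when the `subprocess` module does not expose them. The helper name `hidden_startupinfo` is hypothetical and this is not the plugin's code.

```python
import subprocess
import sys


def hidden_startupinfo():
    """Return a STARTUPINFO that hides the child's window, or None off Windows."""
    if sys.platform != "win32":
        # startupinfo is a Windows-only concept; None is the default elsewhere.
        return None
    try:
        use_show_window = subprocess.STARTF_USESHOWWINDOW
        sw_hide = subprocess.SW_HIDE
    except AttributeError:
        # Assumption: constants missing from this interpreter's subprocess
        # module, so use the Win32 values directly.
        use_show_window, sw_hide = 0x00000001, 0
    info = subprocess.STARTUPINFO()
    info.dwFlags |= use_show_window
    info.wShowWindow = sw_hide
    return info
```

The result can be passed as `startupinfo=hidden_startupinfo()` to `subprocess.Popen` or `subprocess.check_call` when launching the OCR executable.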

@nvaccessAuto

Comment 9 by pvagner (in reply to comment 8) on 2011-11-16 14:51
Replying to jteh:

Please try with Python 2.7.2

Okay, thanks for the tip; updating my Python installation helped.

Replying to vortex:

Here's a CuneiForm command-line build for Windows, if you're interested:

I've done some initial tests with the build you linked, and even though Cuneiform is reported to be able to handle smaller images, for best results it's still good to double the size of the image.
With my limited captures, Tesseract appears to work better for me.

One thing I am wondering about, and haven't yet checked is doable, is whether it would be better to convert to grayscale instead of black and white.

@nvaccessAuto

Comment 10 by vortex on 2011-11-16 19:30
@Peter:
I also tried replacing Tesseract with Cuneiform and failed; the log says something about the process returning a return code different than 1.
Could you post the modified __init__.py? I've seen that Cuneiform has some switches for low-resolution fax images and dot matrix printers, and I would like to play with them.

@nvaccessAuto

Comment 11 by jteh (in reply to comment 9) on 2011-11-16 20:53
Replying to pvagner:

One thing I am wondering about, and haven't yet checked is doable, is whether it would be better to convert to grayscale instead of black and white.

It should already be grayscale unless I'm misinterpreting the docs. I originally used monochrome, but changed it to grayscale later due to what I felt were better results. However, this needs further testing.
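
For anyone who wants to experiment with the two preprocessing steps discussed here (grayscale conversion and doubling the image size), the following is a dependency-free sketch of what they do to the pixel data. The plugin itself uses PIL, where the rough equivalents would be `img.convert("L")` and `img.resize(...)`; the plugin's exact calls are not shown in this thread. The function names `to_grayscale` and `upscale` are hypothetical, the luma weights are the ones PIL documents for its "L" mode, and pixel replication is the simplest possible scaling, chosen only for illustration.

```python
def to_grayscale(pixels):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values.

    Uses the ITU-R 601 weights: L = R*299/1000 + G*587/1000 + B*114/1000.
    """
    return [
        [(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
        for row in pixels
    ]


def upscale(rows, factor=2):
    """Enlarge an image by repeating each pixel `factor` times in x and y."""
    out = []
    for row in rows:
        wide = [value for value in row for _ in range(factor)]
        # Emit `factor` independent copies of the widened row.
        out.extend(list(wide) for _ in range(factor))
    return out
```

A smoother resampling filter may well suit an OCR engine better than plain replication; that would need the further testing mentioned above.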

@nvaccessAuto

Comment 12 by lpintes on 2011-11-18 18:39
Further proof of this plugin's real usefulness: I was able to switch programs in totally inaccessible Flash TV. Really cool!
I navigated to the object, switched to focus mode, performed OCR and clicked. It even worked in Aurora, where the Flash object was reported as unavailable; I didn't try to switch to focus mode there, of course.
While testing this, I encountered several errors like this:

Traceback (most recent call last):
  File "scriptHandler.py", line 165, in executeScript
    script(gesture)
  File ".\userConfig\globalPlugins\ocr\__init__.py", line 137, in script_ocrNavigatorObject
    img.save(imgFile)
  File ".\userConfig\globalPlugins\ocr\PIL\Image.py", line 1439, in save
    save_handler(self, fp, filename)
  File ".\userConfig\globalPlugins\ocr\PIL\BmpImagePlugin.py", line 242, in _save
    ImageFile._save(im, fp, [("raw", (0,0)+im.size, 0, (rawmode, stride, -1))])
  File ".\userConfig\globalPlugins\ocr\PIL\ImageFile.py", line 498, in _save
    e.setimage(im.im, b)
SystemError: tile cannot extend outside image

What does this error mean?
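
The thread does not give an answer. One plausible reading, offered here as an assumption and not a confirmed diagnosis, is that the captured image had zero width or height, for example because the navigator object reported an empty or off-screen location. A guard like the sketch below would catch that before any image is saved. The function name `clamp_region` and the screen-size parameters are hypothetical and not part of the plugin.

```python
def clamp_region(left, top, width, height, screen_width, screen_height):
    """Clip a rectangle to the screen.

    Returns the visible part as (left, top, width, height), or None when
    nothing of the rectangle lies on screen or it has no area.
    """
    right = min(left + width, screen_width)
    bottom = min(top + height, screen_height)
    left = max(left, 0)
    top = max(top, 0)
    if right <= left or bottom <= top:
        # Empty region: capturing or saving it would be pointless.
        return None
    return (left, top, right - left, bottom - top)
```

A caller would skip the OCR step and report "nothing to recognize" whenever `clamp_region` returns `None`.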

@nvaccessAuto

Comment 13 by barichd on 2011-11-21 19:13
I had to make a small change to __init__.py to make the plugin work on my XP machine. See link:

http://dl.dropbox.com/u/48550043/init.py

@nvaccessAuto

Comment 14 by jteh (in reply to comment 13) on 2011-11-21 21:04
Replying to barichd:

I had to make a small change to __init__.py to make the plugin work on my XP machine.

I really don't follow why this change is necessary. Very strange. It's not just XP, as it works fine on other XP machines. Are you using a language other than English?

@nvaccessAuto

Comment 15 by ateu on 2011-12-03 10:24
Hello, dear developers

Forgive me if it's not a good idea.

What do you think about Microsoft Office Document Scanning?

If Microsoft has already contributed to NV Access, might they allow NVDA to use its OCR?

It's only a suggestion.

@nvaccessAuto

Comment 16 by jteh on 2012-10-15 04:13
The OCR add-on has been available for some time now.
Changes:
State: closed
