[New PowerToy] Create new OCR PowerToy #19172

TheJoeFin · 2022-07-05T02:51:29Z

Summary of the Pull Request

This PR introduces a new PowerToy to perform OCR anywhere on screen by selecting a rectangular region, clicking a word, or right-clicking an image file and selecting PowerOCR.

As of the first week in July, there is much more that needs to be done to make this PowerToy ready for prime time, but this is the MVP if anyone wanted to pull and build just this tool. Most of the code is copied from my repository Text Grab.

PR Checklist

Detailed Description of the Pull Request / Additional comments

Currently this project is just the MVP of a region selection on a single screen. There is much more work needed before this PR is ready to merge
- Work on multi-monitor
- Capture cursor to keep within each monitor
- Add Right-click image option
- Add Tests
- Connect to the installer
- Add settings page
This app is very simple and built in a very straightforward way. Possible considerations
- Change over to MVVM
- Possibly extract some of the bitmap manipulation methods into PowerToys Utils
- Add options at top like Snipping Tool
- I can do some more experimental work on the Windows OCR API to find the ideal text line height

Validation Steps Performed

Tested by building and manually testing

….UnitTests

ghost · 2022-07-05T02:51:41Z

All CLA requirements met.

Aaron-Junker · 2022-07-06T05:50:11Z

What do you mean with mvp. Do you mean this PR is WIP?

TheJoeFin · 2022-07-06T15:04:34Z

@Aaron-Junker by MVP I meant that the region selection OCR fully works just that it is a Minimally Viable Product as it stands currently. If anyone wants to pull my branch and build, it will work for them. I will edit the original comment to be more clear.

This is still a WIP and I will be working to get it where it needs to be for full inclusion into PowerToys.

Aaron-Junker · 2022-07-06T15:40:14Z

@Aaron-Junker by MVP I meant that the region selection OCR fully works just that it is a Minimally Viable Product as it stands currently. If anyone wants to pull my branch and build, it will work for them. I will edit the original comment to be more clear.

This is still a WIP and I will be working to get it where it needs to be for full inclusion into PowerToys.

Oh. Thank you for the explanation. I thought maybe you mixed it up.

Aaron-Junker · 2022-08-23T18:16:18Z

Please don't forget to add this to the issue templates.

jaimecbernardo · 2022-08-23T19:45:12Z

Just did. Going with minimum lovable product here. What's remaining in terms of features would go in different issues.

htcfreek · 2022-08-24T09:03:55Z

@jaimecbernardo
if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

jaimecbernardo · 2022-08-24T09:06:38Z

@jaimecbernardo if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

Was about to add a comment about that.
@TheJoeFin, are you ok with us changing the name of the PowerToys in order to not contain the word "Power"? We've been avoiding that on new utilities.
Another internal ideas is naming it "Screen to Text", in the sense that it's descriptive of functionality.

stefansjfw

Everything works nicely on single and multi monitor setup. Nice work!

htcfreek · 2022-08-24T10:21:12Z

@jaimecbernardo if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

Was about to add a comment about that.
@TheJoeFin, are you ok with us changing the name of the PowerToys in order to not contain the word "Power"? We've been avoiding that on new utilities.
Another internal ideas is naming it "Screen to Text", in the sense that it's descriptive of functionality.

If we add the context menu for images. Then the source is an image and not the screen.

TheJoeFin · 2022-08-24T13:48:42Z

@jaimecbernardo if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

Was about to add a comment about that. @TheJoeFin, are you ok with us changing the name of the PowerToys in order to not contain the word "Power"? We've been avoiding that on new utilities. Another internal ideas is naming it "Screen to Text", in the sense that it's descriptive of functionality.

I am onboard with changing the name away from PowerOCR. "OCR Copy" is a good name, when thinking up a name I did web searches and found many of the "easy OCR" "quick OCR" "Fast OCR" etc. all already in use.

jaimecbernardo · 2022-08-24T16:43:44Z

So, the plan is to currently merge as "PowerOCR" and change it later if needed.
Thank you!

htcfreek · 2022-08-24T17:33:14Z

@jaimecbernardo
Don't forget the github label.

crutkas · 2022-08-25T04:42:48Z

This is amazing
Lets change keystroke. R just seems too much like 'recording' and would like to keep that open
1. Win+Shift+O?
2. Win+Shift+T?

jaimecbernardo · 2022-08-25T07:26:52Z

Win+Shift+O?

Win+Shift+T?

Win+Shift+O is used by VCM already for camera muting.
Win+Shift+T should be OK. Going to make the change and merge it in.

jaimecbernardo · 2022-08-25T07:30:34Z

Win+Shift+O?

Win+Shift+T?

Win+Shift+O is used by VCM already for camera muting. Win+Shift+T should be OK. Going to make the change and merge it in.

Although, Win+T and Win+Shift+T are shortcuts for cycling forward and backwards on the taskbar icons. Going to check if it can be overridden on Windows 11 as well and then apply the change.

yuyoyuppe · 2022-08-25T16:33:22Z

Tested locally, got a crash once during OCR:

 	[Exception] System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(int errorCode, System.IntPtr errorInfo)	Unknown
 	[Exception] System.Windows.Clipboard.Flush()	Unknown
 	[Exception] System.Windows.Clipboard.CriticalSetDataObject(object data, bool copy)	Unknown
 	[Exception] System.Windows.Clipboard.SetDataInternal(string format, object data)	Unknown
 	[Exception] System.Windows.Clipboard.SetText(string text, System.Windows.TextDataFormat format)	Unknown
 	[Exception] System.Windows.Clipboard.SetText(string text)	Unknown
>	[Exception] PowerOCR.OCROverlay.RegionClickCanvas_MouseUp(object sender, System.Windows.Input.MouseButtonEventArgs e) Line 218	C#
 	[Exception] System.Threading.Tasks.Task.ThrowAsync.AnonymousMethod__128_0(object state)	Unknown
 	[Exception] System.Windows.Threading.ExceptionWrapper.InternalRealCall(System.Delegate callback, object args, int numArgs)	Unknown
 	[Exception] System.Windows.Threading.ExceptionWrapper.TryCatchWhen(object source, System.Delegate callback, object args, int numArgs, System.Delegate catchHandler)	Unknown
 	[Exception] System.Windows.Threading.DispatcherOperation.InvokeImpl()	Unknown
 	[Exception] MS.Internal.CulturePreservingExecutionContext.CallbackWrapper(object obj)	Unknown
 	[Exception] System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)	Unknown
 	00007ffe31344fd9()	Unknown
 	[Managed to Native Transition]	
 	System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()	Unknown
 	MS.Internal.CulturePreservingExecutionContext.Run(MS.Internal.CulturePreservingExecutionContext executionContext, System.Threading.ContextCallback callback, object state)	Unknown
 	System.Windows.Threading.DispatcherOperation.Invoke()	Unknown
 	System.Windows.Threading.Dispatcher.ProcessQueue()	Unknown
 	System.Windows.Threading.Dispatcher.WndProcHook(System.IntPtr hwnd, int msg, System.IntPtr wParam, System.IntPtr lParam, ref bool handled)	Unknown
 	MS.Win32.HwndWrapper.WndProc(System.IntPtr hwnd, int msg, System.IntPtr wParam, System.IntPtr lParam, ref bool handled)	Unknown
 	MS.Win32.HwndSubclass.DispatcherCallbackOperation(object o)	Unknown
 	System.Windows.Threading.ExceptionWrapper.InternalRealCall(System.Delegate callback, object args, int numArgs)	Unknown
 	System.Windows.Threading.ExceptionWrapper.TryCatchWhen(object source, System.Delegate callback, object args, int numArgs, System.Delegate catchHandler)	Unknown
 	System.Windows.Threading.Dispatcher.LegacyInvokeImpl(System.Windows.Threading.DispatcherPriority priority, System.TimeSpan timeout, System.Delegate method, object args, int numArgs)	Unknown
 	MS.Win32.HwndSubclass.SubclassWndProc(System.IntPtr hwnd, int msg, System.IntPtr wParam, System.IntPtr lParam)	Unknown
 	[Native to Managed Transition]	
 	00007ffe3268e858()	Unknown
 	00007ffe3268e299()	Unknown
 	[Managed to Native Transition]	
 	System.Windows.Threading.Dispatcher.PushFrameImpl(System.Windows.Threading.DispatcherFrame frame)	Unknown
 	System.Windows.Threading.Dispatcher.PushFrame(System.Windows.Threading.DispatcherFrame frame)	Unknown
 	System.Windows.Threading.Dispatcher.Run()	Unknown
 	System.Windows.Application.RunDispatcher(object ignore)	Unknown
 	System.Windows.Application.RunInternal(System.Windows.Window window)	Unknown
 	System.Windows.Application.Run()	Unknown
 	PowerOCR.App.Main()	Unknown

TheJoeFin · 2022-08-25T19:45:37Z

@yuyoyuppe We should probably wrap the Clipboard access in a try block

PowerToys/src/modules/PowerOCR/PowerOCR/OCROverlay.xaml.cs

Line 275 in eedea31

Clipboard.SetText(grabbedText);

and probably log an error if access to the clipboard fails.

crutkas · 2022-08-25T21:46:06Z

@yuyoyuppe We should probably wrap the Clipboard access in a try block

PowerToys/src/modules/PowerOCR/PowerOCR/OCROverlay.xaml.cs

Line 275 in eedea31

Clipboard.SetText(grabbedText);

and probably log an error if access to the clipboard fails.

How does color picker safe guard this.

davidegiacometti · 2022-08-25T22:01:43Z

@crutkas try-catch with log.
https://github.com/microsoft/PowerToys/blob/main/src/modules/colorPicker/ColorPickerUI/Helpers/ClipboardHelper.cs

@TheJoeFin I already love this tool! 😃
I have two small nits:

Hide the window from the taskbar
Strange behaviour: background disappear on single click when no text is detected but PowerOCR still opened

PowerToys/src/modules/PowerOCR/PowerOCR/OCROverlay.xaml.cs

Line 265 in eedea31

BackgroundBrush.Opacity = 0;

astandarduser · 2022-08-25T23:04:40Z

Would love the ability to run OCR in bulk on a folder of pictures, and have the ability to search images by text it contains. Probably out of the scope of this but would be incredibly handy for bulk scanning or finding things in a screenshots folder

crutkas · 2022-08-25T23:56:10Z

we should move to clipboard logic to common lib so we reuse same logic.

crutkas · 2022-08-25T23:56:45Z

@astandarduser please create new issue as "new feature" request. Would love to understand more the "why" but very interesting concept

jefflord · 2022-08-26T00:55:19Z

This is SOO nice. I did use Text Grab but having this as part of PowerToys will make it something I know will just always be there for me.

Nits:

The link for "Leam(sic) more about PowerOCR" does not really work. <- Funny, "Leam" is a typo, OCR fail...
Pressing Win before you finish the text selection puts you in a weird state. It should ignore Win or treat it like Esc.
Not selecting anything but just left-click on the mouse does not revert the cursor or mode. It's seems to be still in
OCR mode but the screen is not dimed.
Why not call this "Text Grab" here in PowerToys? Or "PowerTextGrab" (PTG). I think that having "OCR" in the name can lead people to think its a "Powerful OCR tool" that can be use to process files an other things, when it's really just (a fantastic) quick "text grabber" for the screen. Maybe it's "Screen Text Grabber", or "PowerTextPlucker", or "VisualTextCopy"...

Thank you for making the "dimming" be instant and not fade. If you do change to fade, make that an optional thing please.

@crutkas, should the above go in a new issue at this point also?

TheJoeFin added 4 commits July 4, 2022 17:08

Init commit

390d9cf

Fix unintended GUID change of Microsoft.PowerToys.Run.Plugin.TimeZone…

0ddb1c4

….UnitTests

Region and click word working

41699d3

Code style

6b9eba8