Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New PowerToy] Create new OCR PowerToy #19172

Merged
merged 81 commits into from
Aug 25, 2022
Merged

Conversation

TheJoeFin
Copy link
Collaborator

@TheJoeFin TheJoeFin commented Jul 5, 2022

Summary of the Pull Request

This PR introduces a new PowerToy to perform OCR anywhere on screen by selecting a rectangular region, clicking a word, or right-clicking an image file and selecting PowerOCR.

As of the first week in July, there is much more that needs to be done to make this PowerToy ready for prime time, but this is the MVP if anyone wanted to pull and build just this tool. Most of the code is copied from my repository Text Grab.

PR Checklist

Detailed Description of the Pull Request / Additional comments

  • Currently this project is just the MVP of a region selection on a single screen. There is much more work needed before this PR is ready to merge
    • Work on multi-monitor
    • Capture cursor to keep within each monitor
    • Add Right-click image option
    • Add Tests
    • Connect to the installer
    • Add settings page
  • This app is very simple and built in a very straightforward way. Possible considerations
    • Change over to MVVM
    • Possibly extract some of the bitmap manipulation methods into PowerToys Utils
    • Add options at top like Snipping Tool
    • I can do some more experimental work on the Windows OCR API to find the ideal text line height

Validation Steps Performed

  • Tested by building and manually testing

@ghost
Copy link

ghost commented Jul 5, 2022

CLA assistant check
All CLA requirements met.

@github-actions

This comment has been minimized.

@TheJoeFin TheJoeFin marked this pull request as draft July 5, 2022 02:54
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Aaron-Junker
Copy link
Collaborator

What do you mean with mvp. Do you mean this PR is WIP?

@TheJoeFin
Copy link
Collaborator Author

@Aaron-Junker by MVP I meant that the region selection OCR fully works just that it is a Minimally Viable Product as it stands currently. If anyone wants to pull my branch and build, it will work for them. I will edit the original comment to be more clear.

This is still a WIP and I will be working to get it where it needs to be for full inclusion into PowerToys.

@Aaron-Junker
Copy link
Collaborator

@Aaron-Junker by MVP I meant that the region selection OCR fully works just that it is a Minimally Viable Product as it stands currently. If anyone wants to pull my branch and build, it will work for them. I will edit the original comment to be more clear.

This is still a WIP and I will be working to get it where it needs to be for full inclusion into PowerToys.

Oh. Thank you for the explanation. I thought maybe you mixed it up.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Aaron-Junker
Copy link
Collaborator

Please don't forget to add this to the issue templates.

@jaimecbernardo
Copy link
Collaborator

Just did. Going with minimum lovable product here. What's remaining in terms of features would go in different issues.

@htcfreek
Copy link
Collaborator

@jaimecbernardo
if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

@jaimecbernardo
Copy link
Collaborator

@jaimecbernardo if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

Was about to add a comment about that.
@TheJoeFin, are you ok with us changing the name of the PowerToys in order to not contain the word "Power"? We've been avoiding that on new utilities.
Another internal ideas is naming it "Screen to Text", in the sense that it's descriptive of functionality.

Copy link
Collaborator

@stefansjfw stefansjfw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything works nicely on single and multi monitor setup. Nice work!

@htcfreek
Copy link
Collaborator

@jaimecbernardo if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

Was about to add a comment about that.
@TheJoeFin, are you ok with us changing the name of the PowerToys in order to not contain the word "Power"? We've been avoiding that on new utilities.
Another internal ideas is naming it "Screen to Text", in the sense that it's descriptive of functionality.

If we add the context menu for images. Then the source is an image and not the screen.

@TheJoeFin
Copy link
Collaborator Author

@jaimecbernardo if you don't want to have this PowerToy's name start with the word "Power" what do you think about the name "OCR Copy"?

Was about to add a comment about that. @TheJoeFin, are you ok with us changing the name of the PowerToys in order to not contain the word "Power"? We've been avoiding that on new utilities. Another internal ideas is naming it "Screen to Text", in the sense that it's descriptive of functionality.

I am onboard with changing the name away from PowerOCR. "OCR Copy" is a good name, when thinking up a name I did web searches and found many of the "easy OCR" "quick OCR" "Fast OCR" etc. all already in use.

@jaimecbernardo
Copy link
Collaborator

So, the plan is to currently merge as "PowerOCR" and change it later if needed.
Thank you!

@htcfreek
Copy link
Collaborator

@jaimecbernardo
Don't forget the github label.

@crutkas
Copy link
Member

crutkas commented Aug 25, 2022

  1. This is amazing
  2. Lets change keystroke. R just seems too much like 'recording' and would like to keep that open
    1. Win+Shift+O?
    2. Win+Shift+T?

@jaimecbernardo
Copy link
Collaborator

  1. Win+Shift+O?
  2. Win+Shift+T?

Win+Shift+O is used by VCM already for camera muting.
Win+Shift+T should be OK. Going to make the change and merge it in.

@jaimecbernardo jaimecbernardo added the Needs-Review This Pull Request awaits the review of a maintainer. label Aug 25, 2022
@jaimecbernardo
Copy link
Collaborator

  1. Win+Shift+O?
  2. Win+Shift+T?

Win+Shift+O is used by VCM already for camera muting. Win+Shift+T should be OK. Going to make the change and merge it in.

Although, Win+T and Win+Shift+T are shortcuts for cycling forward and backwards on the taskbar icons. Going to check if it can be overridden on Windows 11 as well and then apply the change.

@yuyoyuppe
Copy link
Collaborator

Tested locally, got a crash once during OCR:

 	[Exception] System.Runtime.InteropServices.Marshal.ThrowExceptionForHR(int errorCode, System.IntPtr errorInfo)	Unknown
 	[Exception] System.Windows.Clipboard.Flush()	Unknown
 	[Exception] System.Windows.Clipboard.CriticalSetDataObject(object data, bool copy)	Unknown
 	[Exception] System.Windows.Clipboard.SetDataInternal(string format, object data)	Unknown
 	[Exception] System.Windows.Clipboard.SetText(string text, System.Windows.TextDataFormat format)	Unknown
 	[Exception] System.Windows.Clipboard.SetText(string text)	Unknown
>	[Exception] PowerOCR.OCROverlay.RegionClickCanvas_MouseUp(object sender, System.Windows.Input.MouseButtonEventArgs e) Line 218	C#
 	[Exception] System.Threading.Tasks.Task.ThrowAsync.AnonymousMethod__128_0(object state)	Unknown
 	[Exception] System.Windows.Threading.ExceptionWrapper.InternalRealCall(System.Delegate callback, object args, int numArgs)	Unknown
 	[Exception] System.Windows.Threading.ExceptionWrapper.TryCatchWhen(object source, System.Delegate callback, object args, int numArgs, System.Delegate catchHandler)	Unknown
 	[Exception] System.Windows.Threading.DispatcherOperation.InvokeImpl()	Unknown
 	[Exception] MS.Internal.CulturePreservingExecutionContext.CallbackWrapper(object obj)	Unknown
 	[Exception] System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)	Unknown
 	00007ffe31344fd9()	Unknown
 	[Managed to Native Transition]	
 	System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()	Unknown
 	MS.Internal.CulturePreservingExecutionContext.Run(MS.Internal.CulturePreservingExecutionContext executionContext, System.Threading.ContextCallback callback, object state)	Unknown
 	System.Windows.Threading.DispatcherOperation.Invoke()	Unknown
 	System.Windows.Threading.Dispatcher.ProcessQueue()	Unknown
 	System.Windows.Threading.Dispatcher.WndProcHook(System.IntPtr hwnd, int msg, System.IntPtr wParam, System.IntPtr lParam, ref bool handled)	Unknown
 	MS.Win32.HwndWrapper.WndProc(System.IntPtr hwnd, int msg, System.IntPtr wParam, System.IntPtr lParam, ref bool handled)	Unknown
 	MS.Win32.HwndSubclass.DispatcherCallbackOperation(object o)	Unknown
 	System.Windows.Threading.ExceptionWrapper.InternalRealCall(System.Delegate callback, object args, int numArgs)	Unknown
 	System.Windows.Threading.ExceptionWrapper.TryCatchWhen(object source, System.Delegate callback, object args, int numArgs, System.Delegate catchHandler)	Unknown
 	System.Windows.Threading.Dispatcher.LegacyInvokeImpl(System.Windows.Threading.DispatcherPriority priority, System.TimeSpan timeout, System.Delegate method, object args, int numArgs)	Unknown
 	MS.Win32.HwndSubclass.SubclassWndProc(System.IntPtr hwnd, int msg, System.IntPtr wParam, System.IntPtr lParam)	Unknown
 	[Native to Managed Transition]	
 	00007ffe3268e858()	Unknown
 	00007ffe3268e299()	Unknown
 	[Managed to Native Transition]	
 	System.Windows.Threading.Dispatcher.PushFrameImpl(System.Windows.Threading.DispatcherFrame frame)	Unknown
 	System.Windows.Threading.Dispatcher.PushFrame(System.Windows.Threading.DispatcherFrame frame)	Unknown
 	System.Windows.Threading.Dispatcher.Run()	Unknown
 	System.Windows.Application.RunDispatcher(object ignore)	Unknown
 	System.Windows.Application.RunInternal(System.Windows.Window window)	Unknown
 	System.Windows.Application.Run()	Unknown
 	PowerOCR.App.Main()	Unknown

image

@TheJoeFin
Copy link
Collaborator Author

@yuyoyuppe We should probably wrap the Clipboard access in a try block

Clipboard.SetText(grabbedText);

and probably log an error if access to the clipboard fails.

@crutkas
Copy link
Member

crutkas commented Aug 25, 2022

@yuyoyuppe We should probably wrap the Clipboard access in a try block

Clipboard.SetText(grabbedText);

and probably log an error if access to the clipboard fails.

How does color picker safe guard this.

@davidegiacometti
Copy link
Collaborator

@crutkas try-catch with log.
https://github.com/microsoft/PowerToys/blob/main/src/modules/colorPicker/ColorPickerUI/Helpers/ClipboardHelper.cs

@TheJoeFin I already love this tool! 😃
I have two small nits:

  • Hide the window from the taskbar
  • Strange behaviour: background disappear on single click when no text is detected but PowerOCR still opened

BackgroundBrush.Opacity = 0;

@astandarduser
Copy link

Would love the ability to run OCR in bulk on a folder of pictures, and have the ability to search images by text it contains. Probably out of the scope of this but would be incredibly handy for bulk scanning or finding things in a screenshots folder

@crutkas
Copy link
Member

crutkas commented Aug 25, 2022

we should move to clipboard logic to common lib so we reuse same logic.

@crutkas
Copy link
Member

crutkas commented Aug 25, 2022

@astandarduser please create new issue as "new feature" request. Would love to understand more the "why" but very interesting concept

@jefflord
Copy link
Contributor

jefflord commented Aug 26, 2022

This is SOO nice. I did use Text Grab but having this as part of PowerToys will make it something I know will just always be there for me.

Nits:

  • The link for "Leam(sic) more about PowerOCR" does not really work. <- Funny, "Leam" is a typo, OCR fail...
  • Pressing Win before you finish the text selection puts you in a weird state. It should ignore Win or treat it like Esc.
  • Not selecting anything but just left-click on the mouse does not revert the cursor or mode. It's seems to be still in
    OCR mode but the screen is not dimed.
  • Why not call this "Text Grab" here in PowerToys? Or "PowerTextGrab" (PTG). I think that having "OCR" in the name can lead people to think its a "Powerful OCR tool" that can be use to process files an other things, when it's really just (a fantastic) quick "text grabber" for the screen. Maybe it's "Screen Text Grabber", or "PowerTextPlucker", or "VisualTextCopy"...

Thank you for making the "dimming" be instant and not fade. If you do change to fade, make that an optional thing please.

@crutkas, should the above go in a new issue at this point also?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Idea-New PowerToy Suggestion for a PowerToy Needs-Review This Pull Request awaits the review of a maintainer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet