text extractor is not working on Japanese despite OCR language pack is installed #22325

Closed
heiyue-an opened this issue Nov 27, 2022 · 13 comments
Labels
Issue-Docs: Documentation issue that needs to be improved
Needs-Author-Feedback: The original author of the issue/PR needs to come back and respond to something
Status-No recent activity: no activity in the past 5 days when follow up's are needed

Comments

@heiyue-an

Provide a description of requested docs changes

image
Text Extractor works well for English but cannot be used for Japanese.
PowerToys v0.64.1 on Windows 11 build 22623.891

@heiyue-an heiyue-an added Issue-Docs Documentation issue that needs to be improved Needs-Triage For issues raised to be triaged and prioritized by internal Microsoft teams labels Nov 27, 2022
@heiyue-an heiyue-an reopened this Nov 27, 2022
@inetkachev

inetkachev commented Jan 8, 2023

I have the same problem:

PS C:\Users\Ivan> [Windows.Media.Ocr.OcrEngine]::AvailableRecognizerLanguages


DisplayName     : English (United States)
LanguageTag     : en-US
NativeName      : English (United States)
Script          : Latn
LayoutDirection : Ltr
AbbreviatedName : ENG

DisplayName     : Russian
LanguageTag     : ru
NativeName      : Русский
Script          : Cyrl
LayoutDirection : Ltr
AbbreviatedName : РУС

But Cyrillic text is recognised as English.

@JeffJacobson

I am also having the same problem. I know this tool USED TO WORK* on this page in an earlier version of PowerToys, but it no longer works. (I made sure I had the Japanese IME active when using the tool.)

You can get what the text should be on that page by running this command in the browser's console.

[...document.body.querySelectorAll("img[src$='.svg']")].map(e => e.alt).join("\n")

However, I noticed that DeepL is also exhibiting similar behavior (see screenshot below), so it might be something that broke in an update to Windows itself rather than PowerToys. (I had never tried DeepL's OCR until today, so I don't know if it ever worked correctly.)

DeepL OCR issues

* When it "worked" it didn't get everything 100% correct, but now it returns complete gibberish.

System Info

Name                Value
PowerToys           v0.66.0
Display             XV273K
Scale               200%
Resolution          3840 x 2160 (Recommended)
Display 1           Connected to NVIDIA GeForce RTX 3080 Ti
Desktop mode        3840 x 2160, 119.91 Hz
Active signal mode  3840 x 2160, 119.91 Hz
Bit depth           8-bit with dithering
Color format        RGB
Color space         High dynamic range (HDR)
HDR certification   Not found
Peak brightness     409 nits
Edition             Windows 11 Home
Version             22H2
Installed on        2022-09-29
OS build            22621.1105
Experience          Windows Feature Experience Pack 1000.22638.1000.0
Processor           12th Gen Intel(R) Core(TM) i9-12900KF 3.19 GHz
Installed RAM       64.0 GB (63.9 GB usable)
System type         64-bit operating system, x64-based processor
Pen and touch       No pen or touch input is available for this display

@iamenews

Still happening on 0.69.1 May 12.
I have JA and zh-CN installed (as shown below), but OCR only works in English on the websites I tested, such as Yahoo Japan. I was following this help article, by the way.
image

@LuisLauM

Please devs, don't forget to solve this issue. 🙏

@TheJoeFin
Collaborator

Can you confirm this is still an issue with PowerToys v0.72?

/needinfo

@microsoft-github-policy-service microsoft-github-policy-service bot added Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something and removed Needs-Triage For issues raised to be triaged and prioritized by internal Microsoft teams labels Aug 10, 2023
@hockyy

hockyy commented Aug 14, 2023

Can you confirm this is still an issue with PowerToys v0.72?

/needinfo

@TheJoeFin yes

@hockyy

hockyy commented Aug 14, 2023

Update:

  • This only happens if both en-US and ja-JP are installed
PS C:\Users\hocky> $Capability = Get-WindowsCapability -Online | Where-Object { $_.Name -Like 'Language.OCR*en-US*' }
PS C:\Users\hocky> Get-WindowsCapability -Online | Where-Object { $_.Name -Like 'Language.OCR*' }
PS C:\Users\hocky> $Capability = Get-WindowsCapability -Online | Where-Object { $_.Name -Like 'Language.OCR*ja-JP*' }
PS C:\Users\hocky> $Capability | Add-WindowsCapability -Online

This runs fine if you remove the en-US OCR packages.

@hockyy

hockyy commented Aug 14, 2023

Example result:

よ ー し、さっそく町で調査を始めましよう/

image

@TheJoeFin
Collaborator

You have to select the language you want to OCR from the right click menu after you activate Text Extractor. If you don't select a different language, then the keyboard language will be used. By removing the other OCR languages, it seems like that could have the same effect.

Can you confirm some more details:

  • Windows language
  • Keyboard language
  • Have you tried changing language via context menu?

/needinfo
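
The selection rule described in this comment can be sketched roughly as follows. This is only an illustration of the behaviour as stated in the thread, not PowerToys' actual code: the function name `pick_ocr_language` and its parameters are hypothetical, and the last fallback (using the first installed pack when the keyboard language has no OCR pack) is an assumption made to match the report that removing the en-US pack makes Japanese work.

```python
from typing import Optional, Sequence


def pick_ocr_language(
    installed: Sequence[str],
    keyboard_language: str,
    selected: Optional[str] = None,
) -> str:
    """Return the language tag an OCR tool would use (hypothetical helper)."""
    if not installed:
        raise ValueError("no OCR language packs installed")
    # 1. A language chosen explicitly from the right-click menu wins.
    if selected is not None and selected in installed:
        return selected
    # 2. Otherwise the current keyboard language is used, if it has an OCR pack.
    if keyboard_language in installed:
        return keyboard_language
    # 3. ASSUMPTION (not confirmed in the thread): fall back to the first
    #    installed OCR pack when the keyboard language has none.
    return installed[0]


# Both packs installed, English keyboard, nothing selected in the menu.
print(pick_ocr_language(["en-US", "ja-JP"], "en-US"))
# Same setup, but Japanese picked from the right-click menu.
print(pick_ocr_language(["en-US", "ja-JP"], "en-US", selected="ja-JP"))
# en-US pack removed, English keyboard still active.
print(pick_ocr_language(["ja-JP"], "en-US"))
```

Under these assumptions, the first call returns en-US, which would explain Japanese text being read as English when both packs are installed and no language is picked from the menu.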

@hockyy

hockyy commented Aug 14, 2023

  • Windows language is English
  • Keyboard language is English (2 keyboards available, but English is chosen)
  • Haven't tried changing the language in the context menu
    @TheJoeFin

@TheJoeFin
Collaborator

I am going to update the UI to make changing the language more obvious. Please try changing the language with the right-click menu when all languages are installed. Comment here if that fixes the issue.

/needinfo

@TheJoeFin
Collaborator

/needinfo

@microsoft-github-policy-service
Contributor

This issue has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 5 days. It will be closed if no further activity occurs within 5 days of this comment.

@microsoft-github-policy-service microsoft-github-policy-service bot added the Status-No recent activity no activity in the past 5 days when follow up's are needed label Aug 20, 2023
7 participants