Loading Windows OCR settings panel results in cluttered UI #15017

Closed
cary-rowen opened this issue Jun 16, 2023 · 30 comments · Fixed by #15031
Labels
p3 https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority sightedDevIdeal triaged Has been triaged, issue is waiting for implementation. ux
Milestone

Comments

@cary-rowen
Contributor

Steps to reproduce:

  1. Press NVDA + N, P, S in turn
  2. Navigate in the Settings category.
  3. Navigate to the WindowsOCR category, press tab to observe the UI

Actual behavior:

Shows some controls for document navigation
log snippet

ERROR - unhandled exception (11:01:35.008) - MainThread (5244):
Traceback (most recent call last):
  File "gui\settingsDialogs.pyc", line 4149, in onCategoryChange
  File "gui\settingsDialogs.pyc", line 667, in onCategoryChange
  File "gui\settingsDialogs.pyc", line 649, in _doCategoryChange
  File "gui\settingsDialogs.pyc", line 577, in _getCategoryPanel
  File "gui\settingsDialogs.pyc", line 335, in __init__
  File "gui\settingsDialogs.pyc", line 345, in _buildGui
  File "gui\settingsDialogs.pyc", line 2598, in makeSettings
  File "gui\guiHelper.pyc", line 332, in addLabeledControl
  File "gui\guiHelper.pyc", line 205, in __init__
TypeError: Item at index 0 has type 'NoneType' but a sequence of bytes or strings is expected

Expected behavior:

Show the settings ui for Windows OCR

NVDA logs, crash dumps and other attachments:

log.txt

System configuration

NVDA installed/portable/running from source:

Installed

NVDA version:

2023.1

Windows version:

Windows 10 1809 (10.0.17763)

Name and version of other software in use when reproducing the issue:

None

Other information about your system:

None

Other questions

Does the issue still occur after restarting your computer?

Yes

Have you tried any other versions of NVDA? If so, please report their behaviors.

No

If NVDA add-ons are disabled, is your problem still occurring?

Yes

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

Yes

@CyrilleB79
Collaborator

@cary-rowen do you confirm that you have tested with add-ons disabled? Just because your log shows that NVDA is running with add-ons.

Also is there any reason why you are running Windows 10 1809, which is an old release now unsupported by Microsoft?

I suspect an unknown language to be installed in Windows OCR. Could you run the following command in the console and report here the result?
from contentRecog import uwpOcr;uwpOcr.getLanguages()

@cary-rowen
Contributor Author

Hi @CyrilleB79

Yes, I tested with add-ons disabled, but since I can only debug this machine remotely, I just saved this log.
In addition, I had already executed the code you provided above when I first looked into the problem:
the returned result is "zh-Hans-CN". This seems to be correct, and NVDA+R works fine too.

I also found that the implementation of winVersion.isUwpOcrAvailable seems not robust enough.

def isUwpOcrAvailable():
	return os.path.isdir(UWP_OCR_DATA_PATH)

This will cause problems if the OCR component is corrupted, although that might not be directly relevant to this issue.
Thanks
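
To illustrate the robustness concern above, here is a minimal sketch of a stricter check, assuming that an empty folder should not count as an installed OCR component. The helper name `is_ocr_data_present` and the "at least one entry" rule are hypothetical and are not part of NVDA; the real function is the `os.path.isdir` call quoted above.

```python
import os


def is_ocr_data_present(data_path: str) -> bool:
    """Return True only if data_path is a directory with at least one entry.

    Hypothetical helper for illustration; NVDA's own check is a plain isdir().
    """
    if not os.path.isdir(data_path):
        return False
    try:
        # An empty folder (e.g. one created by hand) is treated as "no OCR data".
        with os.scandir(data_path) as entries:
            return any(True for _ in entries)
    except OSError:
        # Unreadable directory: report OCR as unavailable rather than raising.
        return False
```

A non-empty folder still does not prove that the OCR component works, so a check like this would only narrow the problem, not eliminate it.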

@hwf1324
Contributor

hwf1324 commented Jun 16, 2023

Hi!

Just because your log shows that NVDA is running with add-ons.

This is because @cary-rowen is currently utilizing the NVDA remote add-on to inspect the issue that I have encountered.

Also is there any reason why you are running Windows 10 1809, which is an old release now unsupported by Microsoft?

This is the computer provided by my educational institution for me to use during class.

A convenient approach to reproduce this scenario is to create a folder named OCR in the %windir% directory and then examine the settings dialog, provided that the system is either Windows 7 or a system without OCR capability.

The primary reason I am noticing this issue is because it is resulting in damage to the settings dialog.

Thanks!

@CyrilleB79
Collaborator

@cary-rowen, when I run the following command
from contentRecog import uwpOcr;uwpOcr.getLanguages()

I get a list (in my case ['en-US', 'fr-FR']).

On your side, you get "zh-Hans-CN", which is only a string value. Could you double check that you have fully copy/pasted the result that you get in the console? Thanks.

@cary-rowen
Contributor Author

@CyrilleB79 wrote:

On your side, you get "zh-Hans-CN", which is only a string value. Could you double check that you have fully copy/pasted the result that you get in the console? Thanks.

Yes, same as you, executing the above code I also get a list, I also suspected at first that the problem was here, but it seems to be correct. I believe @hwf1324 can provide more information on the error

Thank you for your attention

@CyrilleB79
Collaborator

@CyrilleB79 wrote:

On your side, you get "zh-Hans-CN", which is only a string value. Could you double check that you have fully copy/pasted the result that you get in the console? Thanks.

Yes, same as you, executing the above code I also get a list, I also suspected at first that the problem was here, but it seems to be correct. I believe @hwf1324 can provide more information on the error

Could you provide the full copy of this list please?

@cary-rowen
Contributor Author

Could you provide the full copy of this list please?

If you're referring to the NVDA portable version that reproduces the issue, then no problem, it will be available within 24 hours at the latest.

@CyrilleB79
Collaborator

Could you provide the full copy of this list please?

If you're referring to the NVDA portable version that reproduces the issue, then no problem, it will be available within 24 hours at the latest.

Not at all! Sorry, we do not seem to understand each other.

I would like you to execute the following command in the Python console of NVDA:
from contentRecog import uwpOcr;uwpOcr.getLanguages()

And then copy the full result that you get, i.e. the square brackets and all the text between them.

For information, this command returns the language codes supported by your system. And I would like to have a look at this exact list of codes.

Thanks.

@hwf1324
Contributor

hwf1324 commented Jun 17, 2023

Hi! @CyrilleB79

from contentRecog import uwpOcr;uwpOcr.getLanguages()

>>> from contentRecog import uwpOcr; uwpOcr.getLanguages()
['zh-Hans-CN']

@CyrilleB79
Collaborator

Hi @hwf1324 or @cary-rowen

Could you please test this build and send me a log?
This should not fix the issue but I hope to gather more information on it.
Thanks.

@cary-rowen
Contributor Author

Hi @CyrilleB79
We investigated this issue in more detail, got some useful information, and will provide a detailed issue report shortly.
cc @hwf1324

Many thanks to @CyrilleB79 for his attention to this issue.

@hwf1324
Contributor

hwf1324 commented Jun 19, 2023

After thorough research, it seems that the error might be stemming from the 'LocaleNameToLCID' function in the Win32 API, which is not executing as expected.

Here is an outline of my investigative process:

  1. Taking advantage of how the 'isUwpOcrAvailable' function works, I created a new folder titled 'OCR' in the %windir% directory on Windows 7, aiming to reproduce the error. While this did trigger an error, it was not an exact replication. A different unhandled exception was generated:
ERROR - unhandled exception (01:46:18.170) - MainThread (2612):
Traceback (most recent call last):
  File "gui\settingsDialogs.pyc", line 4242, in onCategoryChange
  File "gui\settingsDialogs.pyc", line 689, in onCategoryChange
  File "gui\settingsDialogs.pyc", line 671, in _doCategoryChange
  File "gui\settingsDialogs.pyc", line 599, in _getCategoryPanel
  File "gui\settingsDialogs.pyc", line 357, in __init__
  File "gui\settingsDialogs.pyc", line 367, in _buildGui
  File "gui\settingsDialogs.pyc", line 2627, in makeSettings
  File "contentRecog\uwpOcr.pyc", line 25, in getLanguages
  File "NVDAHelper.pyc", line 634, in getHelperLocalWin10Dll
  File "ctypes\__init__.pyc", line 439, in __getitem__
  File "ctypes\__init__.pyc", line 434, in __getattr__
  File "ctypes\__init__.pyc", line 364, in __init__
OSError: [WinError 126] The specified module could not be found
  2. Upon comparison of the two logs, the issue appears to lie in the 'makeSettings' section. A detailed comparison can be found in the following link:
    https://github.com/nvaccess/nvda/blob/3d78d701c4779d25ff7fe0dff5644409d171cc9d/source/gui/settingsDialogs.py#LL2628C3-L2633C101

  3. During testing, the list comprehension's result turned out to be 'None'. Therefore, it is plausible that the error arises from the inclusion of a 'None' parameter when adding language descriptions.

  4. The investigation led me to the 'LocaleNameToLCID' function call in 'ctypes' which, surprisingly, returned 0. I couldn't identify why this call failed. Microsoft's documentation on this can be found here: https://learn.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-localenametolcid. Notably, the 'Remarks' section suggests a possible deprecation of 'LCID'?

  5. I made an attempt to capture the error, but my proficiency in C++ is limited, and my C++ environment is MinGW64. The error code was 0x57, and the output using LOCALE_NAME_* was also odd.

Regrettably, my investigation could only go up to this point. As I am about to leave school, replicating the system environment that caused this error may prove challenging.

That being said, my primary concern lies with the damage to the 'settingsDialogs' GUI interface. It is possible that any unhandled exceptions in 'makeSettings' could be causing this. It might be necessary to conduct some tests to confirm this.

@CyrilleB79
Collaborator

Thanks for the investigations and the findings that you have reported. The build that I have provided (coming from PR #15031) will not fix the root cause but should fix the GUI issue by handling more robustly the result of the conversion from language code to language name. As a consequence, since your system is not able to convert the language code to a language name, the language code will be listed instead of the language name.

Is this an acceptable workaround for you? I think it's not worth working more on an issue that happens only on a specific system on which the Windows version is not supported anymore.

Regarding LCID deprecation, I do not think that it is the cause given the Windows version is old and given other newer systems have no such issue.

@hwf1324
Contributor

hwf1324 commented Jun 19, 2023

@CyrilleB79 Thanks!

I'll test it tomorrow. If the GUI issue can be fixed, then this issue can certainly end here. I'm actually not too concerned about this specific issue, I'm more concerned about more possible GUI corruptions, but until this issue is fixed, everything works fine. And more exception handling would add to the burden.

@hwf1324
Contributor

hwf1324 commented Jun 20, 2023

Hi @CyrilleB79

After testing, this particular issue has indeed been resolved.

However an error was still caught with the following message:

Stack trace:
  File "nvda.pyw", line 406, in <module>
  File "core.pyc", line 811, in main
  File "wx\core.pyc", line 2237, in MainLoop
  File "gui\settingsDialogs.pyc", line 4248, in onCategoryChange
  File "gui\settingsDialogs.pyc", line 689, in onCategoryChange
  File "gui\settingsDialogs.pyc", line 671, in _doCategoryChange
  File "gui\settingsDialogs.pyc", line 599, in _getCategoryPanel
  File "gui\settingsDialogs.pyc", line 357, in __init__
  File "gui\settingsDialogs.pyc", line 367, in _buildGui
  File "gui\settingsDialogs.pyc", line 2634, in makeSettings

The detailed log is here.

nvda.log

@CyrilleB79
Collaborator

The error message was intended to say: "Something went wrong on that system". The goal is to increase the probability that the user reports the issue when the system does not know a specific language, as has been the case in the past with Aragonese.
Let's see what NVAccess will think when they review the PR.

There are other places where language codes are converted to language names:

  • General settings: in the first combo-box
  • In the title of the "Symbol Pronunciation" dialog

What happens on your system in these two cases?

Thanks

@hwf1324
Contributor

hwf1324 commented Jun 20, 2023

What happens on your system in these two cases?

I used the previous version of NVDA for testing and everything seemed to be working fine and I doubted my previous findings. But the OCR panel still causes damage.

I'm sorry to say that I've hardly had a chance to do any testing on that system since then.

@CyrilleB79
Collaborator

Sorry @hwf1324, your last report is not clear to me.

Did you test with nvda_snapshot_pr15031-28463,68da0242.exe?

  • Are the layout and the content of the combobox OK in the OCR settings panel?
  • What about the languages combobox's content and the title of the symbol Window? Are the names of languages displayed or are there cases where only the code or "None" is displayed?

Note that with this build, even if the panel layout is OK, the following error will be logged:
No description for language: zh_HANS_CN. Using language code instead.
It's not an uncaught exception and, after thinking about it again, I should not log the traceback, only the error message; I will fix this.

Let me know if you have few or no possibilities to test in the future; in this case, I may simulate LCID returning no result myself but it's always better to test on the system where the issue really occurs. Moreover, I am wondering if the LCID to language name matching only fails with "zh-Hans-CN" or also for other or all language codes.

@hwf1324
Contributor

hwf1324 commented Jun 20, 2023

Hi @CyrilleB79

Did you test with nvda_snapshot_pr15031-28463,68da0242.exe?

Yes.

  • Are the layout and the content of the combobox OK in the OCR settings panel?

Content display language code

  • What about the languages combobox's content and the title of the symbol Window? Are the names of languages displayed or are there cases where only the code or "None" is displayed?

Display language name

Let me know if you have few or no possibilities to test in the future; in this case, I may simulate LCID returning no result myself but it's always better to test on the system where the issue really occurs. Moreover, I am wondering if the LCID to language name matching only fails with "zh-Hans-CN" or also for other or all language codes.

Yes, I still have a day or two to go. Can you give me a test code so I can execute the test?

@CyrilleB79
Collaborator

Thanks. It should be OK.

For curiosity, could you give the result of the following commands in the console:

import ctypes
ctypes.windll.kernel32.LocaleNameToLCID('zh-CN',0)
ctypes.windll.kernel32.LocaleNameToLCID('zh-Hans-CN',0)
ctypes.windll.kernel32.LocaleNameToLCID('en',0)

It would be interesting to confirm whether only zh-Hans-CN has the issue, so that it can be documented. Thanks.

@seanbudd seanbudd added triaged Has been triaged, issue is waiting for implementation. sightedDevIdeal p3 https://github.com/nvaccess/nvda/blob/master/projectDocs/issues/triage.md#priority ux labels Jun 20, 2023
@hwf1324
Contributor

hwf1324 commented Jun 21, 2023

Hi @CyrilleB79

It seems that the previous investigation was not accurate enough.

>>> import ctypes
>>> ctypes.windll.kernel32.LocaleNameToLCID('zh-CN',0)
2052
>>> ctypes.windll.kernel32.LocaleNameToLCID('zh-Hans-CN',0)
2052
>>> ctypes.windll.kernel32.LocaleNameToLCID('en',0)
1033
>>> import languageHandler
>>> from contentRecog import uwpOcr
>>> languageCodes = uwpOcr.getLanguages()
>>> languageCodes
['zh-Hans-CN']
>>> lang = languageCodes[0]
>>> lang
'zh-Hans-CN'
>>> languageHandler.normalizeLanguage(lang)
'zh_HANS_CN'
>>> languageHandler.getLanguageDescription(languageHandler.normalizeLanguage(lang))
>>> 
>>> languageHandler.getLanguageDescription(languageHandler.normalizeLanguage(lang))
>>> 
>>> languageHandler.getLanguageDescription('zh_HANS_CN')
>>> languageHandler.localeNameToWindowsLCID(lang)
2052
>>> languageHandler.localeNameToWindowsLCID(languageHandler.normalizeLanguage(lang))
0
>>> ctypes.windll.kernel32.LocaleNameToLCID(languageHandler.normalizeLanguage(lang),0)
0
>>> ctypes.windll.kernel32.LocaleNameToLCID(languageHandler.normalizeLanguage("zh-CN"),0)
0
>>> ctypes.windll.kernel32.LocaleNameToLCID(languageHandler.normalizeLanguage("en"),0)
1033
>>> languageHandler.normalizeLanguage("zh-CN")
'zh_CN'
>>> languageHandler.normalizeLanguage("en")
'en'
>>> languageHandler.normalizeLanguage("en-US")
'en_US'
>>> ctypes.windll.kernel32.LocaleNameToLCID("en-US",0)
1033
>>> ctypes.windll.kernel32.LocaleNameToLCID("en_US",0)
0

Based on the results above, the reason seems to be that "LocaleNameToLCID" does not accept parameters with "_". I am not sure about the reason for using the "normalizeLanguage" function to convert the language code when getting the language description.
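
The finding above can be sketched as follows: convert underscores to hyphens before handing a code to `LocaleNameToLCID`. This is only an illustration of the observation in this comment. The helper names `to_windows_locale_name` and `locale_name_to_lcid` are hypothetical, and this is not how NVDA's `languageHandler` is written.

```python
import ctypes
import sys


def to_windows_locale_name(code: str) -> str:
    """Turn an underscore-separated code such as 'en_US' into 'en-US'."""
    return code.replace("_", "-")


def locale_name_to_lcid(code: str) -> int:
    """Call kernel32.LocaleNameToLCID with a hyphen-separated locale name.

    Returns 0 when Windows does not recognise the name, as seen in the
    console session above. Only usable on Windows.
    """
    if sys.platform != "win32":
        raise OSError("LocaleNameToLCID is only available on Windows")
    # Second argument is dwFlags; 0 matches the calls made in this thread.
    return ctypes.windll.kernel32.LocaleNameToLCID(to_windows_locale_name(code), 0)
```

Whether every affected system accepts the hyphenated form for all codes has not been verified here; only the codes shown in the session above were tested.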

@CyrilleB79
Collaborator

@hwf1324 thanks for the thorough investigation.

However, your system is running an old unsupported version of Windows, you may not have access to it in the future and you seem to be the only one to have reported this issue. Moreover, I am not very confident modifying all NVDA functions dealing with language conversion or normalization given the variability of the answers gotten when calling system functions on different systems. For all these reasons, as explained in #15031, I have decided to only make NVDA more robust so that the OCR panel is displayed correctly; but I prefer not to try to implement a workaround to get the language name from the language code in your situation.

In the future, if you or anyone else gets the same issue on another more up-to-date system, let us know so that we can deal with it.

@cary-rowen
Contributor Author

Hi @CyrilleB79

Thank you for your work on this.
I found another machine running Windows 10 1809 that can reproduce the issue.
But the latest commit #15031 fixes that, showing the language code in the combobox as expected.
@hwf1324 may have new discoveries about this.

Thanks

@hwf1324
Contributor

hwf1324 commented Jun 29, 2023

Hi @CyrilleB79

I tested "LocaleNameToLCID("en_US",0)" on Windows 7 and it also returned 0. This proves that the LocaleNameToLCID function in the old Kernel32.dll does not support the use of language codes with "_".

To ensure compatibility, we should not pass parameters like "en_US" with a "_" to the LocaleNameToLCID function.

		languageChoices = [
			languageHandler.getLanguageDescription(lang)
			for lang in self.languageCodes]

This is why I propose to remove the "normalizeLanguage" function directly here. It doesn't make any difference here if we have this function or not.

I have built a version without the "normalizeLanguage" function. @cary-rowen will help test it later.

@nvaccessAuto nvaccessAuto added this to the 2023.2 milestone Jun 30, 2023
seanbudd pushed a commit that referenced this issue Jun 30, 2023
Closes #15017 (work-around)

Summary of the issue:
On a specific system (see #15017), Windows cannot get the language name from LCID. This causes an uncaught error that prevents the OCR settings panel from being fully created (and the previous panel from being cleared).

Description of user facing changes
Displaying the OCR settings will not fail anymore on some systems where it is not possible to derive the language name from LCID.

Description of development approach
When listing OCR languages, we use languageHandler.getLanguageDescription to determine the name given the language code. However this function may return None in case it cannot find a language name.
This return value had not been taken into account when listing the OCR languages.

Thus this PR adds the language code instead of its name to the OCR list as a fallback if the name cannot be found.

A debugWarning is logged in the function languageHandler.getLanguageDescription if the language cannot be found; note that this function is used in some add-ons.

In the settings panel instead, I log an error so that the situations with unknown languages can be captured more easily.
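
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code merged in #15031: `build_language_choices` and its `describe` parameter are hypothetical stand-ins for the panel code and for `languageHandler.getLanguageDescription`, and the standard `logging` module stands in for NVDA's own logger.

```python
import logging
from typing import Callable, List, Optional

log = logging.getLogger(__name__)


def build_language_choices(
    codes: List[str],
    describe: Callable[[str], Optional[str]],
) -> List[str]:
    """Return one display string per language code.

    If describe() returns None for a code, the code itself is used so that
    the list never contains None (which the choice control rejects).
    """
    choices = []
    for code in codes:
        description = describe(code)
        if description is None:
            log.error(f"No description for language: {code}. Using language code instead.")
            description = code
        choices.append(description)
    return choices
```

For example, with a lookup that only knows `en_US`, the call `build_language_choices(["en_US", "zh_HANS_CN"], lookup.get)` keeps the name for the first entry and shows the raw code for the second.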
@cary-rowen
Contributor Author

@hwf1324 wrote:

I have built a version without the "normalizeLanguage" function. @cary-rowen will help test it later.

Yes, @hwf1324 built a version that displays user-friendly language descriptions.
Here is the log.txt.

But I see that @seanbudd has already merged #15031, @hwf1324 may open a separate PR if this deserves further improvement
@CyrilleB79 Can you offer some advice?

Thanks

@CyrilleB79
Collaborator

#15031 has been merged and it is already an improvement since the OCR panel does not fail anymore to be displayed and is not cluttered anymore with unrelated controls. However, in some cases, the language is not correctly detected and the language code is used in the list instead.

Since comments in closed issues are not easily tracked and may be lost, you should open a new issue describing the remaining issue with last alpha, i.e. language code instead of language name.

And of course, feel free to submit an associated PR to fix it!

Some notes regarding your last comments (you can summarize in your new issue).

I tested "LocaleNameToLCID("en_US",0)" on Windows 7 and it also returned 0. This proves that the LocaleNameToLCID function in the old Kernel32.dll does not support the use of language codes with "_".

To ensure compatibility, we should not pass parameters like "en_US" with a "_" to the LocaleNameToLCID function.

This piece of code should probably be rewritten better. However there is no issue with the OCR panel in Windows 7 since this panel is not present on Windows 7.

If you plan to modify or rewrite these normalization / conversion functions, keep in mind that they are used by OCR panel but also in other places such as NVDA language, symbol window title, etc. The problem is that the language code may come from various sources: NVDA language folders, Windows OCR language folders, language code from TTS (and there are various TTSs)...

At least, keep in mind that NVDA 2024.1 will likely remove Windows 7 support. So I do not think that it's worth taking it into account for your work, provided that you do not degrade the existing experience.

This is why I propose to remove the "normalizeLanguage" function directly here. It doesn't make any difference here if we have this function or not.

I had tested removing the normalizeLanguage calls following your previous advice and investigations. However, the language name was not the same, as shown in the following console commands:

>>> languageHandler.getLanguageDescription(languageHandler.normalizeLanguage('en-us'))
'Anglais (États-Unis)'
>>> languageHandler.getLanguageDescription('en-us')
'Anglais'

Dropping normalizeLanguage also removes the country name; this would probably be an issue in case you have for example UK English and US English installed, since "English" would appear twice in the list without any distinction.
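
One possible way to avoid indistinguishable entries, should descriptions ever collide, is to append the language code only to the duplicated names. This is a hypothetical sketch for discussion; the function `disambiguate` is not part of NVDA and nobody in this thread has proposed it as the fix.

```python
from collections import Counter
from typing import List


def disambiguate(codes: List[str], descriptions: List[str]) -> List[str]:
    """Append the code to any description that occurs more than once."""
    counts = Counter(descriptions)
    return [
        f"{description} ({code})" if counts[description] > 1 else description
        for code, description in zip(codes, descriptions)
    ]
```

With UK and US English both described as "English", the two entries would become "English (en-GB)" and "English (en-US)", while unique names are left untouched.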

@hwf1324
Contributor

hwf1324 commented Jun 30, 2023

Hi @CyrilleB79

Thank you very much for your reply.

It seems that there are still a lot of unclear details about the handling of language codes, which is a very complex issue.

That seems to be all that can be done at the moment, as my knowledge of it is limited. One thing I'm curious about is whether in the future NVDA will standardise the behaviour of all the language descriptions obtained?

@CyrilleB79
Collaborator

One thing I'm curious about is whether in the future NVDA will standardise the behaviour of all the language descriptions obtained?

What do you mean? Again, if there is an issue and you wish it to be solved (or if you wonder if it can be solved), please open an issue.

@hwf1324
Contributor

hwf1324 commented Jun 30, 2023

What do you mean? Again, if there is an issue and you wish it to be solved (or if you wonder if it can be solved), please open an issue.

For example, what you mentioned before:

There are other places where language codes are converted to language names:

  • General settings: in the first combo-box
  • In the title of the "Symbol Pronunciation" dialog

What happens on your system in these two cases?

Yes, if there is a problem in the future, an issue will be reopened.

@CyrilleB79
Collaborator

There are other places where language codes are converted to language names:

  • General settings: in the first combo-box
  • In the title of the "Symbol Pronunciation" dialog

What happens on your system in these two cases?

The language list items in General settings are as follows:

  • "Anglais, en", i.e. "English, en"
  • "Allemand, de", i.e. "German, de"
  • "Allemand (Suisse), de_CH", i.e. "German (Switzerland), de_CH"
    That is: language name + country name if NVDA supports this specific dialect, else language name only (without country name). In the list item, the name comes first, then the language code.

The title of the symbol dialog is:

  • language name + country name if the TTS uses it and if this dialect belongs to NVDA's languages
  • language name only if it belongs to NVDA's languages
  • English if the TTS language is not one of NVDA's languages; this last point is actually a problem IMO
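
The three rules for the symbol dialog title can be read as a lookup with two fallbacks. The sketch below is one interpretation of the behaviour described above, not NVDA's actual code; `resolve_symbol_dialog_language` is a hypothetical name, and underscore-separated codes such as `de_CH` are assumed.

```python
from typing import Iterable


def resolve_symbol_dialog_language(tts_language: str, nvda_languages: Iterable[str]) -> str:
    """Pick the language code whose name would appear in the dialog title.

    1. The exact dialect, if NVDA has it.
    2. Otherwise the base language, if NVDA has it.
    3. Otherwise English.
    """
    supported = set(nvda_languages)
    if tts_language in supported:
        return tts_language
    base_language = tts_language.split("_")[0]
    if base_language in supported:
        return base_language
    return "en"
```

The last branch is the case flagged above as a problem: a TTS language that NVDA does not know is silently presented as English.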


5 participants