Normalization of unicode characters: allow excluding the symbols in the symbols.dic file from the normalization #16624

Adriani90 · 2024-05-28T18:59:45Z

Is your feature request related to a problem? Please describe.

The normalization feature for unicode characters takes priority over symbols in the symbols.dic file. This leads to, for example, the following problems:

When normalization is on, superscript and subscript characters are no longer reported as such, even though they are part of the symbols.dic file.
Characters like ′ (prime), ″ (double prime) or ‴ (triple prime) are all read as "prime" when normalization is on, although they are properly defined in the symbols.dic file.
Other characters in the symbols.dic file are possibly impacted as well.
As a result, many characters in math equations are now reported, but not with their symbols.dic pronunciation.
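
For illustration, the effect on the characters listed above can be reproduced outside NVDA with Python's standard `unicodedata` module. This is only a sketch and assumes the normalization in question is NFKC; the `samples` dictionary and the `nfkc` helper are hypothetical names, not NVDA code.

```python
import unicodedata

# Characters mentioned in the report, plus one superscript and one subscript.
samples = {
    "prime": "\u2032",
    "double prime": "\u2033",
    "triple prime": "\u2034",
    "superscript two": "\u00b2",
    "subscript two": "\u2082",
}


def nfkc(text: str) -> str:
    """Return the NFKC (compatibility composition) form of text."""
    return unicodedata.normalize("NFKC", text)


for name, char in samples.items():
    normalized = nfkc(char)
    # Show the code points before and after normalization.
    before = f"U+{ord(char):04X}"
    after = " ".join(f"U+{ord(c):04X}" for c in normalized)
    print(f"{name}: {before} -> {after}")
```

Under NFKC, double and triple prime decompose into repeated U+2032 PRIME characters, and the superscript and subscript digits become plain digits, so a later symbol lookup never sees the original character.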

Describe the solution you'd like

Always exclude symbols defined in the symbols.dic file from normalization.

Describe alternatives you've considered

None

Additional context

None

Adriani90 · 2024-05-28T19:01:07Z

cc: @LeonarddeR I hope this will not be a show stopper for this feature. It is really crucial that the symbols defined in the symbols file are pronounced as defined there, and not as prescribed by the normalization. This could be tricky to fix.
Sorry, I only discovered this problem just now.

LeonarddeR · 2024-05-28T19:22:54Z

Please provide exact steps to reproduce instead of just summing up what's wrong. IMO the bug report template would be more suitable here, since this is definitely not intentional.

LeonarddeR · 2024-05-28T19:24:11Z

Also, please consider testing with #16622 since I'm pretty sure it is already fixed there.

Adriani90 · 2024-05-28T19:49:36Z

Ah thanks, it seems with #16622 it works properly.

LeonarddeR · 2024-05-29T05:59:59Z

@seanbudd While I understand your reason for closing this, I'm inclined to leave this open and mark it as fixed as soon as #16622 is closed. I think the point raised by @Adriani90 is perfectly valid. It is an expected side effect of the current approach, where normalization is only applied to text info and object speech; character processing and symbol pronunciation are applied thereafter.

seanbudd · 2024-05-29T06:19:32Z

@LeonarddeR - is it possible for the fix for this to be independent of #16622 / #16616 and make it into 2024.3?

LeonarddeR · 2024-05-29T06:25:37Z

Theoretically yes: if we normalize, we need to do symbol processing first, i.e. run the text through processText. However, I think that will also create unexpected side effects I can't foresee yet.
In that case, I'd rather change #16622 to stick with default disabled and have that in 2024.3.
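
To make the ordering concern concrete, here is a minimal sketch of normalization that leaves characters with a symbol entry untouched, so that a later symbol lookup can still find them. It is not how #16622 addresses the problem and it does not call NVDA's processText; `SYMBOLS` and `normalize_keeping_symbols` are hypothetical names, and the dictionary is a stand-in for entries loaded from symbols.dic.

```python
import unicodedata

# Hypothetical stand-in for a few entries from symbols.dic.
SYMBOLS = {
    "\u2032": "prime",
    "\u2033": "double prime",
    "\u2034": "triple prime",
    "\u00b2": "superscript 2",
}


def normalize_keeping_symbols(text: str, symbols=SYMBOLS) -> str:
    """NFKC-normalize text, but pass characters with a symbol entry through as-is."""
    out = []
    pending = []  # run of characters that have no symbol entry

    def flush() -> None:
        # Normalize the accumulated run as a unit, then reset it.
        if pending:
            out.append(unicodedata.normalize("NFKC", "".join(pending)))
            pending.clear()

    for ch in text:
        if ch in symbols:
            flush()
            out.append(ch)  # keep the symbol so its own pronunciation applies
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

One caveat of this approach: runs are normalized independently, so a combining mark that directly follows a protected symbol is not composed with it. That is one example of the kind of side effect that makes a symbol-first ordering hard to reason about.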

Adriani90 · 2024-05-29T06:57:12Z

2024.3 is still far away, and #16622 seems to fix this. I still think it makes sense to have it enabled by default. This will result in broader community awareness. If side effects appear, the default could be switched back to disabled later on. Having this enabled will definitely not introduce any severe bug, freeze or crash.

seanbudd · 2024-06-13T06:30:21Z

Closing as fixed by #16622

seanbudd closed this as not planned May 28, 2024

LeonarddeR reopened this May 29, 2024

This comment was marked as resolved.

seanbudd added this to the 2024.4 milestone May 29, 2024

CyrilleB79 mentioned this issue May 30, 2024

Unicode normalization follow up, adding character navigation and several fixes #16622 (merged)

seanbudd modified the milestones: 2024.4, 2024.3 Jun 3, 2024

seanbudd added the p3 and triaged labels Jun 4, 2024

seanbudd closed this as completed Jun 13, 2024
