Sentence navigation #16384

mltony · 2024-04-11T21:53:09Z

Link to issue number:

Closes #8518.

Summary of the issue:

Sentence navigation.

Description of user facing changes

Alt+Down/UpArrow now navigates by sentence in editables and in browse mode.
Sentence reconstruction option added to document navigation panel in settings. This defines to what extent sentence navigation can look into adjacent paragraphs when trying to reconstruct a sentence.
Alsoadded an option to specify non-breaking prefixes for current language.

Description of development approach

Sentence splitting algorithm is implemented in documentNavigation\sentenceNavigation.py. It is generic enough and works with different types of textInfos. I left some comments throughout to explain inner workings, but LMK if more clarification/ explanation is needed. I ended up rewriting the algorithm (compared to SentenceNav add-on) from scratch - now the code should be cleaner and more elegant.
In editableText.py I added speakCurrentSentence script - not assigned any default keystroke. Hope that's ok - since SentenceNav users requested that in the past.
In OffsetsTextInfo updated _getUnitOffsets() function to deal with UNIT_SENTENCE.
In UIATextInfo updated expand() and move() methods to work with UNIT_SENTENCE.
In browseMode.py updated script_collapseOrExpandControl: if current item is not collapsable, then treat alt+Up/Down as sentence navigation commands.
Existing sentence navigation logic for MSWord is not affected.

Testing strategy:

Manual testing in:
- Notepad
- Notepad++
- Visual Studio
- VSCode
- Chrome - browse mode and textArea
- Firefox browse mode and TextArea
Unit tests

Known issues with pull request:

Ready for review.
~~This is a draft, but fully working. Feel free to test or even start reviewing. Things I plan to address in the coming few days:~~

~~Figure out how to deal with language-specific non-breaking prefixes.~~
~~Add docuemntation~~
~~Add unit tests~~

Code Review Checklist:

Documentation:
- Change log entry
- User Documentation
- Developer / Technical Documentation
- Context sensitive help for GUI changes
Testing:
- Unit tests
- System (end to end) tests
- Manual testing
UX of all users considered:
- Speech
- Braille
- Low Vision
- Different web browsers
- Localization in other languages / culture than English
API is compatible with existing add-ons.
Security precautions taken.

AppVeyorBot · 2024-04-11T22:29:29Z

PASS: Unit tests.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/n0j629p6l7j97var/artifacts/output/nvda_snapshot_pr16384-31584,a1f18e45.exe
PASS: Translation comments check.
FAIL: Lint check. See test results for more information.
PASS: System tests (tags: installer NVDA).
CI timing (mins):
INIT 0.0,
INSTALL_START 1.0,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 30.0,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 2.4,
FINISH_END 0.2

See test results for failed build of commit a1f18e45a7

mltony · 2024-04-11T23:19:12Z

I have a technical question regarding how to deal with non-breaking prefixes.
Define non-breaking prefix is a word that goes right before period, but that suggests that the period doesn't serve as a sentence breaker. Here are some examples in English:

Titles like Dr., Mr, Ms, Prof.
Some abbreviations like i.e., s.t.
Single capital letters. E.g. "George H. W. Bush"
As a counterexample, not all abbreviations should be included. For example, "etc." should not be included, since a sentence can end with "etc.". In this case we should prefer to have a false positive rather than a false negative.

Non-breaking prefixes are obviously language-dependent. So far I just hardcoded English non-breaking prefixes, but that obviously needs to be changed before merging. I think we should make non-breaking prefixes to go through translation system to have it translated to all NVDA languages. But I see a few issues here to call out:

This is not a typical string to translate and it would require some research of the language. I will write a long #translators message explaining what is non-breaking prefixes and what's the expected format.
That string must be |-separated list of valid fixed-length regular expressions. Fixed length requirement is coming from the fact that each regex in the list is packed into a negative look-behind clause, which can only accept fixed length regex. So for example S* would be invalid for NLB. I expect some translators will not be familiar either with regexes in general or this specific requirement, so mistakes will be made. I haven't dealt with NVDA translation system before, so wondering, is there a way to set up a custom validator for a translateable string? If not, then can it be arranged to manually check all incoming translations of the string and invalidate those that will turn out to be invalid?
Another issue is identifying current language. Often times people use NVDA to read context in multiple languages, so perhaps a good idea would be to take the current language of synthesizer and retrieve non-breaking prefixes in that language, instead of default NVDA language. There are two problems here:
- How to map current synth language onto NVDA languages? In the past I remember for Russian some synthesizers return "ru_ru", whereas in NVDA I only see "ru". An obvious solution would be to try to identify language by the first two letters, but then I see some country-specific languages in user_docs directory, such as de_CH. How big deal would it be to ignore such country-specific languages?
- What is API call to retrieve a string in another language? I bet it must be possible, but I looked through NVDA source code and didn't find any examples - would be great if someone knows.
Do we need to create a setting to specify non-breaking prefixes? In the past I've been given feedback that I tend to expose too many settings and it's confusing to the users - so asking here to get community's opinion.

cary-rowen · 2024-04-12T00:29:17Z

Thanks @mltony,

Based on Sentence-Nav I want to know:

Will phrase navigation be included here?
Will there also be a user-defined regular expression for sentence splitting?
How do we deal with MSWord? There is an option in the Sentence-Nav add-on to override the default sentence navigation of MSWord. It is very useful not only for MSWord but also for Bookworm.

mltony · 2024-04-12T00:59:58Z

Will phrase navigation be included here?

No. But it should be very straightforward to add it later provided NVAccess is aligned.

Will there also be a user-defined regular expression for sentence splitting?

It currently uses the same regex as for text paragraph navigation, which is configurable. But its only a single regex.

How do we deal with MSWord? There is an option in the Sentence-Nav add-on to override the default sentence navigation of MSWord. It is very useful not only for MSWord but also for Bookworm.

I don't override MSWord sentence navigation here. It should be easy to override later - again as long as NVAccess is aligned.

In general I think getting this PR approved and merged is going to be somewhat challenging because there are many unknowns regarding non-breaking prefixes. So at this point I would like to focus on core sentence navigation, and postpone bells and whistels to later PRs.

XLTechie · 2024-04-12T02:00:31Z

@mltony wrote:

Also, not sure how to fix lint error in configSpec.py - line is indeed to long, but trying to break it up leads to invalid config.

At end of line, after two spaces:

# noqa: E501 Breaking this line makes invalid config

mltony · 2024-04-12T02:45:26Z

Can you use a tuple of strings

How can we send that tuple to translators? If translation system supports tuples of variable length then should be doable, but please show me an example.

XLTechie · 2024-04-14T02:43:19Z

@mltony I have no idea. Nor do I know how you'll move a long and fragile regular expression string through the translation process. Especially with the translation system the way it is now, I know even less about it than I did before. Apologies for the inapplicable idea.

mltony · 2024-04-17T21:34:58Z

This is now ready for review.

cary-rowen · 2024-04-17T23:14:48Z

Hi @mltony
I'm getting the following error when trying to do sentence navigation using the latest build.

IO - speech.speech.speak (07:12:27.736) - MainThread (11956):
Speaking ['日志片段开始点已标记，再按一次复制到剪贴板']
IO - inputCore.InputManager.executeGesture (07:12:28.564) - winInputHook (3860):
Input: kb(laptop):alt+downArrow
ERROR - scriptHandler.executeScript (07:12:28.582) - MainThread (11956):
error executing script: <bound method BrowseModeDocumentTreeInterceptor.script_collapseOrExpandControl of <NVDAObjects.IAccessible.chromium.ChromeVBuf object at 0x0B435150>> with gesture 'Alt+下光标'
Traceback (most recent call last):
  File "scriptHandler.pyc", line 295, in executeScript
  File "browseMode.pyc", line 1689, in script_collapseOrExpandControl
  File "cursorManager.pyc", line 157, in _caretMovementScriptHelper
  File "textInfos\offsets.pyc", line 538, in expand
  File "virtualBuffers\__init__.pyc", line 405, in _getUnitOffsets
  File "textInfos\offsets.pyc", line 512, in _getUnitOffsets
  File "documentNavigation\sentenceNavigation.pyc", line 210, in __init__
  File "documentNavigation\sentenceNavigation.pyc", line 145, in getSentenceStopRegex
KeyError: '1,3'
IO - inputCore.InputManager.executeGesture (07:12:29.599) - winInputHook (3860):
Input: kb(laptop):control+shift+NVDA+f1

There seems to be no language difference, I used the English and Chinese versions of the NVDA User Guide for testing.

source/config/configDefaults.py

mltony · 2024-04-18T21:34:16Z

@cary-rowen, try resetting your value for text paragraph regex. It should work with the default value.

AppVeyorBot · 2024-04-18T22:10:42Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: Lint check. See test results for more information.
FAIL: System tests (tags: installer NVDA). See test results for more information.
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/c5vaeu214msswiba/artifacts/output/nvda_snapshot_pr16384-31654,24be7c84.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.1,
INSTALL_END 1.0,
BUILD_START 0.0,
BUILD_END 31.1,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 2.4,
FINISH_END 0.2

See test results for failed build of commit 24be7c8425

Adriani90 · 2024-04-18T22:29:44Z

@mltony did you test this in browse mode in pdf documents, with Adobe reader, outlook messages in MS outlook and browse mode in MS Word? Does it give an error in browse mode in MS Excel? Or did you implement a message that this is not supported there?

Adriani90 · 2024-04-18T23:21:49Z

@mltony Re non breaking prefixes, it might be worth to add all so called "period words" in a database first and import them from there. Maybe there could be a file similar to symbols.dic called periodwords.dic. At least with such a file you could pass it to the translation system and we would get all of the international prefixes translated while translators can also add their language specific period words to their own language's periodwords.dic file.
Ofcourse the correct sentence navigation would depend on translation work.

mltony · 2024-04-18T23:47:14Z

@Adriani90

Browse mode in browsers: tested, works fine
PDF reader browse mode: tested, works fine
MSWord and Outlook: legacy behavior overrides functionality oof my PR, so no change here
In MS Excel alt+Down arrow seems to trigger expand action, so seems no regressions here.
Re symbols file - that's a reasonable idea. I guess I'll just wait for a few days before working on that just in case anyone else has objections or better ideas.

AppVeyorBot · 2024-04-19T00:27:07Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: Lint check. See test results for more information.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/4rd4tj4u94poj6kt/artifacts/output/nvda_snapshot_pr16384-31656,ee7ed21a.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.1,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 29.2,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 2.3,
FINISH_END 0.2

See test results for failed build of commit ee7ed21aad

cary-rowen · 2024-04-19T04:43:41Z

Hi @mltony

Thank you for your efforts on this. I will continue to pay attention to this feature and provide the following test work for your reference.

Test in Windows 10 Notepad:

Steps

Write the following Chinese content : "今天天气很好。我想去爬山。" These are two sentences containing two periods.
Press Alt+Up/DownArrow.
Add a space after the first period.
Place the caret at the beginning of the text and press Alt+Downarrow repeatedly.

Actual behavior:

Unable to navigate by sentence when performing step (2).
NVDA reports the second sentence when performing step (4).

expected behavior

Perform step 2 to navigate by sentence. No matter whether there is a space after the period, because in Chinese writing, there is generally no space after the period.
When performing step (4), "blank" should be reported, consistent with the behavior in MS Word.

Test in NVDA Speech Viewer or Bookworm:

Sentence navigation is not working. If you use the Sentence-nav add-on, you need to enable overriding MS Word's sentence navigation behavior in settings.
So, as far as this PR is concerned, I think we should have a solution for similar situations like the Bookworm rich text control.

Thanks

mltony · 2024-04-19T18:11:56Z

@cary-rowen

Re step 2: fixed regexp, try again.
Also did I get Chinese question mark and exclamation mark correctly in the regexp?
Re step 4: This one is complicated. So I implement in this PR functions like UIATextInfo.expand(UNIT_SENTENCE). If you want NVDA to report blank in this case, that means that when the cursor is at the last position of the document, it should expand into empty sentence. I actually designed sentences to be non-empty. Trying to redesign just to account for this case is going to be difficult and any attempt to redesign is likely to cause bugs in many other edge cases.
Trying to analyze behavior with different textInfo units, I played with UNIT_WORD. If you type: "Hello world" without quotes into Notepad on Windows 11 and try to press control+rightArrow repeatedly, the cursor would stay at the word "world" and not find a blank word in the end. Although in notepad++ it does find the blank word. So behavior is not consistent here.
So given the fact that there is already some inconsistency present with other textInfo units and given how hard and hacky it's going to be to add fake empty sentence at the end, I would ask whether it's even worth it to tackle this issue. WDYT?
Re NVDA Speech Viewer : acctually it works on my side. It doesn't appear to be using MSWord control inside.
Re Bookworm: I prefer to postpone this until after this PR is merged. I'm worried that the more features we add here, the more likely we'll get bogged down with discussions on implementation details.

AppVeyorBot · 2024-04-19T19:26:31Z

PASS: Translation comments check.
PASS: Unit tests.
FAIL: Lint check. See test results for more information.
PASS: System tests (tags: installer NVDA).
Build (for testing PR): https://ci.appveyor.com/api/buildjobs/3n1p8rlwadg4r911/artifacts/output/nvda_snapshot_pr16384-31657,cf90f526.exe
CI timing (mins):
INIT 0.0,
INSTALL_START 1.1,
INSTALL_END 0.9,
BUILD_START 0.0,
BUILD_END 28.9,
TESTSETUP_START 0.0,
TESTSETUP_END 0.4,
TEST_START 0.0,
TEST_END 2.3,
FINISH_END 0.2

See test results for failed build of commit cf90f52694

cary-rowen · 2024-04-19T20:24:34Z

Re step 2: fixed regexp, try again.
Yes, sentence navigation now works as expected.

Also did I get Chinese question mark and exclamation mark correctly in the regexp?

Yes.

So given the fact that there is already some inconsistency present with other textInfo units and given how hard and hacky it's going to be to add fake empty sentence at the end, I would ask whether it's even worth it to tackle this issue. WDYT?

OK, regarding this behavior, if it is very difficult then I don't think there is a need to put too much effort into it.

If you type: "Hello world" without quotes into Notepad on Windows 11 and try to press control+rightArrow repeatedly, the cursor would stay at the word "world" and not find a blank word in the end.

Yes, I reproduce this on Notepad on Windows 10 as well.

As an improvement on step 4, I think another behavior might be better:
NVDA will report the last sentence repeatedly when we press Alt+DownArrow repeatedly, And the caret has been moved to the end of the last sentence. This means that if the user presses Alt+UpArrow once you will hear NVDA continue reporting the last sentence.

Personally, I would like it to report the first sentence, ie: when there are no more sentences our caret should be placed at the beginning of the last sentence, no matter how many times I press Alt+DownArrow.

This is consistent with the paragraph navigation pattern in document navigation.
This is also consistent with the behavior of the sentence-nav add-on, and users of the sentence navigation add-on will have a consistent experience as before.

Re NVDA Speech Viewer : acctually it works on my side. It doesn't appear to be using MSWord control inside.

That's weird, can you confirm this again, it seems to me that Speech Viewer behaves the same as Bookworm, i.e. sentence navigation doesn't work at all.

The test I performed was with Speech Viewer open: I asked NVDA to report the Chinese sentences given in the comments above. Then try sentence navigation in Speech Viewer.
NVDA seems to treat the entire Speech Viewer as one sentence.

cary-rowen · 2024-04-22T02:24:40Z

Hi @mltony

While reminding you not to forget the above comment, I thought of another question.

I'm wondering, is it necessary that we share the regex option with paragraph navigation?
The current regular expression options are very complex and may be completely uncustomizable by novice users. Even someone like me who can read regular expressions may need to spend a lot of time understanding it.

Is there a way we can make the regex options more understandable?
If not, that’s no problem. Our goal is to provide users with an experience that they can use right out of the box, rather than requiring users to modify it before they can use it.

Thanks

CyrilleB79 · 2024-04-22T09:05:08Z

I have not reviewed this PR.

But, on contrary to what I wrote earlier, I understand now why the regexp for text paragraph navigation and the one for sentence navigation may be separately configurable.
For now, the criterion to identify a text paragraph is if it contains a sentence. However, we may imagine in the future that we use a different criterion (even if I do not have it in mind now).

Of course, since we do not have another criterion now, we may also wait for a real need before duplicating these fields.
@cary-rowen, have you a real use case / interest to configure differently text paragraph navigation and sentence navigation?

Re the possibility to better understand the regexp, we may use a verbose regexp in the field, rather than a normal one. Verbose regexps ignore comments, spaces and newline characters (unless escaped). We should however take into account that verbose regexps are less known than normal ones; I have discovered this option only some weeks ago...

Adriani90 · 2024-04-24T15:12:08Z

I agree in case of sentence navigation, the carret should always stay at the beginning of the sentence as it does with paragraph navigation when the paragraph is handled by NVDA in document navigation settings, no matter if it is the first or the last sentence or any senence in the middle.
Instead of repeating the first or the last sentence, there could be a message reporting "no next sentence" or "no previous sentence". This is also consistent with NVDA paragraph navigation.

cary-rowen · 2024-04-24T15:28:15Z

@CyrilleB79

Have you a real use case / interest to configure differently text paragraph navigation and sentence navigation?

No, I haven't yet. I just have the above thoughts from the perspective of ease of understanding.
However, we can focus on the core implementation of this PR, and put other things into separate PRs to discuss or implement.

cary-rowen · 2024-04-24T15:35:45Z

Instead of repeating the first or the last sentence, there could be a message reporting "no next sentence" or "no previous sentence". This is also consistent with NVDA paragraph navigation.

To be honest, I don’t really like using speech to indicate boundaries.
At this point, NVDA's paragraph navigation is inconsistent with other behaviors, such as paragraph navigation in Word.

I prefer to use beeps or play sound to indicate boundaries.

A similar request has already been made by #13612.

cary-rowen · 2024-04-24T15:39:07Z

By the way, sentence-nav add-on is better and there are also beep when navigating from one paragraph to another.

Adriani90 · 2024-04-24T15:44:12Z

There is no consistency in paragraph navigation between apps. Libreoffice behaves differently compared to ms Word for example. Nvda paragraph navigation is the most consistent paragraph handling we have so far.Von meinem iPhone gesendetAm 24.04.2024 um 17:36 schrieb Rowen ***@***.***>: Instead of repeating the first or the last sentence, there could be a message reporting "no next sentence" or "no previous sentence". This is also consistent with NVDA paragraph navigation. To be honest, I don’t really like using speech to indicate boundaries. At this point, NVDA's paragraph navigation is inconsistent with other behaviors, such as paragraph navigation in Word. I prefer to use beeps or play sound to indicate boundaries. A similar request has already been made by #13612. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

Adriani90 · 2024-04-24T15:45:43Z

Indicating boundaries by sounds is a valid issue but out of scope for this PR I guess.Von meinem iPhone gesendetAm 24.04.2024 um 17:36 schrieb Rowen ***@***.***>: Instead of repeating the first or the last sentence, there could be a message reporting "no next sentence" or "no previous sentence". This is also consistent with NVDA paragraph navigation. To be honest, I don’t really like using speech to indicate boundaries. At this point, NVDA's paragraph navigation is inconsistent with other behaviors, such as paragraph navigation in Word. I prefer to use beeps or play sound to indicate boundaries. A similar request has already been made by #13612. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

cary-rowen · 2024-04-24T22:15:52Z

Nvda paragraph navigation is the most consistent paragraph handling we have so far.

You confuse me that this scheme has never been done elsewhere except for what NVDA implemented in #13798 - using speech to indicate the last and first paragraphs. Where does your so-called "consistency" come from in this case?

Indicating boundaries by sounds is a valid issue but out of scope for this PR I guess.

What we have been discussing is how to prompt the user when navigating to the first and last sentences during sentence navigation. Speech or play sound are the two options under this issue.

Like it was done in #13798, so I don't understand even more what you mean by out of scope.

Adriani90 · 2024-04-25T07:33:11Z

What I wanted to say is that when paragraphs are handled by applications, in some applications NVDA repeats the first or last paragraph, in some applications NVDA is silent, in some applications a windows bing sound is heared, etc.
NVDA has a good basis for reporting the boundaries, and this can ofcourse be improved with sounds or what ever, but this does not affect only paragraphs and is in my view at least still out of scope for this PR.

For now The boundary reporting for paragraphs in NVDA is consistent all over the place when paragraphs are handled by NVDA.
Tony can make use of the internal paragraph handling to design the boundaries for documents and if we improve the boundary reporting, then this should be done for both in a separate PR.
But this affects also other navigation paterns handled by NVDA such as line by line navigation, every kind of review cursor or object navigation that reaches the beginning or the end of an element etc.

Qchristensen

Minor recommended change to user guide text, otherwise docs read well.

Qchristensen · 2024-04-26T04:46:23Z

user_docs/en/userGuide.t2t

+
+This combo box allows you to select whether NVDA will try to identify and reconstruct sentences spanning across multiple paragraphs when navigating by sentence via``alt+upArrow`` and ``alt+downArrow``.
+Sentence reconstruction comes in handy in two typical use cases:
+- In most Most PDF files text is incorrectly split into paragraphs. Every line typically denotes a paragraph and thus most sentences span many adjacent paragraphs and in order to identify complete sentences NVDA must look into adjacent paragraphs.


Duplicate word: most (In most Most PDF files...). Also "Every line typically denotes a...." should be on a new line. I would also split into two sentences as:

Every line typically denotes a paragraph and thus most sentences span many adjacent paragraphs.
In order to identify complete sentences NVDA must look into adjacent paragraphs.

CyrilleB79

I have done a first review of a part of the code and listed comments/suggestions here.

However, I'll also provide more important remarks in my next comment.

CyrilleB79 · 2024-05-03T12:55:52Z

source/gui/settingsDialogs.py

@@ -2645,9 +2647,60 @@ def makeSettings(self, settingsSizer: wx.BoxSizer) -> None:
 		)
 		self.bindHelpEvent("ParagraphStyle", self.paragraphStyleCombo)

+		# Translators: This is a label for sentence reconstruction in the document navigation dialog
+		label = _("&Sentencew Sentence reconstruction:")


Fix typo here:

Suggested change

label = _("&Sentencew Sentence reconstruction:")

label = _("&Sentence reconstruction:")

CyrilleB79 · 2024-05-03T13:16:32Z

source/config/configDefaults.py

 		# Look behind clause ensures that we have a text character before text punctuation mark.
 		# We have a positive lookBehind \w that resolves to a text character in any language,
 		# coupled with negative lookBehind \d that excludes digits.
-		lookBehind=r'(?<=\w)(?<!\d)',
+		plb=r"(?<=\w)(?<!\d)",


nlb is quite unclear. Could we use a more explicit token name?

E.g. letterCharBefore indicates that it is a look behind assertion (with Before) and what it should contain (a letter character).

Suggested change

plb=r"(?<=\w)(?<!\d)",

letterCharBefore=r"(?<=\w)(?<!\d)",

Also, is the negative look behind assertion needed? If we already know that the character before is a letter (\w), we probably do not need to test that it is not a digit (\d). Or do I miss something?
So the expression would rather become as follows:

Suggested change

plb=r"(?<=\w)(?<!\d)",

letterCharBefore=r"(?<=\w)",

CyrilleB79 · 2024-05-03T13:25:02Z

source/config/configDefaults.py

+		# Language-specific exceptions: characters suggesting that the following dot is not indicative
+		# of a sentence stop.
+		# This is a negative look-behind and will be inserted later when language is specified.
+		nlb="{nonBreakingRegex}",


Probably sem here: better something more explicit indicating what the expression matches.

Suggested change

nlb="{nonBreakingRegex}",

nonBreakingRegex="{nonBreakingRegex}",

CyrilleB79 · 2024-05-03T13:26:14Z

source/gui/settingsDialogs.py

+		try:
+			sentenceNavigation.getSentenceStopRegex(nonBreakingPrefixes=nbp)
+		except re.error as e:
+			log.debugWarning("Failed to compile text paragraph regex", exc_info=True)


Suggested change

log.debugWarning("Failed to compile text paragraph regex", exc_info=True)

log.debugWarning("Failed to compile sentence stop regex", exc_info=True)

CyrilleB79 · 2024-05-03T13:29:26Z

source/gui/settingsDialogs.py

+		except re.error as e:
+			log.debugWarning("Failed to compile text paragraph regex", exc_info=True)
+			gui.messageBox(
+				# Translators: Message shown when invalid text paragraph regex entered


Suggested change

# Translators: Message shown when invalid text paragraph regex entered

# Translators: Message shown when invalid sentence stop regex entered

CyrilleB79 · 2024-05-03T13:34:56Z

user_docs/en/userGuide.t2t

@@ -2502,6 +2502,26 @@ Note that this paragraph style cannot be used in Microsoft Word or Microsoft Out

 You may toggle through the available paragraph styles from anywhere by assigning a key in the [Input Gestures dialog #InputGestures].

+==== Sentence reconstruction = ====[SentenceReconstruction]


Typo:

Suggested change

==== Sentence reconstruction = ====[SentenceReconstruction]

==== Sentence reconstruction ====[SentenceReconstruction]

CyrilleB79 · 2024-05-03T13:43:08Z

user_docs/en/userGuide.t2t

+
+This combo box allows you to select whether NVDA will try to identify and reconstruct sentences spanning across multiple paragraphs when navigating by sentence via``alt+upArrow`` and ``alt+downArrow``.
+Sentence reconstruction comes in handy in two typical use cases:
+- In most Most PDF files text is incorrectly split into paragraphs. Every line typically denotes a paragraph and thus most sentences span many adjacent paragraphs and in order to identify complete sentences NVDA must look into adjacent paragraphs.


I have already seen such PDFs, not "most of them" though.

My question is: is this issue coming from the PDF only or also from how NVDA renders it in the virtual buffer? Said otherwise: Are the words also split into separate lines also with Jaws and Narrator? If not, could NVDA implement a fix tha would even not be visible at all from a user point of view?

Paragraph reconstruction is a nice work around provided for Sentence Nav (add-on or core feature). But have we investigated the root cause of this and decided that we cannot provide better than this workaround?

CyrilleB79 · 2024-05-03T13:49:09Z

source/config/featureFlagEnums.py

+	def _displayStringLabels(self):
+		return {
+			# Translators: Sentence reconstruction mode
+			SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),


In the GUI, the label is already "Sentence reconstruction"; thus, there is no need to duplicate these words in the combo-box's values.

Suggested change

SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),

SentenceReconstructionFlag.NEVER: _("Never"),

CyrilleB79 · 2024-05-03T13:50:07Z

source/config/featureFlagEnums.py

+			# Translators: Sentence reconstruction mode
+			SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),
+			# Translators: Sentence reconstruction mode
+			SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),


Idem:

Suggested change

SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),

SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Across same style paragraphs"),

CyrilleB79 · 2024-05-03T13:50:40Z

source/config/featureFlagEnums.py

+			# Translators: Sentence reconstruction mode
+			SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),
+			# Translators: Sentence reconstruction mode
+			SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Reconstruct sentences across any paragraphs"),


Idem:

Suggested change

SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Reconstruct sentences across any paragraphs"),

SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Across any paragraphs"),

CyrilleB79 · 2024-05-03T15:05:37Z

Here are some more general remarks.
Note that the feature does not seem to work on my end.

Test in Chrome's browse mode

Pressing Alt+downArrow in this GitHub's issue:

IO - inputCore.InputManager.executeGesture (16:05:38.597) - winInputHook (9664):
Input: kb(desktop):alt+downArrow
DEBUG - config.featureFlag._validateConfig_featureFlag (16:05:38.643) - MainThread (1980):
Validating feature flag: DEFAULT, optionsEnum: SentenceReconstructionFlag, behaviorOfDefault: SAME_STYLE_PARAGRAPHS
ERROR - scriptHandler.executeScript (16:05:38.657) - MainThread (1980):
error executing script: <bound method BrowseModeDocumentTreeInterceptor.script_collapseOrExpandControl of <NVDAObjects.IAccessible.chromium.ChromeVBuf object at 0x0B7F3FF0>> with gesture 'alt+flèche bas'
Traceback (most recent call last):
  File "documentNavigation\sentenceNavigation.py", line 146, in getSentenceStopRegex
    regex = regex.format(nonBreakingRegex=nonBreakingNLBs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '1,3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scriptHandler.py", line 295, in executeScript
    script(gesture)
  File "browseMode.py", line 1689, in script_collapseOrExpandControl
    return self._caretMovementScriptHelper(gesture, textInfos.UNIT_SENTENCE, direction)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cursorManager.py", line 157, in _caretMovementScriptHelper
    info.expand(unit)
  File "textInfos\offsets.py", line 538, in expand
    self._startOffset,self._endOffset=self._getUnitOffsets(unit,self._startOffset)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "virtualBuffers\__init__.py", line 405, in _getUnitOffsets
    return super(VirtualBufferTextInfo, self)._getUnitOffsets(unit, offset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "textInfos\offsets.py", line 512, in _getUnitOffsets
    context = SentenceContext(textInfo)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "documentNavigation\sentenceNavigation.py", line 213, in __init__
    regex = getSentenceStopRegex()
            ^^^^^^^^^^^^^^^^^^^^^^
  File "documentNavigation\sentenceNavigation.py", line 148, in getSentenceStopRegex
    raise RuntimeError("Failed to substitute non-breaking prefixes into sentence regular expression. Please make sure your sentence regular expression is valid.", e)
RuntimeError: ('Failed to substitute non-breaking prefixes into sentence regular expression. Please make sure your sentence regular expression is valid.', KeyError('1,3'))

Documentation navigation panel

Open this panel
Validate without changing anything

Same style of error as above.

In Notepad

The feature is not working with nothing logged in Notepad (Windows 10). Is it expected to work there?

Exception listing

The exceptions for sentence stop are listed for all expressions in all languages.

I am not totally convinced of this way to do. However, if we do not find a better solution and if it is validated by NV Access:

If this field remains in this panel, I'd prefer a comma separated list rather than a regex; it's probably easier for all users. In advanced settings panel, a regex is OK though.
May we consider a default exception list for all languages, e.g. a single letter followed by a dot?
We should ask translators to provide exceptions for their language as much as possible. Maybe after having merged this PR, translators can test sentence nav in NVDA alpha; it will be easy to provide one or more subsequent PRs to add language specific exceptions.
I did not check it. But how have you handled language variants such as "pt" vs "pt_PT" / "pt_BR" or "en" vs "en_AU"?

Unknown languages

The following languages are recognized as unknown in the settings panel:

many eSpeak languages such as (list not complete):
- English (Landcaster)
- Esperanto
- Vietnamese (Central)
- Chinese (Mandarin, latin as Pinyin)
- Chinese (Mandarin, latin as English)
- Chinese (Cantonese)
- Armenian (West Armenia)
- Haitian Creole
IBMTTS: English UK

IMO, something should be done to have less unrecognized languages. At least English variants should fallback to English (neutral) and main Chinese languages should be recognized.

More generally, if possible, we may log a warning and display the language code in case the language name is not recognized by Windows/NVDA, but not default to Unknown language.

CyrilleB79 · 2024-05-03T15:10:42Z

source/gui/settingsDialogs.py

+		prettyLanguage = prettyLanguage or _("Unknown language")
+		# Translators: This is the label for a textfield in the
+		# Document Navigation settings panel.
+		label = _("Non-breaking prefixes [%s]")


Why is it called prefix?

Suggested change

label = _("Non-breaking prefixes [%s]")

label = _("Sentence stop exceptions [%s]")

or something else similar or even more suitable.

mltony · 2024-05-05T16:57:36Z

@CyrilleB79, thanks for comments, I'll address them soon;
If you are getting exception with message:

KeyError: '1,3'

You would need to delete textParagraphRegex item in [virtualBuffers] section from your nvda.ini file to revert to the default value.
Although since we missed the window to merge this into v2024.2, most likely I'd need to update this PR to use a separate option for sentence regexp.
As for notepad being silent - there is legacy code that swallows any when trying to navigate by unit - most likely it's just swallowing the same KeyError in your case.

seanbudd

Hi @mltony - this is just a partial review, as I've already accumulated a large amount of feedback

seanbudd · 2024-04-19T02:46:30Z

source/browseMode.py

-				return punctuationMarksRegex.search(info.text) is not None
+				# sentence regex always matches beginning and end of string. We add some words at the end to test
+				# whether there is third match in the middle, that would be suggestive of a complete sentence.
+				text = info.text + " traiiling word"


I would encourage investigating alternative approaches here. is there a case where this isn't just +1? can you describe it in more detail in a comment.

otherwise, fixing the typo would be good

Suggested change

text = info.text + " traiiling word"

text = info.text + " trailing word"

seanbudd · 2024-04-19T02:46:53Z

user_docs/en/userGuide.t2t

+| Next sentence | alt+downArrow | alt+downArrow | Moves the caret to the next sentence and announces it. |
+| Previous sentence | alt+upArrow | alt+upArrow | Moves the caret to the previous sentence and announces it. |


please add code formatting to the shortcuts

seanbudd · 2024-04-19T02:47:26Z

user_docs/en/changes.t2t

@@ -8,6 +8,7 @@ What's New in NVDA
 == New Features ==
 - New key commands:
  - New Quick Navigation command ``p`` for jumping to next/previous text paragraph in browse mode. (#15998, @mltony)
+  - Sentence navigation via ``alt+Up/DownArrow``. (#16384, @mltony)


this will need to be moved when the 2024.3 change log is created

seanbudd · 2024-04-19T03:04:03Z

source/NVDAObjects/UIA/__init__.py

+			self._rangeObj = sentence._rangeObj
+		else:
+			UIAUnit = UIAHandler.NVDAUnitsToUIAUnits[unit]
+			self._rangeObj.ExpandToEnclosingUnit(UIAUnit)

 	def move(
 			self,
 			unit: str,
 			direction: int,
 			endPoint: Optional[str] = None,
 	):


please add a return type here and docstring

seanbudd · 2024-04-19T03:09:59Z

source/browseMode.py

+				# whether there is third match in the middle, that would be suggestive of a complete sentence.
+				text = info.text + " traiiling word"
+				matches = punctuationMarksRegex.findall(text)
+				return len(matches) >= 3


is this a change of behaviour to 3 sentences are required to be determined a paragraph, rather than just 1?

seanbudd · 2024-04-19T03:38:13Z

source/editableText.py

@@ -253,6 +253,11 @@ def script_caret_nextSentence(self,gesture):
 		self._caretMoveBySentenceHelper(gesture, 1)
 	script_caret_nextSentence.resumeSayAllMode = sayAll.CURSOR.CARET

+	def script_speakCurrentSentence(self, gesture):


please add a type hint

seanbudd · 2024-04-19T03:39:15Z

source/documentNavigation/sentenceNavigation.py

+		"""
+			This function either retrieves current sentence when direction = 0, or
+			moves to previous/next sentence when direction is not 0.
+			Please note that this function can only retrieve immediate previous/immediate next sentences.


Suggested change

"""

This function either retrieves current sentence when direction = 0, or

moves to previous/next sentence when direction is not 0.

Please note that this function can only retrieve immediate previous/immediate next sentences.

"""

This function either retrieves current sentence when direction = 0, or

moves to previous/next sentence when direction is not 0.

Please note that this function can only retrieve immediate previous/immediate next sentences.

seanbudd · 2024-04-19T03:39:40Z

source/documentNavigation/sentenceNavigation.py

+		)
+
+	def expandCurrentSentence(self, direction: int = 0) -> FindSentenceResult:
+		"""


please de indent this docstring

seanbudd · 2024-04-19T03:40:24Z

source/gui/settingsDialogs.py

@@ -2645,9 +2647,60 @@ def makeSettings(self, settingsSizer: wx.BoxSizer) -> None:
 		)
 		self.bindHelpEvent("ParagraphStyle", self.paragraphStyleCombo)

+		# Translators: This is a label for sentence reconstruction in the document navigation dialog
+		label = _("&Sentencew Sentence reconstruction:")


Suggested change

label = _("&Sentencew Sentence reconstruction:")

label = _("&Sentence reconstruction:")

seanbudd · 2024-04-19T05:28:52Z

tests/unit/test_sentenceNavigation.py

@@ -0,0 +1,126 @@
+# tests/unit/test_sentenceNavigation.py


Suggested change

# tests/unit/test_sentenceNavigation.py

CyrilleB79 · 2024-05-06T09:24:19Z

@CyrilleB79, thanks for comments, I'll address them soon; If you are getting exception with message:
KeyError: '1,3'
You would need to delete textParagraphRegex item in [virtualBuffers] section from your nvda.ini file to revert to the default value.

OK. It is working now.
Maybe it was not a good idea to put the text paragraph nav regexp in the advanced settings panel; since when you want to update it, you may create API-breaking changes...

As for notepad being silent - there is legacy code that swallows any when trying to navigate by unit - most likely it's just swallowing the same KeyError in your case.

Would it be the opportunity to fix this? Or do you think it's better in a separate PR? In the latter case, it would be worth to open an issue describing your finding.

Adriani90 · 2024-05-06T10:35:29Z

Would it be the opportunity to fix this? Or do you think it's better in a separate PR? In the latter case, it would be worth to open an issue describing your finding.

I would advise to fix this as part of this PR, otherwise users will be confused because sentence navigation will be promoted as a global feature.

Also @michaelDCurran, @seanbudd I really hope you have a careful review on this before merging.
As I said in #16271, this Regexes can really have bad effects when users change them accidentally. I hope you reconsider giving #16271 a higher priority.

Also when reviewing following should be taken into account:

Does the caret / virtual cursor always land on the beginning of a sentence? This is what we expect.
Is the beginning and the end of a document indicated via "no previous sentence" or "no next sentence"? This is what we expect and is in line with paragraph navigation in document navigation settings when paragraph is handled by NVDA.

This comment was marked as outdated.

Sign in to view

seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Apr 16, 2024

mltony force-pushed the sentenceNav branch from ea1c418 to 14db8c0 Compare April 17, 2024 21:24

Sentence navigation

c98b81a

mltony force-pushed the sentenceNav branch from 14db8c0 to c98b81a Compare April 17, 2024 21:33

mltony marked this pull request as ready for review April 17, 2024 21:34

mltony requested review from a team as code owners April 17, 2024 21:34

mltony requested review from Qchristensen and seanbudd April 17, 2024 21:34

cary-rowen reviewed Apr 17, 2024

View reviewed changes

source/config/configDefaults.py Outdated Show resolved Hide resolved

seanbudd added this to the 2024.3 milestone Apr 18, 2024

seanbudd added the merge-early Merge Early in a developer cycle label Apr 18, 2024

mltony added 2 commits April 18, 2024 14:30

Better exception message

da913b5

Chinese period

fb61999

minor

d4ed490

Fix CJK regex

e6ed323

Qchristensen requested changes Apr 26, 2024

View reviewed changes

CyrilleB79 reviewed May 3, 2024

View reviewed changes

seanbudd reviewed May 6, 2024

View reviewed changes

seanbudd marked this pull request as draft May 6, 2024 06:03

seanbudd removed this from the 2024.3 milestone May 24, 2024

Adriani90 mentioned this pull request May 25, 2024

Use custom extensions to move by sentence in Microsoft Word controls #13517

Open

	label = _("&Sentencew Sentence reconstruction:")
	label = _("&Sentence reconstruction:")

	nlb="{nonBreakingRegex}",
	nonBreakingRegex="{nonBreakingRegex}",

	log.debugWarning("Failed to compile text paragraph regex", exc_info=True)
	log.debugWarning("Failed to compile sentence stop regex", exc_info=True)

	# Translators: Message shown when invalid text paragraph regex entered
	# Translators: Message shown when invalid sentence stop regex entered

		@@ -2502,6 +2502,26 @@ Note that this paragraph style cannot be used in Microsoft Word or Microsoft Out

		You may toggle through the available paragraph styles from anywhere by assigning a key in the [Input Gestures dialog #InputGestures].

		==== Sentence reconstruction = ====[SentenceReconstruction]

	==== Sentence reconstruction = ====[SentenceReconstruction]
	==== Sentence reconstruction ====[SentenceReconstruction]

	SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),
	SentenceReconstructionFlag.NEVER: _("Never"),

	SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),
	SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Across same style paragraphs"),

	SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Reconstruct sentences across any paragraphs"),
	SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Across any paragraphs"),

	label = _("Non-breaking prefixes [%s]")
	label = _("Sentence stop exceptions [%s]")

	text = info.text + " traiiling word"
	text = info.text + " trailing word"

		\| Next sentence \| alt+downArrow \| alt+downArrow \| Moves the caret to the next sentence and announces it. \|
		\| Previous sentence \| alt+upArrow \| alt+upArrow \| Moves the caret to the previous sentence and announces it. \|

Sentence navigation #16384

Are you sure you want to change the base?

Sentence navigation #16384

Conversation

mltony commented Apr 11, 2024 • edited

Link to issue number:

Summary of the issue:

Description of user facing changes

Description of development approach

Testing strategy:

Known issues with pull request:

Code Review Checklist:

AppVeyorBot commented Apr 11, 2024

mltony commented Apr 11, 2024

cary-rowen commented Apr 12, 2024

mltony commented Apr 12, 2024

XLTechie commented Apr 12, 2024

This comment was marked as outdated.

mltony commented Apr 12, 2024

XLTechie commented Apr 14, 2024 via email • edited

mltony commented Apr 17, 2024

cary-rowen commented Apr 17, 2024

mltony commented Apr 18, 2024

AppVeyorBot commented Apr 18, 2024

Adriani90 commented Apr 18, 2024

Adriani90 commented Apr 18, 2024

mltony commented Apr 18, 2024

AppVeyorBot commented Apr 19, 2024

cary-rowen commented Apr 19, 2024 • edited

Test in Windows 10 Notepad:

Steps

Actual behavior:

expected behavior

Test in NVDA Speech Viewer or Bookworm:

mltony commented Apr 19, 2024

AppVeyorBot commented Apr 19, 2024

cary-rowen commented Apr 19, 2024

cary-rowen commented Apr 22, 2024

CyrilleB79 commented Apr 22, 2024

Adriani90 commented Apr 24, 2024

cary-rowen commented Apr 24, 2024

cary-rowen commented Apr 24, 2024

cary-rowen commented Apr 24, 2024

Adriani90 commented Apr 24, 2024 via email

Adriani90 commented Apr 24, 2024 via email

cary-rowen commented Apr 24, 2024

Adriani90 commented Apr 25, 2024

Qchristensen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CyrilleB79 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CyrilleB79 commented May 3, 2024

Test in Chrome's browse mode

Documentation navigation panel

In Notepad

Exception listing

Unknown languages

Choose a reason for hiding this comment

mltony commented May 5, 2024

seanbudd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mltony commented Apr 11, 2024 •

edited

XLTechie commented Apr 14, 2024 via email •

edited

cary-rowen commented Apr 19, 2024 •

edited