Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sentence navigation #16384

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Sentence navigation #16384

wants to merge 5 commits into from

Conversation

mltony
Copy link
Contributor

@mltony mltony commented Apr 11, 2024

Link to issue number:

Closes #8518.

Summary of the issue:

Sentence navigation.

Description of user facing changes

  1. Alt+Down/UpArrow now navigates by sentence in editables and in browse mode.
  2. Sentence reconstruction option added to document navigation panel in settings. This defines to what extent sentence navigation can look into adjacent paragraphs when trying to reconstruct a sentence.
  3. Alsoadded an option to specify non-breaking prefixes for current language.

Description of development approach

  1. Sentence splitting algorithm is implemented in documentNavigation\sentenceNavigation.py. It is generic enough and works with different types of textInfos. I left some comments throughout to explain inner workings, but LMK if more clarification/ explanation is needed. I ended up rewriting the algorithm (compared to SentenceNav add-on) from scratch - now the code should be cleaner and more elegant.
  2. In editableText.py I added speakCurrentSentence script - not assigned any default keystroke. Hope that's ok - since SentenceNav users requested that in the past.
  3. In OffsetsTextInfo updated _getUnitOffsets() function to deal with UNIT_SENTENCE.
  4. In UIATextInfo updated expand() and move() methods to work with UNIT_SENTENCE.
  5. In browseMode.py updated script_collapseOrExpandControl: if current item is not collapsable, then treat alt+Up/Down as sentence navigation commands.
  6. Existing sentence navigation logic for MSWord is not affected.

Testing strategy:

  • Manual testing in:
    • Notepad
    • Notepad++
    • Visual Studio
    • VSCode
    • Chrome - browse mode and textArea
    • Firefox browse mode and TextArea
  • Unit tests

Known issues with pull request:

Ready for review.
This is a draft, but fully working. Feel free to test or even start reviewing. Things I plan to address in the coming few days:

  • Figure out how to deal with language-specific non-breaking prefixes.
  • Add docuemntation
  • Add unit tests

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

@AppVeyorBot
Copy link

See test results for failed build of commit a1f18e45a7

@mltony
Copy link
Contributor Author

mltony commented Apr 11, 2024

I have a technical question regarding how to deal with non-breaking prefixes.
Define non-breaking prefix is a word that goes right before period, but that suggests that the period doesn't serve as a sentence breaker. Here are some examples in English:

  • Titles like Dr., Mr, Ms, Prof.
  • Some abbreviations like i.e., s.t.
  • Single capital letters. E.g. "George H. W. Bush"
  • As a counterexample, not all abbreviations should be included. For example, "etc." should not be included, since a sentence can end with "etc.". In this case we should prefer to have a false positive rather than a false negative.

Non-breaking prefixes are obviously language-dependent. So far I just hardcoded English non-breaking prefixes, but that obviously needs to be changed before merging. I think we should make non-breaking prefixes to go through translation system to have it translated to all NVDA languages. But I see a few issues here to call out:

  1. This is not a typical string to translate and it would require some research of the language. I will write a long #translators message explaining what is non-breaking prefixes and what's the expected format.
  2. That string must be |-separated list of valid fixed-length regular expressions. Fixed length requirement is coming from the fact that each regex in the list is packed into a negative look-behind clause, which can only accept fixed length regex. So for example S* would be invalid for NLB. I expect some translators will not be familiar either with regexes in general or this specific requirement, so mistakes will be made. I haven't dealt with NVDA translation system before, so wondering, is there a way to set up a custom validator for a translateable string? If not, then can it be arranged to manually check all incoming translations of the string and invalidate those that will turn out to be invalid?
  3. Another issue is identifying current language. Often times people use NVDA to read context in multiple languages, so perhaps a good idea would be to take the current language of synthesizer and retrieve non-breaking prefixes in that language, instead of default NVDA language. There are two problems here:
    • How to map current synth language onto NVDA languages? In the past I remember for Russian some synthesizers return "ru_ru", whereas in NVDA I only see "ru". An obvious solution would be to try to identify language by the first two letters, but then I see some country-specific languages in user_docs directory, such as de_CH. How big deal would it be to ignore such country-specific languages?
    • What is API call to retrieve a string in another language? I bet it must be possible, but I looked through NVDA source code and didn't find any examples - would be great if someone knows.
  4. Do we need to create a setting to specify non-breaking prefixes? In the past I've been given feedback that I tend to expose too many settings and it's confusing to the users - so asking here to get community's opinion.

@cary-rowen
Copy link
Contributor

Thanks @mltony,

Based on Sentence-Nav I want to know:

  1. Will phrase navigation be included here?
  2. Will there also be a user-defined regular expression for sentence splitting?
  3. How do we deal with MSWord? There is an option in the Sentence-Nav add-on to override the default sentence navigation of MSWord. It is very useful not only for MSWord but also for Bookworm.

@mltony
Copy link
Contributor Author

mltony commented Apr 12, 2024

  1. Will phrase navigation be included here?

No. But it should be very straightforward to add it later provided NVAccess is aligned.

  1. Will there also be a user-defined regular expression for sentence splitting?

It currently uses the same regex as for text paragraph navigation, which is configurable. But its only a single regex.

  1. How do we deal with MSWord? There is an option in the Sentence-Nav add-on to override the default sentence navigation of MSWord. It is very useful not only for MSWord but also for Bookworm.

I don't override MSWord sentence navigation here. It should be easy to override later - again as long as NVAccess is aligned.

In general I think getting this PR approved and merged is going to be somewhat challenging because there are many unknowns regarding non-breaking prefixes. So at this point I would like to focus on core sentence navigation, and postpone bells and whistels to later PRs.

@XLTechie
Copy link
Collaborator

@mltony wrote:

Also, not sure how to fix lint error in configSpec.py - line is indeed to long, but trying to break it up leads to invalid config.

At end of line, after two spaces:

# noqa: E501 Breaking this line makes invalid config

@XLTechie

This comment was marked as outdated.

@mltony
Copy link
Contributor Author

mltony commented Apr 12, 2024

Can you use a tuple of strings

How can we send that tuple to translators? If translation system supports tuples of variable length then should be doable, but please show me an example.

@XLTechie
Copy link
Collaborator

XLTechie commented Apr 14, 2024 via email

@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Apr 16, 2024
@mltony mltony marked this pull request as ready for review April 17, 2024 21:34
@mltony mltony requested review from a team as code owners April 17, 2024 21:34
@mltony
Copy link
Contributor Author

mltony commented Apr 17, 2024

This is now ready for review.

@cary-rowen
Copy link
Contributor

Hi @mltony
I'm getting the following error when trying to do sentence navigation using the latest build.

IO - speech.speech.speak (07:12:27.736) - MainThread (11956):
Speaking ['日志片段开始点已标记,再按一次复制到剪贴板']
IO - inputCore.InputManager.executeGesture (07:12:28.564) - winInputHook (3860):
Input: kb(laptop):alt+downArrow
ERROR - scriptHandler.executeScript (07:12:28.582) - MainThread (11956):
error executing script: <bound method BrowseModeDocumentTreeInterceptor.script_collapseOrExpandControl of <NVDAObjects.IAccessible.chromium.ChromeVBuf object at 0x0B435150>> with gesture 'Alt+下光标'
Traceback (most recent call last):
  File "scriptHandler.pyc", line 295, in executeScript
  File "browseMode.pyc", line 1689, in script_collapseOrExpandControl
  File "cursorManager.pyc", line 157, in _caretMovementScriptHelper
  File "textInfos\offsets.pyc", line 538, in expand
  File "virtualBuffers\__init__.pyc", line 405, in _getUnitOffsets
  File "textInfos\offsets.pyc", line 512, in _getUnitOffsets
  File "documentNavigation\sentenceNavigation.pyc", line 210, in __init__
  File "documentNavigation\sentenceNavigation.pyc", line 145, in getSentenceStopRegex
KeyError: '1,3'
IO - inputCore.InputManager.executeGesture (07:12:29.599) - winInputHook (3860):
Input: kb(laptop):control+shift+NVDA+f1

There seems to be no language difference, I used the English and Chinese versions of the NVDA User Guide for testing.

@seanbudd seanbudd added this to the 2024.3 milestone Apr 18, 2024
@seanbudd seanbudd added the merge-early Merge Early in a developer cycle label Apr 18, 2024
@mltony
Copy link
Contributor Author

mltony commented Apr 18, 2024

@cary-rowen, try resetting your value for text paragraph regex. It should work with the default value.

@AppVeyorBot
Copy link

See test results for failed build of commit 24be7c8425

@Adriani90
Copy link
Collaborator

@mltony did you test this in browse mode in pdf documents, with Adobe reader, outlook messages in MS outlook and browse mode in MS Word? Does it give an error in browse mode in MS Excel? Or did you implement a message that this is not supported there?

@Adriani90
Copy link
Collaborator

@mltony Re non breaking prefixes, it might be worth to add all so called "period words" in a database first and import them from there. Maybe there could be a file similar to symbols.dic called periodwords.dic. At least with such a file you could pass it to the translation system and we would get all of the international prefixes translated while translators can also add their language specific period words to their own language's periodwords.dic file.
Ofcourse the correct sentence navigation would depend on translation work.

@mltony
Copy link
Contributor Author

mltony commented Apr 18, 2024

@Adriani90

  • Browse mode in browsers: tested, works fine
  • PDF reader browse mode: tested, works fine
  • MSWord and Outlook: legacy behavior overrides functionality oof my PR, so no change here
  • In MS Excel alt+Down arrow seems to trigger expand action, so seems no regressions here.
    Re symbols file - that's a reasonable idea. I guess I'll just wait for a few days before working on that just in case anyone else has objections or better ideas.

@AppVeyorBot
Copy link

See test results for failed build of commit ee7ed21aad

@cary-rowen
Copy link
Contributor

cary-rowen commented Apr 19, 2024

Hi @mltony

Thank you for your efforts on this. I will continue to pay attention to this feature and provide the following test work for your reference.

Test in Windows 10 Notepad:

Steps

  1. Write the following Chinese content : "今天天气很好。我想去爬山。" These are two sentences containing two periods.
  2. Press Alt+Up/DownArrow.
  3. Add a space after the first period.
  4. Place the caret at the beginning of the text and press Alt+Downarrow repeatedly.

Actual behavior:

  • Unable to navigate by sentence when performing step (2).
  • NVDA reports the second sentence when performing step (4).

expected behavior

  • Perform step 2 to navigate by sentence. No matter whether there is a space after the period, because in Chinese writing, there is generally no space after the period.
  • When performing step (4), "blank" should be reported, consistent with the behavior in MS Word.

Test in NVDA Speech Viewer or Bookworm:

Sentence navigation is not working. If you use the Sentence-nav add-on, you need to enable overriding MS Word's sentence navigation behavior in settings.
So, as far as this PR is concerned, I think we should have a solution for similar situations like the Bookworm rich text control.

Thanks

@mltony
Copy link
Contributor Author

mltony commented Apr 19, 2024

@cary-rowen

  • Re step 2: fixed regexp, try again.
  • Also did I get Chinese question mark and exclamation mark correctly in the regexp?
  • Re step 4: This one is complicated. So I implement in this PR functions like UIATextInfo.expand(UNIT_SENTENCE). If you want NVDA to report blank in this case, that means that when the cursor is at the last position of the document, it should expand into empty sentence. I actually designed sentences to be non-empty. Trying to redesign just to account for this case is going to be difficult and any attempt to redesign is likely to cause bugs in many other edge cases.
    Trying to analyze behavior with different textInfo units, I played with UNIT_WORD. If you type: "Hello world" without quotes into Notepad on Windows 11 and try to press control+rightArrow repeatedly, the cursor would stay at the word "world" and not find a blank word in the end. Although in notepad++ it does find the blank word. So behavior is not consistent here.
    So given the fact that there is already some inconsistency present with other textInfo units and given how hard and hacky it's going to be to add fake empty sentence at the end, I would ask whether it's even worth it to tackle this issue. WDYT?
  • Re NVDA Speech Viewer : acctually it works on my side. It doesn't appear to be using MSWord control inside.
  • Re Bookworm: I prefer to postpone this until after this PR is merged. I'm worried that the more features we add here, the more likely we'll get bogged down with discussions on implementation details.

@AppVeyorBot
Copy link

See test results for failed build of commit cf90f52694

@cary-rowen
Copy link
Contributor

Re step 2: fixed regexp, try again.
Yes, sentence navigation now works as expected.


Also did I get Chinese question mark and exclamation mark correctly in the regexp?

Yes.


So given the fact that there is already some inconsistency present with other textInfo units and given how hard and hacky it's going to be to add fake empty sentence at the end, I would ask whether it's even worth it to tackle this issue. WDYT?

OK, regarding this behavior, if it is very difficult then I don't think there is a need to put too much effort into it.


If you type: "Hello world" without quotes into Notepad on Windows 11 and try to press control+rightArrow repeatedly, the cursor would stay at the word "world" and not find a blank word in the end.

Yes, I reproduce this on Notepad on Windows 10 as well.


As an improvement on step 4, I think another behavior might be better:
NVDA will report the last sentence repeatedly when we press Alt+DownArrow repeatedly, And the caret has been moved to the end of the last sentence. This means that if the user presses Alt+UpArrow once you will hear NVDA continue reporting the last sentence.

Personally, I would like it to report the first sentence, ie: when there are no more sentences our caret should be placed at the beginning of the last sentence, no matter how many times I press Alt+DownArrow.

This is consistent with the paragraph navigation pattern in document navigation.
This is also consistent with the behavior of the sentence-nav add-on, and users of the sentence navigation add-on will have a consistent experience as before.


Re NVDA Speech Viewer : acctually it works on my side. It doesn't appear to be using MSWord control inside.

That's weird, can you confirm this again, it seems to me that Speech Viewer behaves the same as Bookworm, i.e. sentence navigation doesn't work at all.

The test I performed was with Speech Viewer open: I asked NVDA to report the Chinese sentences given in the comments above. Then try sentence navigation in Speech Viewer.
NVDA seems to treat the entire Speech Viewer as one sentence.

@cary-rowen
Copy link
Contributor

Hi @mltony

While reminding you not to forget the above comment, I thought of another question.

I'm wondering, is it necessary that we share the regex option with paragraph navigation?
The current regular expression options are very complex and may be completely uncustomizable by novice users. Even someone like me who can read regular expressions may need to spend a lot of time understanding it.

Is there a way we can make the regex options more understandable?
If not, that’s no problem. Our goal is to provide users with an experience that they can use right out of the box, rather than requiring users to modify it before they can use it.

Thanks

@CyrilleB79
Copy link
Collaborator

I have not reviewed this PR.

But, on contrary to what I wrote earlier, I understand now why the regexp for text paragraph navigation and the one for sentence navigation may be separately configurable.
For now, the criterion to identify a text paragraph is if it contains a sentence. However, we may imagine in the future that we use a different criterion (even if I do not have it in mind now).

Of course, since we do not have another criterion now, we may also wait for a real need before duplicating these fields.
@cary-rowen, have you a real use case / interest to configure differently text paragraph navigation and sentence navigation?

Re the possibility to better understand the regexp, we may use a verbose regexp in the field, rather than a normal one. Verbose regexps ignore comments, spaces and newline characters (unless escaped). We should however take into account that verbose regexps are less known than normal ones; I have discovered this option only some weeks ago...

@Adriani90
Copy link
Collaborator

I agree in case of sentence navigation, the carret should always stay at the beginning of the sentence as it does with paragraph navigation when the paragraph is handled by NVDA in document navigation settings, no matter if it is the first or the last sentence or any senence in the middle.
Instead of repeating the first or the last sentence, there could be a message reporting "no next sentence" or "no previous sentence". This is also consistent with NVDA paragraph navigation.

@cary-rowen
Copy link
Contributor

@CyrilleB79

Have you a real use case / interest to configure differently text paragraph navigation and sentence navigation?

No, I haven't yet. I just have the above thoughts from the perspective of ease of understanding.
However, we can focus on the core implementation of this PR, and put other things into separate PRs to discuss or implement.

@cary-rowen
Copy link
Contributor

Instead of repeating the first or the last sentence, there could be a message reporting "no next sentence" or "no previous sentence". This is also consistent with NVDA paragraph navigation.

To be honest, I don’t really like using speech to indicate boundaries.
At this point, NVDA's paragraph navigation is inconsistent with other behaviors, such as paragraph navigation in Word.

I prefer to use beeps or play sound to indicate boundaries.

A similar request has already been made by #13612.

@cary-rowen
Copy link
Contributor

By the way, sentence-nav add-on is better and there are also beep when navigating from one paragraph to another.

@Adriani90
Copy link
Collaborator

Adriani90 commented Apr 24, 2024 via email

@Adriani90
Copy link
Collaborator

Adriani90 commented Apr 24, 2024 via email

@cary-rowen
Copy link
Contributor

Nvda paragraph navigation is the most consistent paragraph handling we have so far.

You confuse me that this scheme has never been done elsewhere except for what NVDA implemented in #13798 - using speech to indicate the last and first paragraphs. Where does your so-called "consistency" come from in this case?

Indicating boundaries by sounds is a valid issue but out of scope for this PR I guess.

What we have been discussing is how to prompt the user when navigating to the first and last sentences during sentence navigation. Speech or play sound are the two options under this issue.

Like it was done in #13798, so I don't understand even more what you mean by out of scope.

@Adriani90
Copy link
Collaborator

What I wanted to say is that when paragraphs are handled by applications, in some applications NVDA repeats the first or last paragraph, in some applications NVDA is silent, in some applications a windows bing sound is heared, etc.
NVDA has a good basis for reporting the boundaries, and this can ofcourse be improved with sounds or what ever, but this does not affect only paragraphs and is in my view at least still out of scope for this PR.

For now The boundary reporting for paragraphs in NVDA is consistent all over the place when paragraphs are handled by NVDA.
Tony can make use of the internal paragraph handling to design the boundaries for documents and if we improve the boundary reporting, then this should be done for both in a separate PR.
But this affects also other navigation paterns handled by NVDA such as line by line navigation, every kind of review cursor or object navigation that reaches the beginning or the end of an element etc.

Copy link
Member

@Qchristensen Qchristensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor recommended change to user guide text, otherwise docs read well.


This combo box allows you to select whether NVDA will try to identify and reconstruct sentences spanning across multiple paragraphs when navigating by sentence via``alt+upArrow`` and ``alt+downArrow``.
Sentence reconstruction comes in handy in two typical use cases:
- In most Most PDF files text is incorrectly split into paragraphs. Every line typically denotes a paragraph and thus most sentences span many adjacent paragraphs and in order to identify complete sentences NVDA must look into adjacent paragraphs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate word: most (In most Most PDF files...). Also "Every line typically denotes a...." should be on a new line. I would also split into two sentences as:

Every line typically denotes a paragraph and thus most sentences span many adjacent paragraphs.
In order to identify complete sentences NVDA must look into adjacent paragraphs.

Copy link
Collaborator

@CyrilleB79 CyrilleB79 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done a first review of a part of the code and listed comments/suggestions here.

However, I'll also provide more important remarks in my next comment.

@@ -2645,9 +2647,60 @@ def makeSettings(self, settingsSizer: wx.BoxSizer) -> None:
)
self.bindHelpEvent("ParagraphStyle", self.paragraphStyleCombo)

# Translators: This is a label for sentence reconstruction in the document navigation dialog
label = _("&Sentencew Sentence reconstruction:")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix typo here:

Suggested change
label = _("&Sentencew Sentence reconstruction:")
label = _("&Sentence reconstruction:")

# Look behind clause ensures that we have a text character before text punctuation mark.
# We have a positive lookBehind \w that resolves to a text character in any language,
# coupled with negative lookBehind \d that excludes digits.
lookBehind=r'(?<=\w)(?<!\d)',
plb=r"(?<=\w)(?<!\d)",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nlb is quite unclear. Could we use a more explicit token name?

E.g. letterCharBefore indicates that it is a look behind assertion (with Before) and what it should contain (a letter character).

Suggested change
plb=r"(?<=\w)(?<!\d)",
letterCharBefore=r"(?<=\w)(?<!\d)",

Also, is the negative look behind assertion needed? If we already know that the character before is a letter (\w), we probably do not need to test that it is not a digit (\d). Or do I miss something?
So the expression would rather become as follows:

Suggested change
plb=r"(?<=\w)(?<!\d)",
letterCharBefore=r"(?<=\w)",

# Language-specific exceptions: characters suggesting that the following dot is not indicative
# of a sentence stop.
# This is a negative look-behind and will be inserted later when language is specified.
nlb="{nonBreakingRegex}",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably sem here: better something more explicit indicating what the expression matches.

Suggested change
nlb="{nonBreakingRegex}",
nonBreakingRegex="{nonBreakingRegex}",

try:
sentenceNavigation.getSentenceStopRegex(nonBreakingPrefixes=nbp)
except re.error as e:
log.debugWarning("Failed to compile text paragraph regex", exc_info=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.debugWarning("Failed to compile text paragraph regex", exc_info=True)
log.debugWarning("Failed to compile sentence stop regex", exc_info=True)

except re.error as e:
log.debugWarning("Failed to compile text paragraph regex", exc_info=True)
gui.messageBox(
# Translators: Message shown when invalid text paragraph regex entered
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Translators: Message shown when invalid text paragraph regex entered
# Translators: Message shown when invalid sentence stop regex entered

@@ -2502,6 +2502,26 @@ Note that this paragraph style cannot be used in Microsoft Word or Microsoft Out

You may toggle through the available paragraph styles from anywhere by assigning a key in the [Input Gestures dialog #InputGestures].

==== Sentence reconstruction = ====[SentenceReconstruction]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo:

Suggested change
==== Sentence reconstruction = ====[SentenceReconstruction]
==== Sentence reconstruction ====[SentenceReconstruction]


This combo box allows you to select whether NVDA will try to identify and reconstruct sentences spanning across multiple paragraphs when navigating by sentence via``alt+upArrow`` and ``alt+downArrow``.
Sentence reconstruction comes in handy in two typical use cases:
- In most Most PDF files text is incorrectly split into paragraphs. Every line typically denotes a paragraph and thus most sentences span many adjacent paragraphs and in order to identify complete sentences NVDA must look into adjacent paragraphs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already seen such PDFs, not "most of them" though.

My question is: is this issue coming from the PDF only or also from how NVDA renders it in the virtual buffer? Said otherwise: Are the words also split into separate lines also with Jaws and Narrator? If not, could NVDA implement a fix tha would even not be visible at all from a user point of view?

Paragraph reconstruction is a nice work around provided for Sentence Nav (add-on or core feature). But have we investigated the root cause of this and decided that we cannot provide better than this workaround?

def _displayStringLabels(self):
return {
# Translators: Sentence reconstruction mode
SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the GUI, the label is already "Sentence reconstruction"; thus, there is no need to duplicate these words in the combo-box's values.

Suggested change
SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),
SentenceReconstructionFlag.NEVER: _("Never"),

# Translators: Sentence reconstruction mode
SentenceReconstructionFlag.NEVER: _("Never reconstruct sentences across paragraphs"),
# Translators: Sentence reconstruction mode
SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem:

Suggested change
SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),
SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Across same style paragraphs"),

# Translators: Sentence reconstruction mode
SentenceReconstructionFlag.SAME_STYLE_PARAGRAPHS: _("Reconstruct sentences across same style paragraphs"),
# Translators: Sentence reconstruction mode
SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Reconstruct sentences across any paragraphs"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem:

Suggested change
SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Reconstruct sentences across any paragraphs"),
SentenceReconstructionFlag.ANY_PARAGRAPHS: _("Across any paragraphs"),

@CyrilleB79
Copy link
Collaborator

Here are some more general remarks.
Note that the feature does not seem to work on my end.

Test in Chrome's browse mode

Pressing Alt+downArrow in this GitHub's issue:

IO - inputCore.InputManager.executeGesture (16:05:38.597) - winInputHook (9664):
Input: kb(desktop):alt+downArrow
DEBUG - config.featureFlag._validateConfig_featureFlag (16:05:38.643) - MainThread (1980):
Validating feature flag: DEFAULT, optionsEnum: SentenceReconstructionFlag, behaviorOfDefault: SAME_STYLE_PARAGRAPHS
ERROR - scriptHandler.executeScript (16:05:38.657) - MainThread (1980):
error executing script: <bound method BrowseModeDocumentTreeInterceptor.script_collapseOrExpandControl of <NVDAObjects.IAccessible.chromium.ChromeVBuf object at 0x0B7F3FF0>> with gesture 'alt+flèche bas'
Traceback (most recent call last):
  File "documentNavigation\sentenceNavigation.py", line 146, in getSentenceStopRegex
    regex = regex.format(nonBreakingRegex=nonBreakingNLBs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '1,3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "scriptHandler.py", line 295, in executeScript
    script(gesture)
  File "browseMode.py", line 1689, in script_collapseOrExpandControl
    return self._caretMovementScriptHelper(gesture, textInfos.UNIT_SENTENCE, direction)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "cursorManager.py", line 157, in _caretMovementScriptHelper
    info.expand(unit)
  File "textInfos\offsets.py", line 538, in expand
    self._startOffset,self._endOffset=self._getUnitOffsets(unit,self._startOffset)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "virtualBuffers\__init__.py", line 405, in _getUnitOffsets
    return super(VirtualBufferTextInfo, self)._getUnitOffsets(unit, offset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "textInfos\offsets.py", line 512, in _getUnitOffsets
    context = SentenceContext(textInfo)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "documentNavigation\sentenceNavigation.py", line 213, in __init__
    regex = getSentenceStopRegex()
            ^^^^^^^^^^^^^^^^^^^^^^
  File "documentNavigation\sentenceNavigation.py", line 148, in getSentenceStopRegex
    raise RuntimeError("Failed to substitute non-breaking prefixes into sentence regular expression. Please make sure your sentence regular expression is valid.", e)
RuntimeError: ('Failed to substitute non-breaking prefixes into sentence regular expression. Please make sure your sentence regular expression is valid.', KeyError('1,3'))

Documentation navigation panel

  • Open this panel
  • Validate without changing anything

Same style of error as above.

In Notepad

The feature is not working with nothing logged in Notepad (Windows 10). Is it expected to work there?

Exception listing

The exceptions for sentence stop are listed for all expressions in all languages.

I am not totally convinced of this way to do. However, if we do not find a better solution and if it is validated by NV Access:

  • If this field remains in this panel, I'd prefer a comma separated list rather than a regex; it's probably easier for all users. In advanced settings panel, a regex is OK though.
  • May we consider a default exception list for all languages, e.g. a single letter followed by a dot?
  • We should ask translators to provide exceptions for their language as much as possible. Maybe after having merged this PR, translators can test sentence nav in NVDA alpha; it will be easy to provide one or more subsequent PRs to add language specific exceptions.
  • I did not check it. But how have you handled language variants such as "pt" vs "pt_PT" / "pt_BR" or "en" vs "en_AU"?

Unknown languages

The following languages are recognized as unknown in the settings panel:

  • many eSpeak languages such as (list not complete):
    • English (Landcaster)
    • Esperanto
    • Vietnamese (Central)
    • Chinese (Mandarin, latin as Pinyin)
    • Chinese (Mandarin, latin as English)
    • Chinese (Cantonese)
    • Armenian (West Armenia)
    • Haitian Creole
  • IBMTTS: English UK

IMO, something should be done to have less unrecognized languages. At least English variants should fallback to English (neutral) and main Chinese languages should be recognized.

More generally, if possible, we may log a warning and display the language code in case the language name is not recognized by Windows/NVDA, but not default to Unknown language.

prettyLanguage = prettyLanguage or _("Unknown language")
# Translators: This is the label for a textfield in the
# Document Navigation settings panel.
label = _("Non-breaking prefixes [%s]")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it called prefix?

Suggested change
label = _("Non-breaking prefixes [%s]")
label = _("Sentence stop exceptions [%s]")

or something else similar or even more suitable.

@mltony
Copy link
Contributor Author

mltony commented May 5, 2024

@CyrilleB79, thanks for comments, I'll address them soon;
If you are getting exception with message:

KeyError: '1,3'

You would need to delete textParagraphRegex item in [virtualBuffers] section from your nvda.ini file to revert to the default value.
Although since we missed the window to merge this into v2024.2, most likely I'd need to update this PR to use a separate option for sentence regexp.
As for notepad being silent - there is legacy code that swallows any when trying to navigate by unit - most likely it's just swallowing the same KeyError in your case.

Copy link
Member

@seanbudd seanbudd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mltony - this is just a partial review, as I've already accumulated a large amount of feedback

return punctuationMarksRegex.search(info.text) is not None
# sentence regex always matches beginning and end of string. We add some words at the end to test
# whether there is third match in the middle, that would be suggestive of a complete sentence.
text = info.text + " traiiling word"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would encourage investigating alternative approaches here. is there a case where this isn't just +1? can you describe it in more detail in a comment.

otherwise, fixing the typo would be good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
text = info.text + " traiiling word"
text = info.text + " trailing word"

Comment on lines +597 to +598
| Next sentence | alt+downArrow | alt+downArrow | Moves the caret to the next sentence and announces it. |
| Previous sentence | alt+upArrow | alt+upArrow | Moves the caret to the previous sentence and announces it. |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add code formatting to the shortcuts

@@ -8,6 +8,7 @@ What's New in NVDA
== New Features ==
- New key commands:
- New Quick Navigation command ``p`` for jumping to next/previous text paragraph in browse mode. (#15998, @mltony)
- Sentence navigation via ``alt+Up/DownArrow``. (#16384, @mltony)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need to be moved when the 2024.3 change log is created

self._rangeObj = sentence._rangeObj
else:
UIAUnit = UIAHandler.NVDAUnitsToUIAUnits[unit]
self._rangeObj.ExpandToEnclosingUnit(UIAUnit)

def move(
self,
unit: str,
direction: int,
endPoint: Optional[str] = None,
):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a return type here and docstring

# whether there is third match in the middle, that would be suggestive of a complete sentence.
text = info.text + " traiiling word"
matches = punctuationMarksRegex.findall(text)
return len(matches) >= 3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a change of behaviour to 3 sentences are required to be determined a paragraph, rather than just 1?

@@ -253,6 +253,11 @@ def script_caret_nextSentence(self,gesture):
self._caretMoveBySentenceHelper(gesture, 1)
script_caret_nextSentence.resumeSayAllMode = sayAll.CURSOR.CARET

def script_speakCurrentSentence(self, gesture):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a type hint

Comment on lines +351 to +354
"""
This function either retrieves current sentence when direction = 0, or
moves to previous/next sentence when direction is not 0.
Please note that this function can only retrieve immediate previous/immediate next sentences.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
This function either retrieves current sentence when direction = 0, or
moves to previous/next sentence when direction is not 0.
Please note that this function can only retrieve immediate previous/immediate next sentences.
"""
This function either retrieves current sentence when direction = 0, or
moves to previous/next sentence when direction is not 0.
Please note that this function can only retrieve immediate previous/immediate next sentences.

)

def expandCurrentSentence(self, direction: int = 0) -> FindSentenceResult:
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please de indent this docstring

@@ -2645,9 +2647,60 @@ def makeSettings(self, settingsSizer: wx.BoxSizer) -> None:
)
self.bindHelpEvent("ParagraphStyle", self.paragraphStyleCombo)

# Translators: This is a label for sentence reconstruction in the document navigation dialog
label = _("&Sentencew Sentence reconstruction:")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
label = _("&Sentencew Sentence reconstruction:")
label = _("&Sentence reconstruction:")

@@ -0,0 +1,126 @@
# tests/unit/test_sentenceNavigation.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# tests/unit/test_sentenceNavigation.py

@seanbudd seanbudd marked this pull request as draft May 6, 2024 06:03
@CyrilleB79
Copy link
Collaborator

@CyrilleB79, thanks for comments, I'll address them soon; If you are getting exception with message:

KeyError: '1,3'

You would need to delete textParagraphRegex item in [virtualBuffers] section from your nvda.ini file to revert to the default value.

OK. It is working now.
Maybe it was not a good idea to put the text paragraph nav regexp in the advanced settings panel; since when you want to update it, you may create API-breaking changes...

As for notepad being silent - there is legacy code that swallows any when trying to navigate by unit - most likely it's just swallowing the same KeyError in your case.

Would it be the opportunity to fix this? Or do you think it's better in a separate PR? In the latter case, it would be worth to open an issue describing your finding.

@Adriani90
Copy link
Collaborator

Would it be the opportunity to fix this? Or do you think it's better in a separate PR? In the latter case, it would be worth to open an issue describing your finding.

I would advise to fix this as part of this PR, otherwise users will be confused because sentence navigation will be promoted as a global feature.

Also @michaelDCurran, @seanbudd I really hope you have a careful review on this before merging.
As I said in #16271, this Regexes can really have bad effects when users change them accidentally. I hope you reconsider giving #16271 a higher priority.

Also when reviewing following should be taken into account:

  • Does the caret / virtual cursor always land on the beginning of a sentence? This is what we expect.
  • Is the beginning and the end of a document indicated via "no previous sentence" or "no next sentence"? This is what we expect and is in line with paragraph navigation in document navigation settings when paragraph is handled by NVDA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. merge-early Merge Early in a developer cycle
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: Include Sentence Nav as part of NVDA
8 participants