Say all without unnatural pauses #149

Closed
nvaccessAuto opened this Issue Jan 1, 2010 · 21 comments

@nvaccessAuto

Reported by aleksey_s on 2008-07-31 10:16
Currently, when reading text in e.g. Notepad, NVDA sends text to the synth line by line. Most synths therefore treat each line as the end of the text and apply end-of-sentence inflection. This of course sounds bad when reading a long paragraph of text, so NVDA should send text to the synth in larger chunks (by sentence or paragraph).

@nvaccessAuto

Attachment paragraphOffsets.py added by pvagner on 2008-08-01 06:23
Description:
Implementation of the _getParagraphOffsets() method for the NVDA textInfo class. This assumes blank lines split the text into paragraphs and that a paragraph can't start with a space or a punctuation symbol. For testing purposes you can add it to any of these classes, as long as you can test it.
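The attachment itself isn't reproduced in the thread, but the approach it describes can be sketched as follows. This is a minimal illustration assuming blank lines delimit paragraphs, as the description states; the function name and signature are hypothetical, not NVDA's actual API.

```python
import re

def get_paragraph_offsets(text, offset):
    """Return (start, end) offsets of the paragraph containing `offset`,
    treating one or more blank lines as paragraph separators.
    A sketch of the approach described in the attachment, not NVDA's actual code."""
    # Find blank-line separators: a newline, optional whitespace, another newline.
    separators = [m.span() for m in re.finditer(r"\n\s*\n", text)]
    start, end = 0, len(text)
    for sep_start, sep_end in separators:
        if sep_end <= offset:
            start = sep_end          # paragraph begins after the previous separator
        elif sep_start >= offset:
            end = sep_start          # paragraph ends at the next separator
            break
    return start, end
```

A say-all implementation could then speak `text[start:end]` as one chunk instead of a single line.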

@nvaccessAuto

Comment 1 by jteh on 2008-07-31 22:19
Changes:
Milestone changed from 0.6p2 to 0.6

@nvaccessAuto

Comment 2 by jteh on 2008-07-31 22:59
Sentence is probably better than paragraph, as paragraphs can potentially be quite large, which means increased search time and larger chunks of text being sent to the synth.

Unfortunately, determining where sentences begin and end poses a localisation problem. Different languages have different indications of sentence boundaries and some languages don't have a concept of a sentence at all. Aside from the problem of gathering rules for different languages, as usual, we can't make these determinations based on the NVDA language, as the user might be reading in a language other than their NVDA language at any given time.

There is another option aside from reading by sentence or paragraph. Note that reading by sentence can introduce pauses that the synth would not otherwise introduce. For example, if the synth would normally not have paused after a full stop (.) for some reason, reading by sentence would introduce a pause which the synth would not otherwise have made. An alternative used by some screen readers is to end the current chunk of text only if that chunk ended with characters which would have indicated a pause. For example, if there are three sentences across two lines where the third sentence ends at the end of the second line, the chunk would end only at the end of the second line. This is rather complicated and would result in larger chunks of text than reading by sentence. It still suffers from the localisation problem above, as the pause characters would be specific to each language.
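The "end the chunk only at pause characters" strategy above can be sketched briefly. This is an illustration only, with an English-centric pause-character set; as the comment notes, the real set would have to be per-language.

```python
# Sketch of the "flush only at pause characters" strategy described above.
# PAUSE_CHARS is illustrative and language-specific, as the comment notes.
PAUSE_CHARS = ".!?;:"

def chunk_lines(lines):
    """Yield chunks of text, ending a chunk only when the current line
    ends with a pause character; otherwise keep accumulating lines."""
    buffer = []
    for line in lines:
        buffer.append(line)
        if line.rstrip().endswith(tuple(PAUSE_CHARS)):
            yield " ".join(buffer)
            buffer = []
    if buffer:  # flush whatever remains at the end of the text
        yield " ".join(buffer)
```

Note that, exactly as described, a line containing a mid-line full stop but no trailing pause character is buffered rather than flushed, so chunks can grow larger than a single sentence.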

@nvaccessAuto

Comment 3 by jteh on 2008-08-01 05:45
This is an enhancement, not a defect.

On further discussion, although less efficient, doing this by paragraph is simpler, less prone to error and does not suffer from the localisation problem I described.

@nvaccessAuto

Comment 4 by pvagner on 2008-08-01 06:29
Hello,
I have been playing with this for a while and have attached what I have so far. It can detect paragraph boundaries.
I think that, for performance reasons, we'll be using lines by default anyway, so I don't see a problem with also adding the ability to read by sentence. I have come up with a regular expression which searches for sentence endings, and it is working really well for me. I will post a method to get sentence offsets once I complete it.
Oh, BTW, I am changing the component from speech to core, because while this affects speech, the implementation itself will go into the core (the textInfo class for the baseNVDAObject).
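pvagner's actual regular expression isn't shown in the thread, so the following is only a hedged illustration of the general idea: detect a sentence boundary as a terminator followed by whitespace and a capital letter. It is deliberately English-centric, which is exactly the localisation concern raised in comment 2.

```python
import re

# Illustrative only: pvagner's real regex is not in the thread. A sentence
# boundary here is ./!/? followed by whitespace and an uppercase letter.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def get_sentence_offsets(text):
    """Return (start, end) offset pairs for each detected sentence."""
    offsets = []
    start = 0
    for m in SENTENCE_END.finditer(text):
        offsets.append((start, m.start()))
        start = m.end()
    offsets.append((start, len(text)))
    return offsets
```

Abbreviations ("e.g. Notepad"), quotations, and non-Latin scripts all break a rule this simple, which is why the thread keeps returning to the localisation problem.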

@nvaccessAuto

Comment 5 by aleksey_s on 2008-11-26 08:38
I think sending long chunks of text, split somehow, is required not only when say all is performed. It looks as though some synths may lag on long portions of text (1 KB and more). So when speaking a selection, and in other cases when a long portion of text is to be spoken, NVDA should split it by sentence or otherwise. This could become a user-definable option.

@nvaccessAuto

Comment 6 by aleksey_s on 2009-02-25 20:53
Earlier I noticed an improvement with this issue, especially in the AkelPad editor. For now it is broken again. :-(
I suggest we decide what we want to do with this, because comfortable reading of texts with NVDA is becoming impossible.

@nvaccessAuto

Comment 7 by pvagner on 2009-04-13 06:33
The thing is, we had this working fine a while ago in rich edit controls using the ITextDocument COM object. This is why I abandoned my dirty implementation of getting the paragraph and sentence offsets.
Do we have any estimates as to whether we can hope for improvements to ITextDocument support, or shall we revive this issue?

@nvaccessAuto

Comment 8 by jteh on 2009-06-23 05:33
Changes:
Milestone changed from 0.6 to 0.7

@nvaccessAuto

Comment 9 by mdcurran on 2009-12-08 02:01
ITextDocument support will always be slow out of process. It's possible we can speed it up a little in-process, though this won't happen for a little while yet.
I think this ticket is still important, so let's not forget it, but I'm moving it out of 2010.1 for now.

Changes:
Milestone changed from 2010.1 to None

@nvaccessAuto

Comment 10 by pvagner on 2009-12-12 10:40
What about at least adding reading by paragraph for now, behind an extra option, so an improved reading experience is at least partially possible?
I think my dirty implementation still works.

@nvaccessAuto

Comment 11 by briang1 on 2010-10-23 13:45
I was reading the various comments here, and as I have seen a steady stream of requests for better say all reading, I got to thinking: maybe this is in a way also allied to the discussion on punctuation, and possibly to GUIs as well, in terms of how to allow configuration.
It would also be really neat if an alternative say all synth were possible, as I think some of the natural-sounding voices would handle properly read sentences much better than, say, eSpeak will.
Myself, I don't notice the problem until I switch to a natural-sounding voice, presumably because I'm not expecting any inflection of great usefulness!

What we don't want, though, is for it to become really sluggish due to all the pre-processing.

@nvaccessAuto

Comment 12 by jteh on 2010-12-28 23:46
The current idea is to use the new symbol framework to determine sentence endings. Text will be pulled in by line, but it will be buffered somewhere until the end of a sentence is reached.

However, I've just realised that this will cause problems with regard to indexing. We're moving by line, so the indexes need to be for each line. Because sentences may cross multiple lines, this means that the indexes need to be inserted in the middle of an utterance. While synths supporting markup do allow this, NVDA doesn't currently support speech markup.

Unfortunately, this means we probably won't be able to implement better say all until we implement speech markup. There may also be some synths that don't support markup (eSpeak, sapi4 and sapi5 do, but I'm not sure about newfon and audiologic for example). If this is the case, say all by sentence won't be possible for these synths.

@nvaccessAuto

Comment 13 by aleksey_s (in reply to comment 12) on 2010-12-29 19:24
Replying to jteh:

The current idea is to use the new symbol framework to determine sentence endings. Text will be pulled in by line, but it will be buffered somewhere until the end of a sentence is reached.

However, I've just realised that this will cause problems with regard to indexing. We're moving by line, so the indexes need to be for each line. Because sentences may cross multiple lines, this means that the indexes need to be inserted in the middle of an utterance.

Why we can't move by sentence? Then markup support will not be required. This one should be configurable though.

Unfortunately, this means we probably won't be able to implement better say all until we implement speech markup. There may also be some synths that don't support markup (eSpeak, sapi4 and sapi5 do, but I'm not sure about newfon and audiologic for example). If this is the case, say all by sentence won't be possible for these synths.

Newfon currently does not support markup, but it is almost indifferent to say all by line versus by sentence, since it doesn't assume a sentence ending without a full stop.

@nvaccessAuto

Comment 14 by jteh (in reply to comment 13) on 2010-12-29 20:03
Replying to aleksey_s:

Why we can't move by sentence?

The same reason we don't already: most controls don't support it natively. For controls that use OffsetsTextInfo, we could provide a base implementation of sentence offsets that uses the symbol framework, just like we do for line offsets. Unfortunately, this is not particularly efficient. At worst, it will require the entire text to be retrieved when calculating sentence offsets, as we do for the base implementation of line offsets (which we thankfully rarely use). It can be optimised such that it might need to retrieve the text for multiple lines instead of the entire text, but in some cases, this could be worse, since it means multiple calls.

Also, not all controls use offsets, and in these cases, there's nothing we can do.

@nvaccessAuto

Comment 15 by aleksey_s (in reply to comment 14) on 2010-12-29 21:16
Replying to jteh:
If we can already do it for lines, we should be able to do it for sentences as well. If a control supports retrieving by line, then we can request lines until we find the end of the sentence.
About performance: the say all chunk should be configurable, so the user can disable sentence detection when it is not needed.
I'd argue that the amount of calls will not increase dramatically.

Also, not all controls use offsets, and in these cases, there's nothing we can do.

Again, if we can somehow distinguish lines in those controls, we can do it for sentences as well. If we can't, then there is really nothing we can do. :-)

@nvaccessAuto

Comment 16 by jteh (in reply to comment 15) on 2010-12-29 21:45
Replying to aleksey_s:

If we can already do it for lines, we should be able to do it for sentences as well. If a control supports retrieving by line, then we can request lines until we find the end of the sentence.

Note that you also have to move to the point where the sentence begins/ends, which might be in the middle of the line.

I'd argue that the amount of calls will not increase dramatically.

There are several reasons calls will increase:

  • Every time we retrieve a sentence, all lines associated with that sentence have to be retrieved. We need to search for the end of the previous sentence as well as the end of the current one, which potentially means fetching lines both before and after. Note that if the sentence begins at the start of the line, we must retrieve the previous line to check for the sentence ending.
  • We have to fetch the text to calculate sentence endings. Most controls implement retrieval of lines natively, so for line navigation we don't need to fetch the text separately; for sentences, this means at least two calls for every line covered by a sentence.
  • If multiple sentences are requested, we will retrieve the same lines multiple times. For example, if you have sentence 1 covering lines 1 and 2 and sentence 2 covering lines 2 and 3, we'll actually retrieve line 2 twice. I'm not sure we can cache this without introducing problems.

Again, if we can somehow distinguish lines in controls that don't support offsets, we can do it for sentences as well.

No, we can't. As I said above, we have to move to the point where the sentence begins/ends. For controls that don't support offsets, there's no way to do this. It is incorrect to assume that characters and offsets are equivalent; there are actually real world cases where this is not true.

@nvaccessAuto

Comment 17 by jteh on 2011-04-14 07:34
This probably won't use the symbol code after all.

@nvaccessAuto

Comment 18 by jteh on 2011-04-17 21:35
Changes:
Changed title from "Improve SayAll reading" to "Say all without unnatural pauses"
Milestone changed from None to 2011.2

@nvaccessAuto

Comment 19 by mdcurran on 2011-04-25 05:40
Initial work has been started in speechCommands branch at: http://bzr.nvaccess.org/nvda/speechCommands
The idea is to have a speech function which, rather than just speaking exactly what it is given, collects text until it detects a sentence ending, and only then sends that to the synth. Any remaining speech is stored for later use. It is also possible to flush the current speech buffer, forcing the function to send all pending speech to the synth immediately (e.g. when say all gets too far ahead of itself).
The current implementation seems to have some major performance issues, though, which must be sorted out.
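The collect-and-flush behaviour described above can be sketched in a few lines. The class name, callback, and boundary regex below are all hypothetical; the real code lives in the speechCommands branch linked above.

```python
import re

# Hypothetical sketch of the buffering speech function from comment 19.
# A sentence ending here is ./!/? at the end of the accumulated text.
SENTENCE_END = re.compile(r"[.!?](\s|$)")

class SentenceBuffer:
    def __init__(self, speak):
        self._speak = speak   # callback that actually sends text to the synth
        self._pending = []

    def add_line(self, line):
        """Buffer a line; send everything to the synth once a sentence ends."""
        self._pending.append(line)
        if SENTENCE_END.search(line.rstrip()):
            self.flush()

    def flush(self):
        """Force all pending speech out now, e.g. when say all gets too far ahead."""
        if self._pending:
            self._speak(" ".join(self._pending))
            self._pending = []
```

A caller feeding lines into `add_line` would thus hear whole sentences, with `flush` as the escape hatch the comment mentions.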

@nvaccessAuto

Comment 20 by jteh on 2011-04-28 06:38
Implemented in e13e64a.
Changes:
State: closed

@nvaccessAuto nvaccessAuto added this to the 2011.2 milestone Nov 10, 2015