Problem with typed-word announcement in Kannada with NVDA and eSpeak TTS. #4254

Closed
nvaccessAuto opened this Issue Jul 6, 2014 · 32 comments

2 participants

@nvaccessAuto

Reported by Siddu on 2014-07-06 05:20
None

@nvaccessAuto

Comment 1 by Siddu on 2014-07-06 05:33
My problem is that I am using the Kannada eSpeak TTS voice with NVDA, but while typing Kannada I cannot hear the text word by word: after I press space, NVDA does not announce the word. I have already turned on both speak typed characters and speak typed words (the typing echo options) with the key commands NVDA+2 and NVDA+3. The problem does not occur in English; English works well with NVDA and eSpeak. There is no problem with letter announcement in Kannada, only with word announcement. I think the silence occurs when a vowel sign is typed. If I type words without vowel signs, like ka, kha, ga, gha etc., it says the word. That is, it says all the letters after the vowel sign!

@nvaccessAuto

Comment 3 by dhankuta on 2014-07-06 07:04
Hi,
This problem does not concern Kannada alone. The same issue occurs in all languages of Indic origin with joining characters, viz. Devanagari (Hindi and Nepali), Bengali, Gujarati, Kannada, Malayalam, Punjabi, Oriya, Tamil, Telugu, Sinhala and a few others.
The problem occurs not only with vowel signs but also with a few other letters that have diacritic characteristics.
The key spot of the problem is this:
When a diacritic letter is typed, the event function treats it as a word delimiter and breaks the appending of previously typed characters.
The issue is not limited to reporting typed words; it is also frequently encountered when announcing the text of many objects, including the spell checker.

@nvaccessAuto

Comment 4 by jteh on 2014-07-06 09:00
Can you please provide examples of words that get "broken" incorrectly as well as some words that work correctly? We need this so we can diagnose the problem. Thanks.

@nvaccessAuto

Comment 5 by Siddu on 2014-07-06 10:32
For example, I am pasting some Kannada words here: Kā kha GA gha prasād ಕಾ ಖ GA ghaಪ್ರಸಾದ್

@nvaccessAuto

Comment 6 by Siddu (in reply to comment 4) on 2014-07-06 10:37
Replying to jteh:

Can you please provide examples of words that get "broken" incorrectly as well as some words that work correctly? We need this so we can diagnose the problem. Thanks.

For example, ಕಾ ಖ GA ghaಪ್ರಸಾದ್

@nvaccessAuto

Comment 7 by Siddu on 2014-07-06 12:34
OK, here is some information about Hindi with NVDA. I think Hindi and Kannada have a similar problem. If a vowel sign (maatraa only) is typed after a consonant, NVDA says the typed vowel maatraa and then the consonant, as if the maatraa were some sort of space. If, in the above case, the maatraa is replaced by a complete vowel, NVDA works fine. If the word consists only of consonants, NVDA/eSpeak again works correctly. If the word consists only of consonants with a complete vowel as its final letter, NVDA/eSpeak works correctly.

@nvaccessAuto

Comment 8 by nvdakor on 2014-07-06 13:35
Hi,
Okay, is this only with eSpeak? Do you have access to other synthesizers that include a Hindi voice? If it is strictly eSpeak, then I guess we need to ask the eSpeak developers to take a look at this ticket.

@nvaccessAuto

Comment 9 by Siddu on 2014-07-06 13:41
Yes sir. The problem is only with eSpeak. Please ask them to look at this ticket. I am not using any other TTS.

@nvaccessAuto

Comment 10 by Siddu on 2014-07-06 13:58
In this comment I will use Hindi as my medium of explanation. As someone rightly said, this issue affects all languages of Indic origin, not Kannada in particular. If a word consists purely of consonants, e.g. कलम, सम, झलक, or if the word starts with a vowel but the rest of it consists of consonants, e.g. अमर, उचल, ऐनक, then everything is read out perfectly. However, when a maatraa (half vowel) is present in the middle or at the end of the word, there is an error. When a maatraa is typed, NVDA says the preceding part of the word; e.g. when the maatraa for ई is typed in कली, NVDA says कल, pause, ई. One can therefore conclude that NVDA treats the maatraa (half vowel) as a spacebar or as the end of the word. This should not happen. The complete word should be said when the user presses the spacebar, not when a maatraa is typed.

@nvaccessAuto

Comment 11 by Siddu on 2014-07-06 14:39
I don't think this issue lies with eSpeak. It lies with NVDA, because NVDA, not eSpeak, provides the speak typed words feature. The experts will be the final judge, though. Do tell me whether my previous example made sense and, if so, whether this will be fixed.

@nvaccessAuto

Comment 12 by jteh on 2014-07-06 22:57
This is definitely not an issue with eSpeak. I suspect the problem is that these characters aren't considered alphanumeric and we currently consider non-alphanumeric characters to be outside of a word. The question is what test we can use instead.
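
The suspicion above is easy to check. The following is a minimal Python 3 sketch (an assumption: NVDA at the time ran on Python 2, where the same method lives on `unicode` objects). The sample characters and the `describe` helper are chosen for illustration only and are not part of NVDA.

```python
import unicodedata

# Sample characters chosen for illustration (not taken from NVDA's source).
SAMPLES = [
    "\u0c95",  # KANNADA LETTER KA
    "\u0cbe",  # KANNADA VOWEL SIGN AA
    "\u0ccd",  # KANNADA SIGN VIRAMA
    "\u0915",  # DEVANAGARI LETTER KA
    "\u0940",  # DEVANAGARI VOWEL SIGN II
]


def describe(ch):
    """Return (Unicode name, result of isalnum(), general category) for one character."""
    return unicodedata.name(ch), ch.isalnum(), unicodedata.category(ch)


for ch in SAMPLES:
    name, alnum, category = describe(ch)
    print(f"U+{ord(ch):04X} {name}: isalnum={alnum}, category={category}")
```

The two consonants report `isalnum=True`, while the vowel signs and the virama report `isalnum=False` with a general category that begins with `M` (mark).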

@nvaccessAuto

Comment 13 by dhankuta on 2014-07-07 05:32
Hi,
I have already highlighted the core spot. Let me clarify again.
1. This issue does not concern the synthesizer.
2. It is not language specific.
3. It occurs with all characters of any language that have combining characteristics (diacritic letters).
4. In navigation (wd_word, i.e. control+right/left arrow or control+shift+left/right arrow), Windows/NVDA treats these characters neither as word delimiters nor as non-alphanumeric.
5. But in the say character/word functionality, they are treated as word delimiters.
6. For sighted persons, these characters appear with a dotted circle when standalone. The dotted circle means that there must be a letter in the place of the circle. Grammatically, these characters cannot be written standalone. They represent other letters, especially vowels, when preceded or followed by a consonant. As a result, in the literature they are referred to by the word 'sign', but they are purely letters, not signs.
7. In Arabic languages too, characters with similar characteristics exist.
8. The event handler that determines whether to keep concatenating the just-typed character is wrongly treating these characters as word delimiters.
I guess the problem lies in the speech module, in the speakTypedCharacters(ch) function. I once looked at it but could not understand what this line means:
if ch.isalnum():
So I gave up the attempt to fix it.
Anyway, it is a serious issue.
Him Prasad Gautam.
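
The behaviour described in point 8 can be reproduced with a small Python 3 sketch. This is a hypothetical stand-in for a typed-word buffer, not NVDA's actual code: the function name `words_announced` and its structure are assumptions made for illustration.

```python
def words_announced(typed, is_word_char):
    """Simulate a typed-word buffer: return the chunks flushed as 'words'.

    Hypothetical stand-in for the logic described above, not NVDA's code.
    """
    announced = []
    buffer = ""
    for ch in typed:
        if is_word_char(ch):
            # Still inside a word: keep appending.
            buffer += ch
        elif buffer:
            # Anything else is treated as a delimiter: flush the buffer.
            announced.append(buffer)
            buffer = ""
    return announced


# "कलम" (consonants only) followed by a space
print(words_announced("\u0915\u0932\u092e ", str.isalnum))
# "कली" followed by a space; the last character is the vowel sign U+0940
print(words_announced("\u0915\u0932\u0940 ", str.isalnum))
```

With `str.isalnum` as the test, the first call returns the whole word, while the second returns only कल: the vowel sign flushes the buffer early, and the space that follows finds it empty.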

@nvaccessAuto

Comment 14 by dhankuta on 2014-07-07 06:35
Hi again,
My guess seems right. The problem spot is the line I mentioned:
ch.isalnum():
That is the bug.
But can any expert tell me where the function isalnum() comes from?
Is it from the Python core or from an NVDA module?
If anyone can point me to the exact module that provides isalnum(), I will try to fix this permanently.

By redefining the diacritic characters of Kannada and Nepali as alphanumeric, I have temporarily fixed the bug for now.
However, the code is very primitive. As a test, it worked well.

@nvaccessAuto

Comment 15 by Siddu on 2014-07-07 11:12
Thank you, Him Prasad Gautam sir. Friends, please reply to his query.

@nvaccessAuto

Comment 16 by jteh on 2014-07-07 11:19
As I explained in comment:12, we need to figure out what test to use to work out what characters are considered as part of a word. Right now, we only consider alphanumeric characters to be part of a word, but these characters obviously aren't treated as alphanumeric, which I guess makes sense. The question is what test we can use which covers everything nicely.

Btw, the isalnum method is called on a Python unicode object.

@nvaccessAuto

Comment 17 by Siddu on 2014-07-07 11:23
OK sir. I am using Unicode text to type Kannada.

@nvaccessAuto

Comment 18 by Siddu on 2014-07-07 13:01
OK. I am pasting Kannada Unicode text here, to be read with eSpeak. The text is below. ಎಲ್ಲರೊಳಗೊಂದಾಗು ಮಂಕುತಿಮ್ಮ

@nvaccessAuto

Comment 19 by Siddu on 2014-07-07 13:50
Kannada Unicode chart: this file contains an excerpt from the character code tables and list of character names ... http://www.unicode.org/charts/PDF/U0C80.pdf

@nvaccessAuto

Comment 20 by dhankuta on 2014-07-08 01:33
Hi,
I am working along these lines:
1. I have prepared a list of all the characters concerned, for the languages I mentioned in my first comment.
2. I am adding a few new conditional lines to the sayTypedWord function so that it treats these characters as alphanumeric.
Let me finish these two tasks and get feedback from users of the languages concerned.
I have already tested in my language and the issue is gone. Now I am checking other features.
However, this approach fixes the bug in one spot only; if isalnum is used somewhere else, the issue remains there. I am not able to find every use of isalnum in the NVDA modules.
What if I provide all such characters to you? We could add the similar characters of the remaining languages in future.
Alternatively, could you tell me where isalnum is used?

@nvaccessAuto

Comment 21 by jteh on 2014-07-08 02:28
I'd prefer to avoid matching against specific characters. It'd be better to find one or more rules that match all appropriate characters. They aren't alphanumeric, but they must fit into some category or another.

@nvaccessAuto

Comment 22 by dhankuta on 2014-07-08 03:00
Hi,
Within ten minutes, using a little trick, I located every spot where isalnum is used in the whole NVDA source.
There are not many: just speech and textInfos.offsets!
Thanks to Notepad++.

Yes, I agree that checking each character conditionally is improper. That is why I already said that the code of the first fix is very primitive.
I hope I will be able to adopt a proper approach.
However, I will first try to fix the case and share it. We can discuss it then. For now, let us pause.
Anyway, the big issue of many languages is resolved!

@nvaccessAuto

Comment 23 by jteh on 2014-07-08 03:37
It seems all of the affected characters have a Unicode category of mark (M). Therefore, we should be able to use Unicode categories instead of isalnum and check for letter (L), mark (M) and number (N).
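
A rough Python 3 sketch of the rule proposed here, using the standard `unicodedata` module. The helper names `is_word_char` and `split_words` are hypothetical, and this is not the code from the NVDA commit; it only illustrates testing the first letter of the general category.

```python
import unicodedata


def is_word_char(ch):
    """True if the character's general category is a letter (L), mark (M) or number (N)."""
    return unicodedata.category(ch)[0] in "LMN"


def split_words(text):
    """Split text into runs of word characters, dropping everything else."""
    words, current = [], ""
    for ch in text:
        if is_word_char(ch):
            current += ch
        elif current:
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# The "ಕಾ ಖ" example from comment 5: KA + VOWEL SIGN AA, space, KHA
print(split_words("\u0c95\u0cbe \u0c96"))
```

Under this test the vowel sign stays attached to its consonant, so the sample splits into two words, ಕಾ and ಖ, rather than breaking at the mark.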

For reference: Unicode category values
Changes:
Milestone changed from None to next

@nvaccessAuto

Comment 24 by James Teh <jamie@... on 2014-07-08 03:53
In [713d98e]:
```CommitTicketReference repository="" revision="713d98e9347f02cf55c983c0a859bbbb0c5262f9"
Fix incorrect breaking of words at marks such as vowel signs and virama in Indic languages.

speech.speakTypedCharacters and textInfos.offsets.find{Start,End}OfWord were using unicode.isalnum to check for characters that are part of a word, but this only covers alphanumeric characters. The marks in question aren't alphanumeric, but should still be considered part of a word.
Therefore, use the Unicode category of the character and include letters, marks and numbers.
Re #4254.
```

@nvaccessAuto

Comment 25 by jteh on 2014-07-08 04:11
It'd be great if affected users could test this try build and report whether it fixes the problem. If it does, I'll merge it into next for wider testing. Thanks.

@nvaccessAuto

Comment 26 by dhankuta on 2014-07-08 07:49
Hi Jamie,
I tested in three languages.
The fix is OK.
There is no more unusual breaking when saying typed words.
Go ahead with merging.
I am in contact with users of the remaining languages and will report if any issue arises.
Thanks a lot.

@nvaccessAuto

Comment 27 by James Teh <jamie@... on 2014-07-08 07:55
In [66ab0df]:
```CommitTicketReference repository="" revision="66ab0df8a67e80f0fa2a79ea4e8064e1f9a1fcb0"
Merge branch 't4254' into next

Incubates #4254.
```

Changes:
Added labels: incubating

@nvaccessAuto

Comment 29 by Siddu on 2014-07-14 09:08
Finally, the Kannada language problem is solved! After checking the master version of NVDA, I find that NVDA now announces word by word. I am thankful to all the developers for solving this problem (ticket #4254). Special thanks to Him Prasad Gautam for the new development. I am happy now. A test master build of NVDA is available to download from http://dl.dropboxusercontent.com/s/ah109slknykpgt0/nvda_source-master-09b564e.exe

@nvaccessAuto

Comment 30 by Siddu on 2014-07-14 09:18
Finally, the Kannada language problem is solved! After checking the master version of NVDA, I find that NVDA now announces word by word. I am thankful to all the developers for solving this problem (ticket #4254). URL to visit: http://community.nvda-project.org/ticket/4254 Special thanks to Him Prasad Gautam sir from Nepal, a great man, for the new development. I am happy now. A test master build of NVDA is available to download from http://dl.dropboxusercontent.com/s/ah109slknykpgt0/nvda_source-master-09b564e.exe

@nvaccessAuto

Comment 31 by jteh on 2014-07-15 03:44
My fix has already been merged into next for wider testing, so there's no need to use a custom build.

@nvaccessAuto

Comment 32 by James Teh <jamie@... on 2014-08-05 00:44
In [7ffeb00]:
```CommitTicketReference repository="" revision="7ffeb00bbe6333117008e2b51d65e62610a7f0e3"
Fix incorrect breaking of words at marks such as vowel signs and virama in Indic languages.

speech.speakTypedCharacters and textInfos.offsets.find{Start,End}OfWord were using unicode.isalnum to check for characters that are part of a word, but this only covers alphanumeric characters. The marks in question aren't alphanumeric, but should still be considered part of a word.
Therefore, use the Unicode category of the character and include letters, marks and numbers.
Fixes #4254.
```

Changes:
Removed labels: incubating
State: closed

@nvaccessAuto

Comment 33 by jteh on 2014-08-05 00:45
Changes:
Milestone changed from next to 2014.3

@jcsteh jcsteh was assigned by nvaccessAuto Nov 10, 2015
@nvaccessAuto nvaccessAuto added this to the 2014.3 milestone Nov 10, 2015