On using NUMPAD 2(twice) in desktop layout for character description, the compound character is not described instead only its first contained character is described #4582

Closed
nvaccessAuto opened this Issue Oct 28, 2014 · 22 comments

1 participant

@nvaccessAuto

Reported by sumandogra on 2014-10-28 10:31
On using NUMPAD 2(twice) in desktop layout for character description, the compound character is not described instead only its first contained character is described. For example, for Hindi character, क्ष the character description for क्ष is not announced, instead क is described which is the first character out of three characters that form the compound character क्ष.

Steps:
1. Open MS Word and shift to Hindi voice in NVDA from voices dialog.
2. Move to a compound character such as श्श्क्ष.
3. Now use NUMPAD 2(twice) in desktop layout, to get its character description.

@nvaccessAuto

Comment 1 by siddhartha_iitd on 2014-12-16 18:59
A first draft of the fix for this issue is available in branch in_t4582_splitOnWordsAndCheck at url: https://bitbucket.org/manish_agrawal/nvda
Please review it and suggest improvements and modifications needed.
Thanks.

@nvaccessAuto

Comment 4 by mdcurran on 2015-01-26 04:25
Some code review:

  • Would it be possible to somehow have speech.getCharacterListFromText actually return the descriptions themselves, rather than the original characters, so that _speakSpellingGen doesn't again have to look them up?
  • The call to getCharacterListFromText in _speakSpellingGen will definitely have to be only done if the language is one of the Indian languages that needs this. Otherwise there is a greater performance hit for many languages that will never require this. The algorithm itself looks okay to me.
@nvaccessAuto

Comment 5 by siddhartha_iitd on 2015-01-26 14:31
Thanks for code review.

Would it be possible to somehow have speech.getCharacterListFromText actually return the descriptions themselves?

Yes. It returns character descriptions now.

The call to getCharacterListFromText in _speakSpellingGen will definitely have to be only done if the language is one of the Indian languages that needs this.

Agreed. Check for this implemented.

Please find the modified code in branch in_t4582_splitOnWordsAndCheck at following url:
https://bitbucket.org/manish_agrawal/nvda

@nvaccessAuto

Comment 6 by siddhartha_iitd on 2015-01-27 06:59
Modified the function name to getCharDescListFromText.
Comment changed to suit the current functionality.

@nvaccessAuto

Comment 7 by mdcurran on 2015-01-30 05:57
I get an error when running this branch and moving to the example Hindi characters in this ticket, with eSpeak set to Hindi, and pressing numpad2 twice quickly. There traceback I get is:

Traceback (most recent call last):
  File "scriptHandler.py", line 176, in executeScript
    script(gesture)
  File "globalCommands.py", line 709, in script_review_currentCharacter
    speech.spellTextInfo(info,useCharacterDescriptions=True)
  File "speech.py", line 143, in spellTextInfo
    speakSpelling(field,curLanguage,useCharacterDescriptions=useCharacterDescriptions)
  File "speech.py", line 177, in speakSpelling
    next(_speakSpellingGenerator)
  File "speech.py", line 211, in _speakSpellingGen
    charDesc= list(item[if locale.startswith("hi") else characterProcessing.getCharacterDescription(locale,item[0](1])).lower())
TypeError: 'NoneType' object is not iterable
@nvaccessAuto

Attachment hi_characterDescriptions.dic added by siddhartha_iitd on 2015-01-30 09:58
Description:
Hindi Character Descriptions File

@nvaccessAuto

Comment 8 by siddhartha_iitd on 2015-01-30 10:00
This error occurred because the code didn't have the case for non-availability of character descriptions for a locale. It's added now.
The algorithm identifies a character as conjunct based on its presence in the character descriptions file of that locale. Therefore, for this algorithm to give desired result, it needs a character descriptions file. I've used the attached file hi_characterDescriptions.dic for Hindi locale.
Please find the modified code in branch in_t4582_splitOnWordsAndCheck at following url:
https://bitbucket.org/manish_agrawal/nvda

@nvaccessAuto

Comment 9 by jteh on 2015-02-03 06:22
Thanks! Code review:

In the getCharDescListFromText method:

nit: Please remove the tabs on the blank lines above and below the function.

The comment explaining what the function does could be a docstring rather than a comment, as docstrings are the preferred way of documenting what a function or class does in Python. For example:

def getCharDescListFromText(text,locale):
    """his method prepares a list... etc. etc.
    """
    # code goes here...
+     charDesc = characterProcessing.getCharacterDescription(locale,text[```
You use ```text[:i](:i])
)``` a few times, so it'd be best to assign it to a variable (e.g. char), as string slicing could get fairly expensive in a loop like this with a long string.

  • charDescList.append([```

It's better to append a tuple rather than a list (parentheses instead of square brackets), as you aren't mutating it and a tuple is lighter.

+     hiCharDescList = getCharDescListFromText(text,locale) if locale.startswith("hi") else None
+     charDescList = list(text) if hiCharDescList is None else hiCharDescList
+     for item in charDescList:
  1. The possible locales where this might be necessary should probably be a constant set outside the function so it can be easily modified. Perhaps it could be called LANGS_WITH_CONJUNCT_CHARS. I realise there's only one for now, but this will be extended in future.
  2. It's probably better, then, to split the locale at "" and just look up the first part, rather than looping and using startswith for each language. For example: ```locale.split("", 1) in LANGS_WITH_CONJUNCT_CHARS```
  3. The locale check is performed both inside and outside the loop. It'd be better to do it once and assign it to a bool variable.
  4. The hiCharDescList variable isn't necessary; you could just set charDescList conditionally.
  5. In the loop, the Indian specific script stuff is mixed in with the rest of the logic. I think this could be simplified/made more readable.
  6. There are commented out pieces of code in this loop. These should be removed.

Here's a rough idea of what I'd recommend (haven't tested this though):

        hasConjuncts = locale.split("_", 1) in LANGS_WITH_CONJUNCT_CHARS
        if hasConjuncts:
            charDescList = getCharDescListFromText(text,locale)
        else:
            charDescList = text
        for item in charDescList:
            if hasConjuncts:
                # item is a tuple containing character and its description
                char = item[0](text[:i],charDesc])
>)

                charDesc = item[1]
            else:
                # item is just a character.
                char = item
                if useCharacterDescriptions:
                    charDesc=characterProcessing.getCharacterDescription(locale,char.lower())
            uppercase=char.isupper()
            if useCharacterDescriptions and charDesc:
... rest of the code stays the same ...
@nvaccessAuto

Comment 10 by siddhartha_iitd on 2015-02-06 07:48
Thanks James for such a detailed review!
I've made the changes suggested in the review. Please check the updated code in branch in_t4582_splitOnWordsAndCheck at following url:
​​https://bitbucket.org/manish_agrawal/nvda

@nvaccessAuto

Comment 11 by jteh on 2015-02-10 01:53
Thanks! This looks pretty good, but:

@@ -36,7 +36,8 @@ speechMode_beeps_ms=15
...
+#Tuple containing locale codes for languages supporting conjunct characters
+LANGS_WITH_CONJUNCT_CHARS = ('hi', 'as', 'bn', 'gu', 'kn', 'kok', 'ml', 'mni', 'mr', 'pa', 'te', 'ur')

This would be better as a set rather than a tuple (change parentheses to braces). The reason is that the "in" operator has to do a linear search on a tuple, whereas it can do a hash lookup for a set, which is much faster.

@@ -162,7 +163,6 @@ def speakSpelling(text,locale=None,useCharacterDescriptions=False):

nit: Your patch removes a blank line in the "speak" function here. Is there a reason for that?

In _speakSpellingGen:


+     charDesc = [...

+                 charDesc.insert(0,item[1](]

)[```

I don't quite understand why you do this. Why not just assign charDesc to item1

)? The way you're doing it at present, you're continually expanding the list, but as I understand it, the earlier items aren't needed. Also, you're losing character descriptions other than the first. If for some reason you do only want the first, you could do:

charDesc = [item[1][0]]

but I'd want a comment explaining why you only want the first.

Thanks.

@nvaccessAuto

Comment 12 by siddhartha_iitd on 2015-02-11 06:36
Thanks James! As per our discussion and your comments above, I made the changes and pushed the code to branch in_t4582_splitOnWordsAndCheck at following url:
​​​https://bitbucket.org/manish_agrawal/nvda

@nvaccessAuto

Comment 13 by James Teh <jamie@... on 2015-02-12 07:08
In [f5b75d2]:
```CommitTicketReference repository="" revision="f5b75d2384f1876cfc9e1dc313fdb396ae5d855a"
Merge branch 't4582' into next

Incubates #4582.

Changes:
Added labels: incubating
@nvaccessAuto

Comment 14 by jteh on 2015-02-12 07:13
Thanks. I made a few minor tweaks before merging to next.

  • charDesc doesn't need to be initialised outside the loop.
  • Even if item[1] is empty, charDesc should still be set, as otherwise, you could get a charDesc from a previous iteration of the loop.
  • There's no need to set charDesc for non-conjunct languages if useCharacterDescriptions is False.

Thanks for your work!

@nvaccessAuto

Comment 15 by siddhartha_iitd on 2015-02-13 04:57
Thanks James for your support! This was the first issue assigned to me when I joined NVDA team :)

@nvaccessAuto

Comment 16 by James Teh <jamie@... on 2015-03-26 05:27
In [0aae7ef]:
```CommitTicketReference repository="" revision="0aae7ef21fb60f0e849ad5212650497c2dea6f5d"
Character descriptions are now handled correctly for conjunct characters in certain Indian languages.

Fixes #4582.

Changes:
Removed labels: incubating
State: closed
@nvaccessAuto

Comment 17 by jteh on 2015-03-26 05:28
Changes:
Milestone changed from None to 2015.2

@nvaccessAuto

Comment 18 by td.dinakar on 2015-05-31 14:15
I have just downloaded and tested nvda 2015.2 version and find that this feature does not work in Tamil language. Since Tamil language also contains compound characters, how to enable this feature for Tamil language?

@nvaccessAuto

Comment 20 by siddhartha_iitd on 2015-05-31 16:53
This feature will work for a language only if characterDescriltions.dic file is available for that language and the file must contain these conjunct characters.

@nvaccessAuto

Comment 21 by td.dinakar (in reply to comment 20) on 2015-06-01 13:53
CharacterDescriptions.dic file is available for Tamil language. I added the two compound characters க்ஷ and ஸ்ரீ. I restarted NVDA and checked. But, it doesn't work. Should these two compound characters be included in your file?

Replying to siddhartha_iitd:

This feature will work for a language only if characterDescriltions.dic file is available for that language and the file must contain these conjunct characters.

@nvaccessAuto

Comment 22 by siddhartha_iitd on 2015-06-01 13:59
No, not the characters. The locale name for Tamil must be included in the code. The module name is speech.py

@nvaccessAuto

Comment 23 by td.dinakar (in reply to comment 22) on 2015-06-01 14:58
The locale name for Tamil is ta. I request either yourself or Jamie to include this in the code.

Replying to siddhartha_iitd:

No, not the characters. The locale name for Tamil must be included in the code. The module name is speech.py

@nvaccessAuto

Comment 24 by jteh on 2015-06-03 01:01
Please file a new ticket for Tamil, referencing this one (#4582). We can't get this change into 2015.2, but we should be able to get it into 2015.3. Thanks.

@nvaccessAuto nvaccessAuto added the bug label Nov 10, 2015
@nvaccessAuto nvaccessAuto added this to the 2015.2 milestone Nov 10, 2015
@jcsteh jcsteh added a commit that referenced this issue Nov 23, 2015
@siddhartha-iitd siddhartha-iitd Character descriptions are now handled correctly for conjunct charact…
…ers in certain Indian languages.

Fixes #4582.
0aae7ef
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment