Match whole word in pronunciation dictionaries #1704

Closed
nvaccessAuto opened this Issue Aug 3, 2011 · 14 comments


@nvaccessAuto

Reported by dczajka on 2011-08-03 18:34
There should be a "match whole word only" option when creating entries in the pronunciation dictionaries. The same result can already be achieved with a regular expression, but the need is common enough, even among users who are not well-versed in regex syntax, that the dialog should offer it directly. As a stopgap, the User Guide could show the regex syntax for a whole-word match as a brief example of basic regex use.
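As a sketch of the stopgap suggested above: in regex syntax, a whole-word match wraps the word in `\b` (word boundary) assertions. This assumes the dictionary uses Python's `re` syntax; the word "cat" and the sample text are only illustrations.

```python
import re

# \b asserts a word boundary, so "cat" matches on its own but not inside
# a longer word such as "concatenated". IGNORECASE mirrors a case-insensitive entry.
pattern = re.compile(r"\bcat\b", re.IGNORECASE)

text = "The cat sat on a concatenated Cat mat."
print(pattern.sub("kitty", text))  # The kitty sat on a concatenated kitty mat.
```

If the word itself contains regex metacharacters, they still need escaping (for example `C\+\+` rather than `C++`).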
Blocking #2220, #4450

@nvaccessAuto

Comment 4 by jteh on 2014-09-10 22:17
Suggested implementation for anyone who wants to give this a go:

  • Currently, the speech dict format contains two numeric fields at the end. One is a flag indicating case sensitivity and the other is a flag indicating whether the pattern is a regular expression. Rather than adding a new field, I suggest the last field accept an additional value (2) indicating a word match. In the code, you'll obviously need to treat this as an int instead of a bool and there should be int constants for the three types of pattern.
  • When a word pattern is handled, you'll need to build a regular expression which only matches at word boundaries. This should be simple enough.
  • In the GUI, the Regular expression check box should become a radio button to choose from the three types of pattern.
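A minimal sketch of the first two suggestions, assuming a tab-separated line layout of pattern, replacement, case flag, type. The field order, the class name `DictEntry` and the helper `parse_line` are assumptions for illustration; the values 0 and 1 follow from the old boolean regex flag, and 2 is the proposed word value.

```python
import re
from dataclasses import dataclass

# Hypothetical constants for the three pattern types. 0 and 1 keep the meaning
# of the old boolean "regular expression" flag; 2 is the new whole-word value.
ENTRY_TYPE_ANYWHERE = 0
ENTRY_TYPE_REGEXP = 1
ENTRY_TYPE_WORD = 2


@dataclass
class DictEntry:
    pattern: str
    replacement: str
    caseSensitive: bool
    type: int  # one of the ENTRY_TYPE_* constants, stored as an int

    def compile(self):
        """Build the regular expression used to find this entry in text."""
        if self.type == ENTRY_TYPE_REGEXP:
            regex = self.pattern
        else:
            # Plain-text patterns are escaped so characters such as "." match literally.
            regex = re.escape(self.pattern)
            if self.type == ENTRY_TYPE_WORD:
                # \b on both sides: no match when the text is part of a larger word.
                regex = r"\b" + regex + r"\b"
        flags = 0 if self.caseSensitive else re.IGNORECASE
        return re.compile(regex, flags)

    def apply(self, text):
        """Return text with every match of this entry replaced."""
        regex = self.compile()
        if self.type == ENTRY_TYPE_REGEXP:
            # Regular expression entries may use group references such as \1.
            return regex.sub(self.replacement, text)
        # Other entries insert the replacement text literally.
        return regex.sub(lambda match: self.replacement, text)


def parse_line(line):
    """Parse one tab-separated line: pattern, replacement, case flag, type.

    The field order is an assumption made for this sketch.
    """
    pattern, replacement, case_flag, type_flag = line.rstrip("\n").split("\t")
    # The last field is read as an int (not a bool) so it can hold 0, 1 or 2.
    return DictEntry(pattern, replacement, bool(int(case_flag)), int(type_flag))
```

For example, `parse_line("cat\tkitty\t0\t2").apply("Cat concatenate cat")` gives `"kitty concatenate kitty"`. One caveat of wrapping the pattern in `\b`: it never matches a pattern that starts or ends with a non-word character (such as `C++` followed by a space), because there is no word boundary there; lookarounds like `(?<!\w)` and `(?!\w)` are one way around that.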
@nvaccessAuto

Comment 5 by blindbhavya on 2014-09-11 10:21
Hi.
It would be great if someone could implement this.
I have needed this quite a few times. I will CC myself so that I receive updates on any progress made on this ticket.
Thanks.

@nvaccessAuto

Attachment 0001-Added-Whole-Word-option-to-speech-dictionary.-1704.patch added by cannona on 2014-09-12 21:23
Description:
First patch.

@nvaccessAuto

Comment 8 by cannona on 2014-09-12 21:26
Would appreciate any feedback that anyone might have.

@nvaccessAuto

Comment 9 by cannona on 2014-09-12 21:33
Sorry, I just realized that there are still a couple of bugs in this. I'll fix them and then upload an amended patch.

@nvaccessAuto

Attachment 0001-Added-Whole-Word-option-to-speech-dictionary.-1704.2.patch added by cannona on 2014-09-12 21:44
Description:
Second patch.

@nvaccessAuto

Comment 10 by cannona on 2014-09-12 21:46
Fixed. Sorry again for the confusion.

@nvaccessAuto

Comment 11 by jteh on 2014-09-15 22:18
Thanks! Very nice work... again! Code review:

+++ b/source/gui/settingsDialogs.py
@@ -1174,16 +1174,36 @@ class DictionaryEntryDialog(wx.Dialog):
...
> +       self.typeLabels = {
  • I'd probably make this a constant on the class, since it'll be used every time the dialog is used. You could call it TYPE_LABELS.
  • It'd be nice if the labels had keyboard accelerators; e.g. "&Anywhere", "Whole &word", "Regular &expression".
+     self.typeLabelsOrdering = (speechDictHandler.ENTRY_TYPE_ANYWHERE, speechDictHandler.ENTRY_TYPE_WORD, speechDictHandler.ENTRY_TYPE_REGEXP)

This could be a class constant; same as above.

+ def getType(self):
...
+     if typeRadioValue == wx.NOT_FOUND:

It never hurts to be safe, but should this ever happen?
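A wx-free sketch of the review suggestions so far: `TYPE_LABELS` and `TYPE_LABELS_ORDERING` as class constants, and a `getType` that maps a radio-button index back to a type through the ordering tuple, keeping the patch's `NOT_FOUND` check. The class name, the constant values, the fallback type and the `selection` argument (standing in for the radio box's `GetSelection()`) are assumptions, not the patch's actual code.

```python
# Stand-ins for the speechDictHandler constants (values assumed: 0, 1, 2).
ENTRY_TYPE_ANYWHERE = 0
ENTRY_TYPE_REGEXP = 1
ENTRY_TYPE_WORD = 2

# wx.NOT_FOUND is -1; a local copy keeps this sketch free of a wx dependency.
NOT_FOUND = -1


class DictionaryEntryDialogSketch:
    # Built once per class rather than every time a dialog is created.
    TYPE_LABELS = {
        ENTRY_TYPE_ANYWHERE: "&Anywhere",
        ENTRY_TYPE_WORD: "Whole &word",
        ENTRY_TYPE_REGEXP: "Regular &expression",
    }
    # Display order of the radio buttons; button i corresponds to ordering[i].
    TYPE_LABELS_ORDERING = (ENTRY_TYPE_ANYWHERE, ENTRY_TYPE_WORD, ENTRY_TYPE_REGEXP)

    def __init__(self, selection):
        # Hypothetical stand-in for the radio box's current selection index.
        self._selection = selection

    @classmethod
    def choices(cls):
        """Labels in display order, as they would be passed to the radio box."""
        return [cls.TYPE_LABELS[t] for t in cls.TYPE_LABELS_ORDERING]

    def getType(self):
        typeRadioValue = self._selection
        if typeRadioValue == NOT_FOUND:
            # Defensive fallback in case no button is reported as selected.
            return ENTRY_TYPE_ANYWHERE
        return self.TYPE_LABELS_ORDERING[typeRadioValue]
```

Keeping the ordering in a tuple means the stored type values never depend on the position of a button in the dialog.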


@@ -1209,11 +1229,16 @@ class DictionaryDialog(SettingsDialog):

...

> +       self.typeLabels = {

To avoid duplication, this could be a class constant based on DictionaryEntryDialog.TYPE_LABELS. However, you'll need to remove the accelerator bits. You could do something like this just below the class statement for DictionaryDialog:

    TYPE_LABELS = {t: l.replace("&", "") for t, l in DictionaryEntryDialog.TYPE_LABELS.iteritems()}
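The one-liner above is Python 2 (`iteritems`). A standalone Python 3 version of the same idea uses `items()`; the constant values and the two dictionary names here are assumptions for illustration.

```python
# Assumed values for the pattern-type constants.
ENTRY_TYPE_ANYWHERE = 0
ENTRY_TYPE_REGEXP = 1
ENTRY_TYPE_WORD = 2

# Labels for the entry dialog's radio buttons; "&" marks the keyboard accelerator.
ENTRY_DIALOG_TYPE_LABELS = {
    ENTRY_TYPE_ANYWHERE: "&Anywhere",
    ENTRY_TYPE_WORD: "Whole &word",
    ENTRY_TYPE_REGEXP: "Regular &expression",
}

# Same labels with the accelerator markers removed, for the dictionary list view.
LIST_TYPE_LABELS = {
    entryType: label.replace("&", "")
    for entryType, label in ENTRY_DIALOG_TYPE_LABELS.items()
}

print(LIST_TYPE_LABELS[ENTRY_TYPE_WORD])  # Whole word
```

A plain `replace("&", "")` is enough here because none of these labels contains a literal ampersand (which wx writes as `&&`).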

+         self.dictList.Append((entry.comment,entry.pattern,entry.replacement,self.offOn[int(entry.caseSensitive)],self.typeLabels[int(entry.type)]))

nit: Unless I'm missing something, you shouldn't need the int() around entry.type, as it should already be an int.

Thanks again!

@nvaccessAuto

Attachment 0001-Added-Whole-Word-option-to-speech-dictionary.-1704.3.patch added by cannona on 2014-10-14 20:03
Description:
Patch 3.

@nvaccessAuto

Comment 12 by cannona (in reply to comment 11) on 2014-10-14 20:07
Replying to jteh:

+   def getType(self):
...
+       if typeRadioValue == wx.NOT_FOUND:

It never hurts to be safe, but should this ever happen?

I wasn't sure if it was ever possible for a user to deselect all radio buttons somehow, hence this bit of defensive programming.

@nvaccessAuto

Comment 14 by James Teh <jamie@... on 2014-10-21 04:48
In [159b456]:
Revision 159b456b75d67ab022c85018abe0ca74caf73b2d:
In speech dictionaries, it is now possible to specify that a pattern should only match if it is a whole word; i.e. it does not occur as part of a larger word.

Re #1704.

@nvaccessAuto

Comment 15 by James Teh <jamie@... on 2014-10-21 04:49
In [e14c2ff]:
Revision e14c2ffaa390731ea0c9b785d272d549867e5050:
Merge branch 't1704' into next

Incubates #1704.

Changes:
Added labels: incubating
@nvaccessAuto

Comment 16 by James Teh <jamie@... on 2014-11-06 04:46
In [2c5f529]:
Revision 2c5f5299e7b9d896015305ff9a00e86fa0b50192:
In speech dictionaries, it is now possible to specify that a pattern should only match if it is a whole word; i.e. it does not occur as part of a larger word.

Fixes #1704.

Changes:
Removed labels: incubating
State: closed
@nvaccessAuto

Comment 17 by jteh on 2014-11-06 04:47
Changes:
Milestone changed from None to 2014.4

nvaccessAuto added this to the 2014.4 milestone on Nov 10, 2015