Match whole word in pronunciation dictionaries #1704

Closed
nvaccessAuto opened this Issue Aug 3, 2011 · 14 comments

nvaccessAuto commented Aug 3, 2011

Reported by dczajka on 2011-08-03 18:34
There should be a "match whole word only" option when creating entries in the pronunciation dictionaries. This is already possible with regular expressions, but it seems a common enough requirement, even for people not well versed in regex syntax, that the dialog should provide it directly. A temporary fix would be to document the regex syntax for this in the User Guide, as a brief example of basic regex use.
Blocking #2220, #4450
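
For reference, the regex workaround mentioned in the report usually relies on the `\b` word-boundary anchor. The snippet below is only a sketch in plain Python, outside NVDA; the helper name `whole_word_pattern` is made up for illustration.

```python
import re

def whole_word_pattern(word):
    """Return a compiled regex that matches `word` only as a whole word."""
    # re.escape keeps characters such as "." or "+" literal;
    # \b anchors the match at a word boundary on each side.
    return re.compile(r"\b" + re.escape(word) + r"\b")

pattern = whole_word_pattern("cat")
# "cat" inside "concatenate" is left alone; the standalone words are replaced.
print(pattern.sub("feline", "cat concatenate cat."))
```

In the dictionary dialog, the equivalent would be to enter something like `\bcat\b` as the pattern with the regular expression option enabled. Note that `\b` only behaves as expected when the pattern starts and ends with a word character.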

nvaccessAuto commented Sep 10, 2014

Comment 4 by jteh on 2014-09-10 22:17
Suggested implementation for anyone who wants to give this a go:

  • Currently, the speech dict format contains two numeric fields at the end. One is a flag indicating case sensitivity and the other is a flag indicating whether the pattern is a regular expression. Rather than adding a new field, I suggest the last field accept an additional value (2) indicating a word match. In the code, you'll obviously need to treat this as an int instead of a bool and there should be int constants for the three types of pattern.
  • When a word pattern is handled, you'll need to build a regular expression which only matches at word boundaries. This should be simple enough.
  • In the GUI, the Regular expression check box should become a radio button to choose from the three types of pattern.
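
The first two points above could be sketched roughly as follows. This is an illustration, not the code that was eventually committed: the constant values are inferred from the suggestion (0 and 1 keep the meaning of the old boolean flag, 2 is the new word match), and `compile_entry` is a hypothetical helper name.

```python
import re

# Assumed values: 0 and 1 mirror the old boolean "regular expression" flag,
# and 2 is the additional word-match value suggested above.
ENTRY_TYPE_ANYWHERE = 0
ENTRY_TYPE_REGEXP = 1
ENTRY_TYPE_WORD = 2

def compile_entry(pattern, entry_type, case_sensitive):
    """Hypothetical helper: turn one dictionary entry into a compiled regex."""
    entry_type = int(entry_type)  # treated as an int rather than a bool
    if entry_type == ENTRY_TYPE_REGEXP:
        source = pattern  # the user supplied a regex; use it as-is
    elif entry_type == ENTRY_TYPE_WORD:
        # Literal text that must sit between word boundaries.
        source = r"\b" + re.escape(pattern) + r"\b"
    elif entry_type == ENTRY_TYPE_ANYWHERE:
        source = re.escape(pattern)  # literal text, matched anywhere
    else:
        raise ValueError("unknown entry type: %r" % entry_type)
    flags = re.UNICODE if case_sensitive else re.UNICODE | re.IGNORECASE
    return re.compile(source, flags)

word = compile_entry("nvda", ENTRY_TYPE_WORD, case_sensitive=False)
print(word.sub("N V D A", "NVDA and nvdaHelper"))
```

Keeping 0 and 1 for the existing meanings means dictionary files written before the change would still load with the same behaviour.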

nvaccessAuto commented Sep 11, 2014

Comment 5 by blindbhavya on 2014-09-11 10:21
Hi.
It would be great if someone could implement this.
I have felt its need quite a few times. I will CC myself so that I receive updates on any progress made on this ticket.
Thanks.

nvaccessAuto commented Sep 12, 2014

Attachment 0001-Added-Whole-Word-option-to-speech-dictionary.-1704.patch added by cannona on 2014-09-12 21:23
Description:
First patch.

nvaccessAuto commented Sep 12, 2014

Comment 8 by cannona on 2014-09-12 21:26
Would appreciate any feedback that anyone might have.

nvaccessAuto commented Sep 12, 2014

Comment 9 by cannona on 2014-09-12 21:33
Sorry, just realized that there are still a couple bugs in this. Will fix and then upload an amended patch.

nvaccessAuto commented Sep 12, 2014

Attachment 0001-Added-Whole-Word-option-to-speech-dictionary.-1704.2.patch added by cannona on 2014-09-12 21:44
Description:
Second patch.

nvaccessAuto commented Sep 12, 2014

Comment 10 by cannona on 2014-09-12 21:46
Fixed. Sorry again for the confusion.

nvaccessAuto commented Sep 15, 2014

Comment 11 by jteh on 2014-09-15 22:18
Thanks! Very nice work... again! Code review:

+++ b/source/gui/settingsDialogs.py
@@ -1174,16 +1174,36 @@ class DictionaryEntryDialog(wx.Dialog):
...
+       self.typeLabels = {
  • I'd probably make this a constant on the class, since it'll be used every time the dialog is used. You could call it TYPE_LABELS.
  • It'd be nice if the labels had keyboard accelerators; e.g. "&Anywhere", "Whole &word", "Regular &expression".
+     self.typeLabelsOrdering = (speechDictHandler.ENTRY_TYPE_ANYWHERE, speechDictHandler.ENTRY_TYPE_WORD, speechDictHandler.ENTRY_TYPE_REGEXP)

This could be a class constant; same as above.

+ def getType(self):
...
+     if typeRadioValue == wx.NOT_FOUND:

It never hurts to be safe, but should this ever happen?


@@ -1209,11 +1229,16 @@ class DictionaryDialog(SettingsDialog):

...

+       self.typeLabels = {

To avoid duplication, this could be a class constant based on DictionaryEntryDialog.TYPE_LABELS. However, you'll need to remove the accelerator bits. You could do something like this just below the class statement for DictionaryDialog:

    TYPE_LABELS = {t: l.replace("&", "") for t, l in DictionaryEntryDialog.TYPE_LABELS.iteritems()}
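
To show the idea in isolation, here is a small self-contained sketch. The two classes are plain stand-ins with no wx dependency, the constant values are assumed, and `.items()` is used instead of `.iteritems()` so the snippet also runs on Python 3.

```python
# Assumed values for the three entry types; stand-ins for speechDictHandler's constants.
ENTRY_TYPE_ANYWHERE, ENTRY_TYPE_REGEXP, ENTRY_TYPE_WORD = 0, 1, 2

class DictionaryEntryDialog(object):
    # Radio button labels; "&" marks the keyboard accelerator.
    TYPE_LABELS = {
        ENTRY_TYPE_ANYWHERE: "&Anywhere",
        ENTRY_TYPE_WORD: "Whole &word",
        ENTRY_TYPE_REGEXP: "Regular &expression",
    }

class DictionaryDialog(object):
    # Same labels with the accelerator markers stripped, for display in the list.
    TYPE_LABELS = {t: l.replace("&", "") for t, l in DictionaryEntryDialog.TYPE_LABELS.items()}

print(DictionaryDialog.TYPE_LABELS[ENTRY_TYPE_WORD])
```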

+         self.dictList.Append((entry.comment,entry.pattern,entry.replacement,self.offOn[int(entry.caseSensitive)],self.typeLabels[int(entry.type)]))

nit: Unless I'm missing something, you shouldn't need the int() around entry.type, as it should already be an int.

Thanks again!

nvaccessAuto commented Oct 14, 2014

Attachment 0001-Added-Whole-Word-option-to-speech-dictionary.-1704.3.patch added by cannona on 2014-10-14 20:03
Description:
Patch 3.

nvaccessAuto commented Oct 14, 2014

Comment 12 by cannona (in reply to comment 11) on 2014-10-14 20:07
Replying to jteh:

+   def getType(self):
...
+       if typeRadioValue == wx.NOT_FOUND:

It never hurts to be safe, but should this ever happen?

I wasn't sure if it was ever possible for a user to deselect all radio buttons somehow, hence this bit of defensive programming.

nvaccessAuto commented Oct 21, 2014

Comment 14 by James Teh <jamie@... on 2014-10-21 04:48
In [159b456]:

In speech dictionaries, it is now possible to specify that a pattern should only match if it is a whole word; i.e. it does not occur as part of a larger word.

Re #1704.

nvaccessAuto commented Oct 21, 2014

Comment 15 by James Teh <jamie@... on 2014-10-21 04:49
In [e14c2ff]:

Merge branch 't1704' into next

Incubates #1704.

Changes:
Added labels: incubating

nvaccessAuto commented Nov 6, 2014

Comment 16 by James Teh <jamie@... on 2014-11-06 04:46
In [2c5f529]:

In speech dictionaries, it is now possible to specify that a pattern should only match if it is a whole word; i.e. it does not occur as part of a larger word.

Fixes #1704.

Changes:
Removed labels: incubating
State: closed

nvaccessAuto commented Nov 6, 2014

Comment 17 by jteh on 2014-11-06 04:47
Changes:
Milestone changed from None to 2014.4