
Set WordPairScore to prefer maximum scoring pairs (rather than random).

Previously, WordPairScore took the last value seen for a potential translation whenever several source words mapped to the same target word.
This didn't seem correct.

For example (using the dictionary.csv from the tutorial):

He abstained from any further comments.
Se abstuvo de hacer mas comentarios.

The words 'abstained' and 'any' can both map to 'se', but the score for 'abstained' is 0.0138 while the score for 'any' is 0.0015.
The current code returns the smaller value because 'any' appears later in the sentence and overwrites the earlier score.

This commit fixes the issue by keeping, for each candidate translation, the maximum score registered within the sentence.
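
The difference can be reproduced outside yalign with a minimal sketch. It is written for Python 3 (so it uses `items()` where the commit uses `iteritems()`), the `translations` table is a hypothetical two-entry excerpt holding only the scores quoted above, and the helper names `collect_last` and `collect_max` are illustrative rather than part of the library.

```python
# Hypothetical excerpt of a translation table: source word -> {target word: score}.
# Only the two scores quoted in the commit message come from the source.
translations = {
    "abstained": {"se": 0.0138},
    "any": {"se": 0.0015},
}

sentence_a = ["He", "abstained", "from", "any", "further", "comments"]


def collect_last(sentence, translations):
    """Old behaviour: dict.update() lets a later word overwrite an earlier score."""
    values = {}
    for word in sentence:
        word = word.lower()
        if word in translations:
            values.update(translations[word])
    return values


def collect_max(sentence, translations):
    """New behaviour: keep the highest score seen for each target word."""
    values = {}
    for word in sentence:
        word = word.lower()
        if word in translations:
            for target, score in translations[word].items():
                values[target] = max(score, values.get(target, 0.0))
    return values


print(collect_last(sentence_a, translations))  # {'se': 0.0015}
print(collect_max(sentence_a, translations))   # {'se': 0.0138}
```

With the old logic the result depends on word order in the sentence; with the new logic it does not.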
DrDub committed Dec 24, 2013
1 parent db67578 commit 23a50512155bcb65fb138cfacd0dd364b924da37
Showing with 2 additions and 1 deletion.
  1. +2 −1 yalign/wordpairscore.py
@@ -51,7 +51,8 @@ def __call__(self, sentence_a, sentence_b):
             word_a = word_a.lower()
             set_a.add(word_a)
             if word_a in self.translations:
-                values.update(self.translations[word_a])
+                for w, v in self.translations[word_a].iteritems():
+                    values[w] = max(v, values.get(w, 0.0))
         for word_b in sentence_b:
             word_b = word_b.lower()
             if word_b in values:
