New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wordnet lowest_common_hypernyms() returning different results in Python 2.7 vs 3.3 (doctest failing in py 3.3 on OS X) #458

Closed
dougalg opened this Issue Aug 12, 2013 · 7 comments

Comments

Projects
None yet
5 participants
@dougalg
Contributor

dougalg commented Aug 12, 2013

As in pull request 451, sometimes wordnet lch returns differently ordered lists in python 2.7 vs 3.3 on OS X.

... line 45, in wordnet_lch.doctest
Failed example:
    wn.synset('woman.n.01').lowest_common_hypernyms(wn.synset('girlfriend.n.02'), use_min_depth=True)
Expected:
    [Synset('woman.n.01'), Synset('organism.n.01')]
Got:
    [Synset('organism.n.01'), Synset('woman.n.01')]

Possible fix is to return a set rather than a list, but that seems to be a major change for a small problem. Any suggestions? I'll try finding the source of the difference, anyway.

@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike

kmike Aug 12, 2013

Member

It is most likely caused by iteration over dict or set somewhere. I think this can be solved either by sorting the results or by using OrderedDict instead of dict where it happens.

Member

kmike commented Aug 12, 2013

It is most likely caused by iteration over dict or set somewhere. I think this can be solved either by sorting the results or by using OrderedDict instead of dict where it happens.

@fcbond

This comment has been minimized.

Show comment
Hide comment
@fcbond

fcbond Aug 20, 2013

Contributor

I think that conceptually, it should be a set: there is no specified order. Therefore making it a set is a better solution than arbitrarily ordering them: the same solution as in 451.

Contributor

fcbond commented Aug 20, 2013

I think that conceptually, it should be a set: there is no specified order. Therefore making it a set is a better solution than arbitrarily ordering them: the same solution as in 451.

@pquentin

This comment has been minimized.

Show comment
Hide comment
@pquentin

pquentin Aug 21, 2013

Contributor

Lines 511 and 514 indeed iterate over a set.

As said by @fcbond, @stevenbird suggested that the function should return a set in #451, but @kmike suggests sorting. In both cases, it's a simple fix: what should be done?

Since the behavior has changed (see use_min_depth), it could be a good idea to explicitely change the interface too.

Contributor

pquentin commented Aug 21, 2013

Lines 511 and 514 indeed iterate over a set.

As said by @fcbond, @stevenbird suggested that the function should return a set in #451, but @kmike suggests sorting. In both cases, it's a simple fix: what should be done?

Since the behavior has changed (see use_min_depth), it could be a good idea to explicitely change the interface too.

@kmike

This comment has been minimized.

Show comment
Hide comment
@kmike

kmike Aug 21, 2013

Member

I'm fine with lowest_common_hypernyms returning set. I think it is better to be consistent: if we return sets then also return sets in Synset.common_hypernyms and maybe other methods.

Member

kmike commented Aug 21, 2013

I'm fine with lowest_common_hypernyms returning set. I think it is better to be consistent: if we return sets then also return sets in Synset.common_hypernyms and maybe other methods.

@stevenbird

This comment has been minimized.

Show comment
Hide comment
@stevenbird

stevenbird Aug 22, 2013

Member

Yes, all of the synset and lemma methods would need to return sets. In most cases, the sequence comes straight from the wordnet file, so the above-mentioned problem would never arise, and we are doing extra processing for no reason (other than theoretical correctness). A downside is that user code would become non-deterministic. I prefer a more conservative fix, namely sorting the result of lowest_common_hypernyms().

Member

stevenbird commented Aug 22, 2013

Yes, all of the synset and lemma methods would need to return sets. In most cases, the sequence comes straight from the wordnet file, so the above-mentioned problem would never arise, and we are doing extra processing for no reason (other than theoretical correctness). A downside is that user code would become non-deterministic. I prefer a more conservative fix, namely sorting the result of lowest_common_hypernyms().

@fcbond

This comment has been minimized.

Show comment
Hide comment
@fcbond

fcbond Aug 22, 2013

Contributor

On Thu, Aug 22, 2013 at 4:46 PM, Steven Bird notifications@github.com wrote:

Yes, all of the synset and lemma methods would need to return sets.

Do you really mean all? Synsets and lemmas are ordered in wordnet by
frequency and various heuristics. So wn.synsets('dog') is definitely
a list.

For obvious reasons, hypernym_paths is definitely a list.

In most cases, the sequence comes straight from the wordnet file, so the above-mentioned problem would never arise, and we are doing extra processing for no reason (other than theoretical correctness).

In cases where the ordering is meaningful, of course we should not change it.

A downside is that user code would become non-deterministic.

Code that iterates through lowest_common_hypernyms() would possibly change.

I prefer a more conservative fix, namely sorting the result of lowest_common_hypernyms().

We could impose an order and document it (maybe depth, then frequency,
then alphabetical). I don't feel we gain so much from this, but there
may well be use cases I haven't thought of :-). If we change our unit
test to compare sets we can lose the 2.7/3 difference.

Francis Bond http://www3.ntu.edu.sg/home/fcbond/
Division of Linguistics and Multilingual Studies
Nanyang Technological University

Contributor

fcbond commented Aug 22, 2013

On Thu, Aug 22, 2013 at 4:46 PM, Steven Bird notifications@github.com wrote:

Yes, all of the synset and lemma methods would need to return sets.

Do you really mean all? Synsets and lemmas are ordered in wordnet by
frequency and various heuristics. So wn.synsets('dog') is definitely
a list.

For obvious reasons, hypernym_paths is definitely a list.

In most cases, the sequence comes straight from the wordnet file, so the above-mentioned problem would never arise, and we are doing extra processing for no reason (other than theoretical correctness).

In cases where the ordering is meaningful, of course we should not change it.

A downside is that user code would become non-deterministic.

Code that iterates through lowest_common_hypernyms() would possibly change.

I prefer a more conservative fix, namely sorting the result of lowest_common_hypernyms().

We could impose an order and document it (maybe depth, then frequency,
then alphabetical). I don't feel we gain so much from this, but there
may well be use cases I haven't thought of :-). If we change our unit
test to compare sets we can lose the 2.7/3 difference.

Francis Bond http://www3.ntu.edu.sg/home/fcbond/
Division of Linguistics and Multilingual Studies
Nanyang Technological University

@stevenbird

This comment has been minimized.

Show comment
Hide comment
@stevenbird

stevenbird Sep 19, 2013

Member

So, we will go with sorting the result of lowest_common_hypernyms().

Member

stevenbird commented Sep 19, 2013

So, we will go with sorting the result of lowest_common_hypernyms().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment