Words.contains returns wrong answers #15481

nathanncohen · 2013-12-03T14:32:42Z

sage: Word('12') in Words('12')
False
sage: Words(2,5).random_element() in Words(2, finite=False)
True
sage: words.FibonacciWord() in Words([0,1], infinite=False)
True
sage: Words('123')('121212') in Words(2)
False
sage: Words('123')('121212') in Words('1234')('121212')
False
sage: Words('12B')('121212') in Words('1234')('121212')
False
sage: words.FibonacciWord() in Words([0,1,2])
False

I have no idea how this can be fixed, so if a Word guy can look at this.. O_o

Nathann

CC: @seblabbe @videlec @sagetrac-sage-combinat @saliola @sagetrac-jakobkroeker

Component: combinatorics

Issue created by migration from https://trac.sagemath.org/ticket/15481

The text was updated successfully, but these errors were encountered:

seblabbe · 2013-12-12T10:59:08Z

Replying to @nathanncohen:

First, I want to say that I think all these problems could be solved or get more clean by first finishing the ticket at #12224. It is a big patch, but I am ready to put time on it to continue the review. Vincent : what's the status of it? Do you think we will manage to get your great improvement in Sage anytime soon?

Now, I am answering to each of the problem mention in the ticket according to the choices that we made some years ago. If any of the choice we made is bad, then please tell!

sage: Word('12') in Words('12')
False

The default alphabet when using just Word('12') is the set of Python object, because we wanted any list to be turned into a word:

sage: Word('12').parent().alphabet()
Set of Python objects of type 'object'

Of course, by default, we could have computed the alphabet from the letters appearing in the word, but this is not efficient when the word is very long or when the word is infinite. So, we decided it is to the user to specify the alphabet. Also, sometimes the word lives in a larger alphabet, so the alphabet can not be computed from the word.

In short, the user may specify the alphabet. And this fixes the above problem:

sage: Word('12', alphabet=['1','2']) in Words('12')
True
sage: Word('12', alphabet=['1','2', '3']) in Words('123')
True

sage: Words(2,5).random_element() in Words(2, finite=False)
True

I agree, this is a bug: a finite word must not be in a set of infinite words.

sage: Words(2,5)
Words of length 5 over {1, 2}
sage: Words(2, finite=False)
Infinite Words over {1, 2}

sage: words.FibonacciWord() in Words([0,1], infinite=False)
True

I agree this is a bug: it seems the infinite argument and sometimes the finite argument is not taken into account:

sage: Words(2,infinite=True)
Words over {1, 2}
sage: Words(2,infinite=False)
Words over {1, 2}
sage: Words(2,finite=True)
Words over {1, 2}
sage: Words(2,finite=False)
Infinite Words over {1, 2}

sage: Words('123')('121212') in Words(2)
False

This is OK. The alphabet of integers and of str are considered different.

sage: Words('12')
Words over {'1', '2'}
sage: Words(2)
Words over {1, 2}
sage: Words(2) == Words('12')
False

sage: Words('123')('121212') in Words('1234')('121212')
False
sage: Words('12B')('121212') in Words('1234')('121212')
False

I don't know if the above test what you meant. The above in tests whether the left is a letter of the right:

sage: w = Word('123')
sage: w.__contains__?
...
Definition: w.__contains__(self, a)
Docstring:
   Test whether "a" is a letter of "self".
...

Maybe you meant this :

sage: Words('12B')('121212') in Words('1234')
False
sage: Words('123')('121212') in Words('1234')
False
sage: Words('1234')('121212') in Words('1234')
True

Yes, this is a choice we made. Maybe it is wrong. We decided w in W if and only if W is the parent of w. It makes the code more efficient but also less flexible... Testing if some words is in some set of words appears so often in the code that we decided to make it like that. I think the __contains__ could be changed to be more flexible because we don't have to use the __contains__ to test for the parent of some words.

sage: words.FibonacciWord() in Words([0,1,2])
False

Same comment as above: w in W if and only if W is the parent of w. Should this be fixed?

sage: words.FibonacciWord() in Words([0,1])
True
sage: words.FibonacciWord() in Words([0,1,2])
False

I have no idea how this can be fixed, so if a Word guy can look at this.. O_o

SUMMARY

So, in summary, things must be fixed according to

the behavior of the input finite=False, etc. and
the behavior of __contains__.

Do I miss something?

About the behavior of contains

So, what should __contains__ be?

It seems you would like:

def __contains__(self, word):
    A = self.alphabet()
    return all(letter in A for letter in word)

This is much too long computations for words we know the alphabet. Also, it doesn't work for infinite words. But if it is what the user expect, why not?

Maybe something more wise could be:

def __contains__(self, word):
    A = self.alphabet()
    B = word.parent().alphabet()
    return B is subset of A

This will make code more efficient, but the user must provide an alphabet to the word. Thus, this won't work (if the default alphabet of Word remains the set of all Python objects):

sage: Word('111223') in Words('123')

Also, I remember we had this idea of using subset because of the fact that alphabet are ordered and in some cases (I don't remember exactly which case), testing is subset was not enough. But, I think we can forget about this.

For reference, the actual code is something like this :

def __contains__(self, word):
    A = self.alphabet()
    B = word.parent().alphabet()
    return B is A

So are you happy with option 2? Any other options?

nathanncohen · 2013-12-12T12:43:02Z

comment:3

Yoooooooooooooooo !!

First, I want to say that I think all these problems could be solved or get more clean by first finishing the ticket at #12224.

Wow. And it is only 2 years old !!!

It is a big patch, but I am ready to put time on it to continue the review. Vincent : what's the status of it? Do you think we will manage to get your great improvement in Sage anytime soon?

Yeah. Like this months ?.. Those are wrong results, guys !

The default alphabet when using just Word('12') is the set of Python object, because we wanted any list to be turned into a word:

sage: Word('12') in Words('12')
False

Who cares ? Anybody expects to see "True", there. The question is "how", not what it should return.

sage: Word('12').parent().alphabet()
Set of Python objects of type 'object'

Very cool. Now if it makes Sage return answers like that, it should be discarded as a bad design choice.

Of course, by default, we could have computed the alphabet from the letters appearing in the word, but this is not efficient when the word is very long or when the word is infinite.

Well, I see no problem in having sage raise a "IDontKnowException" when asked to tell if an infinite (i.e. lazy) word belongs to a specific set of words.

But once more, I don't care whether something is slow or fast. The default behaviour should be mathematically correct, and you are free to add one thousand parameters if you want to make it faster at the cost of correction. For as long as there are warnings everywhere. Those results are mathematically wrong, do we at least agree on that ?

So, we decided it is to the user to specify the alphabet.

Very cool. Now 1) the user has no way to know that this is what he is expected to do 2) the answer raised by the default behaviour are still wrong.

Come on guys ! Let's even do that : if the alphabet is not explicitly given by the user, you HAVE to return mathematically sound answers. If the user gives the alphabet explicitly, then you can do whatever you want (add warnings everywhere) to return more complicated and faster results if you need. But you can't make word equality depend on something which is NOT canonical and NOT explicitly given.

In short, the user may specify the alphabet. And this fixes the above problem:
sage: Word('12', alphabet=['1','2']) in Words('12')
True
sage: Word('12', alphabet=['1','2', '3']) in Words('123')
True

Guys. You are comparing WORDS. Not pairs "word, alphabet". So compare words, unless explicitly asked to do something different.

sage: Words(2,5).random_element() in Words(2, finite=False)
True
I agree, this is a bug: a finite word must not be in a set of infinite words.

Yeah, that's because of the class hierarchy. It's not very easy to work with :-/

I agree this is a bug: it seems the infinite argument and sometimes the finite argument is not taken into account:
sage: Words(2,infinite=True)
Words over {1, 2}
sage: Words(2,infinite=False)
Words over {1, 2}
sage: Words(2,finite=True)
Words over {1, 2}
sage: Words(2,finite=False)
Infinite Words over {1, 2}

Note that #15479 fixed something here. Just adding a word to the __repr__ function.

sage: Words('123')('121212') in Words(2)
False
This is OK. The alphabet of integers and of str are considered different.

Oh. Right. Tricky, but right. You can have two words that are displayed as "123" but the letters of one are integers, the others are letters. Hmmm... Tricky, but right I guess.

Okay, I continued reading the other answers and we have to make something clear : when you say that something is a word, it is a word. It is not "a word defined on an alphabet". If you want to have an object defined by both a word and an alphabet, this thing has no right to be displayed as "word: 123". It has to be displayed as "word: 123 on alphabet=['1','2','3']". So you can add whatever class you think are useful for you, for sage, for everybody on earth but what you define as a word is expected to be a word, and nothing else.
And equality of words if perfectly defined. And a set of words is perfectly define. So there is no problem.

If you think that the mathematical definitions do not yield sufficiently fast algorithm that's a problem indeed, and you can fix it however you want for as long as nobody can be led to misunderstand the results given. But a word is a sequence of letter, period. A pair "Word, Alphabet" is something else. But sage users have no way to guess that they compare alphabets when they compare words, and they do not have to expect it anyway because comparing words and comparing sets of words is CLEAR. So let's make Sage return true answers first, then there will be no problem to make things faster.

Yes, this is a choice we made. Maybe it is wrong. We decided w in W if and only if W is the parent of w. It makes the code more efficient but also less flexible...

IT MAKES THE RESULTS INCORRECT.

Testing if some words is in some set of words appears so often in the code that we decided to make it like that. I think the __contains__ could be changed to be more flexible because we don't have to use the __contains__ to test for the parent of some words.
sage: words.FibonacciWord() in Words([0,1,2])
False
Same comment as above: w in W if and only if W is the parent of w. Should this be fixed?

Give this to a colleague of yours who works on words and does not use Sage, ask what he thinks. His answer is what Sage should answer. And it's crystal clear.

sage: words.FibonacciWord() in Words([0,1])
True
sage: words.FibonacciWord() in Words([0,1,2])
False

Come on guy... "Words" are sets, and you are telling me that if A contains w and B is larger than A I can not infer that B contains w ? honestly ?

So, in summary, things must be fixed according to

the behavior of the input finite=False, etc. and

the behavior of __contains__.

Word equality and containment should be word equality and containment. The alphabets are NOT involved in any of these things, and Sage should not involve them either to decide whether something is true or false. Of course you can cache the alphabet USED by a word if you need, or whatever suits you to make things faster, but all these operations are defined mathematically and you just don't return wrong answers because it makes the algorithms faster.

If you need to have something representing a word and an alphabet, this thing's __repr__ should indicate this information. If only to say "word: 123 defined on an alphabet" without explicitly giving the alphabet. And no "word" will ever be equal to a "word defined on an alphabet".

So, what should __contains__ be?

It seems you would like:
def __contains__(self, word):
    A = self.alphabet()
    return all(letter in A for letter in word)
This is much too long computations for words we know the alphabet. Also, it doesn't work for infinite words. But if it is what the user expect, why not?

Indeed. That's the only mathematical truth. Note that you can cache the set of letters used in a word if you think it helps.

Maybe something more wise could be:
def __contains__(self, word):
    A = self.alphabet()
    B = word.parent().alphabet()
    return B is subset of A
This will make code more efficient, but the user must provide an alphabet to the word. Thus, this won't work (if the default alphabet of Word remains the set of all Python objects):

That is NOT word containment. That is not even the equality of pairs "(word, alphabet)". Right now, you have this :

sage: w1=Words('123')('12')  
sage: w2=Words('12345')('12')
sage: w1 == w2
True
sage: w1 in Words('123')
True
sage: w2 in Words('123')
False

How can you expect ANYBODY to agree with that ?

For reference, the actual code is something like this :

def __contains__(self, word):
    A = self.alphabet()
    B = word.parent().alphabet()
    return B is A

I can't even understand how a code like that can be 1) written 2) reviewed.

Nathann

nathanncohen · 2014-08-16T13:47:06Z

comment:7

ping ?...

sagetrac-jakobkroeker · 2017-03-03T22:28:08Z

comment:8

in recent sage (7.6.beta4) some of the issues above seems fixed;

The following examples run forever

words.FibonacciWord() in Words([0,1])
words.FibonacciWord() in Words([0,1,2])

nathanncohen mannequin added this to the sage-6.1 milestone Dec 3, 2013

nathanncohen mannequin added c: combinatorics labels Dec 3, 2013

This comment has been minimized.

Sign in to view

sagetrac-vbraun-spam mannequin modified the milestones: sage-6.1, sage-6.2 Jan 30, 2014

sagetrac-vbraun-spam mannequin modified the milestones: sage-6.2, sage-6.3 May 6, 2014

sagetrac-vbraun-spam mannequin modified the milestones: sage-6.3, sage-6.4 Aug 10, 2014

mkoeppe removed this from the sage-6.4 milestone Dec 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Words.contains returns wrong answers #15481

Words.contains returns wrong answers #15481

nathanncohen mannequin commented Dec 3, 2013

This comment has been minimized.

seblabbe commented Dec 12, 2013

nathanncohen mannequin commented Dec 12, 2013

nathanncohen mannequin commented Aug 16, 2014

sagetrac-jakobkroeker mannequin commented Mar 3, 2017

Words.__contains__ returns wrong answers #15481

Words.__contains__ returns wrong answers #15481

Comments

nathanncohen mannequin commented Dec 3, 2013

This comment has been minimized.

seblabbe commented Dec 12, 2013

SUMMARY

About the behavior of contains

nathanncohen mannequin commented Dec 12, 2013

nathanncohen mannequin commented Aug 16, 2014

sagetrac-jakobkroeker mannequin commented Mar 3, 2017

Words.contains returns wrong answers #15481

Words.contains returns wrong answers #15481