Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add is_sturmian_factor, is_tangent methods for finite words #9877

Closed
sagetrac-tmonteil mannequin opened this issue Sep 8, 2010 · 39 comments
Closed

Add is_sturmian_factor, is_tangent methods for finite words #9877

sagetrac-tmonteil mannequin opened this issue Sep 8, 2010 · 39 comments

Comments

@sagetrac-tmonteil
Copy link
Mannequin

sagetrac-tmonteil mannequin commented Sep 8, 2010

Add 3 methods to sage/combinat/words/finite_word.py:

  1. sturmian_desubstitute_as_possible
  2. is_sturmian_factor
  3. is_tangent

Add a protecting is_tangent method to sage/combinat/words/paths.py.

sage: w = Word('01110110110111011101',alphabet='01')
sage: w.is_tangent()                  
True

Depends on #8739.

Apply:

CC: @sagetrac-sage-combinat @sagetrac-abmasse @seblabbe

Component: combinatorics

Keywords: sturmian word, tangent word

Author: Thierry Monteil, Frédéric Chapoton

Reviewer: Alexandre Blondin Massé, Sébastien Labbé, Vincent Delecroix

Merged: sage-5.9.beta1

Issue created by migration from https://trac.sagemath.org/ticket/9877

@sagetrac-tmonteil sagetrac-tmonteil mannequin added this to the sage-5.8 milestone Sep 8, 2010
@sagetrac-tmonteil sagetrac-tmonteil mannequin self-assigned this Sep 8, 2010
@sagetrac-tmonteil

This comment has been minimized.

@sagetrac-abmasse
Copy link
Mannequin

sagetrac-abmasse mannequin commented Oct 1, 2010

comment:2

I'm testing your patch in the next two hours. Before testing it explicitly, just some comments:

  1. Could you add a reference where one can find the proof that a word is finite sturmian if and only if you can desubstitute it to the empy word?
  2. As I told you when you were in Montreal, I think I prefer the name is_finite_sturmian (or just is_sturmian) over the name is_sturmian_factor. I feel that the last one implies an argument like in w.is_sturmian_factor(u) and it seems to be used in many articles (just googling it, you find a big list).
  3. Not very important, but there are two comment # print <something> that should be removed from the code in the function is_sturmian_factor.

I'm waiting for sage-combinat to install and I'll test and look more thoroughly at your code.

@sagetrac-abmasse
Copy link
Mannequin

sagetrac-abmasse mannequin commented Oct 1, 2010

comment:3

There is another problem with your function:

TypeError: Your word must be defined on a binary alphabet
sage: 
sage: 
sage: Word('ababab', alphabet='ab').is_sturmian_factor()
True
sage: Word('ababab').is_sturmian_factor()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/Users/alexandre/Applications/sage/devel/sage-combinat/sage/combinat/words/<ipython console> in <module>()

/Users/alexandre/Applications/sage/local/lib/python2.6/site-packages/sage/combinat/words/finite_word.pyc in is_sturmian_factor(self)
   4197         - Thierry Monteil
   4198         """
-> 4199         return self.sturmian_desubstitute_as_possible().is_empty()
   4200 
   4201 

/Users/alexandre/Applications/sage/local/lib/python2.6/site-packages/sage/combinat/words/finite_word.pyc in sturmian_desubstitute_as_possible(self)
   4086         W = self.parent()
   4087         if (W.size_of_alphabet() != 2):
-> 4088             raise TypeError, 'Your word must be defined on a binary alphabet'
   4089         alphabet = W.alphabet()
   4090         if self.is_empty():

TypeError: Your word must be defined on a binary alphabet

I think this shouldn't be the case. In fact, this is interesting, because it raises another problem I had when working on another patch. What I suggest is that you could first verify if the parent has size 2 and, if it's not the case, you check if its real alphabet has size 2 by using the expression
len(set(self)) == 2

I think the clean solution would be to replace the deprecated function alphabet() by a new one that would compute the real alphabet in question, so that there would be two kinds of alphabet:

  1. The one corresponding to the parent of the word (for instance aaab might be a word defined on the alphabet {a,b,c}).
  2. The one corresponding to the alphabet realized by the word Alph(aaab) = {a,b}

The first one is called by w.parent().alphabet() while the second one would be obtained from w.alphabet(). But this is for another ticket.

What I suggest is that you simply add the len(set(w)) == 2 condition to your code and we'll clean it eventually.

@sagetrac-abmasse
Copy link
Mannequin

sagetrac-abmasse mannequin commented Oct 1, 2010

comment:4

I might be wrong, but it seems that the code for the desubstitute_as_possible is long and should be cut into smaller functions that maybe already exist in Sage. Could you provide me the pseudocode of the function?

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Oct 1, 2010

comment:5

Replying to @sagetrac-abmasse:

  1. Could you add a reference where one can find the proof that a word is finite sturmian if and only if you can desubstitute it to the empy word?

Sure. This is a well known result (though there are many variants of this idea, i do not know whether this exact one is written somewhere), but i will add at least in the Arnoux chapter of the Pytheas Fogg book. I guess i will also add a reference to a recent article of Smillie and Ulcigrai which explains this in a nice way.

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Oct 1, 2010

comment:6

Replying to @sagetrac-abmasse:

  1. Not very important, but there are two comment # print <something> that should be removed from the code in the function is_sturmian_factor.

ok.

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Oct 1, 2010

comment:7

Replying to @sagetrac-abmasse:

  1. As I told you when you were in Montreal, I think I prefer the name is_finite_sturmian (or just is_sturmian) over the name is_sturmian_factor. I feel that the last one implies an argument like in w.is_sturmian_factor(u) and it seems to be used in many articles (just googling it, you find a big list).

Actually, those words are the balanced one, but the method is_balanced already exists with a slower (quadratic complexity) implementation (that implements the definition of the balanced word). The first name of my method was is_factor_of_a_sturmian_word, which describes precisely the method but is too long.

I do not really like is_finite_sturmian, and i am completely opposed to the name is_sturmian because the word Sturmian is reserved for infinite words (and this is not the role of sage to change historical notations). Also is_finite_sturmian will not be well positioned in the automatic completion.

Another possibility could be to add a parameter algorithm with values default, desubstitution or definition and merge this method into the existing slow method is_balanced (with the desubstitution algorithm only applying for 1-balanced words).

@sagetrac-abmasse
Copy link
Mannequin

sagetrac-abmasse mannequin commented Oct 1, 2010

comment:8

Replying to @sagetrac-tmonteil:

Another possibility could be to add a parameter algorithm with values default, desubstitution or definition and merge this method into the existing slow method is_balanced (with the desubstitution algorithm only applying for 1-balanced words).

This is a very good idea. I think you should do it.

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Oct 2, 2010

comment:9

Replying to @sagetrac-abmasse:

There is another problem with your function:

[cut]
TypeError: Your word must be defined on a binary alphabet

I think this shouldn't be the case. In fact, this is interesting, because it raises another problem I had when working on another patch. What I suggest is that you could first verify if the parent has size 2 and, if it's not the case, you check if its real alphabet has size 2 by using the expression
len(set(self)) == 2

I think the clean solution would be to replace the deprecated function alphabet() by a new one that would compute the real alphabet in question, so that there would be two kinds of alphabet:

  1. The one corresponding to the parent of the word (for instance aaab might be a word defined on the alphabet {a,b,c}).
  2. The one corresponding to the alphabet realized by the word Alph(aaab) = {a,b}

The first one is called by w.parent().alphabet() while the second one would be obtained from w.alphabet(). But this is for another ticket.

What I suggest is that you simply add the len(set(w)) == 2 condition to your code and we'll clean it eventually.

Since it is explained in the documentation, this is not a bug but a feature ;)

I was thinking about this alphabet problem, and i am willing to write a ticket about that issue. I think that such a decision should be made for the whole word directory, with its whole community. Here are some bits.

If i start to do such tests in such a method, then almost every method of the FiniteWords class will have to do it as well, and the faster parent().alphabet() method will become useless. So there is a problem of code replication.

Another problem is that a test like len(set(w)) == 2 takes probably a linear time whereas size_of_alphabet is just looking into the parent, which is constant time.

Some programmer (user) should know where his/her words are living and should be warned if not. Making "friendly code" does not mean to pass over dirty programmer's inconsistencies, because then it won't detect the clean programmer's mistakes anymore. Fore those lazy ways of programming, we can imagine a global option which tells how the methods should care about the alphabet, so that both kinds of behavior are possible.

As for me, this may look Bourbachist (in particular not bash-ist ;), but i would like the empty word defined on a 2-letter alphabet to be recognized as a factor of a Sturmian word and not necessarily the empty word defined on a three letter alphabet (or at least to be warned if i do so). This may be, a contrario, an argument to make a difference between is_sturmian_factor (or is_finite_sturmian) and is_balanced(1).

If in some very few occasion, the user will indeed need to use such a method on some words defined on some non-two-letter alphabet, but then there should be some general coerce methods like redefine_alphabet or minimize_alphabet_to_actually_used_letters that s-he can use in his/her code.

Another example of why the alphabet question deals with the whole word directory, and why there should be an overall policy is the following:

sage: words.LowerMechanicalWord(1/sqrt(2)).parent()
Words
sage: words.CharacteristicSturmianWord(1/sqrt(2)).parent()
Words over Ordered Alphabet [0, 1]

@seblabbe
Copy link
Contributor

comment:10

The first one is called by w.parent().alphabet() while the second one would be obtained from w.alphabet(). But this is for another ticket.

What I suggest is that you simply add the len(set(w)) == 2 condition to your code and we'll clean it eventually.

I think this is a very good suggestion.

Sébastien

@sagetrac-abmasse
Copy link
Mannequin

sagetrac-abmasse mannequin commented Nov 13, 2010

comment:11

Hi, Thierry !

Sorry if I haven't been available lately (busy, you know, the usual justification...).

If I'm not mistaken, there were last minor issues with your ticket that haven't been completely solved. To recap:

  1. Cleaning some commented code.
  2. Adding the len(set(w)) == 2 condition. I understand your point of view that this should be done by many other functions in Sage, but for now, it will be cleaner if we do so.
  3. Finally, I agree with you about keeping the name is_sturmian_factor.
  4. Providing a reference for the theorem that desubstituting as possible to the empty word is equivalent to being Sturmian.

This is pretty much it!

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Dec 29, 2010

comment:12

We were on strike my friend.

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Dec 29, 2010

Tested on Sage 4.6

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Dec 29, 2010

comment:13

Attachment: trac_9877_words_sturmian_desubstitution_attempt_2-tm.patch.gz

Hi Alexandre,

i started to work back on this ticket. This new patch is supposed to solve your four points.

Since the is_tangent() method can now be applied to a word with more than 2 letters, i added a protection in the FiniteWordPath_all class into the file paths.py so that is_tangent() could not be applied for such word paths. Indeed, there is an extended definition for them, which is not yet written (see the paper).

Also, concerning a previous comment on using existing sage code, i looked for some "run" method and the only similar code i found was the _delta_iterator() method in the Word_class class. I tried to use it, but it seems that the use of the groupby itertool followed by len makes a slower code:

from itertools import groupby

def runs_groupby(self):
    for letter, run in groupby(self):
        yield letter, len(list(run))

def runs_homemade(self):
    try:
        previous_letter = self[0]
    except IndexError:
        return
    current_run_length = 0
    for i in self:
        if i == previous_letter:
            current_run_length += 1
        else:
            yield previous_letter , current_run_length
            current_run_length = 1
            previous_letter = i
    yield previous_letter , current_run_length

timeit('for i in runs_groupby(words.FibonacciWord()[:1000]):\n    print i')

timeit('for i in runs_homemade(words.FibonacciWord()[:1000]):\n    print i')

The first takes 28.6 ms per loop whereas the second takes 18.2 ms per loop. The problem is that the first method, though more low-level, is like passing twice on the word, which seems to be slower than the second (high-level) one-pass method.

Ciao,
Thierry

@sagetrac-tmonteil
Copy link
Mannequin Author

sagetrac-tmonteil mannequin commented Dec 29, 2010

Author: Thierry Monteil

@sagetrac-tmonteil

This comment has been minimized.

@fchapoton
Copy link
Contributor

comment:14

Please fix the 5 test errors. Four of them are the same :

AttributeError: 'WordGenerator' object has no attribute 'KolakoskiWord'

See the report of the patchbot.

@sagetrac-abmasse

This comment has been minimized.

@seblabbe
Copy link
Contributor

comment:16

Salut Thierry,

Since the is_tangent() method can now be applied to a word with more than 2 letters, i added a protection in the FiniteWordPath_all class into the file paths.py so that is_tangent() could not be applied for such word paths. Indeed, there is an extended definition for them, which is not yet written (see the paper).

Why? Can you explain more? I don't think the protection is needed.

The patch is having the following lines::

w_isolated = W(l_isolated,datatype='letter') #the word associated to the isolated letter
w_running = W(l_running,datatype='letter') #the word associated to the running letter

I remember I suggested you to use datatype = 'letter'. But I realized today that this was supported only for the old sage words objects saved in the pickle jar. So, by introducing it now in this patch, we would have to keep it. So we need to make a decision if we keep it or not. If we don't keep it, you can do the following instead:

w_isolated = W([l_isolated]) #the word associated to the isolated letter
w_running = W([l_running]) #the word associated to the running letter
  1. I think we should redefine the function alphabet for finite words and return the letter occuring in the word and caching the result (I can post a patch for this). And so, the function is_tangent could look if this method return a two letter alphabet when the parent don't.

@fchapoton
Copy link
Contributor

comment:23

Here it is. I had to change the result to

word: abaaabaaabaabaaabaaabaabaaabaabaaabaaaba...

for the doctest to pass.

Please review.

@videlec
Copy link
Contributor

videlec commented Feb 15, 2013

comment:24

Some last docstring comments:

  • for expressions that refer to Python variables or reserved names (in particular self) you may use double quotes (at some places it is ok and at some other it is not)
  • line 5179: "desubstitution consist" -> "desubstitution consists"
  • lines 5190, 5340, 5419: self is not a part of the input
  • is_tangent doc: the word tangent, where it is defined, must be bold (put the word between two *). The definition is not very english, I suggest: A binary word is said to be tangent if it can appear in infintely many cutting sequences of a smooth curve, where each cutting sequence is observed on a progressively smaller grid.

Best,
Vincent

@fchapoton
Copy link
Contributor

comment:25

New patch, with previous points corrected. Please review.

@fchapoton
Copy link
Contributor

comment:26

Vincent, do you think that this ticket is ready to go ?

@videlec
Copy link
Contributor

videlec commented Mar 5, 2013

Changed keywords from none to sturmian word, tangent word

@videlec
Copy link
Contributor

videlec commented Mar 5, 2013

Changed author from Thierry Monteil to Thierry Monteil, Frédéric Chapoton

@videlec
Copy link
Contributor

videlec commented Mar 5, 2013

Reviewer: Vincent Delecroix

@videlec
Copy link
Contributor

videlec commented Mar 5, 2013

comment:27

Yes it is. Thanks Frédéric for the finalization !

@videlec
Copy link
Contributor

videlec commented Mar 6, 2013

Changed reviewer from Vincent Delecroix to Alexandre Blondin Massé, Sébastien Labbé, Vincent Delecroix

@jdemeyer jdemeyer modified the milestones: sage-5.8, sage-5.9 Mar 6, 2013
@jdemeyer
Copy link

jdemeyer commented Mar 7, 2013

comment:30

Please clarify which patch(es) must be applied.

@fchapoton

This comment has been minimized.

@jdemeyer
Copy link

comment:33

There is a problem building the documentation:

[combinat ] /release/merger/sage-5.9.beta0/local/lib/python2.7/site-packages/sage/combinat/words/finite_word.py:docstring of sage.combinat.words.finite_word:47: WARNING: Enumerated list ends without a blank line; unexpected unindent.

@fchapoton
Copy link
Contributor

comment:34

Attachment: trac_9777-sturm-review-fc.patch.gz

I have corrected the doc, back to positive review

@jdemeyer
Copy link

Merged: sage-5.9.beta1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants