Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify words.py #19619

Closed
videlec opened this issue Nov 24, 2015 · 73 comments
Closed

Simplify words.py #19619

videlec opened this issue Nov 24, 2015 · 73 comments

Comments

@videlec
Copy link
Contributor

videlec commented Nov 24, 2015

Currently we have too many parent for words:

  • for finite and infinite words (Words_all, Words_over_alphabet, Words_over_OrderedAlphabet)
  • finite words (FiniteWords_over_OrderedAlphabet)
  • infinite words (InfiniteWords_over_OrderedAlphabet)
  • words of fixed length (FiniteWords_length_k_over_OrderedAlphabet and Words_n)

This lead to subtle bug like

sage: W = Words([0,1], finite=False, infinite=True)
sage: u = W.an_element()
sage: u[2:5].parent()   # a finite word !!
Infinite Words over {0, 1}

The proposal of this ticket is to have only four classes:

  • FiniteWords
  • Words_n: words of length n (as a slice of the one before)
  • InfiniteWords (or FullShift)
  • FiniteOrInfiniteWords
    The parent FiniteWords should hence have a method .shift() that return the associated shift (e.g u ** Infinity will belong there). Similarly, the parent InfiniteWords should have a method .factors() that return the set of factors (and finite slices will belong there).

We also:

  • deprecate size_of_alphabet and has_letter
  • use a bit more of categories in lyndon_word.py
  • use more often Word_char if possible
  • implement a better iterator for periodic points that avoids computing huge power of the morphism (that is not always doable)

CC: @seblabbe

Component: combinatorics

Author: Vincent Delecroix

Branch/Commit: 4fd6556

Reviewer: Sébastien Labbé

Issue created by migration from https://trac.sagemath.org/ticket/19619

@videlec videlec added this to the sage-6.10 milestone Nov 24, 2015
@seblabbe
Copy link
Contributor

comment:1

Sounds great. I will be reviewing this.

@videlec
Copy link
Contributor Author

videlec commented Nov 25, 2015

Branch: u/vdelecroix/19619

@videlec
Copy link
Contributor Author

videlec commented Nov 25, 2015

Commit: fc24b01

@videlec
Copy link
Contributor Author

videlec commented Nov 25, 2015

Author: Vincent Delecroix

@videlec
Copy link
Contributor Author

videlec commented Nov 25, 2015

comment:2

It took me much more time than expected... It is not yet completely finished but all tests pass in combinat/words/, combinat/lyndon_word.py and combinat/e_one_star.py.


New commits:

4d94566Trac 19619: simplify words.py
fc24b01Trac 19619: fix Lyndon words

@videlec

This comment has been minimized.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 25, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

e16fd47Trac 19619: fix digraph generators
c86a5ffTrac 19619: fix continued fractions
3ccf384Trac 19619: cardinality for set of words
3cdf5d7Trac 19619: doc in algebras with basis
eddb67cTrac 19619: doc in finite state machine

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 25, 2015

Changed commit from fc24b01 to eddb67c

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 25, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

0917c15Trac 19619: fix unpickling

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 25, 2015

Changed commit from eddb67c to 0917c15

@videlec
Copy link
Contributor Author

videlec commented Nov 25, 2015

comment:6

What a mess this unpickling stuff!!!

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

ab9f0e1Trac 19619: typo

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2015

Changed commit from 0917c15 to ab9f0e1

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

661588fTrac 19619: bad input block

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2015

Changed commit from ab9f0e1 to 661588f

@videlec
Copy link
Contributor Author

videlec commented Nov 26, 2015

comment:9

All tests pass!

@seblabbe
Copy link
Contributor

comment:10

Is it ready for review? I saw a TODO in the diff...

@videlec
Copy link
Contributor Author

videlec commented Nov 26, 2015

comment:11

Replying to @seblabbe:

Is it ready for review? I saw a TODO in the diff...

Read for review. The TODO can be removed. It was because of the cmp_letters method. Now its status is decided on construction as follows

        if alphabet.cardinality() == Infinity or \
           (alphabet.cardinality() < 36 and
            all(alphabet.unrank(i) > alphabet.unrank(j) for
            i in range(min(36,alphabet.cardinality())) for j in range(i))):
            self.cmp_letters = cmp
        else:
            self.cmp_letters = self._cmp_letters

where _cmp_letters is the previous version which compares by ranking in the alphabet. It is of course not ideal, but I think that we should get rid of this cmp_letters anyway.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

a706f49Trac 19619: remove the TODO

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 26, 2015

Changed commit from 661588f to a706f49

@seblabbe
Copy link
Contributor

comment:13

Some methods are missing doctests apparently:

Decreased doctests in combinat/words/morphism.py: from 53 / 53 = 100% to 53 / 56 = 94%
Decreased doctests in combinat/words/paths.py: from 57 / 57 = 100% to 57 / 59 = 96%
Decreased doctests in combinat/words/words.py: from 45 / 45 = 100% to 63 / 66 = 95%

@videlec
Copy link
Contributor Author

videlec commented Nov 27, 2015

comment:14

Is that a problem?

@videlec
Copy link
Contributor Author

videlec commented Nov 27, 2015

comment:15

In words.py, I can add some:

TESTS:

    sage: print "hello"
    hello

but I do not see the point. The missing doctest are in the deprecated class Words_all I do not see any relevant test...

For morphism.py and paths.py I will complete the missing ones.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 27, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

bcc2d86Trac 19619: more doc (and equality improvements)

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Nov 27, 2015

Changed commit from a706f49 to bcc2d86

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 4, 2015

Changed commit from fa5ef78 to dd103be

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 4, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

0db113aTrac 19619: simplifications in word constructors
dd103beTrac 19619: doctest failures + documentation

@videlec
Copy link
Contributor Author

videlec commented Dec 4, 2015

comment:43

I tried to do my best for FiniteWords._word_from_word. I also removed _word_from_callable from FiniteOrInfiniteWords and simplified its __call__.

@seblabbe
Copy link
Contributor

seblabbe commented Dec 4, 2015

comment:44

I looked at the code. It looks good. But I now get (also reported by the patchbots):

----------------------------------------------------------------------
sage -t --warn-long 31.0 permutation.py  # 1 doctest failed
sage -t --warn-long 31.0 parking_functions.py  # 62 doctests failed
----------------------------------------------------------------------

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 4, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

13e989eTrac 19619: fix wrong guessing

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 4, 2015

Changed commit from dd103be to 13e989e

@videlec
Copy link
Contributor Author

videlec commented Dec 4, 2015

comment:46

Right. Permutations or parking functions are callable but should be considered as finite...

@seblabbe
Copy link
Contributor

seblabbe commented Dec 4, 2015

comment:47

I would prefer if the code of FiniteOrInfiniteWords.__call__ be like the others FiniteWords.__call__ and InfiniteWords.__call__ in the sense that first the function does:

if datatype is not None:
    #return directly the good word
    if datatype in ['str', 'list', 'char', 'tuple']:
        return etc...
else:
    #guess the datatype, the length, etc.

if you agree. It makes the code easier to read and be convince anybody that it does the minimal and most efficient path in the function.

Doing what you do in InfiniteWords.__call__ essentially reverts ticket #17021 where that kind of guessing was done first. Note that to avoid duplication of code, it is possible that you will have to recreate _word_from_callable because of that... or maybe not.

@videlec
Copy link
Contributor Author

videlec commented Dec 4, 2015

comment:48

Replying to @seblabbe:

I would prefer if the code of FiniteOrInfiniteWords.__call__ be like the others FiniteWords.__call__ and InfiniteWords.__call__ in the sense that first the function does:

<SNIP>

if you agree. It makes the code easier to read and be convince anybody that it does the minimal and most efficient path in the function.

Doing what you do in InfiniteWords.__call__ essentially reverts ticket #17021 where that kind of guessing was done first. Note that to avoid duplication of code, it is possible that you will have to recreate _word_from_callable because of that... or maybe not.

If a user provides datatype and length there will be no guessing... And with what I changed the first step is to determine the length (in order to determine the parent). I do not see how to handle what you suggested without copy/paste.

@videlec
Copy link
Contributor Author

videlec commented Dec 4, 2015

comment:49

Moreover, the length guessing is based on what the user input as datatype...

@videlec
Copy link
Contributor Author

videlec commented Dec 4, 2015

comment:50

Some timings with the branch

sage: W = Words()
sage: FW = FiniteWords()
sage: L = range(1000)
sage: %timeit W(L, check=False)
100000 loops, best of 3: 2.54 µs per loop
sage: %timeit W(L, length="finite", check=False)
1000000 loops, best of 3: 1.58 µs per loop
sage: %timeit W(L, datatype="list", check=False)
100000 loops, best of 3: 1.75 µs per loop
sage: %timeit W(L, length="finite", datatype="list", check=False)
1000000 loops, best of 3: 1.49 µs per loop
sage: %timeit FW(L, check=False)
1000000 loops, best of 3: 823 ns per loop

Whereas we have on develop

sage: W = Words()
sage: %timeit W(L, check=False)
1000000 loops, best of 3: 800 ns per loop

It looks like the code in the branch is:

  • 3x slower now if nothing is provided
  • 2x slower if length or datatype is provided
  • 1x if call directly FiniteWords
    Plenty of reasons: more inheritance, more function calls.

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 6, 2015

Changed commit from 13e989e to 4fd6556

@sagetrac-git
Copy link
Mannequin

sagetrac-git mannequin commented Dec 6, 2015

Branch pushed to git repo; I updated commit sha1. New commits:

4fd6556Trac 19619: merge into 6.10.beta7

@seblabbe
Copy link
Contributor

seblabbe commented Dec 7, 2015

comment:52

Good to go.

@videlec
Copy link
Contributor Author

videlec commented Dec 7, 2015

comment:53

Great! Thanks for the review!

@vbraun
Copy link
Member

vbraun commented Dec 8, 2015

Changed branch from u/vdelecroix/19619 to 4fd6556

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants