De Bruijn Sequence construction for combinat #10530

eviatarbach · 2010-12-29T02:58:33Z

I have written an implementation of De Bruijn sequences for combinat. It can currently generate one sequence for an alphabet (or arity) and substring length. It can also calculate cardinality.

For the human:

Apply :

attachment: 10530.patch

for the bot:

Apply 10530.patch

CC: @sagetrac-sage-combinat @sagetrac-tmonteil

Component: combinatorics

Author: Eviatar Bach

Reviewer: Nicolas M. Thiéry, Nathann Cohen

Merged: sage-4.7.2.alpha2

Issue created by migration from https://trac.sagemath.org/ticket/10530

eviatarbach · 2010-12-29T03:26:50Z

Attachment: 15057.patch.gz

nthiery · 2010-12-29T22:38:00Z

comment:2

Hi!

I just had a quick look at your patch. This sounds good!

A couple micro comments (before an actual review):

CombinatorialClass is deprecated. Please use the
FiniteEnumeratedSets category. For an example, see:
```
sage: C = FiniteEnumeratedSets().example()
sage: C??
File:		/opt/sage/local/lib/python2.6/site-packages/sage/categories/examples/finite_enumerated_sets.py
```
(sorry, we still have a lot of old code giving bad hints to the
newcomers)
What about just implementing an_element? (no __iter__, no first).

Actually at this point DeBruijnSequences(alphabet, n) should just
be in the category FiniteSets(). However the later does not exist
yet, so Sets() will do.
Rather than defining an attribute self.k_is_alphabet, I would setup
two attributes self.k and self.alphabet once for all in the
init. Then no more branching needs to be done later on.

Maybe rename the parameter k to alphabet (which is more explicit)
DeBruijnSequence_evaluation -> DeBruijnSequences

No need for the DeBruijnSequence function at this point.
The length of the sequence is known in advance, right? Then it
should be faster to allocate a bit list right away, and then fill
it up, rather than using append.
In the long run, we might want to provide an iterator over the
sequence (once cython will support iterators)
debruijn.py -> debruijn_sequence.py
The speed difference is impressive :-) I don't know if we want to
mention it in the doc though (this may change in the future).

Cheers,
Nicolas

nthiery · 2010-12-29T22:38:27Z

Reviewer: Nicolas M. Thiéry

eviatarbach · 2010-12-30T01:47:21Z

comment:4

Thank you for the comments. I have a few questions:

Rather than defining an attribute self.k_is_alphabet, I would setup two attributes self.k and self.alphabet once for all in the init. Then no more branching needs to be done later on.

I know it looks ugly. The point of having k_is_alphabet is to check whether the input k is an alphabet or an arity, as the processing is different in either case. I was thinking of removing the alphabet functionality; maybe just the arity would suffice? One could always map values onto the digits after.

The length of the sequence is known in advance, right? Then it should be faster to allocate a bit list right away, and then fill it up, rather than using append.

I'm not clear on this. Are you suggesting I compute the length of the array in advance, create sequence full of zeros to that length, and then use indeces to change the values? Is append that much slower that creating a long list and then filling it would be faster?

Or are you suggesting that I declare sequence as an int array in C, then fill it? This was my initial idea, but I couldn't find a way to insert arguments into the declaration. This returns a traceback:

cdef sequence[k^n]

Thanks again!

nthiery · 2010-12-30T08:20:49Z

comment:5

Replying to @eviatarbach:

Thank you for the comments. I have a few questions:

Rather than defining an attribute self.k_is_alphabet, I would setup two attributes self.k and self.alphabet once for all in the init. Then no more branching needs to be done later on.

I know it looks ugly. The point of having k_is_alphabet is to check whether the input k is an alphabet or an arity, as the processing is different in either case.

It feels only marginaly different. For example, if alphabet is set to
[0,...,k-1], then the last if becomes unnecessary; granted, there
is an overhead of an indexed lookup in an array, but I assume it
should be negligible (to be checked with timeit though!).

I was thinking of removing the alphabet functionality; maybe just the arity would suffice? One could always map values onto the digits after.

As you feel like; that functionality can indeed be added later.

The length of the sequence is known in advance, right? Then it should be faster to allocate a bit list right away, and then fill it up, rather than using append.

I'm not clear on this. Are you suggesting I compute the length of the array in advance, create sequence full of zeros to that length, and then use indeces to change the values?

Yes.

Is append that much slower that creating a long list and then filling it would be faster?

With append, Python needs to regularly reallocate the list in memory,
requiring a copy of it each time. In principle, Python uses
"exponential" reallocation so the theoretical complexity of append is
"amortized O(1)", but there still is an overhead.

I recommend trying both, and benchmarking with timeit (and post here
the result for our instruction).

Or are you suggesting that I declare sequence as an int array in C,
then fill it? This was my initial idea, but I couldn't find a way to
insert arguments into the declaration. This returns a traceback:

cdef sequence[k^n]

I don't know what's the canonical way to allocate C arrays in
Cython. Probably via malloc or something. It will be in the Cython
doc. Anyway, since the result will be returned as a list, I assume
that copying back the result into a list would ruin the benefit of
having used a C array in the first place.

Ah, by the way: please add tests for all the base cases: k=0,
alphabet=0, ...

Also please add tests for the direct call to debruijn_cython.
Speaking of which I suggest renaming it to debruijn_sequence, and
adding a link back to DeBruijnSequences.

Cheers,

eviatarbach · 2010-12-30T20:44:44Z

comment:6

Append is marginally faster (by about one microsecond), probably because for the other I had to declare a global index variable in Python.

Let me get this straight:

Python

Filename: debruijn_sequence.py
Class name: DeBruijnSequence (DeBruijnSequences wouldn't make sense since only one is generated)

Cython

Filename: debruijn_sequence.pyx
Function name: debruijn_sequence

Is this correct?

Also, for some reason, when I use underscores in debruijn_sequence.py, I get this error:

ImportError: dynamic module does not define init function (initdebruijn_sequence)

Thank you!

nthiery · 2011-01-02T10:52:51Z

comment:7

Replying to @eviatarbach:

Append is marginally faster (by about one microsecond), probably
because for the other I had to declare a global index variable
in Python.

Interesting to know! Have you tried with large output? Say, a sequence of length 10^5 or 10^6?

For the following, see also the discussion on:

http://groups.google.com/group/sage-combinat-devel/browse_thread/thread/bf20e88681bc557b

Let me get this straight:

Python

Filename: debruijn_sequence.py
Class name: DeBruijnSequence (DeBruijnSequences wouldn't make sense since only one is generated)

Class name: DeBruijnSequences (it models the set of all such sequences)
Filename: debruijn_sequences.py (for consistency with the name of the class; granted this convention is not yet followed consistently across combinat; that should eventually be fixed)

Cython

Filename: debruijn_sequence.pyx
Function name: debruijn_sequence

+1

Possibly debruijn_sequences_cython.pyx in case you expect other
functionalities on de Bruijn sequences to be implemented in cython.

Also, for some reason, when I use underscores in debruijn_sequence.py, I get this error:
ImportError: dynamic module does not define init function (initdebruijn_sequence)

Hmm, as far as I know this should work. Please double check all
references to the f,ile and that everything was recompiled
properly. Also, there might be some old files that have not been
cleaned up and that cython tries to recompile.

Cheers,
Nicolas

eviatarbach · 2011-02-11T20:19:53Z

comment:9

Attachment: 15454.patch.gz

Sorry for taking so long, but here is the updated patch. The changes are as follows:

sanity-checks the input; corresponding tests added in the docstring
filenames changed
option for alphabet removed; not really useful and having an additional keyword argument may confuse

I tried to allocate the length of the sequence length beforehand and tested with larger inputs, as you suggested, but it is still slower. I don't know much about algorithmic efficiency, but that's what I observed in my tests. For one input, I got around 50 ms and 120 ms, respectively.

I decided to not have the sequences as words, to leave the option open for generation of all possible sequences open.

Thanks in advance!

eviatarbach · 2011-02-11T20:24:59Z

comment:10

By the way, it was built on Sage 4.6.2.alpha4.

sagetrac-ylchapuy · 2011-02-12T15:36:50Z

comment:11

Hi,
sight remark: are you sure about the cardinality part? I think a factorial is missing...
( see e.g. your reference or http://en.wikipedia.org/wiki/De_Bruijn_sequence )

eviatarbach · 2011-02-12T21:26:55Z

comment:12

Yes, you're correct. It seems that the formula I was looking at only applies when k=2. I uploaded a new patch.

eviatarbach · 2011-02-12T21:27:33Z

To be applied on top of previous patch

sagetrac-ylchapuy · 2011-02-15T17:45:32Z

comment:13

Attachment: 15455.patch.gz

Another slight remark:
in debruijn_sequence, in the inner function gen, you should "cdef int j" for a nice speedup.

eviatarbach · 2011-02-16T01:07:10Z

Attachment: 15456.patch.gz

To be applied on top of previous.

eviatarbach · 2011-02-16T01:08:12Z

comment:14

That is a significant speedup! Thank you!

New patch uploaded.

nathanncohen · 2011-05-08T15:10:00Z

comment:15

Hello !

Several superficial remarks :

As in your other patch, the lines should be kept to less than 80 characters

Some functions are not documented

   ```
   /home/ncohen/sage/devel/sage-1/sage/combinat$ sage -coverage debruijn_sequence*
   ----------------------------------------------------------------------
   debruijn_sequence.pyx
   SCORE debruijn_sequence.pyx: 0% (0 of 2)
   
   Missing documentation:
   	 * gen(int t, int p):
   
   
   Missing doctests:
   	 * debruijn_sequence(int k, int n):
   
   ----------------------------------------------------------------------
   
   ----------------------------------------------------------------------
   debruijn_sequences.py
   ERROR: Please add a `TestSuite(s).run()` doctest.
   SCORE debruijn_sequences.py: 75% (3 of 4)
   
   Missing documentation:
   	 * __init__(self, k,n):
   
   ----------------------------------------------------------------------
   
   /home/ncohen/sage/devel/sage-1/sage/combinat$ sage -coverage debruijn_sequence.pyx 
   ----------------------------------------------------------------------
   debruijn_sequence.pyx
   SCORE debruijn_sequence.pyx: 0% (0 of 2)
   
   Missing documentation:
   	 * gen(int t, int p):
   
   
   Missing doctests:
   	 * debruijn_sequence(int k, int n):
   
   ----------------------------------------------------------------------

Let's keep this to 100% lest we write other patches later to improve it `:-)`

The "NOTE::" environment is misused. Look at other places in the source code where it is used. You can see it doesn't display properly when building the documentation
```
```
sage -docbuild reference html
```
Same for the "reference" environment and citations.The best is to look at other places in the souce code :-)
The "cardinality" method, for instance, is not doctested.
It would be good to add a reference to the book "Combinatorial Generation" in the file containing the actual implementaion of the algorithm
There are 4 patches on this ticket.
- If the ticket's name is a number, let it be the ticket's number
- In this kind of situation, it's also very nice to produce "flattened" patched. Sometimes it is better to update the current patch than to create another one to apply on top. You may need to use "Mercurial Queue" if you're not used to it.

Short of all this, thank you for this patch ! As usual, there's much more work in dealing with the programming standards than with the actual math content, but it quickly becomes natural. Thank you very much also, because of you I spend yesterday reading random parts of Combinatorial Generation, and learned very nice things :-)

(send me an email if I can be of any help)

Nathann

fchapoton · 2011-06-28T07:52:27Z

comment:37

Attachment: trac_10530-reviewer.patch.gz

for the bot:

Apply 10530.2.patch trac_10530-reviewer.patch

eviatarbach · 2011-06-30T07:34:05Z

comment:38

Sorry for taking so long with the patch, but I'm having a hard time learning Mercurial queues.

eviatarbach · 2011-06-30T22:59:45Z

comment:39

Merged the two files, other fixes. Only one patch to apply now!

eviatarbach · 2011-06-30T23:05:27Z

Changed reviewer from Nicolas M. Thiéry to Nicolas M. Thiéry, Nathann Cohen

nathanncohen · 2011-07-01T13:00:24Z

comment:42

Hello !!

First, it is my mistake as it appeared in my previous patch, but clearly the second doctest should be changed :-)

if not True:

Something like

D.an_element() in D

Would be much better !

Then, do you get a warning when you build the documentation ? Here's what I get :

/home/ncohen/sage/devel/sage-4$ sage -docbuild reference html
...
reading sources... [100%] sage/combinat/debruijn_sequence                                                                                                                                                                                                                        
docstring of sage.combinat.debruijn_sequence:35: (WARNING/2) Inline interpreted text or phrase reference start-string without end-string.

But sometimes this kind of bug just comes from having built the documentation many times while adding/removing files from it... O_o

Nathann

eviatarbach · 2011-07-02T02:08:08Z

comment:44

I can never get it to build the documentation properly. It ignores debruijn_sequences.pyx for some reason.

Do you have any idea what that warning could mean?

nathanncohen · 2011-07-02T12:45:31Z

comment:45

No idea :-/

I've never been friends with Sphinx. When this kind of things happen, I usually swear a lot and reinstall Sage from scratch ^^;

Nathann

nthiery · 2011-07-02T13:25:12Z

comment:46

Replying to @nathanncohen:

No idea :-/

I've never been friends with Sphinx. When this kind of things happen, I usually swear a lot and reinstall Sage from scratch ^^;

Deleting doc/output (or sometimes just doc/output/doctrees is still quite brute force but usually works.

Just to make sure: debruijn_sequences.pyx is referenced in doc/en/reference/combinat/index.rst?

eviatarbach · 2011-07-02T17:09:45Z

comment:47

Nathann, I can relate. That's what I'm going to do now. Unfortunately my computer takes ages to build the documentation.

debruijn_sequence.pyx is referenced in index.rst.

eviatarbach · 2011-07-08T01:01:20Z

comment:48

Hello Nathann,

I tried building the documentation, and I'm getting the same warning. Do you think it would be okay to submit it anyways?

What's the point of this test?

sage: for n in range(1, 7):

... for k in range(1, 7):
... D = DeBruijnSequences(k, n)
... if not True:
... print "Something's dead wrong (n = ", n, "k =", k, ") !"
... break

I don't see how this can fail.

nathanncohen · 2011-07-08T07:06:05Z

comment:49

Yooooo !

I tried building the documentation, and I'm getting the same warning. Do you think it would be okay to submit it anyways?

No, it has to be fixed ! The release manager will set it back to needs/work otherwise anyway ;-)

What's the point of this test?

As I trold you during my latest review, the

if not True:

Should be repaced by

D.an_element() in D

The aim is to check that all the small DeBruijnSequences that the code return are indeed valid :-)

Nathann

nthiery · 2011-07-11T08:00:14Z

comment:50

Replying to @nathanncohen:

D.an_element() in D

For the record: this is among the generic sanity checks that TestSuite(D).run() does.

eviatarbach · 2011-07-12T05:50:53Z

comment:51

I fixed the docbuild warning (had to quarantine successive parts of the documentation in order to isolate the problem!). It was due to the whitespace before the URL for the Wikipedia link.

I also fixed the doctest.

eviatarbach · 2011-07-12T06:01:46Z

Attachment: 10530.patch.gz

nathanncohen · 2011-07-12T08:27:57Z

comment:52

yeahhhhhhhhhhhhhh !

Positive review to this patch :-)

Nice job debugging the Sphinx warning. I'll keep that in mind for my future patch : I very often spend 10 minutes on each of those ;-)

Nathann

eviatarbach · 2011-07-12T08:50:54Z

comment:53

Yes! Finally :)

eviatarbach · 2011-07-12T19:57:05Z

comment:54

Thank you Nathann and Nicolas for reviewing!

fchapoton · 2011-07-22T19:47:57Z

comment:56

for the bot

Apply 10530.patch

jdemeyer · 2011-08-18T22:02:08Z

Merged: sage-4.7.2.alpha2

eviatarbach added this to the sage-4.7.1 milestone Dec 29, 2010

eviatarbach added c: combinatorics labels Dec 29, 2010

eviatarbach assigned sagetrac-sage-combinat Dec 29, 2010

eviatarbach added the s: needs review label Dec 29, 2010

nthiery added s: needs work and removed s: needs review labels Dec 29, 2010

eviatarbach added s: needs review and removed s: needs work labels Feb 11, 2011

eviatarbach assigned eviatarbach and unassigned sagetrac-sage-combinat Feb 16, 2011

nathanncohen mannequin added s: needs work and removed s: needs review labels May 8, 2011

This comment has been minimized.

Sign in to view

nathanncohen mannequin added s: needs work and removed s: needs review labels Jul 1, 2011

nathanncohen mannequin added s: positive review and removed s: needs work labels Jul 12, 2011

jdemeyer modified the milestones: sage-4.7.1, sage-4.7.2 Jul 13, 2011

jdemeyer removed the s: positive review label Aug 18, 2011

jdemeyer closed this as completed Aug 18, 2011

anneschilling mentioned this issue Nov 3, 2011

Implementation of class for cores #11742

Closed

De Bruijn Sequence construction for combinat #10530

De Bruijn Sequence construction for combinat #10530

Comments

eviatarbach commented Dec 29, 2010

eviatarbach commented Dec 29, 2010

nthiery commented Dec 29, 2010

nthiery commented Dec 29, 2010

eviatarbach commented Dec 30, 2010

nthiery commented Dec 30, 2010

eviatarbach commented Dec 30, 2010

Python

Cython

nthiery commented Jan 2, 2011

Python

Cython

eviatarbach commented Feb 11, 2011

eviatarbach commented Feb 11, 2011

sagetrac-ylchapuy mannequin commented Feb 12, 2011

eviatarbach commented Feb 12, 2011

eviatarbach commented Feb 12, 2011

sagetrac-ylchapuy mannequin commented Feb 15, 2011

eviatarbach commented Feb 16, 2011

eviatarbach commented Feb 16, 2011

nathanncohen mannequin commented May 8, 2011

fchapoton commented Jun 28, 2011

This comment has been minimized.

eviatarbach commented Jun 30, 2011

eviatarbach commented Jun 30, 2011

This comment has been minimized.

eviatarbach commented Jun 30, 2011

nathanncohen mannequin commented Jul 1, 2011

eviatarbach commented Jul 2, 2011

nathanncohen mannequin commented Jul 2, 2011

nthiery commented Jul 2, 2011

eviatarbach commented Jul 2, 2011

eviatarbach commented Jul 8, 2011

nathanncohen mannequin commented Jul 8, 2011

nthiery commented Jul 11, 2011

eviatarbach commented Jul 12, 2011

eviatarbach commented Jul 12, 2011

nathanncohen mannequin commented Jul 12, 2011

eviatarbach commented Jul 12, 2011

eviatarbach commented Jul 12, 2011

fchapoton commented Jul 22, 2011

jdemeyer commented Aug 18, 2011