De Bruijn Sequence construction for combinat #10530
Comments
Attachment: 15057.patch.gz |
comment:2
Hi! I just had a quick look at your patch. This sounds good! A couple micro comments (before an actual review):
Cheers, |
Reviewer: Nicolas M. Thiéry |
comment:4
Thank you for the comments. I have a few questions:

> Rather than defining an attribute self.k_is_alphabet, I would set up two attributes self.k and self.alphabet once and for all in the init. Then no more branching needs to be done later on.

I know it looks ugly. The point of having k_is_alphabet is to check whether the input k is an alphabet or an arity, as the processing is different in each case. I was thinking of removing the alphabet functionality; maybe just the arity would suffice? One could always map values onto the digits afterwards.

> The length of the sequence is known in advance, right? Then it should be faster to allocate a bit list right away, and then fill it up, rather than using append.

I'm not clear on this. Are you suggesting I compute the length of the array in advance, create a sequence full of zeros of that length, and then use indices to change the values? Is append so much slower that creating a long list and then filling it would be faster? Or are you suggesting that I declare sequence as an int array in C, then fill it? This was my initial idea, but I couldn't find a way to insert arguments into the declaration. This returns a traceback: cdef sequence[k^n]

Thanks again! |
comment:5
Replying to @eviatarbach:
It feels only marginally different. For example, if alphabet is set to
As you feel like; that functionality can indeed be added later.
Yes.
With append, Python needs to regularly reallocate the list in memory, I recommend trying both, and benchmarking with timeit (and post here
I don't know what's the canonical way to allocate C arrays in Ah, by the way: please add tests for all the base cases: k=0, Also please add tests for the direct call to Cheers, |
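The append-versus-preallocate comparison discussed above could be benchmarked roughly as follows. This is a minimal sketch in plain Python, not code from the patch: the function names `build_with_append` and `build_preallocated` and the dummy fill rule `i % 2` are assumptions made only for illustration, and timings will vary by machine and Python version.

```python
import timeit


def build_with_append(length):
    """Grow the list one element at a time (hypothetical stand-in for the patch's loop)."""
    seq = []
    for i in range(length):
        seq.append(i % 2)
    return seq


def build_preallocated(length):
    """Allocate the full list first, then fill it by index."""
    seq = [0] * length
    for i in range(length):
        seq[i] = i % 2
    return seq


if __name__ == "__main__":
    n = 10**5  # a "large output" size, as suggested in the thread
    for func in (build_with_append, build_preallocated):
        seconds = timeit.timeit(lambda: func(n), number=10)
        print(f"{func.__name__}: {seconds / 10:.4f} s per call")
```

Both functions produce the same list, so any difference in the timings comes only from how the list is built.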
comment:6
Append is marginally faster (by about one microsecond), probably because for the other I had to declare a global index variable in Python. Let me get this straight: Python filename: debruijn_sequence.py; Cython filename: debruijn_sequence.pyx. Is this correct? Also, for some reason, when I use underscores in debruijn_sequence.py, I get this error:
Thank you! |
comment:7
Replying to @eviatarbach:
Interesting to know! Have you tried with large output? Say, a sequence of length 10^5 or 10^6? For the following, see also the discussion on: http://groups.google.com/group/sage-combinat-devel/browse_thread/thread/bf20e88681bc557b
Class name: DeBruijnSequences (it models the set of all such sequences)
+1 Possibly debruijn_sequences_cython.pyx in case you expect other
Hmm, as far as I know this should work. Please double check all Cheers, |
comment:9
Attachment: 15454.patch.gz Sorry for taking so long, but here is the updated patch. The changes are as follows:
I tried allocating the full sequence beforehand and tested with larger inputs, as you suggested, but it is still slower. I don't know much about algorithmic efficiency, but that's what I observed in my tests. For one input, I got around 50 ms and 120 ms, respectively. I decided not to have the sequences as words, to leave open the option of generating all possible sequences. Thanks in advance! |
comment:10
By the way, it was built on Sage 4.6.2.alpha4. |
comment:11
Hi, |
comment:12
Yes, you're correct. It seems that the formula I was looking at only applies when k=2. I uploaded a new patch. |
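For reference, the standard count of distinct cyclic De Bruijn sequences over k symbols with substring length n is (k!)^(k^(n-1)) / k^n, which reduces to 2^(2^(n-1) - n) when k = 2. The thread does not say which formula was originally used, so the sketch below only illustrates the general count; the function name `de_bruijn_cardinality` is hypothetical and not taken from the patch.

```python
from math import factorial


def de_bruijn_cardinality(k, n):
    """Number of distinct cyclic De Bruijn sequences for arity k >= 1 and length n >= 1.

    Uses the standard formula (k!)^(k^(n-1)) / k^n; the division is exact.
    """
    return factorial(k) ** (k ** (n - 1)) // k**n


# For k = 2 the general formula agrees with the special case 2^(2^(n-1) - n).
for n in range(1, 6):
    assert de_bruijn_cardinality(2, n) == 2 ** (2 ** (n - 1) - n)
```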
To be applied on top of previous patch |
comment:13
Attachment: 15455.patch.gz Another slight remark: |
Attachment: 15456.patch.gz To be applied on top of previous. |
comment:14
That is a significant speedup! Thank you! New patch uploaded. |
comment:15
Hello ! Several superficial remarks :
Short of all this, thank you for this patch ! As usual, there's much more work in dealing with the programming standards than with the actual math content, but it quickly becomes natural. Thank you very much also, because of you I spent yesterday reading random parts of Combinatorial Generation, and learned very nice things (send me an email if I can be of any help) Nathann |
comment:37
Attachment: trac_10530-reviewer.patch.gz for the bot: Apply 10530.2.patch trac_10530-reviewer.patch |
comment:38
Sorry for taking so long with the patch, but I'm having a hard time learning Mercurial queues. |
comment:39
Merged the two files, other fixes. Only one patch to apply now! |
Changed reviewer from Nicolas M. Thiéry to Nicolas M. Thiéry, Nathann Cohen |
comment:42
Hello !! First, it is my mistake as it appeared in my previous patch, but clearly the second doctest should be changed
Something like
Would be much better ! Then, do you get a warning when you build the documentation ? Here's what I get :
But sometimes this kind of bug just comes from having built the documentation many times while adding/removing files from it... Nathann |
comment:44
I can never get it to build the documentation properly. It ignores debruijn_sequences.pyx for some reason. Do you have any idea what that warning could mean? |
comment:45
No idea, I've never been friends with Sphinx. When this kind of thing happens, I usually swear a lot and reinstall Sage from scratch Nathann |
comment:46
Replying to @nathanncohen:
Deleting doc/output (or sometimes just doc/output/doctrees) is still quite brute force but usually works. Just to make sure: debruijn_sequences.pyx is referenced in doc/en/reference/combinat/index.rst? |
comment:47
Nathann, I can relate. That's what I'm going to do now. Unfortunately my computer takes ages to build the documentation. debruijn_sequence.pyx is referenced in index.rst. |
comment:48
Hello Nathann, I tried building the documentation, and I'm getting the same warning. Do you think it would be okay to submit it anyways? What's the point of this test?
... for k in range(1, 7): I don't see how this can fail. |
comment:49
Yooooo !
No, it has to be fixed ! The release manager will set it back to needs/work otherwise anyway
As I told you during my latest review, the
Should be replaced by
The aim is to check that all the small DeBruijnSequences that the code returns are indeed valid Nathann |
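A validity check of the kind described here could look like the following. This is a minimal sketch, not the doctest from the patch: the helper name `is_de_bruijn` is hypothetical, and it assumes the sequence is a list of integers in range(k) read cyclically.

```python
from itertools import product


def is_de_bruijn(seq, k, n):
    """Return True if seq, read cyclically, contains every length-n word over range(k) exactly once."""
    if len(seq) != k**n:
        return False
    # Wrap around so that windows crossing the end of the list are included.
    extended = list(seq) + list(seq[: n - 1])
    windows = {tuple(extended[i : i + n]) for i in range(len(seq))}
    # k**n windows covering all k**n words means each word occurs exactly once.
    return windows == set(product(range(k), repeat=n))
```

Because the length is checked first, comparing the set of windows against the set of all words is enough to rule out both missing and repeated substrings.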
comment:50
Replying to @nathanncohen:
For the record: this is among the generic sanity checks that |
comment:51
I fixed the docbuild warning (had to quarantine successive parts of the documentation in order to isolate the problem!). It was due to the whitespace before the URL for the Wikipedia link. I also fixed the doctest. |
Attachment: 10530.patch.gz |
comment:52
yeahhhhhhhhhhhhhh ! Positive review to this patch Nice job debugging the Sphinx warning. I'll keep that in mind for my future patch : I very often spend 10 minutes on each of those Nathann |
comment:53
Yes! Finally :) |
comment:54
Thank you Nathann and Nicolas for reviewing! |
comment:56
for the bot Apply 10530.patch |
Merged: sage-4.7.2.alpha2 |
I have written an implementation of De Bruijn sequences for combinat. It can currently generate one sequence for an alphabet (or arity) and substring length. It can also calculate cardinality.
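To illustrate what generating one sequence for a given arity and substring length involves, here is a sketch of a well-known construction that concatenates Lyndon words in lexicographic order. It is not necessarily the algorithm used in the patch, and the function name `de_bruijn` is hypothetical.

```python
def de_bruijn(k, n):
    """Return one De Bruijn sequence over range(k) for substrings of length n (k, n >= 1)."""
    a = [0] * (n + 1)  # working array; index 0 is a sentinel
    sequence = []

    def db(t, p):
        if t > n:
            # a[1..p] is a Lyndon word; keep it only if its length divides n.
            if n % p == 0:
                sequence.extend(a[1 : p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence


print(de_bruijn(2, 3))  # [0, 0, 0, 1, 0, 1, 1, 1]
```

The recursion depth is bounded by n, so the sketch is suitable for small substring lengths; an iterative version would be needed for large n.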
For the human:
Apply :
for the bot:
Apply 10530.patch
CC: @sagetrac-sage-combinat @sagetrac-tmonteil
Component: combinatorics
Author: Eviatar Bach
Reviewer: Nicolas M. Thiéry, Nathann Cohen
Merged: sage-4.7.2.alpha2
Issue created by migration from https://trac.sagemath.org/ticket/10530