Optimizations from guy that did n=16 #5

Open
joulebit opened this issue Jul 13, 2023 · 6 comments

@joulebit (Contributor)

Here is the paper from the guy who computed n=16 back in 2005, in 11 days: http://kevingong.com/Polyominoes/ParallelPoly.html

He wrote something very interesting about how to keep the hash tables small and more divisible:

4.1.2 Hashing

One of the most fundamental tasks of the polyomino program is to support uniqueness. We have to ensure that a polyomino has not already been counted. To do this, we store all of the unique polyominoes in a hash tree/table structure, and compare the candidate polyomino with only a small subset of the unique polyominoes. To access the hash structure, we first branch on some criterion to find the correct hash table, then index into the hash table.

If we compare the candidate polyomino only with polyominoes of the same height and width, this narrows down the search considerably. We can also branch on the y-coordinate of the last square. This seems a bit strange, but it works well. (See Figure 4.1.2-1) To narrow it down even further, we use a hash function. This takes the XOR of every n bits, where n = height-1. (See Figure 4.1.2-2) This is not necessarily the best hash function, but it is simple to implement, and gives fair results, as we will see.
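The hash function Gong describes could be sketched like this (my own sketch; the function name and bucket count are assumptions, the paper only specifies XOR-ing every n bits with n = height-1):

```python
def xor_fold_hash(bits: int, height: int, num_buckets: int = 256) -> int:
    """Hash a polyomino's bit representation by XOR-ing together
    every (height - 1)-bit chunk, as described in section 4.1.2."""
    n = max(height - 1, 1)   # chunk width in bits
    mask = (1 << n) - 1
    h = 0
    while bits:
        h ^= bits & mask     # fold the lowest n bits into the hash
        bits >>= n
    return h % num_buckets
```

For a height-3 polyomino the chunks are 2 bits wide, so a 9-bit representation folds down to a small value before the modulo.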

@ClaasF commented Jul 13, 2023

While watching the video, I wondered if there's not a simpler way to canonicalize a representation, which would result in almost trivial hashing.

Each cube is essentially a string of 1s and 0s, just packed a bit weirdly (I don't know how efficient numpy is here). So you could interpret each cube simply as a number.

Example in 2D:

Given the map

```
0 1 0
1 1 0
0 1 0
```

you might interpret that as the binary "010110010". Each of the rotations should have a different binary representation (unless it's symmetrical, in which case it doesn't matter).

What I would call the "canonical" representation of a polycube is simply the rotation with the smallest binary value.
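In 2D this could be sketched as follows (a rough illustration of the idea, not code from this repo; `np.rot90` supplies the four planar rotations):

```python
import numpy as np

def binary_value(grid):
    # read the flattened 0/1 grid as one binary number, row by row
    return int("".join(str(b) for b in grid.flatten()), 2)

def canonical_value(grid):
    # the canonical form is the rotation with the smallest binary value
    return min(binary_value(np.rot90(grid, k)) for k in range(4))
```

For the T-shaped map above, the four rotations give four different binary values, and the smallest one serves as the canonical signature.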

@mikepound (Owner)

These are really interesting ideas - I didn't read around much when I implemented mine in front of the TV! Actually, I had in mind a video that would spark some discussions, so in some sense my "do it badly" approach has been a success!

The speed of hashing and the speed of lookup are a concern. I certainly think the encoding / hashing could be improved; I'm also not sure what the maximum capacity of the Python set is. If it's small, then there'll be an increasing number of collisions that will massively slow down lookup. In this case, splitting the problem on some trivial condition and having multiple hashes may be a lot more efficient.
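That splitting idea could look something like this (a sketch only; the class name and the choice to key on the sorted bounding-box dimensions are my assumptions, not anything from the repo):

```python
from collections import defaultdict

class BucketedSignatureSet:
    """Keep one Python set per bounding-box shape so that no single
    set grows huge; a lookup only touches the bucket for its shape."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, shape, signature):
        """Insert a signature; return True if it was not seen before."""
        # sort the dimensions so rotated bounding boxes share a bucket
        bucket = self._buckets[tuple(sorted(shape))]
        if signature in bucket:
            return False
        bucket.add(signature)
        return True
```

Each bucket stays small, so even a weak hash suffers fewer collisions per lookup.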

Binary representation would also speed up hashing a lot. What was on my mind when I didn't do this was the number of bits required to hold the biggest possible polycube for some n. For example, at what n do we get polycubes that need more than 32 bits? Python actually uses arbitrary-precision integers, but I'd imagine these may take a performance hit.

@ClaasF commented Jul 13, 2023

> Binary representation would also speed up hashing a lot. What was on my mind when I didn't do this was the number of bits required to hold the biggest possible polycube for some n. For example, at what n do we get polycubes that need more than 32 bits? Python actually uses arbitrary-precision integers, but I'd imagine these may take a performance hit.

Since it's just plain and simple hashing anyway, that shouldn't be too bad. One could even just use the pure String (capital S) and hash that. If the Bitcoin hype taught us anything, it's that modern hardware is really good at hashing.

@ClaasF commented Jul 13, 2023

I just played around with the binary representation, and it turns out it's not that simple:

cube 1:

```
[[[0]
  [1]]

 [[1]
  [1]]]
```

cube 2:

```
[[[0 1]]

 [[1 1]]]
```

And flattened:

```
[0 1 1 1]
[0 1 1 1]
```
So two very different sequences have the same signature - if one uses that simple approach, at least.
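The collision is easy to reproduce with numpy (my own minimal repro of the example above):

```python
import numpy as np

# the two cubes from the example, with different bounding boxes
cube1 = np.array([[[0], [1]], [[1], [1]]], dtype=np.uint8)  # shape (2, 2, 1)
cube2 = np.array([[[0, 1]], [[1, 1]]], dtype=np.uint8)      # shape (2, 1, 2)

# different shapes, yet identical flattened bit strings
assert cube1.shape != cube2.shape
assert np.array_equal(cube1.flatten(), cube2.flatten())
```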

@ClaasF commented Jul 13, 2023

Update number 2: I found a relatively simple, but not exactly optimal, solution: padding the array with zeros:

```python
import numpy as np

def sig(polycube):
    # zero-pad so arrays of different shapes can't pack to the same bits
    polycube = np.pad(polycube, 1, 'constant', constant_values=0)
    packed_bytes = np.packbits(polycube)
    # byteorder is required before Python 3.11
    return int.from_bytes(packed_bytes, 'big')

def canonical_sig(polycube):
    # all_rotations() yields the rotations of the cube (defined elsewhere)
    sigs = [(sig(rotation), rotation) for rotation in all_rotations(polycube)]
    return min(sigs, key=lambda tup: tup[0])
```

This does yield the correct result, but padding is apparently really expensive, so it takes almost twice as long.

Not exactly the result I hoped for...
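One alternative that avoids the `np.pad` call entirely (my own sketch, not something from this thread or the repo): include the bounding-box shape in the signature itself, so arrays with different shapes can never share a signature:

```python
import numpy as np

def sig_with_shape(polycube):
    # mixing the shape into the signature makes padding unnecessary:
    # two arrays with different bounding boxes can never collide
    packed = np.packbits(polycube.flatten())
    return polycube.shape + (int.from_bytes(packed.tobytes(), 'big'),)
```

This resolves the `[0 1 1 1]` collision above, since the two cubes have shapes (2, 2, 1) and (2, 1, 2).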

bertie2 pushed a commit to bertie2/opencubes that referenced this issue Jul 15, 2023
@prophaner

I don't know where else to submit my entry. Check it out and see if it helps:
https://github.com/prophaner/polycubes/tree/main
