
Could we replace speed with a distributed system? #1

Open
matthunz opened this issue Jul 12, 2023 · 6 comments

Comments

@matthunz

Instead of trying to compact the memory layout of the cubes, would it be possible to use a shared database to store the hashes of each shape? This almost feels like a blockchain-style problem where everyone tries to guess the next hash; in this case it's just the next shape.

We could even make this a big distributed Numberphile viewer calculation!
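
For illustration only, the simplest version of that shared store might be a single table of canonical (orientation-independent) hashes keyed by cube count. Everything below, schema and names alike, is a hypothetical sketch, not an existing part of this project:

```python
import sqlite3

# Hypothetical schema: one row per canonical polycube hash, keyed by cube count n.
conn = sqlite3.connect("polycubes.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS shapes (n INTEGER, hash BLOB, PRIMARY KEY (n, hash))"
)

def try_register(n: int, canonical_hash: bytes) -> bool:
    """Insert a newly found shape; return True only if nobody had registered it before."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO shapes (n, hash) VALUES (?, ?)",
            (n, canonical_hash),
        )
    return cur.rowcount == 1
```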

@bertie2
Collaborator

bertie2 commented Jul 12, 2023

Distribution is almost certainly the name of the game if we want to get much beyond n = 16.

Just to confirm: when you say "database", do you mean off-the-shelf database software or something custom?

Off-the-shelf database software will have a hard time due to the truly MASSIVE read load involved in finding a new cube. If someone wants to implement it and benchmark it I would be very interested to see the results; I might well be wrong, but I suspect at least a large partial cache on the client end will be mandatory for good performance.

However, something custom that hands out chunks of work, similar to seti@home, is what my current work is leading towards.
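
A rough sketch of that seti@home-style hand-out, assuming the server keeps the full n-1 set in an indexed order; the names and default chunk size are made up for illustration:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class WorkChunk:
    """A contiguous index range of size-(n-1) parents for one client to expand."""
    n: int      # target polycube size
    start: int  # first parent index (inclusive)
    end: int    # last parent index (exclusive)

def make_chunks(parent_count: int, n: int, chunk_size: int = 10_000) -> Iterator[WorkChunk]:
    """Split the indexed n-1 set into fixed-size ranges the server can hand out."""
    for start in range(0, parent_count, chunk_size):
        yield WorkChunk(n=n, start=start, end=min(start + chunk_size, parent_count))
```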

@matthunz
Author

matthunz commented Jul 12, 2023

That's a great point. I was thinking of just an SQL database, but having a local cache would save a lot of time and bandwidth.

I wonder how that could work, though. Handing out chunks seems hard because there's no clear way I can see to shard the data. If a peer generated a new shape, I feel like our only option would be to read from the entire set of previous shapes.

EDIT: Actually, we would only have to check shapes with the same number of cubes; maybe we could send a subset of the data containing only shapes of that size.
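
A sketch of how that size-partitioned local cache might look; `fetch_partition` stands in for whatever the central server would eventually expose, which does not exist yet:

```python
class SizePartitionedCache:
    """Client-side cache of canonical hashes, partitioned by cube count (sketch only)."""

    def __init__(self, fetch_partition):
        # fetch_partition(n) -> iterable of known canonical hashes for size n,
        # e.g. downloaded once from the (hypothetical) central server.
        self._fetch_partition = fetch_partition
        self._partitions: dict[int, set[bytes]] = {}

    def seen_before(self, n: int, canonical_hash: bytes) -> bool:
        """Check a candidate against the locally cached partition for its size."""
        if n not in self._partitions:
            self._partitions[n] = set(self._fetch_partition(n))
        return canonical_hash in self._partitions[n]
```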

@bertie2
Collaborator

bertie2 commented Jul 12, 2023

For context, when I tried to do n = 14 it did this:
[screenshot: "ahhhh-my-ram", showing system memory completely exhausted]
and that was just trying to store the n = 14 cubes.

Yes, we should partition the data based on cube size, but we will also need to either massively improve the density with which we can store cubes, or further partition the data across clients so that it stays manageable.
If you have a canonical hash for a given cube, you can actually just have the clients work from partial data sets: they will create a bunch of duplicate cubes, but they weed out 90% of them themselves and you merge the last 10% on the server.
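
A toy illustration of that split between client-side weeding and server-side merging; the function boundaries are invented, but the idea is the one described above: clients filter against whatever partial hash set they hold, and the server resolves the rest.

```python
def client_filter(candidate_hashes, partial_known: set) -> set:
    """Client-side pass: drop any candidate whose canonical hash is already in the
    partial data set this client holds. Duplicates the client cannot see slip through."""
    return {h for h in candidate_hashes if h not in partial_known}

def server_merge(global_known: set, submissions) -> set:
    """Server-side pass: merge the surviving candidates from all clients,
    keep only the hashes that are genuinely new, and record them."""
    new_hashes = set()
    for batch in submissions:
        new_hashes |= (batch - global_known)
    global_known |= new_hashes
    return new_hashes
```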

@evanrelf

Might a bloom filter or cuckoo filter be useful for this?
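
For reference, a minimal pure-Python Bloom filter sketch, with the bit-array size and hash count chosen arbitrarily. A miss means "definitely new"; a hit means the full duplicate check is still needed.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 24, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive several independent bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=i.to_bytes(8, "big"))
            yield int.from_bytes(digest.digest(), "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```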

@bertie2
Collaborator

bertie2 commented Jul 12, 2023

If we can come up with a faster way to rotate the cubes, or a fast orientation-independent hash function, then yes.
Either would make an excellent first pass to narrow down the search space for the actual full checks.

Otherwise, for now, most of the time is spent rotating the cube all over the place to check for duplicates in different orientations, not in the actual set lookups that check for duplicates themselves.

Sorry, in fact I feel I should clarify: most of the time right now is spent on a combination of several steps that manipulate the cubes (rotating, cropping, hashing). It's still not just the set lookups that are the problem, but it's more complex than my initial comment made out.

Finally, georgedorn at mikepound/cubes#5 came up with a great way to reduce these steps by allocating bigger sets, but my current implementation is memory- and memory-bandwidth-limited at n = 13+, so I have not yet included those changes in my testing.
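
For reference, one common way to get an orientation-independent hash (not necessarily the approach this repo will take): crop the shape to its bounding box, enumerate all 24 axis-aligned rotations, and hash the lexicographically smallest serialisation. A numpy sketch:

```python
import hashlib
import numpy as np

def all_rotations(polycube: np.ndarray):
    """Yield all 24 axis-aligned rotations of a 3D boolean voxel array."""
    def rotations4(cube, axes):
        # The four rotations in the plane spanned by `axes`.
        for k in range(4):
            yield np.rot90(cube, k, axes)

    # Point the original "up" axis in each of the six possible directions,
    # then take the four rotations about that direction.
    yield from rotations4(polycube, (1, 2))
    yield from rotations4(np.rot90(polycube, 2, axes=(0, 2)), (1, 2))
    yield from rotations4(np.rot90(polycube, 1, axes=(0, 2)), (0, 1))
    yield from rotations4(np.rot90(polycube, -1, axes=(0, 2)), (0, 1))
    yield from rotations4(np.rot90(polycube, 1, axes=(0, 1)), (0, 2))
    yield from rotations4(np.rot90(polycube, -1, axes=(0, 1)), (0, 2))

def canonical_hash(polycube: np.ndarray) -> bytes:
    """Orientation-independent hash. The input must already be cropped to its
    bounding box; picking the minimal (shape, bytes) pair over all rotations
    means every orientation of the same shape hashes identically."""
    best_shape, best_bytes = min(
        (rot.shape, np.ascontiguousarray(rot).tobytes()) for rot in all_rotations(polycube)
    )
    return hashlib.sha256(repr(best_shape).encode() + best_bytes).digest()
```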

@mikepound
Owner

If there were a standard storage implementation for cubes, and a way to order them, could we have all the cubes at n-1 sorted and indexed? Then a distributed node could calculate, e.g., all the new cubes from indexes 0 - 10000, i.e. all the unique shapes created by adding one cube to each of the polycubes at n-1 from indexes 0 - 10000. You would then have a series of sets from all the nodes, and @evanrelf's suggestion of a bloom or cuckoo filter could be used to find the unique final set from these?
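
A sketch of that index-range scheme; `expand_one` and `canonical_hash` stand in for the real "add one cube in every free position" and hashing routines, neither of which is pinned down here.

```python
def node_job(parents, start, end, expand_one, canonical_hash):
    """One node's work unit: expand every n-1 parent with index in [start, end)
    by adding a single cube, and return the set of canonical hashes it found."""
    found = set()
    for parent in parents[start:end]:
        for child in expand_one(parent):
            found.add(canonical_hash(child))
    return found

def merge_results(per_node_sets):
    """Combine the per-node sets into the final unique set for n; a Bloom or
    cuckoo filter could cheaply pre-screen candidates before this exact union."""
    final = set()
    for node_set in per_node_sets:
        final |= node_set
    return final
```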

datdenkikniet pushed a commit to datdenkikniet/opencubes that referenced this issue Jul 16, 2023
simplify `int.from_bytes(data, 'big')` + other