
Could we replace speed with a distributed system? #1

Open
matthunz opened this issue Jul 12, 2023 · 6 comments

Comments

@matthunz

Instead of trying to compact the memory layout of the cubes, would it be possible to use a shared database to store the hashes of each shape? This almost feels like a blockchain-style problem where everyone tries to guess the next hash; in this case it's just the next shape.

We could even make this a big distributed Numberphile viewer calculation!
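
For illustration only, the simplest version of that shared store might be a single table of canonical (orientation-independent) hashes keyed by cube count. Everything below, schema and names alike, is a hypothetical sketch, not an existing part of this project:

```python
import sqlite3

# Hypothetical schema: one row per canonical polycube hash, keyed by cube count n.
conn = sqlite3.connect("polycubes.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS shapes (n INTEGER, hash BLOB, PRIMARY KEY (n, hash))"
)

def try_register(n: int, canonical_hash: bytes) -> bool:
    """Insert a newly found shape; return True only if nobody had registered it before."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO shapes (n, hash) VALUES (?, ?)",
            (n, canonical_hash),
        )
    return cur.rowcount == 1
```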

@bertie2
Collaborator

bertie2 commented Jul 12, 2023

Distribution is almost certainly the name of the game if we want to get much beyond n = 16.

Just to confirm: when you say "database", do you mean off-the-shelf database software or something custom?

Off-the-shelf database software will have a hard time due to the truly MASSIVE read load involved in finding a new cube. If someone wants to implement it and benchmark it I would be very interested to see the results; I might well be wrong, but I suspect at least a large partial cache on the client end will be mandatory for good performance.

However, something custom that hands out chunks of work, similar to seti@home, is what my current work is leading towards.
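
A rough sketch of that seti@home-style hand-out, assuming the server keeps the full n-1 set in an indexed order; the names and default chunk size are made up for illustration:

```python
from dataclasses import dataclass
from typing import Iterator

@dataclass
class WorkChunk:
    """A contiguous index range of size-(n-1) parents for one client to expand."""
    n: int      # target polycube size
    start: int  # first parent index (inclusive)
    end: int    # last parent index (exclusive)

def make_chunks(parent_count: int, n: int, chunk_size: int = 10_000) -> Iterator[WorkChunk]:
    """Split the indexed n-1 set into fixed-size ranges the server can hand out."""
    for start in range(0, parent_count, chunk_size):
        yield WorkChunk(n=n, start=start, end=min(start + chunk_size, parent_count))
```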

@matthunz
Author

matthunz commented Jul 12, 2023

That's a great point. I was thinking of just an SQL database, but having a local cache would save a lot of time and bandwidth.

I wonder how that could work, though. Handing out chunks seems hard because there's no clear way I can see to shard the data. If a peer generated a new shape, I feel like our only option would be to read from the entire set of previous shapes.

EDIT: Actually, we would only have to check shapes with the same number of cubes; maybe we could send a subset of the data containing only shapes of that size.
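
A sketch of how that size-partitioned local cache might look; `fetch_partition` stands in for whatever the central server would eventually expose, which does not exist yet:

```python
class SizePartitionedCache:
    """Client-side cache of canonical hashes, partitioned by cube count (sketch only)."""

    def __init__(self, fetch_partition):
        # fetch_partition(n) -> iterable of known canonical hashes for size n,
        # e.g. downloaded once from the (hypothetical) central server.
        self._fetch_partition = fetch_partition
        self._partitions: dict[int, set[bytes]] = {}

    def seen_before(self, n: int, canonical_hash: bytes) -> bool:
        """Check a candidate against the locally cached partition for its size."""
        if n not in self._partitions:
            self._partitions[n] = set(self._fetch_partition(n))
        return canonical_hash in self._partitions[n]
```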

@bertie2
Collaborator

bertie2 commented Jul 12, 2023

For context, when I tried to do n = 14 it did this:
[screenshot: "ahhhh-my-ram", showing system memory completely exhausted]
and that was just trying to store the n = 14 cubes.

Yes, we should partition the data based on cube size, but we will also need to either massively improve the density with which we can store cubes, or further partition the data across clients so that it stays manageable.
If you have a canonical hash for a given cube, you can actually just have the clients work from partial data sets: they will create a bunch of duplicate cubes, but they weed out 90% of them themselves and you merge the last 10% on the server.
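
A toy illustration of that split between client-side weeding and server-side merging; the function boundaries are invented, but the idea is the one described above: clients filter against whatever partial hash set they hold, and the server resolves the rest.

```python
def client_filter(candidate_hashes, partial_known: set) -> set:
    """Client-side pass: drop any candidate whose canonical hash is already in the
    partial data set this client holds. Duplicates the client cannot see slip through."""
    return {h for h in candidate_hashes if h not in partial_known}

def server_merge(global_known: set, submissions) -> set:
    """Server-side pass: merge the surviving candidates from all clients,
    keep only the hashes that are genuinely new, and record them."""
    new_hashes = set()
    for batch in submissions:
        new_hashes |= (batch - global_known)
    global_known |= new_hashes
    return new_hashes
```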

@evanrelf

Might a bloom filter or cuckoo filter be useful for this?
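
For reference, a minimal pure-Python Bloom filter sketch, with the bit-array size and hash count chosen arbitrarily. A miss means "definitely new"; a hit means the full duplicate check is still needed.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 24, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive several independent bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=i.to_bytes(8, "big"))
            yield int.from_bytes(digest.digest(), "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```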

@bertie2
Collaborator

bertie2 commented Jul 12, 2023

If we can come up with a faster way to rotate the cubes, or a fast orientation-independent hash function, then yes.
Either would make an excellent first pass to narrow down the search space for the actual full checks.

Otherwise, for now, most of the time is spent rotating the cube all over the place to check for duplicates in different orientations, not in the actual set lookups that check for duplicates themselves.

Sorry, in fact I feel I should clarify: most of the time right now is spent on a combination of several steps that manipulate the cubes (rotating, cropping, hashing). It's still not just the set lookups that are the problem, but it's more complex than my initial comment made out.

Finally, georgedorn at mikepound/cubes#5 came up with a great way to reduce these steps by allocating bigger sets, but my current implementation is memory- and memory-bandwidth-limited at n = 13+, so I have not yet included those changes in my testing.
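
For reference, one common way to get an orientation-independent hash (not necessarily the approach this repo will take): crop the shape to its bounding box, enumerate all 24 axis-aligned rotations, and hash the lexicographically smallest serialisation. A numpy sketch:

```python
import hashlib
import numpy as np

def all_rotations(polycube: np.ndarray):
    """Yield all 24 axis-aligned rotations of a 3D boolean voxel array."""
    def rotations4(cube, axes):
        # The four rotations in the plane spanned by `axes`.
        for k in range(4):
            yield np.rot90(cube, k, axes)

    # Point the original "up" axis in each of the six possible directions,
    # then take the four rotations about that direction.
    yield from rotations4(polycube, (1, 2))
    yield from rotations4(np.rot90(polycube, 2, axes=(0, 2)), (1, 2))
    yield from rotations4(np.rot90(polycube, 1, axes=(0, 2)), (0, 1))
    yield from rotations4(np.rot90(polycube, -1, axes=(0, 2)), (0, 1))
    yield from rotations4(np.rot90(polycube, 1, axes=(0, 1)), (0, 2))
    yield from rotations4(np.rot90(polycube, -1, axes=(0, 1)), (0, 2))

def canonical_hash(polycube: np.ndarray) -> bytes:
    """Orientation-independent hash. The input must already be cropped to its
    bounding box; picking the minimal (shape, bytes) pair over all rotations
    means every orientation of the same shape hashes identically."""
    best_shape, best_bytes = min(
        (rot.shape, np.ascontiguousarray(rot).tobytes()) for rot in all_rotations(polycube)
    )
    return hashlib.sha256(repr(best_shape).encode() + best_bytes).digest()
```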

@mikepound
Owner

If there were a standard storage implementation for cubes, and a way to order them, could we have all the cubes at n-1 sorted and indexed? Then a distributed node could calculate, e.g., all the new cubes from indexes 0 - 10000, i.e. all the unique shapes created by adding one cube to each of the polycubes at n-1 from indexes 0 - 10000. You would then have a series of sets from all the nodes, and @evanrelf's suggestion of a bloom or cuckoo filter could be used to find the unique final set from these?
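
A sketch of that index-range scheme; `expand_one` and `canonical_hash` stand in for the real "add one cube in every free position" and hashing routines, neither of which is pinned down here.

```python
def node_job(parents, start, end, expand_one, canonical_hash):
    """One node's work unit: expand every n-1 parent with index in [start, end)
    by adding a single cube, and return the set of canonical hashes it found."""
    found = set()
    for parent in parents[start:end]:
        for child in expand_one(parent):
            found.add(canonical_hash(child))
    return found

def merge_results(per_node_sets):
    """Combine the per-node sets into the final unique set for n; a Bloom or
    cuckoo filter could cheaply pre-screen candidates before this exact union."""
    final = set()
    for node_set in per_node_sets:
        final |= node_set
    return final
```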

datdenkikniet pushed a commit to datdenkikniet/opencubes that referenced this issue Jul 16, 2023
simplify `int.from_bytes(data, 'big')` + other