rabin chunker - object size limit exceeded #6841
Hm. This is because the maximum block size is 1MiB.
Yep, it's trying to allocate a massive buffer. @ribasushi could you fix this? We should probably error in github.com/ipfs/go-ipfs-chunker (maybe in FromString?).
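For illustration, a minimal sketch of the kind of sanity check proposed here. This is not the actual go-ipfs-chunker code; the validateRabinSpec helper is hypothetical, and the 1 MiB ceiling simply matches the block size limit mentioned above:

```go
// Illustrative only, not go-ipfs-chunker code: return a descriptive error
// instead of letting the rabin chunker allocate an oversized buffer and fail
// later with "object size limit exceeded".
package main

import "fmt"

const maxAllowedChunk = 1 << 20 // 1 MiB block size limit mentioned in this thread

func validateRabinSpec(min, avg, max uint64) error {
	if max > maxAllowedChunk {
		return fmt.Errorf("rabin chunker: max chunk size %d exceeds the %d byte block size limit", max, maxAllowedChunk)
	}
	if !(min < avg && avg < max) {
		return fmt.Errorf("rabin chunker: expected min < avg < max, got %d/%d/%d", min, avg, max)
	}
	return nil
}

func main() {
	// The failing spec from this issue: rabin-262144-1048576-8388608
	if err := validateRabinSpec(262144, 1048576, 8388608); err != nil {
		fmt.Println(err)
	}
}
```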
Thanks! I knew I was holding it wrong ... somehow. I can confirm that this is working fine:
ipfs add --nocopy -t --chunker=rabin-262144-524288-1048576 bigfile.bin
I was going to say that was a massive chunk size. Deduping content with rabin chunking will work better with smaller chunks. The ideal chunk size is probably the same as rsync's recommended block size, which is sqrt(filesize). However, the sweet-spot compromise between number of blocks and amount of deduping for IPFS would be different. You probably don't want chunks smaller than about 16k.
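As a rough illustration of that heuristic (not IPFS code; the 16 KiB floor and the 670 MB example are just taken from this thread):

```go
// Sketch of the heuristic above: rsync's recommended block size is
// sqrt(filesize), clamped here to a 16 KiB floor. Illustrative only.
package main

import (
	"fmt"
	"math"
)

func suggestedChunkSize(fileSize int64) int64 {
	size := int64(math.Sqrt(float64(fileSize)))
	if size < 16*1024 {
		size = 16 * 1024
	}
	return size
}

func main() {
	// For the 670 MB file from this issue this suggests roughly 26 KiB chunks.
	fmt.Println(suggestedChunkSize(670 * 1024 * 1024))
}
```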
On a side note, I've looked at the code for rabin chunking and it's using the rabin fingerprint as the rollsum for chunking. This is way more complicated and slower than the rabin-karp rollsum, for little or no benefit. I recently added rabin-karp support to librsync, and did a bunch of testing of several rollsums before I chose it.
Is there any interest in a faster/simpler rollsum for rabin chunking? Of
course changing it would make it incompatible with the old chunking.
We've just added support for a buzhash-backed chunker (in master). It's significantly faster and no worse, from what we can tell. @ribasushi is working on a custom content-based chunker with a bunch of different heuristics. For now, I'd hold off on adding additional chunking algorithms directly. However, please talk with @ribasushi about this algorithm. He's writing a testing tool for comparing chunking algorithms.
@dbaarda I'll flag you on the issue containing the test-bed when it is ready (a couple of days, deity willing).
Yeah, I was just thinking "whatever, maybe a large number will throw a different error", and it did. ;) I think we should do a sanity check of the input values rather than crashing, and the error message could be better than "object size limit exceeded".
Cool! How 'stable' do you think this implementation is? The Wikipedia mirror project is thinking about always fetching the latest versions and running an IPFS cluster. For these file types a rolling checksum is key to keeping the diffs small.
@RubenKelevra while buzhash itself is stable, there will be a large amount of "extra considerations" landing in the next ~4 weeks or so that are likely to lead to a substantially different approach to chunking. You happen to ask these questions right in the middle of my work on this checklist ipfs/specs#227 (comment), hence the "cagey answers". TL;DR: stay tuned for a bit longer before making decisions.
The mirror has been outdated since mid-2017, so a month or two doesn't really matter much 😂
Since we're talking about multiple terabytes of data, it's fair to say that it's probably wise to wait for some upcoming major changes, to avoid having to convert a cluster with maybe dozens of machines. I'm currently aware of CIDv2, buzhash, UnixFSv2 and some other experimental features which are going to be standard in the future.
The ticket where we try to plan the cluster: ipfs/distributed-wikipedia-mirror#68
I'm pretty new to this project and would like to help organize getting data onto clusters of users willing to help a project with some storage. Since I don't know Go, that's basically all I can do... and write some nice bullet-point lists of features I'd like to have, hint hint.
Define "stable". We're not planning on removing it, but it's not likely to be the "final" default chunker.
@Stebalien by "stable" I mean: if selected, it will give stable outputs for the same content in the future, so that we don't have to convert TB of data just because the way it's calculated has changed.
@RubenKelevra in terms of "too much data", from your list the most bang-for-buck would be the chunking research. CIDv1+base32 and UnixFSv2 are important endeavors but are not directly relevant to what you are doing (a preexisting encode of historical data ought to keep working "forever", though we are working on exact definitions of "forever" so that projects like yours can make informed decisions, ping @momack2). In addition there is a concerted push to get content discovery (the DHT) working better, slated for an IPFS release in ~mid-March (#6776 (comment), scroll down to …)
@ribasushi thanks for all the input! This gives us a lot of time to find some people willing to donate bandwidth and storage to build a cluster for it!
I don't foresee us changing _this_ algorithm ever. We'll just add additional chunkers.
Interesting reading in all the links here.
I too did my own analysis of different rollsums for librsync. I rejected
buzzhash (AKA cyclic polynomial) because it had really bad collision rates
for ASCII data;
https://github.com/dbaarda/rollsum-tests/blob/master/RESULTS.rst
I didn't even test the rabin fingerprint because initially I couldn't find a decent description of the algorithm, and there was some confusion between it and the rabinkarp (AKA polynomial rolling hash) rollsum used in the Rabin-Karp algorithm documentation. When I did find it, my analysis was that it was too complicated to be a viable option for librsync, and I couldn't see any benefits over the already nearly optimal rabinkarp.
However, librsync's rollsum requirements are a little different to just
chunking (it uses them for matching), so buzzhash might be fine for
chunking.
I think we should import some real-world data, like diffs from articles on Wikipedia or the Linux kernel git, and apply the diffs to pinned objects. Then we measure how much storage each setup requires. We could even build a test out of this, to see if a commit changes these known numbers significantly, to find unexpected bugs or to automatically test storage optimizations and new rolling hashes which are added to the project.
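For illustration, a rough sketch of the kind of measurement this would involve. This is not ribasushi's test-bed; chunkFile is a stand-in for whichever chunker is under test, using fixed-size chunks only to keep the example self-contained:

```go
// Sketch only: chunk two versions of the same content, hash the chunks, and
// report how many bytes of the new version are already covered by chunks of
// the old one (i.e. how much storage deduplication buys).
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkFile is a placeholder for a real content-defined chunker (rabin,
// buzhash, ...); fixed 256 KiB chunks are used here for simplicity.
func chunkFile(data []byte) [][]byte {
	const size = 256 * 1024
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if len(data) < n {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

// sharedBytes counts how many bytes of newData land in chunks that already
// exist in oldData's chunk set.
func sharedBytes(oldData, newData []byte) (shared, total int) {
	seen := map[[sha256.Size]byte]bool{}
	for _, c := range chunkFile(oldData) {
		seen[sha256.Sum256(c)] = true
	}
	for _, c := range chunkFile(newData) {
		total += len(c)
		if seen[sha256.Sum256(c)] {
			shared += len(c)
		}
	}
	return shared, total
}

func main() {
	oldData := make([]byte, 1<<20)                             // stand-in for the old article dump
	newData := append(append([]byte{}, oldData...), "edit"...) // old content plus an appended edit
	s, t := sharedBytes(oldData, newData)
	fmt.Printf("%d of %d bytes reused\n", s, t)
}
```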
@RubenKelevra this is literally what I am working on, but it is slower going than I'd like. With a cleared-up weekend I should have a preliminary version others can hack on before Monday...
@dbaarda I think rollsum differs quite a bit from how we've implemented buzhash, so you might want to give it a look. It uses a balanced random map for the 8-bit to 32-bit mapping, and we also use xor-shift instead of add.
@Kubuxu in librsync there are now 2 different rollsums: the original (not very good) rsync rollsum, which is a hybrid of adler32 and fletcher, and a RabinKarp rollsum (AKA polynomial rolling hash, as used in the RabinKarp algorithm doc, different to the Rabin Fingerprint). It doesn't use buzzhash (AKA cyclic polynomial) because my testing showed it had unacceptably high collisions for ASCII data.
I think the reason buzzhash has high collisions is that ASCII data uses only a small subset of byte values, the shift/rotate operations cycle every 32 rotates, and xor doesn't mix adjacent bits. Together this means there is a high probability of bytes repeating in the input stream in a way that aligns so the xor cancels them both out of the sum. Also, just swapping any 2 bytes that are 32 bytes apart will give the same hash. It means it's less sensitive to the order of the bytes in the stream, and with ASCII's reduced range of values, the chance of ending up with the same bytes in a different order is much higher. Using a balanced random map for 8->32 bits doesn't help with collisions, since the same bytes always map to the same bit pattern you xor with (though it does help to better distribute the hashes over the full 32-bit range).
The RabinKarp rollsum uses multiplies and adds instead of rotates and xors, which both do propagate to adjacent bits, making it much more sensitive to the order of the bytes in the stream. It gets a good hash distribution without needing an 8->32-bit balanced random mapping.
However, this might not matter for chunking, which probably doesn't care about collisions. What matters most is a good enough solution that is fast and efficient. Buzzhash's rotate and xor is probably a tiny bit faster than RabinKarp's multiply and add (multiplies can be expensive). However, an 8->32-bit lookup table is something that ends up stealing CPU cache space from other parts of the process, and that might be worse than the multiply overheads. You'd need to test it.
For a summary of the rollsums and their names see https://en.wikipedia.org/wiki/Rolling_hash
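For readers following along, a minimal sketch of the two rolling updates being compared here. This is illustrative only, not librsync or go-ipfs code; the window length, table contents, and multiplier constant are arbitrary choices for the example:

```go
// Sketch of the two rolling-hash updates discussed above: buzzhash (cyclic
// polynomial) rotates and xors values from a byte->uint32 table, while
// rabin-karp (polynomial rolling hash) multiplies and adds.
package main

import (
	"fmt"
	"math/bits"
)

const window = 48 // bytes in the rolling window (arbitrary for this example)

var table [256]uint32 // byte -> uint32 mapping; a real buzzhash uses a fixed random table

func init() {
	// Fill the table with arbitrary well-mixed values via xorshift32.
	s := uint32(2463534242)
	for i := range table {
		s ^= s << 13
		s ^= s >> 17
		s ^= s << 5
		table[i] = s
	}
}

// buzzhashFull computes the hash of a full window from scratch.
func buzzhashFull(win []byte) uint32 {
	var h uint32
	for _, b := range win {
		h = bits.RotateLeft32(h, 1) ^ table[b]
	}
	return h
}

// buzzhashRoll slides the window by one byte: rotate the hash, xor out the
// departing byte (rotated by the window length mod 32), xor in the new byte.
func buzzhashRoll(h uint32, out, in byte) uint32 {
	return bits.RotateLeft32(h, 1) ^ bits.RotateLeft32(table[out], window%32) ^ table[in]
}

const rkBase uint32 = 0x01000193 // arbitrary odd multiplier for the example

// rkOutMul is rkBase^(window-1), the weight of the byte leaving the window.
var rkOutMul = func() uint32 {
	m := uint32(1)
	for i := 0; i < window-1; i++ {
		m *= rkBase
	}
	return m
}()

// rabinKarpFull computes the hash of a full window from scratch.
func rabinKarpFull(win []byte) uint32 {
	var h uint32
	for _, b := range win {
		h = h*rkBase + uint32(b)
	}
	return h
}

// rabinKarpRoll slides the window by one byte: remove the departing byte's
// contribution, shift everything up by one power of the base, add the new byte.
func rabinKarpRoll(h uint32, out, in byte) uint32 {
	return (h-uint32(out)*rkOutMul)*rkBase + uint32(in)
}

func main() {
	data := make([]byte, 200)
	for i := range data {
		data[i] = byte(i*7 + 3)
	}
	// Roll each hash across the data and check it matches a fresh computation
	// over the final window.
	b := buzzhashFull(data[:window])
	r := rabinKarpFull(data[:window])
	for i := window; i < len(data); i++ {
		b = buzzhashRoll(b, data[i-window], data[i])
		r = rabinKarpRoll(r, data[i-window], data[i])
	}
	fmt.Printf("buzzhash   rolled=%08x recomputed=%08x\n", b, buzzhashFull(data[len(data)-window:]))
	fmt.Printf("rabin-karp rolled=%08x recomputed=%08x\n", r, rabinKarpFull(data[len(data)-window:]))
}
```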
A periodic reminder for folks following along: the actual rolling hash calculation never exists in a vacuum. There is the outer context of hashing/tagging/moving the resulting chunks around, and this almost always takes place synchronously with the chunking itself due to stream processing. In other words, focusing on the clock count of a particular rolling hash algorithm is very misleading. I am still working on the tool I mentioned higher up that makes it more convenient to quantify the above claim. A lot of progress has been made, but realistically it is at least another full week out before a …
This sounds like it's probably wise to run the test pinned to a single CPU core and additionally run a synthetic load alongside it, to measure what the impact on other jobs running in parallel would be. Prime95 comes to mind as a synthetic load.
I did some testing a while back and used BOINC, which would run just one work unit of Seti@home as a background load with a high nice setting. The interesting value for me was the system load one minute into the test. The issue with using something like BOINC is that you can't start the same work units over and over again in an automated test to see if changes to the rest of the code have an impact. Prime95 could run the same work on each test, helping to find performance bugs.
Sounds like you made some progress, thanks for your work!
Version information:
go-ipfs version: 0.4.22-4e981576b-dirty
Repo version: 7
System version: amd64/linux
Golang version: go1.12.8
Description:
using the rabin chunker seems to work only on small files right now:
ipfs add --nocopy -t --chunker=rabin-262144-1048576-8388608 bigfile.bin
bigfile.bin is 670 MB; the progress freezes after some MB and the add fails with the "object size limit exceeded" error from the title.
Additionally running this command will crash ipfs instantly:
ipfs add --nocopy -t --chunker=rabin-262144-1048576-99999999999 bigfile.bin