Smart chunking #141

tchardin · 2021-07-09T08:40:15Z

When the file type can be detected from the file extension, we should select an appropriate chunking strategy. This improves deduplication and data transfer speed, and makes the network more efficient overall.
Here are some general guidelines:

Audio and video content should use trickle layout with 1MB chunks.
Images and compressed archives (.zip etc.) should use the size splitter with 1MB chunks and balanced layout.
Text, JSON, etc. should use the Buzhash chunker with balanced layout and 16KB chunks for best deduplication.
We can probably experiment with different params, but these seem like reasonable starting points.
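
As a rough illustration of the guidelines above, the extension lookup could be sketched like this. The Strategy type, the strategyFor helper, and the exact extension list are hypothetical and not taken from the codebase. The size splitter for audio and video is an assumption too, since only the layout and chunk size are given for that case.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Strategy is a hypothetical description of how a file should be chunked.
type Strategy struct {
	Chunker string // "size" or "buzhash"
	Layout  string // "trickle" or "balanced"
	Size    int    // target chunk size in bytes
}

const (
	kb = 1 << 10
	mb = 1 << 20
)

var (
	media   = Strategy{Chunker: "size", Layout: "trickle", Size: 1 * mb} // chunker assumed
	opaque  = Strategy{Chunker: "size", Layout: "balanced", Size: 1 * mb}
	textual = Strategy{Chunker: "buzhash", Layout: "balanced", Size: 16 * kb}
)

// byExt maps lowercase extensions to a strategy. The list is illustrative only.
var byExt = map[string]Strategy{
	".mp3": media, ".wav": media, ".mp4": media, ".mkv": media,
	".png": opaque, ".jpg": opaque, ".zip": opaque, ".gz": opaque,
	".txt": textual, ".json": textual, ".md": textual,
}

// strategyFor returns the strategy for a file name, or false when the
// extension is missing or unknown so the caller can pick a fallback.
func strategyFor(name string) (Strategy, bool) {
	s, ok := byExt[strings.ToLower(filepath.Ext(name))]
	return s, ok
}

func main() {
	for _, name := range []string{"clip.MP4", "photo.png", "notes.json", "blob"} {
		s, ok := strategyFor(name)
		fmt.Println(name, s, ok)
	}
}
```

Returning a boolean instead of a default keeps the choice for unknown files open, since no default is stated here.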

gallexis · 2021-07-12T09:59:01Z

I'm considering using this lib: https://github.com/gabriel-vasile/mimetype to detect the MIME type from the first 512 bytes of a file.

Using the lib should give us more accuracy at the price of a slight delay when importing a file (this needs to be benchmarked).
We could also use both approaches: first try the extension name and, if none is given, fall back to the library.

tchardin · 2021-07-13T07:30:19Z

@gallexis yeah, we use that lib for detecting the type when posting to the HTTP server, so that should work fine.

gallexis · 2021-07-13T08:34:03Z

@tchardin chunk.NewBuzhash(r io.Reader) only takes a reader, so I can't set its chunk size to 16KB. Is it OK to leave it like that?
https://github.com/ipfs/go-ipfs-chunker/blob/master/buzhash.go#L24

tchardin assigned gallexis Jul 9, 2021

gallexis mentioned this issue Jul 12, 2021

feat: add smart chunking #150

Merged

gallexis closed this as completed in #150 Aug 10, 2021
