This repository has been archived by the owner on Nov 22, 2023. It is now read-only.

Smart chunking #141

Closed
tchardin opened this issue Jul 9, 2021 · 3 comments · Fixed by #150

@tchardin
Contributor

tchardin commented Jul 9, 2021

When the file type can be detected from the file extension, we should select an appropriate chunking strategy. This improves deduplication and data transfer speed, and makes the network more efficient overall.
Here are some general guidelines:

  • Audio and video content: trickle layout with 1 MB chunks.
  • Images and compressed archives (.zip etc.): size splitter with 1 MB chunks, balanced layout.
  • Text, JSON etc.: Buzhash chunker with balanced layout and 16 KB chunks for best deduplication.
    We can probably experiment with different params, but this seems like a reasonable first effort.
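
A minimal sketch of how these guidelines could be expressed as a lookup from extension to strategy. Everything here is hypothetical: the `Strategy` type, the `StrategyFor` function and the extension table are illustrative names, not part of this repository. Two choices are assumptions rather than something stated above: audio and video use a fixed-size splitter (only the layout and chunk size are given), and unknown extensions fall back to the image/archive settings.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Strategy describes how a file would be chunked and laid out.
// Hypothetical type, for illustration only.
type Strategy struct {
	Chunker   string // "size" or "buzhash"
	Layout    string // "balanced" or "trickle"
	ChunkSize int    // target chunk size in bytes
}

const (
	kb = 1 << 10
	mb = 1 << 20
)

var (
	// Assumption: media uses a fixed-size splitter; the guideline only
	// names the layout and the chunk size.
	mediaStrategy  = Strategy{Chunker: "size", Layout: "trickle", ChunkSize: mb}
	binaryStrategy = Strategy{Chunker: "size", Layout: "balanced", ChunkSize: mb}
	textStrategy   = Strategy{Chunker: "buzhash", Layout: "balanced", ChunkSize: 16 * kb}
)

// byExt is a small illustrative table, not an exhaustive list.
var byExt = map[string]Strategy{
	".mp3": mediaStrategy, ".wav": mediaStrategy,
	".mp4": mediaStrategy, ".mkv": mediaStrategy,
	".png": binaryStrategy, ".jpg": binaryStrategy,
	".zip": binaryStrategy, ".gz": binaryStrategy,
	".txt": textStrategy, ".json": textStrategy, ".md": textStrategy,
}

// StrategyFor picks a strategy from the file extension. The boolean
// reports whether the extension was recognised; when it is not, the
// binary strategy is returned as an assumed default.
func StrategyFor(name string) (Strategy, bool) {
	ext := strings.ToLower(filepath.Ext(name))
	s, ok := byExt[ext]
	if !ok {
		return binaryStrategy, false
	}
	return s, true
}

func main() {
	for _, name := range []string{"talk.MP4", "photo.png", "notes.json", "README"} {
		s, known := StrategyFor(name)
		fmt.Printf("%-12s known=%-5t %+v\n", name, known, s)
	}
}
```

Returning the "recognised" flag alongside the strategy leaves room for a second detection step when the extension is missing or unknown.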
@gallexis
Contributor

I'm considering using this lib: https://github.com/gabriel-vasile/mimetype to detect the MIME type from the first 512 bytes of a file.

Using the lib should give us more accuracy, at the price of a slight delay when importing a file (this needs to be benchmarked).
We could also combine both approaches: first try the extension name and, if none is given, fall back to the library.

@tchardin
Contributor Author

@gallexis yeah, we use that lib for detecting the type when posting to the HTTP server, so that should work fine.

@gallexis
Contributor

gallexis commented Jul 13, 2021

@tchardin chunk.NewBuzhash(r io.Reader) only takes a reader, so I can't set its chunk size to 16kb. Is it OK to leave it like that?
https://github.com/ipfs/go-ipfs-chunker/blob/master/buzhash.go#L24
