implement trickledag for faster unixfs operations #713

Merged
merged 4 commits into from

2 participants

@whyrusleeping
IPFS member

Alright, so I've come up with a new tree structure optimized for both streaming and seeking through a given file. This improves on both the ext4 structure (which is mainly aimed at on-disk filesystems) and the "List of Lists" idea I previously commented about.

The downside of the ext4-style tree layout is that, as you get farther into the file, the number of requests you need to make to get data increases. I noticed this problem and came up with the "List of Lists" layout, which would work fantastically for a sequential stream. The issue comes when you try to seek through it: the top-level node is weighted so heavily to one side that it is 'narrow' from the data's perspective, so seeking requires O(n) requests to find the desired location in the file, where ext4 was roughly O(log(n)).

The Trickle{Tree,Dag} addresses both of these concerns: each request after the first can return actual file data, and the cost of seeking remains near O(log(n)) since it has a recursive tree structure. A visualization of it would look like the ext4 tree, but instead of having iteratively deeper 'balanced' trees, it has an iteratively deeper version of itself. The primary tenet of its design is "Data at every layer".

An example layout is here:
http://gateway.ipfs.io/ipfs/QmRPfwo1XQErHDXpeCnJ7j92ibGNTBxkrmBFCbvEa78gZB
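
For readers who can't load the layout link, here is a rough sketch of the shape described above. It is not the code in this PR: `node`, `source`, `fill`, `buildTrickle`, and `hops` are hypothetical names, the "blocks" are just integer indices, and the two constants are set to tiny values so the tree is easy to trace by hand. Each node links some data blocks directly ("data at every layer"), then repeats progressively deeper copies of the same structure.

```go
package main

// node is a hypothetical in-memory stand-in for a DAG node.
// It is NOT the go-ipfs dag.Node type.
type node struct {
	leaves   []int   // indices of data blocks linked directly from this node
	children []*node // nested trickle subtrees
}

const (
	directLeaves = 2 // assumed: data blocks linked directly from every node
	layerRepeat  = 2 // assumed: number of subtrees added per depth level
)

// source hands out block indices 0..total-1 in order.
type source struct{ next, total int }

func (s *source) done() bool { return s.next >= s.total }

// fill builds one node: direct data leaves first, then layerRepeat
// subtrees of depth 1, then layerRepeat subtrees of depth 2, and so on.
// A negative depth means "unbounded" and is used only for the root.
func fill(s *source, depth int) *node {
	n := &node{}
	for i := 0; i < directLeaves && !s.done(); i++ {
		n.leaves = append(n.leaves, s.next)
		s.next++
	}
	for d := 1; (depth < 0 || d < depth) && !s.done(); d++ {
		for r := 0; r < layerRepeat && !s.done(); r++ {
			n.children = append(n.children, fill(s, d))
		}
	}
	return n
}

// buildTrickle lays out `total` blocks in the trickle shape.
func buildTrickle(total int) *node {
	return fill(&source{total: total}, -1)
}

// hops returns how many nodes must be fetched to reach block idx
// (the root counts as 1), or -1 if the block is not in the tree.
func hops(n *node, idx int) int {
	for _, l := range n.leaves {
		if l == idx {
			return 1
		}
	}
	for _, c := range n.children {
		if h := hops(c, idx); h > 0 {
			return h + 1
		}
	}
	return -1
}
```

With these toy constants and 20 blocks, `hops(buildTrickle(20), 0)` is 1, block 2 takes 2 fetches, and block 8 takes 3, so early data arrives immediately while later offsets cost only a few extra fetches.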

@jbenet
IPFS member

(I think that if you rebase on master, that error will be fixed)

@jbenet jbenet modified the milestone: α
@whyrusleeping whyrusleeping was assigned by jbenet
@jbenet
IPFS member

I think the right thing to do with all these data structures is to set up a benchmark suite that tests various types of workloads. It may be that we find that different data structures are better for different use cases. Re-indexing the same data blocks might be fine, to have "different handles" on the same content.

@jbenet jbenet commented on an outdated diff
importer/importer_test.go
+ }
+ if n != start {
+ t.Fatal("Failed to seek to correct offset")
+ }
+
+ out, err := ioutil.ReadAll(rs)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ err = arrComp(out, should[start:])
+ if err != nil {
+ t.Fatal(err)
+ }
+}
+
@jbenet
IPFS member
jbenet added a note

maybe add some benchmarks to this pkg?

@jbenet
IPFS member
jbenet added a note

Also, these tests only test it from the outside. It would be useful to test the implementation actually creates a well-formed structure. Maybe add a test that checks the structure produced?

@jbenet jbenet commented on an outdated diff
importer/trickledag.go
+// given depth. Higher values increase the width of a given node, which
+// improves seek speeds.
+const layerRepeat = 4
+
+func BuildTrickleDagFromReader(r io.Reader, ds dag.DAGService, mp pin.ManualPinner, spl chunk.BlockSplitter) (*dag.Node, error) {
+ // Start the splitter
+ blkch := spl.Split(r)
+
+ // Create our builder helper
+ db := &dagBuilderHelper{
+ dserv: ds,
+ mp: mp,
+ in: blkch,
+ maxlinks: DefaultLinksPerBlock,
+ indrSize: defaultIndirectBlockDataSize(),
+ }
@jbenet
IPFS member
jbenet added a note

let's not share data structures here. changing the regular dag reader can break trickle, etc. why don't we separate into packages:

/importer      <--- maybe exports a version of dagBuilderHelper that subpkgs can extend/compose
/importer/balanced   <--- the normal one
/importer/trickle
@jbenet jbenet and 1 other commented on an outdated diff
importer/importer_test.go
+ t.Fatal(err)
+ }
+
+ rs, err := uio.NewDagReader(context.Background(), nd, dnp.ds)
+ if err != nil {
+ t.Fatal(err)
+ }
+
+ start := int64(4000)
+ n, err := rs.Seek(start, os.SEEK_SET)
+ if err != nil {
+ t.Fatal(err)
+ }
+ if n != start {
+ t.Fatal("Failed to seek to correct offset")
+ }
@jbenet
IPFS member
jbenet added a note

seeking once, to a fixed offset, is not robust... how about:

buf := make([]byte, nbytes)
for i := 0; i < 1000; i++ {
  // pick a random window [start, end) inside the file
  start := rand.Intn(nbytes)
  end := start + rand.Intn(nbytes-start)
  n, err := rs.Seek(int64(start), os.SEEK_SET)
  if err != nil {
    t.Fatal(err)
  }
  if n != int64(start) {
    t.Fatal("failed to seek to correct offset")
  }
  // read exactly end-start bytes, not the whole buffer
  nread, err := io.ReadFull(rs, buf[:end-start])
  if err != nil {
    t.Fatal(err)
  }
  if nread != (end - start) {
    t.Fatal("failed to read correct size")
  }
  err = arrComp(buf[:end-start], should[start:end])
  if err != nil {
    t.Fatal(err)
  }
}
@whyrusleeping
IPFS member

I think I'll actually just make the tests accept a BuilderFunction and run them all for both layouts.

@jbenet
IPFS member

Few comments, otherwise LGTM

whyrusleeping added some commits
@whyrusleeping whyrusleeping implement trickledag for faster unixfs operations b3e74fa
@whyrusleeping whyrusleeping refactor importer package with trickle and balanced dag generation bc79ae1
@whyrusleeping whyrusleeping fix benchmarks 414bdc7
@whyrusleeping whyrusleeping clean up benchmarks, implement WriterTo on DAGReader, and optimize DagReader 1e93ee0
@whyrusleeping whyrusleeping merged commit adb7ad9 into master

1 of 2 checks passed

Details default Build #1344-feat/trickledag-1e93 failed in 19 min
Details continuous-integration/travis-ci The Travis CI build passed
@jbenet jbenet deleted the feat/trickledag branch