This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

Builder refactoring, trickle builder and balanced builder #118

Merged — 30 commits merged on Jan 11, 2017
Changes from 7 commits
511c746
builder refactoring. trickle builder. balanced builder
pgte Dec 28, 2016
0e3f158
removed unused experimental builder
pgte Dec 28, 2016
6af7458
documented importer options
pgte Dec 28, 2016
977f41b
default builder strategy is now the balanced strategy
pgte Dec 28, 2016
c767848
removed unused test
pgte Dec 28, 2016
7854682
removed superfluous comment
pgte Dec 28, 2016
df48f69
fixed trickle builder
pgte Dec 29, 2016
50c5d35
removed superfluous comment
pgte Dec 29, 2016
cbe2ce4
using options.chunkerOptions for chunker-specific options
pgte Dec 29, 2016
f8b9e80
docs: corrected option name
pgte Dec 29, 2016
fedfc30
fix: error handling in trickle reducer
pgte Dec 29, 2016
8e8d3d6
using pull-pair instead of backpressure-less bespoke pair
pgte Dec 30, 2016
7647657
fixed trickle builder tests
pgte Jan 3, 2017
74482f3
recursive streaming trickle builder
pgte Jan 3, 2017
2b92345
missing dep
pgte Jan 3, 2017
01d8583
some style corrections
pgte Jan 3, 2017
0036314
importing multiple roots yields an error
pgte Jan 3, 2017
02cdefd
reinstated testing importing using flat and balanced strategies
pgte Jan 3, 2017
8ac163c
asserting that root node is one and only one
pgte Jan 3, 2017
e723586
testing import and export using various builder strategies
pgte Jan 3, 2017
b9a01f8
fixed error propagation into push streams
pgte Jan 3, 2017
03f49d4
simplified some iteration logic
pgte Jan 3, 2017
fedbe5f
default for maximum children per node is 174
pgte Jan 7, 2017
180b808
by default, only reduces one leaf to self if specific option is present
pgte Jan 8, 2017
937c292
test results reflect new default config
pgte Jan 8, 2017
0f706df
testing against big files generated from a pseudo-random byte stream gen
pgte Jan 9, 2017
0d3602e
added missing dep
pgte Jan 9, 2017
973c483
removed unnecessary dev dependency
pgte Jan 10, 2017
67fbf87
go-ipfs parity: no root node with single leaf
pgte Jan 10, 2017
ff6cce5
docs: corrected the default maximum number of children nodes
pgte Jan 10, 2017
13 changes: 13 additions & 0 deletions README.md
@@ -150,6 +150,19 @@ been written into the [DAG Service][]'s storage mechanism.
The input's file paths and directory structure will be preserved in the DAG
Nodes.

### Importer options

The second argument of the importer constructor accepts the following options:

* `chunker` (string, defaults to `"fixed"`): the chunking strategy. Currently only `"fixed"` is supported.
* `chunkSize` (positive integer, defaults to `262144`): the maximum chunk size for the `fixed` chunker.
Contributor:

I think it would be better to have a specific chunkerOptions object which, in the case of the fixed chunker, has a size option, since that option might not make sense for other chunkers.

Contributor Author:

@dignifiedquire Implemented your suggestion.

* `strategy` (string, defaults to `"balanced"`): the DAG builder strategy name. Supports:
* `flat`: flat list of chunks
* `balanced`: builds a balanced tree
* `trickle`: builds [a trickle tree](https://github.com/ipfs/specs/pull/57#issuecomment-265205384)
* `maxChildrenPerNode` (positive integer, defaults to `172`): the maximum children per node for the `balanced` and `trickle` DAG builder strategies
Contributor:

Now defaults to 174

* `layerRepeat` (positive integer, defaults to 4): (only applicable to the `trickle` DAG builder strategy). The maximum repetition of parent nodes for each layer of the tree.
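Taken together, these options might be combined as follows. This is a hypothetical sketch of the defaulting behaviour, mirroring the `Object.assign` pattern used in `src/builder/index.js`; the default values shown are the ones documented in this 7-commit snapshot of the list above.

```javascript
// Sketch of how user options overlay the importer defaults.
const defaultOptions = {
  chunker: 'fixed',
  chunkSize: 262144,
  strategy: 'balanced',
  maxChildrenPerNode: 172,
  layerRepeat: 4
}

function normalizeOptions (userOptions) {
  // Later sources win, so anything the caller passes overrides a default.
  return Object.assign({}, defaultOptions, userOptions)
}

const opts = normalizeOptions({ strategy: 'trickle' })
console.log(opts.strategy)           // 'trickle'
console.log(opts.maxChildrenPerNode) // 172
```

Unspecified options keep their defaults, so callers only need to pass the settings they want to change.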

Contributor:

@dignifiedquire wanna push a commit with jsdoc to this PR so that unixfs-engine gets fresh docs with this refactor? Or perhaps just do a PR to the PR, that would also be good :)

Contributor:

We can merge this as is and add the documentation in a later PR; otherwise, if @pgte wants to take a stab, he can start adding this info as jsdoc as well. It doesn't have to be me who does that.


### Example Exporter

4 changes: 3 additions & 1 deletion package.json
@@ -42,6 +42,7 @@
"aegir": "^9.3.0",
"buffer-loader": "0.0.1",
"chai": "^3.5.0",
"deep-extend": "^0.4.1",
"fs-pull-blob-store": "^0.4.1",
"idb-pull-blob-store": "^0.5.1",
"ipfs-block-service": "^0.8.0",
@@ -59,6 +60,7 @@
"ipld-resolver": "^0.4.1",
"is-ipfs": "^0.2.1",
"multihashes": "^0.3.1",
"pull-batch": "^1.0.0",
"pull-block": "^1.0.2",
"pull-paramap": "^1.2.1",
"pull-pushable": "^2.0.1",
@@ -76,4 +78,4 @@
"jbenet <juan@benet.ai>",
"nginnever <ginneversource@gmail.com>"
]
}
}
64 changes: 64 additions & 0 deletions src/builder/balanced/balanced-reducer.js
@@ -0,0 +1,64 @@
'use strict'

const assert = require('assert')
const pull = require('pull-stream')
const pullWrite = require('pull-write')
const pushable = require('pull-pushable')
const batch = require('pull-batch')

module.exports = function balancedReduceToRoot (reduce, options) {
const source = pushable()

const sink = pullWrite(
function (item, cb) {
source.push(item)
cb()
},
null,
1,
function (end) {
source.end(end)
}
)

const result = pushable()

reduceToParents(source, function (err, roots) {
Contributor:

please use arrow functions for unnamed functions, to keep consistent with the rest of our code base

if (err) {
result.end(err) // pull-pushable has no emit(); end(err) propagates the error
return // early
}
assert.equal(roots.length, 1, 'expected exactly one root')
result.push(roots[0])
result.end()
})

function reduceToParents (_chunks, callback) {
let chunks = _chunks
if (Array.isArray(chunks)) {
chunks = pull.values(chunks)
}

pull(
chunks,
batch(options.maxChildrenPerNode),
pull.asyncMap(reduce),
pull.collect(reduced)
)

function reduced (err, roots) {
if (err) {
callback(err)
} else if (roots.length > 1) {
reduceToParents(roots, callback)
} else {
callback(null, roots)
}
}
}

return {
sink: sink,
source: result
}
}
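The `reduceToParents` recursion above can be pictured with a synchronous analogue. `makeParent` below is an illustrative stand-in for the real DAG-node reducer, not part of this module.

```javascript
// Synchronous analogue of reduceToParents: group up to
// maxChildrenPerNode nodes, reduce each group to a parent, and
// repeat until a single root remains.
function reduceToRoot (leaves, maxChildrenPerNode, reduce) {
  let level = leaves
  while (level.length > 1) {
    const parents = []
    for (let i = 0; i < level.length; i += maxChildrenPerNode) {
      parents.push(reduce(level.slice(i, i + maxChildrenPerNode)))
    }
    level = parents
  }
  return level[0]
}

// Stand-in reducer: a parent node that just records its children.
const makeParent = (children) => ({ children })

// Five leaves with at most two children per node yields a
// balanced tree whose root has two children.
const root = reduceToRoot([1, 2, 3, 4, 5], 2, makeParent)
console.log(root.children.length) // 2
```

With fewer leaves than `maxChildrenPerNode`, a single pass produces the root; a lone leaf is returned as-is without a parent, which matches the later "no root node with single leaf" go-ipfs parity commit.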
12 changes: 12 additions & 0 deletions src/builder/balanced/index.js
@@ -0,0 +1,12 @@
'use strict'

const balancedReducer = require('./balanced-reducer')

const defaultOptions = {
maxChildrenPerNode: 172
}
Contributor:

Let's add a comment with a link to https://github.com/ipfs/go-ipfs/blob/master/importer/helpers/helpers.go#L16-L35 so that we remember where this value comes from


module.exports = function (reduce, _options) {
const options = Object.assign({}, defaultOptions, _options)
return balancedReducer(reduce, options)
}
126 changes: 126 additions & 0 deletions src/builder/builder.js
@@ -0,0 +1,126 @@
'use strict'

const assert = require('assert')
const UnixFS = require('ipfs-unixfs')
const pull = require('pull-stream')
const parallel = require('async/parallel')
const waterfall = require('async/waterfall')
const dagPB = require('ipld-dag-pb')
const CID = require('cids')

const reduce = require('./reduce')

const DAGNode = dagPB.DAGNode

const defaultOptions = {
chunkSize: 262144
// chunkSize: 26214
Contributor:

What's up with this lonely commented out option?

Contributor Author:

Was used in an experiment and had forgotten to remove it. Removed now.

}

module.exports = function (Chunker, ipldResolver, Reducer, _options) {
Contributor:

Please use uppercase variables only for constructors, and if they are constructors call them with new.

const options = Object.assign({}, defaultOptions, _options)

return function (source, files) {
return function (items, cb) {
parallel(items.map((item) => (cb) => {
if (!item.content) {
// item is a directory
return createAndStoreDir(item, (err, node) => {
if (err) {
return cb(err)
}
source.push(node)
files.push(node)
cb()
})
}

// item is a file
createAndStoreFile(item, (err, node) => {
if (err) {
return cb(err)
}
source.push(node)
files.push(node)
cb()
})
}), cb)
}
}

function createAndStoreDir (item, callback) {
// 1. create the empty dir dag node
// 2. write it to the dag store

const d = new UnixFS('directory')
waterfall([
(cb) => DAGNode.create(d.marshal(), cb),
(node, cb) => {
ipldResolver.put({
node: node,
cid: new CID(node.multihash)
}, (err) => cb(err, node))
}
], (err, node) => {
if (err) {
return callback(err)
}
callback(null, {
path: item.path,
multihash: node.multihash,
size: node.size
})
})
}

function createAndStoreFile (file, callback) {
if (Buffer.isBuffer(file.content)) {
file.content = pull.values([file.content])
}

if (typeof file.content !== 'function') {
return callback(new Error('invalid content'))
}

const reducer = Reducer(reduce(file, ipldResolver), options)

pull(
file.content,
Chunker(options),
pull.map(chunk => new Buffer(chunk)),
pull.map(buffer => new UnixFS('file', buffer)),
pull.asyncMap((fileNode, callback) => {
DAGNode.create(fileNode.marshal(), (err, node) => {
callback(err, { DAGNode: node, fileNode: fileNode })
})
}),
pull.asyncMap((leaf, callback) => {
ipldResolver.put(
{
node: leaf.DAGNode,
cid: new CID(leaf.DAGNode.multihash)
},
err => callback(err, leaf)
)
}),
pull.map((leaf) => {
return {
path: file.path,
multihash: leaf.DAGNode.multihash,
size: leaf.DAGNode.size,
leafSize: leaf.fileNode.fileSize(),
name: ''
}
}),
reducer,
pull.collect((err, roots) => {
if (err) {
callback(err)
} else {
assert.equal(roots.length, 1, 'should result in exactly one root')
callback(null, roots[0])
}
})
)
}
}
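Setting the pull-stream plumbing and storage aside, the `createAndStoreFile` pipeline above reduces to: chunk the content, wrap each chunk as a leaf descriptor, then hand all leaves to the reducer. A synchronous sketch, with `chunkFixed` and `makeLeaf` as illustrative stand-ins rather than the real `Chunker` or `DAGNode` APIs:

```javascript
// Simplified, synchronous view of the import pipeline:
// content → fixed-size chunks → leaf descriptors → reduced root.
function chunkFixed (content, chunkSize) {
  const chunks = []
  for (let i = 0; i < content.length; i += chunkSize) {
    chunks.push(content.slice(i, i + chunkSize))
  }
  return chunks
}

// Stand-in for the UnixFS leaf + DAGNode creation steps.
function makeLeaf (path, chunk) {
  return { path: path, leafSize: chunk.length, name: '' }
}

function importFile (file, options, reduce) {
  const leaves = chunkFixed(file.content, options.chunkSize)
    .map((chunk) => makeLeaf(file.path, chunk))
  return reduce(leaves)
}

// 11 characters chunked at 4 bytes gives 3 leaves.
const root = importFile(
  { path: 'hello.txt', content: 'hello world' },
  { chunkSize: 4 },
  (leaves) => ({ children: leaves })
)
console.log(root.children.length) // 3
```

The real code does the same shaping per chunk, but each step is asynchronous and each leaf is persisted through the IPLD resolver before reduction.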
30 changes: 30 additions & 0 deletions src/builder/create-build-stream.js
@@ -0,0 +1,30 @@
'use strict'

const pullPushable = require('pull-pushable')
const pullWrite = require('pull-write')

module.exports = function createBuildStream (createStrategy, ipldResolver, flushTree, options) {
const files = []

const source = pullPushable()

const sink = pullWrite(
createStrategy(source, files),
null,
options.highWaterMark,
(err) => {
if (err) {
return source.end(err)
}

flushTree(files, ipldResolver, source, () => {
source.end()
})
}
)

return {
source: source,
sink: sink
}
}
46 changes: 46 additions & 0 deletions src/builder/flat/index.js
@@ -0,0 +1,46 @@
'use strict'

const pull = require('pull-stream')
const pushable = require('pull-pushable')
const pullWrite = require('pull-write')
const batch = require('pull-batch')

module.exports = function (reduce, options) {
const source = pushable()
const sink = pullWrite(
function (d, cb) {
source.push(d)
cb()
},
null,
1,
function (err) {
if (err) {
source.end(err) // pull-pushable has no emit(); end(err) propagates the error
} else {
source.end()
}
}
)

const result = pushable()

pull(
source,
batch(Infinity),
pull.asyncMap(reduce),
pull.collect(function (err, roots) {
Contributor:

arrow function

if (err) {
result.end(err) // pull-pushable has no emit(); end(err) propagates the error
return // early
}
result.push(roots[0])
result.end()
})
)

return {
sink: sink,
source: result
}
}
32 changes: 32 additions & 0 deletions src/builder/index.js
@@ -0,0 +1,32 @@
'use strict'

const assert = require('assert')
const createBuildStream = require('./create-build-stream')
const Builder = require('./builder')

const reducers = {
flat: require('./flat'),
balanced: require('./balanced'),
trickle: require('./trickle')
}

const defaultOptions = {
strategy: 'balanced',
highWaterMark: 100
}

module.exports = function (Chunker, ipldResolver, flushTree, _options) {
assert(Chunker, 'Missing chunker creator function')
assert(ipldResolver, 'Missing IPLD Resolver')
assert(flushTree, 'Missing flushTree argument')

const options = Object.assign({}, defaultOptions, _options)

const strategyName = options.strategy
const reducer = reducers[strategyName]
assert(reducer, 'Unknown importer build strategy name: ' + strategyName)

const createStrategy = Builder(Chunker, ipldResolver, reducer, options)

return createBuildStream(createStrategy, ipldResolver, flushTree, options)
}