Skip to content
This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

Builder refactoring, trickle builder and balanced builder #118

Merged
merged 30 commits into from
Jan 11, 2017

Conversation

pgte
Copy link
Contributor

@pgte pgte commented Dec 28, 2016

Refactored the builder constructors and streams.
A builder now is a transform stream that reduces into one root. (some opportunities for externalising these reducers into their own packages).
Each builder can be parametrised by options.
Documented those options.
Client can also choose which builder to use (now defaults to balanced builder).
Adds individual tests for each reducer stream to make sure it's yielding the correct tree structure.

By the way,
Can now easily add a different chunker.
Client can choose chunker strategy using option. Defaults and only supports fixed chunker for now.

Please review.

@RichardLitt RichardLitt added the status/in-progress In progress label Dec 28, 2016
@pgte pgte requested a review from daviddias December 28, 2016 17:18
In the second argument of the importer constructor you can specify the following options:

* `chunker` (string, defaults to `"fixed"`): the chunking strategy. Now only supports `"fixed"`
* `chunkSize` (positive integer, defaults to `262144`): the maximum chunk size for the `fixed` chunker.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to have a specific chunkerOptions object, which has in the case for the fixed chunker an option of size, as for other chunkers this might not make sense as an option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dignifiedquire Implemented your suggestion.


const defaultOptions = {
chunkSize: 262144
// chunkSize: 26214
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this lonely commented out option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was used in an experiment and had forgotten to remove it. Removed now.

@dignifiedquire
Copy link
Contributor

@pgte good stuff, one thing I'm a bit worried about, is the usage of pushable in a quite a lot of places. I haven't gone through all of them but I think some of them could be removed in favor of more direct transforms. The same goes for the use of pullWritable, as that forces you to use something like pushable if you want to use it in a transform stream.

I haven't found simple solutions for where you used these, so this might be the best way for these use cases, but maybe not.

@dominictarr maybe you have some good ideas about how the use of pushable could be avoided in these situations.

@dominictarr
Copy link

instead of using pull-write to write to pull-pushable,
i think what you can do here is use either pull-defer,
or pull-pair because what it looks like you are doing is creating a source stream that initially doesn't output anything, until later when you get the real source. They way you are doing it with pushable looses back pressure, but with pull-pair or pull-defer or maybe pull-cont then you can do the same thing but still have back pressure.

@pgte
Copy link
Contributor Author

pgte commented Dec 30, 2016

@dominictarr thanks, good point about the loss of back-pressure.
I'm now using pull-pair instead of the bespoke back-pressureless version.

Copy link
Contributor

@daviddias daviddias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid! I've little or no comments codewise, big thumbs up for the refactor 👍

Couple of requests:

  • We need a couple of exporter tests for the new added importer strategies
  • We need a test to make sure that if several roots are passed to one importer instance, it errors out (because it only knows how to return one root). This is a test that was missing.

* `trickle`: builds [a trickle tree](https://github.com/ipfs/specs/pull/57#issuecomment-265205384)
* `maxChildrenPerNode` (positive integer, defaults to `172`): the maximum children per node for the `balanced` and `trickle` DAG builder strategies
* `layerRepeat` (positive integer, defaults to 4): (only applicable to the `trickle` DAG builder strategy). The maximum repetition of parent nodes for each layer of the tree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dignifiedquire wanna push a commit with jsdoc to this PR so that unixfs-engine gets fresh docs with this refactor? Or perhaps just to a PR to the PR, that would be also good :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can merge this as is and we can add the documentation in a later PR, other wise if @pgte wants to take a stab he can start adding this info as jsdoc as well. It doesn't have to be me to do that.


const result = pushable()

reduceToParents(source, function (err, roots) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use arrow functions for unnamed functions, to keep consistent with the rest of our code base

}
}

module.exports = function (Chunker, ipldResolver, Reducer, _options) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use uppercase variables only for constructors, and if they are constructors call them with new.

source,
batch(Infinity),
pull.asyncMap(reduce),
pull.collect(function (err, roots) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow function

}

describe('balanced builder', () => {
it('reduces one value into itself', callback => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use parens around the args for arrow functions

)
})

it('reduces 3 values into parent', callback => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parens

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this to the aegir-lint

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always thought standard was linting for this, which is why I adopted it in the first place

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand, given this standard/eslint-config-standard#54 (comment) gigantic history I don't feel like touching it tbh

@pgte
Copy link
Contributor Author

pgte commented Jan 3, 2017

@diasdavid both requested changes are done via:

  1. e723586
  2. 0036314

@pgte
Copy link
Contributor Author

pgte commented Jan 4, 2017

@diasdavid there are also some opportunities to create more independent packages from this, namely the pull streams that generate a balanced tree and the trickle tree..

@@ -163,7 +163,7 @@ In the second argument of the importer constructor you can specify the following
* `trickle`: builds [a trickle tree](https://github.com/ipfs/specs/pull/57#issuecomment-265205384)
* `maxChildrenPerNode` (positive integer, defaults to `172`): the maximum children per node for the `balanced` and `trickle` DAG builder strategies
* `layerRepeat` (positive integer, defaults to 4): (only applicable to the `trickle` DAG builder strategy). The maximum repetition of parent nodes for each layer of the tree.

* `reduceSingleLeafToSelf` (boolean, defaults to `false`): optimization for, when reducing a set of nodes with one node, reduce it to that node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌🏽

@pgte pgte changed the title WIP: Builder refactoring, trickle builder and balanced builder Builder refactoring, trickle builder and balanced builder Jan 9, 2017
@@ -3,7 +3,7 @@
const balancedReducer = require('./balanced-reducer')

const defaultOptions = {
maxChildrenPerNode: 172
maxChildrenPerNode: 174
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a comment with a link to https://github.com/ipfs/go-ipfs/blob/master/importer/helpers/helpers.go#L16-L35 so that we remember where this value comes from

* `flat`: flat list of chunks
* `balanced`: builds a balanced tree
* `trickle`: builds [a trickle tree](https://github.com/ipfs/specs/pull/57#issuecomment-265205384)
* `maxChildrenPerNode` (positive integer, defaults to `172`): the maximum children per node for the `balanced` and `trickle` DAG builder strategies
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now defaults to 174

@daviddias daviddias merged commit c1acd5b into master Jan 11, 2017
@daviddias daviddias deleted the feat/builders branch January 11, 2017 13:00
@daviddias daviddias removed the status/in-progress In progress label Jan 11, 2017
@daviddias
Copy link
Contributor

🎉🎉🎉🎉🎉🎉🎉🎉

THIS IS HAPPENING :D

Thank you @pgte ❤️❤️❤️❤️❤️

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants