
Reorganization and implementation of a basic dag store #2

Merged 7 commits into master from feat/dag-store on Nov 13, 2015

Conversation
Conversation

@travisperson
Member

Quick reorganization.

Implementation of a simple dag-backed store.

I'm not a huge fan of the way this is currently set up, so this PR is a work in progress. A few things I want to change:

See if there is a way to wrap streams in bytes.
Right now we have to buffer all the data before we can add the dag. I think it would be really neat to have a stream that is wrapped in bytes.

Namespaced keys
The cool thing about the dag is that "leaves" (or in this case, values) are still dags which can contain links. So we can have a key foo, which resolves to some value, as well as a key foo/bar, which would resolve to something different. foo would be the root hash of bar.

Note: right now I store the last root hash in the data field of the "root". This is mostly because if you try to remove a link from a dag with a single link (leaving a dag with zero links and no data), ipfs does not like it. It's also kind of cool because it provides history.
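To make the namespacing concrete, here is a hypothetical usage sketch against the abstract-blob-store interface this module targets; the dagStore factory name and its options are illustrative, not the PR's actual API:

var dagStore = require('ipfs-blob-store')

// hypothetical options; the real constructor may differ
var store = dagStore({ host: 'localhost', port: 5001 })

// 'foo' resolves to one value...
var ws = store.createWriteStream({ key: 'foo' }, function (err, meta) {
  if (err) throw err
  console.log('wrote', meta.key)
})
ws.end('value for foo')

// ...while 'foo/bar' resolves to a different one, with foo acting
// as the root whose link leads to bar
store.createReadStream({ key: 'foo/bar' }).pipe(process.stdout)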

@jbenet mentioned this pull request on Jun 4, 2015
var stream = require('stream')
var multihash = require('multihashes')
var base58 = require('base58-native')
var merkledag = require('node-ipfs-mdag')
Contributor

where is this module coming from? symlink to node-ipfs or something?

Member Author

Not supposed to be there. I was using it, and might go back to it, but for now it wasn't doing what I wanted. I guess I forgot to delete it.

@travisperson
Member Author

Wrapped streams might work with: https://github.com/nfroidure/StreamQueue.

The idea is to create a front stream which basically contains:

{
    Links: [{
              Name: "",
              Hash: "",
              Size: 0,
    }...],
    Data: "

The actual data stream would come in the middle here, and then the trailing stream:

    "
}

I have no idea how well this will work with JSON, but it should work really nicely with protobufs.
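A minimal sketch of that sandwiching using only core streams (the sandwich helper is made up for illustration; StreamQueue would provide the same ordered concatenation):

var stream = require('stream')

// Pipe head, body, and tail through a single PassThrough, in order.
function sandwich (head, body, tail) {
  var out = new stream.PassThrough()
  out.write(head)                   // everything up to `"Data": "`
  body.pipe(out, { end: false })    // the actual data, streamed through
  body.on('end', function () {
    out.end(tail)                   // closing quote and brace
  })
  return out
}

// usage: wrap a data stream in the JSON skeleton shown above
var body = new stream.PassThrough()
sandwich('{"Links": [], "Data": "', body, '"}').pipe(process.stdout)
body.end('hello')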

@whyrusleeping

The problem with using protobufs for that is that nested fields are always length-delimited. You could use repeated bytes as the type, which would allow you to continue sending bytes as much as you want, but then protobuf doesn't have a way to signal that you're done sending things.
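To illustrate the constraint, a hand-rolled sketch of the protobuf wire format (not a real protobuf library): a length-delimited field must announce its size up front, so the only stream-ish option is many small repeated fields, and nothing marks the last one.

// field key = (field_number << 3) | wire_type; wire type 2 = length-delimited
function encodeBytesField (fieldNumber, payload) {
  var key = new Buffer([(fieldNumber << 3) | 2])
  var len = new Buffer([payload.length])  // varint; one byte while payload < 128 bytes
  return Buffer.concat([key, len, payload])
}

// 'repeated bytes' lets chunks keep arriving as separate fields...
var chunks = [encodeBytesField(1, new Buffer('hel')),
              encodeBytesField(1, new Buffer('lo'))]
// ...but nothing on the wire signals "no more chunks coming".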

@travisperson
Member Author

Ya, after thinking about it for a bit I kind of decided it's not really a great idea, and it's not hard to buffer something small like this either way.

if (typeof opts === 'string') opts = {key: opts}

var bufferStream = new stream.PassThrough()
node.object.data((opts.root || root_hash) + '/' + opts.key, function (err, stream) {
Contributor

You want node.object.get here, right? If the write stream stores a JSON object, the read stream should probably output one as well.
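(For context: a hedged sketch of the difference between the two calls, in the callback style used elsewhere in this PR; the exact response shapes are assumptions.)

// object.data yields only the node's Data field, as a stream
node.object.data(hash, function (err, dataStream) {
  // raw bytes stored in the Data field
})

// object.get yields the whole node, Links included
node.object.get(hash, function (err, obj) {
  // something like { Links: [...], Data: ... }
})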

@daviddias
Member

@travisperson can I get some enlightenment on how you structured this? Namely, is the code in the PR running and passing some of the tests? I'm trying to, but unsuccessfully:

TAP version 13
# piping a blob into a blob write stream
ok 1 no setup err
/Users/david/Documents/code/ipfs/ip-npm/ipfs-blob-store/lib/dag.js:47
          node.object.stat(res.Hash, function(err, stat) {
                      ^

TypeError: node.object.stat is not a function
    at /Users/david/Documents/code/ipfs/ip-npm/ipfs-blob-store/lib/dag.js:47:23

And btw, there are a bunch of code style errors, the use of blob as a global variable being the most concerning; not sure if it is intended:

lib/dag.js|39 col 7 warning| "blob" is not defined. (no-undef)
lib/dag.js|40 col 7 warning| "blob" is not defined. (no-undef)
lib/dag.js|43 col 47 warning| "blob" is not defined. (no-undef)

Also, what is the purpose of block.js? Some early experiment? Is it still needed?

@travisperson
Member Author

What version of ipfs-api is actually installed? It would appear that it's missing the stat method on object. If you npm install ipfs-api@1.1.7 you can verify that it's there.

As for the blob, it's definitely not supposed to be global; it's just missing a var statement.

@daviddias
Member

I had 1.2.0 (installed following the package.json semver)

$ head node_modules/ipfs-api/package.json                                                                
{
  "name": "ipfs-api",
  "version": "1.2.0",

But now updated to 2.3.2 and all tests pass:

1..45
# tests 45
# pass  45

# ok

This is a good sign, right? :)

@daviddias
Member

What about block.js? It doesn't look like it is used for anything.

@travisperson
Member Author

See: https://github.com/ipfs/ipfs-blob-store/blob/feat/dag-store/index.js#L14

They are two different implementations. This isn't a fully featured blob store; it was a quick hack to get something working. I have a third that I hacked together a while back that implements a blob store using unixfs via a patch system, similar to @whyrusleeping's patch/update code (whatever it is called now).

@daviddias
Member

I'm confused by what is called a 'fully featured blob store' when both of these are passing 100% of the tests. I was expecting that anything that implements the abstract-blob-store interface and passes the tests would be 'fully' compatible.

Would this be 'enough' to unblock ipfs/notes#2 ? If not, what is missing?

Also, if we have 3 implementations, can we list pros and cons? I guess that one of them is the fact that dag and block use the HTTP API, while the third one, not present in the repo, would use system calls and let IPFS handle the syncing.

Thanks :)

@travisperson
Member Author

They are not passing 100%, though, are they? I haven't looked at this code in a while. Both the block and the dag implement the API, but I'm pretty sure if you run the tests, block will not run them all. I don't fully understand tape, but it appears that it's not running through all the tests. Block always returns false on a remove, which to me would say that it would not be passing all tests.

Each of these has pitfalls. Specifically, the dag implementation only stores links to blobs on the root dag node, leading to an absolutely massive root object. The block implementation simply stores basic block objects. It doesn't do anything fancy, and since it's just a raw block it has an upper limit on its size (if imposed by the backing daemon/api).

The third implementation is almost a pure unixfs implementation (I'd have to double check that), but implemented in the blob store itself (it does self-patching of objects). Instead of sending objects to a daemon to patch and update, it handles that itself. Since it's a unixfs object, you can mount and traverse the blob store.

It's kind of in my head and I have a few ideas; I can push the code up tonight (it's on my laptop, which I don't have with me at the moment). I'll be free in about 2 hours.

@jbenet
Contributor

jbenet commented Sep 18, 2015

ok product of this etherpad: https://etherpad.mozilla.org/97sGEBwwkH

// what we want is the "dag store" in ipfs-blob-store
// to treat the "keys" it receives as a merkledag path,
// and more specifically, as a unixfs path. so something
// like this:
var dagStore = require('ipfs-blob-store').dagStore({ ... })

var ws = dagStore.createWriteStream("foo/bar/baz")
ws.write("hello")
ws.end()

// so the above should make a total of 4 objects:
// [ root ] ---foo--> [ A ] ---bar--> [ B ] ---baz-->  [  "hello" ]

// so that we can do it again:

var ws = dagStore.createWriteStream("foo/bar/quux")
ws.write("world")
ws.end()

// and only _adds one new object, updating the link in the third
// object, and bubbling the changes up. And it's all nicely 
// sharded as a filesystem:
// [ root2 ] ---foo--> [ A2 ] ---bar--> [ B2 ] ---baz-->  [ "hello" ]
//                                                            \--quux->  [ "world" ]

// ((NOTE: though of course, the change bubbles up the merkle
// dag, so technically, four objects are created. typical merkle
// dag update semantics. this means that the "dagStore" thing
// has to always keep the latest _root_.))

// all of this can be done with "ipfs object patch" in a concurrency
// safe manner.
o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ 
o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ o/ 


--------

// btw, the way it is now, it makes only TWO objects:
// [ root ] ---foo/bar/baz---> [ "hello" ]

// adding a link with name "foo/bar/baz", which does not scale at all,
// because the root object would get __enormous__.

@travisperson
Member Author

So while using ipfs object patch is awesome, it's not realistic. You can only ever have a single mutation in flight, since you must have the resulting object before you may perform the next operation. This is why I was performing the patch on the library side: I could keep track of mutations down the directory tree and only ever update a directory once all operations bubbled back up. (This is in v3, the one I worked on with you in Seattle.)
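A minimal sketch of that constraint (shelling out to the real ipfs CLI; the queue and helper names are made up, not the PR's code). Each patch must target the root hash the previous patch produced, so writes are forced into single file:

var exec = require('child_process').exec

var root = null   // latest root hash; set it before the first patch
var queue = []    // pending [path, blobHash, callback] jobs
var busy = false

function addLink (path, blobHash, cb) {
  queue.push([path, blobHash, cb])
  drain()
}

function drain () {
  if (busy || queue.length === 0) return
  busy = true
  var job = queue.shift()
  // only one patch in flight: the next one needs this one's output
  exec('ipfs object patch ' + root + ' add-link ' + job[0] + ' ' + job[1],
    function (err, stdout) {
      if (!err) root = stdout.trim()   // the patched object becomes the new root
      busy = false
      job[2](err, root)
      drain()
    })
}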

@jbenet
Contributor

jbenet commented Sep 18, 2015

@travisperson wait so this does handle mutations in a full dag? i thought this only has one massive root object? (i don't see the splitting on a "/", etc)

@jbenet
Contributor

jbenet commented Sep 18, 2015

btw, in shell i would do this:

# keep root, initialized to an empty unixfs directory:
root=$(ipfs object new unixfs-dir)
blobHash=$(cat $data | ipfs add -q | tail -n1)
root=$(ipfs object patch $root add-link $dataPath $blobHash)

one thing i'm not sure on is whether ipfs object patch can take paths for $dataPath and bubble the changes up. we may need to look deeper into it and pull out the object we want. (like walk the dag component by component. this is what the "mfs/files" tool is supposed to do, but it hasn't been finalized or merged in go-ipfs. so may need to walk it manually if ipfs-object-patch doesn't do this right now)

@jbenet
Contributor

jbenet commented Sep 18, 2015

update: apparently it does work with ipfs object patch now, just need to use --create.

@travisperson is right though that this can only have one update in flight at a time.
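(Continuing the shell sketch above; the placement of the flag is assumed from the CLI of the era, so treat it as a sketch.)

# --create makes the intermediate path components as needed
root=$(ipfs object patch $root add-link --create foo/bar/baz $blobHash)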

@travisperson
Member Author

> @travisperson wait so this does handle mutations in a dag? i thought this only has one massive root object?

The current dag implementation here has that pitfall; the other implementation I mentioned (never got around to pushing it, sorry) does not have this issue. It takes the resulting hash of a blob and uses it to fan out over the directory structure. So it's a bit different from what you describe above, where it would be on the user to properly distribute their keys. Which is totally fine, and makes it easier for the user to specify keys.

@jbenet
Contributor

jbenet commented Sep 18, 2015

(the best solution is to implement the mfs/files tool, which would allow concurrent access, ordered by the daemon itself)

@jbenet
Contributor

jbenet commented Sep 19, 2015

As i see it we have four options:

  1. write the new implementation described by @travisperson above.
  2. use ipfs object patch --create (limitation: only one write in flight at a time)
  3. write a proper mfs/files tool in node
  4. finish and merge mfs/files tool in go (and expose them through the API too)

@travisperson
Member Author

I'd say 1 and 3 are one and the same in a way. My 1 is basically mfs, or at least it would be very close to it if implemented properly.

@daviddias mentioned this pull request on Sep 21, 2015
@daviddias
Member

UPDATE

Now we have a more robust js-ipfs-api and a working version of ipfs-blob-store with mfs, which you can find in this PR: #5

@daviddias merged commit 7bee9a2 into master on Nov 13, 2015