Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'nosync' option #1616

Closed
davidar opened this issue Aug 28, 2015 · 31 comments
Closed

Add 'nosync' option #1616

davidar opened this issue Aug 28, 2015 · 31 comments
Assignees
Labels
kind/enhancement A net-new feature or improvement to an existing feature topic/repo Topic repo

Comments

@davidar
Copy link
Member

davidar commented Aug 28, 2015

From #1324:

@barnacs: I guess the only viable option is then to reconsider durability guarantees and ease up on sync() calls.
@davidar: In the meantime, would it be possible to add a --nosync option to ipfs add that just disables sync calls? You could add a warning that using it may result in inconsistencies in the blockstore, so people know the risks.
@whyrusleeping: @davidar that would work for offline adds, but if youre doing it with the daemon online, you would have the start the daemon with such an option. I think the ideal stopgap would be to add a nosync field to the config for the datastore in .ipfs/config

@jbenet
Copy link
Member

jbenet commented Aug 28, 2015

yeah this option would be nice.

@whyrusleeping
Copy link
Member

I can add it pretty easily by patching flatfs in dev0.4.0 (where we actually respect the config for datastores)

@whyrusleeping whyrusleeping added kind/enhancement A net-new feature or improvement to an existing feature topic/repo Topic repo labels Aug 28, 2015
@davidar
Copy link
Member Author

davidar commented Aug 31, 2015

Thanks. Performance of ipfs add is a major issue for ipfs/archives

@whyrusleeping
Copy link
Member

@davidar i'll set up a branch for you with nosync hardcoded so you can work faster.

@whyrusleeping whyrusleeping self-assigned this Sep 2, 2015
@davidar
Copy link
Member Author

davidar commented Sep 2, 2015

@whyrusleeping much appreciated :)

@whyrusleeping
Copy link
Member

@davidar
Copy link
Member Author

davidar commented Sep 4, 2015

@whyrusleeping I'm getting the following error when trying to build (on both temp-nosync and dev0.4.0 branches):

./daemon.go:207: multiple-value repo.Config() in single-value context

@rht
Copy link
Contributor

rht commented Sep 4, 2015

@davidar
Copy link
Member Author

davidar commented Sep 4, 2015

Thanks @rht

@davidar
Copy link
Member Author

davidar commented Sep 4, 2015

@whyrusleeping @rht Hmm, no dice. The temp-nosync branch is still painfully slow trying to add a directory with lots of small files (eta approaching 400h).

@rht
Copy link
Contributor

rht commented Nov 7, 2015

Dependency: ipfs/go-datastore#30.

(sync) up to 1000 files with each 1KB
outdata_sync_1kb

(nosync) up to 1000 files with each 1KB
outdata_nosync_1kb

(nosync) up to 5000 files with each 1KB (the bottleneck hasn't been characterized yet)
outdata

@jbenet
Copy link
Member

jbenet commented Nov 8, 2015

@rht only git is fair comparison. other's not really.

but anyway, sure. let's add it both:

  • as a global config value (easy)
  • on a command (imagine ipfs add --no-sync) (harder)

@davidar
Copy link
Member Author

davidar commented Nov 9, 2015

@jbenet git still beats the pants off ipfs though (if I squint, I can almost see the line for git :p)

@rht
Copy link
Contributor

rht commented Nov 9, 2015

Here is one with darcs and sqlite added.

(sync) up to 500 files each 1KB
outdata

(nosync) up to 500 files each 1KB without sqlite
outdata

Since offline add is equivalent to creating ipfs archive format, I think this benchmark is fair. Due to random files, no deduplication / loose object packing is involved. git was often compared with cp / rsync in the past (though git fetch instead of git add), hence is included here.

iirc the next bottleneck is protobuf Marshal, but I have to check again.

@jbenet
Copy link
Member

jbenet commented Nov 10, 2015

nice. @rht want to shepherd these changes? we need:

  • datastore support
  • global config value (takes effect then daemon starts) (easy)
  • on ipfs add (per command run) (hard, not necessary)

@rht
Copy link
Contributor

rht commented Nov 10, 2015

This is a major bottleneck to the archiving effort (otherwise ipfs archive could have meshed with ia sooner).

It's rather I'm writing the change (global config one) right away after you merge the sync flag in go-datastore. ipfs add --no-sync is hard when the daemon is on.

@rht
Copy link
Contributor

rht commented Nov 10, 2015

(and on top of dev0.4.0 after dev0.4.0 rebase, since it has a lot of datastore changes)

@davidar
Copy link
Member Author

davidar commented Nov 10, 2015

This is a major bottleneck to the archiving effort

Even with nosync, there's still a major bottleneck somewhere :/

@rht
Copy link
Contributor

rht commented Nov 10, 2015

Yes but at least faster than sync-sqlite.

Here is a breakdown of why things are slow:
ipfs add -r -q Godeps

git: 278ms
sync: addFile 67.480s

no-sync: addFile 918ms
  add 286ms
    importer.BuildDagFromReader 284ms
      bal.BalancedLayout 283ms
        db.Add 170ms (helpers.DagBuilderHelper)
          dagservice.Add 159ms
  addNode 432ms
    InsertNodeAtPath 491ms
      root.GetLinkedNode 370ms
        n.GetNodeLink 384ms
      dagservice.Add 136ms

Bottleneck of both add and addNode converges to dagservice.Add:

dagservice.Add 386ms
  nd.Encoded(false) 174ms
    sort.Stable 17ms
    n.Marshal 35.6 ms
    u.Hash 139ms
  n.Blocks.AddBlock 205ms
    s.Blockstore.Put 182ms
      block.Key().DsKey() 14ms
      bs.datastore.Has 63ms
      bs.datastore.Put 118ms

@rht
Copy link
Contributor

rht commented Nov 10, 2015

The slowest part that can be optimized is perhaps n.GetNodeLink, the link search (basically to get the hash of the folder) can possibly be cached.
Also, for every single file insert, does the hash of the folder containing it has to be recomputed?

What about InsertNodesAtPath for inserting several nodes at once?

@davidar
Copy link
Member Author

davidar commented Nov 11, 2015

Also, for every single file insert, does the hash of the folder containing it has to be recomputed?

It seems like a zipper could help here, which allows efficient traversal and mutation of persistent datastructures (like merkledags). This is what @argonaut-io uses, for example.

@rht
Copy link
Contributor

rht commented Nov 16, 2015

Worth implementing, I think. It appears that there exists a filesystem based on zipper (http://okmij.org/ftp/continuations/zipper.html).

The merkle version of zipper is possible if the root hash and the sub-root hashes along the path to the nodes (node(s) for the case of concurrent mutation) aren't precomputed. In go-ipfs, indeed the hashes are computed only when the nodes are about to be committed to disk.

(@davidar now you don't have to squint #1964 (comment))

@rht
Copy link
Contributor

rht commented Nov 16, 2015

(@whyrusleeping perhaps zipper could be a name for what you requested in https://github.com/ipfs/go-ipfs/blob/master/unixfs/mod/dagmodifier.go#L36 ?)

@whyrusleeping
Copy link
Member

@rht i forgot about that comment, something along the lines of zipper sounds good to me there

@jbenet
Copy link
Member

jbenet commented Nov 17, 2015

zipper 👍

@davidar
Copy link
Member Author

davidar commented Nov 18, 2015

Worth implementing, I think. It appears that there exists a filesystem based on zipper (http://okmij.org/ftp/continuations/zipper.html).

Haha, of course Oleg would have written such a thing

@ghost
Copy link

ghost commented Nov 24, 2015

@rht I want to deploy nosync to castor.i.ipfs.io, which branch should I use? Is there one based on dev0.4.0?

@rht
Copy link
Contributor

rht commented Nov 24, 2015

@lgierth https://github.com/ipfs/go-ipfs/tree/dev0.4.0 contains nosync (by @whyrusleeping since 3 days ago). Sorry should have notified.

@ghost
Copy link

ghost commented Nov 24, 2015

lovely :)

@jbenet
Copy link
Member

jbenet commented Nov 30, 2015

is this issue good to be closed, then?

@ghost
Copy link

ghost commented Nov 30, 2015

ipfs add with "NoSync": true is nice and fast on the dev0.4.0 hosts (castor, pollux, pluto)

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement A net-new feature or improvement to an existing feature topic/repo Topic repo
Projects
None yet
Development

No branches or pull requests

4 participants