
Improve Add Performance W/ Async Datastores #6775

Closed
3 of 4 tasks
Stebalien opened this issue Dec 3, 2019 · 5 comments

@Stebalien
Member

Stebalien commented Dec 3, 2019

fixes #6523

@aschmahmann
Contributor

aschmahmann commented Dec 18, 2019

Proposal for where to add Sync calls in go-ipfs (and related libraries)

The overall plan is to identify the places where a user would expect data to be persisted to disk, and to make sure we call Sync at those points so that the data survives a crash.

If the data that needs to be persisted is IPLD blocks then we should sync the entire blockstore, and if the data could be in the filestore then we should sync the entire filestore as well.
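To make the write/sync split concrete, here is a minimal sketch of the semantics this proposal relies on. The `SyncingDatastore` type and its method names are hypothetical stand-ins (the real interface is go-datastore's, where a Put may complete before the data is durable and Sync flushes everything under a key prefix, e.g. the whole blockstore):

```go
package main

import (
	"fmt"
	"strings"
)

// SyncingDatastore is a hypothetical in-memory model of an async datastore:
// Put only buffers the write, and Sync(prefix) makes everything under that
// prefix durable. A crash before Sync loses buffered writes.
type SyncingDatastore struct {
	buffered  map[string][]byte // written, but would be lost on crash
	persisted map[string][]byte // durable after Sync
}

func NewSyncingDatastore() *SyncingDatastore {
	return &SyncingDatastore{
		buffered:  map[string][]byte{},
		persisted: map[string][]byte{},
	}
}

// Put records the write asynchronously; durability is not guaranteed yet.
func (d *SyncingDatastore) Put(key string, value []byte) {
	d.buffered[key] = value
}

// Sync flushes every buffered write whose key starts with prefix.
func (d *SyncingDatastore) Sync(prefix string) {
	for k, v := range d.buffered {
		if strings.HasPrefix(k, prefix) {
			d.persisted[k] = v
			delete(d.buffered, k)
		}
	}
}

// Durable reports whether a key would survive a crash right now.
func (d *SyncingDatastore) Durable(key string) bool {
	_, ok := d.persisted[key]
	return ok
}

func main() {
	ds := NewSyncingDatastore()
	ds.Put("/blocks/bafy1", []byte("node"))
	fmt.Println(ds.Durable("/blocks/bafy1")) // false: still buffered
	ds.Sync("/blocks")                       // sync the whole blockstore prefix
	fmt.Println(ds.Durable("/blocks/bafy1")) // true: now durable
}
```

Syncing the `/blocks` prefix in one call is what "sync the entire blockstore" means here: one barrier covers every block written since the last sync.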

  • Pinning: Any modification of the pin set
    • During Pin Flush:
      • Start: assume it's possible new DAG nodes have been added to the blockstore
      • Create new DAG containing the pinset, add to blockstore
      • Sync the blockstore because of potentially new DAG nodes added since the last sync
      • Sync the blockstore because of potential changes to the DAG storing the pinset
      • Sync the datastore key that points to the root of the pin set
  • Adding data to IPFS: After an add operation is completed the data should be synced
    • Data can potentially be added to either the filestore or blockstore
    • Add all data into the blockstore or filestore
    • Call Sync on the blockstore + filestore
    • [Optional] Pinning may be called after adding (see above)
  • MFS: Anytime the root is modified
    • Data can be added that is in either of the filestore or blockstore
    • Periodically the MFS Republisher's publish function (pubfunc) is called
    • During the publish function
      • Sync the blockstore + filestore because of potentially new DAG nodes added since the last sync
        • Because MFS has a different API than Unixfs.Add, it may have added nodes without going through the code paths above
      • Sync the datastore key that points to the MFS root
  • IPNS: When ipnsPublisher.Publish() is called (before any network calls are made)
    • Create the IPNS record
    • Put the IPNS record locally into the datastore
    • Sync the datastore key that points to the IPNS record
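The common ordering in the steps above (make the DAG durable before the pointer to it) can be sketched for the pin-flush case. `store`, `flushPins`, and the key names below are hypothetical; the model only records the sequence of writes and durability barriers:

```go
package main

import "fmt"

// store is a hypothetical stand-in that logs operations in order, so the
// required ordering of Sync barriers can be checked.
type store struct{ log []string }

func (s *store) putBlock(cid string) { s.log = append(s.log, "put:"+cid) }
func (s *store) syncBlockstore()     { s.log = append(s.log, "sync:/blocks") }
func (s *store) putRoot(cid string)  { s.log = append(s.log, "put:/local/pins") }
func (s *store) syncRootKey()        { s.log = append(s.log, "sync:/local/pins") }

// flushPins applies the proposed pin-flush ordering: the pin-set DAG is made
// durable before the datastore key is repointed at the new root, so the root
// pointer never references a DAG node that could be lost in a crash.
func flushPins(s *store, rootCid string) {
	s.putBlock(rootCid) // new DAG node holding the pin set
	s.syncBlockstore()  // DAG nodes (including any added since the last sync) now durable
	s.putRoot(rootCid)  // repoint the pin-set root key
	s.syncRootKey()     // sync only the root key, not the whole datastore
}

func main() {
	s := &store{}
	flushPins(s, "bafyroot")
	fmt.Println(s.log)
}
```

The same shape applies to MFS (sync blockstore + filestore, then the MFS root key) and IPNS (put the record, then sync its key): a cheap per-key sync for the pointer, a prefix-wide sync for the data it references.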

@Stebalien
Member Author

So, the goal was to improve add performance on hard disks. However, testing this on my SSD, I'm still seeing a 2x speedup. That is, 145MB/s versus 73MB/s.

👏👏👏

@momack2
Contributor

momack2 commented Jan 7, 2020

Do we still plan to switch to badger by default, or is this item considered closed (at least as far as the 0.5 release in #6776 is concerned)?

@Stebalien
Member Author

@momack2 I've updated the meta issue. You're right, this is "closed" as far as 0.5.0 is concerned.

@Stebalien
Member Author

This issue isn't really helping anything so I'm closing it. We still want to switch to a different datastore; details are in the badger issue.
