
Improve Add Performance W/ Async Datastores #6775

Closed
3 of 4 tasks
Stebalien opened this issue Dec 3, 2019 · 5 comments

@Stebalien
Member

Stebalien commented Dec 3, 2019

fixes #6523

@aschmahmann
Contributor

aschmahmann commented Dec 18, 2019

Proposal for where to add Sync calls in go-ipfs (and related libraries)

The overall plan is to identify the places where a user would expect data to be persisted to disk, and to make sure we call Sync at those points so that the data survives a crash.

If the data that needs to be persisted is IPLD blocks then we should sync the entire blockstore, and if the data could be in the filestore then we should sync the entire filestore as well.
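To make the write/sync split concrete, here is a minimal sketch of the semantics this proposal relies on. The `SyncingDatastore` type and its method names are hypothetical stand-ins (the real interface is go-datastore's, where a Put may complete before the data is durable and Sync flushes everything under a key prefix, e.g. the whole blockstore):

```go
package main

import (
	"fmt"
	"strings"
)

// SyncingDatastore is a hypothetical in-memory model of an async datastore:
// Put only buffers the write, and Sync(prefix) makes everything under that
// prefix durable. A crash before Sync loses buffered writes.
type SyncingDatastore struct {
	buffered  map[string][]byte // written, but would be lost on crash
	persisted map[string][]byte // durable after Sync
}

func NewSyncingDatastore() *SyncingDatastore {
	return &SyncingDatastore{
		buffered:  map[string][]byte{},
		persisted: map[string][]byte{},
	}
}

// Put records the write asynchronously; durability is not guaranteed yet.
func (d *SyncingDatastore) Put(key string, value []byte) {
	d.buffered[key] = value
}

// Sync flushes every buffered write whose key starts with prefix.
func (d *SyncingDatastore) Sync(prefix string) {
	for k, v := range d.buffered {
		if strings.HasPrefix(k, prefix) {
			d.persisted[k] = v
			delete(d.buffered, k)
		}
	}
}

// Durable reports whether a key would survive a crash right now.
func (d *SyncingDatastore) Durable(key string) bool {
	_, ok := d.persisted[key]
	return ok
}

func main() {
	ds := NewSyncingDatastore()
	ds.Put("/blocks/bafy1", []byte("node"))
	fmt.Println(ds.Durable("/blocks/bafy1")) // false: still buffered
	ds.Sync("/blocks")                       // sync the whole blockstore prefix
	fmt.Println(ds.Durable("/blocks/bafy1")) // true: now durable
}
```

Syncing the `/blocks` prefix in one call is what "sync the entire blockstore" means here: one barrier covers every block written since the last sync.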

  • Pinning: Any modification of the pin set
    • During Pin Flush:
      • Start: assume it's possible new DAG nodes have been added to the blockstore
      • Create new DAG containing the pinset, add to blockstore
      • Sync the blockstore because of potentially new DAG nodes added since the last sync
      • Sync the blockstore because of potential changes to the DAG storing the pinset
      • Sync the datastore key that points to the root of the pin set
  • Adding data to IPFS: After an add operation is completed the data should be synced
    • Data can potentially be added to either the filestore or blockstore
    • Add all data into the blockstore or filestore
    • Call Sync on the blockstore + filestore
    • [Optional] Pinning may be called after adding (see above)
  • MFS: Anytime the root is modified
    • Data can be added that is in either of the filestore or blockstore
    • Periodically the MFS Republisher's publish function (pubfunc) is called
    • During the publish function
      • Sync the blockstore + filestore because of potentially new DAG nodes added since the last sync
        • Because MFS has a different API than Unixfs.Add, it may have added nodes without going through the code paths above
      • Sync the datastore key that points to the MFS root
  • IPNS: When ipnsPublisher.Publish() is called (before any network calls are made)
    • Create the IPNS record
    • Put the IPNS record locally into the datastore
    • Sync the datastore key that points to the IPNS record
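The common ordering in the steps above (make the DAG durable before the pointer to it) can be sketched for the pin-flush case. `store`, `flushPins`, and the key names below are hypothetical; the model only records the sequence of writes and durability barriers:

```go
package main

import "fmt"

// store is a hypothetical stand-in that logs operations in order, so the
// required ordering of Sync barriers can be checked.
type store struct{ log []string }

func (s *store) putBlock(cid string) { s.log = append(s.log, "put:"+cid) }
func (s *store) syncBlockstore()     { s.log = append(s.log, "sync:/blocks") }
func (s *store) putRoot(cid string)  { s.log = append(s.log, "put:/local/pins") }
func (s *store) syncRootKey()        { s.log = append(s.log, "sync:/local/pins") }

// flushPins applies the proposed pin-flush ordering: the pin-set DAG is made
// durable before the datastore key is repointed at the new root, so the root
// pointer never references a DAG node that could be lost in a crash.
func flushPins(s *store, rootCid string) {
	s.putBlock(rootCid) // new DAG node holding the pin set
	s.syncBlockstore()  // DAG nodes (including any added since the last sync) now durable
	s.putRoot(rootCid)  // repoint the pin-set root key
	s.syncRootKey()     // sync only the root key, not the whole datastore
}

func main() {
	s := &store{}
	flushPins(s, "bafyroot")
	fmt.Println(s.log)
}
```

The same shape applies to MFS (sync blockstore + filestore, then the MFS root key) and IPNS (put the record, then sync its key): a cheap per-key sync for the pointer, a prefix-wide sync for the data it references.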

@Stebalien
Member Author

So, the goal was to improve add performance on hard disks. However, testing this on my SSD, I'm still seeing a 2x speedup. That is, 145MB/s versus 73MB/s.

👏👏👏

@momack2
Contributor

momack2 commented Jan 7, 2020

Do we still plan to switch to badger by default, or is this item considered closed (at least as far as the 0.5 release in #6776 is concerned)?

@Stebalien
Member Author

@momack2 I've updated the meta issue. You're right, this is "closed" as far as 0.5.0 is concerned.

@Stebalien
Member Author

This issue isn't really helping anything so I'm closing it. We still want to switch to a different datastore; details are in the badger issue.
