Pinning is slow when there are many pins #5221

Stebalien · 2018-07-13T08:34:56Z

We store all pins in a single massive object so adding and removing pins is really slow when we have many pins.

This affects:

ipfs dag add --pin=true
ipfs add
ipfs pin add
ipfs pin rm
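
As a rough illustration of why a pin set stored as one object slows down as it grows, here is a minimal Go sketch. It is a toy model, not go-ipfs code: `singleObjectStore`, `perKeyStore`, and `fakeCID` are hypothetical names, and a byte counter stands in for disk I/O. The per-key variant is only an assumption about what a "one entry per pin" layout might look like.

```go
package main

import "fmt"

// singleObjectStore models a pin set kept as one serialized object:
// every change rewrites the whole set. (Hypothetical type.)
type singleObjectStore struct {
	pins         map[string]struct{}
	bytesWritten int
}

func newSingleObjectStore() *singleObjectStore {
	return &singleObjectStore{pins: make(map[string]struct{})}
}

// Add records the pin, then "flushes" by re-serializing every pin.
func (s *singleObjectStore) Add(cid string) {
	s.pins[cid] = struct{}{}
	for c := range s.pins {
		s.bytesWritten += len(c)
	}
}

// perKeyStore models one stored entry per pin: a change writes one key.
// (Hypothetical type.)
type perKeyStore struct {
	pins         map[string]struct{}
	bytesWritten int
}

func newPerKeyStore() *perKeyStore {
	return &perKeyStore{pins: make(map[string]struct{})}
}

// Add records the pin and writes only the new entry.
func (s *perKeyStore) Add(cid string) {
	s.pins[cid] = struct{}{}
	s.bytesWritten += len(cid)
}

// fakeCID returns a fixed-width placeholder identifier, not a real CID.
func fakeCID(i int) string { return fmt.Sprintf("cid-%06d", i) }

func main() {
	single, perKey := newSingleObjectStore(), newPerKeyStore()
	const n = 1000
	for i := 0; i < n; i++ {
		single.Add(fakeCID(i))
		perKey.Add(fakeCID(i))
	}
	fmt.Println("single object, bytes written:", single.bytesWritten)
	fmt.Println("per key, bytes written:      ", perKey.bytesWritten)
}
```

In this model the total write volume grows quadratically with the number of pins for the single-object layout and linearly for the per-key layout.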

Listing pins also appears to be slow, but for different reasons:

ipfs pin ls buffers pins in memory before sending them back to the user (see pin ls should stream the result #6304).
ipfs pin ls lists all pinned blocks, directly or indirectly, by default. Calling ipfs pin ls --type=recursive is much faster.

Proposed solutions:

Use MFS for all pins: Store pins in MFS #4675
Named pins: Named pins & pins stored in datastore #4757, Moving away from current pin system #4763

I'm filing this issue so we can have a single issue that succinctly describes the entire issue and all variants.

ghost · 2018-07-13T18:21:01Z

Also sounds like another use case for an embedded graph database

Stebalien · 2018-07-18T15:28:06Z

Not really. We have MFS, we can just use that. The current blockers are:

Private content (meh, IPFS doesn't really have this at the moment anyways).
Fancy pins. Currently, the only "fancy pin" we have is the "direct" pin (non-recursive).
Background "fetch" jobs.
The ability to wait on the background fetch job.

Unfortunately, this'll only get worse as we hack in new pin types for cluster. We need some way to specify (in unixfs) how a file/directory should be pinned (where pin policies higher up the directory tree take precedence).

Stebalien · 2018-07-31T16:11:33Z

The current thought here is to introduce an intermediate fix that stores pins in go-ipld-hamt. Blockers:

Make the refmt version of go-ipld-cbor correctly handle CIDs.
Merge the refmt patches into go-ipld-cbor.
Finish up go-ipld-hamt.

ivan386 · 2018-08-02T18:39:35Z

Maybe we just need to use a read-only or archive flag for pinned blocks in the underlying file system?

Stebalien · 2018-08-02T20:25:27Z

Unfortunately, it's not quite that simple. Pinning happens at a higher layer and not all of our datastores store one file per block.

pjz · 2018-08-23T14:41:34Z

This is also causing an issue with monitoring over at netdata/netdata#3156 as it makes a lot of 'ipfs pin ls' calls.

bonedaddy · 2018-08-24T05:48:06Z

Is the pinset object stored and read/written on disk when operations are performed? If so wouldn't it be possible to load the object into memory and read/write to there to get high performance IO with memory access? You could copy the object to disk as a backup, but you wouldn't incur expensive read operations as you are reading from the in-memory object. This would serve as a reasonable intermediate fix until the pinning system at large is reworked.

Stebalien · 2018-08-28T17:19:18Z

Reading is fast, we store the pinset in memory. The slow part is flushing to disk.

pjz · 2018-08-30T14:38:34Z

Why would an 'ipfs pin ls' flush to disk? I think there must be something else going on, if the netdata guys are seeing an inordinate load due to an 'ipfs pin ls' being sent once every 5s or so.

Stebalien · 2018-08-30T16:04:58Z

pin ls by default lists all indirectly pinned objects (children of recursive pins). I don't know why it does this by default but it does...

You can list pins you added by running ipfs pin ls --type=recursive; ipfs pin ls --type=direct.
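
To make the difference between recursive and indirect pins concrete, here is a minimal Go sketch. It uses a toy in-memory graph rather than real IPFS blocks; the `dag` type, `indirectPins`, and the block names are all made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// dag maps a block identifier to the identifiers of its children.
// (Toy stand-in for a real block graph.)
type dag map[string][]string

// indirectPins returns every block reachable from the recursively pinned
// roots, excluding the roots themselves, in sorted order.
func indirectPins(g dag, roots []string) []string {
	seen := make(map[string]bool)
	for _, r := range roots {
		seen[r] = true
	}
	var out []string
	var walk func(string)
	walk = func(c string) {
		for _, child := range g[c] {
			if seen[child] {
				continue // shared blocks are listed once
			}
			seen[child] = true
			out = append(out, child)
			walk(child)
		}
	}
	for _, r := range roots {
		walk(r)
	}
	sort.Strings(out)
	return out
}

func main() {
	g := dag{
		"root": {"a", "b"},
		"a":    {"c"},
		"b":    {"c"},
	}
	// One recursive pin, three indirectly pinned blocks.
	fmt.Println(indirectPins(g, []string{"root"})) // prints [a b c]
}
```

A single recursive pin on a large directory can therefore expand into a very long default listing, which is why restricting the type keeps the output small.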

pjz · 2018-08-30T20:31:10Z

That still doesn't answer why the guys over on netdata/netdata#3156 are seeing massive IPFS resource usage when 1) they have a large repo (several thousand objects) and 2) they turn on monitoring (which does an 'ipfs pin ls' every few seconds). Is there instrumentation they could turn on?

Stebalien · 2018-08-30T21:10:00Z

It's listing every single object (block) that has been pinned. It's consuming a ton of RAM because we, unfortunately, create a list of pins in memory before returning them to the client. We should fix this (the second part), but doing so will be a breaking API change, so we'll have to be careful.
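
The difference between building the full list first and streaming results can be sketched in Go as follows. This is an illustration only, not the go-ipfs API: `source`, `listBuffered`, and `listStreamed` are hypothetical names, and the identifiers are placeholders.

```go
package main

import "fmt"

// source yields n placeholder pin identifiers one at a time.
func source(n int, visit func(string)) {
	for i := 0; i < n; i++ {
		visit(fmt.Sprintf("cid-%d", i))
	}
}

// listBuffered collects every pin into a slice before returning,
// so memory use grows with the number of pins.
func listBuffered(n int) []string {
	var out []string
	source(n, func(c string) { out = append(out, c) })
	return out
}

// listStreamed hands pins to the consumer one at a time over an
// unbuffered channel, so only one pin is in flight at once.
// A real implementation would also need a cancellation signal so the
// producer goroutine can stop if the consumer goes away early.
func listStreamed(n int) <-chan string {
	ch := make(chan string)
	go func() {
		defer close(ch)
		source(n, func(c string) { ch <- c })
	}()
	return ch
}

func main() {
	fmt.Println("buffered:", len(listBuffered(5)), "pins held at once")

	count := 0
	for range listStreamed(5) {
		count++
	}
	fmt.Println("streamed:", count, "pins, one at a time")
}
```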

Stebalien · 2018-08-30T21:15:25Z

It's also probably garbage collecting a bunch (we're working on some fixes to CIDs that'll make them allocate less but that's still in progress).

bonedaddy · 2018-08-30T22:15:11Z

@pjz So on my own nodes, to avoid having to constantly poll IPFS and incur slow performance from examining the pinset, I maintain a database which contains an exact copy of the pins my IPFS nodes currently have. Any updates that would affect the pinset must also update the database.

By doing this, I avoid having to contact my IPFS node and perform performance impacting operations like ipfs pin ls.

Yes, while this isn't desirable, it has been working very well, but it comes with a caveat: all operations which affect the pinset must also update the DB. Don't forget, IPFS is still very new, so sometimes you have to make small accommodations until such issues are resolved.

djdv · 2018-08-30T22:36:41Z

I maintain a database which contains an exact copy of the pins my IPFS nodes

I should mention that in working with pins, I've also come to the pattern of maintaining my own cache, to avoid delay on large nodes, even when only listing recursive pins.

In my specific case, I'm interested both in the listing being more performant and in having some means of notification from the node, such as an event I can subscribe to that signals when the pinset has changed.
This would allow me to maintain my own state: poll once, then poll again (or compute a delta from the info in the event) only on a state change, instead of polling on a timer or some other arbitrary metric.

For context, I'm dealing with ipfs mount at the moment, and refreshing the listing for /ipfs entails getting the node's pinset.
I'm also interested in other events from the node, such as knowing when keys have changed, mfs has changed, etc.
I'm willing to bet monitoring tools would be interested in this as well.
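
A subscription mechanism of the kind described above might look roughly like the following Go sketch. Nothing here exists in go-ipfs: `PinEvent`, `PinNotifier`, `Subscribe`, and `Publish` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// PinEvent describes one change to the pinset. (Hypothetical type.)
type PinEvent struct {
	Kind string // "add" or "rm"
	CID  string
}

// PinNotifier fans pinset changes out to subscribers. (Hypothetical type.)
type PinNotifier struct {
	mu   sync.Mutex
	subs []chan PinEvent
}

// Subscribe returns a channel that receives future events.
// buffer is how many undelivered events the subscriber may lag behind.
func (n *PinNotifier) Subscribe(buffer int) <-chan PinEvent {
	ch := make(chan PinEvent, buffer)
	n.mu.Lock()
	n.subs = append(n.subs, ch)
	n.mu.Unlock()
	return ch
}

// Publish delivers ev to every subscriber. If a subscriber's buffer is
// full the event is dropped for that subscriber, so a slow consumer
// cannot block pin operations.
func (n *PinNotifier) Publish(ev PinEvent) {
	n.mu.Lock()
	defer n.mu.Unlock()
	for _, ch := range n.subs {
		select {
		case ch <- ev:
		default:
		}
	}
}

func main() {
	var n PinNotifier
	events := n.Subscribe(8)

	n.Publish(PinEvent{Kind: "add", CID: "cid-1"})
	n.Publish(PinEvent{Kind: "rm", CID: "cid-1"})

	for i := 0; i < 2; i++ {
		ev := <-events
		fmt.Println(ev.Kind, ev.CID)
	}
}
```

Dropping events on a full buffer is one possible design choice; a subscriber that keeps its own cache would then need some way to detect a gap and fall back to a full listing.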

bonedaddy · 2018-08-30T22:41:06Z

Yes, absolutely, it's made my node perform significantly better. I've currently begun moving to a model where the only time I need to talk to my IPFS node to list anything is for crucial operations. Otherwise, everything that isn't a write operation should be reading from my cache/database.

pjz · 2018-08-31T13:53:21Z

While those are great workarounds, they're not really feasible for a general monitoring solution. I guess they'll just have to wait until the IPFS server gets it together. I think it's clear that whatever datastructure it's using needs to be re-evaluated or supplemented to make this kind of monitoring/usage not cause it to eat itself.

Stebalien · 2018-08-31T23:55:22Z

So, adding pins should be faster. But listing every single object that has been pinned (directly or indirectly by some recursive pin) in your datastore will always be somewhat slower.

pjz · 2018-09-08T21:46:29Z

If everyone's solution is to maintain a parallel cache of what pins exist... why not have IPFS do that internally instead? Keep a cache that's invalidated on add/remove of pins, but otherwise is untouched. Then repeated calls to 'ipfs pin ls' would be trivial. Maybe make 'ipfs pin verify' also serve as a way to manually invalidate the cache/force a rebuild of it.
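
The idea of a listing cache that is invalidated only on add/remove can be sketched in Go like this. It is a simplified in-memory model under assumed names (`PinSet`, `NewPinSet`, and the `rebuilds` counter are hypothetical), not a description of how go-ipfs stores pins.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// PinSet keeps a cached, sorted listing that is rebuilt only after the
// set has changed. (Hypothetical type.)
type PinSet struct {
	mu       sync.Mutex
	pins     map[string]struct{}
	cached   []string // valid only while cacheOK is true
	cacheOK  bool
	rebuilds int // how many times the listing was recomputed
}

func NewPinSet() *PinSet {
	return &PinSet{pins: make(map[string]struct{})}
}

// Add pins cid and invalidates the cached listing.
func (p *PinSet) Add(cid string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.pins[cid] = struct{}{}
	p.cacheOK = false
}

// Remove unpins cid and invalidates the cached listing.
func (p *PinSet) Remove(cid string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.pins, cid)
	p.cacheOK = false
}

// List returns the sorted pins, recomputing them only if the set has
// changed since the previous call.
func (p *PinSet) List() []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.cacheOK {
		p.cached = make([]string, 0, len(p.pins))
		for c := range p.pins {
			p.cached = append(p.cached, c)
		}
		sort.Strings(p.cached)
		p.cacheOK = true
		p.rebuilds++
	}
	// Return a copy so callers cannot modify the cache.
	return append([]string(nil), p.cached...)
}

func main() {
	p := NewPinSet()
	p.Add("cid-b")
	p.Add("cid-a")
	p.List()
	p.List() // served from the cache
	fmt.Println("rebuilds after two listings:", p.rebuilds)
	p.Remove("cid-a")
	fmt.Println(p.List(), "rebuilds:", p.rebuilds)
}
```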

djdv · 2018-09-08T23:33:59Z

@pjz
I think that could help in improving the performance.
I know that awareness of the node's state is a separate issue, but if we have to come up with a messaging system for invalidating a node-wide pinset cache, we may as well have a system that broadcasts those same events, for those who still want to be notified when the state has changed.
The practical reason for this would still just be to avoid unnecessary polling in long-lived processes.
Even if pin ls is fast, it'd be nice to update your copy of the pinset only when it has changed.

However, this only seems useful to implement if there's more than 1 event (more than just "pins have changed").
I mentioned some others before, like writes to MFS, IPNS key has been created/deleted/updated, etc.
This would allow people to maintain their own cache of various states, if they like, but still have a generic implementation underneath for fast operation in the general case.

Any opinions on this?

pjz · 2018-09-09T03:31:48Z

What you describe sounds somewhat like a way to tap into the logging system.

obo20 · 2019-07-02T19:09:57Z

@Stebalien Has there been much progress / prioritization on this front? As we continue to scale, this becomes increasingly relevant.

Stebalien · 2019-07-02T19:12:58Z

No progress.

S3bb1 · 2019-07-03T06:57:57Z

We're also facing this issue with around 2 million hashes and around 400-500k pins. Can we support you in any way? We're currently working around this with multiple ipfs instances.

Stebalien · 2019-07-11T18:45:28Z

@dirkmc you were looking into this for js-ipfs. Are you still planning on applying that same optimization to go-ipfs?

dirkmc · 2019-07-11T18:52:22Z

@Stebalien I'm currently doing some research to understand where the performance bottlenecks are with adding large numbers of files to go-ipfs, which will likely include performance analysis for pinning.

Before making any pinning optimizations, we'll likely want to decide if it makes sense for pins to be stored in the blockstore, which is a bigger conversation.

Zorlin · 2021-05-09T12:35:36Z

I will soon need to pin a million+ pins, I'm hoping this can be improved

Stebalien · 2021-05-10T18:40:18Z

This was actually fixed in go-ipfs 0.8.0, we just never closed the issue (see https://github.com/ipfs/go-ipfs/blob/master/CHANGELOG.md#-faster-local-pinning-and-unpinning). The number of pins you have should no longer matter when adding new pins.

Zorlin · 2021-05-10T22:58:20Z

Awesome, thanks! Love your work. I'll report any issues with scalability later if we run into them.

Stebalien added the kind/bug and topic/perf labels Jul 13, 2018

This was referenced Jul 13, 2018

POST to /api/v0/dag/put?pin=true causes high CPU usage #4673

Closed

ipfs pin rm _very_ slow & cpu intensive #4717

Closed

IPFS API not responsive if the repository size has a lot of objects #5163

Closed

Stebalien added the status/deferred label Jul 13, 2018

Stebalien mentioned this issue Jul 31, 2018

IPFS add hangs #5321

Closed

pjz mentioned this issue Aug 23, 2018

[ipfs plugin]: ipfs_local.repo_objects chart: API calls to IPFS make it consume too much cpu/ram. Disable by default? netdata/netdata#3156

Closed

Stebalien mentioned this issue Sep 7, 2018

IPFS Repo Storage And Files Over A Huge Amount（30G+）, "ipfs add" action slow down in a high concurrency environment #5438

Open

djdv mentioned this issue Sep 12, 2018

Windows mount support #5003

Closed

Stebalien mentioned this issue Sep 26, 2018

2018 Q4 OKR Planning #5474

Closed

Stebalien mentioned this issue Dec 16, 2018

Add file takes long time and concurrency in ipfs #5408

Closed

leerspace mentioned this issue Mar 27, 2019

progressive slowdown "ipfs add file" #6136

Closed

Stebalien mentioned this issue Mar 28, 2019

progressive slowdown "ipfs add file" #6148

Closed

DonaldTsang mentioned this issue May 16, 2019

Performace, or How IPFS will be better than BitTorrent #6342

Closed

obo20 mentioned this issue Sep 15, 2020

[Benchmark] - Store pins in a datastore instead of a DAG #7674

Closed

akavel mentioned this issue Nov 12, 2020

Named pins & pins stored in datastore #4757

Closed

aschmahmann mentioned this issue Dec 24, 2020

Ipfs executes pin=true slowly #7842

Closed

Stebalien closed this as completed May 10, 2021
