0.4.5: repo does not scale beyond ~ 8000 files #3621

Closed
mguentner opened this issue Jan 22, 2017 · 15 comments · Fixed by #3640
Labels: kind/bug (A bug in existing code, including security flaws) · topic/perf (Performance) · topic/technical debt
Milestone: ipfs 0.4.6

mguentner commented Jan 22, 2017

Version information:

go-ipfs version: 0.4.5-dev-
Repo version: 5
System version: amd64/linux
Golang version: go1.7.4

Type:

Bug

Priority:

P0

Description:

When adding a lot of files locally (or using the daemon without --enable-gc), the repo starts growing in size much faster once a certain number of files is reached. The threshold seems to be around 8000; my gut tells me it's 8192 💫. Unless you have a lot of disk space with good IO, this makes it very hard to import files without garbage collecting in between (say, every 1k).
This is probably a blocker for 0.4.5.

Here is a fancy graph that displays the problem:
[graph: repo size while adding 12000 files]

After running ipfs repo gc the repo size shrinks from 16 GiB to 78 MiB

If you want to reproduce it, fetch QmcsrSRuBmxNxcEXjMZ1pmyRgnutCGwfAhhnRfaNn9P94F from ipfs.io
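
The measurement itself is simple: add the files one by one and sample the repo's on-disk size as you go. Here is a minimal sketch of that idea (not my actual script; the testdata path, the sampling interval, and the assumption that the repo lives at the default IPFS_PATH are placeholders):

```go
// repogrowth.go: add files with `ipfs add --quiet` and record the repo's
// on-disk size every 100 adds, printing CSV rows that can be plotted.
// Sketch only -- not the exact script used for the graphs in this issue.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// repoSize walks the IPFS repo directory and sums all file sizes.
func repoSize(repoPath string) (int64, error) {
	var total int64
	err := filepath.Walk(repoPath, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() {
			total += info.Size()
		}
		return nil
	})
	return total, err
}

func main() {
	repo := os.Getenv("IPFS_PATH")
	if repo == "" {
		repo = filepath.Join(os.Getenv("HOME"), ".ipfs") // default repo location (assumption)
	}
	files, err := filepath.Glob("testdata/*") // hypothetical directory of ~12000 small files
	if err != nil {
		panic(err)
	}
	for i, f := range files {
		// `ipfs add --quiet` prints only the resulting hash.
		if out, err := exec.Command("ipfs", "add", "--quiet", f).CombinedOutput(); err != nil {
			panic(fmt.Errorf("ipfs add %s: %v: %s", f, err, out))
		}
		if i%100 == 0 { // sample the repo size every 100 adds
			size, err := repoSize(repo)
			if err != nil {
				panic(err)
			}
			fmt.Printf("%d,%d\n", i, size) // CSV: files added, repo bytes
		}
	}
}
```

Each CSV row (files added, repo bytes) can then be plotted to produce a graph like the one above.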

jbenet (Member) commented Jan 22, 2017

Thank you, this is great. We should:

  • add this as a test case to go-ipfs (i.e. verify smooth scaling of repo size)
  • have a way to get graphs/reports like this

Kubuxu (Member) commented Jan 22, 2017

My bet is that it's the pin sharding we are currently using. It kicks in at 8Ki items in the pin set and causes at least 256 objects to be created with each add (the sharding seed changes on every write; IDK why it was designed this way).

This shouldn't be a 0.4.5 blocker.
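
Roughly, the situation looks like this (a simplified model of the behaviour, not the real pin/set.go code; the fanout constant and names are made up for illustration):

```go
// Simplified model of a sharded pin set with a fanout of 256 buckets.
// Illustrative only -- not the actual go-ipfs pinset implementation.
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const defaultFanout = 256

// bucketFor mixes the seed into the hash of the key, so a different seed
// reshuffles which bucket every pin lands in.
func bucketFor(seed uint32, key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(seed), byte(seed >> 8), byte(seed >> 16), byte(seed >> 24)})
	h.Write([]byte(key))
	return h.Sum32() % defaultFanout
}

func main() {
	pins := make([]string, 9000) // past the 8Ki threshold, so sharding is active
	for i := range pins {
		pins[i] = fmt.Sprintf("pin-%d", i)
	}

	// If the seed is re-randomized every time the pin set is written, almost
	// every pin moves to a different bucket, so all 256 bucket objects change
	// and have to be written out again -- even though only one pin was added.
	seedA, seedB := rand.Uint32(), rand.Uint32()
	moved := 0
	for _, p := range pins {
		if bucketFor(seedA, p) != bucketFor(seedB, p) {
			moved++
		}
	}
	fmt.Printf("%d of %d pins changed bucket between two writes\n", moved, len(pins))
}
```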

mguentner (Author) commented Jan 22, 2017

Even without the 8192 limit, the overhead before a gc seems rather high (roughly 2200%).

@Kubuxu anyway... I think you are right. This is not a blocker for 0.4.5, but people should be made aware that adding a lot of files without garbage collecting in between is not feasible.

whyrusleeping (Member) commented:

Oh... that must be the pinset stupidity. I made a note to fix that last quarter and didn't. Thank you for bringing it up and making it a priority.

whyrusleeping added this to the ipfs 0.4.6 milestone on Jan 24, 2017
whyrusleeping added the kind/bug, topic/perf, and topic/technical debt labels on Jan 24, 2017
whyrusleeping (Member) commented:

In the meantime, depending on the dataset, adding without pinning should not trigger this bug. @mguentner, do you mind rerunning the tests that created that graph with --pin=false?

mguentner (Author) commented:

For reference, ipfs 0.4.4 with the same dataset (pinned):
[graph]

And now ipfs 0.4.5 without pinning, as requested by @whyrusleeping (mind the scale):
[graph]

whyrusleeping (Member) commented:

I think a quick, backwards-compatible change we can make to fix this is to fix the random seed in the pinsets. That should prevent the exponential object explosion in the short term. Long term, we need to find a better data structure to manage the pins.

@mguentner could you get me a script that generates these graphs? That would be super helpful.

whyrusleeping (Member) commented:

@mguentner nvm, found your script linked above. Sorry about that.

whyrusleeping (Member) commented:

Yeah, making the random seed in pin/set.go return 42 makes the numbers look much better: https://ipfs.io/ipfs/QmaiQCNM7bURXHtmfe9BXFsZfjgXj7NsSobobc9u2iRYQJ

I see no good reason why we should continue using a random number there... I'm actually not certain what purpose it had in the first place (I think it was trying to make sure the set didn't get weirdly unbalanced).

whyrusleeping (Member) commented:

Ah, the random number was used to ensure that when placing items in nested buckets, they didn't all end up in the same spot. Say 1000 entries hash into bucket slot 50; we then recursively add those 1000 entries into their own 'sub-set'. Without the random number, they would all hash into bucket slot 50 of the sub-bucket as well, and we would likely get some uber-recursion-stack-overflow sort of thing. What's weird is that the test I ran didn't appear to trigger that scenario... Need to investigate the conditions that lead to that issue.
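
To make the hazard concrete, here is a simplified sketch (not the real pinset code; the fanout and key names are made up): with no per-level variation in the hash, keys that collide in one bucket collide again in the sub-set, so the recursive split never makes progress.

```go
// Sketch of the recursion hazard: same hash function at every level.
// Illustrative only -- not the actual go-ipfs pinset code.
package main

import (
	"fmt"
	"hash/fnv"
)

const fanout = 256

// bucket has no per-level variation: a given key always lands in the same slot.
func bucket(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % fanout
}

func main() {
	// Gather 100 keys that all hash into the same top-level bucket slot.
	target := bucket("entry-0")
	var colliding []string
	for i := 0; len(colliding) < 100; i++ {
		k := fmt.Sprintf("entry-%d", i)
		if bucket(k) == target {
			colliding = append(colliding, k)
		}
	}

	// Re-bucketing them for the sub-set with the same function puts all of
	// them back into a single slot -- the "1000 entries in slot 50" case.
	slots := map[uint32]int{}
	for _, k := range colliding {
		slots[bucket(k)]++
	}
	fmt.Printf("%d colliding keys spread across %d sub-bucket(s)\n", len(colliding), len(slots))
	// Output: 100 colliding keys spread across 1 sub-bucket(s)
}
```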

whyrusleeping (Member) commented:

(haven't found the problem yet, just logging my thoughts here)

One easy solution is to just use the subtree depth as the seed instead of a random number. It's deterministic AND solves the bucket collision issue.
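
Something along these lines (illustrative names only, not the actual patch): mixing the recursion depth into the hash keeps placement deterministic while still spreading same-bucket collisions apart at the next level.

```go
// Sketch of the depth-as-seed idea. Illustrative only; the real change
// lives in pin/set.go.
package pinset

import "hash/fnv"

const fanout = 256

// bucketFor mixes the recursion depth into the hash. Placement is fully
// deterministic (no random seed to persist), and keys that collide in one
// bucket at depth d get spread across different buckets at depth d+1
// because the hash input differs.
func bucketFor(key string, depth uint32) uint32 {
	h := fnv.New32a()
	// Prefix the key with the depth as four little-endian bytes.
	h.Write([]byte{byte(depth), byte(depth >> 8), byte(depth >> 16), byte(depth >> 24)})
	h.Write([]byte(key))
	return h.Sum32() % fanout
}
```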

ghost commented Jan 28, 2017

Make that comment about the random number a comment in the code, for the next random developer wondering ;)

whyrusleeping (Member) commented:

Figured out the recursion issue that the random number is meant to prevent, and confirmed that using the depth of the tree instead of a random number solves the problem while also preventing the exponential object-creation issue. Will push some code soon.

rht (Contributor) commented Jan 28, 2017

As for the graph / test case: this would be redundant with having a repo that constantly pins large real datasets (e.g. the ongoing data.gov or the existing cdnjs) as a natural test case. Performance regressions could then be detected by how long it takes to version-update these datasets.

rht (Contributor) commented Jan 28, 2017

(drawing the parallel with nix and the nixpkgs-hydra build -- in this scope, ipfs is to data as nix is to software)
