fix bug in pinsets and add a stress test for the scenario #3273

Merged
whyrusleeping merged 4 commits into master from fix/pin-fail on Oct 9, 2016

5 participants
@whyrusleeping
Member

whyrusleeping commented Sep 29, 2016

Sometime after a pinset reaches ~5000 items, we start to get hash collisions when mapping the 32-bit integer hash space onto an 8-bit integer space. The easy fix is to reduce the hash output modulo the size of our final key space before we even get to that point.

Longer term, I want to see us using the HAMT code for this purpose (though it's not yet ready).

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>

Member

jbenet commented Sep 29, 2016

Don't merge this before I CR it. I will do so in the next day or two.

Member

whyrusleeping commented Sep 29, 2016

@jbenet SGTM

Member

jbenet commented Sep 29, 2016

@whyrusleeping can you explain more about what's going on? We should have a long comment somewhere in the code explaining the algorithm, and then how this change affects it.

Member

whyrusleeping commented Sep 29, 2016

@jbenet will do

Member

whyrusleeping commented Sep 29, 2016

@jbenet I added a large comment, and a third commit that cleans up the logic around the buggy area a bit to make things more obvious and readable.

Member

whyrusleeping commented Oct 2, 2016

@jbenet Can you review this? This is fairly important and it's been 'a day or two'.

Member

Kubuxu commented Oct 5, 2016

The fix looks solid to me. The bug was simple: the recursive fanout was done with a fanout of 1<<32 and only later contracted to a fanout of 256, so fanned-out keys were overwritten with other data whenever the lowest 8 bits of their hashes (the hash modulo 256) were the same.

whyrusleeping added some commits Sep 29, 2016

fix bug in pinsets and add a stress test for the scenario
License: MIT
Signed-off-by: Jeromy <why@ipfs.io>
add comment detailing the algorithm and fix
License: MIT
Signed-off-by: Jeromy <why@ipfs.io>
pinset: clean up storeItems logic a bit
Switched from using a map to an array since the bounds are
small and fixed. This should save us some significant time on
accesses

License: MIT
Signed-off-by: Jeromy <why@ipfs.io>

Member

lgierth commented Oct 6, 2016

To everybody reading this, please make sure to have backups of your pinsets: ipfs pin ls > pinset.txt

@lgierth lgierth added bug repo labels Oct 6, 2016

Remove legacy multiset 'data' fields, comment and cleanup more
License: MIT
Signed-off-by: Jeromy <why@ipfs.io>

Member

whyrusleeping commented Oct 7, 2016

I removed all of the old 'multiset' code that made things much, much more confusing. I also cleaned up a few different things and added a bunch more comments. I think it's much easier to understand what's going on now.

Member

whyrusleeping commented Oct 7, 2016

I have a program that is able to find 'lost' hashes if you haven't run a garbage collection yet.
The only downside is that it also reports pins that you've manually removed via `ipfs pin rm`.

https://github.com/whyrusleeping/ipfs-see-all

I'll be updating the build instructions soon and providing pre-built binaries to download from dist.ipfs.io shortly.

Member

jbenet commented Oct 7, 2016

Hey @tv42 -- could you CR this and verify it's right?

Contributor

tv42 commented Oct 7, 2016

1f853c5 LGTM

Member

whyrusleeping commented Oct 7, 2016

Doing some more tests, the bug is triggered any time we hit more than 8192 pins, very reliably.

Member

Kubuxu commented Oct 7, 2016

8192 is the point at which we start hashing pins. From my analysis, you will have an almost 100% failure rate at 8192+256+1.

Member

Kubuxu commented Oct 7, 2016

But the estimated length might be higher than 8192 even if the pin count is lower than 8192, so we will start hashing earlier. If the estimated length is greater than or equal to 8192, you will need just 256+1 pins to trigger the faulty code.

Member

whyrusleeping commented Oct 8, 2016

The tests show a zero percent failure rate up to 8192 pins. After that,
it's 100% (over 50 runs at each pin count).


Member

Kubuxu commented Oct 8, 2016

Yes, because then it will skip this branch: https://github.com/ipfs/go-ipfs/pull/3273/files#diff-15e7154f15253315d2a8ba7e1744d9e7L116 and proceed to split the 8192 pins into buckets.

Member

whyrusleeping commented Oct 9, 2016

Gonna go ahead and merge this, no sense waiting any longer.

@whyrusleeping whyrusleeping merged commit 391b78a into master Oct 9, 2016

5 of 6 checks passed

teamcity Started TeamCity Build go-ipfs :: ci_tests_linux
ci/circleci Your tests passed on CircleCI!
commit-message-check/gitcop All commit messages are valid
continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed
js-ipfs-api Finished TeamCity Build go-ipfs :: js-ipfs-api-tests : Running

@whyrusleeping whyrusleeping deleted the fix/pin-fail branch Oct 9, 2016

@lgierth lgierth referenced this pull request Dec 23, 2016

Closed

Code Review v0.4.4...v0.4.5 #3534

0 of 199 tasks complete