
perf: remove union-by-size for lower memory and better performance #17

Merged · 1 commit into master · Jun 11, 2019

Conversation

william-silversmith

Related issues #15 #16

I'd like to understand why this is faster for the images I've attempted this on before merging.

@william-silversmith william-silversmith added the performance Lower memory or faster computation. label Jun 10, 2019
@william-silversmith william-silversmith self-assigned this Jun 10, 2019
william-silversmith commented Jun 10, 2019

import cc3d
import numpy as np
import time

for i in range(10):
  # np.bool is removed in modern NumPy; plain bool works in all versions
  labels = np.random.randint(0, 2, size=(512, 512, 512), dtype=bool)

  start = time.time()
  labels = cc3d.connected_components(labels)
  print(time.time() - start, "sec")

[Figure: running cc3d against random binary 512x512x512 images, 10 trials. (black) cc3d 1.2.0 (blue) without union-by-size (red) scipy]
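The scipy curve in these plots isn't accompanied by code; presumably it's `scipy.ndimage.label`. A comparable timing loop might look like the sketch below (a 64³ volume here instead of 512³, purely for illustration):

```python
import time
import numpy as np
from scipy import ndimage

# random binary volume, analogous to the cc3d benchmark above
labels = np.random.randint(0, 2, size=(64, 64, 64), dtype=bool)

start = time.time()
# ndimage.label defaults to 6-connectivity in 3D
out, num = ndimage.label(labels)
print(time.time() - start, "sec;", num, "components")
```

Note that `ndimage.label` only handles binary input, so this only mirrors the first (boolean) benchmark, not the multi-label connectomics test below.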

william-silversmith commented Jun 10, 2019

Here's a different test on 512x512x512 uint32 connectomics data. 10 iterations of each algorithm.

[Figure: (black) cc3d 1.2.0 (blue) without union-by-size (red) scipy (which treats the entire volume as a single label...)]

@william-silversmith

I wonder if this is an L1/L2 cache effect. If the size array occupies fewer cache lines, there's more room in cache for the labels themselves.

@william-silversmith

It looks like scipy uses a little more than 128 MB (input uint8 image) + 512 MB (output int32 image). This would be hard to beat without a sparse representation of the equivalence table.
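The 128 MB + 512 MB estimate follows directly from the array sizes:

```python
# Back-of-envelope check of the memory figures quoted above
voxels = 512 ** 3
input_mb = voxels * 1 / 2 ** 20   # uint8 input: 1 byte per voxel
output_mb = voxels * 4 / 2 ** 20  # int32 output: 4 bytes per voxel
print(input_mb, "MB input;", output_mb, "MB output")  # 128.0 MB input; 512.0 MB output
```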

@william-silversmith

For example, here's how scipy does on np.arange(512**3) + 1 reshaped into a 512x512x512 volume:

[Figure: scipy results on this worst-case input]
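This input gives every voxel its own nonzero label, so no two voxels ever merge. A sketch of the construction (64³ here rather than 512³, just to keep it small):

```python
import numpy as np

n = 64  # the thread uses 512; smaller here for illustration
labels = (np.arange(n ** 3, dtype=np.uint32) + 1).reshape(n, n, n)
# every voxel is distinct, so a labeling pass can merge nothing
```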

@william-silversmith

I don't have a fantastic explanation, but this seems faster on all the examples I'm throwing at it so... 🤷‍♂️

@william-silversmith william-silversmith merged commit 47c5716 into master Jun 11, 2019
@william-silversmith william-silversmith deleted the wms_drop_union_by_size branch June 11, 2019 01:49
@william-silversmith

I think I might have an explanation for why this is okay. Without union-by-rank or union-by-size, the worst-case Big-O for "find" in union-find is worse, but that worst case is repaired by path compression after each access. Since we typically process the same label many times in an image, the first access may be slow, but amortized over many accesses it is fast. In that regime, the union-by-size / union-by-rank bookkeeping simply adds constant overhead and pollutes the cache.
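The argument above can be made concrete with a minimal sketch (my own illustration, not the cc3d implementation): union-find with path compression but without union-by-size/rank. Repeated finds on the same label flatten the tree, so amortized cost stays low even though a single first access can be slow.

```python
class UnionFind:
    """Union-find with path compression only; no size/rank bookkeeping."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # first pass: locate the root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # second pass (path compression): point every node on the path at the root
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx  # no size/rank comparison or extra array

uf = UnionFind(6)
uf.union(0, 1); uf.union(1, 2); uf.union(3, 4)
print(uf.find(2) == uf.find(0))  # True: same component
```

Dropping the size array also halves the memory of the equivalence structure, which matches the cache argument earlier in the thread.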
