
max_labels questions #15

Closed
unidesigner opened this issue Jun 10, 2019 · 6 comments
Labels: performance (Lower memory or faster computation.) · question (Further information is requested)

Comments

@unidesigner

I'm a bit confused by the max_labels argument. If I run a connected_components call without the argument and then do np.max(labels_out), I should get the number of components (as of the recent 1.2.0 release). However, if I now use this number, with some margin, to set max_labels, the procedure fails with an exception:

Connected Components Error: Label 60000 cannot be mapped to union-find array of length 60000.
terminate called after throwing an instance of 'char const*'

It seems that internally the union-find algorithm requires a larger array than the component count, but it is not clear to me how to estimate the required size.
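For concreteness, here is a minimal sketch of the failing sequence (the random labels_in volume is hypothetical; connected_components and max_labels are the names used above, assuming the API as of 1.2.0):

```python
import numpy as np
import cc3d

# Hypothetical test volume; any labeled uint32 array will do.
labels_in = np.random.randint(0, 10, size=(128, 128, 128), dtype=np.uint32)

# Without max_labels, the maximum output label is the component count
# (as of release 1.2.0).
labels_out = cc3d.connected_components(labels_in)
num_components = int(np.max(labels_out))

# Re-running with max_labels set to that count plus a margin still throws:
# "Connected Components Error: Label N cannot be mapped to union-find
#  array of length N."
labels_out = cc3d.connected_components(labels_in, max_labels=num_components + 1000)
```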

It would be great to find a way to reduce the peak memory footprint of this very nice package. :)

william-silversmith added the question label on Jun 10, 2019
@william-silversmith
Contributor

william-silversmith commented Jun 10, 2019

Hi Stephan,

The max_labels argument is a memory reduction hack that's not guaranteed to work well. Typically, for connectomics data, I try to estimate it as the number of labels in a "representative volume" plus a large safety factor. In Kimimaro, right or wrong, it is set to 1/4th the number of voxels in the volume.
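As a sketch of that heuristic (labels_in is a hypothetical input volume; the 1/4 factor is the Kimimaro convention mentioned above):

```python
# Kimimaro-style bound: a quarter of the voxel count, a deliberately
# generous upper limit on the number of components.
max_labels = labels_in.size // 4
labels_out = cc3d.connected_components(labels_in, max_labels=max_labels)
```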

I might have a better way to reduce memory usage. I'm currently experimenting with removing the "union by size" feature from union-find. In some experiments, it reduces memory usage by half and improves performance ~10% on a set of connectomics labels and on random arrays. However, it doesn't make a ton of sense to me that it's faster.

If I can find a theoretical justification for it, I'd be happy to release that, as it's less code and more performant.
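For readers following along, here is a minimal sketch of what removing "union by size" means (plain Python for illustration; not cc3d's actual C++ implementation):

```python
class UnionFind:
    """Union-find with path compression only. Dropping "union by size"
    eliminates the per-root size array, which is where the roughly 2x
    memory savings comes from."""

    def __init__(self, n):
        self.parent = list(range(n))  # the only array needed

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Without union by size, one root is attached to the other
        # arbitrarily rather than hanging the smaller tree under the larger.
        self.parent[self.find(x)] = self.find(y)
```

With union by size, union() would consult the size array to make the smaller tree the child; path compression alone still keeps trees shallow in practice, which is consistent with the measurements above, though its worst-case guarantees are weaker.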

william-silversmith added the performance label on Jun 10, 2019
@william-silversmith
Contributor

I added issue #16 for the memory reduction discussion.

@william-silversmith
Contributor

@unidesigner Check out PR #17 and let me know what you think.

@william-silversmith
Contributor

@unidesigner I released v1.2.2, which achieves a roughly 2x reduction in memory consumption and is also ~40% faster.

@unidesigner
Author

@william-silversmith Fantastic news - thanks for getting back to this so quickly! I will test it tomorrow and report back asap.

@unidesigner
Author

It works for me too: it's much faster and uses less memory! The number of regions it finds is still the same, so that's an additional datapoint confirming that things still work. I'll close this issue, as I don't think there is much else we can do about the max_labels option at the moment.
