
max_labels questions #15

Closed
unidesigner opened this issue Jun 10, 2019 · 6 comments
Labels: performance (Lower memory or faster computation.) · question (Further information is requested)

Comments

@unidesigner

I'm a bit confused by the max_labels argument. If I run a connected_components call without the argument and then do np.max(labels_out), I should get the number of components (as of the recent 1.2.0 release). However, if I now use this number, with some margin, to set max_labels, the procedure fails with an exception:

Connected Components Error: Label 60000 cannot be mapped to union-find array of length 60000.
terminate called after throwing an instance of 'char const*'

It seems that internally the union-find algorithm requires a larger array than the component count, but it is not clear to me how to estimate the required size.
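For concreteness, here is a minimal sketch of the failing sequence (the random labels_in volume is hypothetical; connected_components and max_labels are the names used above, assuming the API as of 1.2.0):

```python
import numpy as np
import cc3d

# Hypothetical test volume; any labeled uint32 array will do.
labels_in = np.random.randint(0, 10, size=(128, 128, 128), dtype=np.uint32)

# Without max_labels, the maximum output label is the component count
# (as of release 1.2.0).
labels_out = cc3d.connected_components(labels_in)
num_components = int(np.max(labels_out))

# Re-running with max_labels set to that count plus a margin still throws:
# "Connected Components Error: Label N cannot be mapped to union-find
#  array of length N."
labels_out = cc3d.connected_components(labels_in, max_labels=num_components + 1000)
```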

It would be great to find a way to reduce the peak memory footprint of this very nice package. :)

william-silversmith added the question label on Jun 10, 2019
@william-silversmith
Contributor

william-silversmith commented Jun 10, 2019

Hi Stephan,

The max_labels argument is a memory reduction hack that's not guaranteed to work well. Typically, for connectomics data, I try to estimate it as the number of labels in a "representative volume" plus a large safety factor. In Kimimaro, right or wrong, it is set to 1/4th the number of voxels in the volume.
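As a sketch of that heuristic (labels_in is a hypothetical input volume; the 1/4 factor is the Kimimaro convention mentioned above):

```python
# Kimimaro-style bound: a quarter of the voxel count, a deliberately
# generous upper limit on the number of components.
max_labels = labels_in.size // 4
labels_out = cc3d.connected_components(labels_in, max_labels=max_labels)
```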

I might have a better way to reduce memory usage. I'm currently experimenting with removing the "union by size" feature from union-find. In some experiments, it reduces memory usage by half and improves performance ~10% on a set of connectomics labels and on random arrays. However, it doesn't make a ton of sense to me that it's faster.

If I can find a theoretical justification for it, I'd be happy to release that, as it's less code and more performant.
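For readers following along, here is a minimal sketch of what removing "union by size" means (plain Python for illustration; not cc3d's actual C++ implementation):

```python
class UnionFind:
    """Union-find with path compression only. Dropping "union by size"
    eliminates the per-root size array, which is where the roughly 2x
    memory savings comes from."""

    def __init__(self, n):
        self.parent = list(range(n))  # the only array needed

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Without union by size, one root is attached to the other
        # arbitrarily rather than hanging the smaller tree under the larger.
        self.parent[self.find(x)] = self.find(y)
```

With union by size, union() would consult the size array to make the smaller tree the child; path compression alone still keeps trees shallow in practice, which is consistent with the measurements above, though its worst-case guarantees are weaker.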

william-silversmith added the performance label on Jun 10, 2019
@william-silversmith
Contributor

I added issue #16 for the memory reduction discussion.

@william-silversmith
Contributor

@unidesigner Check out PR #17 and let me know what you think.

@william-silversmith
Contributor

@unidesigner I released v1.2.2, which achieves a roughly 2x reduction in memory consumption and is also ~40% faster.

@unidesigner
Author

@william-silversmith Fantastic news - thanks for getting back to this so quickly! I will test it tomorrow and report back asap.

@unidesigner
Author

It works for me too: it's much faster and uses less memory! The number of regions it finds is still the same, so that's an additional datapoint confirming that things still work. I'll close this issue, as I don't think there is much else we can do about the max_labels option at the moment.
