Avoid unnecessary copies in skimage.morphology.label #2701
As each of the copies above replaces the previous one, I don't expect this PR to improve memory usage much, if at all. It should improve performance a bit. I rescaled the cameraman image up to size 4096x4096 and ran
I think the main reason for the high memory usage reported in that mailing list thread is that the labels are stored in int64 dtype. This means that the labels use 8 times the memory of a uint8 input. I don't think there is much of a way around that without rewriting the routines to allow the label array to match the integer dtype of the input (perhaps via fused types?). I don't intend to pursue that in this PR, but it is probably still worth merging for the minor performance boost.
There is some sort of PIL-related failure in the doctests that doesn't seem related to this PR:
Note that the dtype of the labels in no way depends on the dtype of the input, but rather on the number of connected components of the input. This is bounded by
Right, it is not related to input dtype. I misstated that above. I think 16-bit integers are probably sufficient for the majority of cases, but I guess the question would be whether to use a conservative value like
I would be in favor of having a dtype keyword argument to impose the type, with some warning in the case of overflow. Another trick might be to check the number of connected components at the end of the function and down-cast the type if int64 was not needed: it will not improve the memory usage during the execution of the function, but it's useful if the user is manipulating the image of labels later on.
This PR seems good to merge, and then we can start another issue for Emma's suggestion above (I think it's a good one).
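A minimal sketch of the down-casting idea suggested above, using plain NumPy. The helper name `downcast_labels` and the synthetic label array are assumptions made for illustration; this is not code from scikit-image or from this PR, and it only reduces memory after labelling has finished, as noted in the suggestion.

```python
import numpy as np


def downcast_labels(labels):
    """Return `labels` in the smallest unsigned dtype that holds its maximum.

    Hypothetical helper for illustration only; not part of scikit-image.
    """
    labels = np.asarray(labels)
    if labels.size == 0 or labels.min() < 0:
        # Leave empty arrays and arrays with negative labels untouched.
        return labels
    target = np.min_scalar_type(int(labels.max()))
    # copy=False avoids an extra copy when the dtype already matches.
    return labels.astype(target, copy=False)


# Stand-in for the int64 output of a labelling routine with 3 components.
labels = np.zeros((512, 512), dtype=np.int64)
labels[:100] = 1
labels[200:300] = 2
labels[400:] = 3

small = downcast_labels(labels)
print(labels.dtype, labels.nbytes)
print(small.dtype, small.nbytes)
```

With only three labels the result fits in `uint8`, so the down-cast array takes one eighth of the memory of the int64 original. The peak memory during labelling is unchanged.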
@@ -343,49 +343,52 @@ def undo_reshape_array(arr, swaps):
     return reshaped


-def label_cython(input, neighbors=None, background=None, return_num=False,
+def label_cython(input_, neighbors=None, background=None, return_num=False,
Why rename `input` to `input_`?
`input` is the name of a Python built-in function, so shadowing it is best avoided. I endorse this renaming.
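For reference, a short standard-library check of what `input` is in Python: it is a built-in function rather than a reserved keyword, so using it as an argument name is legal but hides the built-in inside that function. The function name `shadowing` below is made up for the example.

```python
import builtins
import keyword

# `input` is not a reserved keyword...
print(keyword.iskeyword("input"))
# ...but it is a name defined in the builtins module.
print(hasattr(builtins, "input"))


def shadowing(input):
    # Inside this function, `input` refers to the argument,
    # so the built-in function is no longer reachable by that name.
    return callable(input)


print(shadowing(42))
```

A trailing underscore, as in `input_`, is the usual convention for avoiding this kind of clash.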
@grlee77 could you, please, rebase onto
Thanks @grlee77!
Description
A user on the mailing list brought up an issue related to high memory usage in `skimage.morphology.label`:
https://mail.python.org/pipermail/scikit-image/2017-July/005313.html
This PR makes minor tweaks to `label_cython` to avoid some unnecessary extra copies of the data.

Specifically, the following line will make three separate copies:

(`flatten` returns a copy, `astype` returns a copy, and then `np.copy` will make another copy!)

I also renamed the variable `input` to `input_` to avoid confusion with the built-in function of the same name. I don't think this should be a problem as `label_cython` is not exported as part of the public API.
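The copy behaviour of `flatten`, `astype`, and `np.copy` mentioned in the description can be checked with `np.shares_memory`. This is only a sketch on a small made-up array (`image` and the `np.intp` dtype are assumptions); it is not the line changed in this PR, and the alternative shown is one possible way to copy only when needed.

```python
import numpy as np

# Small C-contiguous array that already has the target dtype.
image = np.arange(12, dtype=np.intp).reshape(3, 4)

# Each of these three calls allocates a new array, even though
# no conversion is actually required for this input.
flat = image.flatten()
cast = flat.astype(np.intp)
copied = np.copy(cast)
print(np.shares_memory(image, flat))
print(np.shares_memory(flat, cast))
print(np.shares_memory(cast, copied))

# Alternative: ravel() returns a view for contiguous input, and
# astype(..., copy=False) returns the same array when the dtype matches.
view = image.ravel()
same = view.astype(np.intp, copy=False)
print(np.shares_memory(image, same))
```

The first three checks report no shared memory, while the last one does. Note that the alternative returns an array aliasing the input, so it is only appropriate when the result is not modified in place.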