Stain Normalizer #186
Conversation
Can one of the admins verify this patch?
add to allowlist
Thanks for adding this @drbeh. However, I'm trying to find out whether there is an equivalent of this in scikit-image. Do you know if there is?
Hi @quasiben, I am not aware of an equivalent to this stain normalizer. The most relevant thing in scikit-image that I know of is: It might be useful to see how we can connect these two and maybe leverage skimage to improve our stain normalizer.
There has been a recent, related discussion about having a normalization to Hematoxylin & Eosin in scikit-image at:
@gigony @quasiben taking a closer look at scikit-image, I realized that although the functionality of this PR is very related to stain separation in skimage, it is orthogonal/complementary to that.
Yes, in scikit-image there is a fixed matrix that gets applied to do the stain normalization. The user could also apply their own matrix or use one of the other predefined ones with
Ah, I see there is a reference to a Macenko et al. paper in the docstrings. I will take a look at that to understand the method proposed here a bit better.
absorbance = absorbance[cp.all(absorbance > self.beta, axis=1)]
if len(absorbance) == 0:
Suggested change:
- absorbance = absorbance[cp.all(absorbance > self.beta, axis=1)]
- if len(absorbance) == 0:
+ absorbance = absorbance[:, cp.all(absorbance > self.beta, axis=0)]
+ if absorbance.size == 0:
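The two variants filter along different axes, which is also why the emptiness check changes from `len(...)` to `.size`. A minimal NumPy sketch (CuPy mirrors NumPy's API here; the `beta` value and absorbance numbers below are made up for illustration):

```python
import numpy as np

beta = 0.15  # hypothetical threshold; the PR stores this as self.beta

# channels-last layout: rows are pixels, columns are (R, G, B) absorbances
absorbance = np.array([[0.2, 0.3, 0.4],
                       [0.1, 0.3, 0.4]])

# original version: keep rows (pixels) whose channels all exceed beta
rows_kept = absorbance[np.all(absorbance > beta, axis=1)]
assert len(rows_kept) == 1  # len() counts remaining rows (pixels)

# channels-first layout: pixels are columns, so keep columns instead
absorbance_cf = absorbance.T
cols_kept = absorbance_cf[:, np.all(absorbance_cf > beta, axis=0)]
# len() would still return the channel count (3) even with zero pixels
# remaining, so .size == 0 is the safe emptiness check for this layout
assert cols_kept.size == rows_kept.size
```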
Based on the prior line, it looks like this method expects an image with channels along the first axis. I have been refactoring things while testing locally. I think we can support specification of a `channel_axis` argument to make it clear which axis of the input is expected to be channels.
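One way such a `channel_axis` argument could work is to normalize the layout internally before filtering. The helper name and `beta` default below are hypothetical, and NumPy stands in for CuPy:

```python
import numpy as np

def threshold_pixels(absorbance, beta=0.15, channel_axis=0):
    """Hypothetical helper: keep pixels whose absorbance exceeds beta in
    every channel, regardless of which axis holds the channels."""
    n_ch = absorbance.shape[channel_axis]
    # move channels to the front, then flatten the spatial axes into pixels
    a = np.moveaxis(absorbance, channel_axis, 0).reshape(n_ch, -1)
    return a[:, np.all(a > beta, axis=0)]

rng = np.random.default_rng(0)
img_cf = rng.random((3, 4, 5))        # channels-first image
img_cl = np.moveaxis(img_cf, 0, -1)   # same data, channels-last

out_cf = threshold_pixels(img_cf, channel_axis=0)
out_cl = threshold_pixels(img_cl, channel_axis=-1)
assert np.array_equal(out_cf, out_cl)  # layout no longer matters
```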
It is not clear from the Macenko et al. publication whether this should ideally be `all(absorbance > self.beta)` as here, or if some other operation like the mean or max absorbance should be used.
It seems that everything they used for this filtering is based on heuristics. Even the proposed value for `beta` doesn't have any scientific justification.
"""
# calculate absorbance
image = image.astype(cp.float32, copy=False) + 1.0
absorbance = -cp.log(
Macenko et al. use `log10` rather than `log` in the publication, but the existing open source implementations I found all use the natural log. I guess either is valid; it only affects the absolute scale of the `beta` threshold. Empirically, computation of `log` is better optimized than `log10`, so we should probably keep this as is.
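Concretely, the two absorbance definitions differ only by the constant factor ln(10), so a `beta` tuned for one base maps directly to the other. A quick NumPy check (the 255 white-level normalization and the sample intensities are assumptions for illustration):

```python
import numpy as np

x = np.array([1.2, 10.0, 55.0, 200.0])    # made-up pixel intensities
abs_ln = -np.log(x / 255.0)               # natural-log absorbance
abs_log10 = -np.log10(x / 255.0)          # base-10 absorbance, as in Macenko et al.

# the two scales differ only by the constant ln(10)
assert np.allclose(abs_ln, np.log(10) * abs_log10)

# so a beta chosen for log10 selects exactly the same pixels
# as ln(10) * beta does for the natural log
beta10 = 0.15
assert np.array_equal(abs_log10 > beta10, abs_ln > np.log(10) * beta10)
```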
You are right! It only affects `beta`, and as long as we are using the correct inverse (`exp` for `log` and `10**` for `log10`) both are fine.
I agree that this Macenko method is highly cited and seems like a good starting point. I see that least squares is used here to determine the stain vectors, which is readily available in CuPy. Other, related techniques use different solutions, but those can potentially be added later. For example, there is a technique based on non-negative matrix factorization here: The Macenko method is also implemented (under an MIT license) here: https://github.com/Peter554/StainTools I will update this PR with specific suggestions later today after testing a bit more with some actual images (such as the ones used for the demo in the ITKColorNormalization repo).
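The least-squares step referred to here solves for per-pixel stain concentrations given a stain matrix whose columns are the absorbance vectors of the two stains. A NumPy sketch (the stain vectors and concentrations below are fabricated for illustration; the actual code uses CuPy and data-derived vectors):

```python
import numpy as np

# hypothetical unit-norm stain vectors (columns: hematoxylin, eosin)
stain_matrix = np.array([[0.65, 0.07],
                         [0.70, 0.99],
                         [0.29, 0.11]])
stain_matrix /= np.linalg.norm(stain_matrix, axis=0)

# fabricate absorbances from known concentrations, then recover them
true_conc = np.array([[1.0, 0.2, 0.0],
                      [0.1, 0.8, 0.5]])   # shape (n_stains, n_pixels)
absorbance = stain_matrix @ true_conc      # shape (n_channels, n_pixels)

# least-squares solve: absorbance ≈ stain_matrix @ conc
conc, *_ = np.linalg.lstsq(stain_matrix, absorbance, rcond=None)
assert np.allclose(conc, true_conc)
```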
Closing this in favor of #273
closes #96

This PR resumes work that was started in #186. Given the large overall refactoring, it was not feasible to make the suggestions there as individual comments. Overall, the approach is the same Macenko method that was proposed in #186. I spent quite a bit of time refactoring for performance and separating out some aspects so that it will be easier to add additional related methods in the future. I find about a 3x improvement for the case here vs. the one in #186.

A summary of the changes relative to #186:

### Enhancements
- added a `channel_axis` argument that can be used to specify which axis of the input array corresponds to color channels.
- more descriptive docstrings

### General refactoring
- Now also provides a function-based interface to more closely match the typical `cucim.skimage` style. The existing class-based interface was kept as well. There is a small amount of redundancy in providing both, so we should decide if this is worth it.
- Aspects like conversion to/from absorbance units were separated out into standalone functions.

### Performance-related changes
- Used `cupy.fuse` to fuse the multiple kernel operations needed for absorbance calculations into a single GPU kernel. This gives ~4x improvement in conversions to/from absorbance space.
- Added an `image_type` argument that defaults to 'intensity' but can be set to 'absorbance' to indicate that the image is already in absorbance space. This is used to avoid redundant conversions during stain normalization.
- Added a `_covariance` function that is a simplified and optimized version of `cupy.cov`. It runs 4x faster for me for the float32 test case I tried on a roughly 2000x2000 image.
- Added a second 'ortho' method aside from least squares for estimating the raw concentrations. The `_complement_stain_matrix` helper adds a third column that is orthogonal to the two estimated stain vectors, so that a standard matrix inverse can be used. This is much faster in practice than calling `cupy.linalg.lstsq` and gives identical results for almost all voxels in test images. A tiny fraction of voxels differed in uint8 intensity by a magnitude of 1, but this is likely just due to differences in rounding of finite-precision floating point values. This approach is based on the one used by the HistomicsTK software (Apache 2.0 licensed).

### Test changes
- refactored to use `pytest.mark.parametrize` instead of adding a dependency on [parameterized](https://github.com/wolever/parameterized)
- Some "expected" pixel values increase by 1 for some test cases because casting to integer (a floor operation) was replaced with rounding when converting `float32` back to `uint8` during color normalization.

Authors:
- Behrooz Hashemian (https://github.com/drbeh)
- Gregory Lee (https://github.com/grlee77)

Approvers:
- Gigon Bae (https://github.com/gigony)
- https://github.com/jakirkham
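The 'ortho' trick described above works because a third column orthogonal to both stain vectors makes the matrix square and invertible, and the first two rows of the inverted solve coincide with the least-squares solution. A NumPy sketch of the idea (function name and stain vectors are illustrative, not the actual cuCIM implementation):

```python
import numpy as np

def complement_stain_matrix(m):
    """Append a third column orthogonal to the two stain vectors so a
    plain 3x3 inverse can replace a least-squares solve (name assumed)."""
    third = np.cross(m[:, 0], m[:, 1])
    third /= np.linalg.norm(third)
    return np.column_stack([m, third])

# hypothetical unit-norm stain vectors (columns: hematoxylin, eosin)
stain_matrix = np.array([[0.65, 0.07],
                         [0.70, 0.99],
                         [0.29, 0.11]])
stain_matrix /= np.linalg.norm(stain_matrix, axis=0)

rng = np.random.default_rng(0)
absorbance = rng.random((3, 100))          # fabricated per-pixel absorbances

full = complement_stain_matrix(stain_matrix)
conc_inv = (np.linalg.inv(full) @ absorbance)[:2]   # first two rows = H, E
conc_lstsq, *_ = np.linalg.lstsq(stain_matrix, absorbance, rcond=None)
assert np.allclose(conc_inv, conc_lstsq)
```

The equivalence holds because the component of each absorbance vector along the orthogonal third column is exactly the least-squares residual, leaving the first two coordinates equal to the projection coefficients.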
Fix #96

This PR adds a stain extractor and normalizer for Hematoxylin and Eosin, implemented in CuPy.