Stain Normalizer (v2) #273
Conversation
add channel_axis kwarg to allow specifying which axis of the image corresponds to channels. Many performance-related updates.
This avoids the need to add a dependency on the parameterize package. Note that some expected results have been incremented by 1 due to a change from floor to round during float32 -> uint8 conversion.
@drbeh, please see this refactored version of the stain normalization proposed previously. If there is no desire for a function-based interface, we can remove the functions.
Thank you very much @grlee77 for this upgraded version. I'll review it today.
Also, @thewtex pointed me to a method that was implemented in ITK. I can try to take a closer look at it at some point to estimate how easy it would be to adapt to the GPU.
```python
# flip to ensure positive first coordinate so arctan2 angles are about 0
if ev[0, 0] < 0:
    ev[:, 0] *= -1
if ev[0, 1] < 0:
    ev[:, 1] *= -1
```
This flipping was added here based on an MIT-licensed implementation in:
https://github.com/Peter554/StainTools/blob/2089900d11173ee5ea7de95d34532932afd3181a/staintools/stain_extraction/macenko_stain_extractor.py#L29-L37
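A self-contained sketch of this sign convention, using NumPy as a stand-in for CuPy (`cov` here is an arbitrary symmetric matrix for illustration, not actual stain data):

```python
import numpy as np

# Arbitrary symmetric matrix standing in for the absorbance covariance.
cov = np.array([[2.0, 0.5, 0.1],
                [0.5, 1.0, 0.2],
                [0.1, 0.2, 0.5]])

# eigh returns eigenvectors as columns; their signs are arbitrary.
_, ev = np.linalg.eigh(cov)

# Flip each of the first two eigenvectors so its first coordinate is
# positive; arctan2 angles computed from them are then centered about 0.
if ev[0, 0] < 0:
    ev[:, 0] *= -1
if ev[0, 1] < 0:
    ev[:, 1] *= -1
```

Sign flips preserve orthonormality, so the flipped columns remain a valid eigenbasis.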
```python
)

# channels_axis=0 for the shape (3, n_pixels) absorbance matrix
src_stain_coeff = stain_decomposition_macenko(
```
We should perhaps change "decomposition" -> "extraction"; not sure why I named it this way.
Both are fine to me, since you need to decompose stains to extract them.
However, speaking of naming, do you think we can give this method a more representative name? Rather than the family name of the first author of that paper, maybe we can use the name of the underlying method, since it is not a new method per se, and the paper just showed how they used this method in histopathology.
```python
RuntimeWarning, stacklevel=2)
fact = 0.0

X -= X.mean(axis=1, keepdims=True)
```
In this application, X here is always shape (3, n_pixels).
An example justifying the enforcement of C-contiguous order above: reduction along the last axis of an array of shape (3, 10_000_000) is much faster when that axis is contiguous in memory. It also benefits from enabling CUB in the environment via `CUPY_ACCELERATORS="cub"`.
| dtype, order | duration (s) |
| --- | --- |
| float64, order=F | 0.15337538 |
| float32, order=F | 0.14860819 |
| float64, order=C | 0.00744207 (CUB disabled) |
| float32, order=C | 0.00718318 (CUB disabled) |
| float64, order=C | 0.00226073 (CUB enabled) |
| float32, order=C | 0.00080919 (CUB enabled) |
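The layout difference can be illustrated with NumPy as a stand-in for CuPy (timings aside, the reduction result is identical for either memory order):

```python
import numpy as np

# Same (3, n_pixels) layout as the absorbance matrix discussed above.
X_c = np.zeros((3, 100_000), order="C")  # rows contiguous: axis=1 reduction is fast
X_f = np.asfortranarray(X_c)             # columns contiguous instead

# Contiguity flags record which axis is laid out contiguously in memory.
assert X_c.flags.c_contiguous
assert X_f.flags.f_contiguous

# Only speed differs between the layouts; the values reduced are the same.
mean_c = X_c.mean(axis=1)
mean_f = X_f.mean(axis=1)
```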
These two pathology tools were used as a reference when developing the Macenko color normalization algorithm.
@grlee77 thank you very much for refactoring this stain normalizer. Overall it looks great but I left some comments in the code.
```python
# slower for float64, which seems odd. Should further validate on
# additional hardware.
X = cp.asfortranarray(X)
out = X.dot(X.T.conj())
```
In this use case we shouldn't have any complex numbers, so we should be able to remove `.conj()`.
If we remove `rowvar` and consider it `True`, then we can save the transpose on line 221 (with additional consideration of the following lines) and make this `X.T.dot(X)`.
I think in practice `.conj()` is near-instantaneous for real-valued inputs. Calling `.T` is similarly cheap, as it doesn't make a copy but just modifies the strides. Example:

```python
import cupy as cp
a = cp.ones((3, 1000000))
d = cp.cuda.Device()
%timeit a.T.conj(); d.synchronize()
```

gives:

```
1.21 µs ± 4.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

whereas just making a copy of `a` takes ~125 µs.
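The claim that `.T` only modifies strides can be checked directly; a NumPy sketch (CuPy arrays have the same stride behavior):

```python
import numpy as np

a = np.ones((3, 1000))

# .T swaps the shape and strides tuples; it is a view, not a copy.
t = a.T
assert np.shares_memory(a, t)          # same underlying buffer
assert t.shape == (1000, 3)
assert t.strides == a.strides[::-1]    # strides reversed, no data moved
```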
```python
# additional hardware.
X = cp.asfortranarray(X)
out = X.dot(X.T.conj())
out *= 1 / cp.float64(fact)
```
Is there any reason to separate it into a new line?
It was just to force the multiplication to be done in-place. Otherwise, I think another temporary array is created?
Right, but anyway you have a division, so there isn't any gain over:

```python
out = X.dot(X.T.conj()) / cp.float64(fact)
```
Okay, I will do the division on the host, so there is just one in-place multiplication:

```python
out *= 1 / float(fact)
```
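A small NumPy sketch (stand-in for CuPy) showing that the in-place form reuses the existing buffer rather than allocating a temporary for the result:

```python
import numpy as np

out = np.full((3, 3), 6.0)
fact = 2.0

# Record the buffer address before the in-place update.
buf_before = out.__array_interface__["data"][0]

# One host-side division, then a single in-place array multiplication;
# no temporary array is allocated for the result.
out *= 1 / float(fact)

assert out.__array_interface__["data"][0] == buf_before  # same buffer
```

By contrast, `out = out * (1 / float(fact))` would allocate a new array and rebind the name.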
```python
__all__ = [
    "color_jitter",
    "rand_color_jitter",
    "absorbance_to_image",
    "image_to_absorbance",
    "StainNormalizer",
    "HEStainExtractor",
]
```
Don't we want to add the functional interfaces here, i.e. `stain_decomposition_macenko` and `normalize_color_macenko`?
Yes, and I still need to change the tests to use those as well.
Okay, I updated this now and removed the class-based implementation. Does that seem fine? (I think we should choose only one or the other rather than providing both.)
```python
# This approach relies on a square stain coeffs matrix as used by
# HistomicsTK. In practice, it gives nearly identical results to the
# least-squares approach.
coeff_inv = cp.linalg.inv(src_stain_coeff)
```
Instead of creating the extra column for stain coefficients, can't we use the pseudo-inverse (for rectangular matrices), `{X^T X}^{-1} X^T`?
Yes, we can do that. There are two ways to compute it: the direct normal-equations form or `cp.linalg.pinv`. Directly computing it via

```python
coeff_inv = cp.dot(cp.linalg.inv(cp.dot(src_stain_coeff.T, src_stain_coeff)), src_stain_coeff.T)
```

is faster than calling `cp.linalg.pinv`.
I think the only reason we might potentially want to keep the version with an additional column would be if we wanted to not discard that channel and use it to visualize what ended up NOT in the H or E channel.
I will go ahead and remove that for now and just use this pseudo-inverse.
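A NumPy sketch of this normal-equations pseudo-inverse (NumPy as a stand-in for CuPy; the matrix here is a random stand-in for the stain coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))  # tall matrix, like a (3, 2) stain matrix

# Normal-equations form of the left pseudo-inverse: (A^T A)^{-1} A^T
pinv_direct = np.linalg.inv(A.T @ A) @ A.T

# For a well-conditioned tall matrix this matches the SVD-based pinv,
# and acts as a left inverse: pinv_direct @ A == I.
assert np.allclose(pinv_direct, np.linalg.pinv(A))
assert np.allclose(pinv_direct @ A, np.eye(2))
```

The SVD-based `pinv` is more robust for ill-conditioned matrices, but the normal-equations form avoids the SVD and is cheaper for small, well-conditioned stain matrices.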
Isn't this pinv-based solution just "least squares"? Perhaps we should just remove the `method` argument for now and always use the pseudo-inverse code path.
I was thinking `method` could be useful later if we were to add additional methods like LASSO or a non-negative matrix factorization that would ensure non-negative concentrations.
I like that you added different methods, but if the gain is not that much, for now we can keep only one of the methods and add the argument if needed in the future.
Either way is fine, but I think it depends on …
Co-authored-by: Behrooz Hashemian <3968947+drbeh@users.noreply.github.com>
A threshold of 0.15 for a log10 scale is equivalent to a threshold of ~0.345 on a natural log scale.
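The equivalence is just the change-of-base factor ln(10); a quick check (plain Python, `math` only):

```python
import math

# Absorbance thresholds on log10 vs natural-log scales differ by ln(10).
t_log10 = 0.15
t_ln = t_log10 * math.log(10)  # ~0.345
```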
Co-authored-by: Behrooz Hashemian <3968947+drbeh@users.noreply.github.com>
rename channel_shape -> spatial_shape for clarity
Thank you so much @grlee77! This implementation has been a long time coming.
Neha (author of the original algorithm implementation, @nsrivathsa) would love this work :)
And thank you @drbeh for the great feedback!
```python
__all__ = [
    "color_jitter",
    "rand_color_jitter"
    "absorbance_to_image",
    "image_to_absorbance",
    'stain_extraction_macenko',
```
Nit: the quotation marks are not consistent. You may want to use " instead of ' in lines 11 and 12. Or please stick to one style and apply the same rule to the other cases.
Co-authored-by: Gigon Bae <gigony@gmail.com>
…tain_normalizer_v2
closes #96
This PR resumes work that was started in #186. Given the large overall refactoring, it was not feasible to make the suggestions there as individual comments.
Overall, the approach is the same Macenko method that was proposed in #186.
I spent quite a bit of time refactoring for performance and separating out some aspects so that it will be easier to add additional related methods in the future. I find about a 3x improvement for the case here vs. the one in #186.
A summary of the changes relative to #186:
Enhancements

- New `channel_axis` argument that can be used to specify which axis of the input array corresponds to color channels.

General refactoring

- Added a function-based interface in the typical `cucim.skimage` style. The existing class-based interface was kept as well. There is a small amount of redundancy in providing both, so we should decide if this is worth it.
- Various steps were split out into separate functions.

Performance Related Changes

- Use `cupy.fuse` to fuse the multiple kernel operations needed for absorbance calculations into a single GPU kernel. This gives ~4x improvement in conversions to/from absorbance space.
- New `image_type` argument that defaults to 'intensity', but can be set to 'absorbance' to indicate that the image is already in absorbance space. This is used to avoid redundant conversions during stain normalization.
- New `_covariance` function that is a simplified and optimized version of `cupy.cov`. It runs 4x faster for me for the float32 test case I tried on a roughly 2000x2000 image.
- Faster solve for the stain concentrations: the `_complement_stain_matrix` helper adds a third column that is orthogonal to the two estimated stain vectors, so that a standard matrix inverse can be used. This is much faster in practice than calling `cupy.linalg.lstsq` and gives identical results for almost all voxels in test images. A tiny fraction of voxels differed in uint8 intensity by a magnitude of 1, but this is likely just due to differences in the rounding of finite-precision floating point values. This approach is based on the one used by the HistomicsTK software (Apache 2.0 licensed).

Test Changes

- Use `pytest.mark.parametrize` instead of adding a dependency on `parameterized`.
- Some expected results changed by 1 due to the switch from floor to round when converting from `float32` back to `uint8` during color normalization.
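The floor-vs-round difference can shift a converted value by one intensity level; a minimal NumPy sketch (illustrative values, not taken from the actual tests):

```python
import numpy as np

# Float values just below/above an integer boundary, as can occur after
# color normalization, before conversion back to uint8.
vals = np.array([127.6, 63.2], dtype=np.float32)

floored = np.floor(vals).astype(np.uint8)  # truncates toward zero here
rounded = np.round(vals).astype(np.uint8)  # rounds to nearest integer

# 127.6 floors to 127 but rounds to 128: an off-by-one in the expected
# uint8 result, matching the test changes described above.
```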