Stain Normalizer (v2) #273
Conversation
add channel_axis kwarg to allow specifying which axis of the image corresponds to channels. Many performance-related updates.
This avoids the need to add a dependency on the parameterize package. Note that some expected results have been incremented by 1 due to a change from floor to round during float32 -> uint8 conversion.
@drbeh, please see this refactored version of the stain normalization proposed previously. If there is no desire for a function-based interface, we can remove the functions.
Thank you very much @grlee77 for this upgraded version. I'll review it today.
Also, @thewtex pointed me to a method that was implemented in ITK. I can try to take a closer look at it at some point to estimate how easy it would be to adapt to the GPU.
```python
# flip to ensure positive first coordinate so arctan2 angles are about 0
if ev[0, 0] < 0:
    ev[:, 0] *= -1
if ev[0, 1] < 0:
    ev[:, 1] *= -1
```
This flipping was added here based on an MIT-licensed implementation in:
https://github.com/Peter554/StainTools/blob/2089900d11173ee5ea7de95d34532932afd3181a/staintools/stain_extraction/macenko_stain_extractor.py#L29-L37
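A self-contained sketch of this sign convention, using NumPy as a stand-in for CuPy (`cov` here is an arbitrary symmetric matrix for illustration, not actual stain data):

```python
import numpy as np

# Arbitrary symmetric matrix standing in for the absorbance covariance.
cov = np.array([[2.0, 0.5, 0.1],
                [0.5, 1.0, 0.2],
                [0.1, 0.2, 0.5]])

# eigh returns eigenvectors as columns; their signs are arbitrary.
_, ev = np.linalg.eigh(cov)

# Flip each of the first two eigenvectors so its first coordinate is
# positive; arctan2 angles computed from them are then centered about 0.
if ev[0, 0] < 0:
    ev[:, 0] *= -1
if ev[0, 1] < 0:
    ev[:, 1] *= -1
```

Sign flips preserve orthonormality, so the flipped columns remain a valid eigenbasis.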
```python
)

# channels_axis=0 for the shape (3, n_pixels) absorbance matrix
src_stain_coeff = stain_decomposition_macenko(
```
We should perhaps change "decomposition" -> "extraction"; not sure why I named it this way.
Both are fine to me, since you need to decompose stains to extract them.
However, speaking of naming, do you think we can give this method a more representative name? Rather than the family name of the first author of that paper, maybe we can use the name of the underlying method, since it is not a new method per se, and the paper just showed how they used this method in histopathology.
```python
RuntimeWarning, stacklevel=2)
fact = 0.0

X -= X.mean(axis=1, keepdims=True)
```
In this application, X here is always shape (3, n_pixels).
An example justifying the enforcement of C-contiguous order above: reduction along the last axis of an array of shape (3, 10_000_000) is much faster when that axis is contiguous in memory. It also benefits from enabling CUB in the environment via `CUPY_ACCELERATORS="cub"`.
| dtype, order | duration (s) |
| --- | --- |
| float64, order=F | 0.15337538 |
| float32, order=F | 0.14860819 |
| float64, order=C | 0.00744207 (CUB disabled) |
| float32, order=C | 0.00718318 (CUB disabled) |
| float64, order=C | 0.00226073 (CUB enabled) |
| float32, order=C | 0.00080919 (CUB enabled) |
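The layout difference can be illustrated with NumPy as a stand-in for CuPy (timings aside, the reduction result is identical for either memory order):

```python
import numpy as np

# Same (3, n_pixels) layout as the absorbance matrix discussed above.
X_c = np.zeros((3, 100_000), order="C")  # rows contiguous: axis=1 reduction is fast
X_f = np.asfortranarray(X_c)             # columns contiguous instead

# Contiguity flags record which axis is laid out contiguously in memory.
assert X_c.flags.c_contiguous
assert X_f.flags.f_contiguous

# Only speed differs between the layouts; the values reduced are the same.
mean_c = X_c.mean(axis=1)
mean_f = X_f.mean(axis=1)
```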
These two pathology tools were used as a reference when developing the Macenko color normalization algorithm.
@grlee77 thank you very much for refactoring this stain normalizer. Overall it looks great but I left some comments in the code.
```python
# slower for float64, which seems odd. Should further validate on
# additional hardware.
X = cp.asfortranarray(X)
out = X.dot(X.T.conj())
```
In this use case we shouldn't have any complex numbers, so we should be able to remove `.conj()`.
If we remove `rowvar` and consider it `True`, then we can save the transpose on line 221 (with additional consideration of the following lines) and make this `X.T.dot(X)`.
I think in practice `.conj()` is near-instantaneous for real-valued inputs. Calling `.T` is similarly cheap, as it doesn't make a copy but just modifies the strides. Example:

```python
import cupy as cp
a = cp.ones((3, 1000000))
d = cp.cuda.Device()
%timeit a.T.conj(); d.synchronize()
```

gives:

```
1.21 µs ± 4.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

whereas just making a copy of `a` takes ~125 µs.
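The claim that `.T` only modifies strides can be checked directly; a NumPy sketch (CuPy arrays have the same stride behavior):

```python
import numpy as np

a = np.ones((3, 1000))

# .T swaps the shape and strides tuples; it is a view, not a copy.
t = a.T
assert np.shares_memory(a, t)          # same underlying buffer
assert t.shape == (1000, 3)
assert t.strides == a.strides[::-1]    # strides reversed, no data moved
```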
```python
# additional hardware.
X = cp.asfortranarray(X)
out = X.dot(X.T.conj())
out *= 1 / cp.float64(fact)
```
Is there any reason to separate it into a new line?
It was just to force the multiplication to be done in-place. Otherwise, I think another temporary array is created?
Right, but anyway you have a division, so there isn't any gain over:

```python
out = X.dot(X.T.conj()) / cp.float64(fact)
```
Okay, I will do the division on the host, so there is just one in-place multiplication:

```python
out *= 1 / float(fact)
```
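A small NumPy sketch (stand-in for CuPy) showing that the in-place form reuses the existing buffer rather than allocating a temporary for the result:

```python
import numpy as np

out = np.full((3, 3), 6.0)
fact = 2.0

# Record the buffer address before the in-place update.
buf_before = out.__array_interface__["data"][0]

# One host-side division, then a single in-place array multiplication;
# no temporary array is allocated for the result.
out *= 1 / float(fact)

assert out.__array_interface__["data"][0] == buf_before  # same buffer
```

By contrast, `out = out * (1 / float(fact))` would allocate a new array and rebind the name.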
```python
__all__ = [
    "color_jitter",
    "rand_color_jitter",
    "absorbance_to_image",
    "image_to_absorbance",
    "StainNormalizer",
    "HEStainExtractor",
]
```
Don't we want to add the functional interfaces here, i.e. `stain_decomposition_macenko` and `normalize_color_macenko`?
Yes, and I still need to change the tests to use those as well.
Okay, I updated this now and removed the class-based implementation. Does that seem fine? (I think we should choose only one or the other rather than providing both.)
```python
# This approach relies on a square stain coeffs matrix as used by
# HistomicsTK. In practice, it gives nearly identical results to the
# least-squares approach.
coeff_inv = cp.linalg.inv(src_stain_coeff)
```
Instead of creating the extra column for stain coefficients, can't we use the pseudo-inverse (for rectangular matrices), `{X^T X}^{-1} X^T`?
Yes, we can do that. There are two ways to compute it: the direct normal-equations form or `cp.linalg.pinv`. Directly computing it via

```python
coeff_inv = cp.dot(cp.linalg.inv(cp.dot(src_stain_coeff.T, src_stain_coeff)), src_stain_coeff.T)
```

is faster than calling `cp.linalg.pinv`.
I think the only reason we might potentially want to keep the version with an additional column would be if we wanted to not discard that channel and use it to visualize what ended up NOT in the H or E channel.
I will go ahead and remove that for now and just use this pseudo-inverse.
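A NumPy sketch of this normal-equations pseudo-inverse (NumPy as a stand-in for CuPy; the matrix here is a random stand-in for the stain coefficients):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))  # tall matrix, like a (3, 2) stain matrix

# Normal-equations form of the left pseudo-inverse: (A^T A)^{-1} A^T
pinv_direct = np.linalg.inv(A.T @ A) @ A.T

# For a well-conditioned tall matrix this matches the SVD-based pinv,
# and acts as a left inverse: pinv_direct @ A == I.
assert np.allclose(pinv_direct, np.linalg.pinv(A))
assert np.allclose(pinv_direct @ A, np.eye(2))
```

The SVD-based `pinv` is more robust for ill-conditioned matrices, but the normal-equations form avoids the SVD and is cheaper for small, well-conditioned stain matrices.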
Isn't this pinv-based solution just "least squares"? Perhaps we should just remove the `method` argument for now and always use the pseudo-inverse code path.
I was thinking `method` could be useful later if we were to add additional methods like LASSO or a non-negative matrix factorization that would ensure non-negative concentrations.
I like that you added different methods, but if the gain is not that much, for now we can keep only one of the methods and add the argument if needed in the future.
Either way is fine, but I think it depends on …
Co-authored-by: Behrooz Hashemian <3968947+drbeh@users.noreply.github.com>
A threshold of 0.15 for a log10 scale is equivalent to a threshold of ~0.345 on a natural log scale.
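The equivalence is just the change-of-base factor ln(10); a quick check (plain Python, `math` only):

```python
import math

# Absorbance thresholds on log10 vs natural-log scales differ by ln(10).
t_log10 = 0.15
t_ln = t_log10 * math.log(10)  # ~0.345
```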
Co-authored-by: Behrooz Hashemian <3968947+drbeh@users.noreply.github.com>
rename channel_shape -> spatial_shape for clarity
Thank you so much @grlee77! This implementation has been a long time coming.
Neha (author of the original algorithm implementation, @nsrivathsa) would love this work :)
And thank you @drbeh for the great feedback!
```python
__all__ = [
    "color_jitter",
    "rand_color_jitter"
    "absorbance_to_image",
    "image_to_absorbance",
    'stain_extraction_macenko',
```
Nit: the quotation marks are not consistent. You may want to use " instead of ' in lines 11 and 12. Or please stick to one style and apply the same rule to the other cases.
Co-authored-by: Gigon Bae <gigony@gmail.com>
…tain_normalizer_v2
closes #96
This PR resumes work that was started in #186. Given the large overall refactoring, it was not feasible to make the suggestions there as individual comments.
Overall, the approach is the same Macenko method that was proposed in #186.
I spent quite a bit of time refactoring for performance and separating out some aspects so that it will be easier to add additional related methods in the future. I find about a 3x improvement for the case here vs. the one in #186.
A summary of the changes relative to #186:
Enhancements

- New `channel_axis` argument that can be used to specify which axis of the input array corresponds to color channels.

General refactoring

- Added a function-based interface in the typical `cucim.skimage` style. The existing class-based interface was kept as well. There is a small amount of redundancy in providing both, so we should decide if this is worth it.
- Various steps were split out into separate functions.

Performance Related Changes

- Use `cupy.fuse` to fuse the multiple kernel operations needed for absorbance calculations into a single GPU kernel. This gives ~4x improvement in conversions to/from absorbance space.
- New `image_type` argument that defaults to 'intensity', but can be set to 'absorbance' to indicate that the image is already in absorbance space. This is used to avoid redundant conversions during stain normalization.
- New `_covariance` function that is a simplified and optimized version of `cupy.cov`. It runs 4x faster for me for the float32 test case I tried on a roughly 2000x2000 image.
- Faster solve for the stain concentrations: the `_complement_stain_matrix` helper adds a third column that is orthogonal to the two estimated stain vectors, so that a standard matrix inverse can be used. This is much faster in practice than calling `cupy.linalg.lstsq` and gives identical results for almost all voxels in test images. A tiny fraction of voxels differed in uint8 intensity by a magnitude of 1, but this is likely just due to differences in the rounding of finite-precision floating point values. This approach is based on the one used by the HistomicsTK software (Apache 2.0 licensed).

Test Changes

- Use `pytest.mark.parametrize` instead of adding a dependency on `parameterized`.
- Some expected results changed by 1 due to the switch from floor to round when converting from `float32` back to `uint8` during color normalization.
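The floor-vs-round difference can shift a converted value by one intensity level; a minimal NumPy sketch (illustrative values, not taken from the actual tests):

```python
import numpy as np

# Float values just below/above an integer boundary, as can occur after
# color normalization, before conversion back to uint8.
vals = np.array([127.6, 63.2], dtype=np.float32)

floored = np.floor(vals).astype(np.uint8)  # truncates toward zero here
rounded = np.round(vals).astype(np.uint8)  # rounds to nearest integer

# 127.6 floors to 127 but rounds to 128: an off-by-one in the expected
# uint8 result, matching the test changes described above.
```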