
Stain Normalizer (v2) #273

Merged · 28 commits · Jun 6, 2022
Conversation

grlee77 (Contributor) commented May 11, 2022

closes #96

This PR resumes work that was started in #186. Given the large overall refactoring, it was not feasible to address the suggestions from that PR as individual comments.

Overall, the approach is the same Macenko method that was proposed in #186.

I spent quite a bit of time refactoring for performance and separating out some aspects so that it will be easier to add related methods in the future. I see about a 3x improvement for the case here vs. the one in #186.

A summary of the changes relative to #186:

Enhancements

  • added a channel_axis argument that can be used to specify which axis of the
    input array corresponds to color channels.
  • more descriptive docstrings

General refactoring

  • Added a function-based interface to more closely match the typical
    cucim.skimage style. The existing class-based interface was kept as well;
    there is a small amount of redundancy in providing both, so we should
    decide whether it is worth it.
  • Aspects such as conversion to/from absorbance units were separated out
    into standalone functions.
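
As a rough CPU sketch of that separation (NumPy standing in for CuPy; the function names match the PR's `__all__`, but the `source_intensity` parameter name, default, and clipping constant here are assumptions, not the exact cuCIM signature):

```python
import numpy as np

def image_to_absorbance(image, source_intensity=255.0):
    """Convert intensity to absorbance: A = -log(I / I_0).

    Clipping avoids log(0) for fully dark pixels.
    """
    image = np.clip(image.astype(np.float32), 1e-6, source_intensity)
    return -np.log(image / source_intensity)

def absorbance_to_image(absorbance, source_intensity=255.0):
    """Invert the conversion above: I = I_0 * exp(-A)."""
    return source_intensity * np.exp(-absorbance)

img = np.array([10, 128, 255], dtype=np.uint8)
round_trip = absorbance_to_image(image_to_absorbance(img))
assert np.allclose(round_trip, img, atol=1e-2)
```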

Performance Related Changes

  • Used cupy.fuse to fuse the multiple kernel operations needed for
    absorbance calculations into a single GPU kernel. This gives ~4x
    improvement in conversions to/from absorbance space.
  • Added an image_type argument that defaults to 'intensity', but can be set
    to 'absorbance' to indicate that the image is already in absorbance space.
    This is used to avoid redundant conversions during stain normalization.
  • Added a _covariance function that is a simplified and optimized version of
    cupy.cov. It runs 4x faster for me for the float32 test case I tried on a
    roughly 2000x2000 image.
  • Added a second 'ortho' method alongside least-squares for estimating the
    raw concentrations. The _complement_stain_matrix helper adds a third
    column that is orthogonal to the two estimated stain vectors, so that a
    standard matrix inverse can be used. This is much faster in practice than
    calling cupy.linalg.lstsq and gives identical results for almost all
    voxels in test images. A tiny fraction of voxels differed in uint8
    intensity by a magnitude of 1, but this is likely just due to differences
    in the rounding of finite-precision floating point values. This approach
    is based on the one used by the HistomicsTK software (Apache 2.0
    licensed).
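
A CPU sketch of the 'ortho' idea (NumPy standing in for CuPy; `complement_stain_matrix` here is a hypothetical stand-in for the PR's `_complement_stain_matrix`, and the random stain vectors are illustrative):

```python
import numpy as np

def complement_stain_matrix(stains):
    # Append a third column orthogonal to both stain vectors so the 3x3
    # matrix can be inverted directly instead of solving least squares.
    third = np.cross(stains[:, 0], stains[:, 1])
    third /= np.linalg.norm(third)
    return np.column_stack([stains, third])

rng = np.random.default_rng(0)
stains = rng.random((3, 2))                  # columns: the two stain vectors
stains /= np.linalg.norm(stains, axis=0)
absorbance = rng.random((3, 100))            # shape (channels, n_pixels)

# 'ortho' path: square inverse, then keep the first two concentration rows.
conc_ortho = np.linalg.inv(complement_stain_matrix(stains)) @ absorbance
# Least-squares reference path.
conc_lstsq, *_ = np.linalg.lstsq(stains, absorbance, rcond=None)
assert np.allclose(conc_ortho[:2], conc_lstsq)
```

Because the appended column is orthogonal to both stain vectors, the first two rows of the inverse reproduce the least-squares solution exactly (up to floating-point rounding), which matches the near-identical results reported above.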

Test Changes

  • refactored to use pytest.mark.parametrize instead of adding a dependency on parameterized
  • Some expected pixel values in the tests increased by 1 because casting to
    integers (a floor operation) was replaced with rounding when converting
    float32 back to uint8 during color normalization.
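
The off-by-one shift can be reproduced on the CPU (NumPy; the sample values are illustrative):

```python
import numpy as np

vals = np.array([9.7, 10.2, 254.6], dtype=np.float32)
floored = vals.astype(np.uint8)            # old behavior: truncates toward 0
rounded = np.round(vals).astype(np.uint8)  # new behavior: rounds to nearest
print(floored)  # [  9  10 254]
print(rounded)  # [ 10  10 255]
```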

drbeh and others added 4 commits December 18, 2021 01:23
add channel_axis kwarg to allow specifying which axis of the image corresponds to channels.

Many performance-related updates.
This avoids the need to add a dependency on the parameterized package.

Note that some expected results have been incremented by 1 due to a change from
use of floor->round during float32->uint8 conversion
@grlee77 grlee77 requested a review from a team as a code owner May 11, 2022 20:08
@grlee77 grlee77 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels May 11, 2022
@grlee77 grlee77 added this to the v22.06.00 milestone May 11, 2022
grlee77 (Contributor, Author) commented May 11, 2022

@drbeh, please see this refactored version of the stain normalization proposed previously. If there is no desire for a function-based interface we can remove the normalize_colors_macenko method and just keep the StainNormalizer class.

drbeh (Member) commented May 12, 2022

Thank you very much @grlee77 for this upgraded version. I'll review it today.

@drbeh drbeh mentioned this pull request May 12, 2022
grlee77 (Contributor, Author) commented May 12, 2022

Also, @thewtex pointed me to this method that was implemented in ITK:
https://github.com/InsightSoftwareConsortium/ITKColorNormalization

I can try to take a closer look at it at some point to estimate how easy it is to adapt to the GPU.

Comment on lines +372 to +377
# flip to ensure positive first coordinate so arctan2 angles are about 0
if ev[0, 0] < 0:
ev[:, 0] *= -1
if ev[0, 1] < 0:
ev[:, 1] *= -1
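
A minimal CPU illustration of this sign convention (NumPy standing in for CuPy; the covariance matrix is made up):

```python
import numpy as np

cov = np.array([[2.0, 0.5, 0.1],
                [0.5, 1.0, 0.2],
                [0.1, 0.2, 0.5]])
_, ev = np.linalg.eigh(cov)   # columns of ev are eigenvectors
for j in (0, 1):              # the two columns used downstream
    if ev[0, j] < 0:
        ev[:, j] *= -1        # sign is arbitrary; fix it so arctan2 angles
                              # cluster around 0
assert ev[0, 0] >= 0 and ev[0, 1] >= 0
```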

)

# channels_axis=0 for the shape (3, n_pixels) absorbance matrix
src_stain_coeff = stain_decomposition_macenko(
grlee77 (Contributor, Author):

We should perhaps change "decomposition" to "extraction"; I'm not sure why I named it this way.

drbeh (Member):

Both are fine to me, since you need to decompose stains in order to extract them.
However, speaking of naming, do you think we can give this method a more representative name? Rather than the family name of the paper's first author, maybe we can use the name of the underlying method, since it is not a new method per se; the paper just showed how to apply it in histopathology.

RuntimeWarning, stacklevel=2)
fact = 0.0

X -= X.mean(axis=1, keepdims=True)
grlee77 (Contributor, Author) commented May 12, 2022:

In this application, X here is always shape (3, n_pixels).

An example justifying enforcing C-contiguous order above: reduction along the last axis of an array of shape (3, 10_000_000) is much faster when that axis is contiguous in memory. It also benefits from enabling CUB in the environment via CUPY_ACCELERATORS="cub":

dtype    order   duration (s)
float64  F       0.15337538
float32  F       0.14860819
float64  C       0.00744207  (CUB disabled)
float32  C       0.00718318  (CUB disabled)
float64  C       0.00226073  (CUB enabled)
float32  C       0.00080919  (CUB enabled)
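
The same reduction can be sanity-checked on the CPU (NumPy standing in for CuPy): the numerical result is independent of memory order, only the traversal speed differs, which is why forcing C order is safe.

```python
import numpy as np

x_c = np.ascontiguousarray(np.random.default_rng(0).random((3, 10_000)))
x_f = np.asfortranarray(x_c)  # same values, column-major layout
assert x_c.flags.c_contiguous and x_f.flags.f_contiguous

# The mean over the pixel axis, as computed before the covariance product.
assert np.allclose(x_c.mean(axis=1), x_f.mean(axis=1))
```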

These two pathology tools were used as a reference when developing the
Macenko color normalization algorithm.
drbeh (Member) left a comment:

@grlee77 thank you very much for refactoring this stain normalizer. Overall it looks great but I left some comments in the code.

# slower for float64, which seems odd. Should further validate on
# additional hardware.
X = cp.asfortranarray(X)
out = X.dot(X.T.conj())
drbeh (Member):

In this use case we shouldn't have any complex numbers, so we should be able to remove .conj.

If we remove rowvar and treat it as True, then we can save the transpose on line 221 (with additional consideration of the following lines) and compute this as X.T.dot(X).

grlee77 (Contributor, Author):

I think in practice .conj() is near-instantaneous for real-valued inputs. Calling .T is similarly cheap, as it doesn't make a copy but just modifies the strides. Example:

import cupy as cp
a = cp.ones((3, 1000000))
d = cp.cuda.Device()
%timeit a.T.conj(); d.synchronize()

gives:
1.21 µs ± 4.21 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

whereas just making a copy of a takes ~125 µs
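
The "no copy" claim for .T is easy to verify on the CPU (NumPy standing in for CuPy; CuPy's transpose is likewise a stride-only view):

```python
import numpy as np

a = np.ones((3, 1_000_000))
t = a.T                       # stride trick: no data movement
assert t.base is a            # t is a view onto a's buffer
assert np.shares_memory(a, t)
```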

# additional hardware.
X = cp.asfortranarray(X)
out = X.dot(X.T.conj())
out *= 1 / cp.float64(fact)
drbeh (Member):

Is there any reason to separate it onto a new line?

grlee77 (Contributor, Author):

It was just to force the multiplication to be done in place; otherwise, I think another temporary array is created.

drbeh (Member):

Right, but you still have a division either way, so there isn't any gain over:

 out = X.dot(X.T.conj()) / cp.float64(fact)

grlee77 (Contributor, Author):

Okay, I will do the division on the host, so there is just one in-place multiplication:

    out *= 1 / float(fact)
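
A CPU check (NumPy standing in for CuPy; shapes and the `fact` definition are illustrative) that the host-side reciprocal plus one in-place multiply matches the plain division:

```python
import numpy as np

X = np.random.default_rng(0).random((3, 1000))
fact = X.shape[1] - 1
out = X @ X.T.conj()
expected = out / np.float64(fact)   # allocates a temporary for the quotient
out *= 1.0 / float(fact)            # host-side reciprocal, one in-place scale
assert np.allclose(out, expected)
```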

Comment on lines 4 to 11
__all__ = [
"color_jitter",
"rand_color_jitter"
"absorbance_to_image",
"image_to_absorbance",
"StainNormalizer",
"HEStainExtractor",
]
drbeh (Member):

Don't we want to add functional interfaces here? stain_decomposition_macenko and normalize_color_macenko

grlee77 (Contributor, Author):

Yes, and I still need to change the tests to use those as well.

grlee77 (Contributor, Author):

Okay, I updated this now and removed the class-based implementation. Does that seem fine? (I think we should choose only one or the other rather than providing both.)

Comment on lines 422 to 425
# This approach relies on a square stain coeffs matrix as used by
# HistomicsTK. In practice, it gives nearly identical results to the
# least-squares approach.
coeff_inv = cp.linalg.inv(src_stain_coeff)
drbeh (Member):

Instead of creating the extra column for stain coefficients, can't we use the pseudo-inverse (for rectangular matrices)? (X^T X)^{-1} X^T

grlee77 (Contributor, Author) commented May 13, 2022:

Yes, we can do that. Computing it directly via

coeff_inv = cp.dot(cp.linalg.inv(cp.dot(src_stain_coeff.T, src_stain_coeff)), src_stain_coeff.T)

is faster than calling cp.linalg.pinv.

I think the only reason we might want the version with the additional column would be if we wanted to keep that channel and use it to visualize what ended up NOT in the H or E channels.

I will go ahead and remove that for now and just use this pseudo-inverse.
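
A CPU check (NumPy standing in for CuPy; `S` is a stand-in for src_stain_coeff) that the normal-equations form agrees with the library pseudo-inverse for a full-column-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.random((3, 2))  # stand-in stain coefficient matrix, full column rank
normal_eq_inv = np.linalg.inv(S.T @ S) @ S.T
assert np.allclose(normal_eq_inv, np.linalg.pinv(S))
```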

grlee77 (Contributor, Author):

Isn't this pinv-based solution just "least squares"? Perhaps we should just remove the method argument for now and always use the pseudo-inverse code path.

I was thinking method could be useful later if we were to add additional methods like LASSO or a non-negative matrix factorization that would ensure non-negative concentrations.

drbeh (Member):

I like that you added different methods, but if the gain is not that large for now, we can keep only one method and add the argument back if needed in the future.

drbeh (Member) commented May 12, 2022

> @drbeh, please see this refactored version of the stain normalization proposed previously. If there is no desire for a function-based interface we can remove the normalize_colors_macenko method and just keep the StainNormalizer class.

Either way is fine but I think it depends on cucim's approach for transforms. The previous ones had functional interfaces so it makes sense to keep only functional interfaces here.

gigony (Contributor) left a comment:

Thank you so much @grlee77! This implementation has been in the works for a long time.
Neha (author of the original algorithm implementation, @nsrivathsa) would love this work :)

And thank you @drbeh for the great feedback!


__all__ = [
"color_jitter",
"rand_color_jitter"
"absorbance_to_image",
"image_to_absorbance",
'stain_extraction_macenko',
gigony (Contributor):

Nit: the quotation marks are not consistent. You may want to use " instead of ' on lines 11 and 12, or pick one style and apply it consistently elsewhere.

python/cucim/tests/unit/core/test_stain_normalizer.py (review thread resolved)
LICENSE-3rdparty.md (review thread resolved)
@jakirkham jakirkham self-requested a review June 1, 2022 22:36
Labels: improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)
Projects: none yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add Stain Normalization operation (in Python)
5 participants