Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stain Normalizer #186

Closed
wants to merge 2 commits into from
Closed

Conversation

drbeh
Copy link
Member

@drbeh drbeh commented Dec 18, 2021

Fix #96

This PR adds an stain extractor and normalizer for Hematoxylin and Eosin implemented in CuPy

@GPUtester
Copy link
Contributor

Can one of the admins verify this patch?

@quasiben
Copy link
Member

add to allowlist

@quasiben
Copy link
Member

Thanks for adding this @drbeh . However, I'm trying to find if there is an equivalent of this in scikit-image, do you know if there is ?

@drbeh
Copy link
Member Author

drbeh commented Dec 19, 2021

Hi @quasiben, I am not aware of an equivalent to this stain normalizer. The most relevant thin on scikit-image that I know is separate_stain, which is related to stain extractor part of this PR but as far as I know it is based on different methods.

It might be useful to see how we can connect these two and maybe leverage skimage for improving our stain normalizer.

@grlee77
Copy link
Contributor

grlee77 commented Mar 2, 2022

There has been a recent, related discussion on just having a normalization to Hematoxylin & Eosin in scikit-image at:
https://github.com/scikit-image/scikit-image/discussions/6065#discussioncomment-2278816

@drbeh
Copy link
Member Author

drbeh commented Mar 24, 2022

@gigony @quasiben taking a closer look at scikit-image, I realized that although the functionality of this PR is very related to stain separation in skimage, it is orthogonal/complementary to that.

  • In skimage, @grlee77 please correct me if I'm wrong, the current functionality is based on a given a Stain Matrix (hed_from_rgb), it separates the stains in an RGB image using skimage.color.rgb2hed.
  • However, here we first extract the Stain Matrix from the image (HEStainExtractor), and then use this matrix to normalize the RGB image (StainNormalizer).

@grlee77
Copy link
Contributor

grlee77 commented Mar 24, 2022

Yes, in scikit-image there is a fixed matrix that gets applied to do the stain normalization. The user could also apply their own matrix or use one of the other predefined ones with skimage.color.separate_stains/skimage.color.combine_stains. My understanding is that the same values are not ideal across different labs/experiments and need to be adjusted to some extent? (my research background is in medical imaging / radiology as opposed to microscopy)

@grlee77
Copy link
Contributor

grlee77 commented Mar 24, 2022

Ah, I see there is a reference to a Macenko et. al. paper in the docstrings. I will take a look at that to understand the method proposed here a bit better.

Comment on lines +72 to +73
absorbance = absorbance[cp.all(absorbance > self.beta, axis=1)]
if len(absorbance) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
absorbance = absorbance[cp.all(absorbance > self.beta, axis=1)]
if len(absorbance) == 0:
absorbance = absorbance[:, cp.all(absorbance > self.beta, axis=0)]
if absorbance.size == 0:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the prior line, it looks like this method expects an image with channels along the first axis.

I have been refactoring things while testing locally. I think we can support specification of a channel_axis argument to make it clear which axis of the input is expected to be channels.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear from the Marcenko et. al. publication whether this should ideally be all(absorbance > self.beta) as here or if some other operation like the mean or max absorbance would be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that everything that they have used for this filtering is based on heuristics. Even the proposed value for beta doesn't have any scientific justification.

"""
# calculate absorbance
image = image.astype(cp.float32, copy=False) + 1.0
absorbance = -cp.log(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marcenko et. al. uses log10 rather than log in the publication, but existing open source implementations I found all use the natural log. I guess either is valid and it only effects the absolute scale of the beta threshold. Empirically, computation of log is better optimized than log10, so we should probably keep this as is.

Copy link
Member Author

@drbeh drbeh May 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! It only affects beta and as long as we are using the correct inverse (exp for log and 10** for log10) both are fine.

@grlee77
Copy link
Contributor

grlee77 commented May 9, 2022

I agree that this Marcenko method is highly cited an seems like a good starting point. I see that in determining the stain vectors, least squares is used here, which is readily available in CuPy.

Other, related techniques use different solutions, but those can potentially be added later.

For example, there is a technique based on non-negative matrix factorization here:
https://github.com/InsightSoftwareConsortium/ITKColorNormalization

The Marcenko method is also implemented (under an MIT license) here: https://github.com/Peter554/StainTools
It differs from the one in this PR in that the normalization uses LASSO (via the SPAMS library) instead of least squares. I don't think we already have an equivalent solver on the GPU (at least it is not in CuPy or cuCIM, but I haven't fully checked other RAPIDS projects)

I will update this PR with specific suggestions later today after testing a bit more with some actual images (such as the ones used for the demo in the ITKColorNormalization repo).

@grlee77 grlee77 mentioned this pull request May 11, 2022
@drbeh
Copy link
Member Author

drbeh commented May 12, 2022

Closing this in favor of #273

@drbeh drbeh closed this May 12, 2022
ajschmidt8 pushed a commit that referenced this pull request Jun 6, 2022
closes #96

This PR resumes work that was started in #186. Given the large overall refactoring, it was not feasible to make the suggestions there as individual comments.

Overall the approach is the same Macenko method that was was proposed in #186. 

I spent quite a bit of time refactoring for performance and to separate out some aspects so that it will be easier to add additional related methods in the future. I find about 3x improvement for the case here vs. the one in #186.

A summary of the changes relative to #186 are:

### Enhancements
- added a `channel_axis` argument that can be used to specify which axis of the
  input array corresponds to color channels.
- more descriptive docstrings

### General refactoring
- Now also provides a function-based interface as well to more closely match
  the typical `cucim.skimage` style. The existing class-based interface was
  kept as well. There is a small amount of redundancy in providing both, so we
  should decide if this is worth it.
- Aspects like conversion to/from absorbance units was separated out into
  separate functions.

### Performance Related Changes

- Used `cupy.fuse` to fuse multiple kerenel operations needed for absorbance
  calculations into a single GPU kernel. This gives ~4x improvement in
  conversions to/from absorbance space.
- Added an `image_type` argument that defaults to 'intensity', but can be set
  to 'absorbance' to indicate that the image is already in absorbance space.
  This is used to avoid redundant conversions during stain normalization.
- Added a `_covariance` function that is a simplified and optimized version of
  `cupy.cov`. It runs 4x faster for me for the float32 test case I tried on a
  roughly size 2000x2000 image.
- Added a second 'ortho' method aside from least-squares for estimating the raw
  concentrations. The `_complement_stain_matrix` helper adds a third column
  that is orthogonal to the two estimated stain vectors, so that a standard
  matrix inverse can be used. This is much faster in practice than calling
  `cupy.linalg.lstsq` and gives identical result for almost all voxels in
  test images. A tiny fraction of voxels differed in uint8 intensity by a
  magnitude of 1, but this is likely just due differences in the rounding
  result of finite precision floating point values. This approach is based on
  the one used by HistomicsTK software (Apache 2.0 licensed).

### Test Changes
- refactored to use `pytest.mark.parametrize` instead of adding a dependency on [parameterized](https://github.com/wolever/parameterized)
- There is an increase by 1 in some expected pixel values in the "expected" result for some test cases due to replacement of casting to integers (floor operation) with the use of rounding upon conversion of `float32` back to `uint8` during color normalization.

Authors:
   - Behrooz Hashemian (https://github.com/drbeh)
   - Gregory Lee (https://github.com/grlee77)

Approvers:
   - Gigon Bae (https://github.com/gigony)
   - https://github.com/jakirkham
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Add Stain Normalization operation (in Python)
5 participants