
Multivariate curve resolution #2172

Closed

wants to merge 59 commits into from

Conversation

AndrewHerzing

@AndrewHerzing AndrewHerzing commented May 16, 2019

Description of the change

This PR adds the capability to refine the results of an SVD decomposition via multivariate curve resolution (MCR) with a non-negativity constraint. The feature relies on an external package called pyMCR, currently available via pip (documentation here: https://pages.nist.gov/pyMCR/).

Similar to nmf and blind_source_separation, this algorithm is applied after the primary SVD decomposition has been performed and the proper number of components to fit has been determined; that number is passed via the output_dimension keyword. The only other option is the simplicity keyword, which must be either 'spatial' or 'spectral'. If 'spatial' (the default) is selected, a Varimax rotation is calculated from the SVD loadings. This rotation is applied to the factors, each of which is then either left as-is or flipped to make it predominantly positive, after which negative values are truncated. Finally, the rotated factors are fit to the input data using the pyMCR fitter. If 'spectral' is chosen instead, the Varimax rotation is calculated from the factors themselves, which are then processed in the same way as in the spatial case. A similar approach was published in the following paper:

M. R. Keenan, Exploiting Spatial Domain Simplicity in Spectral Image Analysis https://doi.org/10.1002/sia.2949
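
For orientation, here is a rough sketch of the 'spatial' branch of this workflow. It assumes a standard Kaiser varimax implementation; the setup and variable names (data_matrix, loadings, factors) are illustrative, not HyperSpy internals:

import numpy as np
from pymcr.mcr import McrAR
from pymcr.constraints import ConstraintNonneg

def varimax(A, max_iter=100, tol=1e-6):
    # Standard Kaiser varimax rotation, computed via SVD iterations
    n, k = A.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        L = A @ R
        u, s, vt = np.linalg.svd(A.T @ (L ** 3 - L * (L ** 2).sum(axis=0) / n))
        R = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return R

# Toy data and a truncated SVD standing in for the primary decomposition
rng = np.random.default_rng(0)
data_matrix = rng.poisson(50, size=(1000, 200)).astype(float)  # (n_pixels, n_channels)
k = 5                                                          # output_dimension
u, s, vt = np.linalg.svd(data_matrix, full_matrices=False)
loadings = u[:, :k] * s[:k]   # (n_pixels, k)
factors = vt[:k].T            # (n_channels, k)

R = varimax(loadings)                        # simplicity='spatial': rotate the loadings
rot_factors = factors @ R                    # R is orthogonal, so the factors rotate by R too
rot_factors = rot_factors * np.sign(rot_factors.sum(axis=0))  # flip to predominantly positive
rot_factors = np.clip(rot_factors, 0, None)  # truncate negative values

# Refine the rotated factors against the original data with pyMCR's constrained ALS
mcr = McrAR(c_constraints=[ConstraintNonneg()], st_constraints=[ConstraintNonneg()])
mcr.fit(data_matrix, ST=rot_factors.T)
refined_factors, refined_loadings = mcr.ST_opt_.T, mcr.C_opt_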

Progress of the PR

  • New decomposition algorithm 'MCR' integrated
  • Update docstring
  • Update user guide
  • Add tests
  • Ready for review

Minimal example of the bug fix or the new feature

>>> import hyperspy.api as hs
>>> import numpy as np
>>> s = hs.signals.Signal1D(np.random.random((100, 100, 1000)))  # random data; the original np.arange(100, 100, 1000) yields an empty array
>>> s.decomposition(normalize_poissonian_noise=True)
>>> s.decomposition(algorithm='MCR', output_dimension=5, simplicity='spatial')

AndrewHerzing and others added 30 commits February 27, 2019 09:46
…= 1 for Varimax

Cleaned up MCR code

Added removal of NaN's from Poissonian normalization
@tjof2
Contributor

tjof2 commented Jul 30, 2019

Is it worth documenting why you might use this over the other methods?

@dnjohnstone
Contributor

@AndrewHerzing and @jat255 are there any plans to get this to a point where it can be merged or should it be closed?

@jat255
Member

jat255 commented Apr 9, 2020 via email

@tjof2 tjof2 mentioned this pull request Apr 12, 2020
@ericpre
Member

ericpre commented Apr 24, 2020

@jat255, any luck with this PR?

@tjof2
Contributor

tjof2 commented May 1, 2020

@jat255 / @AndrewHerzing: now that #2383 has been merged, there are some conflicts here, since it made significant changes to mva.py as well as to the documentation.

Happy to help you tidy them up - let me know.

- resolve a few conflicts in mva.py
- fix citation to pyMCR paper
@tjof2
Contributor

tjof2 commented May 6, 2020

Similar to nmf and blind_source_separation, this algorithm is applied after the primary SVD decomposition has been performed and the proper number of components to fit has been determined.

Maybe @francisco-dlp has some comments, but my feeling is that this belongs as an algorithm in s.blind_source_separation(algorithm=...) rather than s.decomposition(), since you have to do SVD first and then refine the result. This is similar to doing ICA or just a varimax rotation.

This is not the same as nmf, since with nmf you don't need to do an SVD first.

@CCampJr

CCampJr commented May 6, 2020

@tjof2
I don't know how this is being incorporated exactly into hyperspy, but

  • There is no necessity to do SVD first
  • NMF is actually the same as MCR but MCR does not necessitate non-negativity (so in that regard, NMF is a constrained MCR)
  • MCR is a matrix decomposition method: it generates a spectra matrix and a concentration matrix
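
(Concretely, in pyMCR's notation the model is D ≈ C · ST, where D is the (samples × channels) data matrix, C the concentration matrix, and ST the matrix of component spectra.)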

@tjof2
Contributor

tjof2 commented May 6, 2020

@CCampJr yes I completely agree with your comments on the method in general, it's in reference to the implementation here. From the doc in this PR

In MCR, the factored output of an SVD decomposition is used as the initial
input for an alternating least squares fit to the original data.

Hence asking for comments from @francisco-dlp given prior discussion in #1159

Rather than forcing the user to call s.decomposition() twice, why not do the SVD step under the hood?

And then say that output_dimension must be specified, which is already true for several other methods.

@CCampJr

CCampJr commented May 6, 2020

@CCampJr yes I completely agree with your comments on the method in general, it's in reference to the implementation here. From the doc in this PR

In MCR, the factored output of an SVD decomposition is used as the initial
input for an alternating least squares fit to the original data.

Hence asking for comments from @francisco-dlp given prior discussion in #1159

I see. Thank you very much for the clarification (sorry: I'm the pyMCR author and have not worked with hyperspy)

loadings = W
factors = H.T

elif is_sklearn_like:
@tjof2
Contributor

tjof2 commented May 6, 2020

Something went wrong with the merge, @jat255; if this has been deleted, that looks very wrong.

@jat255
Member

Oops, looks like you're right. This was a complicated merge, so I'm not surprised something went south. I'm working on getting the tests running again, which would have revealed this issue.

@francisco-dlp
Member

francisco-dlp commented May 7, 2020

@tjof2, I agree with you that MCR, as implemented here, is not that easy to classify. In my mind the issue is that this is not actually MCR, but a particular workflow that involves MCR. If I understand this PR correctly, the method involves:

  1. A decomposition step for dimensionality reduction purposes using e.g. SVD
  2. A blind source separation step using orthomax
  3. An MCR step using pyMCR

If that's the case, I think this PR could be greatly simplified: steps 1 and 2 are already available in HyperSpy. What's not available is pyMCR. So why not simply make pyMCR work with HyperSpy, so that users can follow this workflow and many other alternatives, rather than creating new methods that embed steps 2 and 3 into a single one?

If that sounds sensible, then the issue of where to put step 3 (pyMCR) remains. Confusingly, pyMCR doesn't actually decompose the matrix itself, but rather "refines" a prior decomposition using constraints. That sounds like a BSS method, but unlike BSS methods it does not estimate a mixing matrix and, therefore, does not separate sources from a mixture, since the output components are not a linear combination of the input components. In our framework, I think it fits best in decomposition, since it relies on a fit to s.data to refine the components. The input spectra or maps can be SVD "components", BSS components (from orthomax here, but other options are obviously possible) or something else altogether.

So then, how best to integrate pyMCR in HyperSpy? pyMCR provides great flexibility and, therefore, fully wrapping it in HyperSpy would be inconvenient. Happily, in #2383 @tjof2 added support for external decomposition algorithms that are "sklearn-like". pyMCR is only partly "sklearn-like": it implements a fit method, but it has neither a transform method nor a components_ attribute. If those could be implemented on the pyMCR side (@CCampJr, is that possible?), then pyMCR would be fully compatible with HyperSpy 1.6 and with any other package that expects the sklearn API.
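
To make the gap concrete, a thin adapter along these lines (a hypothetical sketch, not an actual pyMCR class) would supply the missing pieces:

import numpy as np
from pymcr.mcr import McrAR

class SklearnLikeMcrAR(McrAR):
    # Hypothetical adapter, not part of pyMCR: exposes the sklearn-style
    # fit/transform/components_ surface that HyperSpy's external-algorithm
    # support from #2383 expects.
    def fit(self, X, **kwargs):
        super().fit(X, **kwargs)
        self.components_ = self.ST_opt_  # (n_components, n_channels)
        return self

    def transform(self, X):
        # Least-squares projection of X (n_samples, n_channels) onto the
        # fitted components, returning loadings (n_samples, n_components)
        return np.linalg.lstsq(self.components_.T, X.T, rcond=None)[0].T

With something like that in place, s.decomposition(algorithm=SklearnLikeMcrAR(...)) would go through the same external-algorithm path as any sklearn estimator.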

What do you all think?

@jat255
Member

jat255 commented May 7, 2020

@francisco-dlp thanks for your input. While you are correct that steps 1 and 2 can already be done using HyperSpy's tools, I think that even with good documentation, a lot of users are not going to realize this is necessary (or will be confused by it), and will just see the MCR routine giving poor results. I think in general we should strive to make these tools easy to use (since the machine learning capabilities are a lot of what brings in new users), while providing the flexibility to be powerful where needed. From that viewpoint, I think providing sensible defaults for the MCR operation makes sense.

I don't want to speak for @AndrewHerzing, but I think the origin of this PR is the fact that many NIST staff use a non-openly-licensed piece of software named AXSIA, which was one of the first developments in this area for blind separation of EDS spectrum images. It is used frequently by the microanalysis researchers at NIST due to its ease of use, even by those who are not generally comfortable with programming. Implementing similar features within HyperSpy makes it easier to bring those researchers on board, since we can point to the pyMCR implementation and say "look, it can do what your non-free Matlab program does".

@tjof2
Contributor

tjof2 commented May 7, 2020

I think in this case providing sensible defaults for the MCR operation makes sense from that viewpoint.

Could you not do the SVD automatically then to avoid calling s.decomposition() twice?

e.g. say you (a) require output_dimension to be set, and (b) introduce an argument init=None vs init='svd', where selecting 'svd' performs the necessary factorization and then runs the varimax rotation and finally MCR. I think that's cleaner, provided you then document it; for example, the docs would say that SVD initialization is provided as an option for getting a good result.

Then you call it once just like nmf or any other method.
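
Under that proposal the user-facing call might look something like this (a hypothetical signature; init and its 'svd' behaviour are the new parts):

>>> s.decomposition(algorithm='MCR', output_dimension=5, init='svd', simplicity='spatial')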

@jat255
Member

jat255 commented May 7, 2020

@tjof2 I think that approach sounds reasonable to me. @AndrewHerzing any thoughts?

@AndrewHerzing
Author

That seems like a reasonable approach to me. I only included the extra step originally as I was following the flow of how blind_source_separation is used.

Also, I wanted to mention that MCR-based decomposition is used very widely in microanalysis, and is by no means limited to NIST. I think this could see quite a bit of use if people are made aware that the capability exists.

@jat255
Member

jat255 commented May 7, 2020

Also, I wanted to mention that MCR-based decomposition is used very widely in microanalysis, and is by no means limited to NIST. I think this could see quite a bit of use if people are made aware that the capability exists.

Yes, very true! Just speaking from personal experience from my time in the division :)

@francisco-dlp
Member

francisco-dlp commented May 8, 2020

I fully agree that MCR is popular and useful. The whole point of HyperSpy is indeed to make algorithms such as MCR more accessible. My concern is that this implementation of the feature is too constraining and too complex. Here is a proof-of-concept of what I propose above. With those small changes to pyMCR, it is possible to reproduce what's implemented here without modifying HyperSpy:

from pymcr.mcr import McrAR
s.decomposition(True)
s.blind_source_separation(3, algorithm="orthomax")
s.decomposition(algorithm=McrAR(ST=s.get_bss_factors()))
s.plot_decomposition_results()

Of course it is a bit more verbose, but also a lot more flexible. In particular, in this way the user can choose the ST and C signals at will, making it possible to use MCR to refine the output of e.g. ICA.
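
For instance, assuming the same modified pyMCR as in the snippet above, refining an ICA result would only change the BSS step (sklearn_fastica is HyperSpy's FastICA wrapper):

s.blind_source_separation(3, algorithm="sklearn_fastica")
s.decomposition(algorithm=McrAR(ST=s.get_bss_factors()))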

There is a small difference: the MCR decomposition cannot be performed with normalize_poissonian_noise=True, unlike in this PR. I don't think that's critical, because I bet the Poisson scaling is far less important in the refinement step (MCR). Also, a better way to deal with non-Gaussian noise would be to implement e.g. WLS in pyMCR.

@AndrewHerzing
Author

This seems like a very clean and straightforward implementation. I was unaware of the sklearn-like capability, as well as of the orthomax option for blind_source_separation. So, assuming @CCampJr could implement these changes in pyMCR, perhaps the best solution would be to add new sections to the Machine Learning chapter of the user guide explaining how to perform this type of operation. Does this sound reasonable?

@jat255
Member

jat255 commented May 11, 2020

I agree, I think this would be the cleanest way to do it, especially for limiting dependencies and maintainability going forward. We'll work with @CCampJr to see if we can get these changes made in a way that benefits HS and doesn't impact his existing userbase.

@francisco-dlp
Member

Great, I will open a pull request with the changes then, so that we can discuss them.

@tjof2
Contributor

tjof2 commented Sep 8, 2020

Pretty sure this can be closed in favour of usnistgov/pyMCR#33, which would allow:

from pymcr.mcr import McrAR
s.decomposition(True)
s.blind_source_separation(3, algorithm="orthomax")
s.decomposition(algorithm=McrAR(fit_kwargs={'ST': s.get_bss_factors()}))
s.plot_decomposition_results()
