Multivariate curve resolution #2172
Conversation
…= 1 for Varimax. Cleaned up MCR code. Added removal of NaNs from Poissonian normalization.
Some small changes
Is it worth documenting why you might use this over the other methods?
@AndrewHerzing and @jat255 are there any plans to get this to a point where it can be merged or should it be closed?
We'll be looking at this in the next few days to get this wrapped up, so please leave it open for now.
@jat255, any luck with this PR?
@jat255 / @AndrewHerzing as a result of #2383 being merged, there will be some conflicts now, as there were some significant changes. Happy to help you tidy them up - let me know.
- resolve a few conflicts in mva.py
- fix citation to pyMCR paper
Maybe @francisco-dlp has some comments, but my feeling is that this belongs as an algorithm in …. This is not the same as ….
@tjof2
@CCampJr yes, I completely agree with your comments on the method in general; it's in reference to the implementation here, per the doc in this PR.

Hence asking for comments from @francisco-dlp, given the prior discussion in #1159. Rather than forcing the user to call …, and then say that ….
I see. Thank you very much for the clarification (sorry: I'm the pyMCR author and have not worked with hyperspy).
```
loadings = W
factors = H.T

elif is_sklearn_like:
```
Something went wrong with the merge, @jat255: if this has been deleted, this looks very wrong.
Oops, looks like you're right. This was a complicated merge, so I'm not surprised something went south. I'm working on getting the tests running again, which would have revealed this issue.
@tjof2, I agree with you in that MCR, as implemented here, is not that easy to classify. In my mind the issue is that this is not actually MCR but a particular workflow that involves MCR. If I understood this PR well, the method involves:

1. An SVD decomposition
2. A Varimax rotation of the loadings (or factors)
3. A refinement of the rotated factors with pyMCR under a non-negativity constraint
If that's the case, I think that this PR could be greatly simplified: steps 1 and 2 are already available in HyperSpy. What's not available is pyMCR. So why not simply make pyMCR work with HyperSpy, so that users can follow this workflow and many other alternatives, rather than creating new methods to embed steps 2 and 3 into a single one?

If that sounds sensible, then the issue of where to put step 3 (pyMCR) remains. Confusingly, pyMCR doesn't actually decompose the matrix itself, but rather "refines" a prior decomposition using constraints. That sounds like a BSS method, but unlike BSS methods, it does not estimate a mixing matrix and, therefore, it doesn't separate sources from a mixture, since the output components are not a linear combination of the input components. In our framework, I think it fits best in ….

So then, how to best integrate pyMCR in HyperSpy? pyMCR provides great flexibility and, therefore, fully wrapping it in HyperSpy would be inconvenient. Happily, in #2383 @tjof2 added support for external decomposition algorithms that are "sklearn-like". pyMCR is just partly "sklearn-like": it does implement a …. What do you all think?
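For readers unfamiliar with what "sklearn-like" means here: roughly, an object exposing `fit`/`fit_transform` and a learned `components_` attribute, following the scikit-learn estimator convention. A minimal sketch (the class name and the use of SVD inside are illustrative only, not part of HyperSpy or pyMCR):

```python
import numpy as np

class ToyDecomposer:
    """Minimal "sklearn-like" decomposition estimator.

    fit(X) learns self.components_ (n_components x n_features) and
    fit_transform(X) returns the corresponding loadings. Any object
    exposing this interface can, in principle, be passed to a
    framework that accepts external "sklearn-like" algorithms.
    """

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        # Truncated SVD stands in for a real algorithm here.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def transform(self, X):
        # Project the data onto the learned components.
        return X @ self.components_.T

    def fit_transform(self, X):
        return self.fit(X).transform(X)
```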
@francisco-dlp thanks for your input. While you are correct that steps 1 and 2 can already be done using HyperSpy's tools, I think even with good documentation, a lot of users are not going to realize this is necessary (or will be confused by it), and will just see the MCR routine giving poor results. I think in general, we should strive to make these tools easy to use (since the machine learning capabilities are a lot of what brings in new users), but provide the flexibility to be powerful where needed. I think in this case providing sensible defaults for the MCR operation makes sense from that viewpoint.

I don't want to speak for @AndrewHerzing, but I think the origin of this PR was the fact that many NIST staff use a non-openly-licensed software named AXSIA, which was one of the first developments in this area for blind separation of EDS spectrum images. It is used frequently at NIST by the microanalysis researchers due to its ease of use, even by those who are not generally comfortable with programming. Implementing similar features within HyperSpy makes it easier to bring those researchers onboard, since we can point to the pyMCR implementation and say "look, it can do what your non-free Matlab program does".
Could you not do the SVD automatically then, to avoid calling …? E.g. say you (a) require …. Then you call it once, just like ….
@tjof2 I think that approach sounds reasonable to me. @AndrewHerzing any thoughts?
That seems like a reasonable approach to me. I only included the extra step originally as I was following the flow of how … works.

Also, I wanted to mention that MCR-based decomposition is used very widely in microanalysis, and is by no means limited to NIST. I think this could see quite a bit of use if people are made aware that the capability exists.
Yes, very true! Just speaking from personal experience from my time in the division :)
I fully agree that MCR is popular and useful. The whole point of HyperSpy is indeed making algorithms such as MCR more accessible. My concern is that this implementation of the feature is too constraining and too complex. Here there is a proof-of-concept of what I propose above. With those small changes to pyMCR it is possible to reproduce what's implemented here without modifying HyperSpy:

```python
from pymcr.mcr import McrAR

s.decomposition(True)
s.blind_source_separation(3, algorithm="orthomax")
s.decomposition(algorithm=McrAR(ST=s.get_bss_factors()))
s.plot_decomposition_results()
```

Of course it is a bit more verbose, but also a lot more flexible. In particular, in this way the user can choose the …. There is a small difference: the MCR decomposition cannot be performed with ….
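For context on what `McrAR` contributes in this workflow: at its core it performs alternating least squares with constraints. A numpy-only sketch of the plain MCR-ALS iteration (my own simplification for illustration — pyMCR generalizes this with pluggable regressors and constraint objects):

```python
import numpy as np

def mcr_als(D, ST, n_iter=50):
    """Plain MCR-ALS: alternately solve D ≈ C @ ST for the
    concentrations C and the spectra ST, clipping negative values to
    zero after each least-squares update (the non-negativity
    constraint)."""
    for _ in range(n_iter):
        C = np.clip(D @ np.linalg.pinv(ST), 0, None)
        ST = np.clip(np.linalg.pinv(C) @ D, 0, None)
    return C, ST
```

Here `D` would be the flattened data matrix (pixels × channels) and `ST` the initial spectral estimates — in the workflow above, the rotated factors.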
This seems like a very clean and straightforward implementation. I was unaware of the ….
I agree, I think this would be the cleanest way to do it, especially for limiting dependencies and maintainability going forward. We'll work with @CCampJr to see if we can get these changes made in a way that benefits HS and doesn't impact his existing userbase.
Great, I will open a pull request with the changes then so that we can discuss them.
Pretty sure this can be closed in favour of usnistgov/pyMCR#33? Which would allow:

```python
from pymcr.mcr import McrAR

s.decomposition(True)
s.blind_source_separation(3, algorithm="orthomax")
s.decomposition(algorithm=McrAR(fit_kwargs={'ST': s.get_bss_factors()}))
s.plot_decomposition_results()
```
Description of the change
This PR adds the capability to refine the results of an SVD decomposition via multivariate curve resolution plus a non-negativity constraint. This feature relies on an external package called pyMCR, currently available via pip (documentation here: https://pages.nist.gov/pyMCR/). Similar to `nmf` and `blind_source_separation`, this algorithm is applied after the primary SVD decomposition has been performed and the proper number of components to fit has been determined. This latter value is passed via the `output_dimension` keyword. The only other option is given via the `simplicity` keyword, which must be either `spatial` or `spectral`. If `spatial` (default) is selected, a Varimax rotation is calculated on the SVD loadings. This rotation is applied to the factors, which are then either left as-is or flipped to make them predominantly positive, and then negative values are truncated. Finally, the rotated factors are fit to the input data using the pyMCR fitter. If `spectral` is chosen instead, the Varimax rotation is calculated on the factors themselves, which are then processed the same way as in the spatial case. A similar approach was published in the following paper: M. R. Keenan, "Exploiting Spatial Domain Simplicity in Spectral Image Analysis", https://doi.org/10.1002/sia.2949
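The rotation, flip, and truncation steps described above can be sketched as follows. This is my own illustration, not the code from this PR: the Varimax routine is the standard SVD-based algorithm, and "predominantly positive" is implemented here simply as the sign of the column sum, which is one heuristic among several possible ones.

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Varimax rotation: find an orthogonal matrix R maximizing the
    variance of the squared entries of A @ R (standard SVD-based
    iteration)."""
    n, k = A.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # SVD of the criterion gradient, mapped back through A.T
        U, s, Vt = np.linalg.svd(A.T @ (L ** 3 - L * np.mean(L ** 2, axis=0)))
        R = U @ Vt
        if s.sum() < var_old * (1 + tol):
            break
        var_old = s.sum()
    return A @ R, R

def flip_and_truncate(factors):
    """Flip each factor so it is predominantly positive (sign of the
    column sum), then truncate the remaining negative values."""
    signs = np.sign(factors.sum(axis=0))
    signs[signs == 0] = 1.0
    return np.clip(factors * signs, 0.0, None)
```

In the `spatial` case the rotation matrix would be computed from the loadings and applied to the factors; in the `spectral` case it is computed on the factors directly.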
Progress of the PR
Minimal example of the bug fix or the new feature