
Improve decomposition documentation #1159

Closed
thomasaarholt opened this issue Jul 15, 2016 · 24 comments · Fixed by #2383
Comments

@thomasaarholt
Contributor

thomasaarholt commented Jul 15, 2016

Currently the documentation for decomposition is very disorganised. The docstring currently looks like the following. I think it would be useful to separate out a single general section, plus a section for each algorithm, so that it's clearer which arguments one should adjust to get a correct result.

Signature: s1.decomposition(
    normalize_poissonian_noise=False, 
    algorithm='svd', 
    output_dimension=None, 
    centre=None, 
    auto_transpose=True, 
    navigation_mask=None, 
    signal_mask=None, 
    var_array=None, 
    var_func=None, 
    polyfit=None, 
    reproject=None, 
    return_info=False, 
    **kwargs)
Docstring:
Decomposition with a choice of algorithms

The results are stored in self.learning_results

Parameters
----------
normalize_poissonian_noise : bool
    If True, scale the SI to normalize Poissonian noise
algorithm : 'svd' | 'fast_svd' | 'mlpca' | 'fast_mlpca' | 'nmf' |
    'sparse_pca' | 'mini_batch_sparse_pca' | 'RPCA_GoDec' | 'ORPCA'
output_dimension : None or int
    number of components to keep/calculate
centre : None | 'variables' | 'trials'
    If None, no centring is applied. If 'variables', the centring will be
    performed in the variable axis. If 'trials', the centring will be
    performed in the 'trials' axis. It only has effect when using the
    svd or fast_svd algorithms
auto_transpose : bool
    If True, automatically transposes the data to boost performance.
    Only has effect when using the svd or fast_svd algorithms.
navigation_mask : boolean numpy array
    The navigation locations marked as True are not used in the
    decomposition.
signal_mask : boolean numpy array
    The signal locations marked as True are not used in the
    decomposition.
var_array : numpy array
    Array of variance for the maximum likelihood PCA algorithm
var_func : function or numpy array
    If function, it will apply it to the dataset to obtain the
    var_array. Alternatively, it can be an array with the coefficients
    of a polynomial.
reproject : None | signal | navigation | both
    If not None, the results of the decomposition will be projected in
    the selected masked area.
return_info : bool, default False
    The result of the decomposition is stored internally. However, some
    algorithms generate extra information that is not stored. If True,
    return any extra information if available.

Returns
-------
(X, E) : (numpy array, numpy array)
    If 'algorithm' == 'RPCA_GoDec' or 'ORPCA' and 'return_info' is True,
    returns the low-rank (X) and sparse (E) matrices from robust PCA.

See also
--------
plot_decomposition_factors, plot_decomposition_loadings, plot_lev
File:      ~/Dropbox/0_Git/GitHub Desktop/hyperspy_aar/hyperspy/learn/mva.py
Type:      method
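For context, here is the kind of minimal usage example the docs could lead with (a sketch only; the dataset shape and values below are made up for illustration):

import numpy as np
import hyperspy.api as hs

# Hypothetical noisy spectrum image: 32 x 32 navigation positions, 1024-channel spectra
s = hs.signals.Signal1D(np.random.poisson(5, size=(32, 32, 1024)).astype(float))

# Run a basic SVD decomposition; the results are stored in s.learning_results
s.decomposition(normalize_poissonian_noise=True, algorithm='svd')

# Inspect the scree plot to decide how many components to keep
s.plot_explained_variance_ratio()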
@tjof2
Contributor

tjof2 commented Jul 15, 2016

+1

Since apart from algorithm='NMF' all the decomposition algorithms are PCA-based and essentially denoise / reduce the dimensionality with orthogonality constraints, might now be the time to separate out NMF, which instead imposes non-negativity constraints? After all, ICA has its own functionality.

This would help with the docs, as dimensionality reduction & denoising (PCA/RPCA) are cleanly separated from NMF and ICA, which are what you'd actually use for interpretation!

@francisco-dlp @dnjohnstone ?

@francisco-dlp
Member

Not sure about taking NMF out of decomposition as performing a decomposition is what it actually does.

@francisco-dlp francisco-dlp added this to the RELEASE_next_patch milestone Jul 15, 2016
@tjof2
Contributor

tjof2 commented Jul 15, 2016

@francisco-dlp so does ICA?

Or at least, the distinction needs to be clearer

  • PCA - orthogonality constraint
  • NMF - non-negativity constraint
  • ICA - statistical independence constraint

After all, you can also have Sparse Component Analysis (SCA) for sparsity constraints.

@francisco-dlp
Member

ICA estimates a mixing matrix to unmix the sources. Those sources in our case are often the result of dimensionality reduction through matrix decomposition methods. But I see your point, for some applications NMF could be seen as a blind source separation method and users may benefit from making it clearer that just NMF may be enough in some cases.

@tjof2
Contributor

tjof2 commented Jul 15, 2016

Yes - that sort of distinction sounds sensible.

@francisco-dlp francisco-dlp removed this from the RELEASE_next_patch milestone Jul 18, 2016
@dnjohnstone
Contributor

dnjohnstone commented Jul 25, 2016

Just to flag that this was discussed quite a bit in #969. A key point is that basically any decomposition approach in scikit-learn is available, and the documentation needs to at least direct people to the relevant pages there so that they know what the algorithms are.

Personally I'm not so keen on the isolation of ICA as something different from the others and would push to remove the method blind_source_separation() completely.

In scikit-learn, for example, ICA sits within decomposition, as sklearn.decomposition.FastICA, just like sklearn.decomposition.PCA --- to me this is its proper place.

In HyperSpy the separation comes from a practical point: typical workflows originally were "do PCA, then do ICA" (I know it's for good reasons regarding noise), but nevertheless, as far as I'm concerned they're both decomposition approaches of different flavours and we should be consistent with sklearn.

@tjof2
Contributor

tjof2 commented Aug 1, 2016

I agree with @dnjohnstone's point about sklearn.decomposition. NMF and ICA do (from one point of view, I accept...) just impose different constraints, and what if we wrap KernelPCA or introduce sparsity constraints ("SCA") in future? ICA then starts to look increasingly out of place in its own method.

@francisco-dlp
Member

francisco-dlp commented Aug 1, 2016

Let's see if we agree with the following:

  1. All the algorithms in decomposition perform matrix decomposition (exact or approximate).
  2. Matrix decomposition algorithms sometimes produce directly interpretable components (when the constraints that they impose lead to interpretable results). In this sense they can all be regarded as BSS algorithms too.
  3. Often, decomposition algorithms are useful for dimensionality reduction but introduce constraints that don't lead to interpretable results.
  4. The algorithms in blind source separation perform BSS by calculating a mixing matrix, assuming that the original sources have some property, e.g. independence. These algorithms can help unmix the result of matrix decomposition algorithms. However, they don't perform decomposition themselves, which is why they are not decomposition algorithms but "plain" BSS algorithms.
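A rough numpy sketch of the distinction drawn in points 1 and 4 above (illustrative only; the shapes and names are made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(10.0, size=(500, 200)).astype(float)   # trials x variables

# Matrix decomposition (points 1-3): approximate X by a low-rank product
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
loadings = U[:, :k] * sv[:k]       # trials x components
factors = Vt[:k, :]                # components x variables
X_denoised = loadings @ factors    # dimensionality reduction / denoising

# BSS (point 4): estimate a k x k unmixing matrix W acting on the reduced data
# under some assumption (e.g. independence); it does not factorize X itself.
# unmixed_factors = W @ factors, with W coming from e.g. FastICA.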

@dnjohnstone
Contributor

I think I see your point, but it seems most of the argument here is about which bit actually constitutes ICA... It seems to me that the general trend has been towards including "the stuff that you need to do to your data to get independent components out" under the ICA banner. Probably this is because, even if less formally clear, it's what most people want in practice.

The situation seems to be summed up pretty well in Hyvärinen's most recent review, where he says "Most ICA algorithms divide the estimation of the model into two steps: a preliminary whitening and the actual ICA estimation. Whitening means that the data are first linearly transformed by a matrix V such that Z=VX is white.... Such a matrix V can be easily found by PCA..."
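(For reference, a minimal numpy sketch of the PCA-based whitening step described in the quote, Z = VX, with made-up data:)

import numpy as np

# X: variables in rows, observations in columns, following the quote's convention
X = np.random.default_rng(1).normal(size=(5, 1000))
Xc = X - X.mean(axis=1, keepdims=True)             # centre each variable

# PCA-based whitening matrix V from the eigendecomposition of the covariance
cov = Xc @ Xc.T / Xc.shape[1]
d, E = np.linalg.eigh(cov)
V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T

Z = V @ Xc                                         # cov(Z) is (approximately) the identity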

It seems you're keen to keep the "actual ICA estimation" in its own method, which may have formal merit - but this seems a bit out of step with much of the literature I'm reading, which follows Hyvärinen's "most ICA algorithms". There are also many papers that simply treat all of these as matrix factorisations under alternative constraints and point out that the ICA version only works once you've done your preprocessing.

Perhaps most importantly though it is out of step with scikit-learn, which seems the natural thing to keep consistent with.

@francisco-dlp
Member

Of course whitening is a required pre-processing step for ICA and other BSS algorithms, and HyperSpy does whiten the data before passing it to ICA. However, this does not make ICA a decomposition algorithm: the SVD is performed on the input of ICA solely for whitening purposes, and this step is separate from the dimensionality reduction, which happens in decomposition.

I don't want to keep things as they are just because I prioritise formal accuracy over practicality. It's just that matrix decomposition and mixing matrix estimation are two different things that need to be performed in two separate steps in order to keep the required flexibility.

I don't know why sklearn decided to place ICA under the decomposition umbrella. However, in their case it doesn't really matter, as performing decomposition and ICA remains a two-step process. If we place ICA in decomposition, ICA becomes a one-step process, and we'll lose flexibility and clarity along the way.

@dnjohnstone
Contributor

Why set aside sklearn's decision to place ICA under the decomposition umbrella? That seems to be the key point. What makes this different from the usual argument that machine learning things should be left in their proper place, i.e. sklearn?

@tjof2
Contributor

tjof2 commented Aug 1, 2016

In either case, this is all stuff that isn't clear in the docs at present! :-)

@francisco-dlp
Member

The point is that we shouldn't base our decision on what others did without knowing their motivation. Currently, I don't know why they took their decision. If somebody finds out, please bring it to this discussion. Otherwise, I don't think it is acceptable to use "X did Y" as an argument in this discussion.

@tjof2, I fully agree, this isn't clear in the docs.

@tjof2
Contributor

tjof2 commented Aug 1, 2016

scikit-learn/scikit-learn#858 changed self.unmixing_matrix_ to self.components_ with little discussion about why... just to make it consistent with the other algorithms such as PCA etc.

@dnjohnstone
Contributor

Ok... another point then, is signal decomposition = matrix decomposition (taken as a synonym for matrix factorisation)?

Would anyone object to the sentence "the signal was decomposed into its independent components after dimensionality reduction using SVD"? Using decompose as a synonym for separate.

Also how would this reduce flexibility? Surely it would still be possible to do s.decomposition('svd') and then sm = s.get_decomposition_model() and then s.decomposition('ICA').

I think there are also issues with the suggestion that ICA is the only way to blindly separate sources.
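For reference, the current two-step workflow being discussed looks roughly like this (a sketch only, assuming s is a signal that has already been loaded; note that ICA is not currently an option of s.decomposition()):

s.decomposition(algorithm='svd', output_dimension=10)   # matrix decomposition / denoising
sm = s.get_decomposition_model(10)                       # reconstruct from the first 10 components
s.blind_source_separation(number_of_components=10)       # unmixing step, e.g. FastICA on the SVD output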

@tjof2
Contributor

tjof2 commented Mar 10, 2020

Hoping I'll be able to address this soon.

@thomasaarholt
Contributor Author

Adding here since it's the most general issue:
Centering isn't documented well:

centre : None | 'variables' | 'trials'
    If None (default), no centering is applied.
    If 'variable', the centring will be performed in the variable axis.
    If 'trials', the centring will be performed in the 'trials' axis.
    It only has effect when using the svd or fast_svd algorithms

(what are the variables and trials axes?)

@tjof2
Contributor

tjof2 commented Mar 16, 2020

Agree @thomasaarholt, they should be the signal and navigation axes, I think? Another way of saying np.mean(x, axis=?)
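Something like this (the axis mapping here is just my guess, which is exactly what needs documenting):

import numpy as np

X = np.random.default_rng(2).normal(size=(100, 50))        # (trials/samples, variables/features)?

X_centred_variables = X - X.mean(axis=0)                   # subtract each column's (feature's) mean
X_centred_trials = X - X.mean(axis=1, keepdims=True)       # subtract each row's (sample's) mean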

@thomasaarholt
Contributor Author

I believe you're right, yes. We could use a dictionary to convert signal and navigator to the appropriate names (I don't know which one is which!).

@tjof2
Contributor

tjof2 commented Mar 16, 2020

Pretty sure trials == samples in "sklearn" lingo, so variables == features.

@thomasaarholt
Contributor Author

aaaaaand samples=navigator? :p

@tjof2
Contributor

tjof2 commented Mar 16, 2020

I believe so!

@thomasaarholt
Contributor Author

As @jeinsle pointed out, it would be nice if we clarified the use of and connections between the following terminologies, which I believe are two sets of words for the same thing. I am not sure that these are the correct way around (see the sklearn-based sketch after the list):

  • loadings, trials, samples
  • factors, variables, features
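A rough sketch of the mapping in sklearn terms (my reading only; nailing this down is part of what needs documenting):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(3).normal(size=(200, 64))   # (samples/trials, features/variables)
pca = PCA(n_components=3).fit(X)

factors = pca.components_     # shape (n_components, n_features): spectrum-like "factors"
loadings = pca.transform(X)   # shape (n_samples, n_components): map-like "loadings"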

@tjof2
Contributor

tjof2 commented Mar 31, 2020

Regarding centering, also worth documenting this in a neat way: https://gitter.im/hyperspy/hyperspy?at=5e8360d4f7cff9290c851d83

Francisco de la Peña @francisco-dlp
[...] the centre feature is in HyperSpy just for completeness, but it doesn't make much sense to use it in our field. Most will say that without centering we are not performing PCA, and that's right: plain SVD works better for our application.
