
Improve decomposition documentation #1159

Closed
thomasaarholt opened this issue Jul 15, 2016 · 24 comments · Fixed by #2383
Comments

@thomasaarholt
Contributor

thomasaarholt commented Jul 15, 2016

Currently the documentation for decomposition is very disorganised. The docstring currently looks like the following. I think it would be useful to separate out a single general section, plus a section for each algorithm, so that it's clearer which arguments one should adjust to get a correct result.

Signature: s1.decomposition(
    normalize_poissonian_noise=False, 
    algorithm='svd', 
    output_dimension=None, 
    centre=None, 
    auto_transpose=True, 
    navigation_mask=None, 
    signal_mask=None, 
    var_array=None, 
    var_func=None, 
    polyfit=None, 
    reproject=None, 
    return_info=False, 
    **kwargs)
Docstring:
Decomposition with a choice of algorithms

The results are stored in self.learning_results

Parameters
----------
normalize_poissonian_noise : bool
    If True, scale the SI to normalize Poissonian noise
algorithm : 'svd' | 'fast_svd' | 'mlpca' | 'fast_mlpca' | 'nmf' |
    'sparse_pca' | 'mini_batch_sparse_pca' | 'RPCA_GoDec' | 'ORPCA'
output_dimension : None or int
    number of components to keep/calculate
centre : None | 'variables' | 'trials'
    If None, no centring is applied. If 'variables', the centring will be
    performed in the variable axis. If 'trials', the centring will be
    performed in the 'trials' axis. It only has effect when using the
    svd or fast_svd algorithms
auto_transpose : bool
    If True, automatically transposes the data to boost performance.
    Only has effect when using the svd or fast_svd algorithms.
navigation_mask : boolean numpy array
    The navigation locations marked as True are not used in the
    decomposition.
signal_mask : boolean numpy array
    The signal locations marked as True are not used in the
    decomposition.
var_array : numpy array
    Array of variance for the maximum likelihood PCA algorithm
var_func : function or numpy array
    If function, it will apply it to the dataset to obtain the
    var_array. Alternatively, it can be an array with the coefficients
    of a polynomial.
reproject : None | signal | navigation | both
    If not None, the results of the decomposition will be projected in
    the selected masked area.
return_info : bool, default False
    The result of the decomposition is stored internally. However, some
    algorithms generate extra information that is not stored. If True,
    return any extra information if available.

Returns
-------
(X, E) : (numpy array, numpy array)
    If 'algorithm' == 'RPCA_GoDec' or 'ORPCA' and 'return_info' is True,
    returns the low-rank (X) and sparse (E) matrices from robust PCA.

See also
--------
plot_decomposition_factors, plot_decomposition_loadings, plot_lev
File:      ~/Dropbox/0_Git/GitHub Desktop/hyperspy_aar/hyperspy/learn/mva.py
Type:      method
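For context, here is the kind of minimal usage example the docs could lead with (a sketch only; the dataset shape and values below are made up for illustration):

import numpy as np
import hyperspy.api as hs

# Hypothetical noisy spectrum image: 32 x 32 navigation positions, 1024-channel spectra
s = hs.signals.Signal1D(np.random.poisson(5, size=(32, 32, 1024)).astype(float))

# Run a basic SVD decomposition; the results are stored in s.learning_results
s.decomposition(normalize_poissonian_noise=True, algorithm='svd')

# Inspect the scree plot to decide how many components to keep
s.plot_explained_variance_ratio()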
@tjof2
Contributor

tjof2 commented Jul 15, 2016

+1

Since apart from algorithm='NMF' all the decomposition algorithms are PCA-based and essentially denoise / reduce the dimensionality with orthogonality constraints, might now be the time to separate out NMF, which instead imposes non-negativity constraints? After all, ICA has its own functionality.

This would help with the docs, as dimensionality reduction & denoising (PCA/RPCA) are cleanly separated from NMF and ICA, which are what you'd actually use for interpretation!

@francisco-dlp @dnjohnstone ?

@francisco-dlp
Member

Not sure about taking NMF out of decomposition as performing a decomposition is what it actually does.

@francisco-dlp francisco-dlp added this to the RELEASE_next_patch milestone Jul 15, 2016
@tjof2
Contributor

tjof2 commented Jul 15, 2016

@francisco-dlp so does ICA?

Or at least, the distinction needs to be clearer

  • PCA - orthogonality constraint
  • NMF - non-negativity constraint
  • ICA - statistical independence constraint

After all, you can also have Sparse Component Analysis (SCA) for sparsity constraints.

@francisco-dlp
Member

ICA estimates a mixing matrix to unmix the sources. Those sources in our case are often the result of dimensionality reduction through matrix decomposition methods. But I see your point, for some applications NMF could be seen as a blind source separation method and users may benefit from making it clearer that just NMF may be enough in some cases.

@tjof2
Contributor

tjof2 commented Jul 15, 2016

Yes - that sort of distinction sounds sensible.

@francisco-dlp francisco-dlp removed this from the RELEASE_next_patch milestone Jul 18, 2016
@dnjohnstone
Contributor

dnjohnstone commented Jul 25, 2016

Just to flag that this was discussed quite a bit in #969. A key point is that basically any decomposition approach in scikit-learn is available, and the documentation needs to at least direct people to the relevant pages there so that they know what the algorithms are.

Personally I'm not so keen on the isolation of ICA as something different from the others and would push to remove the method blind_source_separation() completely.

In scikit-learn, for example, ICA sits within decomposition, as sklearn.decomposition.FastICA, just like sklearn.decomposition.PCA --- to me this is its proper place.

In HyperSpy the separation comes from a practical point: typical workflows originally were "do PCA, then do ICA" (I know it's for good reasons regarding noise), but nevertheless, as far as I'm concerned they're both decomposition approaches of different flavours and we should be consistent with sklearn.

@tjof2
Contributor

tjof2 commented Aug 1, 2016

I agree with @dnjohnstone's point about sklearn.decomposition. NMF and ICA do (from one point of view, I accept...) just impose different constraints, and what if we wrap KernelPCA or introduce sparsity constraints ("SCA") in future? ICA then starts to look increasingly out of place in its own method.

@francisco-dlp
Member

francisco-dlp commented Aug 1, 2016

Let's see if we agree with the following:

  1. All the algorithms in decomposition perform matrix decomposition (exact or approximate).
  2. Matrix decomposition algorithms sometimes produce directly interpretable components (when the constraints that they impose lead to interpretable results). In this sense they can all be regarded as BSS algorithms too.
  3. Often, decomposition algorithms are useful for dimensionality reduction but introduce constraints that don't lead to interpretable results.
  4. The algorithms in blind source separation perform BSS by calculating a mixing matrix, assuming that the original sources have some property, e.g. independence. These algorithms can help unmix the result of matrix decomposition algorithms. However, they don't perform decomposition themselves, which is why they are not decomposition algorithms but "plain" BSS algorithms.
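A rough numpy sketch of the distinction drawn in points 1 and 4 above (illustrative only; the shapes and names are made up):

import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(10.0, size=(500, 200)).astype(float)   # trials x variables

# Matrix decomposition (points 1-3): approximate X by a low-rank product
U, sv, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
loadings = U[:, :k] * sv[:k]       # trials x components
factors = Vt[:k, :]                # components x variables
X_denoised = loadings @ factors    # dimensionality reduction / denoising

# BSS (point 4): estimate a k x k unmixing matrix W acting on the reduced data
# under some assumption (e.g. independence); it does not factorize X itself.
# unmixed_factors = W @ factors, with W coming from e.g. FastICA.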

@dnjohnstone
Contributor

I think I see your point, but it seems most of the argument here is about which bit actually constitutes ICA... It seems to me that the general trend has been towards including "the stuff that you need to do to your data to get independent components out" under the ICA banner. Probably this is because, even if less formally clear, it's what most people want in practice.

The situation seems to be summed up pretty well in Hyvärinen's most recent review, where he says "Most ICA algorithms divide the estimation of the model into two steps: a preliminary whitening and the actual ICA estimation. Whitening means that the data are first linearly transformed by a matrix V such that Z=VX is white.... Such a matrix V can be easily found by PCA..."
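(For reference, a minimal numpy sketch of the PCA-based whitening step described in the quote, Z = VX, with made-up data:)

import numpy as np

# X: variables in rows, observations in columns, following the quote's convention
X = np.random.default_rng(1).normal(size=(5, 1000))
Xc = X - X.mean(axis=1, keepdims=True)             # centre each variable

# PCA-based whitening matrix V from the eigendecomposition of the covariance
cov = Xc @ Xc.T / Xc.shape[1]
d, E = np.linalg.eigh(cov)
V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T

Z = V @ Xc                                         # cov(Z) is (approximately) the identity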

It seems you're keen to keep the "actual ICA estimation" in its own method, which may have formal merit - but this seems a bit out of step with much of the literature I'm reading, which follows Hyvärinen's "most ICA algorithms". There are also many papers that simply treat all of these as matrix factorisations under alternative constraints and point out that the ICA version only works once you've done your preprocessing.

Perhaps most importantly though it is out of step with scikit-learn, which seems the natural thing to keep consistent with.

@francisco-dlp
Member

Of course whitening is a required pre-processing step for ICA and other BSS algorithms, and HyperSpy does whiten the data before passing it to ICA. However, this does not make ICA a decomposition algorithm: the SVD is performed on the input of ICA solely for whitening purposes, and this step is separate from the dimensionality reduction, which happens in decomposition.

I don't want to keep things as they are just because I prioritise formal accuracy over practicality. It's just that matrix decomposition and mixing matrix estimation are two different things that need to be performed in two separate steps in order to keep the required flexibility.

I don't know why sklearn decided to place ICA under the decomposition umbrella. However, in their case it doesn't really matter, as performing decomposition and ICA remains a two-step process. If we place ICA in decomposition, ICA becomes a one-step process, and we'll lose flexibility and clarity along the way.

@dnjohnstone
Contributor

Why set aside sklearn's decision to place ICA under the decomposition umbrella? That seems to be the key point. What makes this different from the usual argument that machine learning things should be left in their proper place, i.e. sklearn?

@tjof2
Contributor

tjof2 commented Aug 1, 2016

In either case, this is all stuff that isn't clear in the docs at present! :-)

@francisco-dlp
Member

The point is that we shouldn't base our decision on what others did without knowing their motivation. Currently, I don't know why they took their decision. If somebody finds out, please bring it to this discussion. Otherwise, I don't think it is acceptable to use "X did Y" as an argument in this discussion.

@tjof2, I fully agree, this isn't clear in the docs.

@tjof2
Contributor

tjof2 commented Aug 1, 2016

scikit-learn/scikit-learn#858 changed self.unmixing_matrix_ to self.components_ with little discussion about why... just to make it consistent with the other algorithms such as PCA etc.

@dnjohnstone
Contributor

Ok... another point then, is signal decomposition = matrix decomposition (taken as a synonym for matrix factorisation)?

Would anyone object to the sentence "the signal was decomposed into its independent components after dimensionality reduction using SVD"? Using decompose as a synonym for separate.

Also how would this reduce flexibility? Surely it would still be possible to do s.decomposition('svd') and then sm = s.get_decomposition_model() and then s.decomposition('ICA').

I think there are also issues with the suggestion that ICA is the only way to blindly separate sources.
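For reference, the current two-step workflow being discussed looks roughly like this (a sketch only, assuming s is a signal that has already been loaded; note that ICA is not currently an option of s.decomposition()):

s.decomposition(algorithm='svd', output_dimension=10)   # matrix decomposition / denoising
sm = s.get_decomposition_model(10)                       # reconstruct from the first 10 components
s.blind_source_separation(number_of_components=10)       # unmixing step, e.g. FastICA on the SVD output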

@tjof2
Contributor

tjof2 commented Mar 10, 2020

Hoping I'll be able to address this soon.

@thomasaarholt
Contributor Author

Adding here since it's the most general issue:
Centering isn't documented well:

centre : None | 'variables' | 'trials'
    If None (default), no centering is applied.
    If 'variable', the centring will be performed in the variable axis.
    If 'trials', the centring will be performed in the 'trials' axis.
    It only has effect when using the svd or fast_svd algorithms

(what are the variables and trials axes?)

@tjof2
Contributor

tjof2 commented Mar 16, 2020

Agree @thomasaarholt, they should be the signal and navigation axes, I think? Another way of saying np.mean(x, axis=?)
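Something like this (the axis mapping here is just my guess, which is exactly what needs documenting):

import numpy as np

X = np.random.default_rng(2).normal(size=(100, 50))        # (trials/samples, variables/features)?

X_centred_variables = X - X.mean(axis=0)                   # subtract each column's (feature's) mean
X_centred_trials = X - X.mean(axis=1, keepdims=True)       # subtract each row's (sample's) mean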

@thomasaarholt
Contributor Author

I believe you're right, yes. We could use a dictionary to convert signal and navigator to the appropriate names (I don't know which one is which!).

@tjof2
Contributor

tjof2 commented Mar 16, 2020

Pretty sure trials == samples in "sklearn" lingo, so variables == features.

@thomasaarholt
Contributor Author

aaaaaand samples=navigator? :p

@tjof2
Contributor

tjof2 commented Mar 16, 2020

I believe so!

@thomasaarholt
Contributor Author

As @jeinsle pointed out, it would be nice if we clarified the use of and connections between the following terminologies, which I believe are two sets of words for the same thing. I am not sure that these are the correct way around (see the sklearn-based sketch after the list):

  • loadings, trials, samples
  • factors, variables, features
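A rough sketch of the mapping in sklearn terms (my reading only; nailing this down is part of what needs documenting):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(3).normal(size=(200, 64))   # (samples/trials, features/variables)
pca = PCA(n_components=3).fit(X)

factors = pca.components_     # shape (n_components, n_features): spectrum-like "factors"
loadings = pca.transform(X)   # shape (n_samples, n_components): map-like "loadings"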

@tjof2
Contributor

tjof2 commented Mar 31, 2020

Regarding centering, also worth documenting this in a neat way: https://gitter.im/hyperspy/hyperspy?at=5e8360d4f7cff9290c851d83

Francisco de la Peña @francisco-dlp
[...] the centre feature is in HyperSpy just for completeness, but it doesn't make much sense to use it in our field. Most will say that without centering we are not performing PCA, and that's right: plain SVD works better for our application.
