DOC Mention that loadings are found in the components_ attribute #3322

Closed. Wanted to merge 2 commits into scikit-learn:master from FedericoV:pca-loadings.


@FedericoV (Contributor)

Changed the docstring for the components_ attribute to mention that the loadings are found here.

@coveralls

Coverage Status

Coverage remained the same when pulling c9002f8 on FedericoV:pca-loadings into c29ef67 on scikit-learn:master.

@eickenberg and 1 other commented on an outdated diff on Jun 27, 2014
sklearn/decomposition/pca.py
@@ -144,7 +144,8 @@ class PCA(BaseEstimator, TransformerMixin):
Attributes
----------
`components_` : array, [n_components, n_features]
- Components with maximum variance.
+ How much the original components contribute to the new maximum \
+ variance basis. Also known as loadings
@eickenberg (Contributor) commented on Jun 27, 2014

I find this description rather misleading, since it suggests that it makes sense to examine single entries of a principal axis vector, although the idea is to represent how features co-vary "best". I would prefer something like

components_
    Principal axes, representing the directions (in feature space) of maximum variance in the data.
    These are also known as loading vectors.
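To make the suggested wording concrete, here is a minimal sketch (random stand-in data; only standard scikit-learn/NumPy API) of what components_ holds:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.randn(100, 3)             # 100 samples, 3 features

    pca = PCA(n_components=2).fit(X)

    # One row per principal axis, one column per original feature.
    print(pca.components_.shape)      # (2, 3)

    # Axes are ordered by the variance they capture, largest first.
    print(pca.explained_variance_ratio_)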
@FedericoV (Contributor) commented on Jun 27, 2014

I agree; I think your way is a lot clearer.

@FedericoV (Contributor) commented on Jun 30, 2014

Changed the commit message to your suggestion.

@coveralls

Coverage Status

Coverage remained the same when pulling fadd2fc on FedericoV:pca-loadings into c29ef67 on scikit-learn:master.

@jnothman commented on the diff on Jul 2, 2014
sklearn/decomposition/pca.py
@@ -144,7 +144,9 @@ class PCA(BaseEstimator, TransformerMixin):
Attributes
----------
`components_` : array, [n_components, n_features]
- Components with maximum variance.
+ Principal axes, representing the directions (in feature space) of \
+ maximum variance in the data. These are also known as loading \
@jnothman (Member) commented on Jul 2, 2014

These \s force line-breaks and aren't appropriate here or below.

@jnothman commented on the diff on Jul 2, 2014
sklearn/decomposition/pca.py
@@ -144,7 +144,9 @@ class PCA(BaseEstimator, TransformerMixin):
Attributes
----------
`components_` : array, [n_components, n_features]
- Components with maximum variance.
+ Principal axes, representing the directions (in feature space) of \
@jnothman (Member) commented on Jul 2, 2014

Can we move "in feature space" to immediately after "Principal axes"? Then I think this is quite clear.
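For reference, a sketch of how the attribute entry might read with the \ continuation characters dropped and "in feature space" moved as suggested (the exact merged wording may differ):

    `components_` : array, [n_components, n_features]
        Principal axes in feature space, representing the directions of
        maximum variance in the data. These are also known as loading
        vectors.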

@vene (Member) commented on Jul 2, 2014

Should we think about what happens if somebody Ctrl+F's for the word "components", "principal components", or "loadings"? (Which, by the way, I found very confusing: some people use it to mean coefficients in feature space, others in instance space.)

@eickenberg (Contributor)

(+1,) * 2

Loadings is definitely confusing -- hearing the word, I would have also imagined "to what extent a sample loads the PC representation", but reading Wikipedia it looks like this is an established term for the vectors themselves.

For people who want to grep for components, one could change the text to

Principal components, orthogonal axes in feature space representing directions of maximal variance in the data. These are also known as PCA weight vectors, loading vectors or loadings.

(I sincerely hope I am not making it worse ... :))

(I edited it a bit, taking into account all words found in the Wikipedia entry.)
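The "orthogonal axes" part of this wording is easy to verify; a minimal sketch using only standard scikit-learn/NumPy calls:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.randn(100, 3)

    pca = PCA(n_components=2).fit(X)
    C = pca.components_

    # The principal axes are orthonormal: C C^T is the identity matrix.
    print(np.allclose(np.dot(C, C.T), np.eye(2)))  # True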

@eickenberg (Contributor)

@vene apparently the coefficients of a sample in PC feature space are called scores (again, per Wikipedia)
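To illustrate the components/scores distinction, a sketch (assuming the default whiten=False): the scores are exactly what transform returns, i.e. the centered data projected onto the principal axes.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.randn(100, 3)

    pca = PCA(n_components=2).fit(X)

    # Scores: coordinates of each sample in the principal-axis basis.
    scores = pca.transform(X)
    manual = np.dot(X - pca.mean_, pca.components_.T)
    print(np.allclose(scores, manual))  # True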

@vene (Member) commented on Jul 2, 2014

Given the ambiguity I'd rather not include loadings or scores at all. I'm not particularly crazy about the Wikipedia page on PCA.

@mjbommar (Contributor) commented on Jul 2, 2014

@vene, I would +1 loadings but not scores. I recall textbooks from grad school that used "loadings" to refer to the vectors.

Additionally, we might consider what others coming from a language like SAS might be looking for. SAS uses "loading" in its documentation:
http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_factor_sect028.htm

@vene (Member) commented on Jul 2, 2014

I wouldn't be opposed to using "loadings" as long as it's only at the end, in an "also known as" part. My problem with the word is that I remember having encountered papers that use it in both ways. As long as there's no ambiguity, it's good to be more Ctrl+F-able.

@vene (Member) commented on Jul 14, 2014

Should we go with @eickenberg's proposal?

@GaelVaroquaux (Member)

I agree that most often these are not known as loadings, as the loadings are the other side of the decomposition.

Also (as mentioned above) the "\" is not necessary, and should be removed.

@GaelVaroquaux (Member)

> I agree that most often these are not known as loadings, as the loadings are the other side of the decomposition.

According to Wikipedia I am wrong. So I am 👍 for the docstring the way it is written currently. We just need the "\" to be removed.

@eickenberg (Contributor)

On Mon, Jul 14, 2014 at 4:51 PM, Gael Varoquaux notifications@github.com wrote:

> I agree that most often these are not known as loadings, as the loadings are the other side of the decomposition.

I think this may be a machine learner's bias. Stats seem to call it the other way round.

> Also (as mentioned above) the "\" is not necessary, and should be removed.



@vene (Member) commented on Jul 14, 2014

I am irked when the same word is used to mean opposite things. We can either confuse half the readers (by referring to one of them as loadings) or all readers (by referring to both as loadings).

@amueller added a commit to amueller/scikit-learn that referenced this pull request on Jun 8, 2015: "Better docstring for PCA, closes #3322." (9a00673)
@amueller added a commit to amueller/scikit-learn that referenced this pull request on Jun 9, 2015: "Better docstring for PCA, closes #3322." (41e10c4)
@amueller closed this in #4836 on Jul 1, 2015.