standardize data before pca #86

Closed
gdkrmr opened this issue Jan 15, 2019 · 5 comments

Comments

Contributor

gdkrmr commented Jan 15, 2019

Currently there is no way to standardize data when doing a PCA, only to center it.
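
For context, here is a minimal sketch of how the two preprocessing steps differ, using only Julia's `Statistics` standard library. The observations-in-columns layout and all variable names are assumptions made for illustration; nothing here is the package's API.

```julia
using Statistics

# 3 variables (rows) × 10 observations (columns), with very different scales per row
x = rand(3, 10) .* [1.0, 100.0, 10_000.0]

# Centering: subtract each variable's mean
μ = mean(x, dims=2)
xc = x .- μ

# Standardizing: additionally divide by each variable's standard deviation
σ = std(x, dims=2)
xs = xc ./ σ

# Centered data keeps the original spread of each variable,
# standardized data has unit variance in every row
var_centered = var(xc, dims=2)
var_standardized = var(xs, dims=2)
```

With centering alone, the variable with the largest scale tends to dominate the leading principal components, which is why standardization is being asked for here.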

Collaborator

wildart commented Jan 18, 2019

Usually, you would standardize the data prior to fitting; it's better not to mix the two operations together.

FYI, there is a data transformation PR pending: JuliaStats/StatsBase.jl#85

Contributor Author

gdkrmr commented Jan 21, 2019

What about the centering step, are you going to remove that?

Contributor Author

gdkrmr commented Jan 21, 2019

You should probably allow something like:

x = rand(3, 10)
z = fit(ZScoreTransform, x)
p = fit(PCA, transform(z, x))
transform((z, p), xnew)

or maybe even make the objects callable:

xnew |> z |> p

and for the reconstruction:

y |> inv(p) |> inv(z)
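
A rough sketch of how callable objects could support this syntax, using only the standard library. `ZScore`, `InvZScore`, and `fitzscore` are hypothetical names invented for this illustration and are not part of StatsBase or MultivariateStats; only the z-score step is shown, on the assumption that a PCA step would follow the same pattern.

```julia
using Statistics

# Hypothetical type for illustration only; not an existing package API
struct ZScore
    μ::Vector{Float64}
    σ::Vector{Float64}
end

# Hypothetical wrapper representing the inverse of a ZScore
struct InvZScore
    z::ZScore
end

# Estimate per-variable mean and standard deviation (observations in columns)
fitzscore(x::AbstractMatrix) = ZScore(vec(mean(x, dims=2)), vec(std(x, dims=2)))

# Make the objects callable so they can be chained with |>
(z::ZScore)(x::AbstractMatrix) = (x .- z.μ) ./ z.σ
(i::InvZScore)(y::AbstractMatrix) = y .* i.z.σ .+ i.z.μ

# Extend Base.inv so that inv(z) undoes z
Base.inv(z::ZScore) = InvZScore(z)

x = rand(3, 10)
z = fitzscore(x)
y = x |> z            # forward transform
xrec = y |> inv(z)    # reconstruction back to the original scale
```

Under these assumptions, `xrec ≈ x` holds up to floating-point error.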

Collaborator

wildart commented Jan 21, 2019

> What about the centering step, are you going to remove that?

No, not really.

Transformation and reconstruction work as follows:

x = rand(3, 10)
# standardize data
T = fit(ZScoreTransform, x)
xstd = transform(T, x)
# perform PCA
M = fit(PCA, xstd)
ysub = transform(M, xstd)
# reconstruct to original space
ystd = reconstruct(M, ysub)
# reconstruct to original scale
y = reconstruct(T, ystd)
x ≈ y

Even though a pipeline like X |> ZScoreTransform |> PCA looks appealing, it is beyond the scope of this package.

Contributor Author

gdkrmr commented Jan 31, 2019

> > What about the centering step, are you going to remove that?
>
> No, not really.

This seems like an arbitrary choice.
