MRG add MinMaxScaler, #1111 #1131

Merged
merged 0 commits into from Sep 23, 2012

Conversation

Projects
None yet
5 participants
@amueller
Member

amueller commented Sep 7, 2012

This adds a MinMax Scaler as described in #1111.

Unfortunately, I couldn't figure out how to get column-wise max and min in sparse matrices. Any hints?

Thanks,
Andy

@mblondel

View changes

sklearn/preprocessing.py
+ return X
+
+
+class UnitVarianceScaler(BaseEstimator, TransformerMixin):

This comment has been minimized.

@mblondel

mblondel Sep 9, 2012

Member

But it's not only scaling the variance, it's also centering. ZvalueScaler or StandardScaler maybe?

http://en.wikipedia.org/wiki/Standard_score

@mblondel

mblondel Sep 9, 2012

Member

But it's not only scaling the variance, it's also centering. ZvalueScaler or StandardScaler maybe?

http://en.wikipedia.org/wiki/Standard_score

This comment has been minimized.

@ogrisel

ogrisel Sep 9, 2012

Member

+1 for StandardScaler

@ogrisel

ogrisel Sep 9, 2012

Member

+1 for StandardScaler

@mblondel

View changes

sklearn/preprocessing.py
+ self.scale_ = np.max(X, axis=1) - self.min_
+ return self
+
+ def transform(self, X):

This comment has been minimized.

@mblondel

mblondel Sep 9, 2012

Member

Can you document the formula that you're applying to transform the data?

@mblondel

mblondel Sep 9, 2012

Member

Can you document the formula that you're applying to transform the data?

This comment has been minimized.

@amueller

amueller Sep 9, 2012

Member

will do

@amueller

amueller Sep 9, 2012

Member

will do

@mblondel

View changes

sklearn/preprocessing.py
+ # denominator is X.max after adding min
+ max_scale = feature_max / (feature_min + 1.)
+ if max_scale != 1:
+ X *= max_scale

This comment has been minimized.

@mblondel

mblondel Sep 9, 2012

Member

Your code is making several passes over the dataset. It would be more efficient to factor the additions / subtractions and the multiplications / divisions into one step.

@mblondel

mblondel Sep 9, 2012

Member

Your code is making several passes over the dataset. It would be more efficient to factor the additions / subtractions and the multiplications / divisions into one step.

This comment has been minimized.

@mblondel

mblondel Sep 9, 2012

Member

Also, this code doesn't seem to preserve sparsity. My intuition is that it may be necessary to treat positive sparse data as a special case (if the user-specified range doesn't require a translation of the data).

@mblondel

mblondel Sep 9, 2012

Member

Also, this code doesn't seem to preserve sparsity. My intuition is that it may be necessary to treat positive sparse data as a special case (if the user-specified range doesn't require a translation of the data).

This comment has been minimized.

@amueller

amueller Sep 9, 2012

Member

you are right, I have to redo this.

@amueller

amueller Sep 9, 2012

Member

you are right, I have to redo this.

@mblondel

This comment has been minimized.

Show comment
Hide comment
@mblondel

mblondel Sep 9, 2012

Member

I see that there's a test with a CSR matrix. So did you solve your problem with column-wise min / max?

Member

mblondel commented Sep 9, 2012

I see that there's a test with a CSR matrix. So did you solve your problem with column-wise min / max?

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 9, 2012

Member

Nope, it just fails atm ;) do you have any idea?

Member

amueller commented Sep 9, 2012

Nope, it just fails atm ;) do you have any idea?

@mblondel

This comment has been minimized.

Show comment
Hide comment
@mblondel

mblondel Sep 9, 2012

Member

You might need to create your own utils in utils.sparsefuncs (supporting both CSR and CSC formats would be nice).

Member

mblondel commented Sep 9, 2012

You might need to create your own utils in utils.sparsefuncs (supporting both CSR and CSC formats would be nice).

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 12, 2012

Member

yuk maybe not doing this for the moment ;)

Member

amueller commented Sep 12, 2012

yuk maybe not doing this for the moment ;)

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 17, 2012

Member

@mblondel I think I got it in a single pass now. The attributes are not as interpretable any more but I think this is the way to go. I think I won't bother with the sparse case for the moment. wdyt?

Member

amueller commented Sep 17, 2012

@mblondel I think I got it in a single pass now. The attributes are not as interpretable any more but I think this is the way to go. I think I won't bother with the sparse case for the moment. wdyt?

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 21, 2012

Member

Sorry, this should only be for the dense case. will fix.

Member

amueller commented Sep 21, 2012

Sorry, this should only be for the dense case. will fix.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 21, 2012

Member

@larsmans I saw your comment about scipy sparse in my inbox but I can't find it on github. The current PR doesn't have the code you commented any more. Not sure how you got there.

Member

amueller commented Sep 21, 2012

@larsmans I saw your comment about scipy sparse in my inbox but I can't find it on github. The current PR doesn't have the code you commented any more. Not sure how you got there.

@larsmans

This comment has been minimized.

Show comment
Hide comment
@larsmans

larsmans Sep 21, 2012

Member

2012/9/21 Andreas Mueller notifications@github.com

@larsmans https://github.com/larsmans I saw your comment about scipy
sparse in my inbox but I can't find it on github. The current PR doesn't
have the code you commented any more. Not sure how you got there.


Reply to this email directly or view it on GitHubhttps://github.com/scikit-learn/scikit-learn/pull/1131#issuecomment-8761909.

Retracted that when I saw I was commenting on an old commit; I did still
see it in the PR.

Member

larsmans commented Sep 21, 2012

2012/9/21 Andreas Mueller notifications@github.com

@larsmans https://github.com/larsmans I saw your comment about scipy
sparse in my inbox but I can't find it on github. The current PR doesn't
have the code you commented any more. Not sure how you got there.


Reply to this email directly or view it on GitHubhttps://github.com/scikit-learn/scikit-learn/pull/1131#issuecomment-8761909.

Retracted that when I saw I was commenting on an old commit; I did still
see it in the PR.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 22, 2012

Member

Any more comments on this one?

Member

amueller commented Sep 22, 2012

Any more comments on this one?

@ogrisel

This comment has been minimized.

Show comment
Hide comment
@ogrisel

ogrisel Sep 22, 2012

Member

LGTM: +1 for merging.

Member

ogrisel commented Sep 22, 2012

LGTM: +1 for merging.

sklearn/preprocessing.py
+ This standardization is often used as an alternative to zero mean,
+ unit variance scaling.
+ It is in particular useful for sparse positive data, as it retains the
+ sparsity structure of the data (if scaled between zero and some number).

This comment has been minimized.

@mblondel

mblondel Sep 22, 2012

Member

This comment is confusing if there's no support for sparse matrix.

@mblondel

mblondel Sep 22, 2012

Member

This comment is confusing if there's no support for sparse matrix.

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Sep 22, 2012

Member

Apart from the obvious comment that it would be very nice to have this for the sparse case, +1 for merge.

Member

GaelVaroquaux commented Sep 22, 2012

Apart from the obvious comment that it would be very nice to have this for the sparse case, +1 for merge.

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Sep 23, 2012

Member

Ok, I renamed the remaining instances of Scaler to StandardScaler. Merging now.

Member

amueller commented Sep 23, 2012

Ok, I renamed the remaining instances of Scaler to StandardScaler. Merging now.

@amueller amueller merged commit c0a0578 into scikit-learn:master Sep 23, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment