# MRG add MinMaxScaler, #1111 #1131

Merged
merged 0 commits into from Sep 23, 2012

## Conversation

Projects
None yet
5 participants
Member

### amueller commented Sep 7, 2012

 This adds a MinMax Scaler as described in #1111. Unfortunately, I couldn't figure out how to get column-wise max and min in sparse matrices. Any hints? Thanks, Andy

### mblondel reviewed Sep 9, 2012

sklearn/preprocessing.py
 + return X + + +class UnitVarianceScaler(BaseEstimator, TransformerMixin):

#### mblondel Sep 9, 2012

Member

But it's not only scaling the variance, it's also centering. `ZvalueScaler` or `StandardScaler` maybe?

http://en.wikipedia.org/wiki/Standard_score

#### ogrisel Sep 9, 2012

Member

+1 for `StandardScaler`

### mblondel reviewed Sep 9, 2012

sklearn/preprocessing.py
 + self.scale_ = np.max(X, axis=1) - self.min_ + return self + + def transform(self, X):

#### mblondel Sep 9, 2012

Member

Can you document the formula that you're applying to transform the data?

Member

will do

### mblondel reviewed Sep 9, 2012

sklearn/preprocessing.py
 + # denominator is X.max after adding min + max_scale = feature_max / (feature_min + 1.) + if max_scale != 1: + X *= max_scale

#### mblondel Sep 9, 2012

Member

Your code is making several passes over the dataset. It would be more efficient to factor the additions / subtractions and the multiplications / divisions into one step.

#### mblondel Sep 9, 2012

Member

Also, this code doesn't seem to preserve sparsity. My intuition is that it may be necessary to treat positive sparse data as a special case (if the user-specified range doesn't require a translation of the data).

#### amueller Sep 9, 2012

Member

you are right, I have to redo this.

Member

### mblondel commented Sep 9, 2012

 I see that there's a test with a CSR matrix. So did you solve your problem with column-wise min / max?
Member

### amueller commented Sep 9, 2012

 Nope, it just fails atm ;) do you have any idea?
Member

### mblondel commented Sep 9, 2012

 You might need to create your own utils in `utils.sparsefuncs` (supporting both CSR and CSC formats would be nice).
Member

### amueller commented Sep 12, 2012

 yuk maybe not doing this for the moment ;)
Member

### amueller commented Sep 17, 2012

 @mblondel I think I got it in a single pass now. The attributes are not as interpretable any more but I think this is the way to go. I think I won't bother with the sparse case for the moment. wdyt?
Member

### amueller commented Sep 21, 2012

 Sorry, this should only be for the dense case. will fix.
Member

### amueller commented Sep 21, 2012

 @larsmans I saw your comment about scipy sparse in my inbox but I can't find it on github. The current PR doesn't have the code you commented any more. Not sure how you got there.
Member

### larsmans commented Sep 21, 2012

 2012/9/21 Andreas Mueller notifications@github.com @larsmans https://github.com/larsmans I saw your comment about scipy sparse in my inbox but I can't find it on github. The current PR doesn't have the code you commented any more. Not sure how you got there. — Reply to this email directly or view it on GitHubhttps://github.com/scikit-learn/scikit-learn/pull/1131#issuecomment-8761909. Retracted that when I saw I was commenting on an old commit; I did still see it in the PR.
Member

### amueller commented Sep 22, 2012

 Any more comments on this one?
Member

### ogrisel commented Sep 22, 2012

 LGTM: +1 for merging.

### mblondel reviewed Sep 22, 2012

 + This standardization is often used as an alternative to zero mean, + unit variance scaling. + It is in particular useful for sparse positive data, as it retains the + sparsity structure of the data (if scaled between zero and some number).

#### mblondel Sep 22, 2012

Member

This comment is confusing if there's no support for sparse matrix.

Member

### GaelVaroquaux commented Sep 22, 2012

 Apart from the obvious comment that it would be very nice to have this for the sparse case, +1 for merge.
Member

### amueller commented Sep 23, 2012

 Ok, I renamed the remaining instances of Scaler to StandardScaler. Merging now.