MinMax Scaler and TypeErrors #1973

Closed
amaatouq opened this Issue May 19, 2013 · 6 comments


@amaatouq

I am trying to use MinMaxScaler from sklearn.

My code is quite simple:
from sklearn.preprocessing import MinMaxScaler
followers = np.array(df['followers_count'].astype('float'))
scaled_followers = scaler.fit(followers)

I get the following error (I experimented with multiple numpy arrays, same problem)


TypeError Traceback (most recent call last)
in ()
2 followers = np.array(df['followers_count'].astype('float'))
3 print followers
----> 4 scaled_followers = scaler.fit(followers)

/home/amaatouq/anaconda/lib/python2.7/site-packages/sklearn/preprocessing.pyc in fit(self, X, y)
195 scale_ = np.max(X, axis=0) - min_
196 # Do not scale constant features
--> 197 scale_[scale_ == 0.0] = 1.0
198 self.scale_ = (feature_range[1] - feature_range[0]) / scale_
199 self.min_ = feature_range[0] - min_ / scale_

TypeError: 'numpy.float64' object does not support item assignment

@amueller
scikit-learn member

Thanks for the report. That is odd.
What is the shape of followers? And what arguments did you give to MinMaxScaler?
Can you reproduce with random data?

@amaatouq

Thanks for your attention. I am using scikit-learn 0.13.1

followers.shape is (1076817,)

The following code should reproduce the error:

from sklearn.preprocessing import MinMaxScaler
import numpy as np
scaler = MinMaxScaler()
followers = np.array([1,1,2,3,4])
scaled_followers = scaler.fit(followers)

Note: I tried this on Canopy (EPD) and Anaconda with the same error message
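
For reference, the failure can apparently be reproduced with NumPy alone. This is a minimal sketch, assuming the 0.13.1 `fit` does essentially the arithmetic quoted in the traceback; the names `X` and `scale_2d` are only for illustration and are not taken from the library:

```python
import numpy as np

followers = np.array([1, 1, 2, 3, 4], dtype=float)  # shape (5,)

# Same arithmetic as the lines quoted in the traceback.
# On a 1-D array, reducing over axis=0 leaves a NumPy scalar, not an array.
min_ = np.min(followers, axis=0)
scale_ = np.max(followers, axis=0) - min_
print(type(scale_))

try:
    scale_[scale_ == 0.0] = 1.0  # item assignment on a scalar
    error_message = None
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# With a 2-D (n_samples, 1) array the reduction keeps one entry per feature,
# so the same assignment is valid.
X = followers.reshape(-1, 1)
scale_2d = np.max(X, axis=0) - np.min(X, axis=0)
scale_2d[scale_2d == 0.0] = 1.0
print(scale_2d.shape)
```

So the error seems to come from the 1-D input shape rather than from the values in the array.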

@amaatouq

OK, the error goes away if you add brackets around the DataFrame column:

followers = np.array([df.followers_count.astype('float')])

I am new to Python and I find this odd. I thought DataFrame columns were NumPy array-like.

@amueller
scikit-learn member

The shape doesn't really make sense for scikit-learn data. It should be (n_samples, n_features). I guess n_features is one in your case. I think it should still work, though, or at least give a decent error message.
You can also np.vstack the data to make it (n_samples, 1).
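
A short sketch of how the two workarounds differ, assuming a recent scikit-learn where `MinMaxScaler.fit_transform` expects a 2-D `(n_samples, n_features)` array. The toy values stand in for the `followers_count` column:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

followers = np.array([1, 1, 2, 3, 4], dtype=float)  # shape (5,)

# Wrapping in brackets gives ONE sample with five features.
wrapped = np.array([followers])
print(wrapped.shape)  # (1, 5)

# reshape(-1, 1) or np.vstack give five samples with one feature each.
column = followers.reshape(-1, 1)
stacked = np.vstack(followers)
print(column.shape, stacked.shape)  # (5, 1) (5, 1)

# Scale the single feature to the default range [0, 1].
scaled = MinMaxScaler().fit_transform(column)
print(scaled.ravel())
```

With the bracketed version the call no longer raises, but each column then holds a single value, so there is no range to scale. The `(n_samples, 1)` layout is the one that rescales the followers to [0, 1].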

@amaatouq

I see, I am using sklearn.preprocessing for my normal data analysis needs (not only machine learning), and I am loving it, so really, thanks for your efforts, amazing library.

I know this is probably not related to scikit-learn, but I'd like to know what you think about this problem.
I'd like to scatter plot two variables in order to visualise the correlation and the least-squares line fit (for publication purposes). I have 7514022 observations.

My X variable has a mean of 770.7, a standard deviation of 24687.7, a min of 0 and a max of 17587402, and my Y has the same problem of very large variance.

I tried z-scores (dividing by the std) and the MinMax scaler, with no luck for the visualisation because of the variance in my dataset. A log scale doesn't work well because I have a lot of 0 values, and a log scale doesn't show linear relationships well. I'd like to know what you think.

Thanks a lot

@amueller
scikit-learn member

Closing.

@amueller closed this Jul 18, 2014