Truncating in MinMaxScaler #3342
"backported" from scikit-learn/scikit-learn#3342
Ping @untom On 3 July 2014 18:48, Coveralls notifications@github.com wrote:
Noticed a small doc error in the testing utils, so fixed that. I should say that I'm not totally satisfied with the fit_feature_range argument and would be happy to hear another way to handle that. (A "wiggle_room" parameter that shrinks the range by some portion?)
I think the truncation is a nice new feature, but it seems to me that
@untom Yeah, a sigmoid transformation might make sense, depending on the use case. I agree that
Another -1 here for
Okay, here's a new version without
I like it! Is it only waiting for review?
I think this needs work, but if anybody's interested, it doesn't seem like too much work.
Wait, I think this is already implemented using clip.
That's true, @haiatn; it was added in 0.24. Closing.
The output of MinMaxScaler doesn't always lie within the passed feature_range: if the data you transform() has values outside the range of the values you fit() on, the transformed values fall outside feature_range too. If you're only using the scaler to make the scale of the data nicer, this probably doesn't matter, but if your algorithm actually relies on the data lying in a certain range (example), this is no good.
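To make the problem concrete, here is a minimal pure-Python sketch of min-max scaling on a single feature. It is not scikit-learn's implementation; `fit_min_max` and `min_max_transform` are hypothetical helper names used only for illustration.

```python
def fit_min_max(train):
    """Record the minimum and maximum seen at fit time (hypothetical helper)."""
    return min(train), max(train)


def min_max_transform(values, data_min, data_max, feature_range=(0.0, 1.0)):
    """Linearly map [data_min, data_max] onto feature_range, with no clipping."""
    lo, hi = feature_range
    scale = (hi - lo) / (data_max - data_min)
    return [lo + (v - data_min) * scale for v in values]


train = [2.0, 4.0, 6.0]
data_min, data_max = fit_min_max(train)

# Training data lands exactly in [0, 1].
scaled_train = min_max_transform(train, data_min, data_max)

# Unseen values below 2.0 or above 6.0 are mapped outside [0, 1].
scaled_test = min_max_transform([1.0, 8.0], data_min, data_max)

print(scaled_train)  # [0.0, 0.5, 1.0]
print(scaled_test)   # [-0.25, 1.5]
```

The mapping is fixed by the training minimum and maximum, so nothing constrains values that were not seen during fitting.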
So this PR adds optional support for truncation: values that would be transformed outside of feature_range are clipped to its ends. It also adds a fit_feature_range parameter to make truncation less likely (e.g. if you need your data to lie in [0, 1], you can make your training data lie in [.1, .9], which leaves test values some room before they are clipped).
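The two options described above could be sketched roughly as follows. This is a single-feature, pure-Python illustration under stated assumptions, not the code from this PR: the class name `TruncatingMinMaxScaler` and the exact parameter semantics are assumptions, and the fit range [0.25, 0.75] is used instead of [.1, .9] only so that the arithmetic is exact in floating point.

```python
class TruncatingMinMaxScaler:
    """Hypothetical single-feature scaler illustrating truncation."""

    def __init__(self, feature_range=(0.0, 1.0), fit_feature_range=None,
                 truncate=False):
        self.feature_range = feature_range
        # Assumption: when no fit range is given, fit onto feature_range itself.
        self.fit_feature_range = fit_feature_range or feature_range
        self.truncate = truncate

    def fit(self, values):
        lo, hi = self.fit_feature_range
        data_min, data_max = min(values), max(values)
        # Training data is mapped onto the (possibly narrower) fit range.
        self.scale_ = (hi - lo) / (data_max - data_min)
        self.offset_ = lo - data_min * self.scale_
        return self

    def transform(self, values):
        out = [v * self.scale_ + self.offset_ for v in values]
        if self.truncate:
            # Clip anything outside feature_range to its nearest end.
            lo, hi = self.feature_range
            out = [min(max(x, lo), hi) for x in out]
        return out


train = [0.0, 8.0]
test = [-2.0, 4.0, 10.0, 16.0]

# Truncation only: every out-of-range test value is clipped.
plain = TruncatingMinMaxScaler(truncate=True).fit(train)
print(plain.transform(test))   # [0.0, 0.5, 1.0, 1.0]

# Narrower fit range: moderately out-of-range values survive unclipped.
padded = TruncatingMinMaxScaler(fit_feature_range=(0.25, 0.75),
                                truncate=True).fit(train)
print(padded.transform(test))  # [0.125, 0.5, 0.875, 1.0]
```

With the narrower fit range, -2.0 and 10.0 keep distinct values inside [0, 1]; only the far outlier 16.0 is clipped.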
Incidentally, I also add assert_array_{less_equal,greater,greater_equal}, because my tests wanted them and it's silly that numpy only provides assert_array_less.