
Truncating in MinMaxScaler #3342

Closed

Conversation

djsutherland
Contributor

The output of MinMaxScaler doesn't always lie within the passed feature range: if the data you transform() has values outside the range of the data you fit() on, the transformed values fall outside feature_range as well. If you're using it just to make the scale of the data nicer, this probably doesn't matter, but if your algorithm actually relies on the data lying in a certain range (example), this is no good.

So, this PR adds optional support for truncation, so that values that would be transformed outside of feature_range are clipped to the ends of it. It also adds a fit_feature_range to make truncation less likely (e.g. if you need your data to lie in [0, 1], you can make your training data lie in [.1, .9], so that test values have more room to avoid clipping).

Incidentally, I also add assert_array_{less_equal,greater,greater_equal} because my tests wanted them and it's silly that numpy only provides assert_array_less.
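A minimal sketch of the behaviour described above (using the released scikit-learn API plus a manual np.clip, not this PR's branch): a value outside the fitted range maps outside feature_range, and truncation amounts to clipping the result.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit on training data spanning [0, 10].
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(np.array([[0.0], [10.0]]))

# A test value outside the training range maps outside feature_range.
out = scaler.transform(np.array([[15.0]]))
print(out)  # [[1.5]] -- beyond the requested (0, 1) range

# The proposed truncation amounts to clipping the transformed values:
clipped = np.clip(out, 0, 1)
print(clipped)  # [[1.]]
```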

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 304b84b on dougalsutherland:truncating-minmax into 82611e8 on scikit-learn:master.

djsutherland added a commit to djsutherland/skl-groups that referenced this pull request Jul 3, 2014
@jnothman
Member

jnothman commented Jul 4, 2014

Ping @untom

@djsutherland
Contributor Author

Noticed a small doc error in the testing utils, so fixed that.

I should say that I'm not totally satisfied with the fit_feature_range argument and would be happy to hear another way to handle it. (A "wiggle_room" parameter that shrinks the range by some fraction?)

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 97099d9 on dougalsutherland:truncating-minmax into 82611e8 on scikit-learn:master.

@untom
Contributor

untom commented Jul 5, 2014

I think the truncation is a nice new feature, but it seems to me that fit_feature_range has a very narrow use case, so I'm not sure that parameter is worth the added complexity -- users who really need such behaviour would probably be better off running the data through a sigmoid transformation as a preprocessing step, instead of having a "hard" cut-off, no?
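The soft alternative suggested here could look something like this sketch (sigmoid_squash is a hypothetical helper, not scikit-learn API): a logistic curve maps arbitrary values strictly inside the target range, with no hard cut-off.

```python
import numpy as np

def sigmoid_squash(X, low=0.0, high=1.0):
    """Softly map arbitrary values into (low, high) via a logistic curve.

    Illustrative sketch of the suggestion above, not scikit-learn API.
    """
    return low + (high - low) / (1.0 + np.exp(-X))

X = np.array([-5.0, 0.0, 5.0])
y = sigmoid_squash(X)
print(y)  # values stay strictly inside (0, 1); 0.0 maps to 0.5
```

Unlike clipping, this transformation is invertible and keeps distinct inputs distinct, at the cost of distorting the scale away from the centre.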

@djsutherland
Contributor Author

@untom Yeah, a sigmoid transformation might make sense, depending on the use case. I agree that fit_feature_range is probably more complex than it's worth.

@jnothman
Member

jnothman commented Aug 3, 2014

Another -1 here for fit_feature_range, but truncate might still be useful.

@djsutherland
Contributor Author

Okay, here's a new version without fit_feature_range.

djsutherland reopened this Aug 4, 2014
@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling b8fbc74 on dougalsutherland:truncating-minmax into 0a7bef6 on scikit-learn:master.

@haiatn
Contributor

haiatn commented Jul 29, 2023

I like it! Is it only waiting for review?

adrinjalali added the Stalled and help wanted labels and removed the Needs Decision label Jul 30, 2023
@adrinjalali
Member

I think this needs work, but if anybody's interested, it doesn't seem like too much work.

@haiatn
Contributor

haiatn commented Aug 26, 2023

Wait, I think this is already implemented via the clip parameter (self.clip) in MinMaxScaler?

@adrinjalali
Member

That's true @haiatn, and it was added in 0.24. Closing.
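For the record, the clip parameter mentioned above (available in scikit-learn 0.24 and later) gives exactly the truncation behaviour this PR proposed:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# clip=True truncates transformed values to feature_range.
scaler = MinMaxScaler(feature_range=(0, 1), clip=True)
scaler.fit(np.array([[0.0], [10.0]]))

# Out-of-range test values are clipped to the ends of feature_range.
res = scaler.transform(np.array([[-5.0], [15.0]]))
print(res)  # [[0.], [1.]]
```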
