Truncating in MinMaxScaler #3342

Open · dougalsutherland wants to merge 2 commits into scikit-learn:master from dougalsutherland:truncating-minmax

4 participants

@dougalsutherland

The output of MinMaxScaler doesn't always lie within the passed feature_range: if the data you transform() has values outside the range of the data you fit() on, the transformed values fall outside it too. If you're only using the scaler to make the scale of the data nicer, this probably doesn't matter, but if your algorithm actually relies on the data lying in a certain range (example), this is no good.
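
To make the failure mode concrete, here is a minimal reproduction against the stock scaler (the printed values are just hand-computed from its linear map, not taken from the PR):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(np.array([[0.0], [10.0]]))  # learns the linear map x -> x / 10

# transforming data outside the training range escapes feature_range:
print(scaler.transform(np.array([[15.0]])))  # [[ 1.5]] -- above the range
print(scaler.transform(np.array([[-5.0]])))  # [[-0.5]] -- below the range
```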

So, this PR adds optional support for truncation, so that values that would be transformed outside of feature_range are clipped to its endpoints. It also adds a fit_feature_range parameter to make truncation less likely (e.g. if you need your data to lie in [0, 1], you can map your training data into [0.1, 0.9], so that test values have more room before they get clipped).
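
A rough sketch of the two proposed behaviours (the parameter names follow the PR description; the real implementation lives in the diff):

```python
import numpy as np

def minmax_transform(X, data_min, data_max, feature_range=(0, 1),
                     truncate=False, fit_feature_range=None):
    """Sketch of the proposed semantics, not the PR's actual code.

    fit_feature_range, if given, is the narrower range the training
    extremes are mapped onto; truncation still happens at the outer
    feature_range, so mildly out-of-range test points keep some slack.
    """
    lo, hi = fit_feature_range if fit_feature_range is not None else feature_range
    Xt = (X - data_min) / (data_max - data_min) * (hi - lo) + lo
    if truncate:
        Xt = np.clip(Xt, *feature_range)
    return Xt
```

With data fit on [0, 10], a test value of 11 maps to 1.1 and gets flattened to 1.0 under plain truncation, while fit_feature_range=(0.1, 0.9) sends it to 0.98 and avoids the clip altogether.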

Incidentally, I also add assert_array_{less_equal,greater,greater_equal} because my tests wanted them and it's silly that numpy only provides assert_array_less.
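
Such helpers are thin wrappers over elementwise comparisons; a minimal sketch of one (not the PR's exact code):

```python
import numpy as np

def assert_array_less_equal(x, y, err_msg=''):
    # elementwise x <= y, by analogy with numpy's assert_array_less
    x, y = np.asanyarray(x), np.asanyarray(y)
    if not np.all(x <= y):
        raise AssertionError(
            'Arrays are not less-or-equal ordered\nx: %r\ny: %r\n%s'
            % (x, y, err_msg))
```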

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 304b84b on dougalsutherland:truncating-minmax into 82611e8 on scikit-learn:master.

dougalsutherland added a commit to dougalsutherland/skl-groups that referenced this pull request on Jul 3, 2014: add truncating MinMaxScaler (36ab4fa)
@dougalsutherland

Noticed a small doc error in the testing utils, so fixed that.

I should say that I'm not totally satisfied with the fit_feature_range argument and would be happy to hear another way to handle that. (A "wiggle_room" parameter that shrinks the range by some portion?)

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 97099d9 on dougalsutherland:truncating-minmax into 82611e8 on scikit-learn:master.

@untom
untom commented Jul 5, 2014

I think the truncation is a nice new feature, but it seems to me that fit_feature_range has a very narrow use case, so I'm not sure that parameter is worth the added complexity -- users who really need such behaviour would probably be better off running the data through a sigmoid transformation as a preprocessing step, instead of having a "hard" cut-off, no?
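
For the record, the squashing preprocessor untom has in mind might look something like this (a sketch; nothing like it is being proposed in this PR):

```python
import numpy as np
from scipy.special import expit  # numerically stable logistic sigmoid

def sigmoid_squash(X, center, scale):
    # smooth map into (0, 1): extreme values saturate toward the
    # endpoints instead of hitting a hard cut-off
    return expit((X - center) / scale)

# e.g. with statistics taken from the training data:
# sigmoid_squash(X_test, X_train.mean(axis=0), X_train.std(axis=0))
```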

@dougalsutherland

@untom Yeah, a sigmoid transformation might make sense, depending on the use case. I agree that fit_feature_range is probably more complex than it's worth.

@jnothman
scikit-learn member
jnothman commented Aug 3, 2014

Another -1 here for fit_feature_range, but truncate might still be useful.

@dougalsutherland

Okay, here's a new version without fit_feature_range.
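
Usage of the trimmed-down version would presumably look like this (the truncate keyword name is assumed from the PR description, not confirmed against the diff):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# hypothetical keyword from this PR, not the stock scaler
scaler = MinMaxScaler(feature_range=(0, 1), truncate=True)
scaler.fit(np.array([[0.0], [10.0]]))
scaler.transform(np.array([[15.0]]))  # would give [[1.0]] instead of [[1.5]]
```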

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling b8fbc74 on dougalsutherland:truncating-minmax into 0a7bef6 on scikit-learn:master.

